Composability

Nearly every time I see something I like, it comes back to composability.

When I can use class methods in Objective-C and Ruby, it’s because the classes themselves have the same capability for virtual dispatch that objects do. When I can mess around with an AST of a language construct, or even redefine the same language as it’s being written, it’s because of reification, which lets concepts be objects.

When I’m happy that I can use ranges or decimal-representation numbers or regular expressions as literals, it’s because they are so obviously one type of built-in object that they deserve their own sort of literals on the same level that strings, arrays and numbers do.

When you take the leap from letting concepts be either hidden inside someone else’s code or opaque and turn them into objects, suddenly you can describe bigger structures or concepts. You can reason about them, arrange them, schedule them, make them aware of each other.

I’m not worried about the state of computer science per se, but the biggest breakthroughs that you’ll have personally within the next ten years will all be directly attributable to composability, or the way in which libraries, styles of programming and programming languages change to enable it.

Hidden in Plain Sight

If I go back and look at all the programming languages I’ve ever used, each had latent functionality that I didn’t know how to use at the time I was learning the language and spending most of my time in it.

Visual Basic had objects — it wasn’t fully object-oriented, and I didn’t know about “polymorphism” until much later, but it did have objects. Perl (5) had map. PHP had– well, okay. C#, back in 2.0, had delegates. Ruby had good metaprogramming capabilities.

These are not objective measurements, but it happens to all of us. You see a concept flourish in a new language, maybe because of particular fitness, maybe because of culture, and your eyes are slowly opened to it. Then you look around and find it, or some variation of it, where you least expect it. Sometimes, you take up using the concept in the languages you thought you already knew.

Sometimes the eye-openers really do require a niche language to manifest themselves in. Newspeak is an example of this. In Newspeak, every name is bound dynamically and there is no global scope. You cart around a “platform” object with the capabilities you need, and to constrain something, you just call it with a lesser platform object. I read about Newspeak and it introduced me to the idea of capability-based programming, using objects and their messages themselves, taken to the extreme.

(Newspeak is a Smalltalk-based language with syntax that’d make a logician cry tears of what I guess must be logic. Gilad Bracha sees the many contradictions and uphill battles with Newspeak and is intent on surmounting them, while still being able to conceive of more visions. For the dozens of people that must be using Newspeak right now, I hope it succeeds.)
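The platform-object idea can be approximated in other languages. What follows is a minimal sketch in C#, not Newspeak, and every name in it (`Platform`, `IFileStore`, `DeniedFileStore`, `Report`) is invented for illustration. The code under constraint never reaches for global state; it can only do what the platform it was handed allows.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// All names here are hypothetical; this is C#, not Newspeak syntax.
public interface IClock { DateTime Now { get; } }
public interface IFileStore { void Write(string name, string text); }

public sealed class FixedClock : IClock
{
    private readonly DateTime _now;
    public FixedClock(DateTime now) { _now = now; }
    public DateTime Now { get { return _now; } }
}

public sealed class MemoryFileStore : IFileStore
{
    public readonly Dictionary<string, string> Files = new Dictionary<string, string>();
    public void Write(string name, string text) { Files[name] = text; }
}

// The "lesser" capability: same interface, but every write is refused.
public sealed class DeniedFileStore : IFileStore
{
    public void Write(string name, string text)
    {
        throw new UnauthorizedAccessException("writing was not granted");
    }
}

// The platform object that gets carted around instead of global state.
public sealed class Platform
{
    public readonly IClock Clock;
    public readonly IFileStore Files;
    public Platform(IClock clock, IFileStore files) { Clock = clock; Files = files; }
}

public static class Report
{
    // Everything this code can do, it does through the platform it was handed.
    public static void Save(Platform platform, string name)
    {
        string date = platform.Clock.Now.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        platform.Files.Write(name, "generated " + date);
    }
}
```

Calling `Report.Save` with a platform built from a `MemoryFileStore` stores the report, while the same call with a platform built from a `DeniedFileStore` throws. The constraint is expressed purely by which object gets passed in.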

Other times, the language itself is the eye-opener. I’m fond of saying that JavaScript before Gmail was a runtime for “DHTML”, which was a way in which analog watches could follow your mouse cursor and fail to block right-clicking. There are so many contradictions with JavaScript: the rough first specification and implementation with its prevalent drawbacks, the choice of a purist language with a traditional syntax and scent, the hijacking of useful functionality by completely useless moving divs, the rediscovery of the language a few years back by programmers who hadn’t read the specification, and its continued integration into environments beyond the browser. JavaScript is one of the world’s most widespread programming languages, and it departs from common practice in a range of decisions, from prototype-based object inheritance to accidentally non-transitive double-equals equality.

The problem with coming to terms with eye-openers is that they are hidden in plain sight. You begin turning over every rock, jumping around flailing your arms wildly, trying to catch the silent revolution in your own peripheral vision, as if it were shrouded by an SEP field. And sometimes, you have exhausted the vocabulary of your tool, but you won’t know.

What did you last learn that was very useful and turned out to have been hidden in plain sight?

Grammars in Perl 6

I don’t intend to make it a habit to post solely link posts here, but I felt I had to make an exception for today’s Perl 6 Advent Calendar post. (I don’t intend to limit myself to Perl 6 either, by the way, it’s just easy during the month of December.)

It’s about grammars in Perl 6, which are both groups of regexes and actually for all intents and purposes similar to parser grammars. People claimed that regular expressions were inadequate to solve parsing needs but just easy and approachable enough to make them more attractive than just writing a parser, and they were right. Perl 6 now does a fine job of attempting to solve this by rebuilding regular expressions into actual parsers, and making parsing easy by being a first-class citizen of the language.

How first-class? As shown by the final gift under last year’s tree, Perl 6’s own syntax is enshrined in a Perl 6 grammar. If it can parse Perl (albeit Perl 6 and not Perl 5), it’s probably good enough for most things you want to throw at it.

Transactional Data Versioning

One of my goals for setting up stmts was to provide an avenue to talk about the reasons I every now and again feel like building my own language. These reasons tend to boil down to ideas and concepts, and I often find myself talking about them with my friend Justin White, whose impressive work I will link to later. (I once said “I didn’t think of you, but I thought of particle effects, which is close enough”, which he wore as a status message for the better part of the following week.)

Just last week, we were talking about the difficulty of working with shared data during concurrency. Software Transactional Memory models tend to use a brute-force “keep retrying” approach, which feels unnerving, even though in a situation where actual transactionality (like account balances) is paramount, the access would instead be serialized and the state never shared as such. It comes down to a trade-off where you either delay your execution by queuing or locking (and get to perform the operation with fresh data) or work with data that might go stale as you go.

Functional languages, and increasingly the rest of the world, solve this by treating everything as values. Every change is a new value; you never change an object, you build a new one with your modifications applied. So Justin had an idea: what if objects were still “just” objects, but their mutability were batched into versions? (I’ll call this model TDV for Transactional Data Versioning because that’s what I named my short-lived Tamagotchi.)

You’d enter a scope with an object and, through undisclosed methods, the current state of the object is stamped as the “previous” state. Any mutation to the object is recorded on top and remembered as the “next” state. If, when exiting the scope successfully, the object has any mutations, the “next” state becomes the current state. (We didn’t go into implementation details, but this last part is trivially doable by atomically exchanging the states.)

{ // TDV initiated somehow
    foo.x = 42;
    foo.z = 9;
}
foo.x == 42; // true, but not true from another thread
// before line two in that block above
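One way to make the “TDV initiated somehow” part concrete is a wrapper that copies the committed state, lets the scope mutate the copy, and swaps the copy in atomically when the scope completes. This is only a sketch under assumptions: `Versioned<T>`, `Transact` and `FooState` are made-up names, the state is copied wholesale rather than recorded as individual mutations, and two concurrent scopes simply end in “last writer wins”.

```csharp
using System;
using System.Threading;

// Hypothetical state type; the post only names foo.x and foo.z.
public sealed class FooState
{
    public int X;
    public int Z;
    public FooState Clone() { return (FooState)MemberwiseClone(); }
}

// Holds the committed version; mutations happen on a private "next" copy.
public sealed class Versioned<T> where T : class
{
    private T _current;
    private readonly Func<T, T> _clone;

    public Versioned(T initial, Func<T, T> clone)
    {
        _current = initial;
        _clone = clone;
    }

    // What other threads see: always a fully committed version.
    // Callers are trusted not to mutate the object they get back.
    public T Current { get { return Volatile.Read(ref _current); } }

    // Runs the scope against a copy and publishes it only if the scope completes.
    public void Transact(Action<T> scope)
    {
        T previous = Current;
        T next = _clone(previous);
        scope(next);                               // an exception here discards "next"
        Interlocked.Exchange(ref _current, next);  // atomic swap, no conflict detection
    }
}
```

With `var foo = new Versioned<FooState>(new FooState { X = 98 }, s => s.Clone());`, calling `foo.Transact(next => { next.X = 42; next.Z = 9; });` leaves `foo.Current.X` at 42. A reader on another thread sees either the old state or the new one, never `X` updated without `Z`. Inside the scope, this sketch quietly picks “reads see the version in progress”.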

At the outset, this seemed like a cool enough idea. The concept of object identity could be retained, it’d be more obvious that code could fondle old data and it’d be okay for it to do that because you wouldn’t mistakenly receive “half an update”, and so on. But the more we dug, the more issues started popping up.

So if you write to the object and then read back that field, do you get the version in progress?

{ // TDV initiated somehow
    foo.x = 42;
    42 == foo.x; // ???
}

You might say yes. Well, what happens on deeper scope, or method calls?

assert(foo.x == 98);
{ // TDV initiated somehow
    foo.x = 42;
    var y = foo.x; // is y 42 (next') or 98 (prev)?
    { // TDV initiated somehow, again
        var i = foo.x; // is i 42 (next') or 98 (prev)?
        foo.x = 37;
        var j = foo.x; // is j 37 (next''), 42 (next') or 98 (prev)?
    }

    // at this point, if the previous block was a method call,
    // you could not rely on anything before the call being
    // retained.

    var k = foo.x; // is k 37 (next'' having been committed),
    // 98 (prev) or 42 (next')?
}
var q = foo.x; // is q 37 (next'' due to being the latest assignment)
// or 42 (next' due to being the latest scope to close)?

It turns out that you’d end up with an unmanageable tree of new versions, and it’s unclear when they split off and what they base the data on. We toyed with an explicit construct, advance foo, to introduce a commit point. But what if you advanced and then began editing in a new nested block? The block would certainly make its copy from the newly minted version, so would the new version be visible from other threads too? If you made three changes in succession, what would the previous version refer to? (The previous version remains notable because if you reject the changes, that’s what they will revert to.) Will they coalesce into one big new version?

Maybe this sounds academic, but it’ll make sense in a real way: Justin was heavily on the side of allowing every read to return the result from the previous version, and every write to affect the future value, figuring that this would best preserve the semantics of the system, which is true. But it’s also true that doing so and then calling into a method will leave the method clueless about any possible changes that it should help prepare the new version for.

Let’s say the object was about a person with a set number of hours and an hourly rate, and the method was supposed to precalculate the pay for a week’s work. If the new values don’t carry over, you can’t precalculate the new pay, you can only recalculate the old pay. This is a textbook case of why you’d want to use methods — for code reuse.

You could solve this by extracting the new values into local variables which are then passed to the method or closed over by a closure and manipulated at will, and then setting them at the end. (The local variables are also naturally unsuitable for falling prey to the parallelism bugs that prompted the idea in the first place.)
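The pay example and the local-variable workaround can be sketched side by side. Everything here is an assumption made for illustration: the `Person` fields, the `Payroll` and `Raise` classes, and the numbers. There is no TDV machinery in the sketch, so the final writes are ordinary assignments that mark where the “next” state would be filled in.

```csharp
using System;

public sealed class Person
{
    public decimal Hours;
    public decimal Rate;
    public decimal WeeklyPay;
}

public static class Payroll
{
    // Written against the object: under "reads return the previous version",
    // this can only ever see the committed hours and rate.
    public static decimal WeeklyPay(Person committed)
    {
        return committed.Hours * committed.Rate;
    }

    // Written against plain values: the caller decides which version they come from.
    public static decimal WeeklyPay(decimal hours, decimal rate)
    {
        return hours * rate;
    }
}

public static class Raise
{
    public static void Apply(Person person, decimal newRate)
    {
        // Pending values live in locals, which no other thread can observe.
        decimal hours = person.Hours;
        decimal rate = newRate;
        decimal pay = Payroll.WeeklyPay(hours, rate);

        // Only now are the results written back; in the TDV model these
        // assignments would be the mutations recorded into the "next" state.
        person.Rate = rate;
        person.WeeklyPay = pay;
    }
}
```

For a person with 40 hours at a rate of 10, `Payroll.WeeklyPay(person)` gives 400 no matter what raise is pending, while `Raise.Apply(person, 12m)` precalculates 480 from the locals before anything is written. The cost is that the reusable method has to take values instead of the object.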

In the end, we fiddled some more and ended up putting away this idea for later. Every time I go back to look at this, I feel the onset of brain freeze. We don’t solve every problem, but we do have fun trying.

Maybe there’s a reason this doesn’t really exist. Or maybe it does exist with everything already worked out, but as a library or in an obscure Dutch paper. What do you think about the approach to the problem? Have you seen anything like it? What kind of conclusions did those people come to?

With Many Cheerful Facts About the Point of the Hypothesis

On the inaugural post, my belief in the Sapir-Whorf hypothesis was challenged, and the hypothesis was likened to this domain’s “intelligent design”: easily repeatable, poorly provable and not fundamentally true. I’ll freely admit that I have no idea whether the Sapir-Whorf hypothesis is true, nor what it says in full. But let me give you an example of why I think something like it applies in a real sense in day-to-day programming.

Witness the modest LINQ statement in some C# code:

var userDetails = from user in ctx.Users
                  where user.Name.StartsWith("Je") 
                  select new {
                      UserName = user.Name,
                      UserID = user.ID,
                      UpdatedDate = user.UpdatedDate
                  };

LINQ was introduced in C# 3.0. Let’s go back to C# 2.0 and try to write this. Because the query comprehension syntax came later, let’s reformat this into pure calls instead.

var userDetails = ctx.Users.
                  Where(user => user.Name.StartsWith("Je")).
                  Select(user => new {
                      UserName = user.Name,
                      UserID = user.ID,
                      UpdatedDate = user.UpdatedDate
                  });

That should work, right? Well, no. There were no anonymous types in C# 2.0 either. Let’s make that into a real class.

/** somewhere else **/
class TransientDetail {
    private readonly string _userName;
    private readonly Guid /* or long */ _userID;
    private readonly DateTime _updatedDate;
    public TransientDetail(string userName, Guid userID,
                    DateTime updatedDate) {
        // argument validation elided
        _userName = userName;
        _userID = userID;
        _updatedDate = updatedDate;
    }

    public string UserName { get { return _userName; } }
    public Guid UserID { get { return _userID; } }
    public DateTime UpdatedDate { get { return _updatedDate; } }
}

IEnumerable<TransientDetail> userDetails = ctx.Users.
                  Where(user => user.Name.StartsWith("Je")).
                  Select(user => new TransientDetail(
                      user.Name, user.ID, user.UpdatedDate
                  ));

Now it’s starting to look a bit better. But we’re using lambda syntax (complete with type inference), and that’s C# 3.0 material. Anonymous methods did exist in C# 2.0, so let’s use that instead.

IEnumerable<TransientDetail> userDetails = ctx.Users.
                  Where(delegate(User user) {
                      return user.Name.StartsWith("Je");
                  }).
                  Select(delegate(User user) {
                      return new TransientDetail(
                          user.Name, user.ID, user.UpdatedDate
                      );
                  });

We’re almost there. Except that .Where and .Select aren’t available on the Users table directly; they’re extension methods, which are static methods that can appear as patched-on instance methods. We can still find these methods somewhere and call them directly.

// EnumerableExtensions stands in for wherever the static methods live.
IEnumerable<TransientDetail> userDetails = EnumerableExtensions.Select(
                      EnumerableExtensions.Where(ctx.Users,
                          delegate(User user) {
                              return user.Name.StartsWith("Je");
                          }
                      ), delegate(User user) {
                          return new TransientDetail(
                              user.Name, user.ID,
                              user.UpdatedDate
                          );
                      }
                  );

There! Clear as mud, right?

Yeah. The equivalent code is possible to write in C# 2.0 too. It too is Turing-complete. Theoretically, you could emit the same IL through some other mechanism. But at the heart of the hypothesis, or at least the message I’ve always taken away from it, is that if it takes contortion to say something, it will be less said. Sure enough, practically no one wrote code like that. (For one thing, you have to build a pyramid of nested invocations instead of “fluently” dotting into further methods.)
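For that last version to compile on C# 2.0, the static methods would have to exist somewhere. Here is a minimal sketch of what such a class could look like, using only features C# 2.0 had: generics, static classes, iterators with `yield return`, and the `Predicate<T>` and `Converter<TInput, TOutput>` delegate types. The class name `EnumerableExtensions` is taken from the snippet above; its contents are an assumption and only cover in-memory sequences.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Lazily yields the items for which the predicate returns true.
    public static IEnumerable<T> Where<T>(IEnumerable<T> source, Predicate<T> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }

    // Lazily yields each item run through the selector.
    public static IEnumerable<TResult> Select<T, TResult>(
        IEnumerable<T> source, Converter<T, TResult> selector)
    {
        foreach (T item in source)
        {
            yield return selector(item);
        }
    }
}
```

Given a list holding 1, 2, 3 and 4, nesting `EnumerableExtensions.Select<int, string>(EnumerableExtensions.Where(numbers, delegate(int n) { return n % 2 == 0; }), delegate(int n) { return "#" + n; })` yields "#2" and "#4". The type arguments on `Select` are spelled out to stay on the safe side with older compilers, which only adds to the pyramid.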

Wait a minute, though. This looks like a database! Database tools like NHibernate have offered a “fluent” query API wherein you dot into further methods for years, even before C# 3.0. And that’s right, and they had to invent special types to hold the query in itself to be able to do that. They had to rephrase their code in the form of C# 2.0.
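To illustrate what “special types to hold the query” can mean, here is a deliberately tiny, hypothetical query holder. It is not NHibernate’s API; the `Query` class and its method names are invented, and the string building skips escaping and parameters entirely.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical query-holder type; not NHibernate's actual API.
public sealed class Query
{
    private readonly string _table;
    private readonly List<string> _conditions = new List<string>();

    public Query(string table) { _table = table; }

    // Each call records a condition as data and returns the same object,
    // which is what makes "dotting into further methods" possible.
    public Query WhereStartsWith(string column, string prefix)
    {
        _conditions.Add(column + " LIKE '" + prefix + "%'");
        return this;
    }

    public Query WhereEquals(string column, int value)
    {
        _conditions.Add(column + " = " + value);
        return this;
    }

    // Nothing touches a database until the recorded query is rendered.
    public string ToSql()
    {
        string sql = "SELECT * FROM " + _table;
        if (_conditions.Count > 0)
        {
            sql += " WHERE " + string.Join(" AND ", _conditions.ToArray());
        }
        return sql;
    }
}
```

`new Query("Users").WhereStartsWith("Name", "Je").ToSql()` produces `SELECT * FROM Users WHERE Name LIKE 'Je%'`. The column name travels as a string and the comparison as a method name, because plain C# 2.0 had no way of handing `user.Name.StartsWith("Je")` over as data.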

And there’s one more ace up my sleeve. That last code you saw isn’t actually the truth either. ctx isn’t just an object, it’s some sort of LINQ entity context (“the database”, basically). So one of the externally under-appreciated powers of LINQ comes into effect. What actually gets generated in C# 3.0 for the original query is an object conforming to IQueryable<that anonymous type we used>.

Everything that has been invoked on it, through opting in, has been handed to it not in delegate/closure form, but in the form of an expression tree, as data. The database layer, whichever we’re using, is actually looking through the code and translating it into a database query when it runs. (And before anyone runs to comment, you’re able to do this in every system of sufficient reification, including many older than C# 3.0.)
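The “code as data” part can be seen in miniature with `System.Linq.Expressions`. When a lambda is assigned to `Expression<Func<...>>` instead of a plain delegate type, the compiler hands over a tree that can be inspected. The sketch below is a toy and makes no claim about how any real LINQ provider is written: the `User` class and `ToyTranslator` are assumptions, it recognizes exactly one shape of predicate, and it does no escaping.

```csharp
using System;
using System.Linq.Expressions;

public sealed class User
{
    public string Name { get; set; }
}

public static class ToyTranslator
{
    // Receives the predicate as a tree, not as something to execute.
    public static string ToWhereClause(Expression<Func<User, bool>> predicate)
    {
        // Expected shape: <parameter>.<Member>.StartsWith(<constant>)
        MethodCallExpression call = predicate.Body as MethodCallExpression;
        MemberExpression column = call == null ? null : call.Object as MemberExpression;
        ConstantExpression prefix = call == null || call.Arguments.Count != 1
            ? null
            : call.Arguments[0] as ConstantExpression;

        if (call == null || column == null || prefix == null
            || call.Method.Name != "StartsWith")
        {
            throw new NotSupportedException(
                "this sketch only understands x.Member.StartsWith(\"literal\")");
        }

        return "WHERE " + column.Member.Name + " LIKE '" + prefix.Value + "%'";
    }
}
```

`ToyTranslator.ToWhereClause(user => user.Name.StartsWith("Je"))` returns `WHERE Name LIKE 'Je%'`. The lambda is never run; the method only reads which member was accessed, which method was called on it and with what constant.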

But the further point is that even fewer people, in C# 2.0 or 3.0, want to write out these expression trees by hand. The compiler helps out and does it automatically. You could do all of it in C# 2.0, but I am afraid to attempt to write this code by hand for fear of getting it wrong. There wasn’t any set of expression tree classes around for that then, so you’d have to invent all of those too. Once more, you had to rephrase the problem in the form of C# 2.0 – not impossible, but a bit of an uphill battle.
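For a sense of what writing the tree out by hand involves, here is the single predicate `user => user.Name.StartsWith(prefix)` assembled node by node. It leans on today’s `System.Linq.Expressions` factory methods, which is exactly the set of classes that was not around for C# 2.0. The `User` class and the `HandBuilt` helper are assumptions for the sake of the example.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public sealed class User
{
    public string Name { get; set; }
}

public static class HandBuilt
{
    // Builds user => user.Name.StartsWith(prefix) without any lambda syntax.
    public static Expression<Func<User, bool>> NameStartsWith(string prefix)
    {
        // The lambda's parameter: "user"
        ParameterExpression user = Expression.Parameter(typeof(User), "user");

        // user.Name
        MemberExpression name = Expression.Property(user, "Name");

        // The string.StartsWith(string) overload, looked up through reflection
        MethodInfo startsWith = typeof(string).GetMethod(
            "StartsWith", new Type[] { typeof(string) });

        // user.Name.StartsWith(prefix)
        MethodCallExpression body = Expression.Call(
            name, startsWith, Expression.Constant(prefix, typeof(string)));

        return Expression.Lambda<Func<User, bool>>(body, user);
    }
}
```

Compiling the result with `HandBuilt.NameStartsWith("Je").Compile()` gives a delegate that returns true for a user named "Jesper" and false for one named "Bob". That is five statements for one method call on one property, before any of the `select` part of the query has been touched.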

What this all boils down to is that in human languages, you can describe anything. They arrived at where they are by thousands of years of evolution. That’s the only way they’ve grown. Programming languages are themselves much like programmed code; nothing is there unless someone put it there. You’ll have to formulate everything new in terms of what’s already there. And sure enough, you can do that. Many people find great pleasure in doing so. We are not absolutely locked in.

But take LINQ back to a C programmer in the early 1990s and she wouldn’t know what to make of it. She had most likely heard of objects and methods and knew that they weren’t in C (and given how C++ looked at the time, probably also wanted it that way), but there are layers upon layers upon layers that build the abstractions that we depend on each day, and different layers depend on different capabilities, and you have to have the right sort of capabilities around to build anything.

Thinking one hour, two hours, one work day ahead, it is not effective to concentrate on what’s theoretically possible if you assemble new abstractions. It is easier to stay in the world you know and you’ll get a lot more done. But maybe you’d get even more done over the long term if you had access to some better tooling or a more capable environment.

The Grand Compromise

If you’ve been following this year’s Perl 6 advent calendar, you may be noticing a sort of theme compared to last year’s: it’s now significantly more about minutiae, whereas the 2009 edition focused on all the things Perl 6 gave you. I know I did.

At first I was aghast. My first instinct with Perl 6 has been to show people the new cool stuff in it, to justify the long wait we’ve had to endure, as if to say “it was worth it”, but mostly to affirm that I wasn’t foolish in believing that something would eventually materialize. The new priorities fly in the face of these efforts.

But something started to dawn on me over the course of several days. Much of the informed ridicule of Perl 6 mirrors the informed ridicule of Perl 5: it is too much its own creature, unlike that which we know, for the benefit of what exactly? But where the Perl 5 questions were sensible questions about needing to explicitly mention references (or be able to nest data structures) or the tendency to add a syntax for every feature (form formatting and the magic $a and $b in sorter subs spring to mind), Perl 6 questions are overwhelmingly about grandeur and unification. There’s a type system, unless you wish it away. You can redefine the language grammar. What’s the deal with variable sigils now that there are types, and what’s up with junctions anyway?

Corners of Perl 6, even those seemingly furthest away from the hoopla with lazy lists and making references standard, shine with brilliance in how they attempt to invent pinned-down concepts that stay consistent across the entire language and scale from soup to nuts, solving different problems along the way. The problem is that this takes a lot of time and effort to try to nail down adequately.

The thing that will help Perl 6 succeed more than anything right now isn’t blindly promoting adoption, but dedicating resources to working through these concepts. To prove that it can all work out together, and that even if you don’t recognize or worry about the complexity of the whole language, it is coherent, does the right thing and is backed by intentional and specified concepts. For everything about list access sigils and variable scopes, that might be the biggest change from Perl 5.

Please Turn Off Your Pseudocode Jokes

Before this weblog begins proper, a brief announcement of an eternal, unchanging strategy.

Programming jokes based on emulating code are not ever funny. They chafe at the mind of their target audience the way the question of whether a conjoined twin has “bonded” with its fellow twin must. Not only are they focused on the form instead of the meaning, they are tired and predictable and bring along neither insights nor laughs. To the extent they even qualify as humor, they are the “em, er, uh” of their art form: awkward filler to fill a less awkward silence.

So, in short, there won’t be any here, ever. There may be puns based on keywords — you try to catch me doing that though — but there won’t be any pseudocode.

Hello, World!\n

Can you ever start a weblog about programming languages without naming the first post “Hello, World!”? I am uncertain. It may lex fine, but it is grounds for disciplinary action if you treat warnings as errors. I hope you enjoyed that last sentence, because that’s what this is going to be.

Within days of starting out with programming, I became interested in programming languages themselves. Not just as crude tools that enable the act of programming, but as utensils that shape it. They are all different. They all have various characteristics. They all take the first opportunity to divide the world into a thousand aspects of which there’s a spectrum of possibilities, and more often than not, take off towards one extreme or the other.

The Sapir-Whorf hypothesis teaches that the extent of what is possible to express also forms the extent of what can be practically considered. It’s not enough that programming by itself is the language by which we tell the device what it should do; the language shapes the way in which we do it. It seems like there would be a big interest in trying to expand this extent, and history bears this out.

It is interesting being a programmer right now. Even if Alan Kay is right and nothing new has truly come out of computing since the 70’s, it doesn’t feel like it. Tethered as we may largely be by C and its historical legacy[1], there’s a hunger to take off in the opposite direction — a tension just as strong as that between, say, statically and dynamically typed languages.

So rather than try to take the time to distill the best features into my own toy language, I thought I’d start to write them down here instead. I already keep a weblog in the form of waffle, and sometimes weblog posts just appear in my mind. Maybe if I could train myself to write stmts posts as often and as well as waffle posts, I would be able to connect the dots further.

[1]: By legacy, I mean the dictionary definition of “things that have been and shaped what was to come”, not the prevalent passive-aggressive industry speak for “things we have actually gotten around to deprecating, or which belong to the technological platform of our competitor, and which are therefore clearly unusable”. “Historical legacy” is a tautology, but it was meant to convey a wider perspective than the other usage.