With Many Cheerful Facts About the Point of the Hypothesis

On the inaugural post, my belief in the Sapir-Whorf hypothesis was challenged and likened to this domain’s “intelligent design”: easily repeatable, poorly provable and not fundamentally true. I’ll freely admit that I have no idea whether the Sapir-Whorf hypothesis is true, nor exactly what it says in full. But let me give you an example of why I think something like it applies, in a real sense, to day-to-day programming.

Witness the modest LINQ statement in some C# code:

var userDetails = from user in ctx.Users
                  where user.Name.StartsWith("Je") 
                  select new {
                      UserName = user.Name,
                      UserID = user.ID,
                      UpdatedDate = user.UpdatedDate
                  };

LINQ was introduced in C# 3.0. Let’s go back to C# 2.0 and try to write this. Because the query comprehension syntax came later, let’s rewrite this as plain method calls instead.

var userDetails = ctx.Users.
                  Where(user => user.Name.StartsWith("Je")).
                  Select(user => new {
                      UserName = user.Name,
                      UserID = user.ID,
                      UpdatedDate = user.UpdatedDate
                  });

That should work, right? Well, no. There were no anonymous types in C# 2.0 either. Let’s make that into a real class.

/** somewhere else **/
class TransientDetail {
    private readonly string _userName;
    private readonly Guid /* or long */ _userID;
    private readonly DateTime _updatedDate;
    public TransientDetail(string userName, Guid userID,
                    DateTime updatedDate) {
        // argument validation elided
        _userName = userName;
        _userID = userID;
        _updatedDate = updatedDate;
    }

    public string UserName { get { return _userName; } }
    public Guid UserID { get { return _userID; } }
    public DateTime UpdatedDate { get { return _updatedDate; } }
}

IEnumerable<TransientDetail> userDetails = ctx.Users.
                  Where(user => user.Name.StartsWith("Je")).
                  Select(user => new TransientDetail(
                      user.Name, user.ID, user.UpdatedDate
                  ));

Now it’s starting to look a bit better. But we’re using lambda syntax (complete with type inference), and that’s C# 3.0 material. Anonymous methods did exist in C# 2.0, so let’s use those instead.

IEnumerable<TransientDetail> userDetails = ctx.Users.
                  Where(delegate(User user) {
                      return user.Name.StartsWith("Je");
                  }).
                  Select(delegate(User user) {
                      return new TransientDetail(
                          user.Name, user.ID, user.UpdatedDate
                      );
                  });

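We’re almost there. Except that .Where and .Select aren’t available on the Users table directly; they’re extension methods, which are static methods that can appear as patched-on instance methods, and they too arrived with C# 3.0. We can still find these methods somewhere and call them directly as ordinary static methods. (In .NET 3.5 they live in System.Linq.Enumerable and System.Linq.Queryable; the EnumerableExtensions class below is a stand-in for wherever you would have put them yourself.)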

IEnumerable<TransientDetail> userDetails = EnumerableExtensions.Select(
                      EnumerableExtensions.Where(ctx.Users,
                          delegate(User user) {
                              return user.Name.StartsWith("Je");
                          }
                      ), delegate(User user) {
                          return new TransientDetail(
                              user.Name, user.ID,
                              user.UpdatedDate
                          );
                      }
                  );

There! Clear as mud, right?

Yeah. It is possible to write the equivalent code in C# 2.0; it is Turing-complete too. Theoretically, you could emit the same IL through some other mechanism. But at the heart of the hypothesis, or at least the message I’ve always taken away from it, is that if it takes contortion to say something, it will be said less often. Sure enough, practically no one wrote code like that. (For one thing, you have to build a pyramid of nested invocations instead of “fluently” dotting into further methods.)
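To make the “find these methods somewhere” step concrete, here is a rough sketch of what such a helper class could have looked like using only C# 2.0 features (generics, iterators and anonymous methods). The class and its signatures are my own invention for illustration, not something that shipped with .NET 2.0; Predicate<T> and Converter<TInput, TOutput> are delegate types that did exist back then.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, written for illustration only.
public static class EnumerableExtensions
{
    // Lazily yields the items for which the predicate returns true.
    public static IEnumerable<T> Where<T>(
        IEnumerable<T> source, Predicate<T> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
                yield return item;
        }
    }

    // Lazily yields each item transformed by the selector.
    public static IEnumerable<TResult> Select<T, TResult>(
        IEnumerable<T> source, Converter<T, TResult> selector)
    {
        foreach (T item in source)
            yield return selector(item);
    }
}

public static class EnumerableDemo
{
    public static void Main()
    {
        string[] names = { "Jesper", "Jenny", "Anders" };

        // Type arguments are spelled out, as a C# 2.0 compiler
        // could not always infer them from anonymous methods.
        IEnumerable<int> lengths = EnumerableExtensions.Select<string, int>(
            EnumerableExtensions.Where<string>(names,
                delegate(string name) { return name.StartsWith("Je"); }),
            delegate(string name) { return name.Length; });

        foreach (int length in lengths)
            Console.WriteLine(length); // prints 6, then 5
    }
}
```

Note that this only covers in-memory sequences. It runs the delegates as code, which is exactly what a database layer cannot make use of.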

Wait a minute, though. This looks like a database! Database tools like NHibernate offered a “fluent” query API, wherein you dot into further methods, for years before C# 3.0. That’s right, and to be able to do that they had to invent special types to hold the query itself. They had to rephrase their code in the form of C# 2.0.
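As a rough idea of what “special types to hold the query” means, here is a tiny sketch in C# 2.0 style. Every name in it (Query, StartsWithCriterion, WhereStartsWith, Describe) is made up for this post; it is not NHibernate’s API, only the general shape of the trick: each call records data and returns the query object, so you can keep dotting.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical criterion: the condition is stored as data, not run as code.
public sealed class StartsWithCriterion
{
    public readonly string Property;
    public readonly string Prefix;

    public StartsWithCriterion(string property, string prefix)
    {
        Property = property;
        Prefix = prefix;
    }
}

// Hypothetical query object; each method returns "this" to allow chaining.
public sealed class Query
{
    private readonly string _table;
    private readonly List<StartsWithCriterion> _criteria =
        new List<StartsWithCriterion>();
    private readonly List<string> _columns = new List<string>();

    public Query(string table) { _table = table; }

    public Query WhereStartsWith(string property, string prefix)
    {
        _criteria.Add(new StartsWithCriterion(property, prefix));
        return this;
    }

    public Query Select(params string[] columns)
    {
        _columns.AddRange(columns);
        return this;
    }

    // Renders SQL-like text for illustration only; a real tool would
    // use query parameters rather than inlined literals.
    public string Describe()
    {
        string cols = _columns.Count == 0
            ? "*" : string.Join(", ", _columns.ToArray());
        string text = "SELECT " + cols + " FROM " + _table;
        for (int i = 0; i < _criteria.Count; i++)
        {
            text += (i == 0 ? " WHERE " : " AND ")
                + _criteria[i].Property
                + " LIKE '" + _criteria[i].Prefix + "%'";
        }
        return text;
    }
}

public static class QueryDemo
{
    public static void Main()
    {
        Query query = new Query("Users")
            .WhereStartsWith("Name", "Je")
            .Select("Name", "ID", "UpdatedDate");

        Console.WriteLine(query.Describe());
        // prints: SELECT Name, ID, UpdatedDate FROM Users WHERE Name LIKE 'Je%'
    }
}
```

The price is visible right away: property names become strings, and every kind of condition needs its own hand-written type or method.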

And there’s one more ace up my sleeve. That last code you saw isn’t actually the truth either. ctx isn’t just an object; it’s some sort of LINQ entity context (“the database”, basically). So one of the powers of LINQ that is under-appreciated from the outside comes into effect. What actually gets generated in C# 3.0 for the original query is an object conforming to IQueryable<that anonymous type we used>.

Because the table opted in by being an IQueryable, everything that has been invoked on it has been handed to it not in delegate/closure form, but in the form of an expression tree, as data. The database layer, whichever we’re using, is actually looking through the code and translating it into a database query when it runs. (And before anyone runs to comment, you’re able to do this in every system of sufficient reification, including many older than C# 3.0.)
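Here is a small sketch of what “looking through the code” can amount to, using the System.Linq.Expressions classes from .NET 3.5. The User class is a stand-in of mine, and the casts assume the lambda has exactly the shape property.Method("constant"); a real query provider handles far more shapes than this.

```csharp
using System;
using System.Linq.Expressions;

// Stand-in entity for illustration; the real User type is not shown in the post.
public class User
{
    public string Name { get; set; }
}

public static class TreeInspection
{
    // Reads the pieces of a predicate shaped like: x => x.Property.Method("constant")
    public static string Describe(Expression<Func<User, bool>> predicate)
    {
        // The lambda body is a method call...
        MethodCallExpression call = (MethodCallExpression)predicate.Body;
        // ...on a property access...
        MemberExpression member = (MemberExpression)call.Object;
        // ...with a constant as its first argument.
        ConstantExpression argument = (ConstantExpression)call.Arguments[0];

        return member.Member.Name + " " + call.Method.Name + " " + argument.Value;
    }

    public static void Main()
    {
        // Because the parameter type is Expression<Func<...>>, the compiler
        // passes the lambda as a tree of objects instead of a delegate.
        Console.WriteLine(Describe(user => user.Name.StartsWith("Je")));
        // prints: Name StartsWith Je
    }
}
```

From those three pieces, a database layer has what it needs to produce something like a LIKE 'Je%' condition without ever running StartsWith.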

But the further point is that even fewer people, in C# 2.0 or 3.0, want to write out these expression trees by hand. The compiler helps out and does it automatically. You could do all of it in C# 2.0, but I won’t attempt to write this code by hand for fear of getting it wrong. There was no set of expression tree classes around back then, so you’d have to invent all of those too. Once more, you had to rephrase the problem in the form of C# 2.0: not impossible, but a bit of an uphill battle.
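For a sense of scale, here is a sketch of building just the where-clause by hand, and that is with the expression tree classes that .NET 3.5 did bring along. The User class is again my own stand-in, and this covers only the predicate, not the whole query.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Stand-in entity for illustration; the real User type is not shown in the post.
public class User
{
    public string Name { get; set; }
}

public static class HandBuiltTree
{
    // Builds, node by node, the tree for: user => user.Name.StartsWith("Je")
    public static Expression<Func<User, bool>> Build()
    {
        ParameterExpression userParam =
            Expression.Parameter(typeof(User), "user");
        MemberExpression name = Expression.Property(userParam, "Name");
        MethodInfo startsWith = typeof(string).GetMethod(
            "StartsWith", new Type[] { typeof(string) });
        MethodCallExpression call = Expression.Call(
            name, startsWith, Expression.Constant("Je"));

        return Expression.Lambda<Func<User, bool>>(call, userParam);
    }

    public static void Main()
    {
        // What the compiler writes for you, in one line:
        Expression<Func<User, bool>> byCompiler =
            user => user.Name.StartsWith("Je");
        // The same thing spelled out:
        Expression<Func<User, bool>> byHand = Build();

        User jesper = new User { Name = "Jesper" };
        Console.WriteLine(byCompiler.Compile()(jesper)); // prints: True
        Console.WriteLine(byHand.Compile()(jesper));     // prints: True
    }
}
```

Five statements for one short lambda, and in C# 2.0 you would first have had to write your own equivalents of all the classes involved.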

What this all boils down to is that in human languages, you can describe anything. They arrived at where they are by thousands of years of evolution. That’s the only way they’ve grown. Programming languages are themselves much like programs; nothing is there unless someone put it there. You have to formulate everything new in terms of what’s already there. And sure enough, you can do that. Many people find great pleasure in doing so. We are not absolutely locked in.

But take LINQ back to a C programmer in the early 1990s and she wouldn’t know what to make of it. She would most likely have heard of objects and methods and known that they weren’t in C (and given how C++ looked at the time, probably also wanted it that way). But there are layers upon layers upon layers that build the abstractions we depend on each day. Different layers depend on different capabilities, and you have to have the right sort of capabilities around to build anything.

Thinking one hour, two hours, one work day ahead, it is not effective to concentrate on what’s theoretically possible if you assemble new abstractions. It is easier to stay in the world you know, and you’ll get a lot more done. But maybe you’ll get even more done over the long term if you have access to some better tooling or a more capable environment.
