On LINQ, Standards, Databases and Fruit
Around ten years ago, development started at Microsoft Research on X#, later renamed Cω (C-omega). Cω attempted to unify data stores like XML documents and databases with the C# language.
Cω eventually became the LINQ project, short for Language Integrated Query. Behind only .NET itself, the new Metro-style/WinRT infrastructure and some early Internet work, it may be the broadest-reaching endeavor ever undertaken within Microsoft. LINQ became a way to write SQL-like queries that a series of paper-thin syntactic transformations turn into calls to extension methods, the “query operators”: library methods, or your own, that produce or manipulate a sequence. LINQ on its lonesome forced VB.NET and C# to adopt a slew of features, such as lambda expressions, extension methods, anonymous types and local type inference, because the relatively meager version 2 of those languages could not express such queries.
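To make those “paper-thin syntactic transformations” concrete: the C# compiler rewrites the query `from n in data where n > 2 select n * 10` into the method calls `data.Where(n => n > 2).Select(n => n * 10)`. The sketch below is a Python analogue, not LINQ itself; `Seq`, `where`, `select` and `to_list` are hypothetical names, and generators stand in for the deferred execution of the LINQ to Objects `Where` and `Select` operators.

```python
from typing import Callable, Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class Seq(Generic[T]):
    """Hypothetical stand-in for a LINQ-to-Objects sequence (not a real library)."""

    def __init__(self, make_iter: Callable[[], Iterable[T]]) -> None:
        # Store a factory rather than an iterator so the sequence can be re-enumerated.
        self._make_iter = make_iter

    @staticmethod
    def of(source: Iterable[T]) -> "Seq[T]":
        return Seq(lambda: source)

    def __iter__(self) -> Iterator[T]:
        return iter(self._make_iter())

    def where(self, predicate: Callable[[T], bool]) -> "Seq[T]":
        # Analogue of Enumerable.Where: builds a generator; nothing is filtered yet.
        return Seq(lambda: (x for x in self if predicate(x)))

    def select(self, selector: Callable[[T], U]) -> "Seq[U]":
        # Analogue of Enumerable.Select: maps lazily, element by element.
        return Seq(lambda: (selector(x) for x in self))

    def to_list(self) -> list:
        # Analogue of ToList(): the point where the query actually runs.
        return list(self)


data = [1, 2, 3, 4, 5]
# C#: from n in data where n > 2 select n * 10
# desugars to: data.Where(n => n > 2).Select(n => n * 10)
query = Seq.of(data).where(lambda n: n > 2).select(lambda n: n * 10)

data.append(6)  # the query has not run yet, so it will see this element
print(query.to_list())  # [30, 40, 50, 60]
```

Because nothing runs until `to_list()`, the element appended after the query was built still shows up in the result, which mirrors how LINQ to Objects behaves.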
But it didn’t end with letting C# programmers stop hiding their heads in their laps whenever they were asked why Map was missing from a standard library with thousands of classes. Part of the supporting infrastructure was a way to turn the lambdas passed to the higher-order functions (the query operators) into expression trees, a reasonably language-neutral kind of syntax tree, along with the concept of a query provider. When the expression-tree versions of the query operators are used, a query provider can recompile or reinterpret the query into another query format. Most commonly, this is used to translate the query into various SQL dialects, but there are also query providers for XML and LDAP.
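Reduced to its core, a query provider walks a tree describing the query instead of calling compiled code. Python lambdas compile straight to bytecode rather than to inspectable expression trees, so this sketch swaps in operator overloading on a hypothetical `Row` proxy to build the tree. `Node`, `Field`, `Const`, `BinOp`, `Query` and `translate` are illustrative names, and the target is SQLite-style SQL with `?` placeholders. It is a minimal sketch under those assumptions, not how LINQ to SQL or Entity Framework are implemented.

```python
import sqlite3


class Node:
    """Base class for a tiny, hypothetical expression tree."""

    def _bin(self, op, other):
        # Wrap plain Python values as constants so the tree stays uniform.
        return BinOp(op, self, other if isinstance(other, Node) else Const(other))

    def __gt__(self, other):
        return self._bin(">", other)

    def __lt__(self, other):
        return self._bin("<", other)

    def __eq__(self, other):  # builds a node instead of comparing
        return self._bin("=", other)

    def __and__(self, other):
        return self._bin("AND", other)


class Field(Node):
    def __init__(self, name):
        self.name = name


class Const(Node):
    def __init__(self, value):
        self.value = value


class BinOp(Node):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right


class Row:
    """Proxy handed to the lambda: any attribute access becomes a Field node."""

    def __getattr__(self, name):
        return Field(name)


def translate(node, params):
    """The 'query provider' step: walk the tree and emit parameterized SQL."""
    if isinstance(node, Field):
        return node.name
    if isinstance(node, Const):
        params.append(node.value)
        return "?"
    if isinstance(node, BinOp):
        return f"({translate(node.left, params)} {node.op} {translate(node.right, params)})"
    raise TypeError(f"unsupported node: {node!r}")


class Query:
    def __init__(self, table, predicates=()):
        self.table = table  # not parameterized; assumed to be trusted
        self.predicates = tuple(predicates)

    def where(self, build):
        # The lambda runs once against Row() and returns a tree, not a bool.
        return Query(self.table, self.predicates + (build(Row()),))

    def to_sql(self):
        # Translation happens only here, at the last minute.
        params = []
        sql = f"SELECT * FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(translate(p, params) for p in self.predicates)
        return sql, params


query = Query("people").where(lambda p: (p.age > 30) & (p.city == "Oslo"))
sql, params = query.to_sql()
print(sql)     # SELECT * FROM people WHERE ((age > ?) AND (city = ?))
print(params)  # [30, 'Oslo']

# Run the generated SQL against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Ann", 34, "Oslo"), ("Bo", 25, "Oslo"), ("Cy", 41, "Rome")],
)
print(conn.execute(sql, params).fetchall())  # [('Ann', 34, 'Oslo')]
```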
The first SQL query provider came from LINQ to SQL, a trailblazing ORM originally meant as a proof of concept, which only worked with Microsoft SQL Server. It took two years for Microsoft to deliver Entity Framework (“EF”), the real ORM, which works with every database that has the correct “provider”, and one more major version for Entity Framework to grow into its role. EF is now updated a lot more frequently and is slowly winning over more and more developers.
Which brings us to today. LINQ won’t do much for key-value databases that just model a dictionary (even if, and especially if, smart databases of that kind, like Redis, take the opportunity to model useful operations on the values, like set sorting or atomic increments). And while it works for traditional SQL databases, it hasn’t eliminated the difference between objects and database concepts; in a sense, it has highlighted the remaining differences even more. I may not be shuttling data into and out of DataTables, but I am more aware than ever of what’s happening where. I have to be slower about adopting new features in my favorite database to make sure the entire chain supports them, and some databases might have no providers or only buggy ones. In some ways, LINQ and Entity Framework have driven me into the uncanny valley of database connectivity.
This isn’t purely a problem of using “Microsoft’s solutions”. To a large degree, it’s the cost of picking this sort of approach. Traveling less audaciously might be calmer and more consistent. But the box has been opened and the bobcat is clawing wildly at the portrait of gramps on the mantelpiece. You won’t get the painting back, but you’re going to anger a whole lot of bob-er, developers if you try to put them back in their boxes. LINQ-like principles have been replicated in Scala, gleefully (to some) without adding language features. For a while now, Rails’ ActiveRecord has used a SQL-builder approach: it constructs the query piece by piece and translates it into SQL only at the last minute, for the best performance. The principle is sound; it isn’t perfect, but it also hasn’t been carried to its logical conclusion.
What’s left seems like low-hanging fruit. I have heard next to nothing about it, but I’ll be damned if twenty people aren’t sitting around working on it somewhere right now: there needs to be an object/node/graph database built explicitly on LINQ-like principles and designed to optimize for LINQ-based usage patterns. The data conforms to object patterns instead of the database’s own designs. The kinds of queries you send it inform a dynamic query analyzer, which rebuilds and maintains indexes for optimal access. You don’t really have to worry about primary keys, only about references to other things.
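A tiny version of the “dynamic query analyzer” idea can be sketched as a store that counts which fields are queried, builds a hash index once a field crosses a threshold, and keeps that index current on every insert. This is purely hypothetical: `AdaptiveStore`, `find`, `index_threshold` and the count-based heuristic are assumptions for illustration, only equality lookups are handled, and no existing database is being described.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


class AdaptiveStore:
    """Hypothetical object store whose indexes follow the queries it receives."""

    def __init__(self, index_threshold=3):
        self.objects = []
        self.index_threshold = index_threshold  # assumed heuristic, not tuned
        self.query_counts = Counter()           # field name -> times queried
        self.indexes = {}                       # field name -> {value: [objects]}

    def insert(self, obj):
        self.objects.append(obj)
        # Maintain every index that already exists.
        for field, index in self.indexes.items():
            index[getattr(obj, field)].append(obj)

    def find(self, field, value):
        """Equality lookup; builds an index once `field` is queried often enough."""
        self.query_counts[field] += 1
        if field not in self.indexes and self.query_counts[field] >= self.index_threshold:
            self._build_index(field)
        if field in self.indexes:
            return list(self.indexes[field].get(value, []))
        # No index yet: fall back to a full scan.
        return [obj for obj in self.objects if getattr(obj, field) == value]

    def _build_index(self, field):
        index = defaultdict(list)
        for obj in self.objects:
            index[getattr(obj, field)].append(obj)
        self.indexes[field] = index


@dataclass
class Person:
    name: str
    city: str


store = AdaptiveStore(index_threshold=2)
for person in [Person("Ann", "Oslo"), Person("Bo", "Rome"), Person("Cy", "Oslo")]:
    store.insert(person)

print([p.name for p in store.find("city", "Oslo")])  # ['Ann', 'Cy'] via full scan
print("city" in store.indexes)                        # False
store.find("city", "Rome")                            # second city query: index built
print("city" in store.indexes)                        # True
store.insert(Person("Di", "Oslo"))                    # index updated on insert
print([p.name for p in store.find("city", "Oslo")])  # ['Ann', 'Cy', 'Di'] via index
```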
Yeah, it’ll take a new approach and we’ll have to toss what we have. Good. It doesn’t look to me as if we’re in the best state we could be, nor as if there’s been a ton of recent research in this direction. There have been object databases, but barely a new one appears every couple of years, and, ironically, they don’t seem open to fresh approaches. And Microsoft Research (again) is working on Trinity, a distributed graph database, but it’s not externally available and seems disproportionately concerned with serialization to and from binary messages (and that’s ignoring the references to the cast of Friends in the manual).
Fundamentally, this has little to do with LINQ itself. But LINQ is a nice handle for what this new database should be: assuming access from code instead of through database operators, eliminating the divide between the database and the accessing system, and finally giving us a new set of tradeoffs for a new decade.