A CTP — publicly available, contents-may-settle, pre-beta preview — of the aforementioned structurally open Roslyn C#/VB compilers has been released.
So far, syntax trees, flow analysis and beyond-compilation reasoning such as “what type or thing would this be if I placed it into this location” seem to be almost fully available, although code generation is still missing in chunks. Comes with some interesting documentation. The project overview is so far a real page-turner and it’s interesting how they handle whitespace/comments round-tripping and partial compile error recovery and reporting.
It’s exciting to see Microsoft take the step of repaving Visual Studio’s tooling on top of this. Here’s hoping they start a trend.
The documentation is, to put it gently, “present”, but the language spec may be the most thought-out document.
I’m spending much of my day nose-deep in C#, the language that initially got a bad reputation for looking a lot like Java and that still lives by that image, despite all the things that have happened in the interim. Each new version since has had a feature with a marked lasting impact (maybe except for C# 4’s dynamic). The feature for C# 5 was telegraphed in advance to be async, bringing a sane asynchrony model to .NET and making continuation-passing style look like sequential code. With this in mind, I had thought that even bigger things were in store at the recent BUILD conference.
Imagine my surprise when Anders Hejlsberg took the stage and went deeper into the project codenamed Roslyn than ever before, and announced that a first CTP, Microsoft-speak for limited commitment-free public alpha, is upon us (mid-October). Better yet, Anders stopped short of saying that the source will eventually be available.
Roslyn has been discussed since back in 2008 under the murky premise of “opening up the C# compiler while rewriting it in C#”. The new information was that Roslyn opens up nearly everything at every stage in the compiler to be a public, supported API, and that Microsoft is retooling Visual Studio to consume only these APIs in a future version (beyond the coming Visual Studio 11/2012).
C# is going to go overnight from not having eval to essentially having something much more interesting. You can’t execute dynamically generated code willy-nilly as in most dynamically interpreted/compiled languages, but you can grab the compiler’s opinion on and analysis of any piece of code. You can get syntax trees, you can figure out binding and find out which types are involved, you can get information about where the breakpoints can go in a piece of code, you can get the IL and you can actually compile and emit the code in situ.
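To make that concrete, here is a minimal sketch of asking the compiler for its opinion on a piece of code: getting a syntax tree, binding it to find the types and symbols involved, and compiling and emitting in situ. One loud assumption: the 2011 CTP exposed this under the Roslyn.Compilers namespaces; the shapes below are the later, published Microsoft.CodeAnalysis API, used here as a stand-in.

```csharp
// Sketch only: the modern Microsoft.CodeAnalysis API standing in for the CTP-era one.
using System;
using System.IO;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

class CompilerOpinion
{
    static void Main()
    {
        // The compiler's view of an arbitrary snippet, as a syntax tree.
        var tree = CSharpSyntaxTree.ParseText(
            "class C { static int Add(int a, int b) { return a + b; } }");

        // Bind it: which types and symbols are involved?
        var compilation = CSharpCompilation.Create(
            "Probe",
            syntaxTrees: new[] { tree },
            references: new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));
        var model = compilation.GetSemanticModel(tree);

        foreach (var node in tree.GetRoot().DescendantNodes())
        {
            // Declarations (the class, the method, its parameters) get symbols.
            var symbol = model.GetDeclaredSymbol(node);
            if (symbol != null)
                Console.WriteLine(node.Kind() + ": " + symbol);
        }

        // And actually compile and emit the code in situ.
        using (var stream = new MemoryStream())
            Console.WriteLine("Compiled: " + compilation.Emit(stream).Success);
    }
}
```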
In addition, there are two more pieces to this. First, there’s an API model for writing dynamic read-eval-print loops where lines of code can be interpreted in a script-like fashion and kept intact as you go; running code, importing namespaces, referencing new assemblies, redefining methods or properties and so on. This would otherwise have to be reinvented for every such implementation and is a nice nod.
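A sketch of what that API model looks like in practice, with each submission building on the last, REPL-style. Assumption flagged again: the CTP shipped this as a ScriptEngine under Roslyn.Scripting; the shape below is the later published Microsoft.CodeAnalysis.CSharp.Scripting package.

```csharp
// Sketch only: the published scripting package standing in for the CTP's ScriptEngine.
using System;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis.CSharp.Scripting;

class MiniRepl
{
    static async Task Main()
    {
        // First "line" of the session: define some state.
        var state = await CSharpScript.RunAsync("int x = 21;");

        // Later lines see everything defined before them, kept intact as you go.
        state = await state.ContinueWithAsync("int Double(int n) => n * 2;");
        state = await state.ContinueWithAsync("Double(x)");

        Console.WriteLine(state.ReturnValue); // 42
    }
}
```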
Second, there’s an object model for the language, something showcased by being able to translate between C# and VB (for which equivalent work is being done) and by writing smart, custom refactorings, which were previously only in the hands of people who were already basically writing their own C# compiler, like the makers of ReSharper or CodeRush.
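For flavor, here is a toy custom refactoring under the same assumption (published API shapes standing in for the CTP): a rewriter that renames every identifier named foo to bar while round-tripping whitespace and comments untouched.

```csharp
// Sketch only: a trivial "smart refactoring" as a syntax rewriter.
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

class FooToBarRewriter : CSharpSyntaxRewriter
{
    public override SyntaxToken VisitToken(SyntaxToken token)
    {
        // Rebuild the token but keep its surrounding trivia (spaces, comments).
        if (token.IsKind(SyntaxKind.IdentifierToken) && token.ValueText == "foo")
            return SyntaxFactory.Identifier(token.LeadingTrivia, "bar", token.TrailingTrivia);
        return token;
    }
}

class Program
{
    static void Main()
    {
        var tree = CSharpSyntaxTree.ParseText(
            "class C { int foo = 1; /* keep me */ int Get() { return foo; } }");
        var rewritten = new FooToBarRewriter().Visit(tree.GetRoot());
        Console.WriteLine(rewritten.ToFullString()); // comment and spacing survive
    }
}
```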
The Mono effort has laudably been supporting a C# REPL for years now, but they haven’t gone all out and figured out all of this. (They have diverted their resources to other efforts, and I think it was the right decision for Mono users, but I just wanted to highlight that their REPL and Roslyn are not equal in scope.)
When Roslyn is said and done, its team is going to have set a new bar for statically-compiled languages. C# and VB.NET will out of the box be able to do all of this. The onus must now be on the rest of the statically-compiled languages to provide the same thing, or risk being passed by, at least in programmer satisfaction.
Update: I mentioned that “Anders stopped short of saying that the source will eventually be available”. In the Channel 9 Live segment with Anders Hejlsberg from BUILD, at 25:50 in, there’s this: “It’s my hope that we can share as much of the Roslyn project with the community as is at all possible. For sure we’ll share the APIs. I’d love to share the source as well so people can do their own things with it, like understand how the compiler works and build extensions to it and experiment with new stuff in the language. The more people are doing that then the better we’re all going to do, you know, with the community.”
What this sounds like isn’t just “we’re going to drop a zip somewhere every few months” (although I’m sure they won’t develop everything in a public repository), but “we’re building this to encourage your participation and help in evolving the language and using the language outside of where it’s been, and I’d love to outright promise source to you except for the occurrence of lawyers”, which is a rather fantastic attitude.
The world can be divided into those who will attempt to divide the world into parts and those who will not. I am a recovering into-parts-divider.
Programming languages, it was thought, could be divided into roughly three buckets: Assembly language, C and C++; compiled, stiff managed languages like C# and Java; and the rest of them. To begin with, this is largely bullshit. Every language has a tag cloud of characteristics which expands over time. But I would like to use these buckets, under the names native, managed and hippie, to make a point.
Read a word of the academic scripture on performance, and the first thing to be asserted is that the native languages are the only way to write “performant” software. (Why otherwise healthy individuals use the word “performant” is beyond me and a topic for another, and entire, day.) Read a word of the pitchfork response to those claims and you will learn that performance can be used as an absolute indicator, but that the important thing is the relative indicator of “fast enough”-ness. Listen to Herb Sutter’s propaganda — it is actually mostly not propaganda — and you will learn that we will eventually run out of speed and that we’re wasting resources anyway and why do that.
The second most maddening thing is that every one of these points is valid. The most maddening thing is that they are valid today, but for all you can tell, it’s the way it’s going to be forever.
We need a new language. We need a whole new class of languages, but we need one to take charge. We need a managed language with native smarts. We need a language which remains memory-safe and the appropriate subset of type-safe, which shuns pointers as an approach to optimization and invents something better. We need a language whose first two rules are 1a) don’t run any slower than C most of the time and 1b) don’t be less productive than any other language.
A language that instead of fulfilling every wet wish settles for a limited number of strong recommendations of proven methods, like blocks/closures in out-of-thread or inside-event-loop queues for asynchrony and concurrency; integers that upgrade to bignums and decimal arithmetic instead of double unless you explicitly jump through a syntactical hoop. A language that contains the best sort of garbage collection available, the best sort of automatic reference counting available to cut down on most or all of the GC’s uses (or a strong opinion on weak references to disable garbage collection) and the option to write any isolated chunk with manual memory management for the subset of complete raving lunatics who are educated.
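For contrast, a quick illustration of the hoops today’s C# makes you jump through to get those numeric behaviors; the wished-for language would invert these defaults.

```csharp
// What C# gives you by default versus what the wished-for language would default to.
using System;
using System.Numerics;

class NumericDefaults
{
    static void Main()
    {
        // double is the default for fractional literals; it can't represent these exactly.
        Console.WriteLine(0.1 + 0.2 == 0.3);    // False
        // The 'm' suffix is the syntactical hoop that buys you decimal arithmetic.
        Console.WriteLine(0.1m + 0.2m == 0.3m); // True

        // int silently wraps around at the edge...
        int atTheEdge = int.MaxValue;
        Console.WriteLine(atTheEdge + 1);       // -2147483648

        // ...and upgrading to a bignum is another explicit hoop.
        BigInteger roomy = int.MaxValue;
        Console.WriteLine(roomy + 1);           // 2147483648
    }
}
```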
A language that does let you drop down into inline assembly, but whose implementations are so solid that if you’re not doing it to support a new SIMD or fused multiply-add operation, you’re most of the time on a fool’s errand.
We also need someone with clout, competence and support to commission, design and build this language, hopefully in the open. This is the future of programming. Not “native” languages or even “managed” languages. Thanks to everyone’s need for backwards compatibility and fear of clean breaks, a new language is the only way we’re going to get there. When I hope for Apple to do xlang, wait for C# 5.0 or see the news about Dart, this is at the back of my mind.
This could be fun.
Okay, so Joe Duffy is always “on concurrency”. Even more so today at InfoQ though.
The major shift we face will be that mainstream languages will start to incorporate more concurrency-safety — immutability and isolation — and the platform libraries and architectures will better support this style of software decomposition. OOP developers are accustomed to partitioning their program into classes and objects; now they will need to become accustomed to partitioning their program into asynchronous Actors that can run concurrently. Within this sea of asynchrony will lay ordinary imperative code, frequently augmented with fine-grained task and data parallelism.
[..] It’s important not to fall into the trap of believing in a single silver bullet, however. Message passing is not a panacea; frequently the best path to scalability is data parallelism. Functional programming and immutability is not a panacea; if you’re replacing your C algorithms and need to compete performance-wise, you very well might need to use one of those familiar and efficient mutable data structures from your favorite algorithms book. We won’t throw out decades’ worth of research, although we will need to evolve the right parts of our programs in the right ways.
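Duffy’s “sea of asynchrony” around ordinary imperative islands is easy to sketch in today’s C#. Here is a minimal actor, assuming nothing beyond System.Threading.Channels (the names are mine, not his): a mailbox plus a loop that exclusively owns its state.

```csharp
// Sketch of the actor style: isolation makes the imperative code inside safe.
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

class CounterActor
{
    private readonly Channel<int> mailbox = Channel.CreateUnbounded<int>();
    private int total; // owned by the loop below; never shared, never locked

    public CounterActor() => Task.Run(RunAsync);

    // Other actors communicate only by posting messages.
    public ValueTask PostAsync(int amount) => mailbox.Writer.WriteAsync(amount);

    private async Task RunAsync()
    {
        // Ordinary imperative code lives inside the sea of asynchrony.
        await foreach (var amount in mailbox.Reader.ReadAllAsync())
        {
            total += amount;
            Console.WriteLine("total = " + total);
        }
    }
}
```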
Straight outta TechEd: Microsoft to release VB6 [language] sources on Codeplex.
Update: And Microsoft has now said that it is not going to release VB6 sources on Codeplex and to please not believe everything you read.
It’s a funny road going from plain programming, to multi-threading with deadlocks and lock hierarchies and synchronized data access, to parallelism and attempts to break off the parts of the program that were “safe” to run concurrently, to more directly trying to tackle concurrent data access in a sane way, to… asynchrony? Packetization?
I know that the above is not a straight line that people follow on their way to getting a grip on feeding many CPU cores optimally. I know that the terms I’m using are overlapping and that I’m abusing them to describe one thing. But that’s the way I went about it, and each term was the hot keyword for me during its phase. This is not a set of instructions; it’s a recollection of my own long and winding road to getting a grip on the problem. And I should mention that I thought I had a grip on the problem every step along the way. I probably don’t yet, but it makes for a good story of progression.
At first, I was completely ignorant.
Then I found out about the theory of threads and how they’d make things better, and fell into the trap of thinking that for two things to happen simultaneously (for some value of “simultaneously”), they’d need to happen on different threads. This is a fallacy for the same reason that a restaurant with two guests in it works fine even with one waiter. It took a while for me to realize this. Even after that, I changed my approach to use the same data structures but to lock everything, because if you didn’t, bad things would happen. Like everyone else, my first steps were more akin to encoding a relay race, but lock granularity took care of some of this.
The next stage was the siren song of parallelization. I learned to “fan out” into parallel processes when appropriate and to separate my various jobs into objects, without which running many of the jobs simultaneously would have been ugly. (The objects thing didn’t at all have to do with parallelization, and was just a nice thing that I should have done earlier.)
After that came the allure of concurrent data access. One of my first forays into threads involved pumping a queue manually. Concurrent data access is about either going to immutable data (a fine choice where you can make it) or working cannily with data structures, either by properly managing access to them or with lock-free data structures that have nice atomic operations. (If you’re working with a lock-free data structure that’s got the same interface as its ordinary counterpart, stop. Find the atomic operations and restrict yourself, or find a new data structure with a real set of atomic operations.)
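A sketch of what “find the atomic operations and restrict yourself” means, assuming .NET’s ConcurrentDictionary as the example structure: the composite check-then-act pattern races even though each individual call is thread-safe; the single atomic operation does not.

```csharp
// Thread-safe calls composed naively still race; one atomic call does not.
using System.Collections.Concurrent;

class AtomicOps
{
    static readonly ConcurrentDictionary<string, int> counts =
        new ConcurrentDictionary<string, int>();

    static void Racy(string key)
    {
        // Broken: another thread can slip in between the check and the act.
        if (!counts.ContainsKey(key))
            counts[key] = 1;
        else
            counts[key] = counts[key] + 1;
    }

    static void Atomic(string key)
    {
        // One real atomic operation from the data structure's own interface.
        counts.AddOrUpdate(key, 1, (_, old) => old + 1);
    }
}
```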
Tending to your own queue is rough work. If you go with keeping locks, you’ll risk either dead-locking a thread pool or keeping a precious, resource-hungry thread for every one of those queues. I recommend lock-free data structures, because they can always give you nearly-instant answers. “Whoops, I couldn’t snag an object right now; at least you can go do something else.” “The queue was busy in internal bookkeeping, but I’ll make sure the object gets in there eventually; go do something else.”
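Here is that answer in code form, as a sketch using .NET’s lock-free ConcurrentQueue (my loop, not anyone’s prescribed pattern): the non-blocking call either hands you work or returns immediately.

```csharp
// A non-blocking pump: never parks a precious thread on an empty queue.
using System.Collections.Concurrent;
using System.Threading;

class Pump
{
    static readonly ConcurrentQueue<string> jobs = new ConcurrentQueue<string>();

    static void Run()
    {
        while (true)
        {
            if (jobs.TryDequeue(out var job))
                Handle(job);       // snagged one: do the work
            else
                DoSomethingElse(); // nearly-instant "no": go do something else
        }
    }

    static void Handle(string job) { /* ... */ }
    static void DoSomethingElse() => Thread.Yield();
}
```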
It turns out that “go do something else” is a powerful concept. I’d like to propose the idea that the extent to which a system lets you or itself “go do something else” is one of the few reliable indicators of a good design.
The way the Internet itself handles massive amounts of traffic is by breaking it into small packets which are easy and fast to handle. It’s hard work writing a router OS, but it’d be even harder work if it wasn’t for packets. If it wasn’t for packets, the tubes would clog. Something big would get in the way of something small. There is prioritization, but it doesn’t starve the processing of small packets. Even the worst case with prioritization is far better than the best case without packets.
So it is that my latest change has been to keep an eye out for asynchrony, in all its forms. Asynchrony, combined with packetization, changes your program from being a collection of big things and small things to being very many very small things. There’s a small price to pay to manage the very many things, but throughput increases. Even in places where you thought parallelism was impossible and concurrency not needed, you probably do something that takes a lot of time that you can cut into stages. By doing that, you’re making it harder for things to happen out of order, and making it easier for the CPU to spread the work to other cores.
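A small sketch of cutting one big thing into stages, assuming a plain file-counting job as the stand-in workload: every await is a seam where the runtime can go do something else.

```csharp
// Packetized I/O: the work arrives as many small chunks instead of one big read.
using System.IO;
using System.Threading.Tasks;

class Packetized
{
    static async Task<long> CountBytesAsync(string path)
    {
        var buffer = new byte[4096]; // small packets, easy and fast to handle
        long total = 0;
        using (var stream = File.OpenRead(path))
        {
            int read;
            while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                total += read; // one small stage; nothing big clogs the tubes
        }
        return total;
    }
}
```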
As long as you try to limit yourself to local state that you pass along, you don’t have to invent a conceptual map for the flow of your program. You make your program define its own flow. I like this, not just on the level of poetic admiration but because it makes me more productive.
That’s why, when people insist on starting out by teaching threads, or defining everything in terms of threads first, I turn skeptical. Really? For me, it’s not important how many threads there are and how many cores there are. It’s extremely important to the execution of my program that someone has sorted out all the engine code behind the event loops and thread pools. But it’s also extremely important that someone has worked out windows, controls, fonts, text layout, rendering, networking, process management, data structures, math libraries, etc., and you see very few people jumping at the chance to remake them for every new project, or demanding that you know everything before you get the privilege of using the fruits of that particular concept.
I believe that you should always try to get an insight into the workings of something you depend on; to know on some plane how the sausage is made. But requiring a deep insight or even authorship? That doesn’t scale. Knowing too much could hurt you, too, if it leads you to confine your uses to scenarios you know are good fits rather than letting the knowledge make your code better. Then when someone improves the engine (and this is the good part: only someone has to, not everyone), you wouldn’t get the new benefits.
So what does all of this mean? Maybe that “solving” multi-threading, parallelism, concurrency, multi-core and asynchrony would be easier if everyone kept in mind that no one can make a pencil. If your whole program locks up, that’s bad in any case. Make your program bug-free in this regard, and then start juicing the power at your disposal safely, without turning your program upside-down or inside-out.
The key to conquering this problem, riddled with real factors like cache lines and CPU cores, is to work out the abstract parts, the overriding characteristics. If you can do asynchrony and avoid blocking, I recommend it. If you can do lock-free data structures, I recommend it. If you can defer requiring the value of something after you’ve ostensibly calculated it, I recommend it. If you can pass along data instead of making everything go to the same place to look for it, I recommend it.
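The “defer requiring the value” recommendation, sketched with Task&lt;T&gt; (the URL is a placeholder): start the calculation, keep working, and only synchronize at the first point you truly need the answer.

```csharp
// Kick off work early, await late: the gap in between is where throughput lives.
using System;
using System.Net.Http;
using System.Threading.Tasks;

class Deferred
{
    static async Task Run()
    {
        using (var http = new HttpClient())
        {
            // Ostensibly "calculated" now, actually needed later.
            Task<string> page = http.GetStringAsync("http://example.com/");

            DoOtherUsefulWork(); // nothing here blocks on the download

            string body = await page; // the first point the value is required
            Console.WriteLine(body.Length);
        }
    }

    static void DoOtherUsefulWork() { }
}
```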
By all means, don’t walk the path I walked and don’t let the story in this post determine your nomenclature. Despite what everyone’s saying, as long as you widen your view, it is okay not to worry about it. The only losing move is to entrench yourself in yesterday’s ceremony.
Nearly every time I see something I like, it comes back to composability.
When I can use class methods in Objective-C and Ruby, it’s because the classes themselves have the same capability for virtual dispatch that objects do. When I can mess around with an AST of a language construct, or even redefine the same language as it’s being written, it’s because of reification, which lets concepts be objects.
When I’m happy that I can use ranges or decimal-representation numbers or regular expressions as literals, it’s because they are so obviously one type of built-in object that they deserve their own sort of literals on the same level that strings, arrays and numbers do.
When you take the leap from letting concepts be either hidden inside someone else’s code or opaque and turn them into objects, suddenly you can describe bigger structures or concepts. You can reason about them, arrange them, schedule them, make them aware of each other.
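In C# terms, expression trees are a handy concrete example of this leap (my example, not a canonical one): the code becomes an object you can reason about, rearrange, and only then make runnable again.

```csharp
// Reification: a lambda as data, inspected and recomposed before compiling.
using System;
using System.Linq.Expressions;

class Reified
{
    static void Main()
    {
        Expression<Func<int, int>> doubled = x => x * 2;

        // The concept is an object now; we can look inside it...
        Console.WriteLine(doubled.Body); // (x * 2)

        // ...compose it with another concept...
        var plusOne = Expression.Lambda<Func<int, int>>(
            Expression.Add(doubled.Body, Expression.Constant(1)),
            doubled.Parameters);

        // ...and turn it back into running code.
        Console.WriteLine(plusOne.Compile()(20)); // 41
    }
}
```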
I’m not worried about the state of computer science per se, but the biggest breakthroughs that you’ll have personally within the next ten years will all be directly attributable to composability, or to the way in which libraries, styles of programming and programming languages change to enable it.
If I go back and look at all the programming languages I’ve ever used, each has had latent functionality that I didn’t know how to use back when I was learning it and spending most of my time in it.
Visual Basic had objects — it wasn’t fully object-oriented, and I didn’t know about “polymorphism” until much later, but it did have objects. Perl (5) had map. PHP had... well, okay. C#, back in 2.0, had delegates. Ruby had good metaprogramming capabilities.
These are not objective measurements, but it happens to all of us. You see a concept flourish in a new language, maybe because of particular fitness, maybe because of culture, and your eyes are slowly opened to it. Then you look around and find it, or some variation of it, where you least expect it. Sometimes, you take up using the concept in the languages you thought you already knew.
Sometimes the eye-openers really do require a niche language to manifest themselves in. Newspeak is an example of this. In Newspeak, every name is bound dynamically and there is no global scope. You cart around a “platform” object with the capabilities you need, and to constrain something, you just call it with a lesser platform object. Reading about Newspeak introduced me to the idea of capability-based programming, with objects and their messages taken to the extreme.
(Newspeak is a Smalltalk-based language with syntax that’d make a logician cry tears of what I guess must be logic. Gilad Bracha sees the many contradictions and uphill battles with Newspeak and is intent on surmounting them, while still being able to conceive of more visions. For the dozens of people that must be using Newspeak right now, I hope it succeeds.)
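For a rough feel of the platform-object idea in a mainstream notation, here is a C# sketch; the interfaces are illustrative inventions, not anything from Newspeak.

```csharp
// No ambient authority: code can only do what its platform argument allows.
using System;

interface IFileReading { string Read(string path); }
interface IPlatform : IFileReading { void Launch(string program); }

class Sandboxed
{
    // This method receives file reading only; it has no way to launch programs.
    static string FirstLine(IFileReading platform, string path)
        => platform.Read(path).Split('\n')[0];

    static void Run(IPlatform full)
    {
        // Constraining a callee means handing it a lesser platform object.
        Console.WriteLine(FirstLine(full, "notes.txt"));
    }
}
```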
The problem with coming to terms with eye-openers is that they are hidden in plain sight. You begin turning every rock, jumping around flailing your arms wildly, trying to catch the silent revolution in your own peripheral vision, as if it was shrouded by an SEP field. And sometimes, you have exhausted the vocabulary of your tool, but you won’t know.
What did you last learn that was very useful and turned out to have been hidden in plain sight?