Skip to content

Packetization

by Jesper on February 20th, 2011

It’s a funny road going from plain programming to multi-threading with deadlocks and lock hierarchies and synchronized data access to parallelism and attempts to break off parts of the program that was “safe” to concurrency and more directly trying to tackle concurrent data access in a sane way to… asynchrony? Packetization?

I know that the above is not a straight line that people follow on their way to getting a grip on feeding many CPU cores optimally. I know that the terms I’m using are overlapping and that I’m abusing them to describe one thing. But that’s the way I went about it and the hot keyword for me during every phase. This is not a set of instructions, it’s recollection of my own long and winding road to getting a grip on the problem. And I should mention that I thought I had a grip on the problem every step along the way. I probably don’t yet, but it makes for a good story of progression.

At first, I was completely ignorant.

Then, I found out about the theory about threads and how it’d make things better, and fell into the trap of thinking that for two things to happen simultaneously (for some value), they’d need to happen on different threads. This is a fallacy for the same reason that a restaurant with two guests in it works fine even with one waiter. It took a while for me to realize this. Even after that, I changed my approach to use the same data structures but to lock everything, because if you didn’t, bad things would happen. Like everyone else, my first steps were more akin to encoding a relay race, but lock granularity took care of some of this.

The next stage was the siren song of parallelization. I learned to “fan out” into parallel processes when appropriate and to separate my various jobs into objects which would otherwise make running many of the jobs simultaneously ugly. (The objects thing didn’t at all have to do with parallelization, and was just a nice thing that I should have done earlier.)

After that came the allure of concurrent data access. One of my first forays into threads involved pumping a queue manually. Concurrent data access is about either going to immutable data (a fine choice where you can make it) or working cannily with data structures, either by properly managing access to them or with lock-free data structures that have nice atomic operations. (If you’re working with a lock-free data structure that’s got the same interface as its ordinary counterpart, stop. Find the atomic operations and restrict yourself, or find a new data structure with a real set of atomic operations.)

Tending to your own queue is rough work. If you go with keeping locks, you’ll either risk dead-locking a thread pool or keeping a precious, resource-hungry thread for every one of those queues. I recommend lock-free data structures, because they can always give you nearly-instant answers. Whoops, I couldn’t snag an object right now; at least you can go do something else. The queue was busy in internal bookkeeping, but I’ll make sure the object gets in there eventually; go do something else.

It turns out that “go do something else” is a powerful concept. I’d like to propose the idea that the extent to which a system lets you or itself “go do something else” is one of the few reliable indicators of a good design.

The way the Internet itself handles the massive amounts of traffics is through breaking them into small packets which are easy and fast to handle. It’s hard work writing a router OS, but it’d be even harder work if it wasn’t for packets. If it wasn’t for packets, the tubes would clog. Something big would get in the way of something small. There is prioritization, but that doesn’t starve the processing of small packets. Even the worst case with prioritization is far better than the best case without packets.

So it is that my latest change has been to keep an eye out for asynchrony, in all its forms. Asynchrony, combined with packetization, changes your program from being a collection of big things and small things to being very many very small things. There’s a small price to pay to manage the very many things, but throughput increases. Even in places where you thought parallelism was impossible and concurrency not needed, you probably do something that takes a lot of time that you can cut into stages. By doing that, you’re making it harder for things to happen out of order, and making it easier for the CPU to spread the work to other cores.

As long as you try to limit yourself to local state that you pass along, you don’t have to invent a conceptual map for the flow of your program. You make your program define its own flow. I like this, not just on the level of poetic admiration but because it makes me more productive.

That’s why, when people insist on starting out by teaching threads, or defining everything in terms of threads first, I turn skeptical. Really? For me, it’s not important how many threads there are and how many cores there are. It’s extremely important to the execution of my program that someone has sorted out all the engine code behind the event loops and thread pools. But it’s also extremely important that someone has worked out windows, controls, fonts, text layout, rendering, networking, process management, data structures, math libraries, etc., and you see very few people jumping at the chance to remake them for every new project, or demand that you know everything to have the privilege of using the fruit of that particular concept.

I believe that you should try always to get an insight into the workings of something you depend on; to know on some plane how the sausage is made. But requiring a deep insight or even authorship? That doesn’t scale. Knowing too much could hurt you, too, if you confine your uses to scenarios that are good fits rather than make your code better. Then when someone improves the engine (and this is the good part, only someone has to, not everyone), you wouldn’t get the new benefits.

So what does all of this mean? Maybe that “solving” multi-threading, parallelism, concurrency, multi-core and asynchrony would be easier if everyone kept in mind that no one can make a pencil. If your whole program locks up, that’s bad in any case. Make your program bug-free in this regard, and then start juicing the power at your disposal safely, without turning your program upside-down or inside-out.

The key to conquering this problem, riddled with real factors like cache lines and CPU cores, is to work out the abstract parts, the overriding characteristics. If you can do asynchrony and avoid blocking, I recommend it. If you can do lock-free data structures, I recommend it. If you can defer requiring the value of something after you’ve ostensibly calculated it, I recommend it. If you can pass along data instead of making everything go to the same place to look for it, I recommend it.

By all means, don’t walk the path I walked and don’t let the story in this post determine your nomenclature. Despite what everyone’s saying, as long as you widen your view, it is okay not to worry about it. The only losing move is to entrench yourself in yesterday’s ceremony.

From → Uncategorized

No comments yet

Leave a Reply

Note: HTML/Markdown is allowed. Your email address will never be published.

Subscribe to this comment feed via RSS