Impacts of multi-core processing on programming language design
Within the next couple of years, all modern desktop computers will have multiple CPUs, thanks to multi-core packaging. For programming languages, though, this doesn't matter very much until the number of CPUs crosses a certain threshold, say 4-8. There are a few reasons for this, including:
- Two CPUs aren't a very compelling motivation for the significant extra development effort required for multi-threaded programming.
- There are numerous hackish programming solutions that work reasonably well when targeting sub-4-CPU systems, which reduces the pressure to develop general solutions.
- General solutions have some inherent overhead that substantially eats into the performance gains. Depending on the programming model, general solutions need on the order of 4-8 CPUs before the overhead is an acceptable tradeoff for the improved scalability.
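To put rough numbers on that last point, here is a back-of-the-envelope sketch in Python. It assumes perfectly parallel work and a constant overhead factor relative to a tuned single-threaded program. Both are simplifications, and the factor of 3 used below is an invented figure for illustration, not a measurement.

```python
def speedup(cpus: int, overhead_factor: float) -> float:
    """Idealized speedup over a tuned single-threaded program.

    Assumes perfectly parallel work and a constant overhead factor,
    both of which are simplifications made for this sketch.
    """
    return cpus / overhead_factor


def break_even_cpus(overhead_factor: float) -> int:
    """Smallest CPU count at which the idealized speedup exceeds 1."""
    cpus = 1
    while speedup(cpus, overhead_factor) <= 1.0:
        cpus += 1
    return cpus


if __name__ == "__main__":
    # Hypothetical overhead factor of 3, chosen only for illustration.
    for n in (2, 4, 8):
        print(n, "CPUs ->", round(speedup(n, 3.0), 2), "x")
    print("break-even at", break_even_cpus(3.0), "CPUs")
```

Under these assumptions a 2-CPU machine is a net loss and the general solution only starts paying for itself at 4 CPUs, which is the kind of threshold I have in mind.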
Researchers have long since developed the raw technology necessary to see us through this transition, but current tools are woefully lacking. Let me briefly describe the shortcomings of some of the tools we currently have available.
- C/C++ pthreads: C/C++ are dangerous, difficult languages when it comes to writing reliable software, let alone multi-threaded software. The single-image programming model that pthreads provides adds insult to injury. Even experts have a very difficult time writing reliable multi-threaded C/C++ programs. On top of this, development aids such as debuggers are of limited use here, because of the Heisenbug Principle: attaching a debugger changes the timing, and the bug stops showing up.
- Java, C#: These languages improve on the C/C++ situation by providing language-level threading support, and the runtime environments improve analysis and debugging prospects. For the next few years, this is probably the best we're going to do, but the single-image programming model is rather difficult to deal with, even under the best of circumstances. I think there will always be a place for languages like these, but as the number of CPUs in computers increases, they will increasingly be seen as low-level languages.
- Erlang: Erlang relies entirely on message passing for communication among threads (let's ignore for now that Erlang's terminology differs). Threads do not explicitly share memory. There are two problems with this:
  - Erlang actually runs all threads inside a single process that uses only one CPU. This is an implementation detail, but in practice it limits flexibility, and the workarounds are less than ideal.
  - Passing all data as messages between threads can be very expensive, depending on the application. This is a general concern, but at some threshold number of CPUs, I expect it to become an acceptable cost of developing highly scalable multi-threaded software.
- Perl, Python, Ruby, etc.: These languages vary in their approaches to multi-threading, but I think it is fair to say that none of them provides scalable, useful support for multi-threaded development. I find this noteworthy because this class of languages is of increasing importance both for scripting and for larger-scale systems programming. These languages will have to adapt if they are to maintain their value as systems programming languages.
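To make the contrast between the single-image (shared-memory) model and the message-passing model concrete, here is a minimal sketch in Python using only the standard `threading` and `queue` modules. The function names `shared_state_sum` and `message_passing_sum` are made up for this illustration. The sketch shows the shape of the two styles, not their performance.

```python
import queue
import threading


def shared_state_sum(chunks):
    """Shared-memory style: every worker updates one total under a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # omitting this lock is the classic data race
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(chunks):
    """Message-passing style: workers share nothing and send results back."""
    results = queue.Queue()

    def worker(chunk):
        results.put(sum(chunk))  # the only communication is this message

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # One message per worker, so collect exactly len(chunks) results.
    return sum(results.get() for _ in chunks)


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5], [6]]
    print(shared_state_sum(data), message_passing_sum(data))
```

In the first version, correctness depends on every piece of code that touches `total` remembering to take the lock. In the second, the workers have nothing in common to corrupt, at the cost of copying each result through the queue.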
Okay, here's where I pull out my crystal ball. I predict that five years from now, the languages that provide the highest productivity with regard to multi-threaded programming will make message-passing easy (i.e., it will be the primary mode of multi-threaded development), and shared-image-based threading possible. Right now, none of the available languages I'm aware of provide this focus. What really concerns me though is that of the primary "scripting" languages, none are even close to providing the necessary programming infrastructure, let alone the appropriate focus on methodology. This is where my attention is currently focused, and I expect many of my future ramblings will relate to the topic.