Skip to main content

Bless: A New Twist on Concurrent Programming

(Disclaimer: I'm just a guy who likes learning new programming languages. I'm not an academic language expert.)

I've been reading Coders at Work lately and really enjoying it. Fran Allen is a woman who won the Turing Award for her work on optimizing compilers. She said something interesting. She said that we've been in a rut since C came along because the pointers in C prevent the compiler from doing a lot of interesting optimizations. She said that programming languages before C were actually more optimizable. She also said something about using offsets rather than pointers so that the compiler could easily move memory around.

It reminded me of Dreaming in Code. Alan Kay said that when it comes to software, we're still building pyramids because we haven't yet discovered the programming version of the arch. Pyramids are heavy, whereas arches can support an amazing amount of weight while not having as much weight themselves.

I remember years ago that Alan Kay's statements made me think of actors in Erlang. Fran Allen's statements also made me think of actors in that an actor could be pointed to by a pointer, whereas all the state in that actor could be represented by offsets of that pointer. This is how you can isolate actors from each other within the same UNIX process. (Of course, you have to make sure the language doesn't provide the equivalent of pointer arithmetic with indexes. That's easy. References are indexes, and you can't set them arbitrarily.)

One thing is clear in "Coders at Work". Everyone is trying to figure out better ways to do concurrency. Currently, the three competing models are threads, actors, and STM (software transactional memory). One lesson that I learned when I read Why Science Fiction So Often Fails to Predict the Future is that life isn't homogeneous. That post pointed out that every form of human transportation is still in use somewhere in the world. Did I walk to work, drive, or take the train? The answer is a little bit of all three. Similarly, I've read that the CISC vs. RISC debate has fizzled because these days chips are CISC on the outside and RISC on the inside. Hence, I've been thinking of ways to use threads, actors, and STM together.

Bless is a new "twist" on concurrent programming. I haven't written it yet--I'm just thinking out loud to get feedback from fellow language enthusiasts. Bless is a little bit like Erlang at a high-level and a little like Python at the mid-level. I haven't decided on the low-level parts yet.

First, let's start with Erlang-style processes. I like to think of them as "green processes" because they form their own "address space", but they're implemented by the VM. However, to avoid confusion, I'm going to call them actors. An actor in Bless is an entity that is implemented by the VM. It has its own VM-level stack and its own VM-level heap. The VM-level stack must not use the C stack. At the C level, an actor can be referenced by a pointer (i.e. a location in memory), but all data within the actor is referenced by offsets to that pointer. Hence, the actor can move around in memory without disturbing the contents.

Each actor has its own green process. Actors can be scheduled onto any number of operating system threads. Erlang is like this.

Each actor can receive asynchronous messages using a queue. I think there are queue implementations that don't require locks; however, I was also thinking of implementing the queue using STM.

All data is local to an actor. The code for an actor looks sort of like a Python module. It has lexically scoped functions that have local variables. Variables global to the actor are kind of like instance variables in object oriented languages. If a module contains the code for an actor, you can run that same code on multiple green processes. For instance, you might use a new actor for every web server connection. Perhaps it makes sense to use the term "module" for the source code for an actor and "actor" for when that module is run on a green process (remember, I say "process" not in the UNIX sense, but rather because each actor has its own data).

Data is not immutable like In Erlang. However, messages between actors are passed by deeping copying the data. I'm thinking of the messages as being JSON-like messages. This is something my buddy Warren DeLano was a real fan of. Immutable data made a lot of sense when Erlang was created because you can implement message passing via passing a pointer to immutable data. However, mutable data is more familiar to most programmers, and if you have 100+ CPUs, you can afford to copy some data around. In a perfect implementation, messages would be contiguous blocks of data, and they could be memcpy'd between green processes. One thing I learned from the Sam Rushing is that you'd be amazed at how fast memcpy is, especially if you have 100+ CPUs to play with.

I know that a lot of functional people are going to be upset with my use of mutable data. That's okay. Most of these ideas still apply if you want to go purely functional. That's cool too.

Actors can block on IO, like in Erlang. (I don't buy Node.JS's argument that everything should be a callback. A little brilliance at the VM level can make everyone else's life easier. Think of how Java frees programmers from the pains of manual memory management.) However, like Erlang, you can use asynchronous IO under the covers.

Bless will need a scheduler in order to schedule actors onto the real kernel threads underneath. This sort of thing is done in Erlang and Scala, so there's nothing particularly mysterious about this. Note that if you have 100+ CPUs, the time slices can be longer which cuts down on the amount of work you need to do for scheduling.

I want Pythonic syntax. I want indentation-based syntax. I know some Python haters will reject Bless because of this, but they'd probably find other reasons to reject Bless anyway. Besides, Python isn't the only language to use indentation. Another one of my favorite languages, Haskell, does it too.

I want blocks. Blocks in Ruby are wonderful. I think the best way to represent a block in Python is "function_call(args) do: ...". The "do:" behaves the same way "if ...:" does. Blocks should follow the same scoping rules that if statements do. I'm not opposed to anonymous functions such as "def (args):".

I want keyword arguments. Python, Common Lisp, and Smalltalk all got this right.

I want variable declarations. I would prefer the keyword "var", but it should behave sort of like "my" in Perl. Variable declarations can help the system catch you if you misspell things, and they also help when you want to set a variable in an outer scope.

There is no "self" in method calls. It's just like functions in a module. However, you can of course "instantiate" the module (i.e. run the actor) multiple times in separate green processes. There's no need for "@" like in Ruby or "self." in Python because it's all lexically scoped.

If I decide to do static typing, I want it to look like ActionScript3, "var a: type". If I decide to do an ML-style type system, the ":type" can be optional.

Although I really like Python syntax, I think I'd be okay with a slightly-lower level language like Java. I.e., I'm not against using a compiler, and I am interested in resolving symbols at compile time. If you think about levels of dynamism, Python, Java, and Haskell all have valid, interesting approaches. I really like the middle ground that ActionScript3 provides. At this point, I care more about the approach to concurrency in Bless than which level of dynamism to use. Certainly, having both a compiler and an interpreter (like Common Lisp, Haskell, and Ocaml) is very appealing.

Compared to how crazy you can get in Python, I'd be okay with sacrificing a little bit of that dynamism for a little more safety, although I'm not a fan of babysitting the compiler like in Java, and I am still a fan of either duck typing or ML-style typing.

By the way, there are other forms of concurrency such as data parallelism, data-flow programming (aka promises and futures), etc. To some degree, I think they are orthogonal--i.e. you can do them in Bless as well, but that's not my focus. It's best to avoid scope creep and second system syndrome.

I think there are a few things I haven't straightened out in my head. I haven't thought about how to reuse functions between actors. Perhaps mixins make sense. I also haven't thought about inheritance.

Notice that if you do it right, you can safely run multiple programs in the same UNIX process using separate actors. However, to do this, you'll need to control how actors are able to obtain references to each other. This would be a nice feature, but it can easily be thrown out if it makes the implementation too complex--there's a reason we have UNIX processes. It may make sense to think of the actors within a UNIX process as all being "friendly", whereas separate UNIX processes are allowed to be somewhat less "friendly" with each other.

That's it. It's basically Erlang-style concurrency with Python syntax and actors with mutable "process-local" state (note, that I know that Erlang has mutable process-local state too). Easy peasy. Anyone know how to implement this thing? ;)

By the way, if I ever did manage to implement it, my next step would be to implement an operating system as I described here. Basically, start with a microkernel and then code all the things like the filesystem in userland using Bless.

Comments

I think I've settled on dynamic typing with optional type declarations, like ActionScript3.

I'm trying to figure out if I can get rid of objects or closures, but I don't think I can.

There's a fundamental difference between an actor and an object. An actor gets its own VM-level stack and heap, whereas objects don't. An object can only exist on a particular actor's VM-level heap.

Bless has all the same basic data types as Python (including ints that are actually bignums). You can create your own data types using classes.

I'm still wondering how to split up objects, actors, and modules. It's easy to take the Python approach, including the way objects and threads are related. However, I don't think that's what I want to do. I'm thinking that files should work the way they work in Java. A file can have one top-level "public thing". That thing starts with a keyword such as class, actor (object is to class as actor is to what?), or interface.

Bless is duck typed with optional static typing. Interfaces are optional.

There's some confusion if you use ":type" to denote a function's return value when the function itself has a ":" before the body, especially since the ":type" is optional. Hence, I'm switching to using this syntax "var a of Int". "of" and "var" are keywords. Classes must start with an uppercase like in Ruby and Haskell.

It should be written "a.len", not "len(a)". Casts should be written "OtherType(value)" since that is a constructor. I'm flexible on this last one.

Process ids should have a string representation. However, they should be objects. It must not be possible to create a process id yourself. You can only be given a process id. You must not be able to arbitrarily access a process you haven't been given access to by manufacturing a process id. A process id is an object that lives on an actor's VM-level heap.

Many different actors may all have their own process id objects pointing to the same process. That process might have already died leaving a dangling process id. When you try to send a message to a process, the VM must do so carefully because that process might have died and been reclaimed.

When an actor is laid out in RAM, it should start with a magic number that the VM can look at to tell whether there's an actor at that location in memory. When an actor dies, that magic number can be cleared out. STM can help with the coordination between kernel threads.

Bless has data that is global to all actors. It must be immutable. Global variables must be written in UPPER_CASE. I haven't decided whether I want the global variables to be statically defined at "compile time" or whether I want variables that can be set once like in Erlang. Globals may not be present in the first iterations of Bless.

An actor's heap can use a garbage collector that uses mark / sweep. Since there are lots and lots of actors, garbage collection isn't such a big deal. One actor can be running while another is undergoing garbage collection.

Actors themselves have VM-level heaps and hacks, and hence they are reference counted in some way. STM may help with this. The reference counts may represent process id objects being held by other actors. The reference can be incremented and decremented using STM.

As I said before, I'd like an interpreter and a compiler like Common Lisp has. The interpreter itself can run as an actor that launches other actors.

The VM must be stackless. Specifically, C extensions cannot recursively call into the interpreter. C extensions must not maintain any state that is outside of an actor's heap. It must be possible for multiple actors to separately make use of the same C extensions without having any sorts of race conditions because they each have their own VM-level heaps.
I've decided that using "def (args):" for anonymous functions is unnecessary. Blocks are good enough.

I want block-level scoping. That's the way "my" works in Perl.

I think I might need a "match" statement to better handle parsing messages received by an actor. Erlang is like that.

When I said that the interpreter was an actor, I meant that the parser should be an actor.

When you launch an actor, you have the option of asking for the VM to send you a message when the actor dies. The actor can die on purpose, or it can die if it raises an uncaught exception. The message can contain the value that the actor exited with or an exception. When the parent actor receives the message, the child actor may be long gone and already reclaimed.

I think there are many patterns that make sense when using actors. For instance:

An actor has an input queue, so actors themselves can act as serial access to resources.

It makes sense to use an actor per window when building a user interface.

It makes sense for actors to register other actors as listeners for events and for actors to send events to one another.

It makes sense for an actor to handle messages in an RPC-like manner--automatically dispatching messages to like-named methods.

It makes sense for actors to act as watch dogs. They can watch for other actors that die too soon, and they can respawn them.

I want object members to be declared, like in Java. Ideally, member access doesn't need to be a dict lookup. However, reflection should still be possible.

Eventually, it should be possible to have a process id that refers to an actor on another machine, a la Erlang.

I don't currently a need for "private" declarations, although I might change my mind. Encapsulation can be achieved at the actor level.

The scheduler may be an actor itself. When a new OS thread gets swapped in, it can ask the scheduler which actor to run. The scheduler may also be involved in deciding when to run garbage collection, but I don't know enough about how that works yet. It may make sense to have multiple schedulers when we get to having 10,000+ cores. They might decide to just shard the list of actors that they each need to schedule.
The next question is more about logistics. How the heck do I find enough time and money to build this thing?

One solution would be to get a grant from the national science foundation. I know a few people who have done that, but I'm not sure whether or not the NSF would be interested, especially since I'm not in a Ph.D program.

Another solution would be to find someone somewhere to help me get into a Ph.D program. That would be fun. However, I'm not sure I can support my wife and six kids on a Ph.D stipend.

Another solution would be to get hired as a researcher at a company such as Google or (gulp) Microsoft. Unfortunately, you have to have a Ph.D just to get hired as a researcher at most R&D labs. At least Google or Microsoft could afford to pay me a full salary. They might even be able to arrange a situation where I'm getting my Ph.D at a university while doing research at the company on Bless.

I've implemented Lisp twice, but I realize I have massive holes in my knowledge. For instance, I know there's a book on implementing VMs, but I haven't read it. I also know there's a book that the V8 guys used to make JavaScript so fast, but I haven't read that either.

My buddy Sam Rushing is building a language right now, but he made enough money when IronPort got bought by Cisco that he doesn't need to have an income. Unfortunately, I don't have that luxury.

The one thing that I have going for me is that it's an achievable project. Most of the parts have been done already in other projects. It's more practical than theoretical, although it's still far enough "out there" to be interesting as a research project. There are some people in the academic community who might make fun of languages like Python for being too mundane, but I certainly think it was an immensely valuable contribution to the field of programming. I think Bless could be like that.
Dj Gilcrease said…
Bless sounds similar to go (http://golang.org/)
Hey, Dj, I think Bless may be higher level than Go. Also, does Go have the concept of separate VM-level heaps?
I've come up with a slogan: "Concurrency for the Masses".

I think the Bless VM can be similar to the JVM in that it supports multiple syntaxes. It makes sense to have different actors written in different languages. They interact by passing each other copies of JSON-like data--that's pretty easy.

This is one possible way to integrate with C. Although C is "unsafe" in that it can arbitrarily destroy an entire UNIX process space, implementing an actor in C in such a way that it only allocates memory from its VM-level heap is a good way to reclaim memory that is leaked by a C program.
I'm making a subtle assumption. Ben Bangert showed me some new forms of memory that could drastically improve how much memory we can work with. Hence, I'm assuming a couple orders of magnitude more memory by the time we deal with 100+ CPUs.

It takes a lot of memory to give each actor its own VM-level stack and heap. That may not be doable in the short term if you're trying to create 10k-100k actors.

The reason I'm trying to give each actor its own VM-level stack and heap is to reduce contention among cores. However, there is a compromise between what we have now and what I'd like for Bless. The compromise is inspired by how Gevent, Eventlet, etc. work.

Create 4x as many threads as there are cores. As we get more cores, we get more threads. Each thread manages an OS-level stack and a VM-level heap.

Let's suppose we have 100 cores. This gives us 400 threads. Each thread has one OS-level stack and one VM-level heap. That one thread is responsible for managing memory in its VM-level heap, and it doesn't have to worry about other threads messing with that VM-level heap. Actors get assigned to a thread and allocate memory from that thread's heap. All of the actors assigned to a thread are time sliced. If you have 40,000 actors, then each thread manages 100 actors.

It may make sense to use more than 4 threads per CPU. It may make sense to use 100 threads per CPU. 100 CPUs * 100 threads per CPU * 10 actors per thread = 100,000 actors, which is what I'm aiming for.

It makes sense to have each thread managing a set of sockets with epoll. I'm not sure what to do with file IO since there are no non-blocking file IO APIs. It kind of sucks to have one actor doing blocking IO because that would block the whole thread, including all the other actors. Of course, blocking 10 actors is better than blocking all the other actors.

Perhaps it's not unthinking about that the OSs could provide a non-blocking file IO API. I know that FreeBSD "kernel schedulable entities" (KSEs) could do file IO in a non-blocking manner.
I think I have an approach to solve the blocking file IO problem. If you have 100,000 actors running on 1000 threads, you can devote 100 actors each with their own thread (i.e. 100 threads) to doing IO. They will receive IO requests asynchronously from other actors and respond with the results.

Hence, most threads will never block on anything. Only threads dedicated to blocking IO can block, and those threads will only have one actor each. You don't need that many threads that can block on IO because there are only so many things you can block on. For instance, networking doesn't count since there are asynchronous network APIs.
By the way, that approach to dealing with blocking IO is inspired by Node.JS.
Masklinn said…
> Immutable data made a lot of sense when Erlang was created because you can implement message passing via passing a pointer to immutable data. However, mutable data is more familiar to most programmers, and if you have 100+ CPUs, you can afford to copy some data around.

Erlang's data immutability has no relation whatsoever with passing messages around as pointers. As far as I know, *all* data but binaries are copied from one process to the next.

Binaries is the only exception, they're reference-counted and passed by reference, but this has as much to do with blowing up local process heaps as with the time it takes to copy megabytes of memory (an other specificity of binaries is that sections of binaries are pointers to the original binary, kind-of like taking the tail of a list, except it references the whole binary not just the used section).
bomma said…
Have you looked at AmbientTalk? Very similar :)
http://code.google.com/p/ambienttalk/
No, I haven't. Thanks!
Bless must have an approach to UNIX signal handling. When a UNIX signal occurs, the signal should be passed as a message to an actor on a thread devoted to handling signals. UNIX signals run in an weird no-man's land that is difficult to deal with, but I think passing a message to an actor is achievable. Bless programs interested in handling signals can pass messages to the actor responsible for handling signals. That actor can send messages back to interested actors when a signal is received.

Let me talk about index-based pointers. First of all, they're a massive pain in the butt if you're trying to work with existing C code that doesn't expect them. C code likes having absolute pointers. Fran Allen said that if you give up this level of indirection and allow C to have absolute pointers, you end up in a rat hole in which a lot of compiler optimizations are not possible. Index-based pointers aren't all that weird. If you have a pointer to a VM-level heap, you just add the index to that pointer via C's pointer arithmetic. When I think back on segment-based addressing (shudder), it reminds me that pointer + offset is something that machines can do very easily and quickly. Once you have index-based pointers, it has two advantages. I think that garbage collection algorithms may become a bit easier since you can move the VM-level heap around when you're doing mark and sweep. Second of all, it allows you to realloc the VM-level heap to make it bigger without breaking any of the index-based pointers within the heap. Remember, I'm thinking that each thread will manage its own VM-level heap for the actors running within that thread.

Speaking of malloc, I'll need a malloc implementation. Actually, I'll need two. There's the malloc devoted to the UNIX process as a whole. It must allocate a VM-level heap for each thread. This is something that happens rarely. Then there's the malloc for allocating memory withing a VM-level heap used by a thread. It allocates memory and returns an index-based pointer. I think that each thread can maintain a pointer to its VM-level heap in its own thread-local storage. I may be able to grab Python's implementation of malloc and hack it to be index-based.

A lot of the way that I'm thinking of message passing may be familiar if you've looked at ZeroMQ. That shouldn't be surprising since I think of ZeroMQ as a C-based implementation of the ideas in Erlang, and Bless is clearly inspired by Erlang. It sure would be nice if I could just use ZeroMQ for my message passing implementation. However, I can think of a few challenges. 1) ZeroMQ doesn't know about Bless's approach to managing memory. 2) I want the transport used to talk to internal actors (interal to the process) and external actors to be somewhat hidden from the sender, whereas ZeroMQ exposes this. I don't think either of these are deal breakers. However, I don't know how far I can get by just using ZeroMQ creatively vs. whether or not I just just grab its code and hack it for my own purposes.
There's a reason I keep coming back to C. It's probably obvious. The easier it is to integrate with existing C libraries, the more likely a language is to be successful. Someone once told me that this was a huge problem for Smalltalk, whereas good C integration is one of the reasons why Python is so successful. That's why I keep trying to come up with more ways to integrate with C.

Another way to integrate with C would be to put C extensions in their own UNIX processes and talk with them via message passing. This is the ZeroMQ way of integrating multiple languages. There's no reason a program that doesn't run on the Bless VM can't interact with messages from a Bless VM.

Another "crazy" way to integrate with C is to do what VMware does--emulate the machine so that you can interpret native binary. This would allow me to run any C extension in process, and it would let me constrain it to make it safe. This approach is certainly hard, but it also has some real advantages. If a new architecture comes along (which seems at least possible by the time we get 100+ CPUs), I can continue interpreting C code that was compiled to run on x86.

Another "crazy" way to integrate with C is to use my own "hacked" C compiler. I could have the compiler spit out byte code that runs on the Bless VM. Certainly the code would run more slowly, but it may be worth it if you have 100+ CPUs all running in parallel and you can use existing C code.

I wonder how far I can get toward self hosting like the Smalltalk guys did? I think my mind just segfaulted ;)
Why would you need your own hacked C compiler? Hack something on top of LLVM instead.
> Why would you need your own hacked C compiler? Hack something on top of LLVM instead.

Sweet! Thanks.
Jim Roepcke said…
Have you looked at http://reia-lang.org/
> Have you looked at http://reia-lang.org/

Yep. I was following the mailing list for several months. Good stuff!
Anonymous wrote:

I've only had the opportunity to skim, but I am glad that language
design is not dead! Here are a bunch of loose shots (some may miss
widely :-).

For comparison, I notice you've seen Go -- but have you seen Tart
(http://code.google.com/p/tart/,
http://viridia.org/projects/tart/doc/index.html)?

I think the idea of combining Erlang and Python ideas makes sense --
Erlang is better at concurrency, but Python is better at pretty much
everything else. :-)

Re: the speed of copying data: I can't tell from your blog (as I said,
I just skimmed) but if you have 100+ CPUs data will be copied over
some kind of bus that may be blindingly fast but isn't anywhere near
the speed of RAM let alone L2 cache.

I am a bit skeptical about the relative addressing thing, mostly
because it sounds like it will make implementing the VM hard (other
languages don't have this concept so you can't use their constructs
directly).

Re: mutability: IIUC the problem with mutability isn't so much
functional purity (purity schmurity, to summarize the Zen of Python
:-) but the semantics of having "state" somewhere and keeping that
state consistent. You can't copy a mutable object behind the
programmer's back because the semantics would change. The same thing
happens at a different level for databases: these are typically seen
as mutable immediately consistent shared state, but sometimes scalable
apps need faster read access and people get into kludges like creating
many slave copies of a master database, with eventually consistent
semantics -- or you get non-SQL models like Bigtable (and App Engine
datastore), couchdb etc.

On the third hand if you are deep-copying data to do message passing I
think you are in effect using immutability, aren't you?

Thank you for rejecting the "everything is a callback" philosophy. It
is evil (even if Twisted also lives it).

A bunch of stuff you mention for surface syntax is not very important
(e.g. blocks, keyword args, selflessness, len(a) vs. a.len(), etc.).
It doesn't deeply affect the language's "feel" (although obviously you
have to pick something). TBH I don't think an indentation-based
language is likely to succeed these days -- braces have become the
de-facto block delimiter for all new languages. Just go with it.

Static vs. dynamic typing OTOH will deeply affect your language
because it will affect how much of your language's promises (about
consistency etc.) you can put in the compiler. There are techniques
for type inferencing (see e.g. Scala, also Go and Tart) that, at the
cost of a much more complex compiler, can let you write code that is
almost as short as Python and as safe as Java. But (as I've seen in
Tart's development) it can seriously slow down writing the compiler.

Please don't go with JavaScript's prototype-based not-classes. :-)

Re: NSF grant: that's a waste of time if you're not serious about
entering and staying in Academia. It's the ultimate Old Boys Club. Do
it like I did with Python: in your spare time, slowly, cutting corners
where you can on stuff that you need but where you don't think you can
add something new, either borrowing ideas from established languages,
or borrowing implementations from other open source projects (there's
so much these days!).

I didn't see you write about Erlang's pattern matching, which I think
is brilliant and important and possibly essential.

Isn't there someone in the Mozilla foundation also working on a new language?

Finally: don't think of 100+ CPUs as your goal. Think of several
Google data centers full of CPUs, spread across the world.

Let me know how far off I am!
I've been learning Go, lately, and I'm pretty happy with it.
Wow, I've been looking at Dart lately, and it's very neat: http://www.dartlang.org/
Shannon Behrens said…
Thinking back on this blog post, I'd like to point out Go, Dart, Rust, and TCMalloc.
Shannon Behrens said…
And Nim: http://nim-lang.org/
Anonymous said…
Look at http://www.ponylang.org/ , very cool stuff.