Python: Limitations of Coroutines via Enhanced Generators

I've been thinking a lot about tricks such as the coroutine system in this article. PEP 342 improves the situation for Python 2.5. In fact, Example 3 in the PEP is exactly what was on my mind. I came to the same conclusion as the PEP:
> In effect, this example emulates simple tasklets as are used in Stackless Python, as long as you use a yield expression to invoke routines that would otherwise "block".
However, what happens if you can't use a yield expression to invoke routines that would otherwise block?

For instance, say coro() is a coroutine that calls third_party_1(), which calls third_party_2(), which calls blocker(), where blocker() blocks. If third_party_1() and third_party_2() are intermediary functions that exist in third-party code, they won't know how to "do the right thing" using yield. And if third_party_1() and third_party_2() don't do the right thing with yield, there's no way for blocker() to yield back to the scheduler.

Ideally, I'd like for coro() to be a coroutine that blocks within blocker(), leaving third_party_1() and third_party_2() ignorant of the whole process. If you know how, please tell me!
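A minimal sketch of the problem (in modern Python syntax; the function names mirror the example above). The crucial failure is that calling a generator function from plain code just returns a generator object; the yield inside blocker() never runs, and nothing suspends:

```python
import types

def blocker():
    # Wants to "block" cooperatively; yielding only suspends *this* frame.
    yield "waiting"

def third_party_2():
    # Plain third-party code: it has no idea blocker() is a generator.
    return blocker()

def third_party_1():
    return third_party_2()

def coro():
    yield third_party_1()

# The scheduler only drives coro()'s frame. What surfaces is an
# unstarted generator object, not a suspension point inside blocker().
g = coro()
value = next(g)
print(isinstance(value, types.GeneratorType))  # True
```

The scheduler would have to know about every generator in the chain to resume them; the intermediaries broke the chain by treating blocker() as an ordinary call.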


Anonymous said…
Let's assume blocker() is python code and that it is blocking on file I/O. You should be able to replace the __builtin__.file() function to return a file object that replaces read(), write(), etc. with implementations using yield and nonblocking_read(), nonblocking_write(), etc.

You could do the same trick with socket.socket() and any other common blocking offenders.

In which case, you could have completely-transparent non-blocking coroutine goodness. Which means that third-party (python) code wouldn't need to be re-written to take advantage of your non-blocking coroutine scheduler. Assuming this works (disclaimer: I haven't tried it), you could *almost* write the echo server example in PEP 342 as a textbook accept/read/write loop, but which actually handles multiple concurrent connections via non-blocking I/O under the hood. I can't think of a sneaky way offhand to hide the Trampoline run loop, though.
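A toy sketch of the yield-based replacement this comment imagines (the WouldBlock token and run() trampoline are invented names, and the "blocking" source is faked with a list, purely for illustration):

```python
class WouldBlock:
    """Token yielded back to the scheduler instead of actually blocking."""

def yieldy_read(source):
    # Pretend the first attempt would block; ask the scheduler to wait.
    yield WouldBlock()
    # Resumed by the scheduler: the data is now "ready".
    yield source.pop(0)

def run(task):
    # A toy trampoline: swallow WouldBlock tokens until a real value arrives.
    for value in task:
        if not isinstance(value, WouldBlock):
            return value

data = ["hello"]
print(run(yieldy_read(data)))  # hello
```

A real version would register the waiting generator with a select()-style poller instead of looping, but the shape is the same: the replaced read() yields instead of blocking.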

Of course, in your example, if blocker() were implemented via a C library that did blocking I/O, you're SOL.
jjinux said…
I absolutely had envisioned overriding the socket module and file module. That's a given ;) However, the problem is that a context switch can only happen with a yield *directly back to the scheduler*, and the third party code doesn't know how to follow this "coding style".

If coro(), third_party_1(), third_party_2(), and blocker() all called yield, then the "chaining" would work. However, if third_party_1() and third_party_2() don't call yield (they don't know about this coro business), then there's no way for blocker() to call yield. After all, third_party_2() doesn't know that blocker() is a generator. blocker() has to be called from the scheduler, which does know that blocker() is a generator.

Confusing, eh?
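For contrast, here is what the working chain looks like when every frame does participate. This sketch uses Python 3.3's yield from, which post-dates this discussion but makes the delegation explicit; under PEP 342 alone you'd need a trampoline doing the equivalent send/yield plumbing:

```python
def blocker():
    got = yield 'block'            # this yield travels all the way out
    return got

def third_party_2():
    return (yield from blocker())  # every intermediary must delegate

def third_party_1():
    return (yield from third_party_2())

def coro():
    return (yield from third_party_1())

g = coro()
request = next(g)                  # 'block' surfaces at the scheduler
try:
    g.send('data')                 # scheduler resumes the whole chain
except StopIteration as e:
    final = e.value
print(request, final)              # block data
```

Remove the yield from in either intermediary and the chain breaks exactly as described above.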
simon said…
The scheduler system I use was derived from the round-robin scheduler presented in the deque documentation. I added extra facilities to reschedule tasks based on the yielded value from a task. I can choose the value the task is going to yield on its next iteration, from other parts of the code, as the generator is wrapped in a class which can dynamically switch the .next method.

This lets me perform actions inside the scheduler based on yielded task values. The values might be anything, however, if an UNBLOCK instance is returned, the scheduler spawns a thread which iterates the task once, then joins back into scheduler thread (which is usually the main thread).

This means that if I have to make a blocking call, and it really bothers me, I surround the potential blocking call with yield statements which look like this:

yield UNBLOCK()
yield None

This approach does have some problems, however in my simple use cases, I haven't experienced any.
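A guess at what this scheduler might look like, based on the description above (the UNBLOCK class and scheduler() loop are reconstructions, not simon's actual code; the deque round-robin core follows the collections.deque documentation example):

```python
from collections import deque
import threading

class UNBLOCK:
    """Marker: run my next (possibly blocking) step in a worker thread."""

results = []

def scheduler(tasks):
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            value = next(task)
        except StopIteration:
            continue               # task finished; drop it
        if isinstance(value, UNBLOCK):
            # Iterate the task once in a thread, then join back into
            # the scheduler thread, as described above.
            t = threading.Thread(target=next, args=(task,))
            t.start()
            t.join()
        queue.append(task)

def worker():
    yield UNBLOCK()
    results.append('ran blocking step off the main loop')  # the risky call
    yield None

scheduler([worker()])
print(results[0])
```

Note that join() makes the scheduler wait for the blocking step, which matches the "simple use cases" caveat: it serializes around the blocking call rather than overlapping it with other tasks.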

The framework I use does have some limitations. I currently treat all tasks as a big flat list. If a task yields another task, it gets ignored. I'll probably change this sometime soon, now that we have real coroutines to play with. You can see the implementation in google SVN.
Anonymous said…
Ah, I see. The problem is how to return a value back to a frame several frames above the current frame while retaining a reference to the current frame so you can resume execution later (which then leads to the question of how to identify the outer frame to return to).

Generators fit the frame resumption half of this description without supporting returning a value to a different (parent) frame. I guess that is why it feels like they should do the trick, but don't. :(
jjinux said…
Guido has said that "generators get you 80% of the way to continuations." In this case, the 20% is clearly missing. Continuations would totally solve this problem, but I guess that's old news.
jjinux said…
> yield UNBLOCK()
> i_might_block()
> yield None

This is nice and probably the best that can be done given the circumstances. However, if you assume that at some point during handling the request, you're *always* going to need to do this, you've really decayed into normal multi-threading :-/ Your technique is very helpful in cases where you can *mostly* stay away from blocking code.

Thank you so much for taking the time to answer my question.

Richard said…
> I absolutely had envisioned overriding
> the socket module and file module.

This is something I am trying to get done for Stackless at the moment, providing versions of socket, file and maybe more which operate asynchronously in the background while stalling the caller on a Stackless channel.
jjinux said…
/me is quite interested.
Brandon L. Golm said…
JJ asked me to comment, so I will. Normally I try to say something funny (usually making fun of JJ), but that seems difficult considering the fact that I don't know what he's talking about.

That said, you are talking about synonizing two antonyms in one process. This is like having half your threads use cooperative multitasking, and half of them use preemptive.

That's a nice idea, but the preemption is going to happen with a lower-level scheduler than the cooperative stuff; that's the definition of "cooperative multitasking." That is to say, the executing code does the scheduling.

In a way, JJ, you are asking how to impose cooperative multitasking on code that assumes a preemptive environment. "kelly" et al. would suggest ways to re-superclass, or subclass your third-party libraries (akin to the C++ superclass pointer-change trick of old versions of uControl).

I would suggest this: put your third_party stuff in its own process, and talk to it however you like. The kernel, after all, has a pretty good preemptive scheduler.
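One way to follow this suggestion today would be the multiprocessing module (which post-dates this discussion); the blocking third-party call runs in a child process that the kernel preempts, while the parent stays free to run its cooperative scheduler. The function names here are illustrative:

```python
from multiprocessing import Pool
import time

def third_party_blocking_call(x):
    time.sleep(0.01)   # stands in for a blocking C-level call
    return x * 2

if __name__ == '__main__':
    with Pool(1) as pool:
        async_result = pool.apply_async(third_party_blocking_call, (21,))
        # The parent process is free to iterate its coroutine scheduler
        # here while the child blocks; collect the answer when ready.
        print(async_result.get())  # 42
```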

I do have plans for a kernel-support for userland scheduling that will support both preemption and cooperation, but it won't be released in time for you.
B. said…
I thought about using the PEP to implement coroutines, but Armin Rigo has such a nice implementation in his "greenlets" C module that I chose to use that for my Python coroutine needs.

> However, the problem is that a context switch can only happen with a yield *directly back to the scheduler*

Let's take an example - a socket read. Could you split the read into two parts? I think so. The trick is that you'll need to set up a trampoline in the socket interface to avoid trying to yield back through the third-party code.

"socket.recv" calls "socket._yieldy_recv". The internal function is a generator. The "recv" call first needs to set up a trampoline: first, "recv" creates a generator object "r = self._yieldy_recv()". "recv" then needs to give the scheduler a read: "s.sched_read(r)". Finally, "recv" will wait for the scheduler for some data: "". The next call then needs to ask the scheduler to do some useful work until it returns ... maybe "s.poller_gen.send(None)" or something like that. The poller eventually finds work and then calls "r.send(n_bytes)" or whatever.
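Here is one reading of that design as runnable code. Everything scheduler-side is faked (FakeScheduler, the five "received" bytes), since the point is only the trampoline shape: the public recv() drives the internal generator itself, so third-party callers never see a yield:

```python
class FakeScheduler:
    def __init__(self):
        self.pending = None
    def sched_read(self, gen):
        self.pending = gen          # register the yieldy read
    def step(self):
        # The poller "finds" data and pushes it into the waiting generator.
        try:
            self.pending.send(5)    # pretend 5 bytes arrived
        except StopIteration as e:
            return e.value          # the generator's return value

class YieldySocket:
    def __init__(self, sched):
        self.sched = sched
    def _yieldy_recv(self):
        n = yield                   # suspended until the poller sends a count
        return 'x' * n              # stand-in for the received bytes
    def recv(self):
        # The public recv sets up the trampoline, so plain third-party
        # code can call it like an ordinary blocking recv.
        r = self._yieldy_recv()
        next(r)                     # prime the generator to its first yield
        self.sched.sched_read(r)
        return self.sched.step()

sched = FakeScheduler()
received = YieldySocket(sched).recv()
print(received)  # xxxxx
```

In this toy, step() runs synchronously inside recv(), so nothing else gets scheduled in the meantime; a real version would loop the scheduler here, which is where the "do some useful work until it returns" part comes in.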
jjinux said…
I've read it twice, but I'm having a hard time understanding that post. Can you use the coro(),third_party_1(), etc. example that I used above?

Thanks for taking the comment.
jjinux said…
Yep, it sounds like the greenlets code has the "smart" stack copying that I was looking for.
