Thursday, August 24, 2006

Python: Limitations of Coroutines via Enhanced Generators

I've been thinking a lot about tricks such as the coroutine system in this article. PEP 342 improves the situation for Python 2.5. In fact, Example 3 in the PEP is exactly what was on my mind. I came to the same conclusion as the PEP:
In effect, this example emulates simple tasklets as are used in Stackless Python, as long as you use a yield expression to invoke routines that would otherwise "block".
However, what happens if you can't use a yield expression to invoke routines that would otherwise block?

For instance, say coro() is a coroutine that calls third_party_1(), which calls third_party_2(), which calls blocker(), where blocker() blocks. If third_party_1() and third_party_2() are intermediary functions that live in third-party code, they won't know how to "do the right thing" using yield. And if third_party_1() and third_party_2() don't do the right thing with yield, there's no way for blocker() to yield back to the scheduler.
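To make that concrete, here's a rough sketch of the shape of the problem (the names are just the placeholders from the paragraph above):

def blocker(sock):
    # This is where I'd like to yield back to the scheduler, but a
    # yield here would only suspend blocker() itself; third_party_2()
    # has no idea it's supposed to pass that suspension up the chain.
    return sock.recv(1024)        # blocks the whole thread

def third_party_2(sock):
    return blocker(sock)          # ordinary call, no yield

def third_party_1(sock):
    return third_party_2(sock)    # ordinary call, no yield

def coro(sock):
    # The only place the scheduler can regain control is at this
    # yield, and by then third_party_1() has already blocked.
    while True:
        yield third_party_1(sock)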

Ideally, I'd like for coro() to be a coroutine that blocks within blocker(), leaving third_party_1() and third_party_2() ignorant of the whole process. If you know how, please tell me!

12 comments:

Kelly Yancey said...

Let's assume blocker() is Python code and that it is blocking on file I/O. You should be able to replace the __builtin__.file() function with one that returns a file object whose read(), write(), etc. are implemented using yield and nonblocking_read(), nonblocking_write(), etc.

You could do the same trick with socket.socket() and any other common blocking offenders.

In which case, you could have completely-transparent non-blocking coroutine goodness. Which means that third-party (python) code wouldn't need to be re-written to take advantage of your non-blocking coroutine scheduler. Assuming this works (disclaimer: I haven't tried it), you could *almost* write the echo server example in PEP 342 as a textbook accept/read/write loop, but which actually handles multiple concurrent connections via non-blocking I/O under the hood. I can't think of a sneaky way offhand to hide the Trampoline run loop, though.
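Something like this untested sketch is what I have in mind; nonblocking_read() is the same presumed helper from the PEP's echo server example, and the scheduler driving it all is PEP 342's Trampoline:

import socket as _socket

class CoopSocket(object):
    """Wraps a real socket with generator-based methods that a PEP 342
    trampoline can drive, so nothing here ever really blocks."""

    def __init__(self, sock):
        self._sock = sock

    def recv(self, bufsize):
        # Suspend until the trampoline has readable data for us.
        data = yield nonblocking_read(self._sock, bufsize)
        # In the trampoline's convention, yielding a plain
        # (non-generator) value hands it back to the calling coroutine.
        yield data

def _coop_socket(*args, **kwargs):
    return CoopSocket(_real_socket(*args, **kwargs))

# The sneaky part: everyone who calls socket.socket() gets ours.
_real_socket = _socket.socket
_socket.socket = _coop_socket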

Of course, in your example, if blocker() is implemented via a C library that does blocking I/O, you're SOL.

Shannon -jj Behrens said...

I absolutely had envisioned overriding the socket module and file module. That's a given ;) However, the problem is that a context switch can only happen with a yield *directly back to the scheduler*, and the third-party code doesn't know how to follow this "coding style".

If coro(), third_party_1(), third_party_2(), and blocker() all called yield, then the "chaining" would work. However, if third_party_1() and third_party_2() don't call yield (they don't know about this coro business), then there's no way for blocker() to call yield. After all, third_party_2() doesn't know that blocker() is a generator. blocker() has to be called from the scheduler, which does know that blocker() is a generator.
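To spell out what the "chaining" would have to look like (using the PEP's trampoline conventions, with nonblocking_read() as its presumed helper), every level has to be rewritten as a generator:

def blocker(sock):
    data = yield nonblocking_read(sock, 1024)
    yield data                        # "return" the data to the caller

def third_party_2(sock):
    data = yield blocker(sock)        # has to know blocker() is a generator
    yield data

def third_party_1(sock):
    data = yield third_party_2(sock)  # ditto
    yield data

def coro(sock):
    while True:
        data = yield third_party_1(sock)
        # ... do something with data ...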

Confusing, eh?

simon said...

The scheduler system I use was derived from the round-robin scheduler presented in the deque documentation. I added extra facilities to reschedule tasks based on the value a task yields. From other parts of the code, I can choose the value the task is going to yield on its next iteration, because each generator is wrapped in a class that can dynamically switch out its .next method.

This lets me perform actions inside the scheduler based on the values tasks yield. The values might be anything; however, if an UNBLOCK instance is yielded, the scheduler spawns a thread which iterates the task once and then joins back into the scheduler thread (which is usually the main thread).

This means that if I have to make a blocking call, and it really bothers me, I surround the potential blocking call with yield statements which look like this:

yield UNBLOCK()
i_might_block()
yield None

This approach does have some problems; however, in my simple use cases, I haven't run into any.
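Roughly, the scheduler loop looks like this (an untested, simplified sketch; my real version wraps each generator in a class so its .next method can be swapped):

import threading
import time
from collections import deque

class UNBLOCK(object):
    """Yield an instance of this right before a call that might block."""
    pass

def _step(task):
    # Advance the task through its (possibly blocking) next step.
    try:
        task.next()
    except StopIteration:
        pass

def run(tasks):
    queue = deque(tasks)          # one big flat list of generators
    blocked = []                  # (worker thread, task) pairs

    while queue or blocked:
        # Workers that have finished let their tasks rejoin the loop.
        for worker, task in blocked[:]:
            if not worker.isAlive():
                worker.join()
                blocked.remove((worker, task))
                queue.append(task)

        if not queue:
            time.sleep(0.01)      # everything is off blocking; wait a bit
            continue

        task = queue.popleft()
        try:
            value = task.next()
        except StopIteration:
            continue              # task finished, drop it

        if isinstance(value, UNBLOCK):
            # Run the blocking step in its own thread.
            worker = threading.Thread(target=_step, args=(task,))
            worker.start()
            blocked.append((worker, task))
        else:
            queue.append(task)    # plain round robin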

The framework I use does have some limitations. I currently treat all tasks as a big flat list. If a task yields another task, it gets ignored. I'll probably change this sometime soon, now that we have real coroutines to play with. You can see the implementation in google SVN.

Kelly Yancey said...

Ah, I see. The problem is how to return a value back to a frame several frames above the current frame while retaining a reference to the current frame so you can resume execution later (which then leads to the question of how to identify the outer frame to return to).

Generators fit the frame resumption half of this description without supporting returning a value to a different (parent) frame. I guess that is why it feels like they should do the trick, but don't. :(

Shannon -jj Behrens said...

Guido has said that "generators get you 80% of the way to continuations." In this case, the 20% is clearly missing. Continuations would totally solve this problem, but I guess that's old news.

Shannon -jj Behrens said...

> yield UNBLOCK()
> i_might_block()
> yield None

This is nice and probably the best that can be done given the circumstances. However, if you assume that at some point while handling the request you're *always* going to need to do this, you've really decayed into normal multi-threading :-/ Your technique is very helpful in cases where you can *mostly* stay away from blocking code.

Thank you so much for taking the time to answer my question.

*sigh*

Richard Tew said...

> I absolutely had envisioned overriding
> the socket module and file module.

This is something I am trying to get done for Stackless at the moment, providing versions of socket, file and maybe more which operate asynchronously in the background while stalling the caller on a Stackless channel.
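The rough shape (an untested sketch; the details in the real module will differ) is a replacement socket that parks the calling tasklet on a channel while a poller tasklet does the select() in the background:

import select
import stackless

class ReplacementSocket(object):
    # Looks like a normal blocking socket to the caller, but only the
    # calling tasklet stalls; every other tasklet keeps running.

    def __init__(self, sock, poller):
        sock.setblocking(False)
        self._sock = sock
        self._poller = poller

    def recv(self, bufsize):
        channel = stackless.channel()
        self._poller.wait_readable(self._sock, channel)
        channel.receive()                 # stalls only this tasklet
        return self._sock.recv(bufsize)   # fd is readable now

class Poller(object):
    # Runs as its own tasklet, doing the I/O waiting in the background.

    def __init__(self):
        self._waiting = {}                # socket -> channel

    def wait_readable(self, sock, channel):
        self._waiting[sock] = channel

    def loop(self):
        while True:
            if self._waiting:
                readable, _, _ = select.select(self._waiting.keys(), [], [], 0)
                for sock in readable:
                    self._waiting.pop(sock).send(None)   # wake the caller
            stackless.schedule()          # let the other tasklets run

# Usage: poller = Poller(); stackless.tasklet(poller.loop)(); then wrap
# your sockets in ReplacementSocket(sock, poller) and call stackless.run().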

Shannon -jj Behrens said...

/me is quite interested.

Brandon L. Golm said...

JJ asked me to comment, so I will. Normally I try to say something funny (usually making fun of JJ), but that seems difficult considering the fact that I don't know what he's talking about.

That said, you are talking about synonymizing two antonyms in one process. This is like having half your threads use cooperative multitasking and the other half use preemptive multitasking.

That's a nice idea, but the preemption is going to happen in a lower-level scheduler than the cooperative stuff; that follows from the definition of "cooperative multitasking." That is to say, the executing code does the scheduling.

In a way, JJ, you are asking how to impose cooperative multitasking on code that assumes a preemptive environment. "kelly" et al. would suggest ways to re-superclass or subclass your third-party libraries (akin to the C++ superclass pointer-change trick in old versions of uControl).

I would suggest this: put your third_party stuff in its own process, and talk to it however you like. The kernel, after all, has a pretty good preemptive scheduler.
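Something crude like this sketch, for instance (blocking_worker.py is whatever wrapper you write around the third-party library):

import subprocess

# Let the kernel preempt the blocking code by pushing it into a child
# process; the parent talks to it over pipes.
worker = subprocess.Popen(['python', 'blocking_worker.py'],
                          stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE)

def ask_worker(request):
    # One line per request/response.  The pipe is just a file
    # descriptor, so the cooperative side can select() on it instead
    # of blocking here.
    worker.stdin.write(request + '\n')
    worker.stdin.flush()
    return worker.stdout.readline()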

I do have plans for kernel support for userland scheduling that will handle both preemption and cooperation, but it won't be released in time for you.

B. said...

I thought about using the PEP to implement coroutines, but Armin Rigo has such a nice implementation in his "greenlets" C module that I chose to use that for my Python coroutine needs.
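For instance, here's roughly how the coro()/third_party example from the post comes out with greenlets (an untested sketch; the import depends on how you have greenlets installed, but that's the shape of the API):

from greenlet import greenlet

def blocker():
    # Deep inside third-party frames, we can hop straight back to the
    # main/scheduler greenlet; this entire stack is suspended as-is.
    greenlet.getcurrent().parent.switch('blocker wants to wait')
    return 'data'

def third_party_2():
    return blocker()              # ordinary calls, blissfully ignorant

def third_party_1():
    return third_party_2()

def coro():
    print third_party_1()         # prints 'data' once we're resumed

task = greenlet(coro)
print task.switch()               # prints 'blocker wants to wait'
task.switch()                     # resume blocker() right where it stopped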

> However, the problem is that a context switch can only happen with a yield *directly back to the scheduler*

Let's take an example: a socket read. Could you split the read into two parts? I think so. The trick is that you'll need to set up a trampoline in the socket interface to avoid trying to yield back through the third-party code.

"socket.recv" calls "socket._yieldy_recv". The internal function is a generator. The "recv" call first needs to set up a trampoline: First "recv" creates a genator object "r = self._yieldy_recv()". "Recv" then needs to give the schedule a read "s.sched_read(r)". Finally "recv" will wait for the scheduler for some data: "r.next()". The next call then needs to ask the scheduler to do some useful work until it returns ... maybe "s.poller_gen.send(None)" or something like that. The poller eventually finds work and then calls "r.send(n_bytes)" or whatever.

Shannon -jj Behrens said...

I've read it twice, but I'm having a hard time understanding that post. Can you use the coro(), third_party_1(), etc. example that I used above?

Thanks for taking the time to comment.

Shannon -jj Behrens said...

Yep, it sounds like the greenlets code has the "smart" stack copying that I was looking for.