Skip to main content

Computer Science: Offset-based Linked Lists

Note, I'm speaking a little loosely, and I'm not a C expert, but I think this post is still interesting nonetheless.

There are many ways to "architect" and implement lists.

These days, it's the norm in languages like Java, Python, etc. for the language or language library itself to provide a list implementation. You simply create a list and then start shoving objects into it. Python, Ruby, Perl and PHP all provide native syntax for lists. The algorithm used is more sophisticated than a singly or doubly-linked list because lists must double as arrays in those languages. Java and C++ have native arrays and array syntax, but they also provide a variety of list implementations in their libraries. Both can use generics to constrain the type of objects you put in those lists. One thing to note, however, is that there's a distinction between the list itself and the items in the list.

Since C's standard library doesn't provide a list implementation, a variety of approaches are used. It's not uncommon to create structs that have prev and next pointers in them and to manage the list manually. In general, it seems common in C to create data structures manually and to simply manage the data structure as part of the overall programming task. This is in stark contrast to, say, Python where the list implementation is in C and the code using the list is in Python.

My buddy Kelly Yancey once showed me that FreeBSD had macros so that programmers wouldn't have to keep reimplementing linked lists all the time in the kernel. I think macros were used instead of functions so that the code behaved as if it were actually written inline, thus avoiding the function call overhead--but I could be wrong.

At Metaweb, I had a buddy named Scott Meyer who use to work at Oracle. He showed me a pretty interesting trick for creating linked lists. As before, he had structs which contained application data as well as next and prev pointers for managing the structs in a linked list. However, rather than manipulate those linked lists manually, he wrote separate, generic linked list functions. The question is: how can a generic linked list function operate on random structs (i.e. void *'s) that just happen to have prev and next fields somewhere within them?

For each type of struct, he would create a "descriptor" struct that contained the offset of the prev and next fields. He would pass the descriptor to the function anytime he needed to manipulate the linked list. Loosely speaking, it's like saying, "Hey, I got this linked list, and I want you to insert a new member into it. You might not know anything about the type of structs in the linked list, but I can tell you that the prev pointer is 16 bytes from the beginning of the struct, and the next pointer is 20 bytes from the beginning."

I think the lesson is deep. If you want to write a function that operates generically on structs that you know have certain fields, you just have to tell that function the offset of those fields. Naturally, you don't want to just pass the offsets. Rather, you pack them up into a struct. Creating such a struct is like declaring that you implement some interface. I had seen how structs full of function pointers doubled as interfaces in Apache, but structs full of offsets was something entirely new for me.

Neat!

Comments

Ian Bicking said…
Alan Kay described what he considers the first object programming similarly. I can't remember enough keywords to look it up. But basically it was in filesystems -- people wrote different filesystems for tapes, but this caused problems as new filesystems appeared or old filesystems were forgotten about. So some unnamed programmer made a system where the tape contained the code to access the records on the tape at known offsets. So, say, the get-record routine was at offset 0 (or maybe a pointer? Or maybe enough room to just jump to where the routine was held), set-record at offset 10, etc.

Similarly the linked list routines are just one step away from being a dynamically typed system -- if you put the struct describing offsets in the original struct itself then you'd have a type.
Joseph said…
This is off topic, but that wouldn't happen to be the Scott Meyers of Effective C++ fame would it?
mike barton said…
Huh... I've done similar things in the past, but my generic list functions mostly take a next pointer offset, which is computed with a macro. Something like:

#define NEXTOFFSET(list) ((unsigned long)((list)->next) - (unsigned long)(list))

Then,
list = list_sort(list, NEXTOFFSET(list), comparison_function);

I guess building a framework to organize that isn't too bad an idea.
Dave Kirby said…
The entire Amiga operating system was built around a linked list structure - every object started with the linked list struct, and the OS provided functions for navigating round them, even from assembler. Everything else was referenced by offsets to one of these structs - by convention data was at positive offsets and pointers to functions were at negative offsets. This enabled consistent object oriented programming, even from assembler.

In many ways the Amiga was way ahead of its time.
Kelly Yancey said…
Hey JJ, just wanted to add that I've seen versions of the BSD queue.h macros that used offsets from a known base rather than pointers for the next/previous links. The advantage is that you can then put the linked objects in a shared memory segment (the segment may have a different base address in each process).

There was a proposal to add them to FreeBSD back in 2002 which includes the source code.
jjinux said…
Ian, thanks for your comment.
jjinux said…
> This is off topic, but that wouldn't happen to be the Scott Meyers of Effective C++ fame would it?

Embarrassingly, I don't know. I don't know him all that well. I do know he's a brilliant hacker, and he's the manager of the graph database team at Metaweb.
jjinux said…
Wow, thanks for the great comments, everyone! I was taking a risk posting something outside my realm of experience, but, man, what interesting responses!
Anonymous said…
Yet another case of "why I didn't I think of it before?". A very simple approach to take advantage of obvious commonalities in existing code. .

I thnk that the Scott Meyers of Effective C++ is into training and consulting http://www.aristeia.com/
jjinux said…
> I thnk that the Scott Meyers of Effective C++ is into training and consulting

Ah, not the same guy.
Anonymous said…
Interesting that you mention macros, and passing offsets... However, if you consistently name the next/prev pointers, you can combine the two into a macro that calls the underlying function with the offsets for the specific struct. For example:

#define LINKSHOW(object) linkshow(((void *) object.next) - ((void *) object), \
((void *) object.prev) - ((void *) object), object)

so you can just call "LINKSHOW(object)" which then calls linkshow(nextOffset, prevOffset, object)

The magic of macros effectively implement C++-like code templates.

Sean
jjinux said…
> #define LINKSHOW(object) linkshow(((void *) object.next) - ((void *) object), \
((void *) object.prev) - ((void *) object), object)

Very nice ;)
George Reilly said…
Windows, internally, has used the same idea for many years. Look at CONTAINING_RECORD and LIST_ENTRY, which are defined in ntdef.h. LIST_ENTRYs are used heavily in the Windows source code, though it's only exposed to general developers in driver code.

Popular posts from this blog

Drawing Sierpinski's Triangle in Minecraft Using Python

In his keynote at PyCon, Eben Upton, the Executive Director of the Rasberry Pi Foundation, mentioned that not only has Minecraft been ported to the Rasberry Pi, but you can even control it with Python. Since four of my kids are avid Minecraft fans, I figured this might be a good time to teach them to program using Python. So I started yesterday with the goal of programming something cool for Minecraft and then showing it off at the San Francisco Python Meetup in the evening.

The first problem that I faced was that I didn't have a Rasberry Pi. You can't hack Minecraft by just installing the Minecraft client. Speaking of which, I didn't have the Minecraft client installed either ;) My kids always play it on their Nexus 7s. I found an open source Minecraft server called Bukkit that "provides the means to extend the popular Minecraft multiplayer server." Then I found a plugin called RaspberryJuice that implements a subset of the Minecraft Pi modding API for Bukkit s…

Apple: iPad and Emacs

Someone asked my boss's buddy Art Medlar if he was going to buy an iPad. He said, "I figure as soon as it runs Emacs, that will be the sign to buy." I think he was just trying to be funny, but his statement is actually fairly profound.

It's well known that submitting iPhone and iPad applications for sale on Apple's store is a huge pain--even if they're free and open source. Apple is acting as a gatekeeper for what is and isn't allowed on your device. I heard that Apple would never allow a scripting language to be installed on your iPad because it would allow end users to run code that they hadn't verified. (I don't have a reference for this, but if you do, please post it below.) Emacs is mostly written in Emacs Lisp. Per Apple's policy, I don't think it'll ever be possible to run Emacs on the iPad.

Emacs was written by Richard Stallman, and it practically defines the Free Software movement (in a manner of speaking at least). Stal…

JavaScript: Porting from react-css-modules to babel-plugin-react-css-modules (with Less)

I recently found a bug in react-css-modules that prevented me from upgrading react-mobx which prevented us from upgrading to React 16. Then, I found out that react-css-modules is "no longer actively maintained". Hence, whether I wanted to or not, I was kind of forced into moving from react-css-modules to babel-plugin-react-css-modules. Doing the port is mostly straightforward. Once I switched libraries, the rest of the port was basically:
Get ESLint to pass now that react-css-modules is no longer available.Get babel-plugin-react-css-modules working with Less.Get my Karma tests to at least build.Get the Karma tests to pass.Test things thoroughly.Fight off merge conflicts from the rest of engineering every 10 minutes ;) There were a few things that resulted in difficult code changes. That's what the rest of this blog post is about. I don't think you can fix all of these things ahead of time. Just read through them and keep them in mind as you follow the approach above.…