
Computer Science: Offset-based Linked Lists

Note: I'm speaking a little loosely, and I'm not a C expert, but I think this post is still interesting.

There are many ways to "architect" and implement lists.

These days, it's the norm in languages like Java, Python, etc. for the language or its standard library to provide a list implementation. You simply create a list and then start shoving objects into it. Python, Ruby, Perl and PHP all provide native syntax for lists. The underlying implementation is more sophisticated than a singly or doubly linked list because lists must double as arrays in those languages. Java and C++ have native arrays and array syntax, but they also provide a variety of list implementations in their libraries. Both can use generics to constrain the type of objects you put in those lists. One thing to note, however, is that there's a distinction between the list itself and the items in the list.

Since C's standard library doesn't provide a list implementation, a variety of approaches are used. It's not uncommon to create structs that have prev and next pointers in them and to manage the list manually. In general, it seems common in C to create data structures manually and to simply manage the data structure as part of the overall programming task. This is in stark contrast to, say, Python where the list implementation is in C and the code using the list is in Python.
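To make the manual approach concrete, here is a minimal sketch of a struct that carries its own prev and next pointers, plus hand-written insert and remove routines. The `task` struct and the function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical application struct: the list links live inside the data. */
struct task {
    int id;
    struct task *prev;
    struct task *next;
};

/* Insert `node` right after `pos`, patching the affected pointers by hand. */
static void task_insert_after(struct task *pos, struct task *node)
{
    assert(pos != NULL && node != NULL);
    node->prev = pos;
    node->next = pos->next;
    if (pos->next != NULL)
        pos->next->prev = node;
    pos->next = node;
}

/* Unlink `node` from its neighbors and clear its own links. */
static void task_remove(struct task *node)
{
    assert(node != NULL);
    if (node->prev != NULL)
        node->prev->next = node->next;
    if (node->next != NULL)
        node->next->prev = node->prev;
    node->prev = node->next = NULL;
}
```

Every new struct type that needs to live in a list ends up with its own copy of these two routines, which is the repetition the rest of this post is about.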

My buddy Kelly Yancey once showed me that FreeBSD had macros so that programmers wouldn't have to keep reimplementing linked lists all the time in the kernel. I think macros were used instead of functions so that the code behaved as if it were actually written inline, thus avoiding the function call overhead--but I could be wrong.
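For a rough idea of what those macros look like in use, here is a small sketch built on the `LIST_*` family from `<sys/queue.h>`, the header that comes from BSD and is also shipped with glibc. The `task` struct, the `links` field name, and `task_id_sum` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>  /* BSD list macros; also available with glibc */

/* Hypothetical application struct with the link fields embedded in it. */
struct task {
    int id;
    LIST_ENTRY(task) links;   /* expands to the embedded link fields */
};

LIST_HEAD(task_list, task);   /* declares struct task_list, the list head */

/* Sum the ids of every task on the list. */
static int task_id_sum(struct task_list *head)
{
    struct task *t;
    int sum = 0;

    assert(head != NULL);
    LIST_FOREACH(t, head, links)   /* expands inline to a plain for loop */
        sum += t->id;
    return sum;
}
```

Callers would use `LIST_INIT`, `LIST_INSERT_HEAD`, and `LIST_REMOVE` from the same header to build and modify the list. Because each macro is handed the struct and field names, the expanded code is fully typed, with no `void *` involved.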

At Metaweb, I had a buddy named Scott Meyer who used to work at Oracle. He showed me a pretty interesting trick for creating linked lists. As before, he had structs which contained application data as well as next and prev pointers for managing the structs in a linked list. However, rather than manipulate those linked lists manually, he wrote separate, generic linked list functions. The question is: how can a generic linked list function operate on random structs (i.e. void *'s) that just happen to have prev and next fields somewhere within them?

For each type of struct, he would create a "descriptor" struct that contained the offset of the prev and next fields. He would pass the descriptor to the function anytime he needed to manipulate the linked list. Loosely speaking, it's like saying, "Hey, I got this linked list, and I want you to insert a new member into it. You might not know anything about the type of structs in the linked list, but I can tell you that the prev pointer is 16 bytes from the beginning of the struct, and the next pointer is 20 bytes from the beginning."
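Here is a minimal sketch of how such a descriptor might look. This is my own reconstruction, not Scott's actual code: `list_desc`, `link_at`, `list_insert_after`, and the `employee` struct are all hypothetical names. The link fields are declared as `void *` so the generic code can read and write them through a `void **` without tripping over C's aliasing rules.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical descriptor: where the link fields live inside a struct. */
struct list_desc {
    size_t prev_off;
    size_t next_off;
};

/* Return a pointer to the link field stored `off` bytes into `obj`. */
static void **link_at(void *obj, size_t off)
{
    return (void **)((char *)obj + off);
}

/* Generic insert: put `node` right after `pos`, knowing only the offsets. */
static void list_insert_after(const struct list_desc *d, void *pos, void *node)
{
    assert(d != NULL && pos != NULL && node != NULL);
    void *old_next = *link_at(pos, d->next_off);

    *link_at(node, d->prev_off) = pos;
    *link_at(node, d->next_off) = old_next;
    if (old_next != NULL)
        *link_at(old_next, d->prev_off) = node;
    *link_at(pos, d->next_off) = node;
}

/* An application struct and the descriptor that describes it. */
struct employee {
    char name[16];
    void *prev;
    void *next;
};

static const struct list_desc employee_desc = {
    offsetof(struct employee, prev),
    offsetof(struct employee, next),
};
```

A call then looks like `list_insert_after(&employee_desc, &a, &b)`. On an ABI with 4-byte pointers, the two offsets for this particular struct come out to 16 and 20, the same numbers as in the example above. Letting `offsetof` compute them means nobody has to count bytes by hand.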

I think the lesson is deep. If you want to write a function that operates generically on structs that you know have certain fields, you just have to tell that function the offset of those fields. Naturally, you don't want to just pass the offsets. Rather, you pack them up into a struct. Creating such a struct is like declaring that you implement some interface. I had seen how structs full of function pointers doubled as interfaces in Apache, but structs full of offsets were something entirely new for me.
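To illustrate the "interface" idea, the sketch below runs one generic function over two unrelated struct types, each of which opts in by defining its own descriptor. All of the names here (`list_desc`, `list_length`, `point`, `packet`) are hypothetical, and the descriptor is trimmed to a single offset to keep the example short.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical "interface": a struct of offsets instead of function pointers. */
struct list_desc {
    size_t next_off;
};

/* Count the nodes reachable from `head` by following the described next field. */
static size_t list_length(const struct list_desc *d, const void *head)
{
    size_t n = 0;

    assert(d != NULL);
    for (const void *p = head; p != NULL;
         p = *(void *const *)((const char *)p + d->next_off))
        n++;
    return n;
}

/* Two unrelated structs, each keeping its next link in a different place. */
struct point  { void *next; int x, y; };
struct packet { unsigned char payload[32]; size_t len; void *next; };

/* Defining a descriptor is how each type "implements" the list interface. */
static const struct list_desc point_list  = { offsetof(struct point, next) };
static const struct list_desc packet_list = { offsetof(struct packet, next) };
```

`list_length(&point_list, p)` and `list_length(&packet_list, k)` share the same compiled code. The only thing that differs between the two calls is the offset stored in the descriptor.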



Ian Bicking said…
Alan Kay described what he considers the first object programming similarly. I can't remember enough keywords to look it up. But basically it was in filesystems -- people wrote different filesystems for tapes, but this caused problems as new filesystems appeared or old filesystems were forgotten about. So some unnamed programmer made a system where the tape contained the code to access the records on the tape at known offsets. So, say, the get-record routine was at offset 0 (or maybe a pointer? Or maybe enough room to just jump to where the routine was held), set-record at offset 10, etc.

Similarly the linked list routines are just one step away from being a dynamically typed system -- if you put the struct describing offsets in the original struct itself then you'd have a type.
Joseph said…
This is off topic, but that wouldn't happen to be the Scott Meyers of Effective C++ fame would it?
mike barton said…
Huh... I've done similar things in the past, but my generic list functions mostly take a next pointer offset, which is computed with a macro. Something like:

#define NEXTOFFSET(list) ((unsigned long)&((list)->next) - (unsigned long)(list))

list = list_sort(list, NEXTOFFSET(list), comparison_function);

I guess building a framework to organize that isn't too bad an idea.
Dave Kirby said…
The entire Amiga operating system was built around a linked list structure - every object started with the linked list struct, and the OS provided functions for navigating round them, even from assembler. Everything else was referenced by offsets to one of these structs - by convention data was at positive offsets and pointers to functions were at negative offsets. This enabled consistent object oriented programming, even from assembler.

In many ways the Amiga was way ahead of its time.
Kelly Yancey said…
Hey JJ, just wanted to add that I've seen versions of the BSD queue.h macros that used offsets from a known base rather than pointers for the next/previous links. The advantage is that you can then put the linked objects in a shared memory segment (the segment may have a different base address in each process).

There was a proposal to add them to FreeBSD back in 2002 which includes the source code.
jjinux said…
Ian, thanks for your comment.
jjinux said…
> This is off topic, but that wouldn't happen to be the Scott Meyers of Effective C++ fame would it?

Embarrassingly, I don't know. I don't know him all that well. I do know he's a brilliant hacker, and he's the manager of the graph database team at Metaweb.
jjinux said…
Wow, thanks for the great comments, everyone! I was taking a risk posting something outside my realm of experience, but, man, what interesting responses!
Anonymous said…
Yet another case of "why didn't I think of it before?". A very simple approach to take advantage of obvious commonalities in existing code.

I think that the Scott Meyers of Effective C++ is into training and consulting
jjinux said…
> I think that the Scott Meyers of Effective C++ is into training and consulting

Ah, not the same guy.
Anonymous said…
Interesting that you mention macros, and passing offsets... However, if you consistently name the next/prev pointers, you can combine the two into a macro that calls the underlying function with the offsets for the specific struct. For example:

#define LINKSHOW(object) linkshow(((char *) &(object).next) - ((char *) &(object)), \
((char *) &(object).prev) - ((char *) &(object)), &(object))

so you can just call "LINKSHOW(object)" which then calls linkshow(nextOffset, prevOffset, object)

The magic of macros effectively implements C++-like code templates.

jjinux said…
> #define LINKSHOW(object) linkshow(((char *) &(object).next) - ((char *) &(object)), \
((char *) &(object).prev) - ((char *) &(object)), &(object))

Very nice ;)
George Reilly said…
Windows, internally, has used the same idea for many years. Look at CONTAINING_RECORD and LIST_ENTRY, which are defined in ntdef.h. LIST_ENTRYs are used heavily in the Windows source code, though they're only exposed to general developers in driver code.
