Why you should use talloc for your next project

Memory management is hard. This is one of the first things a programmer learns (usually by trial and much error) when they leave academia and get out into the real world. It is very easy to make mistakes when managing memory, especially when a particular piece of data needs to live beyond the life of the function that created it. It can become difficult to know when the memory is safe to destroy, as well as when it is optimal to destroy it.

In standard C, a programmer would use malloc() and free() to manage their memory. The problem with this is that every section of memory is allocated independently. There are no inherent relationships between bits of data. The programmer is required to maintain any relationships between data in their own code.

Enter talloc, which is a hierarchical memory-management tool wrapped around C’s malloc(). The basics of talloc are easy to pick up. With talloc, you have the option of declaring that the memory you are allocating is a child of another piece of memory. The advantage to this approach is that calling talloc_free() on any piece of talloc-allocated memory will not only delete that memory, but will recursively descend through any children of that memory and free them first.

To provide a trivial example, consider that you wanted to create a new struct containing student data:

struct student {
   char *name;
};

In a traditional C approach, you would allocate memory for a new student in this manner:

struct student *student1 = malloc(sizeof(struct student));
student1->name = strdup("Steve");

and sometime later you would free it with:

free(student1->name);
free(student1);

That works fine in the trivial case, but consider what happens when you have much more complicated data structures. It becomes a challenge to free all of the memory in the proper order without leaking any of it. Traditionally, this would be done by creating a cleanup function for your structure. Internally, this cleanup function would recursively call the cleanup functions for every subordinate structure, until finally it freed the top-level memory.

The problem with this approach is that it requires the creation and maintenance of large numbers of cleanup functions.
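To make that maintenance burden concrete, here is a minimal sketch of the traditional approach for a slightly larger structure. The struct course type and the student_free() and course_free() helpers are hypothetical names invented for this illustration; they are not taken from any real project.

```c
#include <stdlib.h>

struct student {
    char *name;
};

/* Hypothetical outer structure that owns an array of students. */
struct course {
    char *title;
    struct student **students;
    size_t num_students;
};

/* Cleanup function for the innermost structure. */
static void student_free(struct student *s)
{
    if (s == NULL) {
        return;
    }
    free(s->name);
    free(s);
}

/* Cleanup function for the outer structure. It has to know about every
 * member it owns and call the matching cleanup function for each one. */
static void course_free(struct course *c)
{
    if (c == NULL) {
        return;
    }
    for (size_t i = 0; i < c->num_students; i++) {
        student_free(c->students[i]);
    }
    free(c->students);
    free(c->title);
    free(c);
}
```

Every new member added to either struct means another line in one of these functions, and forgetting one is a silent leak.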

Solving the same problem with talloc is markedly simpler:

struct student *student1 = talloc(NULL, struct student);
student1->name = talloc_strdup(student1, "Steve");

Later, the struct can be freed with a single call:

talloc_free(student1);

Now, in the trivial case this doesn’t look terribly impressive, but consider what happens when you have nested structs, structs containing large numbers of strings, and so on. Calling talloc_free() on the top-level pointer will recursively clean up all of the child memory. There is no need to write complicated cleanup functions to ensure that the memory is all gone.

Furthermore, talloc makes it very easy to abort the changes in a function. For example, partway through a complicated function, a fatal error occurs. In a traditional model, one would now need to examine all the memory that has been allocated thus far in the function and free it. A cleanup function may not be of any help here, as it would expect a fully-constructed structure to remove. With talloc, you simply need to delete the parent context and you’ll be certain to know that it will be completely cleaned up, regardless of its partially-constructed state.

So let’s talk about more advanced and useful applications of talloc. Consider the case of asynchronous services. A request comes in (on a pipe, a TCP connection, etc.) requesting some information. Assuming that the service is unable to return a reply without performing additional functions (for example, contacting a remote server for authoritative data), the program would allocate memory to hold the data provided for the request, and then queue it up internally, to be processed when resources allow.

This request might require multiple trips to and from a remote server, it might require memory allocation and deallocation in many places, and it could fail with an error or be cancelled if the requesting process disconnects or otherwise indicates that it no longer cares about the reply.

So now we have a new concept: requests. With talloc, the way one would handle a request would be to create a request context. This request context would be a structure containing all of the data necessary to process the request. As the request is processed by the mainloop, it may have additional subrequests (such as the example remote server query) attached to it as children. If at any time the request needs to be terminated, such as when the original client has disconnected, all that is needed is to call talloc_free() on the original request, and talloc will walk through all of the memory allocated beneath it and free it.

Now, one thing I’ve glossed over is the case where just freeing the memory might not be enough. In the case of a request, before freeing memory it might be necessary to send a disconnect command to a remote server, or close a file descriptor. Talloc makes it easy to add a destructor to any allocated memory, such that when talloc_free() is called, it will first invoke this destructor and allow cleanup to commence. So in the case described above, one might add a destructor to the remote server query sub-request that would terminate the server connection in a non-destructive manner (or cancel a transaction but leave the connection in place, etc.).

By now, I think you begin to see the power inherent in the use of talloc over malloc. It’s five o’clock – do you know where your memory is?

How NOT to run a community

As you probably know, I am generally in favor of community-driven software development. I think being able to work alongside others of similar (or different!) goals can result in excellent progress in many different directions. It’s a great boon to development to not be forced to reinvent the wheel in order to move forward.

However, sometimes the naysayers have it right. There are times when, no matter how much you try to be a good citizen of a community, they just won’t let you.

I’ve been working for some time now on molding the fantastic Review Board software into a deployment for the Fedora Hosted infrastructure. Today, I was doing some testing on the upgrade feature, to make sure we wouldn’t get bitten in the future. Well, I’m glad I did, because it didn’t work.

After a bit of intense Google-searching, I finally happened upon the source of the problem: django_evolution has a long-standing (years) issue when used with PostgreSQL. That bug report, however, has a link to a patch that one intrepid user constructed as a means to work around the problem. I tested it myself and found that it worked. However, this is where we begin our cautionary tale.

Mistake number 1) Offhanded disregard for a community-submitted patch. The response from the upstream maintainers for this godsend of a patch was less than helpful. “Why did you copy the code from here instead of trying to make a common change?” and “Your patch breaks our tests. Go fix it.” (paraphrased). These are not friendly responses to an obviously helpful individual.

Since the discussion thread on that bug pretty much ended there, I decided to try myself to pick up where the original author left off. I downloaded the patch and modified it so that it would apply cleanly on the HEAD of the django_evolution repository. I tried it out on ReviewBoard, and miracle of miracles: the upgrade completed successfully.

So, armed with the knowledge that I now have a working solution to the problem, I decided to see what I could do to massage the patch into a format that would be accepted by upstream (given their unhelpful replies). So I dug into the source code… and discovered that I couldn’t figure out how to run this much-vaunted test suite. So I found my way to the Django IRC channel and started to ask questions about how to set up django_evolution to test my patch.

Mistake number 2) The denizens of that channel were… less than helpful. In the first place, I was berated for attempting to write a patch for “a dead project”. They paid no attention to my assertions that django_evolution worked just fine for ReviewBoard, and that I just needed to solve this one little problem to grease the wheels and start the ball rolling again. They continuously insisted that I switch the project over to use a project called South and give up on django_evolution. Now, while I certainly understand the desire to always be using the Next Big Thing, I’m not actually a developer on the ReviewBoard project. I in fact have very little say about the architectural direction that the project takes. I certainly have no control over the use of django_evolution. These reasoned arguments were ignored; they opted instead to extol the virtues of South and why it will work better and cure cancer in the process. (I exaggerated that last part.)

Now, this is the behavior shown to an interested participant in their community. More than that, it was shown to a person who was trying very hard to improve upon a project, and was seeking only enough aid to simplify any review that might need to be done before accepting the patch. If this is how we treat those who are interested in the work we do, is it any surprise at all when our project fails? Why should we expect anyone who isn’t already intimately familiar with our work to offer even a second glance?

A community needs to be run with an understanding that not every member is going to be a lifelong hacker with three advanced degrees that are all directly applicable to the project. A community needs to be welcoming and understanding. A community needs to be willing to mentor and market itself in a positive light.

A community needs to be communal.