24 Oct 2024

The stack pattern

Knowing how to skip the heap is a valuable tool in a C-programmers toolbox

Let me describe to you a problem. How can you program entirely without using the heap? The heap is one of the dynamic parts of a programs memory. It is where we can store data that we did not know we needed beforehand across different functions.

We can allocate to the heap (in C) like so:

int *intPtr = malloc(sizeof(int));

When we do not need that memory anymore we might free it:

free(intPtr);

Even though using the heap seems straightforward there are multiple potential pitfalls when using the heap.

Allocating and freeing memory is slow from a computing perspective.
We have to be our own garbage collector and know specifically when to free what we're not using anymore. If we fail at this we can at best have a memory leak, and at worst free memory that is already freed. This would leave us open to the using after free vulnerability.
We might not actually have access to a heap, like in a constrained embedded setting.

In fact, the way C uses a heap is seen as so dangerous, that many languages have been built specifically to improve this non-garbage-collected headache, like Rust and Zig. Because of the difficulty in using heap memory NASA actually avoids using a heap. But alas C is so dominant that we most likely will need to know how to handle the heap properly.

So, how can we be careful with our heap usage?

We first need to look at how we allocate on the stack, another dynamic part of memory.

Stack

Within a function we can allocate space for any variable on the stack by declaring it:

int var;

Later on we can place a value within that variable. The integer in this case will only live as long as the function it was created in is running, unless we return its actual value.

If we have a function and we want to return something from that function that was allocated on the stack we may do so freely.

typedef struct S2 {
    int c;
    int d;
} S2;

typedef struct S {
    int a;
    int b;
    S2 innerStruct;
} S;

S func(void) {
    S s; // Allocating on the stack
    S2 inner;
    s.a = 1;
    s.b = 2;
    s.innerStruct.c = 3;
    s.innerStruct.d = 4;
    return s;
}

int main(void) {
    S myStruct = func();
    return 0;
}

This is a valid C-program, and there will be no segmentation faults, nor will any fields in the inner or the outer struct be overwritten. That data is copied from the stack when we return the struct.

This is fine for small datastructures, but we may suffer later on, returning and passing whole structs, and all allocated nested structs. Copying and moving data like this becomes quite the bottleneck when it happens often. To circumvent this we might want to return a pointer to a struct instead, and we might want that struct to have a pointer to its inner structs.

This is where the potential for segmentation faults lies, another dangerous memory situation we want to avoid.

Because we are pointing to something on the stack, we know that returning any such pointers (memory address) will return pointers to memory that will be invalid after the function returns.

This is usually where we would reach for a heap allocation or two.

Heap

A heap allocation version using pointers would look like this:

typedef struct S2 {
    int c;
    int d;
} S2;

typedef struct S {
    int a;
    int b;
    S2 *innerStruct;
} S;

S* func(void) {
    S* s = malloc(sizeof(S)); // Allocating on the heap
    S2* inner = malloc(sizeof(S2));
    s->innerStruct = inner;
    s->a = 1;
    s->b = 2;
    s->innerStruct->c = 3;
    s->innerStruct->d = 4;
    return s;
}

int main(void) {
    S* myStruct = func();

    free(myStruct->inner);
    free(myStruct);
    return 0;
}

There is more code, and we, ourselves, have to keep track of the memory we're allocating and freeing, instead of letting the function context do it for us. This leaves us with all the pitfalls described earlier.

But! There is a commonly known trick to bypass the heap.

The stack pointer pattern

Instead of letting the function handle the allocation, we move the allocation up the calling stack to the same function we would free the memory in. In this case it would be the "main" function. We now need to, instead of returning the pointer, pass the pointer as an argument to the function.

The stack version using pointers will then look like this:

// Skipping the struct declarations that remain the same from the heap version

void func(S* s, S2* inner) {
    s->innerStruct = inner;
    s->a = 1;
    s->b = 2;
    s->innerStruct->c = 3;
    s->innerStruct->d = 4;
}

int main(void) {
    S myStruct;
    S2 inner;
    func(&myStruct, &inner);

    return 0;
}

This design lets the function context worry about freeing memory, and it makes our code more performant then the heap equivalent. We can also be sure that the memory is not freed when we're using it in the top most function where it is allocated.

There are some downsides to this approach.

We have to allocate both the outer and inner struct in our main function. We leak our function abstraction, because the outer function needs to have information about a lot of the internals of the function.
We have to pass both addresses to our function. The more we have to do this, the more arguments we get in a function. This could be circumvented by creating a context type struct and initializing all internal struct pointers before we pass it into a function.

An observation we could make with this particular program is that we know that we need the memory for the struct for the lifetime of the program. We could use static memory.

Static memory


S myStruct;
S2 inner;

void func(void) {
    s.innerStruct = inner;
    s.a = 1;
    s.b = 2;
    s.innerStruct.c = 3;
    s.innerStruct.d = 4;
}

int main(void) {
    func();
    return 0;
}

This is also commonly called the global memory segment. Global variables should of course be handled with care. Therefore this should be limited to data that will persist througout a programs lifetime, like it did in this case. If we allocated memory on the heap and never freed it, it would, like global memory, most likely be freed at the end of our program.

Concluding from these approaches, its important to note that each type of memory has its specific use.

Heap memory is more flexible, and is used extensively in garbage collected languages precisely because it is so flexible and allows for better abstractions.
Stack memory lets us make our code more efficient and safe, at the cost of leaky abstractions. Its important to note that handled incorrectly, we could get segmentation faults if we return addresses of stack allocated memory.

Conclusion

I have in this article explored the possible alternatives to heap allocation. Knowing the consequences and special consideration one needs to make when allocating to the heap and that NASA strongly discourages the use of the heap, knowing how to program in a way that does not need to use the heap is an important tool in a C-programmers toolbox.

Using the heap sparingly is probably the safest approach to C. Handling the heap allocations with care and consideration.