7 Key Insights on Stack vs Heap Allocations in Go

Introduction

Go programmers constantly seek ways to make their applications faster. In recent releases, a major focus has been on reducing heap allocations—a common performance bottleneck. Each heap allocation triggers a significant amount of runtime code and adds pressure on the garbage collector. Even with improvements like the Green Tea collector, overhead remains. The alternative, stack allocation, is often much cheaper and avoids GC entirely. This article explores seven critical points about stack and heap allocations, using a concrete slice-building example to illustrate the hidden costs and practical optimizations.

Source: blog.golang.org

1. Why Heap Allocations Slow Down Your Go Program

Every time your Go code requests memory from the heap, a chain of operations fires. The runtime must find a suitable block, update internal data structures, and sometimes synchronize across goroutines. This process is far from free: it involves function calls, occasional locking, and, when the heap has to grow, a request to the operating system. Even with a fast allocator, the overhead adds up quickly in tight loops. Moreover, heap-allocated objects can outlive the function that created them, so the runtime cannot free them at a known point and has to leave that job to the garbage collector. In contrast, stack allocations are essentially ‘free’ because they just adjust the stack pointer. Understanding this fundamental difference is the first step to writing faster Go programs.

2. The Garbage Collector: A Necessary Overhead

Heap allocations place a direct burden on the garbage collector (GC). The collector has to trace every live heap object, and once an object becomes unreachable its memory must be reclaimed. Go’s collector runs concurrently with your code, but that work still takes CPU cycles away from the application. The Green Tea enhancements lower the cost of this work without eliminating it. Stack allocations, on the other hand, are freed automatically when the function returns, with no GC involvement. The less pressure you put on the GC, the smoother your application runs, especially under high load.
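To see how much work a loop hands to the allocator and collector, you can read the runtime's own counters. The sketch below is only an illustration: the record type, the sink variable, and the heapObjects helper are hypothetical names, and the 64-byte payload is an assumption chosen so that each value becomes its own heap object.

```go
package main

import (
	"fmt"
	"runtime"
)

// record is a hypothetical 64-byte payload, large enough that each value
// is allocated as a separate heap object.
type record struct {
	data [64]byte
}

// sink is package-level, so every pointer stored in it escapes to the heap.
var sink *record

// heapObjects allocates n escaping records and returns how many heap
// objects the runtime reports were allocated while doing so.
func heapObjects(n int) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		sink = &record{}
	}
	runtime.ReadMemStats(&after)
	return after.Mallocs - before.Mallocs
}

func main() {
	fmt.Println("heap objects allocated:", heapObjects(10000))
}
```

Every one of those objects is something the collector eventually has to deal with. The same loop working on a local record value would not add to the count.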

3. Stack Allocations: The Fast and Frugal Path

Stack allocations are considerably cheaper to perform, and sometimes completely free: space for a local variable is usually reserved as part of the function’s stack frame, so no separate allocation step runs. They present zero load to the garbage collector because the whole frame is reclaimed at once when the function returns. Stack data also enjoys good cache locality, since it is contiguous and typically accessed soon after it is created. For hot paths, moving allocations from the heap to the stack can yield substantial speedups and reduce heap fragmentation.

4. The Slice Append Dilemma

Consider the common pattern of building a slice by appending from a channel:

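// process collects every task received on c, then handles them as a batch.
func process(c chan task) {
    var tasks []task // nil slice: no backing array yet
    for t := range c {
        tasks = append(tasks, t) // may allocate a larger backing array
    }
    processAll(tasks)
}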

At first glance, it seems simple. But behind the scenes, append triggers multiple heap allocations as the slice grows. On the first iteration, a backing array of size 1 is allocated. On the second, that array is full, so append allocates a new array of size 2 and copies the old element over. Then come sizes 4, 8, and so on, doubling each time while the slice is small. This exponential growth strategy keeps the amortized cost of append low for large slices, but the startup phase produces many tiny allocations that become garbage almost immediately.
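You can watch this growth directly by recording the capacity whenever it changes. This is a minimal sketch with a hypothetical helper, capacitySteps; the exact capacities it reports depend on the Go version, the element type, and the allocator's size classes, so treat the printed numbers as an observation rather than a guarantee.

```go
package main

import "fmt"

// capacitySteps appends n ints to an initially nil slice and records the
// capacity each time append moves to a larger backing array.
func capacitySteps(n int) []int {
	var s []int
	var steps []int
	last := cap(s)
	for i := 0; i < n; i++ {
		s = append(s, i)
		if cap(s) != last {
			last = cap(s)
			steps = append(steps, last)
		}
	}
	return steps
}

func main() {
	fmt.Println(capacitySteps(1000))
}
```

The list is short compared with the 1000 appends, which is the payoff of exponential growth. The cluster of small capacities at the front is the startup phase.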

5. Startup Phase Waste: Many Small Allocations

The startup phase, when the slice is small, is where most of the overhead lurks. Each reallocation turns the previous backing array into garbage. For example, after the third iteration, the array of size 2 becomes garbage. This not only spends cycles on allocation and copying but also leaves short-lived objects on the heap for the GC to reclaim later. If your loop processes only a few items, the proportional cost is huge. That seemingly innocuous slice building can dominate the profile of a hot code path. Recognizing this waste is crucial to optimizing real-world Go programs.
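A little arithmetic makes the waste concrete. The function below does not call append at all. It models the idealized doubling sequence described in section 4 (capacities 1, 2, 4, 8, ...), and doublingCost is a hypothetical helper name. Real runtimes may round capacities differently, so read it as a model only.

```go
package main

import "fmt"

// doublingCost models idealized growth with capacities 1, 2, 4, 8, ...
// It returns how many backing arrays are allocated to hold n elements and
// how many element slots sit in arrays that were discarded along the way.
func doublingCost(n int) (allocs, discarded int) {
	capacity := 0
	for capacity < n {
		discarded += capacity // the old array becomes garbage
		if capacity == 0 {
			capacity = 1
		} else {
			capacity *= 2
		}
		allocs++
	}
	return allocs, discarded
}

func main() {
	for _, n := range []int{1, 3, 5, 100} {
		allocs, discarded := doublingCost(n)
		fmt.Printf("n=%d allocs=%d discarded=%d\n", n, allocs, discarded)
	}
}
```

Under this model, collecting just 5 items costs 4 allocations and throws away 7 slots, more than the 5 that are actually kept.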

6. Pre-Allocating Slices for Better Performance

A straightforward fix is to pre-allocate the slice with a reasonable capacity using make. If you know the approximate number of tasks, you can avoid the repeated reallocation overhead entirely:

tasks := make([]task, 0, expectedLen)

Even if you don’t know the exact count, choosing a generous initial capacity, like 100 or 1000, can sharply reduce the number of allocations during the startup phase, at the cost of some unused memory when the estimate is too high. The slice will still grow if needed, but reallocation becomes less frequent. In tight loops that build small slices, this simple change can produce a noticeable speedup, though the size of the gain depends on the workload and is worth confirming with a benchmark. When the capacity is a small constant and the slice does not escape the function, the compiler can go one step further and place the backing array on the stack.
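One way to check the effect is to count allocations with testing.AllocsPerRun from the standard library. In this sketch the function names and the sink variable are hypothetical, and both slices are deliberately stored in a package-level variable so that they must live on the heap and the comparison is fair.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level, so the slices below escape and are heap-allocated.
var sink []int

// buildGrowing appends n values to a nil slice, letting append reallocate.
func buildGrowing(n int) {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	sink = s
}

// buildPrealloc reserves capacity for n values up front.
func buildPrealloc(n int) {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	sink = s
}

// allocsFor reports the average number of heap allocations per call of build(n).
func allocsFor(build func(int), n int) float64 {
	return testing.AllocsPerRun(100, func() { build(n) })
}

func main() {
	fmt.Println("growing:     ", allocsFor(buildGrowing, 100))
	fmt.Println("preallocated:", allocsFor(buildPrealloc, 100))
}
```

The exact counts vary by Go version, but the preallocated version should never need more allocations than the growing one.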

7. Leveraging Stack Allocation for Hot Paths

Go’s compiler uses escape analysis to decide whether an allocation can be placed on the stack. When a variable’s address does not escape the function, the compiler can usually allocate it on the stack. You can help the compiler by not storing pointers to local data in globals or in heap objects that outlive the call, and by using value receivers instead of pointer receivers where appropriate. For constant-sized slices (e.g., make([]T, 5)) that do not escape, the entire backing array can be stack-allocated. This is the ultimate win: zero heap allocation, zero GC pressure. Profile your hot paths, look for slice growth in tight loops, and consider restructuring code to enable stack allocation.
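The difference between an escaping and a non-escaping slice can be measured rather than guessed. The following sketch compares two hypothetical functions that do identical work; the only difference is that one stores its slice in a package-level variable. The function names and the escaped variable are assumptions made for the illustration.

```go
package main

import (
	"fmt"
	"testing"
)

// escaped keeps a slice alive after the function returns.
var escaped []int

// sumLocal uses a constant-sized slice that never leaves the function,
// so the compiler is free to place its backing array on the stack.
func sumLocal() int {
	buf := make([]int, 5)
	total := 0
	for i := range buf {
		buf[i] = i
		total += buf[i]
	}
	return total
}

// sumEscaping does the same work but stores the slice in a package-level
// variable, so its backing array has to be heap-allocated.
func sumEscaping() int {
	buf := make([]int, 5)
	total := 0
	for i := range buf {
		buf[i] = i
		total += buf[i]
	}
	escaped = buf
	return total
}

// allocsPerCall reports the average number of heap allocations per call of f.
func allocsPerCall(f func() int) float64 {
	return testing.AllocsPerRun(100, func() { _ = f() })
}

func main() {
	fmt.Println("local:   ", allocsPerCall(sumLocal))
	fmt.Println("escaping:", allocsPerCall(sumEscaping))
}
```

To see the compiler's reasoning for your own code, build with go build -gcflags=-m, which prints its escape analysis decisions.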

Conclusion

Heap allocations are a major source of performance degradation in Go programs, but with targeted effort you can shift many allocations to the stack. By understanding the mechanics of slice growth, pre-allocating buffers, and guiding the compiler’s escape analysis, you can reduce GC overhead and improve cache usage. The result is faster, more predictable code. Start by profiling your real-world workloads and applying these seven insights—your application will thank you.
