When it comes to software development we always want to ensure our code never uses excessive memory when a function finishes its work. That’s why some of the language features like garbage collection (GC) were implemented so that memory leaks do not occur. Also as a beginner software engineer, who just learning to code in Golang uses built-in maps data structure for faster access to data. Sometimes even seniors may miss the drawbacks of excessively using those data structures without thinking. Sometimes those data structures may lead to memory leaks in the code base. Today we’re going to talk about the cons of the map’s data structure if anyone without thinking uses it excessively it may lead to a memory leak.
The post is taken from a book called
100 Go Mistakes and How to Avoid Them
If anyone wants to learn more about this mistake and others please read the book. I learned a lot from the book.
A map is a powerful, ingenious, and versatile data structure. Golang Maps is a collection of unordered pairs of key-value. It is widely used because it provides fast lookups and values that can retrieve, updated, or delete with the help of keys.
Let’s create a simple map in Golang
package main import "fmt" func main() { // Creating and initializing empty ma using `var` keyword var nilMapContainer map[int]int // Checking if the map is nil or not if nilMapContainer == nil { fmt.Println("True") } else { fmt.Println("False") } // Note: `nilMapContainer` will throw panic if we try to assign value to an uninitialized key // Using shorthand declaration animes := map[int]string{ 1: "One Piece", 2: "Naruto", 3: "Link Click", 4: "Death Parade", 5: "Tomodachi Game", } fmt.Println("animes ", animes) }
Also, this data structure references a hash table. So data lookups from this table are fast using the key.
But
There’s a huge drawback if we use a map without thinking. So let’s talk about that…
In memory, If we put a new key value in a Map it takes some memory space. Interestingly this memory space never shrinks in size, it always grows in the memory as a result. Because of this, it may lead to memory problems.
When working with the map data structure in Golang, we need to know some of the important characteristics of this data structure, how it grows and how much it shrinks.
Although, It’s proposed that in the future version of Go, the map implementation will be changed so that it can shrink too. Anyone can check the issue on Github.
Let’s deep dive into this to prevent issues that can cause memory leaks.
First, to view a concrete example of this problem, let’s design a scenario where we will work with the following map:
m := make(map[int][128]byte)
Each value of m
is an array of 128 bytes. We will do the following:
- Allocate an empty map.
- Add 1 million elements.
- Remove all the elements, and run a Garbage Collection (GC).
After each step, we want to print the size of the heap (using a printAllocatedMemory
utility function). This shows us how this example behaves in memory:
package main import ( "fmt" "runtime" ) func main() { n := 1_000_000 m := make(map[int][128]byte) printAllocatedMemory() for i := 0; i < n; i++ { // Adds 1 million elements m[i] = [128]byte{} } printAllocatedMemory() for i := 0; i < n; i++ { // Deletes 1 million elements delete(m, i) } runtime.GC() // Triggers a manual GC printAllocatedMemory() runtime.KeepAlive(m) // Keeps a reference to m so that the map isn’t collected } func printAllocatedMemory() { var m runtime.MemStats runtime.ReadMemStats(&m) fmt.Printf("%d MB\n", (m.Alloc/1024)/1024) }
We allocate an empty map, add 1 million elements, remove 1 million elements, and then run a GC. We also make sure to keep a reference to the map using runtime.KeepAlive
so that the map isn’t collected by the GC and removed it. In simple terms, KeepAlive
will not be collected by the Garbage Collector when all the key-value is removed from the variable m
. Let’s run this example:
Steps | Memory |
After m is allocated | 0 MB |
After we add 1 million elements | 461 MB |
After we remove 1 million elements | 293 MB |
After observing the result above, we can say that the heap size is initially very small. Once 1 million additional elements have been added to the map, it then expands dramatically. But if we had anticipated that the heap size would shrink after eliminating all the values, Which is not the case here, because Go maps don’t operate in this way. In the end, the heap size is still 293 MB even if the GC has collected all of the components. Although we say it doesn’t shrink at the beginning of the post which is not true, right? but in the end, even though the GC has collected all the elements, the heap size is still 293 MB. The memory shrank as a result, but not in the way we might have anticipated. Why is this happening, exactly? We must thoroughly examine how Go’s map data structure operates.
A map offers an unordered collection of key-value pairs where each key is unique. A map in Go is based on the hash table data structure, which is an array whose elements each point to a collection of key-value pairs. As shown in figure 1.
Figure 1 – Hash Table In Golang
In Golang, The bucket size is fixed and the max limit is eight elements per bucket. So in case of inserting a new element into an overflowed bucket, Go assign a new bucket for the new element and links the new bucket to the previous one. Figure 2 shows an example:
Figure 2 – Bucket overflow, link new bucket to previous one.
Behind the scene, a Go map is a pointer to a runtime.hmap
struct. This struct contains multiple fields which including a B
field, B
represents the number of buckets in a map:
After adding 1 million elements to the map, the value of B
which corresponds to the bucket is 18, which means 2¹⁸ = 262,144 buckets assign to the map. But when we remove the 1 million elements from the map, what happen to the number of buckets? As the number of the bucket in a map cannot shrink therefore removing elements from the map doesn’t reduce the number of buckets. it just zeros the slots in the buckets.
In our example, we went from 461 MB to 293 MB because the elements were collected by the garbage collector, but running the GC didn’t impact the map itself. Even the number of extra buckets (the buckets created because of overflows) remains the same.
Let’s take a step back and discuss when the fact that a map cannot shrink can be a problem. Imagine building a cache using a map[int][128]byte
. This map holds per customer ID (the int), a sequence of 128 bytes. Now, suppose we want to save the last 1,000 customers. The map size will remain constant, so we shouldn’t worry about the fact that a map cannot shrink.
However, let’s say we want to store one hour of data. Meanwhile, our company has decided to have a big promotion for the Eid festival: in one hour, we may have millions of customers connected to our system. But a few days after the Eid festival, our map will contain the same number of buckets as during the peak time. This explains why we can experience high memory consumption that doesn’t significantly decrease in such a scenario. This is a big problem, right? So we want to clean this amount of memory without manually restarting our service, right?
Solution
#1
We can make copies of the current map at regular intervals. We could create a new map, clone all the components, and release it every hour, as an example. The biggest disadvantage of this method is that for a little period after the copy and up to the subsequent garbage collection, we might need twice as much RAM as we do right now.
#2
We may also alter the map type to map[int]*[128]byte
to hold an array pointer. Although it doesn’t address the issue of how many buckets we will have, each bucket entry will reserve a pointer size for the item rather than 128 bytes (8 bytes on 64-bit systems and 4 bytes on 32-bit systems).
Let’s compare the memory usage for each map type after each step, going back to the initial scenario. The comparison is presented in the following table.
Steps | map[int][128]byte |
map[int]*[128]byte |
Allocate an empty map | 0 MB | 0 MB |
Add 1 million elements | 461 MB | 182 MB |
Remove all the elements and run a GC | 293 MB | 38 MB |
As we can see, a map[int]*[128]byte
type requires substantially less memory after all the elements have been removed. Additionally, because of several optimizations to lower memory consumption, the amount of RAM required in this scenario is less significant during peak times.
NOTE: If a key or a value is over 128 bytes, Go won’t store it directly in the map bucket. Instead, Go stores a pointer to reference the key or the value.
If you like, you can read the same article on my [ Personal blog]
You can read my other blog-posts [Here]
Conclusion
We can say, Go never removes the buckets from memory even after removing all the elements from the map. So we need to be careful when using a map as it only grows in memory as a result. To reduce the high memory consumption, we need to use different techniques like re-create the map forcefully or using pointers so that memory consumption can be optimized.