Saturday, May 3, 2014

Glibc malloc in linux

How does glibc implement malloc? I guess as many of you know, in traditional nix systems, it is implemented using brk() and sbrk() system calls. Basically these system calls adjust the heap pointer of process (or rather break pointer) to new location as per desired size. However in linux, glibc employs a different methodology in implementing malloc functions. For "x" number of bytes, the glibc allocates via brk()/sbrk() system calls while anything equal or above "x", mmap system call is used. Currently this threshold is 128k. Now what is mmap(). In the case of mmap(), the process address space is not adjusted rather the kernel allocates memory from its page cache and attaches the Virtual Memory Address to the process address space. Hence the heap segment is untouched and stack can grow freely upwards teasing heap :D.

Alright, can we have plausible example? Here it is :-). The first one is with 127k and latter is 128k. Follow the explanations after example code. Both of the code have sleep() function to be in wait state for some time. This enables me to see process maps. To see how malloc() behaves, one needs to run system call trace tool (strace) to understand the flow for different sizes.

1) 127k

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE (127 * 1024)

int main()
{
    char *temp = NULL;

    temp = (char*)(malloc(BUF_SIZE));
    strcpy(temp, "LINUX-TUX :-)");
    sleep(100);
}


compile with gcc and run the resulting executable with strace
>>strace ./a.out

I have stripped most of the portions especially the memory mappings of shared libraries. The actual snippet looks like

<<snip>>
brk(0)                                  = 0xb1d000
brk(0xb5d000)                           = 0xb5d000
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
<<snip>>


As you can see malloc() implementation uses brk() system calls to increase the data segment width. Initially it passes 0 to find current data segment location and later adds 127k to inform the OS to adjust data segment to new location.

How to verify? Most unix systems list memory mapings in /proc/<pid>/maps. Lets do the same for our case. As I mentioned earlier, the sleep serves the purpose of grabbing memory map or else the proc entry of the process gets flushed out.

In one more terminal, obtain process-id of a.out

>>pgrep a.out
14565

>>cat /proc/14565/maps
00400000-00401000 r-xp 00000000 08:0a 1708078                            /home/nandakumar/a.out
00600000-00601000 r--p 00000000 08:0a 1708078                            /home/nandakumar/a.out
00601000-00602000 rw-p 00001000 08:0a 1708078                            /home/nandakumar/a.out
00b1d000-00b5d000 rw-p 00000000 00:00 0                                  [heap]
7f5fe1ca0000-7f5fe1e5d000 r-xp 00000000 08:0a 658188                     /lib/x86_64-linux-gnu/libc-2.17.so
7f5fe1e5d000-7f5fe205d000 ---p 001bd000 08:0a 658188                     /lib/x86_64-linux-gnu/libc-2.17.so
7f5fe205d000-7f5fe2061000 r--p 001bd000 08:0a 658188                     /lib/x86_64-linux-gnu/libc-2.17.so
7f5fe2061000-7f5fe2063000 rw-p 001c1000 08:0a 658188                     /lib/x86_64-linux-gnu/libc-2.17.so
7f5fe2063000-7f5fe2068000 rw-p 00000000 00:00 0
7f5fe2068000-7f5fe208b000 r-xp 00000000 08:0a 658164                     /lib/x86_64-linux-gnu/ld-2.17.so
7f5fe2268000-7f5fe226b000 rw-p 00000000 00:00 0
7f5fe2288000-7f5fe228a000 rw-p 00000000 00:00 0
7f5fe228a000-7f5fe228b000 r--p 00022000 08:0a 658164                     /lib/x86_64-linux-gnu/ld-2.17.so
7f5fe228b000-7f5fe228d000 rw-p 00023000 08:0a 658164                     /lib/x86_64-linux-gnu/ld-2.17.so
7ffffa977000-7ffffa998000 rw-p 00000000 00:00 0                          [stack]
7ffffa99d000-7ffffa99f000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]


Here we go :D. The new break pointer obtained from system call matches with proc entry [and obviously should match ;-)]. The length if you have calculate is [00b1d000-00b5d000]= 256K

Wait a minute! How come so much? That's how malloc does. It allocates more than what is required and manages later requests internally. This avoids frequent system call overheads and also fragmentation to certain extent

If you are curious to know the logic behind the calculation, please dissect sysmalloc() function in glibc/malloc/malloc.c {Home work :-)}

2) 128k (mmap). Here is next example!

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE (128 * 1024)

int main()
{
    char *temp = NULL;

    temp = (char*)(malloc(BUF_SIZE));
    strcpy(temp, "LINUX-TUX :-)");
    sleep(100);
}

Snipping unnecessary parts:

<<snip>>
mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd51e67000
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
<<snip>>


Now lets look into maps

>>pgrep a.out
15060

>>cat /proc/15060/maps
00400000-00401000 r-xp 00000000 08:0a 1708078                            /home/nandakumar/a.out
00600000-00601000 r--p 00000000 08:0a 1708078                            /home/nandakumar/a.out
00601000-00602000 rw-p 00001000 08:0a 1708078                            /home/nandakumar/a.out
7fbd518c0000-7fbd51a7d000 r-xp 00000000 08:0a 658188                     /lib/x86_64-linux-gnu/libc-2.17.so
7fbd51a7d000-7fbd51c7d000 ---p 001bd000 08:0a 658188                     /lib/x86_64-linux-gnu/libc-2.17.so
7fbd51c7d000-7fbd51c81000 r--p 001bd000 08:0a 658188                     /lib/x86_64-linux-gnu/libc-2.17.so
7fbd51c81000-7fbd51c83000 rw-p 001c1000 08:0a 658188                     /lib/x86_64-linux-gnu/libc-2.17.so
7fbd51c83000-7fbd51c88000 rw-p 00000000 00:00 0
7fbd51c88000-7fbd51cab000 r-xp 00000000 08:0a 658164                     /lib/x86_64-linux-gnu/ld-2.17.so
7fbd51e67000-7fbd51e8b000 rw-p 00000000 00:00 0
7fbd51ea8000-7fbd51eaa000 rw-p 00000000 00:00 0
7fbd51eaa000-7fbd51eab000 r--p 00022000 08:0a 658164                     /lib/x86_64-linux-gnu/ld-2.17.so
7fbd51eab000-7fbd51ead000 rw-p 00023000 08:0a 658164                     /lib/x86_64-linux-gnu/ld-2.17.so
7fff48a97000-7fff48ab8000 rw-p 00000000 00:00 0                          [stack]
7fff48bfe000-7fff48c00000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]


If you glance at the process maps, the address returned by mmap() 0x7fbd51e67000 is attached to process address space with size of 7fbd51e67000-7fbd51e8b000 which is 144k. The kernel for mmap returns memory which is aligned by pagesize. If you look at mmap system call, the number supplied by glibc is 128k+4k. As said earlier, it may be for some house keeping stuffs. Later kernel allocates with multiples of pagesize. The extra memory returned by kernel is zeroed out and useless which means even if you write at that location, it will be still zero (citation required). Also note that there is no heap segment now!

The memory allocation is complex matter and one needs to dwell into glibc code if you wish to know more. The mmap() system call is quite complex implementation in linux kernel :-). If you wish to know why malloc follows this mechanism, please read Linux System Programming by Robert Love which provides excellent insight on this topic.

As mentioned, memory topics are quite confusing and complicated. If you have feedback or corrections, please feel free to leave a comment. It also helps me to improve.

References:


1) Linux system programming by Robert Love

2) glibc malloc code (just peeped into)

3) If you wish, you can check linux kernel mmap code [mm/mmap.c -- SYSCALL_DEFINE6(mmap_pgoff, ...)]. To give some heads-up, the code is quite complicated and requires to have good knowledge of Linux Kernel MM subsystem

No comments:

Post a Comment