ELF loader

Binary files on Linux are usually in the ELF file format. You can check whether a file is in the ELF format by running:
       
file /bin/ls

/bin/ls: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), 
dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, 
BuildID[sha1]=c50003031a2ce019c50810f8abdaefc4d44f9e52, 
for GNU/Linux 4.4.0, stripped
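
The first few words of that output come straight from the start of the file. As a minimal sketch (not how `file` itself is implemented, and with the function names `describe_ident` and `describe_file` made up for illustration), the ELF magic number, the word size and the byte order can be read from the first six bytes:

```python
# Sketch: decode the identification bytes (e_ident) at the start of an ELF file.
ELF_MAGIC = b"\x7fELF"
CLASSES = {1: "32-bit", 2: "64-bit"}   # byte 4, EI_CLASS
ENCODINGS = {1: "LSB", 2: "MSB"}       # byte 5, EI_DATA (little / big endian)


def describe_ident(ident: bytes) -> str:
    """Return a short description of the leading bytes of a file."""
    if len(ident) < 6 or ident[:4] != ELF_MAGIC:
        return "not ELF"
    elf_class = CLASSES.get(ident[4], "unknown class")
    encoding = ENCODINGS.get(ident[5], "unknown encoding")
    return f"ELF {elf_class} {encoding}"


def describe_file(path: str) -> str:
    """Hypothetical helper: describe a file on disk by its first 16 bytes."""
    with open(path, "rb") as f:
        return describe_ident(f.read(16))
```

For the /bin/ls shown above, `describe_file("/bin/ls")` would be expected to return "ELF 64-bit LSB", matching the start of the `file` output.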
      
    
The file output tells us that this is a 64-bit ELF file and that it is dynamically linked. Dynamically linked means that, as opposed to a statically linked program, it depends on shared libraries that are linked in when the program is loaded into memory. On Linux these library files are called .so files (shared objects); on Windows many know them as DLLs. To see which libraries are being loaded we can run:

ldd /bin/ls

linux-vdso.so.1 (0x00007ffe02bf7000)
libcap.so.2 => /usr/lib/libcap.so.2 (0x00007f5016593000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f50163c7000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f50165ef000)
    

Cheat sheet

readelf -SW binary                   # show section headers
readelf -x .text binary              # hex dump of the .text section
objdump -j .text -d -M intel binary  # disassemble the .text section
gdb --args binary                    # debug binary
xxd                                  # make a hexdump
ldd                                  # print shared libraries (.so)
strings                              # print printable strings in binary
cat /proc/1/maps                     # show memory mappings of a process (here PID 1)
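readelf -lW binary                   # show program headers

In the ldd output above we can see a linux-vdso.so.1 library, which is special: it is not a file on disk but is mapped into a program's memory by the kernel on execution. It lets some frequently used system calls, such as clock_gettime and gettimeofday, be served in user space without switching into the kernel. More on that can be read here: VDSO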
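
One way to confirm that the vDSO really is mapped by the kernel is to look for the `[vdso]` entry in a process's memory map. The following is a rough sketch that parses the text format of /proc/<pid>/maps; the helper names `find_mappings` and `vdso_of_current_process` are made up for this example, and the second one only works on Linux.

```python
def find_mappings(maps_text: str, name: str):
    """Return (start, end, perms) for every mapping whose path column equals name."""
    found = []
    for line in maps_text.splitlines():
        # Columns: address range, permissions, offset, device, inode, [path]
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].strip() == name:
            start, end = (int(part, 16) for part in fields[0].split("-"))
            found.append((start, end, fields[1]))
    return found


def vdso_of_current_process():
    """Linux only: /proc/self/maps describes the calling process."""
    with open("/proc/self/maps") as f:
        return find_mappings(f.read(), "[vdso]")
```

The start address returned for `[vdso]` plays the same role as the address printed next to linux-vdso.so.1 by ldd, although the value usually differs from run to run because of address space layout randomization.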

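libcap.so.2 and libc.so.6 are libraries the program depends on, whereas ld-linux-x86-64.so.2 is the dynamic loader (also called the dynamic linker) that prepares the program to run: the kernel maps the program and this loader into memory, and the loader then loads the required libraries and resolves their symbols. This loader is part of the glibc library and its source can be found here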

So we found our dynamic linker. But how does ldd know that information? Easy: its path is stored in the binary, in the .interp section. To dump the contents of that section we can run:


readelf -x .interp /bin/ls


Hex dump of section '.interp':
  0x00000318 2f6c6962 36342f6c 642d6c69 6e75782d /lib64/ld-linux-
  0x00000328 7838362d 36342e73 6f2e3200          x86-64.so.2.
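
The same string can also be found without section headers, through the program header of type INTERP, which is what the kernel looks at when it executes the file. Below is a minimal sketch, assuming a 64-bit little-endian ELF image held in memory; the function name `read_interp` is made up for this example, and the field offsets follow the ELF64 header layout.

```python
import struct

PT_INTERP = 3  # program header type that points at the interpreter path


def read_interp(data: bytes):
    """Return the interpreter path of a 64-bit little-endian ELF image, or None."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # ELF64 header: e_phoff at byte 32, e_phentsize at byte 54, e_phnum at byte 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        entry = e_phoff + i * e_phentsize
        # ELF64 program header starts with: p_type, p_flags, p_offset,
        # p_vaddr, p_paddr, p_filesz
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, entry
        )
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\x00").decode()  # drop the trailing NUL byte
    return None
```

Called as `read_interp(open("/bin/ls", "rb").read())` on the binary above, it should return the path shown in the hex dump. The section size of 0x1c (28) bytes is the 27 characters of that path plus the terminating NUL.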
    

But the binary contains many more sections than just .interp:


readelf -SW /bin/ls

There are 28 section headers, starting at offset 0x23388:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        0000000000000318 000318 00001c 00   A  0   0  1
  [ 2] .note.gnu.property NOTE            0000000000000338 000338 000040 00   A  0   0  8
  [ 3] .note.gnu.build-id NOTE            0000000000000378 000378 000024 00   A  0   0  4
  [ 4] .note.ABI-tag     NOTE            000000000000039c 00039c 000020 00   A  0   0  4
  [ 5] .gnu.hash         GNU_HASH        00000000000003c0 0003c0 0000b0 00   A  6   0  8
  [ 6] .dynsym           DYNSYM          0000000000000470 000470 000c00 18   A  7   1  8
  [ 7] .dynstr           STRTAB          0000000000001070 001070 0005cb 00   A  0   0  1
  [ 8] .gnu.version      VERSYM          000000000000163c 00163c 000100 02   A  6   0  2
  [ 9] .gnu.version_r    VERNEED         0000000000001740 001740 0000a0 00   A  7   1  8
  [10] .rela.dyn         RELA            00000000000017e0 0017e0 001ef0 18   A  6   0  8
  [11] .rela.plt         RELA            00000000000036d0 0036d0 000018 18  AI  6  23  8
  [12] .init             PROGBITS        0000000000004000 004000 00001b 00  AX  0   0  4
  [13] .plt              PROGBITS        0000000000004020 004020 000020 10  AX  0   0 16
  [14] .text             PROGBITS        0000000000004040 004040 0143d2 00  AX  0   0 16
  [15] .fini             PROGBITS        0000000000018414 018414 00000d 00  AX  0   0  4
  [16] .rodata           PROGBITS        0000000000019000 019000 005029 00   A  0   0 32
  [17] .eh_frame_hdr     PROGBITS        000000000001e02c 01e02c 0009a4 00   A  0   0  4
  [18] .eh_frame         PROGBITS        000000000001e9d0 01e9d0 003370 00   A  0   0  8
  [19] .init_array       INIT_ARRAY      0000000000022f70 021f70 000008 08  WA  0   0  8
  [20] .fini_array       FINI_ARRAY      0000000000022f78 021f78 000008 08  WA  0   0  8
  [21] .data.rel.ro      PROGBITS        0000000000022f80 021f80 000ad8 00  WA  0   0 32
  [22] .dynamic          DYNAMIC         0000000000023a58 022a58 000200 10  WA  7   0  8
  [23] .got              PROGBITS        0000000000023c58 022c58 000398 08  WA  0   0  8
  [24] .data             PROGBITS        0000000000024000 023000 000268 00  WA  0   0 32
  [25] .bss              NOBITS          0000000000024280 023268 0012d8 00  WA  0   0 32
  [26] .comment          PROGBITS        0000000000000000 023268 000012 01  MS  0   0  1
  [27] .shstrtab         STRTAB          0000000000000000 02327a 00010a 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
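
To make the table above less magical, here is a rough sketch of how the Name, Off and Size columns can be recovered by hand. It assumes a 64-bit little-endian ELF image and ignores special cases such as files with no section headers or with extended section numbering; the function name `section_names` is made up for this example.

```python
import struct

SHDR = "<IIQQQQIIQQ"  # ELF64 section header: sh_name, sh_type, sh_flags, sh_addr,
                      # sh_offset, sh_size, sh_link, sh_info, sh_addralign, sh_entsize


def section_names(data: bytes):
    """Return (name, file offset, size) for each section header in the image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # ELF64 header: e_shoff at byte 40; e_shentsize, e_shnum, e_shstrndx from byte 58.
    (e_shoff,) = struct.unpack_from("<Q", data, 40)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 58)

    def header(index):
        return struct.unpack_from(SHDR, data, e_shoff + index * e_shentsize)

    # The section at index e_shstrndx (.shstrtab) holds NUL-terminated names.
    strtab_hdr = header(e_shstrndx)
    strtab = data[strtab_hdr[4]:strtab_hdr[4] + strtab_hdr[5]]

    sections = []
    for i in range(e_shnum):
        sh = header(i)
        name_end = strtab.index(b"\x00", sh[0])  # sh_name is an offset into strtab
        sections.append((strtab[sh[0]:name_end].decode(), sh[4], sh[5]))
    return sections
```

Each section header only stores a number in sh_name; the readable names come from looking that offset up in .shstrtab, which is why that section is the last entry in the listing above.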
    

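So the .interp section stores the path of the interpreter, also called the dynamic linker, but what do the other sections do? The Type column gives us a hint: for example, the .note sections store notes. The most relevant sections are .text, which stores the executable code, .rodata, which stores read-only data such as constants and string literals, and .data, which stores initialized global and static variables. The .bss section (block started by symbol) holds global and static variables that have not been assigned a value; it has the type NOBITS because it takes up no space in the file and is filled with zeros at load time. .strtab stands for String Table and .shstrtab for Section Header String Table; they store the names of symbols and of sections, respectively. This binary is stripped, so it has no .strtab; the names of its dynamic symbols are stored in .dynstr. More can be read here.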

Now let's look at where the program actually ends up in memory.


readelf -lW /bin/ls

Elf file type is DYN (Shared object file)
Entry point 0x5bc0
There are 13 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x0002d8 0x0002d8 R   0x8
  INTERP         0x000318 0x0000000000000318 0x0000000000000318 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x0036e8 0x0036e8 R   0x1000
  LOAD           0x004000 0x0000000000004000 0x0000000000004000 0x014421 0x014421 R E 0x1000
  LOAD           0x019000 0x0000000000019000 0x0000000000019000 0x008d40 0x008d40 R   0x1000
  LOAD           0x021f70 0x0000000000022f70 0x0000000000022f70 0x0012f8 0x0025e8 RW  0x1000
  DYNAMIC        0x022a58 0x0000000000023a58 0x0000000000023a58 0x000200 0x000200 RW  0x8
  NOTE           0x000338 0x0000000000000338 0x0000000000000338 0x000040 0x000040 R   0x8
  NOTE           0x000378 0x0000000000000378 0x0000000000000378 0x000044 0x000044 R   0x4
  GNU_PROPERTY   0x000338 0x0000000000000338 0x0000000000000338 0x000040 0x000040 R   0x8
  GNU_EH_FRAME   0x01e02c 0x000000000001e02c 0x000000000001e02c 0x0009a4 0x0009a4 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x021f70 0x0000000000022f70 0x0000000000022f70 0x001090 0x001090 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
   03     .init .plt .text .fini
   04     .rodata .eh_frame_hdr .eh_frame
   05     .init_array .fini_array .data.rel.ro .dynamic .got .data .bss
   06     .dynamic
   07     .note.gnu.property
   08     .note.gnu.build-id .note.ABI-tag
   09     .note.gnu.property
   10     .eh_frame_hdr
   11
   12     .init_array .fini_array .data.rel.ro .dynamic .got
    
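Here we can see in the VirtAddr column the address at which each segment will be mapped, and in the section-to-segment mapping which sections each segment consists of. Because this is a position-independent executable (type DYN), these addresses are offsets from a base address that is chosen when the program is loaded.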
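
As a worked example, the numbers of the four LOAD rows above can be turned into runtime address ranges. This is only a sketch: the base address 0x555555554000 is a hypothetical value picked for illustration (the real one is chosen at load time), the 0x1000 page size is taken from the Align column, and the names `Load`, `runtime_range`, `zero_filled` and `segment_containing` are made up.

```python
from dataclasses import dataclass

PAGE = 0x1000  # value of the Align column for the LOAD segments


@dataclass
class Load:
    offset: int
    vaddr: int
    filesz: int
    memsz: int
    flags: str


# The four LOAD rows copied from the readelf -lW output above.
LOADS = [
    Load(0x000000, 0x00000, 0x0036e8, 0x0036e8, "R"),
    Load(0x004000, 0x04000, 0x014421, 0x014421, "R E"),
    Load(0x019000, 0x19000, 0x008d40, 0x008d40, "R"),
    Load(0x021f70, 0x22f70, 0x0012f8, 0x0025e8, "RW"),
]


def runtime_range(seg: Load, base: int):
    """Page-aligned [start, end) covered by the segment when loaded at base."""
    start = (base + seg.vaddr) & ~(PAGE - 1)
    end = (base + seg.vaddr + seg.memsz + PAGE - 1) & ~(PAGE - 1)
    return start, end


def zero_filled(seg: Load) -> int:
    """Bytes that exist only in memory (MemSiz - FileSiz), e.g. .bss."""
    return seg.memsz - seg.filesz


def segment_containing(vaddr: int):
    """Return the LOAD segment whose memory range contains vaddr, or None."""
    for seg in LOADS:
        if seg.vaddr <= vaddr < seg.vaddr + seg.memsz:
            return seg
    return None


if __name__ == "__main__":
    base = 0x555555554000  # hypothetical load base, not taken from the output above
    for seg in LOADS:
        start, end = runtime_range(seg, base)
        print(f"{start:#x}-{end:#x} {seg.flags:3} zero-filled={zero_filled(seg):#x}")
```

Two details of the table become visible this way. The entry point 0x5bc0 falls into the second LOAD segment, the only one with the E (execute) flag. And the last LOAD segment has a MemSiz that is 0x12f0 bytes larger than its FileSiz: that extra memory is not stored in the file and is where .bss lives.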