(Position) Independent Code
Disclaimer: This post is highly technical and on a complicated subject. I learned enough to “get things working” and so I’ll probably get some things wrong and as always, criticism is welcome.
All of the above work is actually in preparation for phase two, which is the ability to load and run custom module firmware and extra versions of the main application firmware. In the short term, we fully intend to just bake our module firmware right into the main firmware, for simplicity. Long term, though, this won’t work. Especially as we get into users providing their own drivers for custom modules and the like.
Anybody that has gotten creative with firmware has inevitably ended up in this quagmire. My first foray was while I was trying to keep two separate versions of firmware around during upgrades so that the earlier version was available as a backup. This simply won’t work without some effort.
The problem comes down to knowing where data and code are during runtime. You see, when your code is being linked the locations of variables, functions and instructions used in loops and conditions are typically fixed in the binary as absolute addresses.
void say_hello_world(const char *message, int a, int b, int c) {
printf("Hello, world! %s", message);
}
void test() {
say_hello_world("How are you?", 34, 12, 76);
}
When you call a function the compiler turns the call into assembly instructions for passing arguments and receiving return values as per the architecture’s calling convention and then a jump to the absolute address of the callee. During link time the linker knows the memory regions the code is intended to run in (from the linker script). Typically this means the code expects to find itself loaded at some memory address and so that offset is baked into the absolute addresses used when calling functions as well as to finding global variables.
In many typical cases the running code is offset by the memory set aside for the bootloader. So if the bootloader occupies 0x0000 to 0x4000 then the application firmware is expecting to be loaded at 0x4000 and so all jumps and access will be offset from there. This means that loading firmware at a different location in memory will cause all those absolute addresses to be horribly wrong!
So, how do we fix this?
Thankfully this is a problem that’s had a solution for a long time. Under GCC, the first trick is to enable position-independent code [1] using the -fpic compiler flag. This tells gcc to avoid using absolute addresses and include link time information to decouple the binary from a predetermined location in memory. My solution involved combining this with a few other flags and so two different mechanisms come into play:
- Relative addressing. By combining the above flag with -mpic-data-is-text-relative many of the addressing issues go away. This flag tells the compiler that the data section is always at a predictable location relative to the code section. So relative addressing can be used to find symbols in that section. Most jumps to functions will also be relative by virtue of using -fpic. This leaves the location of global variables as the problem.
- Being able to find data relative to the code segment gets us close, but we’re still falling short because the code running needs to be able to find mutable variables that are in RAM. This is important because it’s very easy for two separate binaries to try and occupy the same region of memory for their variables, and so using relative addressing wouldn’t really help in that situation. Instead, the linker introduces a Global Offset Table [2]. This is a table that contains the addresses of global variables. (See “The Fundamental Theorem of Software Engineering” [3])
After compilation the compiler will have wrapped global variable accesses in an indirect lookup through the GOT. How does it find that table? Good question, enter another compiler flag: -msingle-pic-base. This, I believe, is ARM specific and tells the compiler to dedicate a specific register to holding the location of the GOT. In the default case this is R9.
What this means, is that before running a binary compiled with the above flags some runtime work needs to be taken care of. When the linker does its job, part of the process is to generate a section that contains “relocations”. Relocations are an incredibly complicated topic and if you’re curious or find this description lacking I urge you to do some research on your own and I wish you luck…
The general, high level idea is the linker adds a table to the ELF file that contains locations in the final binary that need to be corrected at runtime. How much of this information depends on your needs, really. Most discussions around relocations stem from their use in dynamic linking of shared libraries, which is very similar to the work we’re doing here, only our situation is simpler. For example, we only need to run one instance of these binaries against one instance of their data, rather than allowing multiple consumers to use a shared library with their own separate data regions. This means we simply need to allocate room for a single collection of global variables, fixup the GOT and ensure that R9 is set correctly!
The basic process looks something like this, from our dynamic loader in the bootloader or elsewhere:
- Loop over all the relocations in the header, each symbol can have multiple relocations!
- If we haven’t yet, allocate some memory for this symbol.
- Fill in the location of the allocated memory in the GOT.
- Before launching the application firmware, set R9 to the base of the GOT.
And we’re done! We now have true position independence for the binaries and can load and run binaries like this from arbitrary positions in memory! We’re also not that far from supporting more complex scenarios, should they be necessary. One more quick aside, is that during that runtime dynamic linking phase I manually intervene for some symbols that are truly global and shared among the various binaries. For example, the RTT buffers mentioned in the debugging article are shared and fixed in memory and so the relocations for those are still handled, but we can simply return the address of the bootloader’s instances instead of allocating new ones!
I apologize for how cursory this was… there was much more I could have addressed but there’s already so many great resources on relocations and that stuff out there, especially in the security and vulnerability space 😉
Some great resources I used, from Eli Bendersky’s blog:
https://eli.thegreenplace.net/2011/08/25/load-time-relocation-of-shared-libraries/
https://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/
References
[1] https://en.wikipedia.org/wiki/Position-independent_code
[2] https://en.wikipedia.org/wiki/Global_Offset_Table
[3] https://en.wikipedia.org/wiki/Fundamental_theorem_of_software_engineering