The Life Unwired

with Ben Combee

Weak Linking and Linux Shared Libraries
unwiredben
This post is about esoteric dynamic linking issues on Linux... feel free to skip it if you're not a programmer.


The task seemed simple. Palm had released an early version of a shared library called libpdl.so with version 1.4.0 of webOS. We had put out an updated library with our PDK that added a few useful calls for developers. Now, management had asked that we get a few of the in-progress PDK apps ready to put in the catalog to distribute to devices running 1.4.0 and 1.4.1.

The problem is that if you use one of the new calls, your app won't even start on the currently shipping OS versions. The system's loader will complain about a missing symbol like "PDL_Init". What we needed was a way to tell which library version we were using at runtime so apps could hardcode some values for the older devices while still calling the right APIs in the future when we release an OS version that has those calls.

I pulled an idea out of my toolkit of things to help linkers work correctly, a concept called "weak linking". The idea is to tell the compiler and linker that you want to use a function, but it's OK if it doesn't actually exist when you link your program together. In GCC, you do this by marking the function's prototype with __attribute__((weak)). At runtime, you check whether the function's address is NULL before calling it, and if it is, you know that no definition was provided, so you can do something else.
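As a minimal sketch of the check described above, assuming GCC or Clang on an ELF platform: the names optional_feature and call_optional_feature are hypothetical, and optional_feature is deliberately left undefined so the fallback path can be seen.

```c
#include <stddef.h>

/* Hypothetical function, used only for illustration. The weak attribute
   lets the link succeed even when no object file or library defines it. */
extern int optional_feature(int value) __attribute__((weak));

/* Calls optional_feature when a definition was linked in; otherwise
   returns the caller-supplied fallback value. */
int call_optional_feature(int value, int fallback)
{
    if (optional_feature != NULL) {
        return optional_feature(value);
    }
    return fallback;
}
```

Built on its own with nothing defining optional_feature, a call such as call_optional_feature(1, -1) should take the fallback branch. This is the straightforward case, where the symbol is missing at link time.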

I've used this for statically-linked libraries in the past, as well as in shared libraries where the weak-linked symbol could be provided by the caller. However, in this case, I wanted to weak-link with a symbol in a shared library so I could detect if it wasn't there.

When I tried this out in a test application, I was surprised to see that if I weak-linked PDL_Init, I'd get a non-NULL value for it when running on a device with the older version of libpdl.so that didn't provide the symbol. Using __attribute__((weak)) would allow the app to start up, but I didn't get the behavior I wanted, since a check like if (PDL_Init != NULL) would always be true. I'd then call PDL_Init, the device would jump to a NULL address, and I'd get a crash.

What's going on? The problem is that the address of PDL_Init wasn't NULL; it was the address of a stub in the ELF jump table (the procedure linkage table, or PLT). My code would jump into the PDL_Init stub, but because there's no PDL_Init in the library, the stub was never bound to a real function, and we'd just jump to address 0.

How do you fix this? I thought about it for a while and was pursuing a complex code path where I'd have the code use dlopen to open the libpdl.so library, then use dlsym to check for and load the symbol. In reading the dlsym man page, I found that instead of passing a handle to an open library, you can pass the value RTLD_DEFAULT, which tells dlsym to search the libraries that are already loaded. With this knowledge, I was able to change the check from if (PDL_Init != NULL) to if (dlsym(RTLD_DEFAULT, "PDL_Init") != NULL), and things started working again.
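The two lookup styles mentioned above can be sketched side by side. This assumes glibc, where RTLD_DEFAULT is only visible with _GNU_SOURCE defined; on older glibc versions you may also need to link with -ldl. The helper names has_symbol_via_dlopen and has_symbol_via_default are made up for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc exposes RTLD_DEFAULT only when this is set */
#endif
#include <dlfcn.h>
#include <stddef.h>

/* The longer route: open a specific library by name, look the symbol
   up in it, then release the handle. Returns 1 if found, 0 otherwise. */
int has_symbol_via_dlopen(const char *library, const char *name)
{
    void *handle = dlopen(library, RTLD_LAZY);
    if (handle == NULL) {
        return 0;   /* the library itself could not be loaded */
    }
    int found = (dlsym(handle, name) != NULL);
    dlclose(handle);
    return found;
}

/* The shorter route: search the objects already loaded into the process.
   Returns 1 if any of them exports the symbol, 0 otherwise. */
int has_symbol_via_default(const char *name)
{
    return dlsym(RTLD_DEFAULT, name) != NULL;
}
```

The second helper needs no library name at all, which is what makes the one-line change to the check possible.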

To summarize: __attribute__((weak)) does affect dynamic linking, but you need to use dlsym to check for a symbol's existence, because the weakly-linked function's name is bound to the ELF jump table, not to the final address.
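Putting the two pieces together, the overall pattern might look roughly like this. It is a sketch rather than the actual PDK code: newer_api_call stands in for a function that only newer library versions export, and the wrapper name is hypothetical too. It assumes glibc (hence _GNU_SOURCE for RTLD_DEFAULT).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed on glibc for RTLD_DEFAULT */
#endif
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical stand-in for a call that only newer library versions
   provide. Declaring it weak lets the program load when it is absent. */
extern int newer_api_call(int value) __attribute__((weak));

/* Asks the dynamic loader whether the symbol really exists, instead of
   comparing the function's address to NULL, and falls back if it doesn't. */
int call_newer_api_or_default(int value, int fallback)
{
    if (dlsym(RTLD_DEFAULT, "newer_api_call") != NULL) {
        return newer_api_call(value);
    }
    return fallback;
}
```

The weak attribute keeps the loader from rejecting the app at startup, and the dlsym check decides at runtime whether the call is safe to make.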

Hi Ben,

Do you think this is the behavior on every Linux system? Does it depend on toolchain/linker/loader/os?

I've not tried this on any other versions of Linux/toolchain/platform since I made this post, but I expect this behavior is a general issue with ELF shared libraries and weak linking.