Tuesday, 1 September 2009

Accelerating OBS .... !scratchbox

So I've been working on accelerating the Open Build Service recently.

It's a great build environment for Mer and we love it. It uses qemu to emulate arm architectures and provides a pristine chroot build environment using native packages. This approach of using distribution tools rather than a toolchain provides a more consistent build system.

The problem is that this approach is slow.

Following the 80/20 rule, the OBS engineers (thanks Martin, Jan Simon) identified the applications that represented the greatest load on the build system:
  • bash
  • bzip2
  • dpkg
  • gzip
  • m4
  • make
  • perl
And of course:
  • gcc
So the intention was to accelerate these binaries in the emulator by using host machine code, not emulated code.

The emulated environment relies on binfmt_misc to invoke qemu whenever it hits a foreign (target) binary, while host binaries run natively, so simply replacing the binaries with host binaries should work.

Well; almost.

The host binaries need their shared libraries; and they live in the same place as the emulated shared libraries... which may be used by non-replaced code. Hmmm.

(Before going any further, let's establish a naming convention: target is the architecture of the chroot and eventual target; host is the architecture of the underlying CPU. Typically for us target is armel, host is x86.)

The approach to solving this problem is to minimise changes from a pure arm build environment. This is achieved by installing both arm and x86 packages; to avoid collisions the x86 packages are installed into /lib-x86 (in the chroot) and of course, shared libraries will fall into /lib-x86/[usr/]lib and not overwrite the target .so files.
The final step is to move selected binaries in /bin and /usr/bin out of the way and replace them with symlinks to the i586 binaries in /lib-x86/[usr/]bin.
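
To check which binaries in the chroot are host code and which are still emulated, one option is to read the ELF header directly. The sketch below is an illustration rather than part of the Mer tooling: elf_machine is a hypothetical helper, and it assumes little-endian ELF files (true for both armel and x86) and a POSIX od.

```shell
# Print the machine type recorded in a file's ELF header.
# Assumption: the file is little-endian, so the low byte of the
# two-byte e_machine field sits at offset 18.
elf_machine() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 0
    fi
    # Read one byte at offset 18 as an unsigned decimal number.
    code=$(od -An -tu1 -j18 -N1 "$1" | tr -d ' \n')
    case "$code" in
        3)  echo "x86" ;;      # EM_386
        40) echo "arm" ;;      # EM_ARM
        62) echo "x86_64" ;;   # EM_X86_64
        *)  echo "other($code)" ;;
    esac
}
```

Run against /usr/bin/make inside the chroot, this should report x86 once the binary has been replaced (od follows the symlink) and arm for an untouched target binary.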

Most x86 packages are unmodified; any paths they use will relate to the main / directory and the config/data files installed by the target package will be used. (This works in general although the eagle-eyed will spot issues with things like XS modules for perl - not a major problem for a simple build environment though).

Some packages are modified to minimise build times, e.g. to avoid 64 bit builds; others (e.g. dpkg) are modified to set the target architecture as a default.

Obviously the host binaries need to link to shared libraries which are not in the normal library search path. We don't want to have to rely on environment variables etc. so two steps are used:
  • modify ld-linux.so to look in /lib-x86/lib etc.
  • modify the rpath value in the ELF dynamic section of each replaced binary

The rpath change is done by creating a static build of patchelf and using it to modify the binaries as they are installed.

Finally, each modified package has all its postinst scripts removed and replaced with something like:

$prefix/usr/bin/patchelf --set-rpath $prefix/lib $prefix/usr/bin/make
mv /usr/bin/make /usr/bin/make-orig-x86lib
ln -s $prefix/usr/bin/make /usr/bin/make
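
The same three steps can be wrapped in a small function so that one postinst can handle several binaries. This is only a sketch under stated assumptions: accelerate_binary and the ROOT, PREFIX and PATCHELF variables are hypothetical names introduced here (ROOT just makes it possible to try the function outside a real chroot), and the backup suffix follows the snippet above.

```shell
# Swap one target binary for its host counterpart.
# ROOT, PREFIX and PATCHELF are illustrative knobs, not part of the
# original packaging.
accelerate_binary() {
    bin=$1                       # path relative to /, e.g. usr/bin/make
    root=${ROOT:-}               # empty means the real /
    prefix=${PREFIX:-/lib-x86}
    host="$root$prefix/$bin"
    target="$root/$bin"

    if [ ! -f "$host" ]; then
        echo "missing host binary: $host" >&2
        return 1
    fi
    # Already switched over: leave things alone.
    if [ -L "$target" ]; then
        return 0
    fi

    # Point the host binary at the host libraries.
    "${PATCHELF:-$root$prefix/usr/bin/patchelf}" --set-rpath "$prefix/lib" "$host" || return 1

    # Keep the target binary so the change can be reverted.
    if [ -e "$target" ]; then
        mv "$target" "$target-orig-x86lib"
    fi
    # The link is absolute so that it resolves inside the chroot.
    ln -s "$prefix/$bin" "$target"
}
```

Inside the chroot a call such as accelerate_binary usr/bin/make would then do the work of the three lines above; switching a binary back is a matter of removing the symlink and moving the -orig-x86lib file into place again.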

Note that some libraries have an rpath and need to be treated with patchelf too.

One final problem arises with fakeroot: it modifies LD_LIBRARY_PATH to look for libfakeroot-XXX.so. This is handled by editing the fakeroot script to append the /lib-x86/ paths in the same way the multi-arch paths are added.
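
As a rough illustration of that edit, the helper below takes a colon-separated list of library directories and appends the /lib-x86 counterpart of each one. It is a sketch, not the actual patch: add_x86_paths and PREFIX are hypothetical names, and the directories used in the usage line are placeholders rather than the values found in any particular fakeroot script.

```shell
# Append the /lib-x86 counterpart of every entry in a colon-separated
# path list. Assumes the entries contain no glob characters.
add_x86_paths() {
    prefix=${PREFIX:-/lib-x86}
    paths=$1
    extra=
    old_ifs=$IFS
    IFS=:
    for p in $paths; do          # split the list on ':'
        extra="$extra:$prefix$p"
    done
    IFS=$old_ifs
    echo "$paths$extra"
}

# Illustrative use with a placeholder directory:
#   LIBDIRS=$(add_x86_paths "/usr/lib/libfakeroot")
```

The original entries stay first, so target libraries are still found in their usual place and the host copies are only picked up from the extra directories.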

All in all this is actually a very clean solution:
  • the modified binary uses the exact same source as the one it replaces
  • the installation principles mimic biarch (and will, we hope, use biarch when it's ready)
  • only the specific binary executable is replaced - data and scripts come from the target package
  • it can easily be switched on or off for a specific binary
  • enabling it on the OBS is done in the enclosing project, not the package

Next steps: making it work on the OBS and then ... building a cross-gcc.


  1. This is very similar to what we did in Mamona. See http://dev.openbossa.org/trac/mamona/wiki/Releases/0.2/noemu.

    In Mamona, we decided to use only static x86 binaries to avoid x86 libraries inside the chroot.

  2. I'll take a look at that Lauro.

    We considered building statically but the idea was that we used (as far as possible) unmodified executables.

    The approach we use is almost trivial.

    Unpack the .deb; move the entire tree down a level; repack the .deb with a dependency on the target package and a postinst script that runs patchelf on the binary and then symlinks it into /.

    It wouldn't have been as straightforward without patchelf though ;)