Better Living Through Clang-istry


Recently I came across this article, explaining how to use clang to dump the memory layout of a C++ object. Running

clang -cc1 -fdump-record-layouts ppfile.cpp

on a preprocessed C++ file, produced using, e.g.,

clang -E -I/probably/lots/of/include/paths file.cpp > ppfile.cpp

gives output like:

*** Dumping AST Record Layout
   0 | class StarObject
   0 |   class SkyObject (primary base)
   0 |     class SkyPoint (primary base)
   0 |       (SkyPoint vtable pointer)
   0 |       (SkyPoint vftable pointer)
  16 |       long double lastPrecessJD
  32 |       class dms RA0
  32 |         double D
     |       [sizeof=8, dsize=8, align=8
     |        nvsize=8, nvalign=8]
  ...(snipped)...
 184 |   float B
 188 |   float V
     | [sizeof=192, dsize=192, align=16
     |  nvsize=192, nvalign=16]

Notice that the lastPrecessJD variable is stored as a long double, with possibly 64 bits of precision (on x86) instead of the usual 53 bits given by a double. In practice, long double has 16-byte storage and alignment on x86-64. Since the vtable pointer takes up only 8 bytes (on 64-bit), we waste 8 bytes on padding before lastPrecessJD. We then take up another 16 bytes to store lastPrecessJD itself. However, using a program like the following:

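#include <stdio.h>
#include <math.h>

int main()
{
    /* Julian date of the J2000 epoch, in days. */
    double jd2000 = 2451545.0;
    /* Gap between jd2000 and the next representable double, in days. */
    double delta = nextafter(jd2000, jd2000 + 1) - jd2000;
    printf("delta: %.30f days\n", delta);
    /* The same gap in microseconds (86400 seconds per day). */
    printf("delta: %.3f microseconds\n", delta * 86400.0 * 1e6);
    return 0;
}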

we can compute that around the year 2000 (Julian date 2451545.0, measured in days), the smallest time step representable in a 64-bit double is approximately 40 microseconds, so it’s not clear that we gain anything by using 80-bit long doubles instead of 64-bit doubles. Changing the long double to a double (and placing it last, though this isn’t strictly necessary) results in a memory layout for the SkyPoint class like so:

*** Dumping AST Record Layout
   0 | class SkyPoint
   0 |   (SkyPoint vtable pointer)
   0 |   (SkyPoint vftable pointer)
   8 |   class dms RA0
   8 |     double D
     |   [sizeof=8, dsize=8, align=8
     |    nvsize=8, nvalign=8]
  ...(snipped)...
  48 |   class dms Az
  48 |     double D
     |   [sizeof=8, dsize=8, align=8
     |    nvsize=8, nvalign=8]

  56 |   double lastPrecessJD
     | [sizeof=64, dsize=64, align=8
     |  nvsize=64, nvalign=8]

This saves 16 bytes, cutting the size to 64 bytes from 80 [1]. Since KStars suffers from abuse of complex inheritance hierarchies and everything-is-an-object, this is 16 bytes saved for every single object in the sky.

Doing some simple rearrangements of the data in other classes means we can also save 8 bytes per StarObject and DeepSkyObject. Overall, these changes give approximately a 10% reduction in memory usage, just from removing padding.

Footnotes

  1. This also has the benefit that the SkyPoint data fits in a single cache line, though I don’t think this really makes a difference given the inefficiencies in the rest of the code, and the fact that none of our data has any thought put into alignment, but it’s nice to have.