Recently I came across this article explaining how to use clang to dump the memory layout of a C++ object. Running
clang -cc1 -fdump-record-layouts ppfile.cpp
on a preprocessed C++ file, produced using, e.g.,
clang -E -I/probably/lots/of/include/paths file.cpp
gives output like:
*** Dumping AST Record Layout
0 | class StarObject
0 | class SkyObject (primary base)
0 | class SkyPoint (primary base)
0 | (SkyPoint vtable pointer)
0 | (SkyPoint vftable pointer)
16 | long double lastPrecessJD
32 | class dms RA0
32 | double D
| [sizeof=8, dsize=8, align=8
| nvsize=8, nvalign=8]
...(snipped)...
184 | float B
188 | float V
| [sizeof=192, dsize=192, align=16
| nvsize=192, nvalign=16]
Notice that the lastPrecessJD variable is stored as a long double, with possibly 64 bits of precision (the x87 80-bit extended format) instead of the usual 53 bits given by a double. In practice, long double has 16-byte storage and alignment on x86-64. Since the vtable pointer takes up only 8 bytes (on 64-bit), we waste 8 bytes on padding. Moreover, we then take up 16 bytes to store lastPrecessJD, but using a program like the following:
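#include <stdio.h>
#include <math.h>  /* nextafter(); link with -lm */

int main(void)
{
    /* Julian Date of the J2000.0 epoch */
    double jd2000 = 2451545.0;
    /* Gap between jd2000 and the next larger double, in days */
    double delta = nextafter(jd2000, jd2000 + 1) - jd2000;
    printf("delta: %.30f\n", delta);
    /* 1 day = 86400 s, so this is the smallest representable time step */
    printf("delta: %.1f microseconds\n", delta * 86400.0 * 1e6);
    return 0;
}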
we can compute that around the year 2000 the smallest time step representable in a (64-bit) double is approximately 40 microseconds, so it's not clear that we gain anything by using 80-bit long doubles instead of 64-bit doubles. Changing the long double to double (and placing it last, though this isn't strictly necessary) results in the following memory layout for the SkyPoint class:
*** Dumping AST Record Layout
0 | class SkyPoint
0 | (SkyPoint vtable pointer)
0 | (SkyPoint vftable pointer)
8 | class dms RA0
8 | double D
| [sizeof=8, dsize=8, align=8
| nvsize=8, nvalign=8]
...(snipped)...
48 | class dms Az
48 | double D
| [sizeof=8, dsize=8, align=8
| nvsize=8, nvalign=8]
56 | double lastPrecessJD
| [sizeof=64, dsize=64, align=8
| nvsize=64, nvalign=8]
This saves 16 bytes, cutting the size from 80 bytes to 64.[1] Since KStars suffers from abuse of complex inheritance hierarchies and an everything-is-an-object design, this is 16 bytes saved for every single object in the sky.
Doing some simple rearrangements of the data in other classes means we can also save 8 bytes per StarObject and DeepSkyObject. Overall, these changes give approximately a 10% reduction in memory usage, just from removing padding.
[1] This also has the benefit that the SkyPoint data fits in a single 64-byte cache line. I don't think this really makes a difference, given the inefficiencies in the rest of the code and the fact that none of our data has any thought put into alignment, but it's nice to have.