All,
I've been comparing the readahead implementation in systemd with
some of the code that Arjan and I maintained in a separate readahead
implementation, to ensure that we get adequate performance out of the
systemd version. We're not looking at small differences in the
implementation; by and large, the performance should be comparable.
A major obstacle we've identified is that the total readahead volume
(that is, the approximate size of stuff being listed in the readahead
pack) is significantly larger than it needs to be. The root cause for
this is that the default read_ahead_kb is set to 128kb.
What happens is that on first boot, when the pack file doesn't exist,
page faults are generated for all the libraries and files that are
needed to start up. The kernel sees the pages, and applies 128kb
read_ahead_kb on top of those. As a result, we will read 128kb more
for each file touched on top of every page needed, provided that we
don't go beyond the end of the file. While this provides a reasonable
increase in speed on rotating media when booting without any readahead
acceleration, this is going to hurt later on.
Quick data summary: about 99% of all the files in the readahead pack
are read entirely with a 128kb read_ahead_kb. In reality, the amount
actually needed is closer to 60%-70% on average.
Some data:
My test case is a light-weight desktop on an ultrabook. While the
system is using an SSD, we're looking at the volume, not boot time.
read_ahead_kb    resulting pack volume
        128kb -> 83mb
         16kb -> 53mb
          8kb -> 50mb
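
To put those three measurements in proportion: 53mb and 50mb are roughly 64% and 60% of the 83mb baseline, which lines up with the 60%-70% estimate above. A minimal sketch of that arithmetic follows; the function names are made up for illustration and the only inputs are the numbers from the table.

```c
#include <stdio.h>

/* Hypothetical helper: pack volume as a percentage of the baseline volume. */
double pct_of_default(double volume_mb, double default_mb)
{
        return 100.0 * volume_mb / default_mb;
}

/* Print each measured data point relative to the 128kb default. */
void print_summary(void)
{
        /* read_ahead_kb setting and resulting pack volume, from the table */
        static const int ra_kb[] = { 128, 16, 8 };
        static const double volume_mb[] = { 83.0, 53.0, 50.0 };
        size_t i;

        for (i = 0; i < sizeof(ra_kb) / sizeof(ra_kb[0]); i++)
                printf("%3dkb -> %2.0fmb (%.1f%% of default)\n",
                       ra_kb[i], volume_mb[i],
                       pct_of_default(volume_mb[i], volume_mb[0]));
}
```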
The modification to the read_ahead_kb is done by a static C binary
that runs in all cases and modifies the sysfs files so the comparison
is not skewed - even in the default case the static binary runs. After
the modification it exec()'s systemd as usual.
As you can see, the total RA volume is significantly decreased by
lowering the RA size before we boot, but in general we don't want to
keep it low at all: to keep the speedup from readahead, the RA size
should be put back to at least the default size (*which is itself
experimental, mostly arrived at by kernel developers saying "hey, this
kernel compile is a great benchmark for testing a good RA size
setting").
So, at a minimum we want to revert any lower RA setting back to the
default once we're done with readahead collection.
However, due to the usage of fadvise WILLNEED in the replay service,
in subsequent boots, we don't have a problem with a higher default RA
size at all - since we tell the kernel exactly which pages we need,
the kernel is unlikely to see page faults for pages we have not
already read ahead, and so the RA volume stays low on subsequent boots.
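
For reference, this is roughly what a WILLNEED hint looks like from userspace. It is only a sketch, not the replay service's actual code: prefetch_range() is a made-up helper, and the 4096-byte page size is an assumption borrowed from the pack dumper attached to this mail.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define RA_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/*
 * Hypothetical helper: hint to the kernel that pages [from_page, to_page)
 * of the file at path will be needed soon.
 * Returns 0 on success, -1 on a bad range or if the file cannot be opened,
 * or the positive error number returned by posix_fadvise().
 */
int prefetch_range(const char *path, unsigned from_page, unsigned to_page)
{
        int fd, r;

        if (to_page <= from_page)
                return -1;

        fd = open(path, O_RDONLY);
        if (fd < 0)
                return -1;

        r = posix_fadvise(fd,
                          (off_t) from_page * RA_PAGE_SIZE,
                          (off_t) (to_page - from_page) * RA_PAGE_SIZE,
                          POSIX_FADV_WILLNEED);
        close(fd);
        return r;
}
```

Because the hint names an explicit byte range, the pages the boot needs can already be in the page cache before anything faults on them, which is why the default RA window rarely gets a chance to add extra volume on replay.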
The solution:
Given the above data, the solution is relatively simple:
collector service:

  if (readahead pack file does not exist) {
    - map block device to bdi/read_ahead_kb node
    - get the current read_ahead_kb
    - store this default somewhere
    - lower the read_ahead_kb to 16kb or 8kb
  }

  normal collector code...

  if (have read a default read_ahead_kb) {
    - read the bdi/read_ahead_kb node
    - if (current value == value we put in there earlier) {
        restore the stored default
      }
  }
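
The save/lower/restore part of the procedure above could look something like the following. This is a sketch under assumptions, not the proposed patch: all four function names are invented here, the node path is passed in by the caller (mapping the device to its node is not covered), and I'm reading the final comparison as "the node still holds the value we wrote", so that a setting changed by someone else in the meantime is left alone.

```c
#include <stdio.h>

/* Read the integer stored in a read_ahead_kb-style node; -1 on error. */
long ra_read_kb(const char *node)
{
        FILE *f = fopen(node, "r");
        long kb = -1;

        if (!f)
                return -1;
        if (fscanf(f, "%ld", &kb) != 1)
                kb = -1;
        fclose(f);
        return kb;
}

/* Write a new value to the node; 0 on success, -1 on error. */
int ra_write_kb(const char *node, long kb)
{
        FILE *f = fopen(node, "w");
        int ok;

        if (!f)
                return -1;
        ok = fprintf(f, "%ld\n", kb) > 0;
        if (fclose(f) != 0)
                ok = 0;
        return ok ? 0 : -1;
}

/* Before collection: remember the current value, then lower it.
 * Returns the saved default, or -1 if nothing was changed. */
long ra_lower(const char *node, long low_kb)
{
        long saved = ra_read_kb(node);

        if (saved < 0 || ra_write_kb(node, low_kb) < 0)
                return -1;
        return saved;
}

/* After collection: restore the saved default, but only if the node still
 * holds the value we wrote. Returns 1 if restored, 0 if left alone,
 * -1 on error or if there is no saved default. */
int ra_restore(const char *node, long saved_kb, long low_kb)
{
        if (saved_kb < 0)
                return -1;
        if (ra_read_kb(node) != low_kb)
                return 0; /* changed behind our back; leave it */
        return ra_write_kb(node, saved_kb) == 0 ? 1 : -1;
}
```

The collector would call ra_lower(node, 16) only when no pack file exists, keep the return value around, and call ra_restore(node, saved, 16) once collection is finished.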
Attached are:
- a pack file dumper
- a quick hack to tune the read_ahead_kb before booting systemd - make
sure you compile with -static
Comments? If the proposed solution seems agreed upon, I will implement
a patch that accomplishes the above procedure. Since the hard part is
mapping the sysfs nodes, I haven't done this just yet.
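
To illustrate why the mapping is the awkward bit, here is a sketch of the naive approach only. The helper names are made up, and it assumes the node is named after the major:minor of the device a file lives on, following the /sys/devices/virtual/bdi/8:0 path used in the attached hack. That assumption does not hold in general: a file on a partition reports the partition's device number rather than the disk's, and btrfs uses names like btrfs-1, so a real patch needs more than this.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical helper: build the read_ahead_kb node path for a device
 * number. Returns 0 on success, -1 if the buffer is too small. */
int bdi_node_for_dev(dev_t dev, char *buf, size_t len)
{
        int n = snprintf(buf, len,
                         "/sys/devices/virtual/bdi/%u:%u/read_ahead_kb",
                         major(dev), minor(dev));

        return (n > 0 && (size_t) n < len) ? 0 : -1;
}

/* Hypothetical helper: same, for the device that holds the given path.
 * Naive: uses st_dev as-is, so partitions and btrfs will not match. */
int bdi_node_for_path(const char *path, char *buf, size_t len)
{
        struct stat st;

        if (stat(path, &st) < 0)
                return -1;
        return bdi_node_for_dev(st.st_dev, buf, len);
}
```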
Auke
/* Pack file dumper: lists every file in a readahead pack and its size. */
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>
#include <inttypes.h>
#include <linux/limits.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <string.h>

int main(int argc, char *argv[])
{
        char line[1024];
        char path[PATH_MAX];
        FILE *pack;
        int a;                  /* int, not char, so the EOF check is reliable */
        long long tsize = 0;
        uint32_t b;
        uint32_t c;
        struct stat st;

        if (argc != 2) {
                fprintf(stderr, "Usage: %s pack-file\n", argv[0]);
                exit(EXIT_FAILURE);
        }

        pack = fopen(argv[1], "r");
        if (!pack) {
                fprintf(stderr, "Pack file missing\n");
                exit(EXIT_FAILURE);
        }

        /* first line of the pack (printed as HOST), then one type byte */
        if (!(fgets(line, sizeof(line), pack))) {
                fprintf(stderr, "Pack file corrupt\n");
                exit(EXIT_FAILURE);
        }
        if ((a = getc(pack)) == EOF) {
                fprintf(stderr, "Pack file corrupt\n");
                exit(EXIT_FAILURE);
        }

        fprintf(stdout, "HOST=%s", line);
        fprintf(stdout, "TYPE=%c\n", a);
        fprintf(stdout, " from -> to : size path\n");

        while (true) {
                off_t size = 0;
                int segments = 0;
                uint32_t from = 0, to = 0; /* first start and last end seen */
                size_t len;

                if (!fgets(path, sizeof(path), pack))
                        break; /* done */

                /* strip the trailing newline, if there is one */
                len = strlen(path);
                if (len > 0 && path[len - 1] == '\n')
                        path[len - 1] = 0;

                while (true) {
                        if (fread(&b, sizeof(b), 1, pack) != 1 ||
                            fread(&c, sizeof(c), 1, pack) != 1) {
                                fprintf(stderr, "Pack file corrupt\n");
                                exit(EXIT_FAILURE);
                        }
                        /* terminates with a 0x0000 0x0000 */
                        if ((b == 0) && (c == 0))
                                break;
                        /* b and c are both 0 once the loop exits, so keep
                         * the range for printing in separate variables */
                        if (segments == 0)
                                from = b;
                        to = c;
                        size += (c - b);
                        segments++;
                }

                if (size == 0) {
                        /* no segments listed: fall back to the file size */
                        if (stat(path, &st) == 0)
                                /* oops, this only works if we can stat() the file! */
                                size = st.st_size;
                        else
                                fprintf(stderr, "Unable to determine size of \"%s\"\n", path);
                } else {
                        size *= 4096; /* segments are counted in 4k pages */
                }

                tsize += size;
                fprintf(stdout, "%4" PRIu32 " -> %4" PRIu32 " = %12lld (%d) : %s\n",
                        from, to, (long long) size, segments, path);
        }

        fprintf(stdout, "\nTOTAL=%lld\n", tsize);
        fclose(pack);
        exit(EXIT_SUCCESS);
}
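/* Quick hack: tune read_ahead_kb, then exec the real init. Compile with -static. */
#include <stdio.h>
#include <unistd.h>
#include <sys/mount.h>

void tune(const char *path)
{
        FILE *f;

        f = fopen(path, "w");
        if (!f)
                return; /* node not present on this system; skip it */
        fprintf(f, "64");
        fclose(f);
}

int main(void)
{
        char initp[256] = "/sbin/bootchartd";

        /* sysfs is not mounted yet this early in boot */
        mount("sysfs", "/sys", "sysfs", 0, NULL);
        tune("/sys/devices/virtual/bdi/8:0/read_ahead_kb");
        tune("/sys/devices/virtual/bdi/btrfs-1/read_ahead_kb");
        tune("/sys/devices/virtual/bdi/default/read_ahead_kb");
        execl(initp, initp, (char *) NULL);
        return 1; /* only reached if execl() failed */
}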
_______________________________________________
systemd-devel mailing list
http://lists.freedesktop.org/mailman/listinfo/systemd-devel