On Mon, 10.09.12 15:56, Roland Schatz ([email protected]) wrote:
> On 10.09.2012 09:57, Lennart Poettering wrote:
> > Well, I am not aware of anybody having done measurements recently about this. But I am not aware of anybody running into scalability issues so far.
>
> I'm able to share a data point here, see attachment.
>
> TLDR: Outputting the last ten journal entries takes 30 seconds on my server.
>
> I have not reported this so far, because I'm not really sure whom to blame. Current suspects include the journal, btrfs and hyper-v ;)
>
> Some details about my setup: I'm running Arch Linux on a virtual server, running on hyper-v on some Windows host (outside of my control). I'm currently using systemd 189, but the journal files are much older. journald.conf is empty (everything commented, i.e. the default). The journal logging was activated in February (note the strange first output line that says the journal ends on 28 Feb, while still containing entries up to right now). Since then, I have not removed/archived any journal files from the system.
>
> After issuing journalctl a few times, the time goes down significantly, even for larger values of -n (e.g. the first try with -n10 takes 30 secs, the second -n10 takes 18 secs, the third -n10 takes 0.2 secs; after that, even -n100 takes 0.2 secs, -n500 takes 0.8 secs and so on). Rebooting or simply waiting a day or so makes it slow again.
>
> Btrfs and fragmentation may be an issue: defragmenting the journal files seems to make things better. But it's hard to be sure whether this is really the problem, because defrag could just be pulling the journal files into the fs cache, and therefore have a similar effect to the repeated journalctl...
>
> I'm not able to reproduce the problem by copying the whole journal to another system and running journalctl there. I'm also seeing the getting-faster effect, but it starts at 2 secs and then goes down to 0.2 secs. Also, I'm not seeing any difference between btrfs and ext4, so maybe fragmentation really is the issue, although I don't really understand how a day of logging could fragment the log files that badly, even on a COW filesystem. Yes, there still is the indirection of the virtual drive, but I have a fixed-size disk file residing on an ntfs drive, so there shouldn't be any noticeable additional fragmentation coming from that setup.
>
> I'm not sure what I can do to investigate this further. For me this is a low-priority problem, since the system is running stable and the problem goes away after a few journalctl runs. But if you have anything you'd like me to try, I'd be happy to assist.

Hmm, these are definitely weird results. A few notes:

Appending things to the journal is done by adding a few data objects to the end of the file and then updating a couple of pointers in the front. This is not an ideal access pattern on rotating media, but my educated guess would be that this is not your problem (not your only one, at least...), as the respective tables are few and should be in memory quickly (that said, I didn't do any precise IO pattern measurements for this). If the access pattern turns out to be too bad we could certainly improve it (for example by delay-writing the updated pointers).

Now, I am not sure what such an access pattern means for COW file systems such as btrfs. Might be worth playing around with COW for that, for example by removing the COW flag from the generated files (via FS_NOCOW_FL in FS_IOC_SETFLAGS, which you need to set before the first write to the file, i.e. you need to patch journald for that).
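For illustration, here is a minimal sketch of what such a patch could look like. This is not journald's actual code and the open_journal_nocow() helper is made up; it just shows setting FS_NOCOW_FL via the FS_IOC_GETFLAGS/FS_IOC_SETFLAGS ioctls on a freshly created file, before anything has been written to it:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/* Hypothetical helper, not journald code: create a journal file and
 * mark it no-COW before the first write, so btrfs keeps overwriting
 * the same blocks instead of relocating them on every update. */
int open_journal_nocow(const char *path) {
        int fd, flags = 0;

        fd = open(path, O_RDWR|O_CREAT|O_CLOEXEC, 0640);
        if (fd < 0)
                return -errno;

        /* FS_NOCOW_FL only takes effect while the file is still
         * empty, hence set it right after open(). Failure is
         * non-fatal, since not all file systems support the flag. */
        if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
                fprintf(stderr, "Failed to read inode flags: %m\n");
        else {
                flags |= FS_NOCOW_FL;
                if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
                        fprintf(stderr, "Failed to set FS_NOCOW_FL: %m\n");
        }

        return fd;
}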
So much for the write access pattern. Most likely that's not what is hurting you much anyway; the read access pattern is. And that's usually much more chaotic (and hence slow on rotating disks) than the write access pattern, as we write journal fields only once and then reference them from all entries that use them. Which might mean that we end up jumping around on disk for each entry we try to read as we iterate through its fields. But this could be improved easily too: for example, since the order of fields is undefined, we could simply sort them by offset, so that we read them in their order on disk.

It would be really good to get some hard data about access patterns before we optimize things, though...
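To make that idea a bit more concrete, here is a rough sketch (hypothetical helper, not the real journal reader): collect the file offsets of the data objects an entry references, sort them, then read them front to back so the disk head sweeps forward once instead of seeking back and forth for every field:

#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Sort helper for 64-bit file offsets. */
static int compare_offsets(const void *a, const void *b) {
        uint64_t x = *(const uint64_t*) a, y = *(const uint64_t*) b;
        return x < y ? -1 : x > y ? 1 : 0;
}

int read_entry_fields(int fd, uint64_t *offsets, size_t n) {
        uint8_t header[64];

        /* The order of fields within an entry is undefined, so
         * reordering the reads by physical position is fair game. */
        qsort(offsets, n, sizeof(uint64_t), compare_offsets);

        for (size_t i = 0; i < n; i++)
                /* Stand-in for actually parsing the data object:
                 * just pull in the first bytes at that offset. */
                if (pread(fd, header, sizeof(header), (off_t) offsets[i]) < 0)
                        return -1;

        return 0;
}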
Lennart

--
Lennart Poettering - Red Hat, Inc.