I'm starting a new thread since "reading an empty directory after reboot is very slow" is quite long, and the subject has changed a bit. Now that I think I've found the reason, this one may be short. :)
My disk is a traditional hard disk, not an SSD, and the filesystem is
ext3.

The problem is the following: I have a Perl script that reads a big
maildir directory, and when data from this directory are not in the
cache, this script is much slower than Mutt (without header cache).
One can assume that the files are very small (see below): < 100 bytes.
I've done a test with "grep -qr" (without matches so that all the
files are read), and it is also slow. In my script, if I don't read
the files at all, it is fast, so the problem comes from reading
these files.

Now, the test. First, I create a big maildir folder with:

------------------------------------------------------------
#!/bin/sh

set -e
dir=maildir-test
rm -rf "$dir"
maildirmake "$dir"
# Create 5000 tiny messages (3 header lines + a 1-line body) in cur/.
for i in `seq 5000`
do
  date=$((10000000+i))
  cat <<EOF > "$dir/cur/$date.1.host:2,S"
From: <a@b.invalid>
Subject: $date
Message-ID: <$d...@vinc17.org>

test $i
EOF
done
------------------------------------------------------------

ypig:~> sudo drop-caches && sh -c "time mutt -F /dev/null -f maildir-test"
0.11user 0.19system 0:02.41elapsed 12%CPU (0avgtext+0avgdata 9584maxresident)k
47728inputs+0outputs (18major+6360minor)pagefaults 0swaps

ypig:~> sudo drop-caches && sh -c "time grep -qr zzz maildir-test"
Command exited with non-zero status 1
0.03user 0.33system 0:34.63elapsed 1%CPU (0avgtext+0avgdata 4268maxresident)k
44336inputs+0outputs (4major+492minor)pagefaults 0swaps

ypig:~> sudo drop-caches && sh -c "time labels2arc maildir-test"
[...]
0.28user 0.26system 0:35.95elapsed 1%CPU (0avgtext+0avgdata 6308maxresident)k
48120inputs+0outputs (8major+758minor)pagefaults 0swaps

where drop-caches is a script that does the following:

  sync
  echo 3 > /proc/sys/vm/drop_caches

Note that the number of inputs is almost the same in these 3 cases,
but only Mutt is really fast (about 2.4 s elapsed, versus about 35 s
for grep and for my script).

So, I've looked at strace output. Here are some excerpts:

* With Mutt:

[...]
11777 14:47:11 open("maildir-test/cur/10000009.1.host:2,S", O_RDONLY) = 3
11777 14:47:11 fstat(3, {st_mode=S_IFREG|0644, st_size=80, ...}) = 0
11777 14:47:11 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1e529c0000
11777 14:47:11 lseek(3, 0, SEEK_CUR) = 0
11777 14:47:11 read(3, "From: <a@b.invalid>\nSubject: 100"..., 4096) = 80
11777 14:47:11 fstat(3, {st_mode=S_IFREG|0644, st_size=80, ...}) = 0
11777 14:47:11 close(3) = 0
11777 14:47:11 munmap(0x7f1e529c0000, 4096) = 0
11777 14:47:11 open("maildir-test/cur/10000010.1.host:2,S", O_RDONLY) = 3
11777 14:47:11 fstat(3, {st_mode=S_IFREG|0644, st_size=81, ...}) = 0
11777 14:47:11 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1e529c0000
11777 14:47:11 lseek(3, 0, SEEK_CUR) = 0
11777 14:47:11 read(3, "From: <a@b.invalid>\nSubject: 100"..., 4096) = 81
11777 14:47:11 fstat(3, {st_mode=S_IFREG|0644, st_size=81, ...}) = 0
11777 14:47:11 close(3) = 0
11777 14:47:11 munmap(0x7f1e529c0000, 4096) = 0
11777 14:47:11 write(1, "\33[76;25H10/5000 (0%)", 20) = 20
11777 14:47:11 open("maildir-test/cur/10000011.1.host:2,S", O_RDONLY) = 3
11777 14:47:11 fstat(3, {st_mode=S_IFREG|0644, st_size=81, ...}) = 0
11777 14:47:11 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1e529c0000
11777 14:47:11 lseek(3, 0, SEEK_CUR) = 0
11777 14:47:11 read(3, "From: <a@b.invalid>\nSubject: 100"..., 4096) = 81
11777 14:47:11 fstat(3, {st_mode=S_IFREG|0644, st_size=81, ...}) = 0
11777 14:47:11 close(3) = 0
11777 14:47:11 munmap(0x7f1e529c0000, 4096) = 0
[...]

* With grep:

[...]
11789 14:47:42 openat(6, "10000009.1.host:2,S", O_RDONLY|O_NOFOLLOW) = 3
11789 14:47:42 fstat(3, {st_mode=S_IFREG|0644, st_size=80, ...}) = 0
11789 14:47:42 ioctl(3, TCGETS, 0x7ffdcdff2d40) = -1 ENOTTY (Inappropriate ioctl for device)
11789 14:47:42 read(3, "From: <a@b.invalid>\nSubject: 100"..., 32768) = 80
11789 14:47:42 read(3, "", 32768) = 0
11789 14:47:42 close(3) = 0
11789 14:47:42 openat(6, "10003090.1.host:2,S", O_RDONLY|O_NOFOLLOW) = 3
11789 14:47:42 fstat(3, {st_mode=S_IFREG|0644, st_size=83, ...}) = 0
11789 14:47:42 ioctl(3, TCGETS, 0x7ffdcdff2d40) = -1 ENOTTY (Inappropriate ioctl for device)
11789 14:47:42 read(3, "From: <a@b.invalid>\nSubject: 100"..., 32768) = 83
11789 14:47:42 read(3, "", 32768) = 0
11789 14:47:42 close(3) = 0
11789 14:47:42 openat(6, "10004692.1.host:2,S", O_RDONLY|O_NOFOLLOW) = 3
11789 14:47:42 fstat(3, {st_mode=S_IFREG|0644, st_size=83, ...}) = 0
11789 14:47:42 ioctl(3, TCGETS, 0x7ffdcdff2d40) = -1 ENOTTY (Inappropriate ioctl for device)
11789 14:47:42 read(3, "From: <a@b.invalid>\nSubject: 100"..., 32768) = 83
11789 14:47:42 read(3, "", 32768) = 0
11789 14:47:42 close(3) = 0
[...]

* With my script:

[...]
11954 14:50:52 open("maildir-test/cur/10000009.1.host:2,S", O_RDONLY) = 5
11954 14:50:52 ioctl(5, TCGETS, 0x7fff5e65d060) = -1 ENOTTY (Inappropriate ioctl for device)
11954 14:50:52 lseek(5, 0, SEEK_CUR) = 0
11954 14:50:52 fstat(5, {st_mode=S_IFREG|0644, st_size=80, ...}) = 0
11954 14:50:52 fcntl(5, F_SETFD, FD_CLOEXEC) = 0
11954 14:50:52 read(5, "From: <a@b.invalid>\nSubject: 100"..., 8192) = 80
11954 14:50:52 lseek(5, 72, SEEK_SET) = 72
11954 14:50:52 lseek(5, 0, SEEK_CUR) = 72
11954 14:50:52 close(5) = 0
11954 14:50:52 open("maildir-test/cur/10003090.1.host:2,S", O_RDONLY) = 5
11954 14:50:52 ioctl(5, TCGETS, 0x7fff5e65d060) = -1 ENOTTY (Inappropriate ioctl for device)
11954 14:50:52 lseek(5, 0, SEEK_CUR) = 0
11954 14:50:52 fstat(5, {st_mode=S_IFREG|0644, st_size=83, ...}) = 0
11954 14:50:52 fcntl(5, F_SETFD, FD_CLOEXEC) = 0
11954 14:50:52 read(5, "From: <a@b.invalid>\nSubject: 100"..., 8192) = 83
11954 14:50:52 lseek(5, 72, SEEK_SET) = 72
11954 14:50:52 lseek(5, 0, SEEK_CUR) = 72
11954 14:50:52 close(5) = 0
11954 14:50:52 open("maildir-test/cur/10004692.1.host:2,S", O_RDONLY) = 5
11954 14:50:52 ioctl(5, TCGETS, 0x7fff5e65d060) = -1 ENOTTY (Inappropriate ioctl for device)
11954 14:50:52 lseek(5, 0, SEEK_CUR) = 0
11954 14:50:52 fstat(5, {st_mode=S_IFREG|0644, st_size=83, ...}) = 0
11954 14:50:52 fcntl(5, F_SETFD, FD_CLOEXEC) = 0
11954 14:50:52 read(5, "From: <a@b.invalid>\nSubject: 100"..., 8192) = 83
11954 14:50:52 lseek(5, 72, SEEK_SET) = 72
11954 14:50:52 lseek(5, 0, SEEK_CUR) = 72
11954 14:50:52 close(5) = 0
[...]

One can see an obvious difference: grep and my script both read the
files in the directory order (I know that this is the case with my
script, and grep's behavior is identical), which can be regarded as
random due to the use of a hash (see the other thread). Mutt uses a
different order, and after a look at its mh.c source file, I can see
that it sorts the files by inode number (see the
maildir_delayed_parsing function).
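
For illustration, the inode-order idea can be sketched as follows. This
is only a minimal Python sketch under stated assumptions, not Mutt's
actual code and not the Perl script discussed above; the function names
paths_in_inode_order and read_all are hypothetical. It lists the
directory once, sorts the entries by the inode number reported by the
directory listing, and only then opens the files.

```python
import os


def paths_in_inode_order(directory):
    """Return the paths of the regular files in `directory`, sorted by inode.

    os.scandir() exposes the inode number from the directory entry itself,
    so on Linux no per-file stat() is normally needed just to sort.
    (Assumption: the filesystem fills in the entry type; otherwise
    is_file() falls back to a stat() call per entry.)
    """
    with os.scandir(directory) as it:
        entries = [
            (entry.inode(), entry.path)
            for entry in it
            if entry.is_file(follow_symlinks=False)
        ]
    entries.sort()  # tuples sort by inode number first
    return [path for _, path in entries]


def read_all(directory):
    """Read every file, visiting them in inode order, not readdir order."""
    contents = {}
    for path in paths_in_inode_order(directory):
        with open(path, "rb") as f:
            contents[path] = f.read()
    return contents
```

Whether this actually helps depends on the filesystem and on how the
files were created; something like read_all("maildir-test/cur") after
dropping the caches would be the way to check it on the test folder.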

IMHO, sorting by inode number is a good choice because, especially in
big directories, files with close inode numbers are likely to be close
to each other on the disk, so that reading them in this order should
need far fewer seeks. I think that this is the reason why Mutt is much
faster.

Now I wonder whether the use of the hash by ext3 is a good idea...
Alternatively, I suppose that an SSD could improve things.

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

Archive: https://lists.debian.org/20150424135238.ga12...@ypig.lip.ens-lyon.fr