On 13.08.2012 20:18, Michael Hampicke wrote:
> On 13.08.2012 19:14, Florian Philipp wrote:
>> On 13.08.2012 16:52, Michael Mol wrote:
>>> On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke
>>> <mgehampi...@gmail.com <mailto:mgehampi...@gmail.com>> wrote:
>>>
>>>         Have you indexed your ext4 partition?
>>>
>>>         # tune2fs -O dir_index /dev/your_partition
>>>         # e2fsck -D /dev/your_partition
>>>
>>>     Hi, the dir_index is active. I guess that's why delete operations
>>>     take as long as they do (the index has to be updated on every unlink)
>>>
>>>
>>> 1) Scan for files to remove
>>> 2) Disable index
>>> 3) Remove files
>>> 4) Enable index
>>>
>>> ?
>>>
>>> -- 
>>> :wq
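
If you go that route, roughly this sequence should do it (untested sketch;
device name, mount point and the find criteria are placeholders, and IIRC
the feature changes want the filesystem unmounted):

find /mnt/cache -type f -mtime +30 -print0 > /tmp/delete.list  # 1) scan
umount /mnt/cache
tune2fs -O ^dir_index /dev/your_partition                      # 2) drop the feature flag
e2fsck -fD /dev/your_partition                                 #    rewrite the dirs without htrees
mount /dev/your_partition /mnt/cache
xargs -0 rm -f < /tmp/delete.list                              # 3) remove files
umount /mnt/cache
tune2fs -O dir_index /dev/your_partition                       # 4) re-enable ...
e2fsck -fD /dev/your_partition                                 #    ... and rebuild the index
mount /dev/your_partition /mnt/cache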
>>
>> Other things to think about:
>>
>> 1. Play around with data=journal/writeback/ordered. IIRC, data=journal
>> actually used to improve performance depending on the workload as it
>> delays random IO in favor of sequential IO (when updating the journal).
>>
>> 2. Increase the journal size.
>>
>> 3. Take a look at `man 1 chattr`. Especially the 'T' attribute. Of
>> course this only helps after re-allocating everything.
>>
>> 4. Try parallelizing. Ext4 requires relatively few locks nowadays (since
>> 2.6.39 IIRC). For example:
>> find $TOP_DIR -mindepth 1 -maxdepth 1 -print0 | \
>> xargs -0 -r -P 4 -I '{}' find '{}' -type f
>>
>> 5. Use a separate device for the journal.
>>
>> 6. Temporarily deactivate the journal with tune2fs similar to MM's idea.
>>
>> Regards,
>> Florian Philipp
>>
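
For points 1-3 and 6, the knobs would look roughly like this (untested;
device name and mount point are placeholders, and removing or recreating
the journal needs the filesystem unmounted):

tune2fs -o journal_data /dev/your_partition   # 1. make data=journal the default mount option
                                              #    (or set data=... in fstab instead)
tune2fs -O ^has_journal /dev/your_partition   # 6. drop the journal entirely ...
tune2fs -j -J size=400 /dev/your_partition    # 2. ... or recreate it with a 400 MiB one
chattr +T /mnt/cache                          # 3. mark the dir as a top of hierarchy for the
                                              #    block allocator (affects new allocations only)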
> 
> Trying out different journal modes/options was already on my list, but the
> manpage on chattr regarding the T attribute is an interesting read.
> Definitely worth trying.
> 
> Parallelizing multiple finds is something I had already tried, but the only
> thing that increased was the IO wait :) But now, having read all the
> suggestions in this thread, I might try it again.
> 
> Separate device for the journal is a good idea, but not possible atm
> (the machine is in a data center abroad).
> 

Something else I just remembered. I guess it doesn't help with your
current problem, but it might come in handy when working with such large
cache dirs: I once wrote a script that sorts files by their starting
physical block. This sped up reading them quite a bit (2 minutes
instead of 11 minutes for copying the portage tree).

It's a terrible kludge, will probably fail when crossing FS boundaries
or on a thousand other oddities, and it requires root for some very scary
programs. I never had the time to finish an improved C version. Anyway,
maybe it helps you:

#!/bin/bash
#
# Example below copies /usr/portage/* to /tmp/portage.
# Replace /usr/portage with the input directory.
# Replace `cpio` with whatever does the actual work. Input is a
# \0-delimited file list.
#
FIFO=/tmp/$(uuidgen).fifo
mkfifo "$FIFO"
# find writes one debugfs command ("bmap <inode> 0") per file into the FIFO
# and the matching file names, \0-delimited, to stdout.
find /usr/portage -type f -fprintf "$FIFO" 'bmap <%i> 0\n' -print0 |
# Swap \n and \0 so the file list is line-oriented for paste/sort.
tr '\n\0' '\0\n' |
# debugfs resolves each inode's first physical block; paste prepends that
# block number to the corresponding file name.
paste <(
  debugfs -f "$FIFO" /dev/mapper/vg-portage |
  grep -E '^[[:digit:]]+'
) - |
# Sort by physical block, drop the block column, swap \n and \0 back.
sort -k 1,1n |
cut -f 2- |
tr '\n\0' '\0\n' |
# cpio in copy-pass mode reads the \0-delimited list and does the copying.
cpio -p0 --make-directories /tmp/portage/
unlink "$FIFO"
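
For your cleanup case you could presumably swap the cpio stage for the
actual delete, e.g. `xargs -0 rm -f` (untested); whether unlinking in
on-disk order helps as much as reading does, I don't know.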
