On 13.08.2012 20:18, Michael Hampicke wrote:
> On 13.08.2012 19:14, Florian Philipp wrote:
>> On 13.08.2012 16:52, Michael Mol wrote:
>>> On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke
>>> <mgehampi...@gmail.com> wrote:
>>>
>>>     Have you indexed your ext4 partition?
>>>
>>>     # tune2fs -O dir_index /dev/your_partition
>>>     # e2fsck -D /dev/your_partition
>>>
>>>     Hi, the dir_index is active. I guess that's why delete operations
>>>     take as long as they take (the index has to be updated every time)
>>>
>>>
>>> 1) Scan for files to remove
>>> 2) Disable index
>>> 3) Remove files
>>> 4) Enable index
>>>
>>> ?
>>>
>>> --
>>> :wq
>>
>> Other things to think about:
>>
>> 1. Play around with data=journal/writeback/ordered. IIRC, data=journal
>>    actually used to improve performance depending on the workload, as it
>>    delays random IO in favor of sequential IO (when updating the journal).
>>
>> 2. Increase the journal size.
>>
>> 3. Take a look at `man 1 chattr`, especially the 'T' attribute. Of
>>    course this only helps after re-allocating everything.
>>
>> 4. Try parallelizing. Ext4 requires relatively few locks nowadays
>>    (since 2.6.39, IIRC). For example:
>>    find $TOP_DIR -mindepth 1 -maxdepth 1 -print0 | \
>>      xargs -0 -n 1 -r -P 4 -I '{}' find '{}' -type f
>>
>> 5. Use a separate device for the journal.
>>
>> 6. Temporarily deactivate the journal with tune2fs, similar to MM's idea.
>>
>> Regards,
>> Florian Philipp
>>
>
> Trying out different journal modes/options was already on my list, but the
> man page section on chattr's 'T' attribute is an interesting read.
> Definitely worth trying.
>
> Parallelizing multiple finds was something I already did, but the only
> thing that increased was the IO wait :) But now, having read all the
> suggestions in this thread, I might try it again.
>
> A separate device for the journal is a good idea, but not possible atm
> (the machine is abroad in a data center).
>
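In case it saves you a round of man page reading, here is roughly what the
journal- and allocator-related suggestions above translate to. This is from
memory, so double-check the man pages before running anything;
/dev/your_partition and /path/to/cache_dir are placeholders, and the
filesystem has to be unmounted (or mounted read-only) for the journal
operations:

# make data=journal the default mount option stored in the superblock
# (alternatively just set data=... in fstab)
tune2fs -o journal_data /dev/your_partition

# resizing the journal means removing and recreating it (size in MB)
tune2fs -O ^has_journal /dev/your_partition
tune2fs -j -J size=512 /dev/your_partition

# mark the cache dir as the top of a directory hierarchy for the block
# allocator (only affects files allocated afterwards)
chattr +T /path/to/cache_dir

# run the big cleanup without a journal, then fsck and re-enable it
tune2fs -O ^has_journal /dev/your_partition
# ... mount, delete, unmount ...
e2fsck -f /dev/your_partition
tune2fs -j /dev/your_partition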
Something else I just remembered. I guess it doesn't help you with your
current problem, but it might come in handy when working with such large
cache dirs: I once wrote a script that sorts files by their starting
physical block. This improved reading them quite a bit (2 minutes instead
of 11 minutes for copying the portage tree).

It's a terrible kludge, will probably fail when crossing filesystem
boundaries or on a thousand other oddities, and requires root for some
very scary programs. I never had the time to finish an improved C version.
Anyway, maybe it helps you:

#!/bin/bash
#
# Example below copies /usr/portage/* to /tmp/portage.
# Replace /usr/portage with the input directory.
# Replace `cpio` with whatever does the actual work. Input is a
# \0-delimited file list.
#
FIFO=/tmp/$(uuidgen).fifo
mkfifo "$FIFO"

find /usr/portage -type f -fprintf "$FIFO" 'bmap <%i> 0\n' -print0 |
  tr '\n\0' '\0\n' |
  paste <( debugfs -f "$FIFO" /dev/mapper/vg-portage |
           grep -E '^[[:digit:]]+' ) - |
  sort -k 1,1n |
  cut -f 2- |
  tr '\n\0' '\0\n' |
  cpio -p0 --make-directories /tmp/portage/

unlink "$FIFO"
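For what it's worth, the pipeline works like this: find's -fprintf writes one
"bmap <inode> 0" command per file into the FIFO, debugfs answers each with the
physical block number of that file's first data block, paste glues those
numbers onto the \0-converted name list, sort orders the names by on-disk
position, and cut/tr turn it back into a \0-delimited list for cpio.

If you would rather not point debugfs at the raw device, roughly the same
ordering can be had with filefrag(8) from e2fsprogs (one filefrag call per
file, so it is not fast, but it skips the FIFO/debugfs dance). The following
is an untested sketch along those lines -- /var/cache/foo is a placeholder,
the awk parsing may need adjusting to your filefrag version, and the final
xargs is just an example of "whatever does the actual work":

#!/bin/bash
DIR=/var/cache/foo                  # placeholder for the big cache dir
find "$DIR" -type f -print0 |
while IFS= read -r -d '' f; do
    # physical block of the file's first extent; empty files get 0
    blk=$(filefrag -v "$f" 2>/dev/null |
          awk '$1 == "0:" { sub(/\.\./, "", $4); print $4; exit }')
    printf '%s\t%s\0' "${blk:-0}" "$f"
done |
tr '\n\0' '\0\n' |
sort -k 1,1n |
cut -f 2- |
tr '\n\0' '\0\n' |
xargs -0 cat > /dev/null            # read everything in disk order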