http://www.gossamer-threads.com/lists/linux/kernel/434256

[RFC] extents,delayed allocation,mballoc for ext3

these patches implement several features for ext3:
- extents
- multiblock allocator
- delayed allocation (a.k.a. allocation on flush)


extents
=======
it's just a way to store an inode's block map as well-known triples
[logical block; phys. block; length]. all the extents are stored in a
B+tree. the code is split into two parts:
1) generic extents support
   implements primitives such as lookup, insert, remove and walk
2) VFS part
   implements the ->getblock() and ->truncate() methods
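to make the triple idea concrete, here is a minimal sketch of the lookup primitive over one sorted leaf of extents. the struct extent layout, the extent_lookup() name and the use of 0 to mean "hole" are assumptions for illustration, not the on-disk format used by the patches.

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical in-memory extent: [logical block; phys. block; length] */
struct extent {
    uint32_t lblock; /* first logical block covered by this extent */
    uint32_t pblock; /* physical block that lblock maps to */
    uint32_t len;    /* number of contiguous blocks */
};

/*
 * binary search over extents sorted by lblock, as in one B+tree leaf.
 * returns the physical block backing logical block lb, or 0 for a hole
 * (0 as the "no mapping" value is an assumption of this sketch).
 */
uint32_t extent_lookup(const struct extent *ex, int n, uint32_t lb)
{
    int lo = 0, hi = n - 1;

    assert(n >= 0);
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;

        if (lb < ex[mid].lblock)
            hi = mid - 1;             /* target lies in an earlier extent */
        else if (lb >= ex[mid].lblock + ex[mid].len)
            lo = mid + 1;             /* target lies in a later extent */
        else
            return ex[mid].pblock + (lb - ex[mid].lblock);
    }
    return 0;                         /* no extent covers lb: a hole */
}
```

a real tree adds index levels above the leaves; a lookup would descend them with a similar binary search before reaching a leaf like this one.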

multiblock allocator
===================
the larger the extents, the better. the reasonable approach is to ask the
block allocator for several blocks at once. it is possible to scan bitmaps,
but such scanning isn't a very good method. so here is mballoc: a buddy
algorithm plus the ability to find contiguous buddies quickly. mballoc is
backward-compatible: buddies are stored on disk as a regular file (a
temporary solution until fsck support is ready) and regenerated at mount
time. also, with the existing block-at-a-time allocator it's impossible to
write at a very high rate (several hundred MB/s); the multiblock allocator
solves this issue.
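a rough sketch of why buddies help: each level k records which aligned runs of 2^k blocks are entirely free, so a request for 2^k contiguous blocks becomes a scan of one small level instead of a bit-by-bit bitmap scan. this is a simplified buddy-style summary (a classic buddy allocator tracks only maximal free chunks), and the tiny 16-block group, struct buddy, buddy_build() and buddy_find() are all hypothetical, not mballoc's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define GROUP_BLOCKS 16 /* hypothetical tiny group; must be 2^MAX_ORDER */
#define MAX_ORDER 4

/*
 * free_map[k][i] is true when the aligned chunk of 2^k blocks starting
 * at block (i << k) is entirely free. order 0 is the plain block bitmap
 * with true meaning "free".
 */
struct buddy {
    bool free_map[MAX_ORDER + 1][GROUP_BLOCKS];
};

/* rebuild every order from the order-0 bitmap (cf. regeneration at mount) */
void buddy_build(struct buddy *b, const bool *block_free)
{
    assert(GROUP_BLOCKS == 1 << MAX_ORDER);
    memcpy(b->free_map[0], block_free, sizeof(bool) * GROUP_BLOCKS);
    for (int k = 1; k <= MAX_ORDER; k++) {
        int chunks = GROUP_BLOCKS >> k;

        /* a chunk is free only if both of its half-size buddies are free */
        for (int i = 0; i < chunks; i++)
            b->free_map[k][i] = b->free_map[k - 1][2 * i] &&
                                b->free_map[k - 1][2 * i + 1];
    }
}

/* return the first block of a free aligned run of 2^order blocks, or -1 */
int buddy_find(const struct buddy *b, int order)
{
    assert(order >= 0 && order <= MAX_ORDER);
    for (int i = 0; i < (GROUP_BLOCKS >> order); i++)
        if (b->free_map[order][i])
            return i << order;
    return -1;
}
```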

delayed allocation
==================
this is a ->writepages() implementation that exploits the very nice tagged
radix tree. it finds contiguous ranges of dirty pages and asks the extents
code to walk over the specified ranges of blocks. the extents code calls a
given callback routine that allocates blocks for the listed pages, builds
bios and submits them to disk.
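the run-finding step can be sketched as follows, assuming the dirty page indices have already been collected in ascending order (e.g. from a lookup on the dirty-tagged radix tree). for_each_dirty_run(), run_fn, struct run_log and record_run() are hypothetical names; record_run() only records runs, where the real callback would allocate blocks, build bios and submit them.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical callback type: receives one contiguous run of pages */
typedef void (*run_fn)(unsigned long first, unsigned long count, void *ctx);

/*
 * walk sorted dirty page indices and hand each maximal contiguous run
 * to fn. returns the number of runs found.
 */
size_t for_each_dirty_run(const unsigned long *idx, size_t n,
                          run_fn fn, void *ctx)
{
    size_t runs = 0, i = 0;

    assert(idx != NULL || n == 0);
    while (i < n) {
        size_t j = i + 1;

        /* extend the run while the next index follows immediately */
        while (j < n && idx[j] == idx[j - 1] + 1)
            j++;
        fn(idx[i], (unsigned long)(j - i), ctx);
        runs++;
        i = j;
    }
    return runs;
}

/* example callback that only logs runs (up to 16) instead of doing I/O */
struct run_log {
    unsigned long first[16], count[16];
    size_t n;
};

void record_run(unsigned long first, unsigned long count, void *ctx)
{
    struct run_log *rl = ctx;

    if (rl->n < 16) {
        rl->first[rl->n] = first;
        rl->count[rl->n] = count;
        rl->n++;
    }
}
```

handing whole runs to one callback is what lets the allocator be asked for many blocks at once instead of one per page.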


todo
====
1) blocks must be reserved to avoid -ENOSPC upon writeback
2) blocks must become available for allocation only after commit
3) data="" support
4) blocksize < PAGE_CACHE_SIZE support
5) an option for the allocator to look for +N blocks if the goal is busy
6) probably preallocation for slowly-growing files
7) allocation policy tuning
8) regenerate buddies only after a crash


NOTE: don't try to use it in production. all the patches (probably
excluding extents) are pre-pre-alpha. because of their size, I put the
patches at ftp://ftp.clusterfs.com/pub/people/alex/2.6.4-mm2/


benchmarks (hardware: 2-way PIII-1000MHz/512MB/old SCSI HDD, 20MB/s)
====================================================================

I ran dd to write a specified amount of data and measured the time ext3
spent in the allocator via get_cycles(). all the bitmaps were preloaded.