http://www.gossamer-threads.com/lists/linux/kernel/434256
- extents
- multiblock allocator
- delayed allocation (a.k.a. allocation on flush)

extents
=======
it's just a way to store an inode's block map as well-known triples [logical block; phys. block; length]. all the extents are stored in a B+Tree. the code is split in two parts:
1) generic extents support: implements primitives like lookup, insert, remove, walk
2) VFS part: implements the ->getblock() and ->truncate() methods

multiblock allocator
====================
the larger the extents, the better. the reasonable way to get large extents is to ask the block allocator for several blocks at once. it is possible to scan bitmaps for this, but such scanning isn't a very good method. so, here is mballoc: a buddy algorithm plus a fast way to find contiguous buddies. mballoc is backward-compatible: buddies are stored on disk as a regular file (a temporary solution until fsck support is ready) and regenerated at mount time. also, with the existing block-at-once allocator it's impossible to write at a very high rate (several hundred MB a sec). the multiblock allocator solves this issue.

delayed allocation
==================
this is a ->writepages() implementation that exploits the very nice tagged radix tree. it finds contiguous spaces and asks the extents code to walk over the specified ranges of blocks. the extents code calls a given callback routine that allocates blocks for the listed pages, cooks up bios and submits them to disk.

todo
====
1) blocks must be reserved to avoid -ENOSPC upon writeback
2) blocks must be available for allocation after committing only
3) data="" support
4) blocksize < PAGE_CACHE_SIZE support
5) option to allocator to look for +N blocks if goal is busy
6) probably preallocation for slowly-growing files
7) allocation policy tuning
8) regenerate buddies in crash case only

NOTE: don't try to use it in production. all the patches (probably excluding extents) are pre-pre-alpha.
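to make the [logical block; phys. block; length] triple concrete, here is a minimal userspace sketch of a lookup over such triples. it is not taken from the patches: the names ext_triple, ext_lookup, EXT_NO_BLOCK and example_map are hypothetical, and a sorted array with binary search stands in for the B+Tree the patches use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* one extent: the [logical block; phys. block; length] triple */
struct ext_triple {
    uint32_t logical;   /* first logical block covered */
    uint64_t physical;  /* first physical block on disk */
    uint32_t len;       /* number of contiguous blocks */
};

/* hypothetical marker meaning "no block mapped here" (a hole) */
#define EXT_NO_BLOCK UINT64_MAX

/*
 * map a logical block to its physical block.
 * assumption: exts[] is sorted by .logical and extents do not overlap.
 * the sorted array is only a stand-in for a real tree lookup.
 */
uint64_t ext_lookup(const struct ext_triple *exts, size_t n, uint32_t lblk)
{
    size_t lo = 0, hi = n;

    assert(exts != NULL || n == 0);

    /* binary search for the first extent that starts after lblk */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (exts[mid].logical <= lblk)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return EXT_NO_BLOCK;    /* lblk lies before the first extent */

    const struct ext_triple *e = &exts[lo - 1];

    if (lblk - e->logical >= e->len)
        return EXT_NO_BLOCK;    /* lblk falls in a hole after e */

    return e->physical + (lblk - e->logical);
}

/* made-up block map: three extents with a hole between blocks 12 and 99 */
const struct ext_triple example_map[] = {
    {   0, 1000,  8 },
    {   8, 5000,  4 },
    { 100, 7000, 16 },
};
```

with this made-up map, ext_lookup(example_map, 3, 9) resolves through the second triple, while logical block 12 is not covered by any triple and is reported as a hole. three triples describe 28 blocks here, which is the point of preferring large extents.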
because of size I put patches in ftp://ftp.clusterfs.com/pub/people/alex/2.6.4-mm2/

benchmarks (hardware: 2way iPIII-1000Mhz/512MB/old scsi hdd (20MB/s))
=====================================================================
I ran dd to write a specified amount of data and measured the time ext3 spent in the allocator via get_cycles(). all the bitmaps were preloaded.
- [linuxkernelnewbies] [RFC] extents,delayed allocation,mballoc f... Peter Teoh
