Lots of good info here. I've copied Matt's reply to the project page on the open-zfs wiki: http://open-zfs.org/wiki/Projects#Sorted_Scrub
On Sat, Jul 9, 2016 at 2:24 PM, Matthew Ahrens <[email protected]> wrote:
> We had an intern work on "sorted scrub" last year. Essentially the idea
> was to read the metadata to gather into memory all the BP's that need to be
> scrubbed, sort them by DVA (i.e. offset on disk) and then issue the scrub
> i/os in that sorted order. However, memory can't hold all of the BP's, so
> we do multiple passes over the metadata, each pass gathering the next chunk
> of BP's. This code is implemented and seems to work but probably needs
> some more testing and code cleanup.
>
> One of the downsides of that approach is having to do multiple passes over
> the metadata if it doesn't all fit in memory (which it typically does
> not). In some circumstances, this is worth it, but in others not so much.
> To improve on that, we would like to do just one pass over the metadata to
> find all the block pointers. Rather than storing the BP's sorted in
> memory, we would store them on disk, but only roughly sorted. There are
> several ways we could do the sorting, which is one of the issues that makes
> this problem interesting.
>
> We could divide each top-level vdev into chunks (like metaslabs, but
> probably a different number of them) and for each chunk have an on-disk
> list of BP's in that chunk that need to be scrubbed/resilvered. When we
> find a BP, we would append it to the appropriate list. Once we have
> traversed all the metadata to find all the BP's, we would load one chunk's
> list of BP's into memory, sort it, and then issue the resilver i/os in
> sorted order.
>
> As an alternative, it might be better to accumulate as many BP's as fit in
> memory, sort them, and then write that sorted list to disk. Then remove
> those BP's from memory and start filling memory again, write that list,
> etc. Then read all the sorted lists in parallel to do a merge sort. This
> has the advantage that we do not need to append to lots of lists as we are
> traversing the metadata.
> Instead we have to read from lots of lists as we
> do the scrubs, but this should be more efficient. We also don't have to
> determine beforehand how many chunks to divide each vdev into.
>
> If you'd like to continue working on sorted scrub along these lines, let
> me know.
>
> --matt
>
>
> On Sat, Jul 9, 2016 at 7:10 AM, Gvozden Neskovic <[email protected]>
> wrote:
>
>> Dear OpenZFS developers,
>>
>> Since the SIMD RAID-Z code was merged into ZoL [1], I have started looking
>> into the rest of the scrub/resilver code path.
>> I've found some existing specs and ideas about how to make the process
>> friendlier to rotational drives [2][3][4][5].
>> What I've gathered from these is that scrub should be split into metadata
>> and data traversal phases. As I'm new to ZFS, I've made a quick prototype
>> that simulates a large elevator, using an AVL tree to sort blocks by DVA
>> offset [6]. It's probably broken in more than a few ways, but this is just
>> a quick hack to get a grasp of the code. The solution turned out similar to
>> the 'ASYNC_DESTROY' feature, so I'm wondering whether this might be a
>> direction to take.
>>
>> At this stage, I would appreciate any input on how to proceed with this
>> project. If you're a core dev and would like to provide any kind of
>> mentorship, or are willing to answer some questions from time to time,
>> please let me know.
>> Or, if there's a perfect solution for this just waiting to be
>> implemented, even better.
>> For starters, pointers like "read this article" or "make sure you
>> understand this piece of code" would also be very helpful.
>>
>> Regards,
>>
>> [1] https://github.com/zfsonlinux/zfs/commit/ab9f4b0b824ab4cc64a4fa382c037f4154de12d6
>> [2] https://blogs.oracle.com/roch/entry/sequential_resilvering
>> [3] http://wiki.old.lustre.org/images/f/ff/Rebuild_performance-2009-06-15.pdf
>> [4] https://blogs.oracle.com/ahrens/entry/new_scrub_code
>> [5] http://open-zfs.org/wiki/Projects#Periodic_Data_Validation
>> [6] https://github.com/ironMann/zfs/commit/9a2ec765d2afc38ec76393dd694216fae0221443
>>
>
> *openzfs-developer* | Archives <https://www.listbox.com/member/archive/274414/=now>
> <https://www.listbox.com/member/archive/rss/274414/28015357-32dd7c48> | Modify
> <https://www.listbox.com/member/?&> Your Subscription <http://www.listbox.com>
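A minimal Python sketch of the multi-pass approach Matt describes for the intern's sorted scrub, to show why memory stays bounded while the number of metadata passes grows with the number of BPs. The reply does not say how a pass picks "the next chunk of BP's", so this assumes each pass keeps the mem_limit smallest BPs ordered after the last one issued. BP, traverse() and multipass_sorted_scrub() are hypothetical names, and a BP is reduced to a (vdev, offset) pair.

```python
import heapq
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical simplified block pointer: (vdev id, offset). A real blkptr_t
# carries up to three DVAs plus size, checksum, birth txg, etc.
BP = Tuple[int, int]


def multipass_sorted_scrub(
    traverse: Callable[[], Iterable[BP]], mem_limit: int
) -> Tuple[List[BP], int]:
    """Issue BPs in DVA order while holding at most mem_limit BPs per pass.

    Assumption (not from the thread): each pass re-walks all metadata via
    traverse() and keeps the mem_limit smallest BPs after the last one issued.
    """
    issued: List[BP] = []
    passes = 0
    cursor: Optional[BP] = None
    while True:
        passes += 1
        # heapq.nsmallest keeps only mem_limit candidates in its heap.
        batch = heapq.nsmallest(
            mem_limit,
            (bp for bp in traverse() if cursor is None or bp > cursor),
        )
        issued.extend(batch)  # "issue" this pass's scrub I/Os in sorted order
        if len(batch) < mem_limit:
            return issued, passes  # nothing left beyond this batch
        cursor = batch[-1]


bps = [(0, 700), (0, 100), (0, 400), (0, 200), (0, 50)]
order, passes = multipass_sorted_scrub(lambda: iter(bps), mem_limit=2)
print([off for _, off in order], passes)  # [50, 100, 200, 400, 700] 3
```

With five BPs and room for two, the metadata is walked three times; that repeated traversal is the cost Matt points to when the BPs don't fit in memory.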
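Matt's first single-pass idea (per-chunk lists on each top-level vdev) can be sketched as below, assuming each BP has a single DVA and that chunks are fixed-size offset ranges. bucket_by_chunk(), issue_order(), chunk_size and the BP type are hypothetical, and in-memory Python lists stand in for the on-disk per-chunk lists. Only one chunk's list has to be sorted in memory at a time.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class BP(NamedTuple):
    """Hypothetical stand-in for a block pointer with a single DVA."""
    vdev: int
    offset: int
    size: int


def bucket_by_chunk(
    bps: Iterable[BP], chunk_size: int
) -> Dict[Tuple[int, int], List[BP]]:
    """One pass over the metadata: append each BP to its (vdev, chunk) list.

    Assumption: chunks are fixed offset ranges of chunk_size bytes. In the
    proposal these lists live on disk; Python lists stand in here.
    """
    chunks: Dict[Tuple[int, int], List[BP]] = defaultdict(list)
    for bp in bps:
        chunks[(bp.vdev, bp.offset // chunk_size)].append(bp)
    return chunks


def issue_order(chunks: Dict[Tuple[int, int], List[BP]]) -> List[BP]:
    """Load one chunk's list at a time, sort it by offset, and issue in order."""
    order: List[BP] = []
    for key in sorted(chunks):  # chunks in (vdev, chunk index) order
        order.extend(sorted(chunks[key], key=lambda bp: bp.offset))
    return order


bps = [BP(0, 900, 128), BP(0, 100, 128), BP(1, 50, 128), BP(0, 300, 128)]
chunks = bucket_by_chunk(bps, chunk_size=512)
order = issue_order(chunks)
print([(bp.vdev, bp.offset) for bp in order])
# [(0, 100), (0, 300), (0, 900), (1, 50)]
```

Here chunk_size fixes the number of chunks before traversal starts, which is exactly the up-front decision Matt notes the merge-sort alternative avoids.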
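The merge-sort alternative in Matt's reply can be sketched the same way: one metadata traversal produces memory-sized sorted runs, and Python's heapq.merge() does the k-way merge that "read all the sorted lists in parallel" amounts to. make_sorted_runs(), merged_issue_order(), mem_limit and the tuple-shaped BP are hypothetical, and Python lists stand in for runs written to disk.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical simplified block pointer: (vdev id, offset, size). Tuples
# compare lexicographically, so sorting them orders BPs by DVA.
BP = Tuple[int, int, int]


def make_sorted_runs(bps: Iterable[BP], mem_limit: int) -> List[List[BP]]:
    """Single metadata pass: fill memory, sort, "write" the run, repeat.

    Assumption: mem_limit counts BPs; each run would be written to disk in
    the proposal, Python lists stand in here.
    """
    runs: List[List[BP]] = []
    buf: List[BP] = []
    for bp in bps:
        buf.append(bp)
        if len(buf) == mem_limit:
            runs.append(sorted(buf))
            buf = []
    if buf:
        runs.append(sorted(buf))
    return runs


def merged_issue_order(runs: List[List[BP]]) -> Iterator[BP]:
    """Read all runs in parallel and k-way merge them into one sorted stream."""
    return heapq.merge(*runs)


bps = [(0, 700, 8), (0, 100, 8), (0, 400, 8), (0, 200, 8), (0, 50, 8)]
runs = make_sorted_runs(bps, mem_limit=2)
print(len(runs))  # 3
print([off for _, off, _ in merged_issue_order(runs)])
# [50, 100, 200, 400, 700]
```

The number of runs falls out of how many BPs the traversal finds rather than being chosen beforehand; the price is that the scrub phase reads from many runs at once.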
