On Mon, 21 Dec 2015, Joonsoo Kim wrote:

> There is a performance drop report due to hugepage allocation and in there
> half of cpu time are spent on pageblock_pfn_to_page() in compaction [1].
> In that workload, compaction is triggered to make hugepage but most of
> pageblocks are un-available for compaction due to pageblock type and
> skip bit so compaction usually fails. Most costly operations in this case
> is to find valid pageblock while scanning whole zone range. To check
> if pageblock is valid to compact, valid pfn within pageblock is required
> and we can obtain it by calling pageblock_pfn_to_page(). This function
> checks whether pageblock is in a single zone and return valid pfn
> if possible. Problem is that we need to check it every time before
> scanning pageblock even if we re-visit it and this turns out to
> be very expensive in this workload.
>
> Although we have no way to skip this pageblock check in the system
> where hole exists at arbitrary position, we can use cached value for
> zone continuity and just do pfn_to_page() in the system where hole doesn't
> exist. This optimization considerably speeds up in above workload.
>
> Before vs After
> Max: 1096 MB/s vs 1325 MB/s
> Min: 635 MB/s 1015 MB/s
> Avg: 899 MB/s 1194 MB/s
>
> Avg is improved by roughly 30% [2].
>
Wow, ok!

I'm wondering if it would be better to maintain this as a characteristic
of each pageblock rather than each zone. Have you tried to introduce a
couple new bits to pageblock_bits that would track (1) if a cached value
makes sense and (2) if the pageblock is contiguous? On the first call to
pageblock_pfn_to_page(), set the first bit, PB_cached, and set the second
bit, PB_contiguous, iff it is contiguous. On subsequent calls, if
PB_cached is true, then return PB_contiguous. On memory hot-add or remove
(or init), clear PB_cached.

What are the cases where pageblock_pfn_to_page() is used for a subset of
the pageblock and the result would be problematic for compaction? I.e.,
do we actually care to use pageblocks that are not contiguous at all?
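
To make the two-bit idea concrete, here is a rough user-space model of it, not kernel code. Only the names PB_cached and PB_contiguous come from the text above; struct zone_model, slow_check_contiguous(), pageblock_is_contiguous(), invalidate_pageblock_cache() and the flag layout are hypothetical stand-ins, and the has_hole[] array merely simulates whatever the real expensive check would inspect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag layout; only the two names come from the mail. */
enum {
	PB_cached     = 1u << 0,	/* cached answer below is valid */
	PB_contiguous = 1u << 1,	/* pageblock has no hole, single zone */
};

#define NR_PAGEBLOCKS 8		/* arbitrary size for the model */

/* Toy stand-in for a zone: per-pageblock flags plus a fake memory map. */
struct zone_model {
	uint8_t pb_flags[NR_PAGEBLOCKS];
	bool has_hole[NR_PAGEBLOCKS];	/* simulated "truth" about each block */
	unsigned long slow_checks;	/* how often the expensive path ran */
};

/* Stand-in for the expensive validity check done on every visit today. */
static bool slow_check_contiguous(struct zone_model *z, size_t pb)
{
	z->slow_checks++;
	return !z->has_hole[pb];
}

/* First call computes and caches the answer; later calls reuse it. */
static bool pageblock_is_contiguous(struct zone_model *z, size_t pb)
{
	assert(pb < NR_PAGEBLOCKS);

	if (!(z->pb_flags[pb] & PB_cached)) {
		uint8_t flags = PB_cached;

		if (slow_check_contiguous(z, pb))
			flags |= PB_contiguous;
		z->pb_flags[pb] = flags;
	}
	return (z->pb_flags[pb] & PB_contiguous) != 0;
}

/* Would be called on memory hot-add/remove or init: drop every cached bit. */
static void invalidate_pageblock_cache(struct zone_model *z)
{
	memset(z->pb_flags, 0, sizeof(z->pb_flags));
}
```

In this model a revisited pageblock costs one flag test instead of a full check, and a hotplug event simply forces the next visit to recompute. Whether two spare bits per pageblock are actually available is a separate question that the sketch does not address.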

