On 9/26/2025 4:30 PM, Matthew Auld wrote:
On 23/09/2025 10:02, Arunpravin Paneer Selvam wrote:
Add KUnit test cases that create severe memory fragmentation and
measure allocation/free performance.

The tests simulate two scenarios -

1. Allocation under severe fragmentation
    - Allocate the entire 4 GiB space as 8 KiB blocks with 64 KiB alignment,
      split them into two groups and free with mixed flags to block coalescing.
    - Repeatedly allocate and free 64 KiB blocks while timing the loop.
    - Freelist runtime: 76475 ms (76.5 seconds), soft-lockup triggered.
      RB-tree runtime: 186 ms.

2. Reverse free order under fragmentation
    - Create a similarly fragmented space, free half the blocks, reverse
      the order of the remainder, and release them with the cleared flag.
    - Freelist runtime: 85620 ms (85.6 seconds).
      RB-tree runtime: 114 ms.

Signed-off-by: Arunpravin Paneer Selvam <[email protected]>
---
  drivers/gpu/drm/tests/drm_buddy_test.c | 110 +++++++++++++++++++++++++
  1 file changed, 110 insertions(+)

diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c
index 7a0e523651f0..19b49fb6ec19 100644
--- a/drivers/gpu/drm/tests/drm_buddy_test.c
+++ b/drivers/gpu/drm/tests/drm_buddy_test.c
@@ -21,6 +21,115 @@ static inline u64 get_size(int order, u64 chunk_size)
      return (1 << order) * chunk_size;
  }
  +static void drm_test_buddy_fragmentation_performance(struct kunit *test)
+{
+    const unsigned long max_acceptable_time_ms = 1000;
+    struct drm_buddy_block *block, *tmp;
+    int num_blocks, i, ret, count = 0;
+    LIST_HEAD(allocated_blocks);
+    unsigned long elapsed_ms;
+    LIST_HEAD(reverse_list);
+    LIST_HEAD(test_blocks);
+    LIST_HEAD(clear_list);
+    LIST_HEAD(dirty_list);
+    LIST_HEAD(free_list);
+    struct drm_buddy mm;
+    u64 mm_size = SZ_4G;
+    ktime_t start, end;
+
+    /*
+     * Allocation under severe fragmentation
+     *
+     * Create severe fragmentation by allocating the entire 4 GiB address space
+     * as tiny 8 KiB blocks but forcing a 64 KiB alignment. The resulting pattern
+     * leaves many scattered holes. Split the allocations into two groups and
+     * return them with different flags to block coalescing, then repeatedly
+     * allocate and free 64 KiB blocks while timing the loop. This stresses how
+     * quickly the allocator can satisfy larger, aligned requests from a pool of
+     * highly fragmented space.
+     */
+    KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, SZ_4K),
+                   "buddy_init failed\n");
+
+    num_blocks = mm_size / SZ_64K;
+
+    start = ktime_get();
+    /* Allocate with maximum fragmentation - 8K blocks with 64K alignment */
+    for (i = 0; i < num_blocks; i++)
+        KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, SZ_8K, SZ_64K,
+                                    &allocated_blocks, 0),
+                    "buddy_alloc hit an error size=%u\n", SZ_8K);
+
+    list_for_each_entry_safe(block, tmp, &allocated_blocks, link) {
+        if (count % 4 == 0 || count % 4 == 3)
+            list_move_tail(&block->link, &clear_list);
+        else
+            list_move_tail(&block->link, &dirty_list);
+        count++;
+    }
+
+    /* Free with different flags to ensure no coalescing */
+    drm_buddy_free_list(&mm, &clear_list, DRM_BUDDY_CLEARED);
+    drm_buddy_free_list(&mm, &dirty_list, 0);
+
+    for (i = 0; i < num_blocks; i++)
+        KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, SZ_64K, SZ_64K,
+                                    &test_blocks, 0),
+                    "buddy_alloc hit an error size=%u\n", SZ_64K);
+    drm_buddy_free_list(&mm, &test_blocks, 0);
+
+    end = ktime_get();
+    elapsed_ms = ktime_to_ms(ktime_sub(end, start));
+    /* Performance validation */
+    KUNIT_EXPECT_LT_MSG(test, elapsed_ms, max_acceptable_time_ms,
+                "Fragmented allocation took %lu ms (max acceptable: %lu ms)",
+                elapsed_ms, max_acceptable_time_ms);
+    drm_buddy_fini(&mm);
+
+    /*
+     * Reverse free order under fragmentation
+     *
+     * Construct a fragmented 4 GiB space by allocating every 8 KiB block with
+     * 64 KiB alignment, creating a dense scatter of small regions. Half of the
+     * blocks are selectively freed to form sparse gaps, while the remaining
+     * allocations are preserved, reordered in reverse, and released back with
+     * the cleared flag. This models a pathological reverse-ordered free pattern
+     * and measures how quickly the allocator can merge and reclaim space when
+     * deallocation occurs in the opposite order of allocation, exposing the
+     * cost difference between a linear freelist scan and an ordered tree lookup.
+     */
+    ret = drm_buddy_init(&mm, mm_size, SZ_4K);
+    KUNIT_ASSERT_EQ(test, ret, 0);
+
+    start = ktime_get();
+    /* Allocate maximum fragmentation */
+    for (i = 0; i < num_blocks; i++)
+        KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, SZ_8K, SZ_64K,
+                                    &allocated_blocks, 0),
+                    "buddy_alloc hit an error size=%u\n", SZ_8K);
+
+    list_for_each_entry_safe(block, tmp, &allocated_blocks, link) {
+        if (count % 2 == 0)
+            list_move_tail(&block->link, &free_list);
+        count++;
+    }
+    drm_buddy_free_list(&mm, &free_list, DRM_BUDDY_CLEARED);
+
+    list_for_each_entry_safe_reverse(block, tmp, &allocated_blocks, link)
+        list_move(&block->link, &reverse_list);
+    drm_buddy_free_list(&mm, &reverse_list, DRM_BUDDY_CLEARED);
+
+    end = ktime_get();
+    elapsed_ms = ktime_to_ms(ktime_sub(end, start));
+
+    /* Performance validation */
+    KUNIT_EXPECT_LT_MSG(test, elapsed_ms, max_acceptable_time_ms,
+                "Reverse-ordered free took %lu ms (max acceptable: %lu ms)",
+                elapsed_ms, max_acceptable_time_ms);

Sorry for the delay. We are pretty sure these time asserts are not going to be flaky over many thousands of runs across different types of machines (maybe some underpowered atom)?
Yes, correct. I have updated the performance test to avoid hard-coded timing thresholds. The test now measures and reports execution time instead of enforcing a 1000 ms limit, since run times vary across machines. This keeps the test portable and stable while still exposing the performance data for regression tracking.
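
For reference, the reporting-only variant would look roughly like this (just a sketch of the v2 change; the exact message wording may differ in the final patch):

    end = ktime_get();
    elapsed_ms = ktime_to_ms(ktime_sub(end, start));

    /* Report the measured time rather than asserting a hard threshold */
    kunit_info(test, "Fragmented allocation took %lu ms\n", elapsed_ms);

The reverse-ordered free measurement would be reported the same way.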

Regards,
Arun.

Assuming not a concern,
Reviewed-by: Matthew Auld <[email protected]>

+
+    drm_buddy_fini(&mm);
+}
+
  static void drm_test_buddy_alloc_range_bias(struct kunit *test)
  {
      u32 mm_size, size, ps, bias_size, bias_start, bias_end, bias_rem;
@@ -772,6 +881,7 @@ static struct kunit_case drm_buddy_tests[] = {
      KUNIT_CASE(drm_test_buddy_alloc_contiguous),
      KUNIT_CASE(drm_test_buddy_alloc_clear),
      KUNIT_CASE(drm_test_buddy_alloc_range_bias),
+    KUNIT_CASE(drm_test_buddy_fragmentation_performance),
      {}
  };

