morningman opened a new issue #4083: URL: https://github.com/apache/incubator-doris/issues/4083
We found that for the same set of integer data. After loading using `bitmap_hash()`, the performance of query using `bitmap_union_count is obviously` bad. So I did the following test: 1. Generate 1 million random uint32 numbers. 2. Sort the 1 million numbers to generate 1 million BitmapValue, and then perform the OR operation. 2. Randomly scramble these 1 million numbers, then generate 1 million BitmapValue, and then perform OR operation. The final result shows that the sequential execution time is 1.8 seconds, while the random execution time is 4.4 seconds. Test case as follow: ``` TEST(BitmapValueTest, perf) { // 1. generate 1000000 random uint32 std::vector<uint32_t> numbers; Random rand(0); for (int i = 100000000; i < 101000000; i++) { numbers.push_back(rand.Next()); } // 2. sort the numbers std::sort(numbers.begin(), numbers.end()); std::vector<BitmapValue> v1; for (int i = 0; i < numbers.size(); i++) { v1.emplace_back(numbers[i]); } // 3. test bitmap or operation with sorted integers MonotonicStopWatch w; w.start(); BitmapValue b = v1[0]; for (int i = 1; i < v1.size(); i++) { b &= v1[i]; } std::cout << "get seq bitmap or cost: " << w.elapsed_time() << std::endl; // 4. shuffle the numbers std::random_shuffle (numbers.begin(), numbers.end()); std::sort(numbers.begin(), numbers.end()); std::vector<BitmapValue> v2; for (int i = 0; i < numbers.size(); i++) { v2.emplace_back(numbers[i]); } // 3. test bitmap or operation with random shuffled integers MonotonicStopWatch w2; w2.start(); BitmapValue b2 = v2[0]; for (int i = 1; i < v2.size(); i++) { b2 &= v2[i]; } std::cout << "get random bitmap or cost: " << w2.elapsed_time(); } ``` From the user level, in some scenarios, the efficiency of bitmap may be significantly lower than `ndv` or even directly `count dinstinct`. I don't know the specific reason yet and need further investigation. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org