dolfinus commented on code in PR #50126:
URL: https://github.com/apache/doris/pull/50126#discussion_r2062602744


##########
be/src/util/uuid_generator.h:
##########
@@ -17,18 +17,89 @@
 
 #pragma once
 
+#include <atomic>
 #include <boost/uuid/uuid.hpp>
 #include <boost/uuid/uuid_generators.hpp>
 #include <boost/uuid/uuid_io.hpp>
+#include <chrono>
+#include <cstdint>
 #include <mutex>
+#include <random>
 
 namespace doris {
 
+// Format:
+// We use UUID v7 (RFC 4122) for generating UUIDs.
+// UUIDv7 was chosen for the following benefits:
+// 1. Time-ordered - Contains a timestamp component that makes UUIDs sortable 
by generation time,
+//    which is valuable for query tracking, debugging, and performance analysis
+// 2. High performance - Efficient generation with minimal overhead
+// 3. Global uniqueness - Combines timestamp with random data to ensure 
uniqueness across
+//    distributed systems without coordination
+// 4. Database friendly - The time-ordered nature makes it more efficient for 
database indexing
+//    and storage compared to purely random UUIDs (like v4)
+// 5. Future-proof - Follows the latest UUID standard with improvements over 
older versions
+
+// Note: Our implementation differs slightly from the standard UUIDv7 
specification by
+// using a counter instead of random bits in the "rand_a" field to further 
enhance
+// uniqueness when generating multiple UUIDs in rapid succession.
+
+// 0                   1                   2                   3
+// 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+// |                           unix_ts_ms                          |
+// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+// |          unix_ts_ms           |  ver  |       counter         |
+// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+// |var|                        rand_b                             |
+// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+// |                            rand_b                             |
+// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
 class UUIDGenerator {
 public:
     boost::uuids::uuid next_uuid() {
         std::lock_guard<std::mutex> lock(_uuid_gen_lock);
-        return _boost_uuid_generator();
+
+        auto now = std::chrono::system_clock::now();
+        uint64_t millis =
+                
std::chrono::duration_cast<std::chrono::milliseconds>(now.time_since_epoch())
+                        .count();
+
+        uint16_t counter = _counter.fetch_add(1, std::memory_order_relaxed) & 
0xFF;
+
+        // Generate random bits
+        std::random_device rd;
+        std::mt19937_64 gen(rd());
+        std::uniform_int_distribution<uint64_t> dis;
+        uint64_t random = dis(gen);

Review Comment:
   ```suggestion
           std::uniform_int_distribution<uint64_t> dis;
           uint64_t random = dis(gen);
   ```
   Random generator should be initialized only once, and then reused. Otherwise 
it will be very slow



##########
be/src/util/uuid_generator.h:
##########
@@ -37,8 +108,8 @@ class UUIDGenerator {
     }
 
 private:
-    boost::uuids::basic_random_generator<boost::mt19937> _boost_uuid_generator;
     std::mutex _uuid_gen_lock;
+    std::atomic<uint16_t> _counter {0};

Review Comment:
   ```suggestion
       std::atomic<uint16_t> _counter {0};
       std::random_device rd;
       std::mt19937_64 gen { std::mt19937_64(rd()) };
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org

Reply via email to