dolfinus commented on code in PR #50126: URL: https://github.com/apache/doris/pull/50126#discussion_r2062602744
########## be/src/util/uuid_generator.h: ########## @@ -17,18 +17,89 @@ #pragma once +#include <atomic> #include <boost/uuid/uuid.hpp> #include <boost/uuid/uuid_generators.hpp> #include <boost/uuid/uuid_io.hpp> +#include <chrono> +#include <cstdint> #include <mutex> +#include <random> namespace doris { +// Format: +// We use UUID v7 (RFC 4122) for generating UUIDs. +// UUIDv7 was chosen for the following benefits: +// 1. Time-ordered - Contains a timestamp component that makes UUIDs sortable by generation time, +// which is valuable for query tracking, debugging, and performance analysis +// 2. High performance - Efficient generation with minimal overhead +// 3. Global uniqueness - Combines timestamp with random data to ensure uniqueness across +// distributed systems without coordination +// 4. Database friendly - The time-ordered nature makes it more efficient for database indexing +// and storage compared to purely random UUIDs (like v4) +// 5. Future-proof - Follows the latest UUID standard with improvements over older versions + +// Note: Our implementation differs slightly from the standard UUIDv7 specification by +// using a counter instead of random bits in the "rand_a" field to further enhance +// uniqueness when generating multiple UUIDs in rapid succession. + +// 0 1 2 3 +// 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +// | unix_ts_ms | +// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +// | unix_ts_ms | ver | counter | +// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +// |var| rand_b | +// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +// | rand_b | +// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + class UUIDGenerator { public: boost::uuids::uuid next_uuid() { std::lock_guard<std::mutex> lock(_uuid_gen_lock); - return _boost_uuid_generator(); + + auto now = std::chrono::system_clock::now(); + uint64_t millis = + std::chrono::duration_cast<std::chrono::milliseconds>(now.time_since_epoch()) + .count(); + + uint16_t counter = _counter.fetch_add(1, std::memory_order_relaxed) & 0xFF; + + // Generate random bits + std::random_device rd; + std::mt19937_64 gen(rd()); + std::uniform_int_distribution<uint64_t> dis; + uint64_t random = dis(gen); Review Comment: ```suggestion std::uniform_int_distribution<uint64_t> dis; uint64_t random = dis(gen); ``` Random generator should be initialized only once, and then reused. Otherwise it will be very slow ########## be/src/util/uuid_generator.h: ########## @@ -37,8 +108,8 @@ class UUIDGenerator { } private: - boost::uuids::basic_random_generator<boost::mt19937> _boost_uuid_generator; std::mutex _uuid_gen_lock; + std::atomic<uint16_t> _counter {0}; Review Comment: ```suggestion std::atomic<uint16_t> _counter {0}; std::random_device rd; std::mt19937_64 gen { std::mt19937_64(rd()) }; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org