yashrb24 opened a new issue, #21248:
URL: https://github.com/apache/datafusion/issues/21248

   ### Describe the bug
   
   `ArrowBytesMap` and `ArrowBytesViewMap` allocate their hash tables with 
`HashTable::with_capacity(INITIAL_MAP_CAPACITY)` (128 and 512 entries 
respectively) but initialize `map_size` to `0`. The `size()` method returns 
`map_size` as the hash table's contribution to total memory.
   
   The `insert_accounted` method (from `HashTableAllocExt`) only adds to 
`map_size` when the table needs to grow beyond its current capacity. Since the 
initial allocation happens in `with_capacity`, not through `insert_accounted`, 
those bytes are never counted — `size()` understates memory usage until the 
first resize.
   
   ```rust
   // binary_map.rs — INITIAL_MAP_CAPACITY = 128
   // binary_view_map.rs — INITIAL_MAP_CAPACITY = 512
   pub fn new(output_type: OutputType) -> Self {
       Self {
           map: hashbrown::hash_table::HashTable::with_capacity(INITIAL_MAP_CAPACITY),
           map_size: 0,  // ← should be map.allocation_size()
           // ...
       }
   }
   ```
   
   `insert_accounted` only tracks growth:
   ```rust
   // Simplified excerpt
   fn insert_accounted(&mut self, x: Self::T, hasher: impl Fn(&Self::T) -> u64, accounting: &mut usize) {
       let hash = hasher(&x);
       if self.len() == self.capacity() {
           // Table is full: charge the upcoming growth to `accounting`.
           let bump_elements = self.capacity().max(16);
           let bump_size = bump_elements * size_of::<T>();
           *accounting = (*accounting).checked_add(bump_size).expect("overflow");
           self.reserve(bump_elements, &hasher);
       }
       self.insert_unique(hash, x, hasher);
   }
   ```
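
To make the gap concrete, here is a minimal std-only sketch of the same growth-only accounting pattern. `ToyTable` and `tracked_around_first_growth` are hypothetical names invented for illustration, and a `Vec` stands in for the hash table; this is not DataFusion's or hashbrown's code.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a pre-allocated table (a `Vec`, not a real hash table).
struct ToyTable<T> {
    items: Vec<T>,
}

impl<T> ToyTable<T> {
    fn with_capacity(n: usize) -> Self {
        Self { items: Vec::with_capacity(n) }
    }

    /// Growth-only accounting: bytes are added only when the table is full.
    fn insert_accounted(&mut self, x: T, accounting: &mut usize) {
        if self.items.len() == self.items.capacity() {
            let bump_elements = self.items.capacity().max(16);
            let bump_size = bump_elements * size_of::<T>();
            *accounting = (*accounting).checked_add(bump_size).expect("overflow");
            self.items.reserve(bump_elements);
        }
        self.items.push(x);
    }
}

/// Returns the tracked bytes (a) after filling the initial capacity and
/// (b) after one more insert that forces the first growth.
fn tracked_around_first_growth(initial_capacity: usize) -> (usize, usize) {
    let mut table = ToyTable::<u64>::with_capacity(initial_capacity);
    let mut map_size = 0usize; // starts at 0, as in the issue
    let fill = table.items.capacity();
    for i in 0..fill {
        table.insert_accounted(i as u64, &mut map_size);
    }
    let before = map_size;
    table.insert_accounted(fill as u64, &mut map_size);
    (before, map_size)
}

fn main() {
    let (before, after) = tracked_around_first_growth(128);
    println!("tracked before first growth: {before} bytes");
    println!("tracked after first growth:  {after} bytes");
}
```

In this toy model the tracked size stays at zero while the pre-allocated slots fill up, even though the memory for them was allocated up front.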
   
   
   
   ### To Reproduce
   
   N/A
   
   ### Expected behavior
   
   `map_size` should be initialized with `map.allocation_size()` to account for 
the pre-allocated memory from `with_capacity`.
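
A rough sketch of the proposed initialization, again using a std-only toy model. `ToyMap` is a hypothetical type, and `capacity() * size_of::<u64>()` is only an assumed stand-in for what `map.allocation_size()` would report on the real hash table.

```rust
use std::mem::size_of;

/// Hypothetical map wrapper that tracks its own allocation size.
struct ToyMap {
    slots: Vec<u64>,
    map_size: usize,
}

impl ToyMap {
    fn new(initial_capacity: usize) -> Self {
        let slots: Vec<u64> = Vec::with_capacity(initial_capacity);
        // Seed the counter from what was actually allocated, not from 0.
        let map_size = slots.capacity() * size_of::<u64>();
        Self { slots, map_size }
    }

    /// Bytes attributed to the table.
    fn size(&self) -> usize {
        self.map_size
    }

    fn capacity(&self) -> usize {
        self.slots.capacity()
    }
}

fn main() {
    let map = ToyMap::new(128);
    println!("capacity = {}, size = {} bytes", map.capacity(), map.size());
}
```

With the counter seeded this way, `size()` reflects the pre-allocated slots from the moment the map is constructed, and an empty initial capacity still reports zero.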
   
   
   ### Additional context
   
   Two other uses of `map_size: 0` in the codebase (`row.rs` and 
`multi_group_by/mod.rs`) call `HashTable::with_capacity(0)`, which allocates 
nothing, so they are already correct.
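
The zero-capacity case can be illustrated with the standard library's `HashMap`, used here as an analogue only; the issue itself concerns `hashbrown::hash_table::HashTable`, and `initial_capacity` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Reports the capacity a freshly constructed map ends up with.
fn initial_capacity(requested: usize) -> usize {
    HashMap::<u64, u64>::with_capacity(requested).capacity()
}

fn main() {
    // A zero-capacity map does not allocate, so starting its size counter at 0 is accurate.
    println!("with_capacity(0)   -> capacity {}", initial_capacity(0));
    // A non-zero request reserves space immediately.
    println!("with_capacity(128) -> capacity {}", initial_capacity(128));
}
```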
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

