shehab-ali commented on PR #21262:
URL: https://github.com/apache/datafusion/pull/21262#issuecomment-4300376278

   This is not intentional. You are right; this is a concern. 
   
   How about we instead of blindly dumping all values, we deduplicate by Arc 
pointer and split UDFs into two groups:
   - those that were displaced from their canonical name key by another UDF 
(like the built-in that lost "to_char" to the custom)
   - those that still own their canonical name key (like the custom). 
   
   We emit displaced first, owners last.  So this guarantees that when build() 
re-registers them, owners always write last and their overrides are preserved 
deterministically, regardless of HashMap iteration order.
   
   `
   /// Collects unique UDFs from a function registry map, ordered so that
   /// [`SessionStateBuilder::build`] reproduces the original [`HashMap`] state 
exactly.
   ///
   /// The map stores one entry per canonical name and one per alias, all 
pointing to the same
   /// [`Arc`]. We deduplicate by pointer and emit in two passes:
   ///
   /// - **Displaced** (their canonical name key was taken by another UDF): 
register first so their
   ///   entries can be overwritten.
   /// - **Owners** (still own their canonical name key): register last so 
their overrides survive.
   ///
   /// This ensures that if a user registered a custom UDF as an alias override 
(e.g.
   /// `postgres_to_char` with alias `"to_char"`), the override is preserved 
after a roundtrip
   /// through [`SessionStateBuilder::new_from_existing`].
   fn udfs_for_roundtrip<T>(
       map: &HashMap<String, Arc<T>>,
       canonical_name: impl Fn(&T) -> &str,
   ) -> Vec<Arc<T>> {
       let mut seen: HashSet<*const T> = HashSet::new();
       let mut displaced: Vec<Arc<T>> = Vec::new();
       let mut owners: Vec<Arc<T>> = Vec::new();
   
      for udf in map.values() {
           if seen.insert(Arc::as_ptr(udf)) {
               let owns_key = map
                   .get(canonical_name(udf.as_ref()))
                   .is_some_and(|u| Arc::ptr_eq(u, udf));
               if owns_key {
                   owners.push(Arc::clone(udf));
               } else {
                   displaced.push(Arc::clone(udf));
               }
           }
       }
   
       displaced.extend(owners);
       displaced
   }
   `
   then we can set each function to 
   `
   scalar_functions: Some(udfs_for_roundtrip(&existing.scalar_functions, |u| {
                   u.name()
               })),
   ...
   `


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to