kosiew opened a new pull request, #22770:
URL: https://github.com/apache/datafusion/pull/22770

   ## Which issue does this PR close?
   
   * Part of #22669
   
   ## Rationale for this change
   
   `regexp_count_inner` currently uses an 8-arm match over combinations of 
scalar and array inputs for regex, start position, and flags. Many of these 
branches duplicate the same logic for length validation, null handling, regex 
compilation/cache lookup, and match counting.
   
   This refactor reduces duplication and centralizes row processing while 
preserving existing SQL-visible behavior, error messages, error ordering, and 
regex cache usage.
   
   ## What changes are included in this PR?
   
   * Replaced the large 8-arm scalar/array match in `regexp_count_inner` with a 
unified row-processing implementation.
   * Added private helper abstractions:
   
     * `StringValueSource` for scalar-or-array string arguments.
     * `StartValueSource` for scalar-or-array start arguments.
     * `string_value_opt` for null-aware string access.
     * `validate_array_len` for centralized length validation.
     * `compile_scalar_pattern` for compiling reusable scalar regex/flags 
combinations once.
   * Preserved existing scalar `NULL` regex short-circuit behavior by returning 
a zero-filled result array before validating other arguments.
   * Preserved regex cache reuse through `compile_and_cache_regex`.
   * Removed the dependency on `itertools::izip` by consolidating processing 
into a single row loop.
   * Kept outer type dispatch and public interfaces unchanged.
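   
   The scalar-or-array abstraction and single row loop listed above can be 
illustrated with a minimal, dependency-free sketch. It is not the code in this 
PR: plain slices stand in for Arrow arrays, literal substring counting stands 
in for regex matching, and `value_opt`, `validate_len`, and `count_rows` are 
hypothetical names. Mapping null rows to `0` is likewise an assumption of the 
sketch.

```rust
/// Hypothetical stand-in for a scalar-or-array string argument.
/// Real code would wrap Arrow string arrays; slices keep this dependency-free.
enum StringValueSource<'a> {
    Scalar(Option<&'a str>),
    Array(&'a [Option<&'a str>]),
}

impl<'a> StringValueSource<'a> {
    /// Null-aware access: a scalar yields the same value for every row.
    fn value_opt(&self, row: usize) -> Option<&'a str> {
        match self {
            StringValueSource::Scalar(v) => *v,
            StringValueSource::Array(values) => values[row],
        }
    }

    /// Centralized length check; a scalar is compatible with any row count.
    fn validate_len(&self, expected: usize, name: &str) -> Result<(), String> {
        match self {
            StringValueSource::Array(values) if values.len() != expected => Err(format!(
                "{name} has length {}, expected {expected}",
                values.len()
            )),
            _ => Ok(()),
        }
    }
}

/// Counts non-overlapping literal occurrences per row.
/// Assumption: literal matching replaces regex matching in this sketch.
fn count_rows(values: &[Option<&str>], pattern: &StringValueSource) -> Result<Vec<i64>, String> {
    // A scalar NULL pattern short-circuits to zeros before any other validation.
    if let StringValueSource::Scalar(None) = pattern {
        return Ok(vec![0; values.len()]);
    }
    pattern.validate_len(values.len(), "pattern")?;

    // One loop handles both scalar and array patterns.
    let mut out = Vec::with_capacity(values.len());
    for (row, value) in values.iter().enumerate() {
        let count = match (value, pattern.value_opt(row)) {
            // Empty patterns are counted as 0 here to keep the sketch simple.
            (Some(v), Some(p)) if !p.is_empty() => v.matches(p).count() as i64,
            // Assumption: null inputs produce 0 in this sketch.
            _ => 0,
        };
        out.push(count);
    }
    Ok(out)
}
```

   Because the loop asks the source for a value by row index, adding another 
scalar-or-array argument does not multiply the number of match arms.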
   
   ## Are these changes tested?
   
   Yes.
   
   Added tests covering behavior that is sensitive to validation and error 
ordering:
   
   * `test_regexp_count_error_order_invalid_scalar_regex_before_start_len`
   * `test_regexp_count_error_order_flags_len_before_start_len`
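   
   Why error ordering needs dedicated tests can be shown with a small sketch. 
When checks run in sequence and return on the first failure, the order of the 
checks is the order of errors a caller observes. The ordering below is inferred 
from the test names only; `ArgError`, `validate`, and `check` are hypothetical 
and do not reflect the actual error types or messages.

```rust
/// Hypothetical error kinds; the real error types and messages differ.
#[derive(Debug, PartialEq)]
enum ArgError {
    InvalidRegex,
    FlagsLen,
    StartLen,
}

/// Returns `Err(err)` when the condition does not hold.
fn check(ok: bool, err: ArgError) -> Result<(), ArgError> {
    if ok { Ok(()) } else { Err(err) }
}

/// Runs the checks in a fixed sequence; `?` returns the first failure,
/// so reordering these lines changes which error is reported.
fn validate(
    regex_ok: bool,
    flags_len: usize,
    start_len: usize,
    rows: usize,
) -> Result<(), ArgError> {
    check(regex_ok, ArgError::InvalidRegex)?;
    check(flags_len == rows, ArgError::FlagsLen)?;
    check(start_len == rows, ArgError::StartLen)?;
    Ok(())
}
```

   With two invalid arguments at once, only the earlier check's error is 
returned, which is the property a refactor can silently change.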
   
   Existing `regexp_count` tests continue to run unchanged. 
   
   ## Are there any user-facing changes?
   
   No.
   
   This change is an internal refactor intended to preserve existing behavior, 
error messages, validation ordering, and SQL-visible results.
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed and tested.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

