anjakefala opened a new issue, #43716:
URL: https://github.com/apache/arrow/issues/43716

   ### Describe the enhancement requested
   
   Acero's hash join does not support `ListType` in non-key fields: https://github.com/apache/arrow/blob/main/cpp/src/arrow/acero/hash_join_node.cc#L48 . This is a request to add that support.
   
   PyArrow code that reproduces the issue:
   
   ```
   import pyarrow as pa
   import pyarrow.acero as acero
   
   # Creating the Arrow tables
   basic_tbl = pa.table({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})
   basic_tbl_src = acero.Declaration(
       "table_source", options=acero.TableSourceNodeOptions(basic_tbl))
   
   basic_tbl2 = pa.table({'x': [1, 2, 3], 'z': [True, False, True]})
   basic_tbl2_src = acero.Declaration(
       "table_source", options=acero.TableSourceNodeOptions(basic_tbl2))
   
   list_tbl = pa.table({
       'z': [['first', 'list', 'col', 'row'], ['second row', 'here']],
       'x': [1, 2],
   })
   list_tbl_src = acero.Declaration(
       "table_source", options=acero.TableSourceNodeOptions(list_tbl))
   
   join_keys = ["x"]
   
   hash_join_options = acero.HashJoinNodeOptions(
       'left outer', left_keys=join_keys, right_keys=join_keys)
   
   # Join of two tables with only primitive columns
   joined = acero.Declaration(
       "hashjoin", options=hash_join_options,
       inputs=[basic_tbl_src, basic_tbl2_src])
   
   result = joined.to_table()
   print(result)
   
   # Join against the list table: 'z' is a list-typed non-key field,
   # which is the unsupported case this issue is about
   joined = acero.Declaration(
       "hashjoin", options=hash_join_options,
       inputs=[basic_tbl_src, list_tbl_src])
   
   result = joined.to_table()
   print(result)
   ```
   
   An R reproduction is in ARROW-14519: https://issues.apache.org/jira/browse/ARROW-14519
   
   [That issue](https://issues.apache.org/jira/browse/ARROW-14519) also notes why this is not currently supported:
   
   > We cannot easily support more types in hash join right now. That is 
because we transform and encode all the input values, key and non-key 
(row_encoder.h), so it would need another specialization for each additional 
type.
   
   So adding this support seems to require a new row-encoder specialisation that handles `ListType`.
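
To illustrate why each additional type needs its own specialisation, here is a toy sketch in plain Python. It is not Arrow's row encoder: the names `ENCODERS`, `encode_value`, `encode_row` and `encode_list`, and the byte layout (little-endian, length-prefixed), are all hypothetical and chosen only for illustration. The idea is that every value in a row, key or non-key, goes through a per-type encoder, so a type without one cannot be joined until an encoder is added.

```python
import struct

# Hypothetical per-type encoders; the layout is an assumption for this sketch.
def encode_int64(value):
    # Fixed width: 8 bytes, little-endian signed integer.
    return struct.pack("<q", value)

def encode_string(value):
    # Variable width: 4-byte length prefix, then the UTF-8 bytes.
    data = value.encode("utf-8")
    return struct.pack("<I", len(data)) + data

# Registry of supported types; anything missing here cannot be encoded.
ENCODERS = {int: encode_int64, str: encode_string}

def encode_value(value):
    encoder = ENCODERS.get(type(value))
    if encoder is None:
        raise TypeError(f"no row encoder for {type(value).__name__}")
    return encoder(value)

def encode_row(row):
    # Every field of the row is encoded, key and non-key alike.
    return b"".join(encode_value(v) for v in row)

def encode_list(values):
    # The "new specialisation": element count, then each element encoded
    # recursively with whatever encoder its own type has.
    body = b"".join(encode_value(v) for v in values)
    return struct.pack("<I", len(values)) + body

row = (1, ["first", "list"])

# Before the specialisation exists, a row holding a list is rejected.
try:
    encode_row(row)
    rejected = False
except TypeError:
    rejected = True

# Registering the list encoder makes the same row encodable.
ENCODERS[list] = encode_list
encoded = encode_row(row)
```

In this toy model the join logic never changes; only the encoder registry grows. The real work in Acero would presumably be larger, since the encoding also has to be decoded back into Arrow arrays for the output.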
   
   ### Component(s)
   
   C++

