rameshkanna3 opened a new issue, #335:
URL: https://github.com/apache/iceberg-go/issues/335

   ### Apache Iceberg version
   
   None
   
   ### Please describe the bug 🐞
   
   When using the `IsIn` filter in **Iceberg-Go**, filtering on a **single integer value works correctly**, but filtering on **multiple values returns an empty result**, even when matching records exist in the table.
   
   The issue appears to be related to how the `IsIn` predicate is constructed in Go, as it differs from the format observed in **PyIceberg**.
   
   ---
   
   ## **Observations:**  
   
   ### ✅ **Case 1: Passing a Single Value (Works Correctly)**
   ```go
   docID := []int64{10} // Single value
   
   userFilter := iceberg.EqualTo(iceberg.Reference("user"), "ibmlhadmin")
   fmt.Println("userFilter: ", userFilter)
   
   docIDFilter := iceberg.IsIn(iceberg.Reference("doc_id"), docID...)
   fmt.Println("docIDFilter: ", docIDFilter)
   
   combinedFilter := iceberg.NewAnd(userFilter, docIDFilter)
   fmt.Println("combinedFilter: ", combinedFilter)
   
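   // scannertable is the table.Table being queried (loading it from the catalog is not shown).
   scanner := scannertable.Scan(
       table.WithRowFilter(combinedFilter),
       table.WithSelectedFields("doc_id"),
   )
   // The doc_id values are then read from scanner and printed as "Collected doc_ids" (not shown).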
   ```
   ### Output:
   ```
   userFilter: Equal(term=Reference(name='user'), literal=ibmlhadmin)
   docIDFilter: Equal(term=Reference(name='doc_id'), literal=10)
   combinedFilter: And(left=Equal(term=Reference(name='user'), literal=ibmlhadmin), right=Equal(term=Reference(name='doc_id'), literal=10))
   Collected doc_ids: [10] ✅
   
   ```
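In Case 1 the single-element `IsIn` prints as `Equal(...)`, so the one-value call appears to be rewritten into an equality predicate, which means the working case never exercises the `In` predicate at all. The sketch below illustrates that kind of rewrite using a hypothetical `Pred` type and `isIn` helper; neither is iceberg-go's actual implementation.

```go
package main

import "fmt"

// Pred is a hypothetical, simplified predicate type used only for this
// sketch; it is not iceberg-go's expression type.
type Pred struct {
	Op   string  // "Equal" or "In"
	Ref  string  // column name
	Vals []int64 // literal values
}

// String renders the predicate roughly in the style of the output above.
func (p Pred) String() string {
	if p.Op == "Equal" {
		return fmt.Sprintf("Equal(term=Reference(name='%s'), literal=%d)", p.Ref, p.Vals[0])
	}
	return fmt.Sprintf("In(term=Reference(name='%s'), %v)", p.Ref, p.Vals)
}

// isIn is a hypothetical constructor: a one-element IN list is rewritten
// to an equality; otherwise an In predicate is kept.
func isIn(ref string, vals ...int64) Pred {
	if len(vals) == 1 {
		return Pred{Op: "Equal", Ref: ref, Vals: vals}
	}
	return Pred{Op: "In", Ref: ref, Vals: vals}
}

func main() {
	fmt.Println(isIn("doc_id", 10))     // collapses to an equality
	fmt.Println(isIn("doc_id", 10, 20)) // stays an In predicate
}
```

If the real library behaves like this sketch, Case 1 and Case 2 take different evaluation paths, so a working single-value filter says little about the multi-value path.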
   
   ### ❌ Case 2: Passing Multiple Values (Not Working)
   ```go
   docID := []int64{10, 20} // Multiple values
   
   userFilter := iceberg.EqualTo(iceberg.Reference("user"), "ibmlhadmin")
   fmt.Println("userFilter: ", userFilter)
   
   docIDFilter := iceberg.IsIn(iceberg.Reference("doc_id"), docID...)
   fmt.Println("docIDFilter: ", docIDFilter)
   
   combinedFilter := iceberg.NewAnd(userFilter, docIDFilter)
   fmt.Println("combinedFilter: ", combinedFilter)
   
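   // scannertable is the table.Table being queried (loading it from the catalog is not shown).
   scanner := scannertable.Scan(
       table.WithRowFilter(combinedFilter),
       table.WithSelectedFields("doc_id"),
   )
   // The doc_id values are then read from scanner and printed as "Collected doc_ids" (not shown).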
   ```
   ### Output:
   ```
   userFilter: Equal(term=Reference(name='user'), literal=ibmlhadmin)
   docIDFilter: In(term=Reference(name='doc_id'), {[10 20]})
   combinedFilter: And(left=Equal(term=Reference(name='user'), literal=ibmlhadmin), right=In(term=Reference(name='doc_id'), {[10 20]}))
   Collected doc_ids: [] ❌
   ```
   
   
   
   🚨 Issue: Even though `doc_id` values 10 and 20 exist in the table, the filter returns an empty result instead of the expected rows.
   
   
   ### Comparison with PyIceberg (Works as Expected):
   
   Using the same filter logic in PyIceberg, multiple values work correctly:
   
   ```python
   from pyiceberg.expressions import And, EqualTo, In

   def filter_data(table):
       user_filter = EqualTo("user", "ibmlhadmin")
   
       doc_id = [10, 20, 30, 60]
       doc_id_filter = In("doc_id", doc_id)
       print("doc_id_filter:", doc_id_filter)
   
       combined_filter = And(user_filter, doc_id_filter)
       print("combined_filter:", combined_filter)
   
       filtered_data = table.scan(row_filter=combined_filter, selected_fields=["doc_id"]).to_pandas()
       doc_id_list = filtered_data["doc_id"].tolist()
       print("Collected doc_ids:", doc_id_list)
   ```
   The Python code above produces the following output:
   
   ```
   doc_id_filter: In(Reference(name='doc_id'), {10, 20, 30, 60})
   combined_filter: And(left=EqualTo(term=Reference(name='user'), literal=literal('ibmlhadmin')), right=In(Reference(name='doc_id'), {10, 20, 30, 60}))
   Collected doc_ids: [10, 20, 60]
   ```
   
   ### Suspected Issue:
   
   The `IsIn` predicate in Iceberg-Go appears to be formatted incorrectly:
   
   - Iceberg-Go: `In(term=Reference(name='doc_id'), {[10 20]})` (a slice `[]` inside braces)
   
   - PyIceberg: `In(Reference(name='doc_id'), {10, 20, 30, 60})` (a set `{}`)
   
   This mismatch likely causes the filter to fail, leading to an empty result.
   
   ### Expected Behavior:
   
   - `IsIn` should format its values as a set-like structure (similar to PyIceberg).
   - Queries should return the matching rows instead of an empty result when matching data exists in the table.
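Put differently, a row should be returned exactly when both conjuncts hold: `user` equals the given string and `doc_id` is a member of the IN-list. A minimal sketch of those semantics over in-memory rows, using a hypothetical `Row` type and `filterDocIDs` helper (not iceberg-go APIs), might look like this:

```go
package main

import "fmt"

// Row is a hypothetical, simplified row holding only the two columns
// the filter touches.
type Row struct {
	User  string
	DocID int64
}

// filterDocIDs evaluates `user == user AND doc_id IN docIDs` over rows
// and returns the matching doc_id values in row order. It illustrates
// the expected semantics only; it does not use iceberg-go.
func filterDocIDs(rows []Row, user string, docIDs []int64) []int64 {
	// Build a set of the IN-list values for O(1) membership checks.
	want := make(map[int64]struct{}, len(docIDs))
	for _, id := range docIDs {
		want[id] = struct{}{}
	}
	var out []int64
	for _, r := range rows {
		if _, ok := want[r.DocID]; ok && r.User == user {
			out = append(out, r.DocID)
		}
	}
	return out
}

func main() {
	// Hypothetical sample rows, for illustration only.
	rows := []Row{{"ibmlhadmin", 10}, {"ibmlhadmin", 20}, {"someone_else", 30}}
	fmt.Println(filterDocIDs(rows, "ibmlhadmin", []int64{10, 20}))
}
```

Under these semantics, a multi-value list should return every listed ID that exists for the user, just as the single-value filter returns `[10]`.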
   
   ### Question:
   
   Is this the correct way to filter data in Iceberg-Go, or is there a different approach I should be using? If this is a bug, what would be the recommended fix?
   I would appreciate any guidance on this. Thanks in advance!