Re: [PR] Spark 3.5: Make ColumnVectorWithFilter generic and refactor batch load [iceberg]

via GitHub Fri, 24 Jan 2025 09:54:40 -0800


aokolnychyi commented on code in PR #12056:
URL: https://github.com/apache/iceberg/pull/12056#discussion_r1929047717



##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnVectorWithFilter.java:
##########
@@ -18,78 +18,138 @@
  */
 package org.apache.iceberg.spark.data.vectorized;
 
-import org.apache.iceberg.arrow.vectorized.VectorHolder;
 import org.apache.spark.sql.types.Decimal;
+import org.apache.spark.sql.types.StructType;
+import org.apache.spark.sql.vectorized.ColumnVector;
 import org.apache.spark.sql.vectorized.ColumnarArray;
+import org.apache.spark.sql.vectorized.ColumnarMap;
 import org.apache.spark.unsafe.types.UTF8String;
 
-public class ColumnVectorWithFilter extends IcebergArrowColumnVector {
+/**
+ * A column vector implementation that applies row-level filtering.
+ *
+ * <p>This class wraps an existing column vector and uses a row ID mapping array to remap row
+ * indices during data access. Each method that retrieves data for a specific row translates the
+ * provided row index using the mapping array, effectively filtering the original data to only
+ * expose the live subset of rows. This approach allows efficient row-level filtering without
+ * modifying the underlying data.
+ */
+public class ColumnVectorWithFilter extends ColumnVector {

Review Comment:
   This is now generic and works with any `ColumnVector` implementation, not just our Arrow-based one.
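Below is a minimal, self-contained sketch of the wrapping pattern described in the Javadoc above. It is not the PR's code. It assumes a hypothetical `SimpleColumnVector` interface in place of Spark's abstract `ColumnVector`, so it compiles without Spark on the classpath. The wrapper depends only on that interface, not on any concrete implementation, which is the property the review comment points out. The real class extends Spark's `ColumnVector` and must override its abstract accessors (for example `getInt`, `getUTF8String`, and `getDecimal`) the same way. `SimpleColumnVector`, `ArrayIntColumnVector`, `FilteredColumnVector`, and `FilterDemo` are illustrative names only.

```java
// Hypothetical, minimal stand-in for Spark's ColumnVector API: only the
// accessors needed to illustrate the row ID remapping idea.
interface SimpleColumnVector {
  int numRows();

  boolean isNullAt(int rowId);

  int getInt(int rowId);
}

// Hypothetical backing vector that stores nullable ints in an array.
final class ArrayIntColumnVector implements SimpleColumnVector {
  private final Integer[] values;

  ArrayIntColumnVector(Integer... values) {
    this.values = values;
  }

  @Override
  public int numRows() {
    return values.length;
  }

  @Override
  public boolean isNullAt(int rowId) {
    return values[rowId] == null;
  }

  @Override
  public int getInt(int rowId) {
    return values[rowId]; // caller must check isNullAt first
  }
}

// Wraps ANY SimpleColumnVector and exposes only the live rows listed in
// rowIdMapping. Each accessor translates the logical row index into the
// physical index of the delegate; the underlying data is never copied.
final class FilteredColumnVector implements SimpleColumnVector {
  private final SimpleColumnVector delegate;
  private final int[] rowIdMapping; // logical row -> physical row in delegate

  FilteredColumnVector(SimpleColumnVector delegate, int[] rowIdMapping) {
    this.delegate = delegate;
    this.rowIdMapping = rowIdMapping;
  }

  @Override
  public int numRows() {
    return rowIdMapping.length;
  }

  @Override
  public boolean isNullAt(int rowId) {
    return delegate.isNullAt(rowIdMapping[rowId]);
  }

  @Override
  public int getInt(int rowId) {
    return delegate.getInt(rowIdMapping[rowId]);
  }
}

class FilterDemo {
  public static void main(String[] args) {
    // Physical rows 0..4; suppose rows 1 and 3 were deleted.
    SimpleColumnVector data = new ArrayIntColumnVector(10, 20, null, 40, 50);
    SimpleColumnVector live = new FilteredColumnVector(data, new int[] {0, 2, 4});

    // Prints 10, null, 50 (one value per line).
    for (int i = 0; i < live.numRows(); i++) {
      System.out.println(live.isNullAt(i) ? "null" : String.valueOf(live.getInt(i)));
    }
  }
}
```

Because the wrapper only needs the delegate's accessors, any implementation (Arrow-backed or otherwise) can be filtered the same way. The cost is one extra array lookup per access rather than materializing a compacted copy of the batch.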



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


