pvary commented on code in PR #12774: URL: https://github.com/apache/iceberg/pull/12774#discussion_r2100237439
########## core/src/main/java/org/apache/iceberg/io/WriteBuilder.java:
##########
```
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.io;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Map;
+import org.apache.iceberg.FileContent;
+import org.apache.iceberg.MetricsConfig;
+import org.apache.iceberg.Schema;
+
+/**
+ * Builder interface for creating file writers across supported data file formats. Each {@link
+ * FileAccessFactory} implementation provides appropriate {@link WriteBuilder} instances based on:
+ *
+ * <ul>
+ *   <li>target file format (Parquet, Avro, ORC)
+ *   <li>engine-specific object representation (spark, flink, generic, etc.)
+ *   <li>content type ({@link FileContent#DATA}, {@link FileContent#EQUALITY_DELETES}, {@link
+ *       FileContent#POSITION_DELETES})
+ * </ul>
+ *
+ * The {@link WriteBuilder} follows the builder pattern to configure and create {@link FileAppender}
+ * instances that write data to the target output files.
+ *
+ * @param <B> the concrete builder type for method chaining
+ * @param <E> engine-specific schema type for the input data records
+ */
+public interface WriteBuilder<B extends WriteBuilder<B, E>, E> {
+  /** Set the file schema. */
+  B schema(Schema newSchema);
+
+  /**
+   * Set a writer configuration property which affects the writer behavior.
+   *
+   * @param property a writer config property name
+   * @param value config value
+   * @return this for method chaining
+   */
+  B set(String property, String value);
+
+  default B set(Map<String, String> properties) {
+    properties.forEach(this::set);
+    return (B) this;
+  }
+
+  /**
+   * Set a file metadata property in the created file.
+   *
+   * @param property a file metadata property name
+   * @param value config value
+   * @return this for method chaining
+   */
+  B meta(String property, String value);
+
+  /** Sets the metrics configuration used for collecting column metrics for the created file. */
+  B metricsConfig(MetricsConfig newMetricsConfig);
+
+  /** Overwrite the file if it already exists. By default, overwrite is disabled. */
+  B overwrite();
+
+  /**
+   * Overwrite the file if it already exists. The default value is <code>false</code>.
+   *
+   * @deprecated Since 1.10.0, will be removed in 1.11.0. Only provided for backward compatibility.
+   *     Use {@link #overwrite()} instead.
+   */
+  @Deprecated
+  B overwrite(boolean enabled);
+
+  /**
+   * Sets the encryption key used for writing the file. If the reader does not support encryption,
+   * then an exception should be thrown.
+   */
+  default B fileEncryptionKey(ByteBuffer encryptionKey) {
+    throw new UnsupportedOperationException("Not supported");
+  }
+
+  /**
+   * Sets the additional authentication data (aad) prefix used for writing the file. If the reader
+   * does not support encryption, then an exception should be thrown.
+   */
+  default B aadPrefix(ByteBuffer aadPrefix) {
+    throw new UnsupportedOperationException("Not supported");
+  }
+
+  /**
+   * Sets the engine-specific schema for the input data records.
+   *
+   * <p>This method is necessary when the mapping between engine types and Iceberg types is not
+   * one-to-one. For example, when multiple engine types could map to the same Iceberg type, or when
+   * schema metadata beyond the structure is needed to properly interpret the data.
+   *
+   * <p>While the Iceberg schema defines the expected output structure, the engine schema provides
+   * the exact input format details needed for proper type conversion.
+   *
+   * @param newEngineSchema the native schema representation from the engine (Spark, Flink, etc.)
+   * @return this builder for method chaining
+   */
+  B dataSchema(E newEngineSchema);
+
+  /** Finalizes the configuration and builds the {@link FileAppender}. */
+  <D> FileAppender<D> build() throws IOException;
```

Review Comment:
The `WriteBuilder`'s appender will change its input type based on the content type parameter passed to the constructor. If `D` is the type of the object model records, then for `DATA` and `EQUALITY_DELETES` the appender will expect `D`, but for `POSITION_DELETES` it will expect `PositionDelete<D>`.

Also, this would change the signatures of the existing ORC/Parquet/Avro `<D> CloseableIterable<D> ReadBuilder.build()` and `<D> FileAppender<D> WriteBuilder.build()` methods. This makes creating a backward compatible change even more convoluted, as we would need to create a new class and map all the calls in the old one. We could explore this direction as well, but it deviates even further from the current solution, and we would need strong community approval for it.
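To make the typing concern concrete, here is a minimal, self-contained sketch. The `WriteBuilderSketch` class and its nested `FileContent`, `FileAppender`, `PositionDelete`, `Builder`, and `Row` types are simplified stand-ins for illustration, not the real Iceberg classes. With a single `<D> FileAppender<D> build()`, nothing ties the caller's choice of `D` to the content type fixed at construction time, so choosing `PositionDelete<D>` for position deletes is purely a caller convention that the compiler cannot enforce:

```java
import java.util.ArrayList;
import java.util.List;

public class WriteBuilderSketch {
  enum FileContent { DATA, EQUALITY_DELETES, POSITION_DELETES }

  // Stand-in for Iceberg's PositionDelete<D>: a deleted position plus the optional row.
  record PositionDelete<D>(String path, long pos, D row) {}

  // Stand-in appender that just collects the records it is given.
  static class FileAppender<D> {
    final List<D> written = new ArrayList<>();

    void add(D record) {
      written.add(record);
    }
  }

  // The produced appender's record type depends on the content chosen at
  // construction time, which the generic build() method cannot express.
  static class Builder {
    private final FileContent content;

    Builder(FileContent content) {
      this.content = content;
    }

    <D> FileAppender<D> build() {
      // For POSITION_DELETES the caller must remember to pick
      // D = PositionDelete<Row>; the compiler cannot check this.
      return new FileAppender<>();
    }
  }

  record Row(int id) {}

  public static void main(String[] args) {
    // For DATA (and EQUALITY_DELETES) the appender consumes plain rows.
    FileAppender<Row> data = new Builder(FileContent.DATA).build();
    data.add(new Row(1));

    // For POSITION_DELETES the element type silently changes to PositionDelete<Row>.
    FileAppender<PositionDelete<Row>> posDeletes =
        new Builder(FileContent.POSITION_DELETES).build();
    posDeletes.add(new PositionDelete<>("file.parquet", 0L, new Row(1)));

    System.out.println(data.written.size() + "," + posDeletes.written.size()); // prints 1,1
  }
}
```

A dedicated `positionDeleteWriteBuilder` entry point, as proposed below, would move this convention into the type system by fixing the record type to `PositionDelete<D>` at the factory method.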
One possibility I can see is to extend the `FileAccessFactory` API:
```
public interface FileAccessFactory<E, D> {
  FileFormat format();

  String objectModeName();

  <B extends ReadBuilder<B, D>> B readBuilder(InputFile inputFile);

  // For writing D (data and equality deletes)
  <B extends WriteBuilder<B, E, D>> B writeBuilder(OutputFile outputFile, FileContent content);

  // For writing PositionDelete<D>
  <B extends WriteBuilder<B, E, PositionDelete<D>>> B positionDeleteWriteBuilder(OutputFile outputFile, FileContent content);
}
```
I agree that the API would be nicer/cleaner. My question is: is it worth it? Will I have reviewers who check changes of this magnitude?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org