amogh-jahagirdar commented on code in PR #15911:
URL: https://github.com/apache/iceberg/pull/15911#discussion_r3267283383
##########
core/src/main/java/org/apache/iceberg/deletes/BaseDVFileWriter.java:
##########
@@ -214,4 +252,47 @@ public StructLike partition() {
return partition;
}
}
+
+ private static class EncryptedOutputFileWrapper implements
EncryptedOutputFile, OutputFile {
Review Comment:
Why is this needed?
##########
core/src/main/java/org/apache/iceberg/deletes/BaseDVFileWriter.java:
##########
@@ -56,11 +63,12 @@ public class BaseDVFileWriter implements DVFileWriter {
private final Function<String, PositionDeleteIndex> loadPreviousDeletes;
private final Map<String, Deletes> deletesByPath = Maps.newHashMap();
private final Map<String, BlobMetadata> blobsByPath = Maps.newHashMap();
+ private EncryptionKeyMetadata keyMetadata = null;
private DeleteWriteResult result = null;
public BaseDVFileWriter(
OutputFileFactory fileFactory, Function<String, PositionDeleteIndex>
loadPreviousDeletes) {
- this(() -> fileFactory.newOutputFile().encryptingOutputFile(),
loadPreviousDeletes);
+ this(() -> asOutputFile(fileFactory.newOutputFile()), loadPreviousDeletes);
Review Comment:
Ok I had a bit of time to go through this a bit more. I didn't realize
EncryptedOutputFile doesn't extend OutputFile, it's an orthogonal interface
which just wraps an actual OutputFile with a key. It looks like that kind of
forced the wrapper below which is a bit much for what we're trying to achieve
here.
So after looking at this a bit more, I think what makes sense is to do the
following:
1. Add a package private constructor which takes
Supplier<EncryptedOutputFile>. I don't think we should change the public
constructors. I think one benefit with the disjoint types here is that if we're
given an OutputFile we can safely assume it's unencrypted.
2. The local state keeps track of Supplier<EncryptedOutputFile>. The
existing constructors can pass to this package-private constructor
EncryptedFiles.plainAsEncryptedOutput(OutputFile).
3. I would add a static factory helper to Deletes called `DVFileWriter
writeDVs(EncryptedOutputFile outputFile, Function<String, PositionDeleteIndex>
loadPreviousDeletes)` which calls out to the package private constructor. Over
time, we probably want to have another static factory for regular OutputFile
and then just hide the constructors on the writer, but I think that's for a
follow on and not this PR. I'm largely trying to mimic what we already do in
ManifestFiles writing APIs here for consistency.
4. Then we can invoke Deletes.writeDVs whereever is required, and remove the
wrapper below, and the key metadata can be abstracted from the encrypted
outputFile afterwards.
5. I think at least on any "read" APIs and helper methods we should only
require a FileIO and not an EncryptingFileIO because if it's an instance of
EncryptingFileIO it should already handle reading correctly so long as the
metadata is setup correctly on the FileIO. I think that'll shrink the change a
bit more too.
##########
core/src/main/java/org/apache/iceberg/deletes/BaseDVFileWriter.java:
##########
@@ -214,4 +252,47 @@ public StructLike partition() {
return partition;
}
}
+
+ private static class EncryptedOutputFileWrapper implements
EncryptedOutputFile, OutputFile {
Review Comment:
Ah okay, my bad on my previous review @manuzhang I didn't realize
EncryptedOutputFile actually doesn't even extend OutputFile. It looks like this
wrapper solely exists just to make EncryptedOuputFile adapt to
Supplier<OutputFile> in the constructor, and then later we extract it.
##########
core/src/main/java/org/apache/iceberg/DVUtil.java:
##########
@@ -159,7 +167,7 @@ private static void validateCanMerge(
* @return map of referenced data file location to the merged position
delete index
*/
private static Map<String, PositionDeleteIndex> readAndMergeDVs(
- DeleteFile[] duplicateDVs, FileIO io, ExecutorService pool) {
+ DeleteFile[] duplicateDVs, EncryptingFileIO io, ExecutorService pool) {
Review Comment:
Don't think this needs to change
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]