aokolnychyi commented on code in PR #11195:
URL: https://github.com/apache/iceberg/pull/11195#discussion_r1777999151


##########
api/src/main/java/org/apache/iceberg/util/WrapperSet.java:
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import java.io.Serializable;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterators;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+
+/**
+ * A custom set for a {@link Wrapper} of the given type that maintains 
insertion order
+ *
+ * @param <T> The type to wrap in a {@link Wrapper} instance.
+ */
+abstract class WrapperSet<T> implements Set<T>, Serializable {

Review Comment:
   I'd consider implementing `Serializable` in each child separately as 
children won't be necessarily serializable.



##########
api/src/main/java/org/apache/iceberg/util/WrapperSet.java:
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import java.io.Serializable;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterators;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+
+/**
+ * A custom set for a {@link Wrapper} of the given type that maintains 
insertion order
+ *
+ * @param <T> The type to wrap in a {@link Wrapper} instance.
+ */
+abstract class WrapperSet<T> implements Set<T>, Serializable {
+  private final Set<Wrapper<T>> set;
+
+  protected WrapperSet(Iterable<Wrapper<T>> wrappers) {
+    this.set = Sets.newLinkedHashSet(wrappers);
+  }
+
+  protected abstract Wrapper<T> wrapper();
+
+  protected abstract Wrapper<T> wrap(T file);
+
+  protected abstract boolean isInstance(Object obj);
+
+  protected interface Wrapper<T> {

Review Comment:
   Question: We usually have static classes/interfaces at the bottom. Do we put 
it here to co-locate with the rest of APIs that have to be implemented? Just 
making sure it is intentional.



##########
api/src/main/java/org/apache/iceberg/util/DataFileSet.java:
##########
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import java.util.Objects;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+import org.apache.iceberg.types.Comparators;
+
+public class DataFileSet extends WrapperSet<DataFile> {
+  private static final ThreadLocal<DataFileWrapper> WRAPPERS =
+      ThreadLocal.withInitial(() -> DataFileWrapper.wrap(null));
+
+  protected DataFileSet(Iterable<Wrapper<DataFile>> wrappers) {
+    super(wrappers);
+  }
+
+  public static DataFileSet empty() {
+    return new DataFileSet(Sets.newLinkedHashSet());
+  }
+
+  public static DataFileSet of(Iterable<DataFile> iterable) {
+    return new DataFileSet(
+        Sets.newLinkedHashSet(Iterables.transform(iterable, 
DataFileWrapper::wrap)));
+  }
+
+  @Override
+  protected Wrapper<DataFile> wrapper() {
+    return WRAPPERS.get();
+  }
+
+  @Override
+  protected Wrapper<DataFile> wrap(DataFile dataFile) {
+    return DataFileWrapper.wrap(dataFile);
+  }
+
+  @Override
+  protected boolean isInstance(Object obj) {
+    return obj instanceof DataFile;
+  }
+
+  private static class DataFileWrapper implements Wrapper<DataFile> {
+    private DataFile file;
+
+    private DataFileWrapper(DataFile file) {
+      this.file = file;
+    }
+
+    private static DataFileWrapper wrap(DataFile dataFile) {
+      return new DataFileWrapper(dataFile);
+    }
+
+    @Override
+    public DataFile get() {
+      return file;
+    }
+
+    @Override
+    public Wrapper<DataFile> set(DataFile dataFile) {
+      this.file = dataFile;
+      return this;
+    }
+
+    @Override
+    public boolean equals(Object o) {
+      if (this == o) {
+        return true;
+      }
+
+      if (!(o instanceof DataFileWrapper)) {
+        return false;
+      }
+
+      DataFileWrapper that = (DataFileWrapper) o;
+      if (null == file && null == that.file) {
+        return true;
+      }
+
+      if (null == file || null == that.file) {
+        return false;
+      }
+
+      return 0 == Comparators.charSequences().compare(file.location(), 
that.file.location());

Review Comment:
   Locations are Strings, can we simply use `Objects.equals()`?



##########
api/src/main/java/org/apache/iceberg/util/DataFileSet.java:
##########
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import java.util.Objects;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+import org.apache.iceberg.types.Comparators;
+
+public class DataFileSet extends WrapperSet<DataFile> {
+  private static final ThreadLocal<DataFileWrapper> WRAPPERS =
+      ThreadLocal.withInitial(() -> DataFileWrapper.wrap(null));
+
+  protected DataFileSet(Iterable<Wrapper<DataFile>> wrappers) {
+    super(wrappers);
+  }
+
+  public static DataFileSet empty() {
+    return new DataFileSet(Sets.newLinkedHashSet());
+  }
+
+  public static DataFileSet of(Iterable<DataFile> iterable) {
+    return new DataFileSet(
+        Sets.newLinkedHashSet(Iterables.transform(iterable, 
DataFileWrapper::wrap)));

Review Comment:
   Doesn't the parent materialize it as `Set`? Should be enough to just call 
`transform`.



##########
api/src/main/java/org/apache/iceberg/util/DeleteFileSet.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import java.util.Objects;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+import org.apache.iceberg.types.Comparators;
+
+public class DeleteFileSet extends WrapperSet<DeleteFile> {

Review Comment:
   Similar comments apply here.



##########
api/src/main/java/org/apache/iceberg/util/WrapperSet.java:
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import java.io.Serializable;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterators;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+
+/**
+ * A custom set for a {@link Wrapper} of the given type that maintains 
insertion order

Review Comment:
   Minor: Do we need `.` at the end?



##########
api/src/main/java/org/apache/iceberg/util/DataFileSet.java:
##########
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import java.util.Objects;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+import org.apache.iceberg.types.Comparators;
+
+public class DataFileSet extends WrapperSet<DataFile> {
+  private static final ThreadLocal<DataFileWrapper> WRAPPERS =
+      ThreadLocal.withInitial(() -> DataFileWrapper.wrap(null));
+
+  protected DataFileSet(Iterable<Wrapper<DataFile>> wrappers) {
+    super(wrappers);
+  }
+
+  public static DataFileSet empty() {
+    return new DataFileSet(Sets.newLinkedHashSet());
+  }
+
+  public static DataFileSet of(Iterable<DataFile> iterable) {

Review Comment:
   What about `Iterable<? extends DataFile>` so that we can pass any subclasses?



##########
api/src/main/java/org/apache/iceberg/util/WrapperSet.java:
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import java.io.Serializable;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterators;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+
+/**
+ * A custom set for a {@link Wrapper} of the given type that maintains 
insertion order
+ *
+ * @param <T> The type to wrap in a {@link Wrapper} instance.
+ */
+abstract class WrapperSet<T> implements Set<T>, Serializable {
+  private final Set<Wrapper<T>> set;
+
+  protected WrapperSet(Iterable<Wrapper<T>> wrappers) {
+    this.set = Sets.newLinkedHashSet(wrappers);
+  }
+
+  protected abstract Wrapper<T> wrapper();
+
+  protected abstract Wrapper<T> wrap(T file);
+
+  protected abstract boolean isInstance(Object obj);

Review Comment:
   I would either call it like `canStore` to make the name more generic (to 
allow some sets support null values, for instance) or remove this method and 
always cast. The `Set` API allows `ClassCastException` in all methods where we  
currently invoke this one.
   
   ```
   ClassCastException – if the class of an element of this set is incompatible 
with the specified collection (optional)
   ```
   



##########
api/src/main/java/org/apache/iceberg/util/DataFileSet.java:
##########
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import java.util.Objects;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+import org.apache.iceberg.types.Comparators;
+
+public class DataFileSet extends WrapperSet<DataFile> {
+  private static final ThreadLocal<DataFileWrapper> WRAPPERS =
+      ThreadLocal.withInitial(() -> DataFileWrapper.wrap(null));
+
+  protected DataFileSet(Iterable<Wrapper<DataFile>> wrappers) {
+    super(wrappers);
+  }
+
+  public static DataFileSet empty() {
+    return new DataFileSet(Sets.newLinkedHashSet());
+  }
+
+  public static DataFileSet of(Iterable<DataFile> iterable) {
+    return new DataFileSet(
+        Sets.newLinkedHashSet(Iterables.transform(iterable, 
DataFileWrapper::wrap)));
+  }
+
+  @Override
+  protected Wrapper<DataFile> wrapper() {
+    return WRAPPERS.get();
+  }
+
+  @Override
+  protected Wrapper<DataFile> wrap(DataFile dataFile) {
+    return DataFileWrapper.wrap(dataFile);
+  }
+
+  @Override
+  protected boolean isInstance(Object obj) {
+    return obj instanceof DataFile;

Review Comment:
   Careful with nulls.



##########
api/src/main/java/org/apache/iceberg/util/DataFileSet.java:
##########
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import java.util.Objects;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+import org.apache.iceberg.types.Comparators;
+
+public class DataFileSet extends WrapperSet<DataFile> {
+  private static final ThreadLocal<DataFileWrapper> WRAPPERS =
+      ThreadLocal.withInitial(() -> DataFileWrapper.wrap(null));
+
+  protected DataFileSet(Iterable<Wrapper<DataFile>> wrappers) {
+    super(wrappers);
+  }
+
+  public static DataFileSet empty() {
+    return new DataFileSet(Sets.newLinkedHashSet());

Review Comment:
   This can be any iterable too as the parent always creates a set.



##########
api/src/main/java/org/apache/iceberg/util/WrapperSet.java:
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import java.io.Serializable;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterators;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+
+/**
+ * A custom set for a {@link Wrapper} of the given type that maintains 
insertion order
+ *
+ * @param <T> The type to wrap in a {@link Wrapper} instance.
+ */
+abstract class WrapperSet<T> implements Set<T>, Serializable {
+  private final Set<Wrapper<T>> set;
+
+  protected WrapperSet(Iterable<Wrapper<T>> wrappers) {
+    this.set = Sets.newLinkedHashSet(wrappers);
+  }
+
+  protected abstract Wrapper<T> wrapper();
+
+  protected abstract Wrapper<T> wrap(T file);
+
+  protected abstract boolean isInstance(Object obj);
+
+  protected interface Wrapper<T> {
+    T get();
+
+    Wrapper<T> set(T object);
+  }
+
+  @Override
+  public int size() {
+    return set.size();
+  }
+
+  @Override
+  public boolean isEmpty() {
+    return set.isEmpty();
+  }
+
+  @SuppressWarnings("unchecked")
+  @Override
+  public boolean contains(Object obj) {
+    if (isInstance(obj)) {
+      Wrapper<T> wrapper = wrapper();
+      boolean result = set.contains(wrapper.set((T) obj));
+      wrapper.set(null); // don't hold a reference to the value
+      return result;
+    }
+
+    return false;
+  }
+
+  @Override
+  public Iterator<T> iterator() {
+    return Iterators.transform(set.iterator(), Wrapper::get);
+  }
+
+  @Override
+  public Object[] toArray() {
+    return Lists.newArrayList(iterator()).toArray();
+  }
+
+  @Override
+  public <X> X[] toArray(X[] destArray) {
+    return Lists.newArrayList(iterator()).toArray(destArray);
+  }
+
+  @Override
+  public boolean add(T obj) {
+    return set.add(wrap(obj));

Review Comment:
   I suspect that we can add `null` but not retrieve it back. It is up to us 
whether we want to support nulls but it has to be consistent. It is valid to 
throw `NullPointerException` if the set does not support nulls.



##########
api/src/main/java/org/apache/iceberg/util/DataFileSet.java:
##########
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import java.util.Objects;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+import org.apache.iceberg.types.Comparators;
+
+public class DataFileSet extends WrapperSet<DataFile> {
+  private static final ThreadLocal<DataFileWrapper> WRAPPERS =
+      ThreadLocal.withInitial(() -> DataFileWrapper.wrap(null));
+
+  protected DataFileSet(Iterable<Wrapper<DataFile>> wrappers) {
+    super(wrappers);
+  }
+
+  public static DataFileSet empty() {

Review Comment:
   What about calling it `create()` as the returned set is mutable?



##########
api/src/main/java/org/apache/iceberg/util/DataFileSet.java:
##########
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import java.util.Objects;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+import org.apache.iceberg.types.Comparators;
+
+public class DataFileSet extends WrapperSet<DataFile> {
+  private static final ThreadLocal<DataFileWrapper> WRAPPERS =
+      ThreadLocal.withInitial(() -> DataFileWrapper.wrap(null));
+
+  protected DataFileSet(Iterable<Wrapper<DataFile>> wrappers) {
+    super(wrappers);
+  }
+
+  public static DataFileSet empty() {
+    return new DataFileSet(Sets.newLinkedHashSet());
+  }
+
+  public static DataFileSet of(Iterable<DataFile> iterable) {
+    return new DataFileSet(
+        Sets.newLinkedHashSet(Iterables.transform(iterable, 
DataFileWrapper::wrap)));
+  }
+
+  @Override
+  protected Wrapper<DataFile> wrapper() {
+    return WRAPPERS.get();
+  }
+
+  @Override
+  protected Wrapper<DataFile> wrap(DataFile dataFile) {
+    return DataFileWrapper.wrap(dataFile);
+  }
+
+  @Override
+  protected boolean isInstance(Object obj) {
+    return obj instanceof DataFile;
+  }
+
+  private static class DataFileWrapper implements Wrapper<DataFile> {
+    private DataFile file;
+
+    private DataFileWrapper(DataFile file) {
+      this.file = file;
+    }
+
+    private static DataFileWrapper wrap(DataFile dataFile) {
+      return new DataFileWrapper(dataFile);
+    }
+
+    @Override
+    public DataFile get() {
+      return file;
+    }
+
+    @Override
+    public Wrapper<DataFile> set(DataFile dataFile) {
+      this.file = dataFile;
+      return this;
+    }
+
+    @Override
+    public boolean equals(Object o) {
+      if (this == o) {
+        return true;
+      }
+
+      if (!(o instanceof DataFileWrapper)) {
+        return false;
+      }
+
+      DataFileWrapper that = (DataFileWrapper) o;
+      if (null == file && null == that.file) {

Review Comment:
   I wonder whether we want to allow null files in this set. Any thoughts?



##########
api/src/main/java/org/apache/iceberg/util/DataFileSet.java:
##########
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import java.util.Objects;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+import org.apache.iceberg.types.Comparators;
+
+public class DataFileSet extends WrapperSet<DataFile> {

Review Comment:
   Can we, please, add tests that these sets are serializable using both Java 
and Kryo serialization? We will use them in a Spark broadcast later. We already 
have `TestDataFileSerialization` and similar classes.



##########
core/src/test/java/org/apache/iceberg/util/TestDataFileSet.java:
##########
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.Collections;
+import java.util.Set;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DataFiles;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.junit.jupiter.api.Test;
+
+/**
+ * Testing {@link DataFileSet} is easier in iceberg-core since the data file 
builders are located
+ * here
+ */
+public class TestDataFileSet {

Review Comment:
   Yeah, I wouldn't complicate it too much.



##########
api/src/main/java/org/apache/iceberg/util/WrapperSet.java:
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import java.io.Serializable;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterators;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+
+/**
+ * A custom set for a {@link Wrapper} of the given type that maintains 
insertion order
+ *
+ * @param <T> The type to wrap in a {@link Wrapper} instance.
+ */
+abstract class WrapperSet<T> implements Set<T>, Serializable {
+  private final Set<Wrapper<T>> set;
+
+  protected WrapperSet(Iterable<Wrapper<T>> wrappers) {
+    this.set = Sets.newLinkedHashSet(wrappers);
+  }
+
+  protected abstract Wrapper<T> wrapper();
+
+  protected abstract Wrapper<T> wrap(T file);
+
+  protected abstract boolean isInstance(Object obj);

Review Comment:
   One more alternative is to have `abstract Class<T> elementClass();` to rely 
on `isInstance` and use various utils like `Iterators.toArray` that accept 
`Class`. Up to you.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to