aokolnychyi commented on code in PR #11195: URL: https://github.com/apache/iceberg/pull/11195#discussion_r1779311673
########## api/src/main/java/org/apache/iceberg/util/DataFileSet.java: ########## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.util; + +import java.io.IOException; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.Serializable; +import java.util.Objects; +import org.apache.iceberg.DataFile; +import org.apache.iceberg.relocated.com.google.common.collect.Iterables; + +public class DataFileSet extends WrapperSet<DataFile> implements Serializable { + private static final ThreadLocal<DataFileWrapper> WRAPPERS = + ThreadLocal.withInitial(() -> DataFileWrapper.wrap(null)); + + private DataFileSet() { + // needed for serialization/deserialization + } + + private DataFileSet(Iterable<Wrapper<DataFile>> wrappers) { + super(wrappers); + } + + public static DataFileSet create() { + return new DataFileSet(); + } + + public static DataFileSet of(Iterable<? extends DataFile> iterable) { + return new DataFileSet( + Iterables.transform(Iterables.filter(iterable, Objects::nonNull), DataFileWrapper::wrap)); + } + + @Override + protected Wrapper<DataFile> wrapper() { + return WRAPPERS.get(); + } + + @Override + protected Wrapper<DataFile> wrap(DataFile dataFile) { + return DataFileWrapper.wrap(dataFile); + } + + @Override + protected Class<DataFile> elementClass() { + return DataFile.class; + } + + /** + * Since {@link WrapperSet} itself isn't {@link Serializable}, this requires custom logic to write + * {@link DataFileSet} to the given stream. + * + * @param out The output stream to write to + * @throws IOException in case the object can't be written + */ + private void writeObject(ObjectOutputStream out) throws IOException { Review Comment: It was my bad to suggest moving `Serializable` to children. It is not worth the extra complexity. Let's simply make the parent `Serializable` and remove this logic. Sorry! ########## api/src/main/java/org/apache/iceberg/util/DataFileSet.java: ########## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.util; + +import java.io.IOException; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.Serializable; +import java.util.Objects; +import org.apache.iceberg.DataFile; +import org.apache.iceberg.relocated.com.google.common.collect.Iterables; + +public class DataFileSet extends WrapperSet<DataFile> implements Serializable { + private static final ThreadLocal<DataFileWrapper> WRAPPERS = + ThreadLocal.withInitial(() -> DataFileWrapper.wrap(null)); + + private DataFileSet() { + // needed for serialization/deserialization + } + + private DataFileSet(Iterable<Wrapper<DataFile>> wrappers) { + super(wrappers); + } + + public static DataFileSet create() { + return new DataFileSet(); + } + + public static DataFileSet of(Iterable<? extends DataFile> iterable) { + return new DataFileSet( + Iterables.transform(Iterables.filter(iterable, Objects::nonNull), DataFileWrapper::wrap)); Review Comment: I think the extra filtering logic for null values may be confusing. I would prefer simplicity and an NPE in this case. I would also remove the extra logic in the wrapper to check for nulls in equals and hashCode. NPE is a perfectly valid exception in set implementations that don't support null values. The following code would be easier to understand. ``` public static DataFileSet of(Iterable<? extends DataFile> iterable) { return new DataFileSet(Iterables.transform(iterable, DataFileWrapper::wrap)); } ... @Override public boolean equals(Object o) { if (this == o) { return true; } if (!(o instanceof DataFileWrapper)) { return false; } DataFileWrapper that = (DataFileWrapper) o; return Objects.equals(file.location(), that.file.location()); } ``` ########## api/src/main/java/org/apache/iceberg/util/WrapperSet.java: ########## @@ -0,0 +1,192 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.util; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Set; +import java.util.stream.Collectors; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.Iterables; +import org.apache.iceberg.relocated.com.google.common.collect.Iterators; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.collect.Sets; +import org.apache.iceberg.relocated.com.google.common.collect.Streams; + +/** + * A custom set for a {@link Wrapper} of the given type that maintains insertion order and does not + * allow null elements. + * + * @param <T> The type to wrap in a {@link Wrapper} instance. + */ +abstract class WrapperSet<T> implements Set<T> { + private final Set<Wrapper<T>> set = Sets.newLinkedHashSet(); + + protected WrapperSet(Iterable<Wrapper<T>> wrappers) { + wrappers.forEach(set::add); + } + + protected WrapperSet() {} + + protected abstract Wrapper<T> wrapper(); + + protected abstract Wrapper<T> wrap(T file); + + protected abstract Class<T> elementClass(); + + protected interface Wrapper<T> { + T get(); + + Wrapper<T> set(T object); + } + + protected Set<Wrapper<T>> set() { + return set; + } + + @Override + public int size() { + return set.size(); + } + + @Override + public boolean isEmpty() { + return set.isEmpty(); + } + + @SuppressWarnings("unchecked") + @Override + public boolean contains(Object obj) { Review Comment: Given that `Set` allows `ClassCastException` in the API, I wonder whether we should simplify this. ``` @Override public boolean contains(Object obj) { Preconditions.checkNotNull(obj, "Invalid object: null"); Wrapper<T> wrapper = wrapper(); boolean result = set.contains(wrapper.set(elementClass().cast(obj))); wrapper.set(null); // don't hold a reference to the value return result; } ``` Up to you. Either way, we should use `elementClass().cast(obj)` to get rid of `@SuppressWarnings`. ########## api/src/main/java/org/apache/iceberg/util/WrapperSet.java: ########## @@ -0,0 +1,192 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.util; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Set; +import java.util.stream.Collectors; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.Iterables; +import org.apache.iceberg.relocated.com.google.common.collect.Iterators; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.collect.Sets; +import org.apache.iceberg.relocated.com.google.common.collect.Streams; + +/** + * A custom set for a {@link Wrapper} of the given type that maintains insertion order and does not + * allow null elements. + * + * @param <T> The type to wrap in a {@link Wrapper} instance. + */ +abstract class WrapperSet<T> implements Set<T> { + private final Set<Wrapper<T>> set = Sets.newLinkedHashSet(); + + protected WrapperSet(Iterable<Wrapper<T>> wrappers) { + wrappers.forEach(set::add); + } + + protected WrapperSet() {} + + protected abstract Wrapper<T> wrapper(); + + protected abstract Wrapper<T> wrap(T file); + + protected abstract Class<T> elementClass(); + + protected interface Wrapper<T> { + T get(); + + Wrapper<T> set(T object); + } + + protected Set<Wrapper<T>> set() { + return set; + } + + @Override + public int size() { + return set.size(); + } + + @Override + public boolean isEmpty() { + return set.isEmpty(); + } + + @SuppressWarnings("unchecked") + @Override + public boolean contains(Object obj) { + Preconditions.checkNotNull(obj, "Invalid object: null"); + if (elementClass().isInstance(obj)) { + Wrapper<T> wrapper = wrapper(); + boolean result = set.contains(wrapper.set((T) obj)); + wrapper.set(null); // don't hold a reference to the value + return result; + } + + return false; + } + + @Override + public Iterator<T> iterator() { + return Iterators.transform(set.iterator(), Wrapper::get); + } + + @Override + public Object[] toArray() { + return Lists.newArrayList(iterator()).toArray(); + } + + @Override + public <X> X[] toArray(X[] destArray) { + return Lists.newArrayList(iterator()).toArray(destArray); + } + + @Override + public boolean add(T obj) { + Preconditions.checkNotNull(obj, "Invalid object: null"); + return set.add(wrap(obj)); + } + + @SuppressWarnings("unchecked") + @Override + public boolean remove(Object obj) { + Preconditions.checkNotNull(obj, "Invalid object: null"); + if (elementClass().isInstance(obj)) { + Wrapper<T> wrapper = wrapper(); + boolean result = set.remove(wrapper.set((T) obj)); + wrapper.set(null); // don't hold a reference to the value + return result; + } + + return false; + } + + @Override + public boolean containsAll(Collection<?> collection) { + Preconditions.checkNotNull(collection, "Invalid collection: null"); + return Iterables.all(collection, this::contains); + } + + @Override + public boolean addAll(Collection<? extends T> collection) { + Preconditions.checkNotNull(collection, "Invalid collection: null"); + return collection.stream().filter(this::add).count() != 0; + } + + @SuppressWarnings("unchecked") + @Override + public boolean retainAll(Collection<?> collection) { + Preconditions.checkNotNull(collection, "Invalid collection: null"); + Set<Wrapper<T>> toRetain = + collection.stream() + .map(obj -> Preconditions.checkNotNull(obj, "Invalid object: null")) + .filter(elementClass()::isInstance) + .map(obj -> (T) obj) Review Comment: `(T) obj` -> `elementClass()::cast` to get rid of `@SuppressWarnings`. ########## api/src/main/java/org/apache/iceberg/util/DataFileSet.java: ########## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.util; + +import java.util.Objects; +import org.apache.iceberg.DataFile; +import org.apache.iceberg.relocated.com.google.common.collect.Iterables; +import org.apache.iceberg.relocated.com.google.common.collect.Sets; +import org.apache.iceberg.types.Comparators; + +public class DataFileSet extends WrapperSet<DataFile> { + private static final ThreadLocal<DataFileWrapper> WRAPPERS = + ThreadLocal.withInitial(() -> DataFileWrapper.wrap(null)); + + protected DataFileSet(Iterable<Wrapper<DataFile>> wrappers) { + super(wrappers); + } + + public static DataFileSet empty() { + return new DataFileSet(Sets.newLinkedHashSet()); + } + + public static DataFileSet of(Iterable<DataFile> iterable) { + return new DataFileSet( + Sets.newLinkedHashSet(Iterables.transform(iterable, DataFileWrapper::wrap))); + } + + @Override + protected Wrapper<DataFile> wrapper() { + return WRAPPERS.get(); + } + + @Override + protected Wrapper<DataFile> wrap(DataFile dataFile) { + return DataFileWrapper.wrap(dataFile); + } + + @Override + protected boolean isInstance(Object obj) { + return obj instanceof DataFile; + } + + private static class DataFileWrapper implements Wrapper<DataFile> { + private DataFile file; + + private DataFileWrapper(DataFile file) { + this.file = file; + } + + private static DataFileWrapper wrap(DataFile dataFile) { + return new DataFileWrapper(dataFile); + } + + @Override + public DataFile get() { + return file; + } + + @Override + public Wrapper<DataFile> set(DataFile dataFile) { + this.file = dataFile; + return this; + } + + @Override + public boolean equals(Object o) { + if (this == o) { + return true; + } + + if (!(o instanceof DataFileWrapper)) { + return false; + } + + DataFileWrapper that = (DataFileWrapper) o; + if (null == file && null == that.file) { Review Comment: I would go for simpler logic as we already guard against null elements. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org