Jackie-Jiang commented on code in PR #10191: URL: https://github.com/apache/pinot/pull/10191#discussion_r1092340224
########## pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/FieldIndexConfigs.java: ########## @@ -0,0 +1,131 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.pinot.segment.spi.index; + +import com.fasterxml.jackson.annotation.JsonIgnore; +import com.google.common.base.Preconditions; +import java.util.Collections; +import java.util.HashMap; +import java.util.Map; +import java.util.Set; +import javax.annotation.Nullable; + + +/** + * FieldIndexConfigs are a map like structure that relates index types with their configuration, providing a type safe + * interface. + */ +public class FieldIndexConfigs { Review Comment: Is this readable from JSON? If so, we should probably annotate it, and consider make it extending `BaseJsonConfig`; if not, we should remove the `JsonIgnore` below ########## pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/FieldIndexConfigs.java: ########## @@ -0,0 +1,131 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.pinot.segment.spi.index; + +import com.fasterxml.jackson.annotation.JsonIgnore; +import com.google.common.base.Preconditions; +import java.util.Collections; +import java.util.HashMap; +import java.util.Map; +import java.util.Set; +import javax.annotation.Nullable; + + +/** + * FieldIndexConfigs are a map like structure that relates index types with their configuration, providing a type safe + * interface. + */ +public class FieldIndexConfigs { + + private final Map<IndexType, IndexDeclaration<?>> _configMap; + public static final FieldIndexConfigs EMPTY = new FieldIndexConfigs(new HashMap<>()); Review Comment: (nit, convention) Suggest putting the constant on top for readability ########## pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexCreator.java: ########## @@ -0,0 +1,91 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.pinot.segment.spi.index; + +import java.io.Closeable; +import java.io.IOException; +import javax.annotation.Nonnull; +import javax.annotation.Nullable; + + +/** + * The interface used to create indexes. + * + * The lifecycle for an IndexCreator is to be created, receive one or more calls to either + * {@link #addSingleValueCell(Object, int)} or {@link #addMultiValueCell(Object[], int[])} (but not mix them), + * a call to {@link #seal()} and finally be closed. Calls to add cell methods must be done in document id order, + * starting from the first document id. + */ +public interface IndexCreator extends Closeable { + + /** + * This method can be implemented to offer a different value to the raw one included in the row source. + * + * Some specific indexes may need to change the actual value to be stored. For example, text indexes may offer a + * default value when {@link org.apache.pinot.spi.config.table.FieldConfig#TEXT_INDEX_NO_RAW_DATA} is defined. + * + * @param value the raw value included in the row + * @return Null in case this creator does not offer a new value. + */ + @Nullable + default Object alternativeSingleValue(Object value) { Review Comment: This is anti-pattern, and we are trying to solve the current text index value alternate with disabled forward index. After that is done, we can remove the alternate APIs. cc @siddharthteotia ########## pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexType.java: ########## @@ -0,0 +1,171 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.segment.spi.index; + +import com.fasterxml.jackson.databind.JsonNode; +import java.io.IOException; +import java.util.Map; +import javax.annotation.Nullable; +import org.apache.pinot.segment.spi.ColumnMetadata; +import org.apache.pinot.segment.spi.creator.IndexCreationContext; +import org.apache.pinot.segment.spi.index.column.ColumnIndexContainer; +import org.apache.pinot.segment.spi.store.SegmentDirectory; +import org.apache.pinot.spi.config.table.FieldConfig; +import org.apache.pinot.spi.config.table.TableConfig; +import org.apache.pinot.spi.data.Schema; +import org.apache.pinot.spi.utils.JsonUtils; + + +/** + * TODO: implement mutable indexes. + * @param <C> the class that represents how this object is configured. + * @param <IR> the {@link IndexReader} subclass that should be used to read indexes of this type. + * @param <IC> the {@link IndexCreator} subclass that should be used to create indexes of this type. + */ +public interface IndexType<C, IR extends IndexReader, IC extends IndexCreator> { + + /** + * The unique id that identifies this index type. + * In case there is more than one implementation for a given index, then all should share the same id in order to be + * correctly registered in the {@link IndexService}. + * This is also the value being used as the default toString implementation and the one used as keys when config is + * specified. + * + * <p>Therefore the returned value for each index should be constant across different Pinot versions.</p> + */ + String getId(); Review Comment: What is the difference between id and index name? I feel id is a little bit confusing, especially when we actually return the index type name here ########## pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexType.java: ########## @@ -0,0 +1,171 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.segment.spi.index; + +import com.fasterxml.jackson.databind.JsonNode; +import java.io.IOException; +import java.util.Map; +import javax.annotation.Nullable; +import org.apache.pinot.segment.spi.ColumnMetadata; +import org.apache.pinot.segment.spi.creator.IndexCreationContext; +import org.apache.pinot.segment.spi.index.column.ColumnIndexContainer; +import org.apache.pinot.segment.spi.store.SegmentDirectory; +import org.apache.pinot.spi.config.table.FieldConfig; +import org.apache.pinot.spi.config.table.TableConfig; +import org.apache.pinot.spi.data.Schema; +import org.apache.pinot.spi.utils.JsonUtils; + + +/** + * TODO: implement mutable indexes. + * @param <C> the class that represents how this object is configured. + * @param <IR> the {@link IndexReader} subclass that should be used to read indexes of this type. + * @param <IC> the {@link IndexCreator} subclass that should be used to create indexes of this type. + */ +public interface IndexType<C, IR extends IndexReader, IC extends IndexCreator> { + + /** + * The unique id that identifies this index type. + * In case there is more than one implementation for a given index, then all should share the same id in order to be + * correctly registered in the {@link IndexService}. + * This is also the value being used as the default toString implementation and the one used as keys when config is + * specified. + * + * <p>Therefore the returned value for each index should be constant across different Pinot versions.</p> + */ + String getId(); + + /** + * Returns an internal name used in some parts of the code (mainly in format v1 and metadata) that is persisted on + * disk. + * + * <p>Therefore the returned value for each index should be constant across different Pinot versions.</p> + */ + String getIndexName(); + + default Class<C> getIndexConfigClass() { + throw new UnsupportedOperationException(); + } + + /** + * The default config when it is not explicitly defined by the user. + * + * Can return null if the index should be disabled by default. + */ + @Nullable + default C getDefaultConfig() { + return null; + } + + /** + * This method is called to transform from a JSON node to a config object. + * + * This is usually used to deserialize {@link FieldConfig#getIndexes() fieldConfigLists.indexes.(indexId)}. + * @throws IOException + */ + @Nullable Review Comment: Is this nullable? ########## pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexDeclaration.java: ########## @@ -0,0 +1,114 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.pinot.segment.spi.index; + +import com.google.common.base.Preconditions; +import java.util.Objects; +import javax.annotation.Nullable; + + +// This should be a sealed class in modern Java +public class IndexDeclaration<C> { Review Comment: Do we need to know if an index is declared or default? In what scenario are we going to get an undeclared index? ########## pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexCreator.java: ########## @@ -0,0 +1,91 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.pinot.segment.spi.index; + +import java.io.Closeable; +import java.io.IOException; +import javax.annotation.Nonnull; Review Comment: (nit) Let's not use `Nonnull` annotation and assume everything is non-null without annotation because it is very easy to mix `Nonnull` and `Nullable` ########## pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/FieldIndexConfigs.java: ########## @@ -0,0 +1,131 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.pinot.segment.spi.index; + +import com.fasterxml.jackson.annotation.JsonIgnore; +import com.google.common.base.Preconditions; +import java.util.Collections; +import java.util.HashMap; +import java.util.Map; +import java.util.Set; +import javax.annotation.Nullable; + + +/** + * FieldIndexConfigs are a map like structure that relates index types with their configuration, providing a type safe + * interface. + */ +public class FieldIndexConfigs { + + private final Map<IndexType, IndexDeclaration<?>> _configMap; + public static final FieldIndexConfigs EMPTY = new FieldIndexConfigs(new HashMap<>()); + + private FieldIndexConfigs(Map<IndexType, IndexDeclaration<?>> configMap) { + _configMap = Collections.unmodifiableMap(configMap); + } + + + /** + * The method most of the code should call to know how is configured a given index. + * + * @param indexType the type of the index we are interested in. + * @return How the index has been configured, which can be {@link IndexDeclaration#isDeclared() not declared}, + * {@link IndexDeclaration#isEnabled()} not enabled} or an actual configuration object (which can be obtained with + * {@link IndexDeclaration#getEnabledConfig()}. + */ + @JsonIgnore + public <C, I extends IndexType<C, ?, ?>> IndexDeclaration<C> getConfig(I indexType) { + @SuppressWarnings("unchecked") + IndexDeclaration<C> config = (IndexDeclaration<C>) _configMap.get(indexType); + if (config == null) { + return IndexDeclaration.notDeclared(indexType); + } + return config; + } + + @JsonIgnore + public Set<IndexType> getIndexes() { Review Comment: (minor) ```suggestion public Set<IndexType> getIndexTypes() { ``` ########## pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexReaderFactory.java: ########## @@ -0,0 +1,39 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.pinot.segment.spi.index; + +import java.io.File; +import java.io.IOException; +import javax.annotation.Nullable; +import org.apache.pinot.segment.spi.ColumnMetadata; +import org.apache.pinot.segment.spi.store.SegmentDirectory; + + +public interface IndexReaderFactory<R extends IndexReader> { + + /** + * @throws IndexReaderConstraintException if the constraints of the index reader are not matched. For example, some + * indexes may require the column to be dictionary based. + */ + @Nullable + R read(SegmentDirectory.Reader segmentReader, FieldIndexConfigs fieldIndexConfigs, ColumnMetadata metadata, + File segmentDir) Review Comment: Do we need to pass in the `segmentDir`? It should be available from the reader ########## pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexType.java: ########## @@ -0,0 +1,171 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.segment.spi.index; + +import com.fasterxml.jackson.databind.JsonNode; +import java.io.IOException; +import java.util.Map; +import javax.annotation.Nullable; +import org.apache.pinot.segment.spi.ColumnMetadata; +import org.apache.pinot.segment.spi.creator.IndexCreationContext; +import org.apache.pinot.segment.spi.index.column.ColumnIndexContainer; +import org.apache.pinot.segment.spi.store.SegmentDirectory; +import org.apache.pinot.spi.config.table.FieldConfig; +import org.apache.pinot.spi.config.table.TableConfig; +import org.apache.pinot.spi.data.Schema; +import org.apache.pinot.spi.utils.JsonUtils; + + +/** + * TODO: implement mutable indexes. + * @param <C> the class that represents how this object is configured. + * @param <IR> the {@link IndexReader} subclass that should be used to read indexes of this type. + * @param <IC> the {@link IndexCreator} subclass that should be used to create indexes of this type. + */ +public interface IndexType<C, IR extends IndexReader, IC extends IndexCreator> { + + /** + * The unique id that identifies this index type. + * In case there is more than one implementation for a given index, then all should share the same id in order to be + * correctly registered in the {@link IndexService}. + * This is also the value being used as the default toString implementation and the one used as keys when config is + * specified. + * + * <p>Therefore the returned value for each index should be constant across different Pinot versions.</p> + */ + String getId(); + + /** + * Returns an internal name used in some parts of the code (mainly in format v1 and metadata) that is persisted on + * disk. + * + * <p>Therefore the returned value for each index should be constant across different Pinot versions.</p> + */ + String getIndexName(); + + default Class<C> getIndexConfigClass() { + throw new UnsupportedOperationException(); + } + + /** + * The default config when it is not explicitly defined by the user. + * + * Can return null if the index should be disabled by default. + */ + @Nullable + default C getDefaultConfig() { + return null; + } + + /** + * This method is called to transform from a JSON node to a config object. + * + * This is usually used to deserialize {@link FieldConfig#getIndexes() fieldConfigLists.indexes.(indexId)}. + * @throws IOException + */ + @Nullable + default C deserialize(JsonNode node) + throws IOException { + return JsonUtils.jsonNodeToObject(node, getIndexConfigClass()); + } + + /** + * This method can be overridden by indexes that support alternative configuration formats where the configuration is + * spread on different fields in the TableConfig. + * + * Configuration that can be read from the {@link FieldConfig#getIndexes() fieldConfigLists.indexes} shall not be + * included here. + */ + default IndexDeclaration<C> deserializeSpreadConf(TableConfig tableConfig, Schema schema, String column) { Review Comment: Should we consider always extracting the config this way, and provide a default implementation which looks at the `FieldIndexConfigs` and deserialize the json node when it exists? Something like ``` @Nullable default C getConfig(TableConfig tableConfig, Schema schema, String column) { FieldIndexConfigs fieldIndexConfigs = tableConfig.getFieldIndexConfigs(column); if (fieldIndexConfigs != null && fieldIndexConfigs.containsKey(getId())) { return deserialize(...); } return null; } ########## pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexHandler.java: ########## @@ -0,0 +1,48 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.segment.spi.index; + +import org.apache.pinot.segment.spi.store.SegmentDirectory; + + +/** + * Interface for index handlers, which update the corresponding type of indices, + * like adding, removing or converting the format. + */ +public interface IndexHandler { + /** + * Adds new indices and removes obsolete indices. + */ + void updateIndices(SegmentDirectory.Writer segmentWriter) + throws Exception; + + /** + * Check if there is a need to add new indices or removes obsolete indices. + * @return true if there is a need to update. + */ + boolean needUpdateIndices(SegmentDirectory.Reader segmentReader) Review Comment: (nit) Put this API above for readability as we always follow the sequence of: check -> update -> post update -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org