gortiz opened a new pull request, #10687: URL: https://github.com/apache/pinot/pull/10687
Tags should be: - `feature` - `backward-incompat` When #10183 was added, mutable indexes were explicitly not migrated to this new system in order to reduce the changes and make the initial feature easier to merge. This new PR introduces mutable indexes in the Index Service Interface. `IndexType` subclasses can optionally implement `createMutableIndex`. In case they return a value different than null, the `MutableIndex.add` method will be called whenever the `MutableSegmentImpl` receives a new row. Things to note: - MutableFSTIndex were (sometimes) created in MutableSegmentImpl but never used. Specifically: - They were created only when the type was FST ([code](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java#L322)). This means that we could create a FST index specifying the lucene type and it would just ignore it!) - The code distinguish between text and fst indexes. Text indexes were notified each time a row was added ([code](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java#L773)) but fst does not. - TBH I don't know the semantics of FST indexes in mutable segments. It seems they weren't actually implemented. - Actual MutableDictionary extra space - In order to reduce the chances of resizing, MutableDictionaries were not created with a size equal to the estimated size but a 10% extra... but this margin was applied twice: The first time in [MutableSegmentImpl#L304](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java#L304) and the second one in [DefaultMutableIndexProvider.java#L127](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/DefaultMutableIndexProvider.java#L127). - Therefore the actual margin was 21%. In order to do not change the current behavior, I'm only applying a single 21% size increase. We should discuss whether we actually want to use 10% as indicated by (both) comments or keep the actual 21% actual code was using. - There is a comment on `MutableSegmentImpl` that talks about a check that doesn't exist. It seems that the comment was introduced in 2019 and the code was changed between 2022 and 2022. I guess we want to delete it, but I'm leaving the `// TODO (mutable-index-spi)` there to make it easier to reviewers to find the place where the comment is. - I've moved this check [MutableSegmentImpl.java#L868](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java#L868) to the initialization part. Neither the type or its single/multiple value nature can change during indexing, so there is no need to do the check each time a row is added. Instead it can be done at construction time. - The partial `FieldSpec.IndexType` enum was used in `MutableSegmentImpl` to show some errors and export some metrics. That enum is partial. It doesn't include all possible types and it will never do (given that now index types are not know at compile time). Therefore I've changed that code to use `IndexType`. There is a compatibility issue with that: metrics will have a slightly different name. Therefore that is a change that breaks compatibility and why `backward-incompat` has to be added. - `MemoryEstimator` requires a `Schema` now. That means that `RealtimeProvisioningHelperCommand` needs to provide one. `RealtimeProvisioningHelperCommand` was already able to read the schema when - `-schemaWithMetadataFile` was supplied. In fact that argument was mandatory when `-sampleCompletedSegmentDir` was not supplied. The easiest way I've found to make everything compile now is to always require `-schemaWithMetadataFile` even when `-sampleCompletedSegmentDir` was used. I don't know how much this tool is used or the impact it may have. The file was mainly touched by @npawar and @walterddr, so please take a look in case I break something. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org