gortiz opened a new pull request, #10687:
URL: https://github.com/apache/pinot/pull/10687

   Tags should be:
   - `feature`
   -  `backward-incompat`
   
   When #10183 was added, mutable indexes were explicitly not migrated to this 
new system in order to reduce the changes and make the initial feature easier 
to merge.
   
   This new PR introduces mutable indexes in the Index Service Interface. 
`IndexType` subclasses can optionally implement `createMutableIndex`. In case 
they return a value different than null, the `MutableIndex.add` method will be 
called whenever the `MutableSegmentImpl` receives a new row.
   
   Things to note:
   - MutableFSTIndex were (sometimes) created in MutableSegmentImpl but never 
used. Specifically:
      - They were created only when the type was FST 
([code](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java#L322)).
 This means that we could create a FST index specifying the lucene type and it 
would just ignore it!)
      - The code distinguish between text and fst indexes. Text indexes were 
notified each time a row was added 
([code](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java#L773))
 but fst does not.
      - TBH I don't know the semantics of FST indexes in mutable segments. It 
seems they weren't actually implemented.
   - Actual MutableDictionary extra space
      - In order to reduce the chances of resizing, MutableDictionaries were 
not created with a size equal to the estimated size but a 10% extra... but this 
margin was applied twice: The first time in 
[MutableSegmentImpl#L304](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java#L304)
 and the second one in 
[DefaultMutableIndexProvider.java#L127](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/DefaultMutableIndexProvider.java#L127).
      - Therefore the actual margin was 21%. In order to do not change the 
current behavior, I'm only applying a single 21% size increase. We should 
discuss whether we actually want to use 10% as indicated by (both) comments or 
keep the actual 21% actual code was using.
   - There is a comment on `MutableSegmentImpl` that talks about a check that 
doesn't exist. It seems that the comment was introduced in 2019 and the code 
was changed between 2022 and 2022. I guess we want to delete it, but I'm 
leaving the `// TODO (mutable-index-spi)` there to make it easier to reviewers 
to find the place where the comment is.
   - I've moved this check 
[MutableSegmentImpl.java#L868](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java#L868)
 to the initialization part. Neither the type or its single/multiple value 
nature can change during indexing, so there is no need to do the check each 
time a row is added. Instead it can be done at construction time.
   - The partial `FieldSpec.IndexType` enum was used in `MutableSegmentImpl` to 
show some errors and export some metrics. That enum is partial. It doesn't 
include all possible types and it will never do (given that now index types are 
not know at compile time). Therefore I've changed that code to use `IndexType`. 
There is a compatibility issue with that: metrics will have a slightly 
different name. Therefore that is a change that breaks compatibility and why  
`backward-incompat` has to be added.
   - `MemoryEstimator` requires a `Schema` now. That means that 
`RealtimeProvisioningHelperCommand` needs to provide one. 
`RealtimeProvisioningHelperCommand` was already able to read the schema when - 
   `-schemaWithMetadataFile` was supplied. In fact that argument was mandatory 
when `-sampleCompletedSegmentDir` was not supplied. The easiest way I've found 
to make everything compile now is to always require `-schemaWithMetadataFile` 
even when `-sampleCompletedSegmentDir` was used. I don't know how much this 
tool is used or the impact it may have. The file was mainly touched by @npawar 
and @walterddr, so please take a look in case I break something.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to