[GitHub] [pinot] siddharthteotia commented on issue #7395: Support for Native Text Indexing in Pinot

GitBox Fri, 10 Sep 2021 10:20:34 -0700


siddharthteotia commented on issue #7395:
URL: https://github.com/apache/pinot/issues/7395#issuecomment-917075229



   Wanted to understand a few things better. 
   
   IIUC, this is our current state:
   
   - Lucene text index - a wrapper over Lucene that provides phrase, term, 
regex, etc. search query functionality
   - Lucene FST index - added later (by Confluera, I guess) to use Lucene's FST 
libraries purely for regex search functionality
   
   In both of the above cases, we get the FST and regexp automaton as part of 
using Lucene. We also advise users not to use the Lucene text index if they want 
to do exact matches, since Pinot's native inverted index is way faster for exact 
matches. When we say we are implementing a native FST index, what exact 
functionality are we adding and/or improving? This is not clear in the design 
doc. The doc talks about control/flexibility and potential future improvements, 
but they are a bit vague IMHO, and a few more details could be added in those 
sections.
   
   My guess is that this is about improving phrase, regex and fuzzy search by 
building a native FST index that can work on top of Pinot's existing native 
structures -- inverted index and dictionary. So it seems like a bridge is 
missing between Pinot's native inverted index and dictionary structures and the 
Lucene FST. Is this correct? If so, can this not be achieved by continuing to 
use the Lucene FST library, as we already do for the Lucene FST index, as 
opposed to bringing it into Pinot?
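
   To make the question concrete, here is a toy sketch of the "bridge" being 
described: a sorted dictionary, an inverted index keyed by dict ID, and a regex 
lookup that resolves matching terms to doc IDs through both. This is only an 
illustration under simplifying assumptions, not Pinot's or Lucene's actual code. 
All names are hypothetical, and the linear regex scan over the dictionary stands 
in for the step an FST would accelerate.

```python
import re

# Toy column values, one per document (doc ID = list position).
docs = ["error", "warn", "error", "info", "warning"]

# Dictionary: sorted unique terms; a term's dict ID is its position.
dictionary = sorted(set(docs))
dict_id = {term: i for i, term in enumerate(dictionary)}

# Inverted index: dict ID -> ascending list of doc IDs holding that term.
inverted = {}
for doc, term in enumerate(docs):
    inverted.setdefault(dict_id[term], []).append(doc)


def exact_match(term):
    """Exact match: one dictionary lookup, then one posting list."""
    if term not in dict_id:
        return []
    return inverted[dict_id[term]]


def regex_match(pattern):
    """Regex match: find matching dictionary terms, then merge postings.

    The scan over `dictionary` is the part an FST index would replace
    (assumption: the FST maps matching terms to their dict IDs).
    """
    rx = re.compile(pattern)
    ids = [i for i, term in enumerate(dictionary) if rx.fullmatch(term)]
    return sorted(d for i in ids for d in inverted[i])
```

   In this sketch, `exact_match` never touches the regex path, which is 
consistent with the advice above to prefer the native inverted index for exact 
matches.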
   
   Also, how will this new work differ from what is currently offered by the 
Lucene FST index in terms of functionality and performance? There are some 
performance charts, but if I am reading them right, the improvement seems 
marginal.
   
   Also, thanks for clarifying in the doc that this work won't regress the 
TEXT_MATCH functionality (query syntax etc.) and performance. If we go ahead 
with this new work, I think the end state should not include a mandatory step 
of removing the current Lucene text index and TEXT_MATCH. If someone wants to 
migrate, there should be a migration path. The rest of the users can continue 
to use what we have today.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


