[GitHub] [pinot] siddharthteotia commented on issue #7395: Support for Native Text Indexing in Pinot

GitBox Thu, 16 Sep 2021 17:17:49 -0700


siddharthteotia commented on issue #7395:
URL: https://github.com/apache/pinot/issues/7395#issuecomment-921350642



   I followed up with @atris in the Slack channel to clarify a few additional 
things. Copying the exchange here for reference and visibility.
   
   Can we all confirm the following? I am sorry to have asked this a couple of 
times across different threads in the doc, but the doc still indicates some 
sort of migration: _Note that till completion of phase 4, we will be 
maintaining the existing text indices within Pinot_. I just want to make sure.
   
   
   - Existing Lucene text index functionality offered via TEXT_MATCH will 
continue to work as is and is essentially untouched by this work?
   - Both indexes can coexist and we are not removing the Lucene dependency?
   - Upon segment reload, an existing Lucene index can potentially be converted 
to the new format (if need be). However, if someone wishes to do this, how will 
the query syntax used in Lucene-based TEXT_MATCH remain compatible with the 
native FST index (which I believe will follow SQL LIKE semantics)? I am 
guessing users will have to change their queries if they wish to migrate?
   - For the native FST index, the plan is to eventually support all kinds of 
searches: phrase, term, regex, fuzzy, etc. For example, phrase search needs 
position info, and I am not sure whether that comes for free as part of the 
FST. Regardless, all of that is the end state, and comprehensive text search 
functionality will be available through this native index?
     - This is important for us because eventually (and this is a big 
"eventually" for us :) ) we might want to migrate our production Li users from 
the Lucene text index to the native FST index if performance is better. I 
can't promise that will happen, as it will certainly be a lot of work (hence 
seeking confirmation that we are not removing anything). Our production users 
use a lot of phrase queries.
   - General question: are you planning to make this functionality available 
via both LIKE and TEXT_MATCH, or do you want to keep it separate and just use 
LIKE? TEXT_MATCH can also be overloaded, as long as the user docs clearly 
indicate that it can be used for both the native and Lucene text indexes.
   - Request on code: since FST is like a black box (for me, except for 
whatever I learned from the paper and online presentations), can you please 
make sure that the code is sufficiently documented and explains the algorithm 
as and when needed? Initially, we were just relying on Lucene committers, but 
now we will have to maintain it ourselves. This will also help with easy review.
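
   To illustrate the position-info point in the phrase-search question above, 
here is a minimal Python sketch. It is not Pinot or Lucene code: the function 
names, the whitespace tokenizer, and the dictionary-based index are all 
assumptions made for illustration. It shows that a term-only lookup can find 
documents containing both words, while deciding whether they form a phrase 
requires per-document term positions.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} (hypothetical toy index)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def term_search(index, terms):
    """Doc ids containing every term, in any order. Positions are not used."""
    doc_sets = [set(index.get(t, {})) for t in terms]
    return set.intersection(*doc_sets) if doc_sets else set()


def phrase_search(index, phrase):
    """Doc ids where the terms appear consecutively. Positions are required."""
    terms = phrase.lower().split()
    matches = set()
    for doc_id in term_search(index, terms):
        for start in index[terms[0]][doc_id]:
            # Term i of the phrase must sit at offset start + i.
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                matches.add(doc_id)
                break
    return matches


docs = [
    "machine learning at scale",
    "learning about the machine",
    "deep learning",
]
index = build_positional_index(docs)
print(term_search(index, ["machine", "learning"]))  # docs 0 and 1
print(phrase_search(index, "machine learning"))     # doc 0 only
```

   Without the stored positions, documents 0 and 1 are indistinguishable for 
the phrase query, which is why an index that keeps only terms would need extra 
work to answer phrase queries.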
   
   
   @atris's response:
   
   - Yes, Lucene Indices and TEXT_MATCH will be completely untouched and 
unaffected by this effort.
   - No, we are not removing the dependency and both indices can coexist, 
oblivious of each other.
   - Here is the interesting one. Native FST can support all queries that 
Lucene does. However, since our indices do not store some metadata (such as 
positional index) that Lucene indices do, we will have to implement custom 
operators on top of native FST. Syntactically, though, native FST shall pose 
no challenges in that implementation. If there are specific operators outside 
of the four planned currently (regexp, like, phrase and fuzzy) that will be 
needed for users to migrate, I will be more than happy to support them.
   - Yes, in the end state, comprehensive text search will be natively 
available.
   - I was actually not planning to overload TEXT_MATCH since it basically 
supports Lucene syntax, but rather have custom functions for phrase, fuzzy and 
regexp, and let the LIKE operator deal with the rest. However, there is no 
reason why we can't go down that route.
   - I completely agree. I have tried to document the code as elaborately as 
possible and have also written supporting documents (e.g. on the regexp 
compilation process). If more is needed on specific areas, I will gladly write 
more :)
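
   Since the response leaves general matching to the LIKE operator, the sketch 
below shows one common way to evaluate standard SQL LIKE semantics by 
translating the pattern into a regular expression. This is an illustration in 
Python, not the Pinot implementation: `like_to_regex` and `like` are 
hypothetical helper names, and escape-character handling is left out.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regex (hypothetical helper).

    '%' matches any run of characters and '_' matches exactly one character.
    Every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("pinot-server", "pinot%"))  # True
print(like("apache pinot", "pinot%"))  # False: LIKE matches the whole value
print(like("axb", "a.b"))              # False: '.' is a literal in LIKE
```

   Lucene query syntax and LIKE patterns use different wildcards and operators, 
so queries written for one do not carry over to the other unchanged.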
   
   Based on the above clarifications, I am OK with proceeding.
   
   @amrishlal, @jackjlli, please feel free to add any additional discussion 
notes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


