siddharthteotia commented on issue #7395: URL: https://github.com/apache/pinot/issues/7395#issuecomment-917075229
Wanted to understand a few things better. IIUC, this is our current state:

- Lucene text index: a wrapper over Lucene that provides phrase, term, regex, etc. search query functionality.
- Lucene FST index: added later (by Confluera, I believe) to use Lucene's FST libraries purely for regex search functionality.

In both cases, we get the FST and the regexp automaton as part of using Lucene. We also advise users not to use the Lucene text index for exact matches, since Pinot's native inverted index is much faster for exact matches.

When we say we are implementing a native FST index, what exact functionality are we adding and/or improving? This is not clear in the design doc. The doc talks about control/flexibility and potential future improvements, but those sections are a bit vague IMHO and could use more detail.

My guess is that this is about improving phrase, regex and fuzzy search by building a native FST index that works on top of Pinot's existing native structures: the inverted index and the dictionary. So it seems the missing piece is a bridge between Pinot's native inverted index/dictionary and the Lucene FST. Is this correct? If so, can this not be achieved by continuing to use the Lucene FST library rather than bringing it into Pinot? That is something we already do as part of the Lucene FST index.

Also, how will this new work differ from what the Lucene FST index currently offers in terms of functionality and performance? There are some performance charts, but if I am reading them right, the improvement seems marginal.

Also, thanks for clarifying in the doc that this work won't regress TEXT_MATCH functionality (query syntax, etc.) or performance. If we go ahead with this new work, I think the end state should not include a mandatory step of removing the current Lucene text index and TEXT_MATCH. If someone wants to migrate, there should be a migration path.
The rest of the users can continue to use what we have today.
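To make the "missing bridge" question concrete, here is a minimal sketch of what a regex lookup over Pinot's native structures could look like: resolve the pattern against the dictionary to get matching dictIds, then union those dictIds' inverted-index postings to get docIds. It assumes a sorted dictionary where a term's dictId is its position; the class and method names are hypothetical and not Pinot APIs. A real FST index would intersect the regex automaton with the FST instead of testing every term; the linear scan with Python's `re.fullmatch` below only stands in for that step.

```python
import re
from bisect import bisect_left


class DictionaryWithInvertedIndex:
    """Hypothetical stand-in for a column's sorted dictionary plus inverted index.

    Assumption: dictId = position of the term in the sorted dictionary, and the
    inverted index maps each dictId to the sorted list of docIds holding that term.
    """

    def __init__(self, column_values):
        # Sorted unique values form the dictionary; the list index is the dictId.
        self.terms = sorted(set(column_values))
        self.postings = [[] for _ in self.terms]
        for doc_id, value in enumerate(column_values):
            dict_id = bisect_left(self.terms, value)  # binary search for the dictId
            self.postings[dict_id].append(doc_id)     # docIds are appended in order

    def regex_dict_ids(self, pattern):
        # Stand-in for automaton/FST intersection: test every dictionary term.
        compiled = re.compile(pattern)
        return [i for i, term in enumerate(self.terms) if compiled.fullmatch(term)]

    def regex_doc_ids(self, pattern):
        # The "bridge": matching dictIds -> union of their inverted-index postings.
        doc_ids = set()
        for dict_id in self.regex_dict_ids(pattern):
            doc_ids.update(self.postings[dict_id])
        return sorted(doc_ids)


if __name__ == "__main__":
    idx = DictionaryWithInvertedIndex(["pinot", "lucene", "pinot-server", "presto", "lucene"])
    print(idx.regex_dict_ids("pinot.*"))  # [1, 2]
    print(idx.regex_doc_ids("pinot.*"))   # [0, 2]
```

The point of the sketch is that only the term-matching step (`regex_dict_ids`) needs an FST; everything after it reuses the dictionary and inverted index that Pinot already has, which is why it is worth asking whether that step could keep using Lucene's FST library.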