gsmiller commented on PR #12312: URL: https://github.com/apache/lucene/pull/12312#issuecomment-1555244204
OK, here's a method profiler diff for the "High Cardinality PK" task, comparing two postings approaches—one that's using the current MultiTermQuery version, and one using AutomatonQuery ("LegacyTermInSetQuery" is our current version extending MultiTermQuery). The green colored frames show where the AutomatonQuery is spending less time compared to the MultiTermQuery version, and red is the opposite. Eyeballing this, it seems to confirm that it's a little more expensive to build the automaton than to do the prefix-coding, but a little less expensive to do the term intersection with the automaton approach (down in the codec's optimized intersect). The net/net though is that the AutomatonQuery approach in this case is slower by ~35%. <img width="1733" alt="Screen Shot 2023-05-19 at 1 55 36 PM" src="https://github.com/apache/lucene/assets/16479560/c251415c-877a-4c5e-b8a1-ecb5c8546282"> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org