mikemccand opened a new issue, #13004: URL: https://github.com/apache/lucene/issues/13004
### Description

[I'm not sure how general this is, but I figured I'd open this to see if there is interest / other use cases:]

On Amazon's product search team we have synonyms that are sometimes conditionally applied, depending on some context about the document or query. For example, `apple` might be a fruit in the grocery subset of Amazon's catalog, or a computer in the electronics catalog.

Today we implement this very inefficiently: we compile N massive synonym maps, mostly with the same synonyms except for a few that are catalog / query context specific. This is wasteful and takes gobs of heap. (Hmm, separately: `SynonymGraphFilter` should be fixed to use off-heap FSTs -- I'll open a separate spinoff issue.)

Maybe instead we could allow each synonym rule to optionally carry some metadata that may be used, at matching time, to post-filter, applying the synonym only when it fits the context of the current document/query? I'm not sure how this would work -- maybe N labels that are compiled to an int/long bitset recorded into each FST rule?
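To make the bitset idea above a bit more concrete, here is a rough sketch in plain Java. It does not use Lucene or FSTs at all: `ContextualSynonyms`, `LabelRegistry`, `Rule`, and `expand` are hypothetical names invented for illustration, and a `HashMap` stands in for the compiled synonym map. It assumes at most 64 labels (one `long`), and that a mask of `0` means "apply in every context".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in Lucene.
final class ContextualSynonyms {

  /** Assigns each context label (e.g. "grocery") one bit position in a long. */
  static final class LabelRegistry {
    private final Map<String, Integer> bits = new HashMap<>();

    /** Returns the OR of the bits for the given labels, registering new ones. */
    long maskOf(String... labels) {
      long mask = 0L;
      for (String label : labels) {
        Integer bit = bits.get(label);
        if (bit == null) {
          if (bits.size() >= Long.SIZE) {
            throw new IllegalStateException("more than 64 labels");
          }
          bit = bits.size();
          bits.put(label, bit);
        }
        mask |= 1L << bit;
      }
      return mask;
    }
  }

  /** One synonym rule; contextMask == 0 means "applies in every context". */
  static final class Rule {
    final String output;
    final long contextMask;

    Rule(String output, long contextMask) {
      this.output = output;
      this.contextMask = contextMask;
    }
  }

  // Stand-in for the single shared synonym map (an FST in the real filter).
  private final Map<String, List<Rule>> rulesByInput = new HashMap<>();

  void add(String input, String output, long contextMask) {
    rulesByInput.computeIfAbsent(input, k -> new ArrayList<>())
        .add(new Rule(output, contextMask));
  }

  /** Post-filters the matching rules by the context of the current doc/query. */
  List<String> expand(String term, long activeContext) {
    List<String> out = new ArrayList<>();
    List<Rule> rules = rulesByInput.getOrDefault(term, Collections.emptyList());
    for (Rule rule : rules) {
      if (rule.contextMask == 0L || (rule.contextMask & activeContext) != 0L) {
        out.add(rule.output);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    LabelRegistry labels = new LabelRegistry();
    ContextualSynonyms synonyms = new ContextualSynonyms();
    synonyms.add("apple", "fruit", labels.maskOf("grocery"));
    synonyms.add("apple", "computer", labels.maskOf("electronics"));
    synonyms.add("tv", "television", 0L); // unconditional rule

    System.out.println(synonyms.expand("apple", labels.maskOf("grocery"))); // [fruit]
  }
}
```

With this shape, one shared map would replace the N near-duplicate maps, at the cost of one extra `long` per rule and a mask check at match time. How such a mask would actually be encoded into FST outputs is left open here, as it is in the description above.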