mikemccand opened a new issue, #13004:
URL: https://github.com/apache/lucene/issues/13004

   ### Description
   
   [I'm not sure how general this is but figured I'd open this to see if there 
is interest / other use cases:]
   
   On Amazon's product search team we have synonyms that are sometimes 
conditionally applied depending on some context about the document or query.  
For example, `apple` might be a fruit in the grocery subset of Amazon's 
catalog, or a computer in the electronics catalog.
   
   Today we implement this very inefficiently: we compile N massive synonym 
maps, mostly with the same synonyms except for a few that are 
catalog / query context specific.  This is wasteful and takes gobs of heap.  
(Hmm, separately: `SynonymGraphFilter` should be fixed to use off-heap FSTs -- 
I'll open a spinoff issue).
   
   Maybe instead we could allow each synonym rule to optionally carry some 
metadata that is used at matching time as a post-filter, so the synonym is 
applied only when it fits the context of the current document/query?  I'm not 
sure how this would work -- maybe N labels that are compiled to an int/long 
bitset recorded into each FST rule?
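
   To make the bitset idea concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: the class `ConditionalSynonyms`, the label constants, and the `HashMap` standing in for the compiled FST are all hypothetical, chosen only to illustrate storing a `long` context mask with each rule and post-filtering on it at match time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not part of Lucene: one shared synonym map with per-rule context masks. */
public class ConditionalSynonyms {
    // Hypothetical context labels, one bit each; a long leaves room for up to 64 labels.
    public static final long GROCERY = 1L << 0;
    public static final long ELECTRONICS = 1L << 1;
    // All bits set: the rule applies in every context.
    public static final long ALL_CONTEXTS = -1L;

    /** One synonym output plus the bitset of contexts in which it applies. */
    private static final class Rule {
        final String output;
        final long contextMask;

        Rule(String output, long contextMask) {
            this.output = output;
            this.contextMask = contextMask;
        }
    }

    // Stand-in for the compiled FST: input term -> candidate rules.
    private final Map<String, List<Rule>> rules = new HashMap<>();

    /** Registers a rule that is only valid for the contexts set in contextMask. */
    public void add(String input, String output, long contextMask) {
        rules.computeIfAbsent(input, k -> new ArrayList<>()).add(new Rule(output, contextMask));
    }

    /** Post-filter at match time: keep only rules whose mask overlaps the active context. */
    public List<String> lookup(String input, long activeContext) {
        List<String> out = new ArrayList<>();
        for (Rule r : rules.getOrDefault(input, List.of())) {
            if ((r.contextMask & activeContext) != 0) {
                out.add(r.output);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ConditionalSynonyms syns = new ConditionalSynonyms();
        syns.add("apple", "fruit", GROCERY);
        syns.add("apple", "computer", ELECTRONICS);
        syns.add("tv", "television", ALL_CONTEXTS);

        System.out.println(syns.lookup("apple", GROCERY));     // prints [fruit]
        System.out.println(syns.lookup("apple", ELECTRONICS)); // prints [computer]
        System.out.println(syns.lookup("tv", GROCERY));        // prints [television]
    }
}
```

   The point of the sketch is that the shared rules are stored once, and only a single mask test per candidate rule separates the contexts, instead of N nearly identical maps. How such a mask would actually be encoded in the FST output is left open here, as in the proposal above.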


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
