dantuzi opened a new issue, #12168:
URL: https://github.com/apache/lucene/issues/12168

   ### Description
   
   If you want to expand your queries or documents with synonyms in Apache 
Lucene, you need a predefined file listing the terms that share the same 
meaning.
   It's not always easy to find a list of basic synonyms for a language and, 
even if you find one, it doesn't necessarily match your domain.
   For example, in articles about operating systems the term "daemon" is not a 
synonym of "devil"; it is much closer to the term "process".
   
   Word2Vec is a two-layer neural network that takes a text corpus as input and 
outputs a vector representation for each word in its vocabulary.
   Words with similar meanings are represented by vectors that lie close to 
each other.
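
   As an illustration of "close to each other", here is a minimal sketch that 
compares word vectors by cosine similarity, a common closeness measure for 
Word2Vec embeddings. The class name and the 3-dimensional vectors are invented 
for this example; real models use hundreds of dimensions and learned values.

```java
final class CosineSimilarityDemo {

    // Cosine of the angle between two vectors: 1.0 means same direction,
    // values near 0 mean the vectors are unrelated.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // a zero vector has no direction
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy vectors, hand-written for illustration only (not model output).
        float[] daemon = {0.9f, 0.1f, 0.2f};
        float[] process = {0.8f, 0.2f, 0.1f};
        float[] devil = {0.1f, 0.9f, 0.3f};

        System.out.println("daemon vs process: " + cosine(daemon, process));
        System.out.println("daemon vs devil:   " + cosine(daemon, devil));
    }
}
```

   With these toy values "daemon" scores higher against "process" than against 
"devil", which is the behaviour a model trained on operating system articles 
would be expected to show.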
   
   This contribution integrates this technique into the text analysis pipeline. 
It automatically generates synonyms on the fly from a Word2Vec model trained 
with the DL4J library.
   Please see our presentation at the Berlin Buzzwords conference: 
https://pretalx.com/bbuzz22/talk/UYZAUX/
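
   To give a rough idea of what generating synonyms on the fly from a vector 
model can look like, here is a minimal sketch. It is not the code of this 
contribution and does not use Lucene or DL4J classes: `VectorSynonymSketch`, 
`maxSynonyms` and `minSimilarity` are hypothetical names, and the brute-force 
scan over an in-memory map is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class VectorSynonymSketch {

    // Term -> unit-length vector, so a dot product equals cosine similarity.
    private final Map<String, float[]> unitVectors = new LinkedHashMap<>();

    void add(String term, float[] vector) {
        double norm = 0.0;
        for (float v : vector) {
            norm += v * v;
        }
        norm = Math.sqrt(norm);
        float[] unit = new float[vector.length];
        for (int i = 0; i < vector.length; i++) {
            unit[i] = norm == 0.0 ? 0f : (float) (vector[i] / norm);
        }
        unitVectors.put(term, unit);
    }

    // Returns up to maxSynonyms terms whose similarity to the given term is
    // at least minSimilarity, most similar first.
    List<String> synonyms(String term, int maxSynonyms, double minSimilarity) {
        float[] query = unitVectors.get(term);
        if (query == null) {
            return List.of(); // unknown term: no expansion
        }
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, float[]> e : unitVectors.entrySet()) {
            if (e.getKey().equals(term)) {
                continue; // a term is not its own synonym
            }
            double similarity = 0.0;
            float[] candidate = e.getValue();
            for (int i = 0; i < query.length; i++) {
                similarity += query[i] * candidate[i];
            }
            if (similarity >= minSimilarity) {
                scored.add(Map.entry(e.getKey(), similarity));
            }
        }
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < scored.size() && i < maxSynonyms; i++) {
            result.add(scored.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        VectorSynonymSketch model = new VectorSynonymSketch();
        // Toy vectors, hand-written for illustration only (not model output).
        model.add("daemon", new float[] {0.9f, 0.1f, 0.2f});
        model.add("process", new float[] {0.8f, 0.2f, 0.1f});
        model.add("thread", new float[] {0.7f, 0.3f, 0.2f});
        model.add("devil", new float[] {0.1f, 0.9f, 0.3f});
        System.out.println(model.synonyms("daemon", 2, 0.9));
    }
}
```

   The similarity threshold keeps distant terms such as "devil" out of the 
expansion, and the cap on the number of synonyms limits how much each token 
grows the query or the indexed document.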
   
   We also created a tool to generate a Word2Vec model from a Lucene index: 
https://github.com/SeaseLtd/LuceneWord2VecModelTrainer
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

