[ https://issues.apache.org/jira/browse/LUCENE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201089#comment-17201089 ]
Cameron VandenBerg commented on LUCENE-9537:
--------------------------------------------

Hi Adrien,

Unfortunately, the smoothing score that we use is document specific, so I am not sure whether I could make it "transferable". I am definitely interested in brainstorming ways to make Indri fit into the Lucene architecture better, though.

Perhaps an example of how Indri smoothing scores work would be helpful. Suppose we have an index with 4 documents (sorry for the political nature of the documents... it's just what I can easily think of at the moment):

1) Donald Trump is the president of the United States.
2) There are three branches of government. The president is the head of the executive branch.
3) Jane Doe is president of the PTO.
4) Trump was elected in the 2016 election.

Say that the query is: President Trump. In this index, the term president occurs more often than the term Trump. The smoothing score acts like an idf for the query terms, so that documents with just the term Trump will be ranked higher than documents with just the term president.

Consider Documents 3 and 4, which have the same length and each contain one search term, but Document 4 has the rarer search term. Document 3 is missing the term Trump and Document 4 is missing the term president, so each receives a smoothing score for its missing term. Because Trump is rarer in the index, the smoothing score for the term Trump in Document 3 will be lower than the smoothing score for the term president in Document 4. Adding the smoothing scores for the missing terms allows Document 4 to get a higher score and be ranked above Document 3.

Let me know whether this example makes sense. Can you see a way that I can refactor the smoothing score so that it better fits into Lucene's existing architecture? Or let me know if I misunderstood your comment and you still feel that what you suggested will work.

Thank you!
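For readers who want to check the arithmetic, the four-document example above can be reproduced with a minimal sketch of textbook Dirichlet-smoothed query likelihood. This is not the code from the attached patch: the tokenizer, the helper names (tokenize, term_score, query_score) and the value of MU are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# The four example documents from the comment above.
DOCS = {
    1: "Donald Trump is the president of the United States.",
    2: "There are three branches of government. "
       "The president is the head of the executive branch.",
    3: "Jane Doe is president of the PTO.",
    4: "Trump was elected in the 2016 election.",
}

MU = 2500.0  # assumed smoothing parameter; not taken from the patch


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


TOKENS = {doc_id: tokenize(text) for doc_id, text in DOCS.items()}
COLLECTION_TF = Counter(t for toks in TOKENS.values() for t in toks)
COLLECTION_LEN = sum(len(toks) for toks in TOKENS.values())


def term_score(term, doc_id, mu=MU):
    """log p(term | doc) with Dirichlet smoothing.

    When the term is absent (tf == 0) the result is the pure smoothing
    score, which still depends on the document through its length.
    """
    toks = TOKENS[doc_id]
    tf = toks.count(term)
    p_collection = COLLECTION_TF[term] / COLLECTION_LEN
    return math.log((tf + mu * p_collection) / (len(toks) + mu))


def query_score(query, doc_id, mu=MU):
    """Sum the per-term scores over every query term, matched or not."""
    return sum(term_score(t, doc_id, mu) for t in tokenize(query))


if __name__ == "__main__":
    query = "President Trump"
    ranked = sorted(DOCS, key=lambda d: query_score(query, d), reverse=True)
    for doc_id in ranked:
        print(doc_id, round(query_score(query, doc_id), 4))
```

Under these assumptions, the smoothing score for Trump in Document 3 comes out lower than the smoothing score for president in Document 4, and Document 4 outranks Document 3 overall. Because the denominator contains the document length, the smoothing score cannot be computed once per term and reused across documents, which is the "document specific" point made above.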
> Add Indri Search Engine Functionality to Lucene
> -----------------------------------------------
>
>                 Key: LUCENE-9537
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9537
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Cameron VandenBerg
>            Priority: Major
>              Labels: patch
>         Attachments: LUCENE-INDRI.patch
>
>
> Indri ([http://lemurproject.org/indri.php]) is an academic search engine
> developed by The University of Massachusetts and Carnegie Mellon University.
> The major difference between Lucene and Indri is that Indri will give a
> "smoothing score" to a document that does not contain the search
> term, which has improved the search ranking accuracy in our experiments. I
> have created an Indri patch, which adds the search code needed to implement
> the Indri AND logic as well as Indri's implementation of Dirichlet Smoothing.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)