[ 
https://issues.apache.org/jira/browse/LUCENE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200173#comment-17200173
 ] 

Cameron VandenBerg commented on LUCENE-9537:
--------------------------------------------

Hi Michael!  Thank you for taking the time to look at this so quickly!

I do like the idea of a tighter integration allowing for less forking.  At
first, I was trying to shy away from changing Lucene classes, but I can see a
few ways to make small changes in Lucene, which could eliminate some of the
need for forking the complex classes.  The only issue with this is that I will
need to touch a lot more classes in order to preserve the test functionality.
If you think this is worth a shot, I am happy to refactor this patch to try
this approach.

For the IndriDisjunctionScorer, I was using Lucene's DisjunctionScorer as an
example for implementing the docID function.  I am happy to look into this
functionality and see if I can get the ID from the DocIdSetIterator instead.

Indri uses boost in a slightly different way for scoring than Lucene, which is
why I added it to the scorer.  However, the main functionality that we would
like to add at this point is the smoothing score and Indri's implementation of
Dirichlet smoothing, so we are happy to work with Lucene's existing smoothing
if that makes things easier.

I will be happy to add some class-level javadocs.  I could certainly create an
indri subpackage for these changes as well.  I am not sure what Lucene's
sandbox module is.  Would you be able to let me know how I could contribute to
that?

Thanks again for working with me!  I will start looking through how to make 
some of the changes you have suggested.  Let me know whether you think it makes 
more sense for me to try to create a new patch or add to the sandbox module.

> Add Indri Search Engine Functionality to Lucene
> -----------------------------------------------
>
>                 Key: LUCENE-9537
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9537
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Cameron VandenBerg
>            Priority: Major
>              Labels: patch
>         Attachments: LUCENE-INDRI.patch
>
>
> Indri ([http://lemurproject.org/indri.php]) is an academic search engine
> developed by The University of Massachusetts and Carnegie Mellon University.
> The major difference between Lucene and Indri is that Indri gives a
> "smoothing score" to a document that does not contain the search term, which
> has improved the search ranking accuracy in our experiments.  I have created
> an Indri patch, which adds the search code needed to implement the Indri AND
> logic as well as Indri's implementation of Dirichlet Smoothing.
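
To illustrate the "smoothing score" mentioned in the description above, here
is a minimal sketch.  It assumes the textbook Dirichlet-smoothed
query-likelihood formula, log((tf + mu * P(t|C)) / (docLen + mu)), which may
differ in detail from what the attached patch does.  The class name, method
name, and all numbers are hypothetical and are not taken from the patch.

```java
// Sketch only: textbook Dirichlet smoothing, not code from LUCENE-INDRI.patch.
class DirichletSmoothingSketch {

    /**
     * Log-probability of a term under a Dirichlet-smoothed document model.
     *
     * @param tf             term frequency in the document (0 if absent)
     * @param docLen         document length in tokens
     * @param collectionProb P(term | collection), assumed to be > 0
     * @param mu             Dirichlet prior (assumed tuning parameter)
     */
    static double score(double tf, double docLen, double collectionProb, double mu) {
        return Math.log((tf + mu * collectionProb) / (docLen + mu));
    }

    public static void main(String[] args) {
        double mu = 2000.0;            // assumed value for the prior
        double collectionProb = 0.001; // hypothetical collection statistic
        double docLen = 100.0;         // hypothetical document length

        double withTerm = score(3, docLen, collectionProb, mu);
        // With tf = 0 the score is still a finite log-probability: this is
        // the smoothing score given to a document that lacks the term.
        double withoutTerm = score(0, docLen, collectionProb, mu);

        System.out.println("document with term:    " + withTerm);
        System.out.println("document without term: " + withoutTerm);
    }
}
```

Under these assumptions, a document that lacks the term still gets a finite
score that is lower than that of a document containing it, so it can still be
ranked against other documents rather than being dropped from the results.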



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
