The easiest thing is to look at the Lucene javadoc for the Similarity and DefaultSimilarity classes. Then have a peek at Lucene contrib for some other examples of custom Similarity implementations. You'll just need to extend DefaultSimilarity and override one method; for the TF function you asked about, that is tf().
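As a rough illustration of overriding a single method, here is a minimal sketch. To keep it compilable without Lucene on the classpath it uses a stand-in base class (DefaultSimilarityStandIn, a made-up name) that mirrors DefaultSimilarity's tf(), which is the square root of the term frequency. The class name LinearTfSimilarity and the choice of a linear tf are assumptions for illustration only, not a recommendation for your scoring.

```java
// Stand-in for Lucene's DefaultSimilarity so this sketch compiles on its own.
// In a real project, drop this class and extend
// org.apache.lucene.search.DefaultSimilarity instead.
class DefaultSimilarityStandIn {
    // Lucene's default term-frequency factor: sqrt(freq).
    public float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
}

// Hypothetical custom similarity: only tf() is overridden.
class LinearTfSimilarity extends DefaultSimilarityStandIn {
    @Override
    public float tf(float freq) {
        // Let the score grow linearly with the number of occurrences
        // instead of being dampened by the square root.
        return freq;
    }
}

class TfDemo {
    public static void main(String[] args) {
        DefaultSimilarityStandIn standard = new DefaultSimilarityStandIn();
        DefaultSimilarityStandIn custom = new LinearTfSimilarity();
        // Compare the two tf factors for a few term frequencies.
        for (float freq : new float[] {1f, 4f, 9f}) {
            System.out.println("freq=" + freq
                    + " default=" + standard.tf(freq)
                    + " custom=" + custom.tf(freq));
        }
    }
}
```

With the real base class, the compiled Similarity would go on Solr's classpath and be referenced from schema.xml with a similarity element, for example class="com.example.LinearTfSimilarity" (the package name is hypothetical). Length normalization is a separate method (lengthNorm) and is left at its default here.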
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Jake Conk <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, September 30, 2008 3:11:01 PM
> Subject: Re: Searching Question
>
> How would I write a custom Similarity factor that overrides the TF
> function? Is there some documentation on that somewhere?
>
> On Sat, Sep 27, 2008 at 5:14 AM, Grant Ingersoll wrote:
> >
> > On Sep 26, 2008, at 2:10 PM, Otis Gospodnetic wrote:
> >
> >> It might be easiest to store the thread ID and the number of replies in
> >> the thread in each post Document in Solr.
> >
> > Yeah, but that would mean updating every document in a thread every time a
> > new reply is added.
> >
> > I still keep going back to the solution as putting all the replies in a
> > single document, and then using a custom Similarity factor that overrides
> > the TF function and/or the length normalization. Still, this suffers from
> > having to update the document for every new reply.
> >
> > Let's take a step back...
> >
> > Can I ask why you want the scoring this way? What have you seen in your
> > results that leads you to believe it is the correct way? Note, I'm not
> > trying to convince you it's wrong, I just want to better understand what's
> > going on.
> >
> >
> >>
> >>
> >> Otherwise it sounds like you'll have to combine some search results or
> >> data post-search.
> >>
> >> Otis
> >> --
> >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>
> >>
> >>
> >> ----- Original Message ----
> >>>
> >>> From: Jake Conk
> >>> To: solr-user@lucene.apache.org
> >>> Sent: Friday, September 26, 2008 1:50:37 PM
> >>> Subject: Re: Searching Question
> >>>
> >>> Grant,
> >>>
> >>> Each post is its own document but I can merge them all into a single
> >>> document under one thread if that will allow me to do what I want.
> >>> The number of replies is stored both in Solr and the DB.
> >>>
> >>> Thanks,
> >>>
> >>> - JC
> >>>
> >>> On Fri, Sep 26, 2008 at 5:24 AM, Grant Ingersoll wrote:
> >>>>
> >>>> Is a thread and all of it's posts a single document? In other words,
> >>>> how
> >>>> are you modeling your posts as Solr documents? Also, where are you
> >>>> keeping
> >>>> track of the number of replies? Is that in Solr or in a DB?
> >>>>
> >>>> -Grant
> >>>>
> >>>> On Sep 25, 2008, at 8:51 PM, Jake Conk wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> We are using Solr for our new forums search feature. If possible when
> >>>>> searching for the word "Halo" we would like threads that contain the
> >>>>> word "Halo" the most with the least amount of posts in that thread to
> >>>>> have a higher score.
> >>>>>
> >>>>> For instance, if we have a thread with 10 posts and the word "Halo"
> >>>>> shows up 5 times then that should have a lower score than a thread
> >>>>> that has the word "Halo" 3 times within its posts and has 5 replies.
> >>>>> Basically the thread that shows the search string most frequently
> >>>>> amongst the number of posts in the thread should be the one with the
> >>>>> highest score.
> >>>>>
> >>>>> Is something like this possible?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>>
> >