How would I write a custom Similarity factor that overrides the TF function? Is there some documentation on that somewhere?

On Sat, Sep 27, 2008 at 5:14 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
> On Sep 26, 2008, at 2:10 PM, Otis Gospodnetic wrote:
>
>> It might be easiest to store the thread ID and the number of replies in
>> the thread in each post Document in Solr.
>
> Yeah, but that would mean updating every document in a thread every time a
> new reply is added.
>
> I still keep going back to the solution as putting all the replies in a
> single document, and then using a custom Similarity factor that overrides
> the TF function and/or the length normalization. Still, this suffers from
> having to update the document for every new reply.
>
> Let's take a step back...
>
> Can I ask why you want the scoring this way? What have you seen in your
> results that leads you to believe it is the correct way? Note, I'm not
> trying to convince you it's wrong, I just want to better understand what's
> going on.
>
>> Otherwise it sounds like you'll have to combine some search results or
>> data post-search.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> ----- Original Message ----
>>>
>>> From: Jake Conk <[EMAIL PROTECTED]>
>>> To: solr-user@lucene.apache.org
>>> Sent: Friday, September 26, 2008 1:50:37 PM
>>> Subject: Re: Searching Question
>>>
>>> Grant,
>>>
>>> Each post is its own document but I can merge them all into a single
>>> document under one thread if that will allow me to do what I want.
>>> The number of replies is stored both in Solr and the DB.
>>>
>>> Thanks,
>>>
>>> - JC
>>>
>>> On Fri, Sep 26, 2008 at 5:24 AM, Grant Ingersoll wrote:
>>>>
>>>> Is a thread and all of it's posts a single document? In other words,
>>>> how are you modeling your posts as Solr documents? Also, where are
>>>> you keeping track of the number of replies? Is that in Solr or in a DB?
>>>>
>>>> -Grant
>>>>
>>>> On Sep 25, 2008, at 8:51 PM, Jake Conk wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> We are using Solr for our new forums search feature. If possible when
>>>>> searching for the word "Halo" we would like threads that contain the
>>>>> word "Halo" the most with the least amount of posts in that thread to
>>>>> have a higher score.
>>>>>
>>>>> For instance, if we have a thread with 10 posts and the word "Halo"
>>>>> shows up 5 times then that should have a lower score than a thread
>>>>> that has the word "Halo" 3 times within its posts and has 5 replies.
>>>>> Basically the thread that shows the search string most frequently
>>>>> amongst the number of posts in the thread should be the one with the
>>>>> highest score.
>>>>>
>>>>> Is something like this possible?
>>>>>
>>>>> Thanks,
>>>>>
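
To make the scoring discussed in the thread concrete, here is a rough standalone sketch of the arithmetic involved. It deliberately does not depend on Lucene or Solr, so it is not a real Similarity implementation. The class and method names are hypothetical. The two "default" methods only mirror the shape that, as far as I recall, Lucene's DefaultSimilarity uses for tf() and lengthNorm() (square-root damping), and ratioScore() is just one assumed reading of the "Halo" example: occurrences divided by the number of posts, treating "5 replies" as 5 posts.

```java
// Standalone sketch of the scoring arithmetic; no Lucene classes are used.
// All names here are hypothetical and exist only for illustration.
final class ThreadScoreSketch {

    // Square-root damping of the raw term count. This mirrors the shape of
    // the default tf() as I remember it, and is an assumption, not a quote
    // from the Lucene source.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Normalization by 1/sqrt(number of terms in the field), again only
    // mirroring the shape of the default lengthNorm() as an assumption.
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Assumed alternative for the forum use case: undamped term count
    // divided by the number of posts in the thread.
    static float ratioScore(int occurrences, int numPosts) {
        if (numPosts <= 0) {
            throw new IllegalArgumentException("numPosts must be positive");
        }
        return (float) occurrences / numPosts;
    }

    public static void main(String[] args) {
        // The two threads from the original question.
        System.out.println("5 hits in 10 posts: " + ratioScore(5, 10));
        System.out.println("3 hits in 5 posts:  " + ratioScore(3, 5));
    }
}
```

Under this reading, the thread with 3 occurrences in 5 posts scores above the thread with 5 occurrences in 10 posts, which matches the ordering asked for. In an actual custom Similarity, the same two ideas would go into the overridden TF and length-normalization methods; note that length normalization there is based on the number of terms in the field, not the number of posts, so the post count would still have to come from somewhere else (for example a stored field).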