Re: Removing duplicate terms from query

2017-02-10 Thread Ere Maijala
Thanks for the insight. You're right, of course, regarding the score calculation. I'll think about it. There are certain cases where the search is obviously bad to a human reader and could be cleaned up, but it's not easy to write rules for that. --Ere On 9.2.2017 at 18.37, Walter Underwood wrote: 1.

Re: Removing duplicate terms from query

2017-02-09 Thread Erick Erickson
> ...t, so I never tried it and just overrode the similarity in place.
>
> M.

RE: Removing duplicate terms from query

2017-02-09 Thread Markus Jelsma
> Would omitTermFreqAndPositions help here? Though that's probably an overkill as that disables phrase searches too. I am not sure if it is possible to do omitTermFreqAn

Re: Removing duplicate terms from query

2017-02-09 Thread Alexandre Rafalovitch
Would omitTermFreqAndPositions help here? Though that's probably overkill, as it disables phrase searches too. I am not sure whether it is possible to set omitTermFreqAndPositions=true omitPositions=false to skip just the frequencies. Regards, Alex. http://www.solr-start.com/ - Resources for Sol

Re: Removing duplicate terms from query

2017-02-09 Thread Walter Underwood
1. I don’t think this is a good idea. It means that a search for “hey hey hey” won’t score that document higher.
2. Maybe you want to change how tf is calculated. Ignore multiple occurrences of a word. I ran into this with the movie title “New York, New York” at Netflix. It isn’t twice as much
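
To make the tf suggestion concrete, here is a small arithmetic sketch in Python. It is not Solr or Lucene code. It assumes the classic TF-IDF behaviour, where the tf factor is the square root of the term frequency, and compares it with a hypothetical "binary" tf that ignores repeated occurrences. The function names are made up for illustration.

```python
import math


def tf_classic(freq: int) -> float:
    """tf factor as in classic TF-IDF scoring: square root of the frequency."""
    return math.sqrt(freq)


def tf_binary(freq: int) -> float:
    """Hypothetical alternative: a term either occurs or it does not."""
    return 1.0 if freq > 0 else 0.0


# "new" occurs twice in the title "New York, New York" and once in "New York".
for title_freq in (1, 2):
    print(title_freq, round(tf_classic(title_freq), 3), tf_binary(title_freq))
```

With the binary variant, a second occurrence of a term adds nothing to the tf factor, so a title that repeats a word is not favoured for that reason alone. In Solr this kind of change would be made in a custom similarity rather than in client code.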

Re: Removing duplicate terms from query

2017-02-09 Thread Ere Maijala
Thanks Emir. I was thinking of something very simple like doing what RemoveDuplicatesTokenFilter does but ignoring positions. It would of course still be possible to have the same term multiple times, but at least the adjacent ones could be deduplicated. The reason I'm not too eager to do it
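
The "adjacent ones only" idea can be sketched outside of Solr as follows. This is plain Python working on a list of already-tokenized terms, not a Lucene TokenFilter; the function name and the case-insensitive comparison are assumptions for illustration.

```python
from itertools import groupby


def collapse_adjacent(terms):
    """Collapse runs of identical adjacent terms, ignoring case and positions.

    Non-adjacent repeats are kept, so only one previous term has to be
    remembered at any time.
    """
    return [next(group) for _, group in groupby(terms, key=str.lower)]


print(collapse_adjacent(["hey", "hey", "hey", "you", "hey"]))
```

Because only the immediately preceding term is compared, this stays compatible with a streaming, one-token-at-a-time style of processing, at the cost of leaving non-adjacent duplicates in the query.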

RE: Removing duplicate terms from query

2017-02-09 Thread Markus Jelsma
> Hi Ere,
>
> I don't think that there is such filter. Implementing such filter would require looking backward which violates streaming approach of token filters and unpredictable memory usage.
>
> I would do it as par

Re: Removing duplicate terms from query

2017-02-09 Thread Emir Arnautovic
Hi Ere, I don't think there is such a filter. Implementing one would require looking backward, which violates the streaming approach of token filters and leads to unpredictable memory usage. I would do it as part of a query preprocessor, and not necessarily as part of Solr. HTH, Emir On 09.02.
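
A query preprocessor of the kind suggested here could be as small as the sketch below. It runs in the client before the query is sent to Solr. It is a naive illustration: the function name is made up, terms are split on whitespace only, and comparison is assumed to be case-insensitive. Phrase queries, boolean operators and field prefixes would need a real query parser.

```python
def dedupe_terms(query: str) -> str:
    """Drop repeated terms from a whitespace-separated query.

    The first occurrence of each term is kept and the original order is
    preserved. Memory use grows with the number of distinct terms, which
    is acceptable for a single query string.
    """
    seen = set()
    kept = []
    for term in query.split():
        key = term.lower()  # assumption: terms are matched case-insensitively
        if key not in seen:
            seen.add(key)
            kept.append(term)
    return " ".join(kept)


print(dedupe_terms("new york new york"))
```

Doing this on the client side avoids the look-back problem described above, since the whole query string is available at once.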