RE: Boost matches occurring early in the field (offset)

Markus Jelsma Wed, 29 Aug 2018 13:52:04 -0700

Hello Jan,

Many years ago I wrote an extension of SpanFirstQuery called 
GradientSpanFirstQuery that did just that: it decreased the boost with each 
advancing position in the text. Then Lucene 4 or 5 came along and the code 
would no longer compile.


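  @Override
  protected AcceptStatus acceptPosition(Spans spans) throws IOException {
    assert spans.startPosition() != spans.endPosition()
        : "start equals end: " + spans.startPosition();
    if (spans.startPosition() >= end) {
      // Past the cut-off position: no later span in this document can match.
      return AcceptStatus.NO_MORE_IN_CURRENT_DOC;
    } else if (spans.endPosition() <= end) {
      // Span lies within the first `end` positions: scale the boost down
      // as the end position grows.
      super.setBoost(this.boost / (spans.endPosition() / fraction));
      return AcceptStatus.YES;
    } else {
      // Span starts before `end` but extends past it.
      return AcceptStatus.NO;
    }
  }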
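
To make the arithmetic in the setBoost line concrete, here is a minimal 
stand-alone sketch of the same formula outside Lucene. The class and method 
names are hypothetical, and the float type of fraction is an assumption, since 
the snippet above does not show how fraction is declared. If fraction were an 
int, endPosition / fraction would be integer division and would be 0 for every 
end position smaller than fraction, which turns the boost into Infinity.

```java
// Hypothetical helper, not part of Lucene: it isolates the boost arithmetic only.
final class GradientBoostSketch {

    // Floating-point version of boost / (endPosition / fraction).
    // endPosition is the exclusive end position of the matching span,
    // so a single-term match at the very start of the field has endPosition 1.
    static float decayedBoost(float boost, int endPosition, float fraction) {
        if (endPosition <= 0 || fraction <= 0f) {
            throw new IllegalArgumentException("endPosition and fraction must be positive");
        }
        return boost / (endPosition / fraction);
    }

    public static void main(String[] args) {
        float boost = 2.0f;     // assumed base boost
        float fraction = 10.0f; // assumed scale: the boost is unchanged at endPosition 10
        for (int end : new int[] {5, 10, 20, 100}) {
            System.out.println("endPosition=" + end + " boost=" + decayedBoost(boost, end, fraction));
        }
    }
}
```

With these assumed values a match ending at position 10 keeps the base boost, 
one ending at position 20 gets half of it, and one ending at position 5 gets 
double.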

We never actually used this class in production, but I did ask on either the 
Lucene or the Solr list what could be done as a quick fix for something I 
didn't use anyway.

Although the thread is public, I cannot find it. But I do remember someone, 
probably Adrien Grand, saying that I had to implement a custom scorer to get 
the class working again.

Hope it helps.
Markus
 
-----Original message-----
> From:Jan Høydahl <jan....@cominvent.com>
> Sent: Wednesday 29th August 2018 22:18
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Boost matches occurring early in the field (offset)
> 
> I also tend to use "sentinel tokens" for exact match or to anchor a search. 
> But in order to obtain decaying boost the further down in the article a match 
> is, you'd need to write several such span/slop queries with varying slops, 
> e.g. highest boost for first 10 words, medium boost for first 50 words, low 
> boost for first 150 words, no boost below that.
> 
> As I wrote in my initial mail, we can do such workarounds, or play with 
> payloads etc. But my real question is whether/how it is possible to factor 
> the actual term offset information from a matching term into the scoring 
> algorithm? Would you need to implement your own Scorer/Weight impl?
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> > 29. aug. 2018 kl. 15:37 skrev Doug Turnbull 
> > <dturnb...@opensourceconnections.com>:
> > 
> > You can also insert a token at the beginning of the query during analysis
> > using a char filter. I call this sort of boundary token a "sentinel
> > token". So a phrase search for "red shoes" becomes "<SENT_BEG> red shoes".
> > You can add some slop to allow for permissible distance (with
> > 
> > You can also use the Limit Token Count Token Filter and create a copyField,
> > so if you want to boost on first 10 matches, just limit to 10 tokens then
> > use this as a boost query
> > https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-LimitTokenCountFilter
> > 
> > -Doug
> > 
> > On Wed, Aug 29, 2018 at 6:26 AM Mikhail Khludnev <m...@apache.org> wrote:
> > 
> >> <SpanFirst>
> >> <https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-XMLQueryParser>
> >> 
> >> On Wed, Aug 29, 2018 at 1:19 PM Jan Høydahl <jan....@cominvent.com> wrote:
> >> 
> >>> Hi,
> >>> 
> >>> Is there an ootb way to boost term matches based on their position/offset
> >>> inside a field, so that the term gets a higher score if it occurs in the
> >>> beginning of the field and lower boost or a deboost if it occurs towards
> >>> the end of a field?
> >>> 
> >>> I know that I could index the first part of the text in a new field and
> >>> boost on that, but that is kind of "binary".
> >>> I could also add the term offset as payload for every term and boost on
> >>> that, but this should not be necessary since offset info is already part
> >>> of the index?
> >>> 
> >>> --
> >>> Jan Høydahl, search solution architect
> >>> Cominvent AS - www.cominvent.com
> >>> 
> >>> 
> >> 
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >> 
> > -- 
> > CTO, OpenSource Connections
> > Author, Relevant Search
> > http://o19s.com/doug
> 
>
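
The tiered workaround Jan describes in the quoted mail (highest boost for the 
first 10 words, medium for the first 50, low for the first 150, none below 
that) amounts to a step function over the match position. Below is a minimal 
sketch of that step function in plain Java. The class name and the boost 
values 3.0, 2.0 and 1.5 are made up for illustration; only the 10/50/150 
limits come from the mail. It models the effective tier per position, not the 
way Lucene would combine the scores of several overlapping span queries.

```java
// Hypothetical illustration, not a Lucene or Solr API.
final class TieredPositionBoost {

    // Returns a boost multiplier for a zero-based term position.
    // The multipliers are assumed values; 1.0f means "no boost".
    static float boostFor(int position) {
        if (position < 0) {
            throw new IllegalArgumentException("position must be non-negative");
        }
        if (position < 10) {
            return 3.0f;  // first 10 words: highest boost
        }
        if (position < 50) {
            return 2.0f;  // first 50 words: medium boost
        }
        if (position < 150) {
            return 1.5f;  // first 150 words: low boost
        }
        return 1.0f;      // beyond 150 words: no boost
    }

    public static void main(String[] args) {
        for (int pos : new int[] {0, 9, 10, 49, 50, 149, 150}) {
            System.out.println("position=" + pos + " boost=" + boostFor(pos));
        }
    }
}
```

The steps make the "binary" nature of each tier visible: positions 9 and 10 
get different boosts although they are adjacent, which is the limitation a 
position-aware scorer would avoid.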
