Hello Jan,
Many years ago I made an extension of SpanFirstQuery called
GradientSpanFirstQuery that did just that: it decreased the boost for each
advanced position in the text. Then Lucene 4 or 5 came along and this code
would no longer compile.

    @Override
    protected AcceptStatus acceptPosition(Spans spans) throws IOException {
      assert spans.startPosition() != spans.endPosition() : "start equals end: "
          + spans.startPosition();
      if (spans.startPosition() >= end) {
        // Match starts at or past the cutoff: nothing further can match in this doc.
        return AcceptStatus.NO_MORE_IN_CURRENT_DOC;
      } else if (spans.endPosition() <= end) {
        // Match lies within the cutoff: shrink the boost the deeper the match ends.
        super.setBoost(this.boost / (spans.endPosition() / fraction));
        return AcceptStatus.YES;
      } else {
        return AcceptStatus.NO;
      }
    }

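To make the arithmetic concrete, here is a minimal, Lucene-free sketch of the decay that the method above applies. The class and method names are made up for illustration, and I am assuming that boost and fraction were floats; the field declarations are not part of the snippet, so treat this as a sketch rather than the original behaviour.

```java
public class GradientBoostSketch {

    // Hypothetical helper mirroring: boost / (endPosition / fraction).
    // Assumes float arithmetic; if fraction were an int, the inner
    // division would truncate and could even divide by zero.
    static float decayedBoost(float boost, int endPosition, float fraction) {
        if (endPosition <= 0 || fraction <= 0f) {
            throw new IllegalArgumentException(
                "endPosition and fraction must be positive");
        }
        return boost / (endPosition / fraction);
    }

    public static void main(String[] args) {
        float boost = 10f;    // illustrative base boost
        float fraction = 2f;  // illustrative decay divisor
        // Print the boost for matches ending at positions 1, 2, 4 and 8.
        for (int end = 1; end <= 8; end *= 2) {
            System.out.println("endPosition=" + end
                + " boost=" + decayedBoost(boost, end, fraction));
        }
    }
}
```

Under these assumptions the boost halves every time the end position doubles, so a match ending at position 4 gets half the boost of one ending at position 2.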
We never actually used this class in production, but I did ask on either the
Lucene or the Solr list what could be done to quickly fix something I didn't
use anyway. Although the thread is public, I cannot find it. I do remember
someone, probably Adrien Grand, saying that I had to implement a custom scorer
to get the class working again.
Hope it helps.
Markus
-----Original message-----
> From:Jan Høydahl <[email protected]>
> Sent: Wednesday 29th August 2018 22:18
> To: solr-user <[email protected]>
> Subject: Re: Boost matches occurring early in the field (offset)
>
> I also tend to use "sentinel tokens" for exact match or to anchor a search.
> But in order to obtain decaying boost the further down in the article a match
> is, you'd need to write several such span/slop queries with varying slops,
> e.g. highest boost for first 10 words, medium boost for first 50 words, low
> boost for first 150 words, no boost below that.
>
> As I wrote in my initial mail, we can do such workarounds, or play with
> payloads etc. But my real question is whether/how it is possible to factor
> the actual term offset information from a matching term into the scoring
> algorithm? Would you need to implement your own Scorer/Weight impl?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 29. aug. 2018 kl. 15:37 skrev Doug Turnbull
> > <[email protected]>:
> >
> > You can also insert a token at the beginning of the query during analysis
> > using a char filter. I call these sorts of boundary tokens "sentinel
> > tokens". So a phrase search for "red shoes" becomes "<SENT_BEG> red shoes".
> > You can add some slop to allow for permissible distance (with
> >
> > You can also use the Limit Token Count Token Filter and create a copyField,
> > so if you want to boost on first 10 matches, just limit to 10 tokens then
> > use this as a boost query
> > https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-LimitTokenCountFilter
> >
> > -Doug
> >
> > On Wed, Aug 29, 2018 at 6:26 AM Mikhail Khludnev <[email protected]> wrote:
> >
> >> <SpanFirst>
> >> <https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-XMLQueryParser>
> >>
> >> On Wed, Aug 29, 2018 at 1:19 PM Jan Høydahl <[email protected]> wrote:
> >>
> >>> Hi,
> >>>
> >>> Is there an ootb way to boost term matches based on their position/offset
> >>> inside a field, so that the term gets a higher score if it occurs in the
> >>> beginning of the field and lower boost or a deboost if it occurs towards
> >>> the end of a field?
> >>>
> >>> I know that I could index the first part of the text in a new field and
> >>> boost on that, but that is kind of "binary".
> >>> I could also add the term offset as payload for every term and boost on
> >>> that, but this should not be necessary since offset info is already part
> >>> of the index?
> >>>
> >>> --
> >>> Jan Høydahl, search solution architect
> >>> Cominvent AS - www.cominvent.com
> >>>
> >>>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >>
> > --
> > CTO, OpenSource Connections
> > Author, Relevant Search
> > http://o19s.com/doug
>
>