Hello Jan,

Many years ago I made an extension of SpanFirstQuery called GradientSpanFirstQuery that did just that: decrease the boost for each advanced position in the text. Then Lucene 4 or 5 came along and this code wouldn't compile any more.

    @Override
    protected AcceptStatus acceptPosition(Spans spans) throws IOException {
      assert spans.startPosition() != spans.endPosition() : "start equals end: " + spans.startPosition();
      if (spans.startPosition() >= end) {
        // The span starts at or beyond the allowed end: nothing more can match in this document.
        return AcceptStatus.NO_MORE_IN_CURRENT_DOC;
      } else if (spans.endPosition() <= end) {
        // Decrease the boost the further into the text the match ends.
        super.setBoost(this.boost / (spans.endPosition() / fraction));
        return AcceptStatus.YES;
      } else {
        return AcceptStatus.NO;
      }
    }

We never actually used this class in production, but I did ask either the Lucene or the Solr list what could be done to quick-fix something I didn't use anyway. Despite the thread being public, I cannot find it. But I do remember someone, probably Adrien Grand, saying I had to implement a custom scorer to get the class working again.

Hope it helps.
Markus

-----Original message-----
> From:Jan Høydahl <jan....@cominvent.com>
> Sent: Wednesday 29th August 2018 22:18
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Boost matches occurring early in the field (offset)
>
> I also tend to use "sentinel tokens" for exact match or to anchor a search.
> But in order to obtain decaying boost the further down in the article a match
> is, you'd need to write several such span/slop queries with varying slops,
> e.g. highest boost for first 10 words, medium boost for first 50 words, low
> boost for first 150 words, no boost below that.
>
> As I wrote in my initial mail, we can do such workarounds, or play with
> payloads etc. But my real question is whether/how it is possible to factor
> the actual term offset information from a matching term into the scoring
> algorithm? Would you need to implement your own Scorer/Weight impl?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 29 Aug 2018, at 15:37, Doug Turnbull
> > <dturnb...@opensourceconnections.com> wrote:
> >
> > You can also insert a token at the beginning of the query during analysis
> > using a char filter. I call these sort of boundary tokens "sentinel
> > tokens".
> > So a phrase search for "red shoes" becomes "<SENT_BEG> red shoes".
> > You can add some slop to allow for permissible distance (with
> >
> > You can also use the Limit Token Count Token Filter and create a copyField,
> > so if you want to boost on first 10 matches, just limit to 10 tokens then
> > use this as a boost query
> > https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-LimitTokenCountFilter
> >
> > -Doug
> >
> > On Wed, Aug 29, 2018 at 6:26 AM Mikhail Khludnev <m...@apache.org> wrote:
> >
> >> <SpanFirst>
> >> <https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-XMLQueryParser>
> >>
> >> On Wed, Aug 29, 2018 at 1:19 PM Jan Høydahl <jan....@cominvent.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> Is there an ootb way to boost term matches based on their position/offset
> >>> inside a field, so that the term gets a higher score if it occurs in the
> >>> beginning of the field and lower boost or a deboost if it occurs towards
> >>> the end of a field?
> >>>
> >>> I know that I could index the first part of the text in a new field and
> >>> boost on that, but that is kind of "binary".
> >>> I could also add the term offset as payload for every term and boost on
> >>> that, but this should not be necessary since offset info is already part of
> >>> the index?
> >>>
> >>> --
> >>> Jan Høydahl, search solution architect
> >>> Cominvent AS - www.cominvent.com
> >>>
> >>>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >>
> > --
> > CTO, OpenSource Connections
> > Author, Relevant Search
> > http://o19s.com/doug
> >
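
For reference, the two boosting schemes discussed in this thread can be compared as plain arithmetic, without any Lucene dependency. The sketch below is only an illustration under stated assumptions: the class PositionBoost and both method names are hypothetical, positions are assumed to be 0-based, "fraction" is assumed to be a positive floating-point value, and "no boost" is assumed to mean a neutral factor of 1.0. It is not a Scorer or Weight implementation and does not use the Lucene API.

```java
// Hypothetical helper, not part of Lucene or Solr.
final class PositionBoost {

    private PositionBoost() {
    }

    /**
     * Gradient scheme, same formula as in acceptPosition:
     * boost / (endPosition / fraction), done in floating point.
     */
    static float gradientBoost(float baseBoost, int endPosition, float fraction) {
        if (endPosition <= 0 || fraction <= 0f) {
            throw new IllegalArgumentException("endPosition and fraction must be positive");
        }
        return baseBoost / (endPosition / fraction);
    }

    /**
     * Tiered scheme: one boost for the first 10 words, another for the
     * first 50, another for the first 150, and a neutral 1.0 after that.
     * Assumes 0-based positions, so "first 10 words" means positions 0..9.
     */
    static float tieredBoost(int startPosition, float high, float medium, float low) {
        if (startPosition < 0) {
            throw new IllegalArgumentException("startPosition must not be negative");
        }
        if (startPosition < 10) {
            return high;
        }
        if (startPosition < 50) {
            return medium;
        }
        if (startPosition < 150) {
            return low;
        }
        return 1.0f; // assumption: "no boost" is a neutral multiplier
    }
}
```

With a fraction of 10, the gradient scheme leaves the base boost unchanged for a match ending at position 10 and halves it at position 20, whereas the tiered scheme only changes value at the three thresholds. Feeding either value into actual scoring would still require the custom Scorer/Weight work mentioned above.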