I have considered this problem and tried two ways to solve it.
With either method, a document can also be boosted by the relative
positions of the query terms.
1. Index term positions, then modify TermScorer.score to read them:
public float score() {
  assert doc != -1;
  int f = freqs[pointer];
  float raw =                                // compute tf(f)*weight
    f < SCORE_CACHE_SIZE                     // check cache
    ? scoreCache[f]                          // cache hit
    : getSimilarity().tf(f)*weightValue;     // cache miss

  // modified by LiLi: multiply in a boost for every position of the term
  try {
    int[] positions = this.getPositions(f);
    float positionBoost = 1.0f;
    for (int pos : positions) {
      positionBoost *= this.getPositionBoost(pos);
    }
    raw *= positionBoost;
  } catch (IOException e) {
    // positions could not be read; fall back to the unboosted score
  }
  // end of modification

  return norms == null ? raw : raw * SIM_NORM_DECODER[norms[doc] & 0xFF]; // normalize for field
}
private int[] getPositions(int f) throws IOException {
  // move the TermPositions enumeration to the document being scored
  termPositions.skipTo(doc);
  int[] positions = new int[f];
  int docId = termPositions.doc();
  assert docId == doc;
  int tf = termPositions.freq();
  assert tf == f;
  // read all tf positions of the term in this document
  for (int i = 0; i < tf; i++) {
    positions[i] = termPositions.nextPosition();
  }
  return positions;
}
You must then pass the scorer a TermPositions, obtained with
termPositions = reader.termPositions(term). I modified the TermScorer
constructor to take this extra parameter.
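
The getPositionBoost method called from score() is not shown above. As a
rough sketch only, and not the function I actually used, it could be a
simple decay such as 1 + 1/(1 + pos), so that earlier positions get a
larger boost. The class name PositionBoost and the formula are
assumptions made for illustration:

```java
public class PositionBoost {
    // Hypothetical boost: 2.0 at position 0, falling towards 1.0 as pos grows.
    static float getPositionBoost(int pos) {
        return 1.0f + 1.0f / (1 + pos);
    }

    // Multiplies the boosts of all positions, as the modified score() does.
    static float combinedBoost(int[] positions) {
        float boost = 1.0f;
        for (int pos : positions) {
            boost *= getPositionBoost(pos);
        }
        return boost;
    }

    public static void main(String[] args) {
        // "ox" is at position 2 in "box fox ox" and at position 0 in "ox box fox"
        System.out.println("Doc 1 boost: " + combinedBoost(new int[] {2}));
        System.out.println("Doc 2 boost: " + combinedBoost(new int[] {0}));
    }
}
```

Because the boosts are multiplied together, a term that occurs many times
is boosted many times. If that is not what you want, use only the first
position instead of the product.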
2. Use payloads
I tried using a payload that records, as a bitset, whether the term
occurred at each of the first 128 positions. This method takes less
space than the first one.
I then use my own Similarity:
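public float scorePayload(int docID, String fieldName, int start, int end,
    byte[] payload, int offset, int length) {
  if (payload != null) {
    float boost = 1.0F;
    // first 4 bytes: position of the first occurrence (not used for the boost here);
    // read from offset, since the payload need not start at index 0 of the array
    int firstOccur = PayloadHelper.decodeInt(payload, offset);
    // remaining bytes: bitset of the positions at which the term occurs
    BitSet bitSet = MyAnalyzer.fromByteArray(payload, offset + 4, length - 4);
    for (int i = 0; i < bitSet.length(); i++) {
      if (bitSet.get(i)) {
        boost *= positionBoost[i];
      }
    }
    return boost;
  } else {
    return 1.0F;
  }
}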
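
The analyzer side that builds the payload is not shown above. The layout
that scorePayload expects (a 4-byte first position followed by the bitset
bytes) could be produced and read back roughly as follows. This is a
standalone sketch using only java.util.BitSet; the class PositionPayload
and its methods are hypothetical names, not MyAnalyzer itself, and the
big-endian int is written by hand on the assumption that it matches what
PayloadHelper.decodeInt reads:

```java
import java.util.Arrays;
import java.util.BitSet;

public class PositionPayload {
    static final int MAX_POSITIONS = 128; // only the first 128 positions are recorded

    // Builds: [4-byte big-endian first position][bitset bytes, at most 16].
    static byte[] encode(int[] positions) {
        BitSet bits = new BitSet(MAX_POSITIONS);
        int first = Integer.MAX_VALUE;
        for (int pos : positions) {
            first = Math.min(first, pos);
            if (pos < MAX_POSITIONS) {
                bits.set(pos);
            }
        }
        byte[] bitBytes = bits.toByteArray(); // trailing zero bytes are dropped
        byte[] payload = new byte[4 + bitBytes.length];
        payload[0] = (byte) (first >>> 24);
        payload[1] = (byte) (first >>> 16);
        payload[2] = (byte) (first >>> 8);
        payload[3] = (byte) first;
        System.arraycopy(bitBytes, 0, payload, 4, bitBytes.length);
        return payload;
    }

    // Reads the first-occurrence position stored at the start of the payload.
    static int decodeFirst(byte[] payload, int offset) {
        return ((payload[offset] & 0xFF) << 24)
             | ((payload[offset + 1] & 0xFF) << 16)
             | ((payload[offset + 2] & 0xFF) << 8)
             | (payload[offset + 3] & 0xFF);
    }

    // Reads the position bitset that follows the 4-byte int.
    static BitSet decodeBits(byte[] payload, int offset, int length) {
        return BitSet.valueOf(Arrays.copyOfRange(payload, offset + 4, offset + length));
    }
}
```

For a term that occurs only in the first few positions, the payload is
just 5 bytes, which is where the space saving over storing every
position comes from.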
2010/7/20 Papiya Misra:
> I need to make sure that documents with the search term occurring
> towards the beginning of the document are ranked higher.
>
> For example,
>
> Search term : ox
> Doc 1: box fox ox
> Doc 2: ox box fox
>
> Result: Doc2 will be ranked higher than Doc1.
>
> The solution I can think of is sorting by term position (after enabling
> term vectors). Is that the best way to go about it ?
>
> Thanks
> Papiya