On 28-Sep-07, at 6:31 AM, Grant Ingersoll wrote:
Another option would be to extend Solr (and donate back) to
incorporate Lucene's payload functionality, in which case you could
associate the percentile of the color as a payload and use the
BoostingTermQuery... :-) If you're interested in this, a
discussion on solr-dev is probably warranted to figure out the best
way to do this.
For reference, here is a summary of the changes needed:
1. A payload analyzer. Here is an example token filter that turns tokens
of the form <token>:<whatever>:<score> into <token> with payload <score>:
    /** Returns the next token in the stream, or null at EOS. */
    public final Token next() throws IOException {
        Token t = input.next();
        if (null == t)
            return null;
        String s = t.termText();
        if (s.indexOf(":") > -1) {
            String[] parts = s.split(":");
            assert parts.length == 3;
            String colour = parts[0];
            // The score is the third field of <token>:<whatever>:<score>.
            int bits = Float.floatToIntBits(Float.parseFloat(parts[2]));
            // Serialize the float's bits, least significant byte first.
            byte[] buf = new byte[4];
            for (int shift = 0, i = 0; shift < 32; shift += 8, i++) {
                buf[i] = (byte) ((bits >> shift) & 0xff);
            }
            Token gen = new Token(colour, t.startOffset(), t.endOffset());
            gen.setPayload(new Payload(buf));
            t = gen;
        }
        return t;
    }
2. A payload deserializer. Add this method to your custom Similarity
class:
    public float scorePayload(byte[] payload, int offset, int length) {
        assert length == 4;
        // Reassemble the int from four bytes, least significant byte first.
        int accum = (payload[offset] & 0xff)
                  | ((payload[offset + 1] & 0xff) << 8)
                  | ((payload[offset + 2] & 0xff) << 16)
                  | ((payload[offset + 3] & 0xff) << 24);
        return Float.intBitsToFloat(accum);
    }
3. Add a relevant query clause. In a custom request handler, you
could have a parameter to add BoostingTermQueries:
    BoostingTermQuery q = new BoostingTermQuery(new Term("colourPayload", colour));
    // query is the enclosing BooleanQuery
    query.add(q, Occur.SHOULD);
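
If you want to check the byte layout from steps 1 and 2 without a Lucene
index, here is a minimal standalone sketch of the same encoding and
decoding. The class PayloadSketch and its method names are made up for
illustration and are not part of Lucene or Solr. It also returns null for
malformed input instead of relying on assert, since Java assertions are
disabled unless the JVM is started with -ea.

```java
/** Hypothetical helper for illustration; not part of Lucene or Solr. */
final class PayloadSketch {

    /**
     * Returns the score field of "<token>:<whatever>:<score>",
     * or null if the input does not have three fields or the
     * score is not a number.
     */
    static Float parseScore(String s) {
        String[] parts = s.split(":", -1); // -1 keeps trailing empty fields
        if (parts.length != 3) {
            return null;
        }
        try {
            return Float.valueOf(parts[2]);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Encodes the float's bits, least significant byte first (as in step 1). */
    static byte[] encode(float score) {
        int bits = Float.floatToIntBits(score);
        byte[] buf = new byte[4];
        for (int shift = 0, i = 0; shift < 32; shift += 8, i++) {
            buf[i] = (byte) ((bits >> shift) & 0xff);
        }
        return buf;
    }

    /** Decodes four bytes starting at offset (as in step 2). */
    static float decode(byte[] payload, int offset, int length) {
        if (length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + length);
        }
        int accum = (payload[offset] & 0xff)
                  | ((payload[offset + 1] & 0xff) << 8)
                  | ((payload[offset + 2] & 0xff) << 16)
                  | ((payload[offset + 3] & 0xff) << 24);
        return Float.intBitsToFloat(accum);
    }
}
```

For example, PayloadSketch.decode(PayloadSketch.encode(0.75f), 0, 4) gives
back 0.75f. What matters is that the filter and the Similarity agree on
the byte order.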
How to add this generically is an interesting question. There are
many possibilities, especially on the request handler and tokenizer
side of things. If there is a consensus on a sensible way of doing
this, I could contribute the bits of code that I have.
HTH,
-Mike