On 28-Sep-07, at 6:31 AM, Grant Ingersoll wrote:

Another option would be to extend Solr (and donate back) to incorporate Lucene's payload functionality, in which case you could associate the percentile of the color as a payload and use the BoostingTermQuery... :-) If you're interested in this, a discussion on solr-dev is probably warranted to figure out the best way to do this.

For reference, here is a summary of the changes needed:

1. A payload analyzer. Here is an example TokenFilter that turns tokens of the form <token>:<whatever>:<score> into <token>, with <score> stored as a 4-byte payload:

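  import java.io.IOException;

  import org.apache.lucene.analysis.Token;
  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.index.Payload;

  // Hypothetical class name; any TokenFilter subclass will do.
  public class ColourPayloadFilter extends TokenFilter {

    public ColourPayloadFilter(TokenStream input) {
      super(input);
    }

    /** Returns the next token in the stream, or null at EOS. */
    public final Token next() throws IOException {
      Token t = input.next();
      if (null == t)
        return null;

      String s = t.termText();
      if (s.indexOf(":") > -1) {
        String[] parts = s.split(":");
        assert parts.length == 3;
        String colour = parts[0];
        // The score is the third field of <token>:<whatever>:<score>.
        int bits = Float.floatToIntBits(Float.parseFloat(parts[2]));
        // Store the float's bits as 4 bytes, least significant byte first.
        byte[] buf = new byte[4];
        for (int shift = 0, i = 0; shift < 32; shift += 8, i++) {
          buf[i] = (byte) ((bits >> shift) & 0xff);
        }
        Token gen = new Token(colour, t.startOffset(), t.endOffset());
        gen.setPayload(new Payload(buf));
        t = gen;
      }
      return t;
    }
  }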
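To check the byte layout without Lucene on the classpath, here is a small standalone sketch. It parses a made-up input "red:anything:0.75" (score in the third field, per the <token>:<whatever>:<score> layout), encodes the score with the same shift loop as the filter, and compares the result with java.nio.ByteBuffer in little-endian order, which should give identical bytes. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class PayloadEncodingSketch {

    // Same loop as the filter: float bits written low byte first.
    static byte[] encodeWithShifts(float score) {
        int bits = Float.floatToIntBits(score);
        byte[] buf = new byte[4];
        for (int shift = 0, i = 0; shift < 32; shift += 8, i++) {
            buf[i] = (byte) ((bits >> shift) & 0xff);
        }
        return buf;
    }

    // Equivalent little-endian encoding using java.nio.
    static byte[] encodeWithByteBuffer(float score) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putFloat(score).array();
    }

    public static void main(String[] args) {
        String raw = "red:anything:0.75"; // hypothetical input token
        String[] parts = raw.split(":");
        String colour = parts[0];
        float score = Float.parseFloat(parts[2]);
        byte[] a = encodeWithShifts(score);
        byte[] b = encodeWithByteBuffer(score);
        System.out.println(colour + " " + Arrays.toString(a)); // red [0, 0, 64, 63]
        System.out.println("same bytes: " + Arrays.equals(a, b)); // same bytes: true
    }
}
```

For 0.75f (bits 0x3F400000) the payload is [0, 0, 64, 63]. The ByteBuffer form is shorter, but whichever encoder is used, the decoder in the Similarity must read the same byte order.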


2. A payload deserializer. Add this method to your custom Similarity class:

  public float scorePayload(byte [] payload, int offset, int length) {
    assert length == 4;
    // Reassemble the little-endian bytes written by the analyzer;
    // "& 0xff" stops Java's signed bytes from sign-extending.
    int accum = ((payload[0+offset]&0xff)) |
                ((payload[1+offset]&0xff)<<8) |
                ((payload[2+offset]&0xff)<<16) |
                ((payload[3+offset]&0xff)<<24);
    return Float.intBitsToFloat(accum);
  }
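The encoder and decoder only work as a pair, so a round-trip check is cheap insurance. This standalone sketch (hypothetical class name, no Lucene dependency) copies the encoding loop from the analyzer and the decoding expression from scorePayload and checks that each sample value survives, including when the payload sits at a nonzero offset. -1.0f is included on purpose: its bit pattern 0xBF800000 puts 0x80 in the third byte, which is negative as a Java byte, so without the & 0xff mask the shifted value would sign-extend and clobber the high byte.

```java
public class PayloadRoundTrip {

    // Mirrors the analyzer: float bits stored least significant byte first.
    static byte[] encode(float score) {
        int bits = Float.floatToIntBits(score);
        byte[] buf = new byte[4];
        for (int shift = 0, i = 0; shift < 32; shift += 8, i++) {
            buf[i] = (byte) ((bits >> shift) & 0xff);
        }
        return buf;
    }

    // Mirrors scorePayload: rebuild the int from 4 little-endian bytes.
    static float decode(byte[] payload, int offset, int length) {
        if (length != 4) {
            throw new IllegalArgumentException("expected a 4-byte payload, got " + length);
        }
        int accum = (payload[offset] & 0xff)
                | ((payload[offset + 1] & 0xff) << 8)
                | ((payload[offset + 2] & 0xff) << 16)
                | ((payload[offset + 3] & 0xff) << 24);
        return Float.intBitsToFloat(accum);
    }

    public static void main(String[] args) {
        float[] samples = {0.0f, 0.25f, 0.75f, 1.0f, -1.0f, 42.5f};
        for (float s : samples) {
            float back = decode(encode(s), 0, 4);
            System.out.println(s + " -> " + back + (back == s ? " ok" : " MISMATCH"));
        }

        // Payload embedded at offset 3 of a larger buffer.
        byte[] big = new byte[10];
        System.arraycopy(encode(0.75f), 0, big, 3, 4);
        System.out.println("at offset 3: " + decode(big, 3, 4));
    }
}
```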

3. Add a relevant query clause. In a custom request handler, you could have a parameter to add BoostingTermQueries:

  // query is the BooleanQuery being built; colour comes from the request.
  Query q = new BoostingTermQuery(new Term("colourPayload", colour));
  query.add(q, BooleanClause.Occur.SHOULD);
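Roughly, the Lucene 2.x BoostingTermQuery scores a matching document by multiplying the ordinary term score by the average of scorePayload() over the term's occurrences in that document, using a factor of 1 when no payloads were seen (worth confirming against the source of the Lucene version in use). The sketch below models only that combination step in plain Java; combine() and the term score of 2.0 are invented for illustration and are not Lucene code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BoostingScoreModel {

    // Illustrative model of the per-document combination step only:
    // the ordinary term score is scaled by the mean payload score,
    // or left unchanged when no payloads were seen.
    static float combine(float termScore, List<Float> payloadScores) {
        if (payloadScores.isEmpty()) {
            return termScore;
        }
        float sum = 0f;
        for (float p : payloadScores) {
            sum += p;
        }
        return termScore * (sum / payloadScores.size());
    }

    public static void main(String[] args) {
        float termScore = 2.0f; // invented term score for "red"
        System.out.println(combine(termScore, Arrays.asList(0.75f)));            // one occurrence
        System.out.println(combine(termScore, Arrays.asList(0.5f, 1.0f)));       // two occurrences
        System.out.println(combine(termScore, Collections.<Float>emptyList()));  // no payloads
    }
}
```

With one occurrence whose payload decodes to 0.75, a term score of 2.0 becomes 1.5; two occurrences at 0.5 and 1.0 average to the same 0.75. Because the payload multiplies the score, a colour percentile near 0 nearly cancels that clause's contribution.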

How to add this generically is an interesting question. There are many possibilities, especially on the request handler and tokenizer side of things. If there is a consensus on a sensible way of doing this, I could contribute the bits of code that I have.

HTH,
-Mike
