On Thu, Nov 12, 2009 at 3:00 PM, Stephen Duncan Jr <stephen.dun...@gmail.com> wrote:
> On Thu, Nov 12, 2009 at 2:54 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
>> Oh man, so you were parsing the stored field values of every matching doc
>> at query time? Ouch.
>>
>> Assuming I'm understanding your goal, the conventional way to solve this
>> type of problem is "payloads" ... you'll find lots of discussion on it in
>> the various Lucene mailing lists, and if you look online Michael Busch has
>> various slides that talk about using them. They let you say things
>> like "in this document, at this position of field 'x' the word 'microsoft'
>> is worth 37.4, but at this other position (or in this other document)
>> 'microsoft' is only worth 17.2".
>>
>> The simplest way to use them in Solr (as I understand it) is to use
>> something like the DelimitedPayloadTokenFilterFactory when indexing, and
>> then write yourself a simple little custom QParser that generates a
>> BoostingTermQuery on your field.
>>
>> It should be a lot simpler to implement than the Query you are describing,
>> and much faster.
>>
>> -Hoss
>
> Thanks. I finally got around to looking at this again today and was looking
> at a similar path, so I appreciate the confirmation.
>
> --
> Stephen Duncan Jr
> www.stephenduncanjr.com

For posterity, here's the rest of what I discovered while trying to implement this:

You'll need to write a PayloadSimilarity as described here:
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/

Here's my updated version, since the scorePayload method used in that article has been deprecated:

    import org.apache.lucene.analysis.payloads.PayloadHelper;
    import org.apache.lucene.search.DefaultSimilarity;

    public class PayloadSimilarity extends DefaultSimilarity {
        @Override
        public float scorePayload(int docId, String fieldName, int start, int end,
                                  byte[] payload, int offset, int length) {
            // We can ignore length here, because we know the payload is
            // encoded as 4 bytes (a single float).
            return PayloadHelper.decodeFloat(payload, offset);
        }
    }

You'll need to register that similarity in your Solr schema.xml. This was hard to figure out: I didn't realize that the similarity has to be applied globally to the writer/searcher, even though I only care about payloads on one field, so I wasted time trying to figure out how to plug the similarity into my query parser.

You'll want to use the "payloads" field type from the example schema.xml, or something based on it.

The latest and greatest query type to use is PayloadTermQuery. I use it in my custom query parser class: I override getFieldQuery, check for my field name, and then:

    return new PayloadTermQuery(new Term(field, queryText), new AveragePayloadFunction());

Because the Similarity is global, I guess you'd have to modify it to look at the field name and base its behavior on that if you wanted different kinds of payloads on different fields in one schema.

Also, in my original implementation I controlled the score completely, so if I set a score of 0.8, the doc came back with a score of 0.8. In this technique, the payload is just used as a boost/addition to the score, so my scores came out higher than before. Since the docs are still in the same relative order, that still satisfied my needs, but it did require updating my test cases.

--
Stephen Duncan Jr
www.stephenduncanjr.com
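To make the payload mechanics in this thread concrete without a Lucene or Solr setup, here is a stdlib-only Java sketch. It mimics what I understand DelimitedPayloadTokenFilterFactory with a float encoder does at index time (split "term|weight" at the '|' delimiter and store the weight as a 4-byte float payload), and what a field-aware scorePayload would do at query time (decode the float for the payload field, return a neutral 1.0 for any other field). It assumes a big-endian 4-byte IEEE 754 layout, which I believe matches PayloadHelper but have not verified. The class name PayloadRoundTrip and the field names "payloads" and "title" are hypothetical, and this is an illustration, not Lucene's actual code.

```java
import java.nio.ByteBuffer;

// Hypothetical stdlib-only illustration; no Lucene classes are used.
public class PayloadRoundTrip {
    // Hypothetical name of the one field that carries float payloads.
    static final String PAYLOAD_FIELD = "payloads";
    // Assumed delimiter between the term and its weight, e.g. "microsoft|37.4".
    static final char DELIMITER = '|';

    // Index side: the term text is everything before the first delimiter.
    static String termOf(String token) {
        int i = token.indexOf(DELIMITER);
        return i < 0 ? token : token.substring(0, i);
    }

    // Index side: encode the weight after the delimiter as a 4-byte float
    // (ByteBuffer is big-endian by default). No delimiter means no payload.
    static byte[] payloadOf(String token) {
        int i = token.indexOf(DELIMITER);
        if (i < 0) {
            return null;
        }
        float weight = Float.parseFloat(token.substring(i + 1));
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    // Query side: decode only for the payload field, mirroring a global
    // Similarity that branches on fieldName; other fields get a neutral 1.0.
    static float scorePayload(String fieldName, byte[] payload, int offset) {
        if (!PAYLOAD_FIELD.equals(fieldName) || payload == null) {
            return 1.0f;
        }
        return ByteBuffer.wrap(payload).getFloat(offset);
    }

    public static void main(String[] args) {
        String token = "microsoft|37.4";
        byte[] payload = payloadOf(token);
        System.out.println(termOf(token));                        // microsoft
        System.out.println(scorePayload("payloads", payload, 0)); // 37.4
        System.out.println(scorePayload("title", payload, 0));    // 1.0
    }
}
```

In a real Similarity, Lucene hands you the payload bytes plus an offset and length into a possibly larger array, which is why the sketch decodes at an offset rather than assuming the payload starts at index 0.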