Hello Neil,

If "manipulating tf" is a possible approach, why not extend KeywordTokenizer to make it work in the following manner:

"3|wheel" -> {wheel,wheel,wheel}

That would let you supply your per-term-per-doc boosts as prefixes on the field values and have the tokenizer expand them into repeated terms internally during indexing, so nothing extra has to be sent per document.

Second, have you considered the Click Scoring Tools from LucidWorks as a relevant approach?

Regards

On Wed, May 16, 2012 at 12:02 AM, Neil Hooey <nho...@gmail.com> wrote:

> Hello Hoss and the list,
>
> We are currently using Lucene payloads to store per-document-per-keyword
> scores for our dataset. Our dataset consists of photos with keywords
> assigned (only once each) to them. The index is about 90 GB, running on
> 24-core machines with dedicated 10k SAS drives, and 16/32 GB allocated to
> the JVM.
>
> When searching the payloads field, our 98 percentile query time is at 2
> seconds even with trivially low queries per second. I have asked several
> Lucene committers about this and it's believed that the implementation of
> payloads being so general is the cause of the slowness.
>
> Hoss guessed that we could override Term Frequency with PreAnalyzedField[1]
> for the per-keyword scores, since keywords (tags) always have a Term
> Frequency of 1 and the TF calculation is very fast. However it turns out
> that you can't[2] specify TF in the PreAnalyzedField.
>
> Is there any other way to override Term Frequency during index time? If
> not, where in the code could this be implemented?
>
> An obvious option is to repeat the keyword as many times as its payload
> score, but that would drastically increase the amount of data per document
> sent during index time.
>
> I'd welcome any other per-document-per-keyword score solutions, or some way
> to speed up searching a payload field.
>
> Thanks,
>
> - Neil
>
> [1] https://issues.apache.org/jira/browse/SOLR-1535
> [2]
> https://issues.apache.org/jira/browse/SOLR-1535?focusedCommentId=13273501#comment-13273501

--
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>