Hello Hoss and the list,

We are currently using Lucene payloads to store per-document-per-keyword scores for our dataset. The dataset consists of photos with keywords assigned to them, each keyword at most once per photo. The index is about 90 GB, running on 24-core machines with dedicated 10k SAS drives, and 16/32 GB allocated to the JVM.
When searching the payloads field, our 98th percentile query time is 2 seconds, even at a trivially low query rate. I have asked several Lucene committers about this, and the consensus is that the generality of the payload implementation is what makes it slow.

Hoss suggested that we might use PreAnalyzedField [1] to override Term Frequency with the per-keyword score, since our keywords (tags) always have a Term Frequency of 1 and the TF calculation is very fast. However, it turns out that you can't specify TF in a PreAnalyzedField [2].

Is there any other way to override Term Frequency at index time? If not, where in the code could this be implemented?

An obvious option is to repeat each keyword as many times as its payload score, but that would drastically increase the amount of data sent per document at index time.

I'd welcome any other per-document-per-keyword score solutions, or some way to speed up searching a payload field.

Thanks,
- Neil

[1] https://issues.apache.org/jira/browse/SOLR-1535
[2] https://issues.apache.org/jira/browse/SOLR-1535?focusedCommentId=13273501#comment-13273501
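
P.S. To put a rough shape on the size concern with the repetition option, here is a minimal sketch in plain Java (no Lucene or Solr calls). It is an illustration only: the class name KeywordScoreEncoding, the sample keywords, and the scores are made up, and it assumes scores are small positive integers and that the payload field is fed as delimited "keyword|score" text, which may not match our actual setup.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene or Solr: compares the text sent
// at index time for two ways of carrying a per-keyword score.
final class KeywordScoreEncoding {

    // Delimited form: each keyword appears once, followed by '|' and its score.
    static String asDelimitedPayloads(Map<String, Integer> scores) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : scores.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.getKey()).append('|').append(e.getValue());
        }
        return sb.toString();
    }

    // Repetition form: each keyword is written `score` times, so that its
    // term frequency in the field equals its score.
    static String asRepeatedTerms(Map<String, Integer> scores) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : scores.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(e.getKey());
            }
        }
        return sb.toString();
    }

    // Size of the field value as it would travel over the wire in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Made-up sample document: three keywords with integer scores.
        Map<String, Integer> scores = new LinkedHashMap<>();
        scores.put("sunset", 87);
        scores.put("beach", 42);
        scores.put("palm", 5);

        System.out.println("delimited bytes: " + utf8Length(asDelimitedPayloads(scores)));
        System.out.println("repeated bytes:  " + utf8Length(asRepeatedTerms(scores)));
    }
}
```

The repeated form grows with the sum of the scores rather than with the number of keywords, which is why it looks unattractive to me for anything but a very small score range.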