Hello Neil,

If "manipulating tf" is a possible approach, why not extend
KeywordTokenizer to make it work in the following manner:

"3|wheel" -> {wheel,wheel,wheel}

That would let you supply your per-term-per-doc boosts as prefixes on the
field values, and have the terms multiplied internally during indexing.
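
To make the "3|wheel" idea concrete, here is a minimal sketch in plain Java of
the expansion step only. It deliberately does not use the Lucene Tokenizer API;
the class name BoostPrefixExpander and the method expand are hypothetical names
chosen for illustration, and the choice to pass through values without a valid
numeric prefix unchanged is my own assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Lucene or Solr.
final class BoostPrefixExpander {

    // Expands "N|term" into N copies of term.
    // Assumption: a value with no '|' separator, a non-numeric prefix,
    // or a count below 1 is returned once, unchanged.
    static List<String> expand(String value) {
        int sep = value.indexOf('|');
        if (sep <= 0) {
            return Collections.singletonList(value);
        }
        int count;
        try {
            count = Integer.parseInt(value.substring(0, sep));
        } catch (NumberFormatException e) {
            return Collections.singletonList(value);
        }
        if (count < 1) {
            return Collections.singletonList(value);
        }
        // Everything after the first '|' is the term itself.
        String term = value.substring(sep + 1);
        return new ArrayList<>(Collections.nCopies(count, term));
    }

    public static void main(String[] args) {
        System.out.println(expand("3|wheel")); // prints [wheel, wheel, wheel]
    }
}
```

A tokenizer built on this idea would emit the term that many times for the
field, so the client sends only the short prefixed value and the repetition
happens on the indexing side.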

Second, have you considered the Click Scoring Tools from
Lucidworks as a relevant approach?

Regards

On Wed, May 16, 2012 at 12:02 AM, Neil Hooey <nho...@gmail.com> wrote:

> Hello Hoss and the list,
>
> We are currently using Lucene payloads to store per-document-per-keyword
> scores for our dataset. Our dataset consists of photos with keywords
> assigned (only once each) to them. The index is about 90 GB, running on
> 24-core machines with dedicated 10k SAS drives, and 16/32 GB allocated to
> the JVM.
>
> When searching the payloads field, our 98 percentile query time is at 2
> seconds even with trivially low queries per second. I have asked several
> Lucene committers about this and it's believed that the implementation of
> payloads being so general is the cause of the slowness.
>
> Hoss guessed that we could override Term Frequency with PreAnalyzedField[1]
> for the per-keyword scores, since keywords (tags) always have a Term
> Frequency of 1 and the TF calculation is very fast. However it turns out
> that you can't[2] specify TF in the PreAnalyzedField.
>
> Is there any other way to override Term Frequency during index time? If
> not, where in the code could this be implemented?
>
> An obvious option is to repeat the keyword as many times as its payload
> score, but that would drastically increase the amount of data per document
> sent during index time.
>
> I'd welcome any other per-document-per-keyword score solutions, or some way
> to speed up searching a payload field.
>
> Thanks,
>
> - Neil
>
> [1] https://issues.apache.org/jira/browse/SOLR-1535
> [2] https://issues.apache.org/jira/browse/SOLR-1535?focusedCommentId=13273501#comment-13273501
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
