Hi,
With help from the group here, I have been able to set up a search
application with payloads enabled. However, queries with payloads are
noticeably slower than the same queries without payloads. I am also seeing
much more disk I/O (the index is on a 7200 rpm disk) and comparatively less
CPU usage.

I am guessing this is because I use PayloadTermQuery and PayloadNearQuery,
both of which extend SpanQuery. Span queries read the positions index (where
the payloads are also stored), and that is much larger than the data a simple
TermQuery reads.
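To make my reasoning concrete, here is a toy model in plain Java. It uses no
Lucene classes; Posting, Occurrence, termScore and payloadScore are
hypothetical names, and the scoring formulas are only illustrative. The point
it sketches: a TermQuery-style score needs just the term frequency per
document, while a payload-averaging score has to visit every position to read
the payload stored there.

```java
import java.util.List;

// Toy model of one term's postings entry in one document.
// Hypothetical classes for illustration only; NOT Lucene's internal API.
public class PostingsCostSketch {

    // One occurrence of the term: its position and the payload stored there.
    record Occurrence(int position, float payload) {}

    // A posting: document id plus every occurrence of the term in that document.
    record Posting(int docId, List<Occurrence> occurrences) {}

    // TermQuery-style scoring only needs the term frequency (one number per doc).
    static float termScore(Posting p) {
        return (float) Math.sqrt(p.occurrences().size());
    }

    // Payload-style scoring (here: average payload, an assumed scoring rule)
    // must read the payload at every position, so the cost grows with term frequency.
    static float payloadScore(Posting p, int[] reads) {
        float sum = 0f;
        for (Occurrence o : p.occurrences()) {
            reads[0]++; // count each per-position payload read
            sum += o.payload();
        }
        float avg = p.occurrences().isEmpty() ? 1f : sum / p.occurrences().size();
        return termScore(p) * avg;
    }

    public static void main(String[] args) {
        Posting p = new Posting(42, List.of(
                new Occurrence(3, 2.0f),
                new Occurrence(17, 1.0f),
                new Occurrence(90, 3.0f)));
        int[] reads = {0};
        System.out.println("term score     = " + termScore(p));
        System.out.println("payload score  = " + payloadScore(p, reads));
        System.out.println("position reads = " + reads[0]);
    }
}
```

In a real index the extra per-position reads hit a separate, larger file,
which would fit the higher disk I/O I am seeing.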

Is there any way to make this system faster without having to distribute
the index? My index is barely 1 GB (~200k documents, with only one field to
search). Average query times are as high as 2 seconds.

Any pointers on directions I could experiment in would also be very
helpful.

I looked at the HathiTrust Digital Library articles. The methods described
there avoid reading the positions index (by converting PhraseQueries to
TermQueries). That will not work in my case, because I still have to read
the positions index to get the payload information during scoring. Let me
know if my understanding is incorrect.
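For reference, here is how I understand the phrase-to-term conversion, as a
simplified sketch in plain Java. commonGrams is a hypothetical helper, not
Solr's actual CommonGramsFilter: adjacent word pairs involving a very common
word are indexed as one joined token, so a phrase like "of the" becomes a
single term lookup instead of an intersection of position lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified illustration of joining common-word pairs into single tokens.
// Hypothetical helper; not the real Solr/Lucene filter.
public class CommonGramsSketch {

    static List<String> commonGrams(List<String> tokens, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String cur = tokens.get(i);
            out.add(cur); // always keep the single word
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                // Join the pair when either word is in the common-word set.
                if (common.contains(cur) || common.contains(next)) {
                    out.add(cur + "_" + next);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "rise", "of", "the", "machines");
        System.out.println(commonGrams(tokens, Set.of("the", "of")));
    }
}
```

Even with joined tokens like "of_the", a payload-scoring query would still
need the payload stored at each occurrence of that token, which is why I think
the positions data still has to be read in my case.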


Thanks,
-Raghu
