On 12/29/2014 12:07 PM, Mahmoud Almokadem wrote:
> What do you mean with "important parts of index"? and how to calculate
> their size?
I have no formal education in what's important when it comes to doing a query, but I can make some educated guesses. Starting with this as a reference:

http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/codecs/lucene410/package-summary.html#file-names

I would guess that the segment info (.si) files and the term index (*.tip) files are supremely important to *always* have in memory, and they are fairly small. Next would be the term dictionary (*.tim) files. The term dictionary is pretty big, and having it in memory is very important for fast queries.

Frequencies, positions, and norms may also be important, depending on exactly what kind of query you have. Frequencies and positions are quite large. Frequencies are critical for relevance ranking (the default sort by score), and positions are important for phrase queries. Position data may also be used by relevance ranking, but I am not familiar enough with it to say for sure.

If you have docValues defined, then the *.dvm and *.dvd files are used for facets and for sorting on those specific fields. The *.dvd files can be very big, depending on your schema.

The *.fdx and *.fdt files become important when actually retrieving results, after the matching documents have been determined. The stored data is compressed, so additional CPU power is required to uncompress it before it is sent to the client. Stored data may be large or small, depending on your schema. Stored data does not directly affect search speed, but if memory space is limited, every block of stored data that gets read will push some other part of the index out of the OS disk cache, which means that part might need to be re-read from disk on the next query.

Thanks,
Shawn
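As for calculating the size of each part: the segment files in the index directory use those extensions, so you can just group the files by extension and sum their sizes. A minimal sketch in Python (the example path in the comment is an assumption; point it at your own core's data/index directory):

```python
import os
from collections import defaultdict

def sizes_by_extension(index_dir):
    """Sum file sizes in a Lucene index directory, grouped by extension."""
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            # Lucene files look like _0.tim, _0.tip, _0.dvd, segments_2, ...
            ext = os.path.splitext(name)[1] or name  # extensionless files keep their name
            totals[ext] += os.path.getsize(path)
    return dict(totals)

# Hypothetical example path -- substitute your own index directory:
# sizes_by_extension("/var/solr/data/collection1/data/index")
```

Sort the result by size, descending, and you'll see roughly how much memory it would take to keep the .tip, .tim, .dvd, etc. portions cached.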