On 12/5/06, Gmail Account <[EMAIL PROTECTED]> wrote:
> > There's nothing wrong with CPU jumping to 100% each query, that just
> > means you aren't IO bound :-)
>
> What do you mean not IO bound?
There is always going to be a bottleneck somewhere. In very large indices, the bottleneck may be waiting for IO (waiting for data to be read from the disk). If you are on a single-processor system and you aren't waiting for data to be read from the disk or the network, then the request will be using close to 100% CPU, which is actually a good thing. The bad thing is how long the query takes, not the fact that it's CPU bound.
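A rough way to tell on a Linux box (assuming the usual vmstat/iostat tools are installed) is to watch how much CPU time goes to IO wait while queries are running:

    # watch the "wa" (IO wait) column while queries run; low wa + high user CPU means CPU bound
    vmstat 1

    # per-device stats (sysstat package); sustained high %util suggests an IO bottleneck
    iostat -x 1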
> > > - I did an optimize index through Luke with compound format and noticed
> > > in the solrconfig file that useCompoundFile is set to false.
> >
> > Don't do this unless you really know what you are doing... Luke is
> > probably using a different version of Lucene than Solr, and it could
> > be dangerous.
>
> Do you think I should reindex everything?
That would be the safest thing to do.
> > - if you are using filters, any larger than 3000 will be double the
> > size (maxDoc bits)
>
> What do you mean larger than 3000?  3000 what and how do I tell?
From solrconfig.xml:
    <!-- This entry enables an int hash representation for filters (DocSets)
         when the number of items in the set is less than maxSize.  For smaller
         sets, this representation is more memory efficient, more efficient to
         iterate over, and faster to take intersections.  -->
    <HashDocSet maxSize="3000" loadFactor="0.75"/>

The key is that the memory consumed by a HashDocSet is independent of
maxDoc (the maximum internal lucene docid), but a BitSet based set has
maxDoc bits in it.  Thus, an unoptimized index with more deleted
documents causes a higher maxDoc and higher memory usage for any
BitSet based filters.

-Yonik
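P.S. A rough back-of-the-envelope, assuming (just for illustration) an
index with maxDoc = 10,000,000:

    BitSet filter:      10,000,000 bits / 8  ~= 1.2 MB per cached filter
    HashDocSet filter:  3000 ids / 0.75 loadFactor ~= 4000 int slots ~= 16 KB,
                        independent of maxDoc

So a filter matching fewer than maxSize docs stays in the small hash
representation, while anything larger falls back to a BitSet whose cost
scales with maxDoc rather than with the number of matching docs.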