Your snippet shows "manu" as type "text", not "string".
Try faceting on manu_exact and you may get better results.
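
For example, a facet request against the copy field might look something
like this (a sketch; the host, port, and query are illustrative and assume
the example setup):

http://localhost:8983/solr/select?q=video&facet=true&facet.field=manu_exact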

-Yonil

On 12/6/06, Gmail Account <[EMAIL PROTECTED]> wrote:
It is currently a string type. Here is everything that has to do with manu
in my schema... Should it have been multi-valued? Do you see anything wrong
with this?

<field name="manu" type="text" indexed="true" stored="true"/>
<!-- copied from "manu" via copyField -->
<field name="manu_exact" type="string" indexed="true" stored="true"/>
<field name="text" type="text" indexed="true" stored="false"
multiValued="true"/>
.....

<copyField source="manu" dest="text"/>
<copyField source="manu" dest="manu_exact"/>

Thanks...

----- Original Message -----
From: "Yonik Seeley" <[EMAIL PROTECTED]>
To: <solr-user@lucene.apache.org>
Sent: Wednesday, December 06, 2006 9:55 PM
Subject: Re: Performance issue.


> It is using the cache, but the number of items is larger than the size
> of the cache.
>
> If you want to continue to use the filter method, then you need to
> increase the size of the filter cache to something larger than the
> number of unique values in the field you are filtering on.  I don't
> know if you will have enough memory to take this approach or not.
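>
> As a rough sketch in solrconfig.xml (the sizes here are illustrative,
> not a recommendation; tune them to your unique-term count and heap):
>
>    <filterCache class="solr.LRUCache" size="50000"
>                 initialSize="50000" autowarmCount="1000"/>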
>
> The second option is to make brand/manu a non-multi-valued string
> type.  When you do that, Solr will use a different method to calculate
> the facet counts (it will use the FieldCache rather than filters).
> You would need to reindex to try this approach.
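>
> A minimal sketch of what that field definition could look like in
> schema.xml (the field name is illustrative):
>
>    <field name="manu_exact" type="string" indexed="true" stored="true"/>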
>
> -Yonik
>
> On 12/6/06, Gmail Account <[EMAIL PROTECTED]> wrote:
>> I reindexed and optimized and it helped. However, each query now averages
>> about 1 second (down from 3-4 seconds). The bottleneck now is the
>> getFacetTermEnumCounts function. If I take that call out, the query time
>> is not measurable and the filter cache is being used. With
>> getFacetTermEnumCounts in, the filter cache after three queries is shown
>> below, with the hit ratio at 0 and everything being evicted. This call is
>> for the brand/manufacturer field, so I'm sure it is going through many
>> thousands of queries. I'm thinking about pre-processing the brand/manu
>> data to get a small set of top brands per category and just querying those
>> no matter what the other facets are set to (with certain filters, no
>> brands will be shown). If I still want to call getFacetTermEnumCounts for
>> ALL brands, why is it not using the cache?
>>
>>
>> lookups : 32849
>> hits : 0
>> hitratio : 0.00
>> inserts : 32850
>> evictions : 32338
>> size : 512
>> cumulative_lookups : 32849
>> cumulative_hits : 0
>> cumulative_hitratio : 0.00
>> cumulative_inserts : 32850
>> cumulative_evictions : 32338
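>>
>> (Reading the stats: 32,850 inserts minus 32,338 evictions leaves exactly
>> the 512 entries the cache holds, and 32,849 lookups over three queries
>> works out to roughly 11,000 distinct term filters per query, so each
>> entry is evicted long before it can be reused.)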
>>
>>
>> Thanks,
>> Mike
>> ----- Original Message -----
>> From: "Yonik Seeley" <[EMAIL PROTECTED]>
>> To: <solr-user@lucene.apache.org>
>> Sent: Tuesday, December 05, 2006 8:46 PM
>> Subject: Re: Performance issue.
>>
>>
>> > On 12/5/06, Gmail Account <[EMAIL PROTECTED]> wrote:
>> >> > There's nothing wrong with CPU jumping to 100% each query, that just
>> >> > means you aren't IO bound :-)
>> >> What do you mean not IO bound?
>> >
>> > There is always going to be a bottleneck somewhere.  In very large
>> > indices, the bottleneck may be waiting for IO (waiting for data to be
>> > read from the disk).  If you are on a single-processor system and you
>> > aren't waiting for data to be read from the disk or the network, then
>> > the request will be using close to 100% CPU, which is actually a good
>> > thing.
>> >
>> > The bad thing is how long the query takes, not the fact that it's CPU
>> > bound.
>> >
>> >> >>    - I optimized the index through Luke with compound format and
>> >> >>      noticed in the solrconfig file that useCompoundFile is set
>> >> >>      to false.
>> >> >
>> >> > Don't do this unless you really know what you are doing... Luke is
>> >> > probably using a different version of Lucene than Solr, and it could
>> >> > be dangerous.
>> >> Do you think I should reindex everything?
>> >
>> > That would be the safest thing to do.
>> >
>> >> > - if you are using filters, any filter matching more than 3000
>> >> > documents will be stored as a bitset (maxDoc bits)
>> >> What do you mean, larger than 3000? 3000 of what, and how do I tell?
>> >
>> > From solrconfig.xml:
>> >    <!-- This entry enables an int hash representation for filters
>> >         (DocSets) when the number of items in the set is less than
>> >         maxSize.  For smaller sets, this representation is more memory
>> >         efficient, more efficient to iterate over, and faster to take
>> >         intersections.  -->
>> >    <HashDocSet maxSize="3000" loadFactor="0.75"/>
>> >
>> > The key is that the memory consumed by a HashDocSet is independent of
>> > maxDoc (the maximum internal lucene docid), but a BitSet based set has
>> > maxDoc bits in it.  Thus, an unoptimized index with more deleted
>> > documents causes a higher maxDoc and higher memory usage for any
>> > BitSet based filters.
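>> >
>> > As a rough worked example (assuming the int hash representation costs
>> > about 4 bytes per slot): with maxDoc = 1,000,000, every BitSet-based
>> > filter takes 1,000,000 / 8 = 125,000 bytes (~122 KB) no matter how few
>> > documents it matches, while a HashDocSet of 3,000 ids at a 0.75 load
>> > factor takes about (3,000 / 0.75) * 4 = 16,000 bytes (~16 KB).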
>> >
>> > -Yonik
