Ryan:

As it happens, there's a discussion on the dev list about this.

If at all possible, could you try a brief experiment? Turn off
all the storage, i.e. set stored="false" on all fields. It's a lot
to ask, but it'd help the discussion.
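
In case it helps, here is a rough sketch of one way to flip every field
to stored="false" without hand-editing the schema. It assumes a classic
schema.xml that declares <field> and <dynamicField> elements. The sample
schema and the function name disable_stored are made up for
illustration and are not taken from your setup. ElementTree drops XML
comments, so run it on a copy of the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a real schema.xml; field names are invented.
SAMPLE_SCHEMA = """<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="body" type="text_general" indexed="true" stored="true"/>
    <dynamicField name="*_txt" type="text_general" indexed="true"/>
  </fields>
</schema>"""


def disable_stored(schema_xml: str) -> str:
    """Return the schema with stored="false" set on every field definition."""
    root = ET.fromstring(schema_xml)
    for tag in ("field", "dynamicField"):
        for elem in root.iter(tag):
            # Overwrites an existing stored attribute or adds a new one.
            elem.set("stored", "false")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(disable_stored(SAMPLE_SCHEMA))
```

For a real run you would read your own schema.xml into a string, pass it
through disable_stored, write the result to a copy, and index into an
empty core so the timings are comparable.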

Or join the discussion at https://issues.apache.org/jira/browse/LUCENE-5914.
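
If you post your numbers there, a scaling estimate may be more useful
than raw timings. The sketch below fits a slope to log(time) against
log(size) using the figures from your message quoted below. A slope near
1 means roughly linear growth and a slope near 2 means roughly
quadratic. Only two of the 4.9 rows have exact times (the others are
"over 600 sec" and a timeout), so treat that estimate as very rough. The
function name loglog_slope is just illustrative.

```python
import math

# (size figure, seconds) pairs copied from the quoted message.
SOLR_40 = [(1.18, 6), (2.26, 12), (3.35, 18), (9.65, 46)]
# Only rows with an exact time; 3.35 was "over 600 sec" and 9.65 timed out.
SOLR_49 = [(1.18, 123), (2.26, 444)]


def loglog_slope(points):
    """Least-squares slope of log(time) versus log(size)."""
    xs = [math.log(size) for size, _ in points]
    ys = [math.log(seconds) for _, seconds in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    print("Solr 4.0 exponent: %.2f" % loglog_slope(SOLR_40))
    print("Solr 4.9 exponent: %.2f" % loglog_slope(SOLR_49))
```

With these figures the 4.0 exponent comes out close to 1 and the 4.9
exponent close to 2, which is at least consistent with per-document
cost growing much faster than document size after the upgrade.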

Best,
Erick

On Thu, Sep 4, 2014 at 1:08 AM, Shawn Heisey <s...@elyograg.org> wrote:
> On 9/3/2014 8:14 PM, Li, Ryan wrote:
>> I index 2500 documents (up to 50MB each, average 3MB) into a Solr
>> server. When running on Solr 4.0, I managed to finish indexing in 3 hours.
>>
>> However, after we upgraded to Solr 4.9, indexing needs 3 days to finish.
>>
>> I've done some profiling. The numbers I get are:
>> size figure of document   time to add (Solr 4.0)   time to add (Solr 4.9)
>> 1.18                      6 sec                    123 sec
>> 2.26                      12 sec                   444 sec
>> 3.35                      18 sec                   over 600 sec
>> 9.65                      46 sec                   timeout
>>
>> From what I can see, indexing time grows roughly linearly, O(n), with
>> document size on Solr 4.0, but much faster than linearly on Solr 4.9. I
>> also tried commenting out some copied fields to narrow down the problem.
>> It seems the size of the document after indexing (we copy fields, and
>> the more fields we copy, the bigger the index is) is the dominating
>> factor for index time.
>>
>> Just wondering, has anyone experienced a similar problem? Does that sound
>> like a bug in Solr, or are we just using Solr 4.9 wrong?
>
> One possible source of problems with that particular upgrade is the fact
> that stored field compression was added in 4.1, and termvector
> compression was added in 4.2.  They are on by default and cannot be
> turned off.  The compression is typically fast, but with very large
> documents like yours, it might result in pretty major computational
> overhead.  It can also require additional java heap, which ties into
> what follows:
>
> Another problem might be RAM-related.
>
> If your java heap is very large, or just a little bit too small, there
> can be major performance issues from garbage collection.  Based on the
> fact that the earlier version performed well, a too-small heap is more
> likely than a very large heap.
>
> If your index size is such that it can't be effectively cached by the
> amount of total RAM on the machine (minus the java heap assigned to
> Solr), that can cause performance problems.  Your index size is likely
> to be several gigabytes, and might even reach double-digit gigabytes.
> Can you relate those numbers -- index size, java heap size, and total
> system RAM?  If you can, it would also be a good idea to share your
> solrconfig.xml.
>
> Here's a wiki page that goes into more detail about possible performance
> issues.  It doesn't mention the possible compression problem:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Thanks,
> Shawn
>
