Ryan:

As it happens, there's a discussion on the dev list about this.
If at all possible, could you try a brief experiment? Turn off all the
storage, i.e. set stored="false" on all fields. It's a lot to ask, but
it'd help the discussion.

Or join the discussion at
https://issues.apache.org/jira/browse/LUCENE-5914.

Best,
Erick

On Thu, Sep 4, 2014 at 1:08 AM, Shawn Heisey <s...@elyograg.org> wrote:
> On 9/3/2014 8:14 PM, Li, Ryan wrote:
>> I have a Solr server indexes 2500 documents (up to 50MB each, ave 3MB) to
>> Solr server. When running on Solr 4.0 I managed to finish index in 3 hours.
>>
>> However after we upgrade to Solr 4.9, the index need 3 days to finish.
>>
>> I've done some profiling, numbers I get are:
>> size figure of document, time for adding to Solr server (4.0), time for
>> adding to Solr server (4.9)
>> 1.18, 6 sec, 123 sec
>> 2.26, 12 sec, 444 sec
>> 3.35, 18 sec, over 600 sec
>> 9.65, 46 sec, timeout
>>
>> From what I can see index seems has an o(n) performance for Solr 4.0 and is
>> almost o(log n) for Solr 4.9. I also tried to comment out some copied fields
>> to narrow down the problem, seems size of the document after index(we copy
>> fields and the more fields we copy, the bigger the index size is) is the
>> dominating factor for index time.
>>
>> Just wondering has any one experience similar problem? Does that sound like
>> a bug of Solr or just we have use Solr 4.9 wrong?
>
> One possible source of problems with that particular upgrade is the fact
> that stored field compression was added in 4.1, and termvector
> compression was added in 4.2. They are on by default and cannot be
> turned off. The compression is typically fast, but with very large
> documents like yours, it might result in pretty major computational
> overhead. It can also require additional java heap, which ties into
> what follows:
>
> Another problem might be RAM-related.
>
> If your java heap is very large, or just a little bit too small, there
> can be major performance issues from garbage collection. Based on the
> fact that the earlier version performed well, a too-small heap is more
> likely than a very large heap.
>
> If your index size is such that it can't be effectively cached by the
> amount of total RAM on the machine (minus the java heap assigned to
> Solr), that can cause performance problems. Your index size is likely
> to be several gigabytes, and might even reach double-digit gigabytes.
> Can you relate those numbers -- index size, java heap size, and total
> system RAM? If you can, it would also be a good idea to share your
> solrconfig.xml.
>
> Here's a wiki page that goes into more detail about possible performance
> issues. It doesn't mention the possible compression problem:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Thanks,
> Shawn
>
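
For the stored="false" experiment suggested at the top of this message, here is a minimal Python sketch, assuming the schema lives in a classic schema.xml with <field> and <dynamicField> elements. The sample schema, its field names, and the helper name disable_stored are made up for illustration and are not taken from Ryan's setup. It is best run against a copy of the real file, since ElementTree drops XML comments when it rewrites a document.

```python
import xml.etree.ElementTree as ET


def disable_stored(schema_xml: str) -> str:
    """Return the schema text with stored="false" on every field definition."""
    root = ET.fromstring(schema_xml)
    # Cover both explicit fields and dynamic field patterns.
    for tag in ("field", "dynamicField"):
        for element in root.iter(tag):
            element.set("stored", "false")
    return ET.tostring(root, encoding="unicode")


# Hypothetical schema fragment, used only to demonstrate the helper.
SAMPLE = """<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="body" type="text_general" indexed="true" stored="true"/>
    <dynamicField name="*_txt" type="text_general" indexed="true"/>
  </fields>
</schema>"""

if __name__ == "__main__":
    print(disable_stored(SAMPLE))
```

Documents indexed before the change keep their stored values, so the timing comparison only makes sense after reindexing into a fresh index with the modified schema.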