Prasi, as per the ticket I linked to earlier, I was running into GC-related hangs. It may be worth investigating; take a look at the GC settings I'm running with in the ticket.
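The ticket's exact flags aren't reproduced in this thread, but as a general illustration of the kind of GC tuning commonly applied to Solr 4.x JVMs of this era (CMS collector, fixed heap), the settings tend to look something like the following. These specific values are an assumption for illustration, not the ones from the ticket:

```shell
# Hypothetical example of CMS-based GC tuning for a Solr 4.x JVM.
# Pin the heap size so the JVM never resizes it mid-run:
JAVA_OPTS="$JAVA_OPTS -Xms8g -Xmx8g"
# Use the concurrent mark-sweep collector to keep stop-the-world pauses short:
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
# Start concurrent collection early and deterministically, instead of letting
# the JVM guess (late CMS starts can degrade into long full-GC pauses):
JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly"
# Log GC activity so multi-minute "hangs" can be correlated with GC pauses:
JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
```

Multi-hour unresponsiveness as described later in this thread is consistent with full-GC death spirals on an undersized or untuned heap, which is why the GC logs are the first thing worth checking.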
Cheers,
Chris

On 22 October 2013 10:25, Prasi S <prasi1...@gmail.com> wrote:
> bq: ...three different files each with a partial set of data.
>
> We have to index around 170 metadata fields: around 120 fields are in the
> first file, 50 in the second file, and 6 in the third file. All three
> files have the same unique key. We use SolrJ to push these files to Solr.
> First, we index the first file for the 220 million records. Then we take
> the second file and do a partial update on the existing 220M. The same is
> then repeated for the third file.
>
> We commit in batches. Our batch consists of 20,000 records. Once 5 such
> batches are sent to Solr, we send a commit to Solr from the code. We have
> disabled soft commit. The hard commit is as below.
>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> Thanks,
> Prasi
>
> On Tue, Oct 22, 2013 at 2:34 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > This is not a lot of data, really.
> >
> > bq: ...three different files each with a partial set of data.
> >
> > OK, what does this mean? Are you importing as CSV files or something?
> > Are you trying to commit tens of millions of documents at once?
> >
> > This shouldn't be merging, since you're on 4.4, unless you're committing
> > far too frequently.
> >
> > What are your commit settings, both soft and hard? How are you
> > committing?
> >
> > In short, there's not a lot of information to go on here; you need to
> > provide a number of details.
> >
> > Best,
> > Erick
> >
> > On Tue, Oct 22, 2013 at 9:25 AM, Prasi S <prasi1...@gmail.com> wrote:
> > > Hi all,
> > > We are using SolrCloud 4.4 (SolrCloud with external ZooKeeper, 2
> > > Tomcats, 2 Solr instances, 1 in each Tomcat) for indexing delimited
> > > files. Our index counts 220 million records. We have three different
> > > files, each with a partial set of data.
> > >
> > > We index the first file completely. Then the second and third files
> > > are partial updates.
> > >
> > > 1. While we are testing indexing performance, we notice that Solr
> > > hangs frequently after 2 days. It just hangs for an hour or two, and
> > > then if we hit the admin URL, it comes back and starts indexing. Why
> > > does this happen?
> > >
> > > We have noticed that in the last 12 hours the hanging was so frequent
> > > that for almost 6 hours it was just in a hanged state.
> > >
> > > 2. Also, commit time increases for the partial uploads.
> > >
> > > Do we need to tweak any parameter, or is this the behavior with Cloud
> > > for a huge volume of data?
> > >
> > > Thanks,
> > > Prasi
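The commit cadence Prasi describes (batches of 20,000 documents, one explicit commit after every 5 batches) can be sketched as below. This is a minimal self-contained illustration, not the actual indexer: the `send` and `commit` methods are stubs standing in for SolrJ's `SolrServer.add(...)` and `SolrServer.commit()`:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the batching/commit cadence from the thread: buffer documents
// into batches of 20,000, send each full batch, and issue an explicit
// commit after every 5 batches (i.e. every 100,000 documents).
public class BatchIndexer {
    static final int BATCH_SIZE = 20_000;
    static final int BATCHES_PER_COMMIT = 5;

    private final List<String> buffer = new ArrayList<>();
    private int batchesSent = 0;
    private int commits = 0;

    void index(String doc) {
        buffer.add(doc);
        if (buffer.size() == BATCH_SIZE) {
            send(buffer);               // stand-in for solrServer.add(batch)
            buffer.clear();
            batchesSent++;
            if (batchesSent % BATCHES_PER_COMMIT == 0) {
                commit();               // stand-in for solrServer.commit()
            }
        }
    }

    void finish() {
        if (!buffer.isEmpty()) {        // flush any partial final batch
            send(buffer);
            buffer.clear();
        }
        commit();                       // final commit so the tail is visible
    }

    void send(List<String> batch) { /* solrServer.add(batch) would go here */ }

    void commit() { commits++; }

    int commitCount() { return commits; }
}
```

At this cadence, 220 million records produce roughly 2,200 explicit commits over the run, on top of the hard autoCommit firing every 10 minutes; that combined commit rate is exactly the kind of detail Erick is asking about.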