bq: ...three different files each with a partial set
of data.

We have to index around 170 metadata fields. Around 120 fields are in the
first file, 50 in the second file, and 6 in the third file. All three
files share the same unique key. We use SolrJ to push these files to Solr.
First, we index the first file for all 220 million records. Then we take
the second file and do a partial update on the existing 220M records. The
same is then repeated for the third file.
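
To make the partial-update step concrete, here is a minimal sketch of how one such update document could be shaped. It uses plain Java maps rather than SolrJ so that it runs on its own, and it is not our actual indexing code: the class name, the helper `buildPartialUpdate`, and the field names `id` and `author_s` are all hypothetical. With SolrJ the same idea is usually expressed as `doc.addField(name, Collections.singletonMap("set", value))` on a `SolrInputDocument`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartialUpdateSketch {

    // Builds an atomic-update style document: the unique key is sent as a
    // plain value, and every other field is wrapped in a {"set": value}
    // modifier so that only those fields are replaced on the existing record.
    // All names here are illustrative assumptions, not taken from the thread.
    static Map<String, Object> buildPartialUpdate(String keyField,
                                                  String keyValue,
                                                  Map<String, Object> fields) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put(keyField, keyValue);
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            Map<String, Object> modifier = new LinkedHashMap<>();
            modifier.put("set", e.getValue());
            doc.put(e.getKey(), modifier);
        }
        return doc;
    }

    public static void main(String[] args) {
        // One record from the (hypothetical) second file: key plus one field.
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("author_s", "example");
        System.out.println(buildPartialUpdate("id", "rec-1", fields));
    }
}
```

Note that an atomic update of this kind generally requires the other fields of the record to be stored, since Solr rebuilds the whole document internally.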

We commit in batches. Each batch consists of 20,000 records. Once 5 such
batches have been sent to Solr, we send a commit to Solr from the code. We
have disabled soft commits. The hard commit configuration is as below.

     <autoCommit>
       <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
       <openSearcher>false</openSearcher>
     </autoCommit>
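
To put the commit cadence in numbers, the small calculation below works out how many batches and explicit commits one pass over a file produces with the figures above (220 million records, 20,000 per batch, a commit after every 5 batches). It is only a sketch: the class and method names are hypothetical, and it assumes a final commit is sent for any leftover batches at the end of a file.

```java
public class CommitCadenceSketch {

    // Number of batches needed for the given record count (rounded up).
    static long batches(long records, int batchSize) {
        return (records + batchSize - 1) / batchSize;
    }

    // Number of explicit commits sent from the client code, assuming one
    // commit per `batchesPerCommit` batches plus one for any remainder.
    static long explicitCommits(long records, int batchSize, int batchesPerCommit) {
        long b = batches(records, batchSize);
        return (b + batchesPerCommit - 1) / batchesPerCommit;
    }

    public static void main(String[] args) {
        long records = 220_000_000L;   // records per file, from the thread
        int batchSize = 20_000;        // records per batch
        int batchesPerCommit = 5;      // commit after every 5 batches

        System.out.println("batches per file: " + batches(records, batchSize));
        System.out.println("explicit commits per file: "
                + explicitCommits(records, batchSize, batchesPerCommit));
    }
}
```

With these figures that is 11,000 batches and 2,200 explicit commits per file, i.e. one commit per 100,000 documents, on top of the 10-minute autoCommit.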


Thanks,
Prasi


On Tue, Oct 22, 2013 at 2:34 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> This is not a lot of data really.
>
> bq: ...three different files each with a partial set
> of data.
>
> OK, what does this mean? Are you importing as CSV files or
> something? Are you trying to commit 10s of M documents at once?
>
> This shouldn't be merging since you're in 4.4 unless you're committing
> far too frequently.
>
> What are your commit settings? Both soft and hard? How are you
> committing?
>
> In short, there's not a lot of information to go on here, you need to
> provide a number of details.
>
> Best,
> Erick
>
>
> On Tue, Oct 22, 2013 at 9:25 AM, Prasi S <prasi1...@gmail.com> wrote:
>
> > Hi all,
> > We are using SolrCloud 4.4 (SolrCloud with an external ZooKeeper, 2
> > Tomcats, 2 Solr instances, 1 in each Tomcat) for indexing delimited
> > files. Our index holds 220 million records. We have three different
> > files each with a partial set of data.
> >
> > We index the first file completely. Then the second and third files are
> > partial updates.
> >
> > 1. While testing the indexing performance, we notice that Solr hangs
> > frequently after 2 days. It hangs for about an hour or two, and then if
> > we hit the admin URL, it comes back and resumes indexing. Why does this
> > happen?
> >
> > We have noticed that in the last 12 hours the hanging was very frequent.
> > For almost 6 of those hours it was in a hung state.
> >
> > 2. Also, the commit time increases for the partial upload.
> >
> >
> > Do we need to tweak any parameter, or is this the expected behavior of
> > SolrCloud for a huge volume of data?
> >
> >
> > Thanks,
> > Prasi
> >
>
