About 2.8 million docs were created in total. Only the first run finished. On my second try, it hangs forever at the end of indexing (I guess right before the commit), with CPU usage at 100%. In total, 5 GB of index files (2050 files) were created. Now I have two problems:

1. Why does it hang there and fail?
2. How can I speed up the indexing?
Here is my solrconfig.xml:

<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>3000</ramBufferSizeMB>
<mergeFactor>1000</mergeFactor>
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>10000</maxFieldLength>
<unlockOnStartup>false</unlockOnStartup>

--- On Thu, 5/21/09, Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com> wrote:

> From: Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>
> Subject: Re: How to index large set data
> To: solr-user@lucene.apache.org
> Date: Thursday, May 21, 2009, 10:39 PM
>
> What is the total number of docs created? I guess it may not be
> memory bound. Indexing is mostly an IO-bound operation. You may
> be able to get better performance if an SSD (solid state disk)
> is used.
>
> On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai <djian...@yahoo.com> wrote:
> >
> > Hi Paul,
> >
> > Thank you so much for answering my questions. It really helped.
> > After some adjustment, basically setting mergeFactor to 1000 from
> > the default value of 10, I can finish the whole job in 2.5 hours.
> > I checked that during running time, only around 18% of memory is
> > being used, and VIRT is always 1418m. I am thinking it may be
> > restricted by the JVM memory setting. But I run the data import
> > command through the web, i.e.,
> > http://<host>:<port>/solr/dataimport?command=full-import,
> > how can I set the memory allocation for the JVM?
> > Thanks again!
> >
> > JB
> >
> > --- On Thu, 5/21/09, Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com> wrote:
> >
> >> From: Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>
> >> Subject: Re: How to index large set data
> >> To: solr-user@lucene.apache.org
> >> Date: Thursday, May 21, 2009, 9:57 PM
> >>
> >> Check the status page of DIH and see if it is working properly,
> >> and if yes, what is the rate of indexing?
> >>
> >> On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai <djian...@yahoo.com> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I have about 45 GB of XML files to be indexed. I am using
> >> > DataImportHandler. I started the full import 4 hours ago,
> >> > and it's still running....
> >> > My computer has 4 GB of memory. Any suggestion on the solutions?
> >> > Thanks!
> >> >
> >> > JB
> >>
> >> --
> >> -----------------------------------------------------
> >> Noble Paul | Principal Engineer| AOL | http://aol.com
>
> --
> -----------------------------------------------------
> Noble Paul | Principal Engineer| AOL | http://aol.com
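
For the "rate of indexing" question in the thread, one option is to fetch the DIH status page (http://<host>:<port>/solr/dataimport?command=status) during the import and compute docs per second from it. Below is a minimal Python sketch, not a definitive tool. It assumes the status response is XML with a "statusMessages" list containing "Time Elapsed" (as H:M:S.mmm) and "Total Documents Processed" entries; those field names, the sample response, and the helper names parse_elapsed and indexing_rate are assumptions for illustration and should be checked against the actual output of your Solr version.

```python
import xml.etree.ElementTree as ET


def parse_elapsed(text):
    """Convert an 'H:M:S.mmm' string into seconds (assumed format)."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def indexing_rate(status_xml):
    """Return (docs_processed, elapsed_seconds, docs_per_second).

    Reads the <lst name="statusMessages"> block of a DIH status
    response. The field names used here are assumptions.
    """
    root = ET.fromstring(status_xml)
    messages = {}
    for lst in root.iter("lst"):
        if lst.get("name") == "statusMessages":
            for item in lst.findall("str"):
                messages[item.get("name")] = (item.text or "").strip()
    docs = int(messages["Total Documents Processed"])
    elapsed = parse_elapsed(messages["Time Elapsed"])
    rate = docs / elapsed if elapsed > 0 else 0.0
    return docs, elapsed, rate


# Hypothetical sample response, made up for illustration only.
SAMPLE = """<response>
  <str name="status">busy</str>
  <lst name="statusMessages">
    <str name="Time Elapsed">0:1:15.460</str>
    <str name="Total Documents Processed">19773</str>
  </lst>
</response>"""

if __name__ == "__main__":
    docs, elapsed, rate = indexing_rate(SAMPLE)
    print(f"{docs} docs in {elapsed:.1f} s = {rate:.1f} docs/s")
```

Comparing two readings taken a few minutes apart shows whether the import is still making progress or has stalled. As a rough reference, if the 2.8 million docs correspond to the 2.5-hour run mentioned in the thread, that is about 311 docs per second.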