Hi,

You say a doc can be up to 700KB and your maxBufferedDocs is set to 900. Multiply these two numbers (over 600MB in the worst case) and I think you'll see that the result is greater than your JVM's default heap. Also, save the optimize call for the end, instead of optimizing every 300 docs, and your overall indexing time will be shorter.
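To put rough numbers on that multiplication, here is a minimal sketch. It assumes the raw XML size is a fair proxy for what each buffered doc costs in memory, which is only an approximation since the indexer buffers analyzed data rather than raw XML. The class and method names are made up for illustration; the 700k and 900 figures come from the original post and conf.

```java
// Hypothetical helper, not part of Solr or Lucene.
class BufferEstimate {

    // Worst case: every buffered doc is as large as the largest doc.
    static long worstCaseBufferBytes(long maxDocBytes, int maxBufferedDocs) {
        return maxDocBytes * maxBufferedDocs;
    }

    public static void main(String[] args) {
        long maxDocBytes = 700L * 1024; // largest doc reported in the post (700k)
        int maxBufferedDocs = 900;      // value from the posted conf

        long worstCase = worstCaseBufferBytes(maxDocBytes, maxBufferedDocs);
        long heapMax = Runtime.getRuntime().maxMemory(); // max heap of this JVM

        System.out.printf("worst-case buffered docs: %d MB%n", worstCase / (1024 * 1024));
        System.out.printf("JVM max heap:             %d MB%n", heapMax / (1024 * 1024));
        System.out.println(worstCase > heapMax
                ? "buffer can exceed the heap"
                : "buffer fits in the heap");
    }
}
```

With these inputs the worst case comes to about 615 MB. If that is more than the heap you actually get, either lower maxBufferedDocs or start the server with a bigger heap, e.g. java -Xmx1024m -jar start.jar (the 1024m is just an illustrative value).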
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Justin <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, March 3, 2008 2:43:03 PM
> Subject: out of memory every time
>
> I'm indexing a large number of documents.
>
> As a server I'm using the /solr/example/start.jar
>
> No matter how much memory I allocate it fails around 7200 documents.
>
> I am committing every 100 docs, and optimizing every 300.
>
> all of my xml's contain on doc, and can range in size from 2k to 700k.
>
> when I restart the start.jar it again reports out of memory.
>
> a sample document looks like this:
>
> 1851
> TRAJ20
> 12049
>
> name="ft:external_ids.SourceAccession:15532">ENSG00000211869
> 28735
> HUgn28735
> TRA_
> TRAJ20
> 9953837
>
> name="ft:external_ids.SourceAccession:15538">ENSG00000211869
> T cell receptor alpha
> joining 20
> 14q11.2
> 14q11
> 14q11.2
> AE000662.1
> M94081.1
> CH471078.2
> NC_000014.7
> NT_026437.11
> NG_001332.2
> 8188290
> The human T-cell receptor
> TCRAC/TCRDC (C alpha/C delta) region: organization,sequence, and evolution
> of 97.6 kb of DNA.
> Koop B.F.
> Rowen L.
> Hood L.
> Wang K.
> Kuo C.L.
> Seto D.
> Lenstra J.A.
> Howard S.
> Shan W.
> Deshpande P.
> 31311_at
> 000000000000
>
> the schema is (in summary):
>
> multiValued="false" omitNorms="true"/>
>
> multiValued="true" omitNorms="true"/>
>
> stored="true" omitNorms="true"/>
>
> omitNorms="true"/>
>
> PK
> text
>
> and my conf is:
> false
> 100
> 900
> 2147483647
> 10000
>