On Mon, 2008-03-03 at 21:43 +0200, Justin wrote:
> I'm indexing a large number of documents.
>
> As a server I'm using the /solr/example/start.jar
>
> No matter how much memory I allocate it fails around 7200 documents.

How do you allocate the memory?

Something like:
java -Xms512M -Xmx1500M -jar start.jar

You may also want to take a closer look at
http://java.sun.com/j2se/1.5.0/docs/guide/vm/gc-ergonomics.html

HTH

salu2

> I am committing every 100 docs, and optimizing every 300.
>
> All of my XML files contain one doc, and can range in size from 2k to 700k.
>
> When I restart the start.jar it again reports out of memory.
>
>
> A sample document looks like this:
> <?xml version="1.0" encoding="UTF-8"?>
> <add>
>   <doc>
>     <field name="PK">1851</field>
>     <field name="ft:genes.Symbol:1851">TRAJ20</field>
>     <field name="ft:external_ids.SourceAccession:15531">12049</field>
>     <field name="ft:external_ids.SourceAccession:15532">ENSG00000211869</field>
>     <field name="ft:external_ids.SourceAccession:15533">28735</field>
>     <field name="ft:external_ids.SourceAccession:15534">HUgn28735</field>
>     <field name="ft:external_ids.SourceAccession:15535">TRA_</field>
>     <field name="ft:external_ids.SourceAccession:15536">TRAJ20</field>
>     <field name="ft:external_ids.SourceAccession:15537">9953837</field>
>     <field name="ft:external_ids.SourceAccession:15538">ENSG00000211869</field>
>     <field name="ft:aliases_and_descriptions.Value:9775">T cell receptor alpha joining 20</field>
>     <field name="ft:cytogenetic_locations.Cytoband:4909">14q11.2</field>
>     <field name="ft:cytogenetic_locations.Cytoband:4910">14q11</field>
>     <field name="ft:cytogenetic_locations.Cytoband:4911">14q11.2</field>
>     <field name="ft:location_extras.ContigRefseq:11806">AE000662.1</field>
>     <field name="ft:location_extras.ContigRefseq:11807">M94081.1</field>
>     <field name="ft:location_extras.ContigRefseq:11808">CH471078.2</field>
>     <field name="ft:location_extras.ContigRefseq:11809">NC_000014.7</field>
>     <field name="ft:location_extras.ContigRefseq:11810">NT_026437.11</field>
>     <field name="ft:location_extras.ContigRefseq:11811">NG_001332.2</field>
>     <field name="ft:articles.SourceAccession:192767">8188290</field>
>     <field name="ft:articles.Title:192767">The human T-cell receptor TCRAC/TCRDC (C alpha/C delta) region: organization,sequence, and evolution of 97.6 kb of DNA.</field>
>     <field name="ft:authors.AuthorName:5909">Koop B.F.</field>
>     <field name="ft:authors.AuthorName:6912">Rowen L.</field>
>     <field name="ft:authors.AuthorName:6985">Hood L.</field>
>     <field name="ft:authors.AuthorName:17109">Wang K.</field>
>     <field name="ft:authors.AuthorName:72700">Kuo C.L.</field>
>     <field name="ft:authors.AuthorName:84285">Seto D.</field>
>     <field name="ft:authors.AuthorName:166156">Lenstra J.A.</field>
>     <field name="ft:authors.AuthorName:216734">Howard S.</field>
>     <field name="ft:authors.AuthorName:285493">Shan W.</field>
>     <field name="ft:authors.AuthorName:346559">Deshpande P.</field>
>     <field name="ft:probesets.Name:6773">31311_at</field>
>     <field name="ft:probesets.BinaryPattern:6773">000000000000</field>
>   </doc>
> </add>
>
>
> The schema is (in summary):
> <fields>
>   <field name="PK" type="sint" indexed="true" stored="true" required="true" multiValued="false" omitNorms="true"/>
>   <field name="text" type="text" indexed="true" stored="false" multiValued="true" omitNorms="true"/>
>
>   <dynamicField name="ft:*" type="string" indexed="true" stored="true" omitNorms="true"/>
>   <dynamicField name="st:*" type="string" indexed="true" stored="true" omitNorms="true"/>
> </fields>
>
>
> <uniqueKey>PK</uniqueKey>
> <defaultSearchField>text</defaultSearchField>
> <solrQueryParser defaultOperator="OR"/>
>
> <copyField source="ft:*" dest="text"/>
> <copyField source="st:*" dest="text"/>
>
>
> and my conf is:
> <useCompoundFile>false</useCompoundFile>
> <mergeFactor>100</mergeFactor>
> <maxBufferedDocs>900</maxBufferedDocs>
> <maxMergeDocs>2147483647</maxMergeDocs>
> <maxFieldLength>10000</maxFieldLength>

-- 
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
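
PS: One way to confirm that the -Xms/-Xmx flags actually reach the JVM is to print what the runtime reports. The following is only a minimal sketch, not part of Solr: the class name HeapCheck and the helper toMegabytes are made up for illustration, and it reports the heap of the JVM it runs in, not of an already running start.jar.

```java
public class HeapCheck {
    // Hypothetical helper (not a Solr or JDK API): bytes to whole mebibytes.
    public static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap the JVM will try to use (driven by -Xmx).
        System.out.println("max heap (MB):   " + toMegabytes(rt.maxMemory()));
        // totalMemory(): heap currently reserved by the JVM (starts near -Xms).
        System.out.println("total heap (MB): " + toMegabytes(rt.totalMemory()));
        // freeMemory(): unused part of the currently reserved heap.
        System.out.println("free heap (MB):  " + toMegabytes(rt.freeMemory()));
    }
}
```

Compile it with javac HeapCheck.java and run it with the same flags, e.g. java -Xms512M -Xmx1500M HeapCheck. The reported maximum may come out somewhat below the -Xmx value depending on the JVM and garbage collector, but if it stays at a small default no matter what you pass, the options are probably not being applied where you expect.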