Hi,

You say a doc can be up to 700 KB and your maxBufferedDocs is set to 900.
Multiply these two numbers (roughly 630 MB in the worst case) and I think
you'll see that the result is greater than your JVM's default heap. Also,
save the optimize call for the end, rather than optimizing every 300 docs,
and your overall indexing time will be shorter.
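
To put rough numbers on the multiplication above, here is a small
back-of-envelope sketch in Python. The function names are made up for
illustration, and raw document size is only a rough proxy for what the
indexer actually holds on the heap, so treat the result as an estimate and
not a measurement. The 7200-document figure is simply the point at which
the run was reported to fail.

```python
# Back-of-envelope estimate only: raw document size is a rough proxy for
# heap use, not a measurement of what the indexer really allocates.
KB = 1024
MB = 1024 * KB


def worst_case_buffer_bytes(max_doc_bytes: int, max_buffered_docs: int) -> int:
    """Upper bound on raw document bytes buffered before a flush (hypothetical helper)."""
    return max_doc_bytes * max_buffered_docs


def optimize_calls(total_docs: int, optimize_every: int) -> int:
    """Number of optimize calls made when optimizing every `optimize_every` docs."""
    return total_docs // optimize_every


if __name__ == "__main__":
    worst = worst_case_buffer_bytes(700 * KB, 900)
    print(f"worst-case buffered data: {worst / MB:.0f} MB")  # prints 615 MB
    print(f"optimize calls in 7200 docs: {optimize_calls(7200, 300)}")  # prints 24
```

If the first number is larger than the heap you give the JVM, lowering
maxBufferedDocs is the obvious thing to try. The second number is how many
optimize calls a single one at the end would replace.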

Otis 

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Justin <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, March 3, 2008 2:43:03 PM
> Subject: out of memory every time
> 
> I'm indexing a large number of documents.
> 
> As a server I'm using the /solr/example/start.jar
> 
> No matter how much memory I allocate it fails around 7200 documents.
> 
> I am committing every 100 docs, and optimizing every 300.
> 
> All of my XML files contain one doc, and can range in size from 2k to 700k.
> 
> when I restart the start.jar it again reports out of memory.
> 
> 
> a sample document looks like this:
> 
> 
>  
>   1851
>   TRAJ20
>   12049
>   
> name="ft:external_ids.SourceAccession:15532">ENSG00000211869
>   28735
>   HUgn28735
>   TRA_
>   TRAJ20
>   9953837
>   
> name="ft:external_ids.SourceAccession:15538">ENSG00000211869
>   T cell receptor alpha
> joining 20
>   14q11.2
>   14q11
>   14q11.2
>   AE000662.1
>   M94081.1
>   CH471078.2
>   NC_000014.7
>   NT_026437.11
>   NG_001332.2
>   8188290
>   The human T-cell receptor
> TCRAC/TCRDC (C alpha/C delta) region: organization,sequence, and evolution
> of 97.6 kb of DNA.
>   Koop B.F.
>   Rowen L.
>   Hood L.
>   Wang K.
>   Kuo C.L.
>   Seto D.
>   Lenstra J.A.
>   Howard S.
>   Shan W.
>   Deshpande P.
>   31311_at
>   000000000000
> 
> 
> 
> 
> the schema is (in summary):
> 
>    
> multiValued="false" omitNorms="true"/>
>    
> multiValued="true"  omitNorms="true"/>
> 
>    
> stored="true"  omitNorms="true"/>
>    
> omitNorms="true"/>
> 
> 
> 
> PK
> text
> 
> 
> 
> 
> 
> 
> and my conf is:
>    false
>     100
>     900
>     2147483647
>     10000
> 

