>> and a ramBufferSize of 3GB

If you had actually used greater than 2GB of it, you would have seen
problems as an int overflowed -
which is why it's now hard limited:

    if (mb > 2048.0) {
      throw new IllegalArgumentException("ramBufferSize " + mb
          + " is too large; should be comfortably less than 2048");
    }

On Fri, Feb 19, 2010 at 5:03 AM, Glen Newton <glen.new...@gmail.com> wrote:
> I've run Lucene with heap sizes as large as 28GB of RAM (on a 32GB
> machine, 64bit, Linux) and a ramBufferSize of 3GB. While I haven't
> noticed the GC issues Mark mentioned in this configuration, I have
> seen them in the ranges he discusses (on 1.6 <update 18).
>
> You may consider using LuSql[1] to create the indexes, if your source
> content is in a JDBC-accessible db. It is quite a bit faster than
> Solr, as it is a tool specifically created and tuned for Lucene
> indexing. But it is command-line, not RESTful like Solr. The released
> version of LuSql only runs on a single machine (though designed for many
> threads); the new release will allow distributing indexing across any
> number of machines (with each machine building a shard). The new
> release also has pluggable sources, so it is not restricted to JDBC.
>
> -Glen
> [1] http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
>
> On 18 February 2010 21:34, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> > Hi Tom,
> >
> > It wouldn't. I didn't see the mention of parallel indexing in the
> > original email. :)
> >
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Hadoop ecosystem search :: http://search-hadoop.com/
> >
> > ----- Original Message ----
> >> From: Tom Burton-West <tburtonw...@gmail.com>
> >> To: solr-user@lucene.apache.org
> >> Sent: Thu, February 18, 2010 3:30:05 PM
> >> Subject: Re: What is largest reasonable setting for ramBufferSizeMB?
> >>
> >> Thanks Otis,
> >>
> >> I don't know enough about Hadoop to understand the advantage of using Hadoop
> >> in this use case. How would using Hadoop differ from distributing the
> >> indexing over 10 shards on 10 machines with Solr?
> >>
> >> Tom
> >>
> >> Otis Gospodnetic wrote:
> >> >
> >> > Hi Tom,
> >> >
> >> > 32MB is very low, 320MB is medium, and I think you could go higher;
> >> > just pick whichever garbage collector is good for throughput. I know
> >> > Java 1.6 update 18 also has some Hotspot and maybe also GC fixes, so
> >> > I'd use that. Finally, this sounds like a good use case for
> >> > reindexing with Hadoop!
> >> >
> >> > Otis
> >> > ----
> >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >> > Hadoop ecosystem search :: http://search-hadoop.com/
> >>
> >> --
> >> View this message in context:
> >> http://old.nabble.com/What-is-largest-reasonable-setting-for-ramBufferSizeMB--tp27631231p27645167.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.

--
- Mark
http://www.lucidimagination.com
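[Editor's note] The overflow Mark describes is straightforward to demonstrate: 2048 MB expressed in bytes is 2^31, one past Integer.MAX_VALUE, so int-based byte accounting wraps negative above that point. This standalone sketch is an illustration of the arithmetic only (not Lucene's actual internals), with a guard in the spirit of the check quoted above:

```java
// Why a ramBufferSize above 2048 MB overflows a signed 32-bit int:
// 2048 * 1024 * 1024 bytes == 2^31, which exceeds Integer.MAX_VALUE (2^31 - 1).
public class RamBufferOverflowDemo {
    public static void main(String[] args) {
        double mb = 3072.0; // the 3GB buffer from Glen's configuration
        long bytes = (long) (mb * 1024 * 1024);
        int asInt = (int) bytes; // what 32-bit byte accounting would see

        System.out.println("bytes as long: " + bytes);  // 3221225472
        System.out.println("bytes as int:  " + asInt);  // -1073741824 (wrapped)

        // A guard in the spirit of the check quoted above:
        if (mb > 2048.0) {
            System.out.println("rejected: ramBufferSize " + mb
                + " should be comfortably less than 2048");
        }
    }
}
```

The negative value is why buffer sizes above 2GB "worked" only while less than 2GB was actually used: the overflow bites the moment the accounted bytes cross 2^31.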