Your indexing project is disk-bound. My modern midrange laptop gets 30 MB/s doing "cat > /dev/null" (one 7200 rpm disk). The Amazon instances I'm playing with get 50-60 MB/s (I really want to know how it fits together). Your laptop might get 10-20 MB/s?
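If you want a comparable number for your own disk, something like the following rough sketch does about the same thing as timing a "cat" of a big file to /dev/null. The function name and the 1 MiB block size are my own choices, not anything from Solr. Note that a file you have read recently will come out of the OS page cache and look far faster than the disk really is, so use a file larger than RAM or one that has not been touched since boot.

```python
import time


def read_throughput(path, block_size=1024 * 1024):
    """Read `path` sequentially, discarding the data.

    Returns (bytes_read, mb_per_second). The name and the block size
    are illustrative assumptions, not part of any Solr tooling.
    """
    bytes_read = 0
    start = time.perf_counter()
    # buffering=0 gives raw, unbuffered reads of up to block_size bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:  # empty bytes object means end of file
                break
            bytes_read += len(chunk)
    # Guard against a zero interval on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return bytes_read, bytes_read / (1024 * 1024) / elapsed
```

Call it as read_throughput("/path/to/some/large/file") and look at the second value; that is the figure to compare against the 30 and 50-60 MB/s numbers above.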
On Thu, Sep 24, 2009 at 11:54 PM, Constantijn Visinescu <baeli...@gmail.com> wrote:
> This may or may not help but here goes :)
>
> When i was running performance tests i took a look at the simple post tool
> that comes with the solr examples.
>
> First i changed my schema.xml to fit my needs and then i deleted the old
> index so solr created a blank one when i started up.
> Then i had a process chew on my data and spit out xml files that are
> formatted similarly to the xml files that the SimplePostTool example uses.
> Next i used the simple Post tool to post the xml files to solr (60k-80k
> records per xml file). Each file only took a couple minutes to index this
> way.
> Commit and optimize after that (took less than 10 minutes) and after about
> 2.5 hrs i had indexed just under 8 million records.
>
> This was on a 4 year old single core laptop using resin 3 as my servlet
> container.
>
> Hope this helps.
>
>
> On Fri, Sep 25, 2009 at 3:51 AM, Lance Norskog <goks...@gmail.com> wrote:
>
>> In "top", press the '1' key. This will give a list of the CPUs and how
>> much load is on each. The display is otherwise a little weird for
>> multi-cpu machines. But don't be surprised when Solr is I/O bound. The
>> biggest fanciest RAID is often a better investment than CPUs. On one
>> project we bought low-end rack servers that come with 6-8 disk bays,
>> filling them with 10k/15k RPM disks.
>>
>> On Wed, Sep 23, 2009 at 2:47 PM, Dan A. Dickey <dan.dic...@savvis.net>
>> wrote:
>> > On Friday 11 September 2009 11:06:20 am Dan A. Dickey wrote:
>> > ...
>> >> Our JBoss expert and I will be looking into why this might be occurring.
>> >> Does anyone know of any JBoss related slowness with Solr?
>> >> And does anyone have any other sort of suggestions to speed indexing
>> >> performance? Thanks for your help all! I'll keep you up to date with
>> >> further progress.
>> >
>> > Ok, further progress... just to keep any interested parties up to date
>> > and for the record...
>> >
>> > I'm finding that using the "example" jetty setup (will be switching very
>> > very soon to a "real" jetty installation) is about the fastest. Using
>> > several processes to send posts to Solr helps a lot, and we're seeing
>> > about 80 posts a second this way.
>> >
>> > We also stripped down JBoss to the bare bones and the Solr in it
>> > is running nearly as fast - about 50 posts a second. It was our previous
>> > JBoss configuration that was making it appear "slow" for some reason.
>> >
>> > We will be running more tests and spreading out the "pre-index" workload
>> > across more machines and more processes. In our case we were seeing
>> > the bottleneck being one machine running 18 processes.
>> > The 2 quad core xeon system is experiencing about a 25% cpu load.
>> > And I'm not certain, but I think this may be actually 25% of one of the 8
>> > cores.
>> > So, there's *lots* of room for Solr to be doing more work there.
>> > -Dan
>> >
>> > --
>> > Dan A. Dickey | Senior Software Engineer
>> >
>> > Savvis
>> > 10900 Hampshire Ave. S., Bloomington, MN 55438
>> > Office: 952.852.4803 | Fax: 952.852.4951
>> > E-mail: dan.dic...@savvis.net
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>

--
Lance Norskog
goks...@gmail.com