On 2/22/2013 9:02 AM, jimtronic wrote:
Yes, these are good points. I'm using Solr to leverage user preference data,
and I need that data available in real time. SQL just can't do the kind of
things I'm able to do in Solr, so I have to wait until the write (a user
action, a user preference, etc.) gets to Solr from the db anyway.

I'm kind of curious about how many single documents I can send through via
the JSON update handler in a day. Millions would be nice, but I wonder what
the upper limit would be.
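
For what it's worth, a single-document JSON update is just a small POST to
the /update/json handler. Here's a rough sketch in Python - the core name,
the field names, and the use of the third-party "requests" library are my
own assumptions for the example, not anything from your setup:

import requests

# One hypothetical preference document; the fields are made up.
doc = {"id": "user123-pref42", "user_id": "user123", "pref": "jazz"}

resp = requests.post(
    "http://localhost:8983/solr/collection1/update/json",
    json=[doc],                     # the JSON update handler accepts an array of docs
    params={"commitWithin": 1000},  # ask Solr to make it searchable within ~1 second
)
resp.raise_for_status()
print(resp.json())                  # {'responseHeader': {'status': 0, ...}}

In my experience the daily ceiling depends far more on how often commits
happen (and on I/O, as below) than on the raw number of these little requests.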

I have a distributed index with about 76 million documents in it; the original source is MySQL. It is made up of seven shards - six of them are large, with over 12 million docs each. One shard is small, usually containing only a few hundred thousand docs. The full-import updates all seven shards in parallel, but other than that, it is not a multi-threaded operation.
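
In case it's useful, the parallel part is just a matter of kicking off DIH on
each shard and letting the imports run in the background. A rough sketch - the
shard URLs and the /dataimport handler path are assumptions about my setup,
adjust for yours:

import requests

SHARD_URLS = [
    "http://shard1:8983/solr/core1",
    "http://shard2:8983/solr/core2",
    # ... one entry per shard (seven here)
]

for base_url in SHARD_URLS:
    # DIH returns immediately and runs the import in the background, so a
    # plain loop is enough to get every shard importing at the same time.
    requests.get(base_url + "/dataimport",
                 params={"command": "full-import", "clean": "true", "commit": "true"})

Progress can be checked afterwards with command=status on the same handler.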

In my dev environment, I'm absolutely positive that my bottleneck is I/O on the Solr server. That server has 7200 RPM SAS drives in basic RAID1 and takes about 8 hours for a full-import. It contains the entire index.

In production, I am not sure where the bottleneck is - my guess is that it's I/O, but it might be in the database. These servers have RAID10 with six 7200 RPM SATA drives, a caching RAID controller, plenty of RAM, and each one contains only half the index. A full-import takes about 3.5 hours on Solr 3.5 and about 4 hours on 4.2-SNAPSHOT. The new version has the updateLog enabled.
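
(For anyone following along: the updateLog is the transaction log that Solr
4.x writes for every update. It gets switched on in solrconfig.xml with a
stanza like the one below - the dir value shown is just the stock example:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>

The extra writes it does are a plausible reason for the slightly longer
import time on 4.2.)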

Thanks,
Shawn
