Kallin,
Index corruption is a very rarely reported problem, and practically
impossible I'm told (hardware malfunctions aside), thanks to Lucene's
improvements over the last several releases.
A single index is the best way to go, in my opinion - though at your
scale you're probably looking at sharding it and using distributed
search. So you'll have multiple physical indexes, one for each shard,
and a single virtual index in the eyes of your searching clients.
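As a sketch of what that looks like to a client (hostnames here are
hypothetical), distributed search is driven by the `shards` request
parameter, and any node can act as the aggregator:

```
http://shard1:8983/solr/select?q=solr&shards=shard1:8983/solr,shard2:8983/solr
```

The responding node fans the query out to each shard listed and merges
the results, so clients still see one logical index.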
Backups, of course, are sensible, and Solr's replication capabilities
can help here, since a backup can be requested periodically. You'll be
using replication anyway to scale to your query volume.
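As a sketch (the trigger events are assumptions to adjust for your
setup), the replication handler is configured in solrconfig.xml on the
master, and can also snapshot the index for backup:

```
<!-- solrconfig.xml on the master: a sketch, not a tuned config -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="backupAfter">optimize</str>
  </lst>
</requestHandler>
```

A backup can also be requested on demand over HTTP with
`http://master:8983/solr/replication?command=backup`.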
As for hardware scaling considerations, there are variables to
consider, like how faceting, sorting, and query speed behave on a
single large index versus across shards. I'm guessing you'll be best
off with at least two shards, though possibly more depending on these
variables.
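For a rough sense of the query load behind those numbers, a
back-of-envelope calculation (the peak factor is an assumption; measure
your real traffic curve):

```python
# Back-of-envelope query load from the numbers in Kallin's mail.
hits_per_day = 10_000_000
avg_qps = hits_per_day / 86_400   # seconds in a day
peak_factor = 4                   # assumed diurnal peak multiplier
peak_qps = avg_qps * peak_factor

print(f"average ~{avg_qps:.0f} qps, peak perhaps ~{peak_qps:.0f} qps")
```

That's roughly 116 queries/sec on average, so peak capacity (and
therefore replica count per shard) is what actually drives the server
count.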
Erik
@ Lucid Imagination
p.s. have your higher-ups give us a call if they'd like to discuss
their concerns and consider commercial support for your mission
critical big scale use of Solr :)
On Apr 8, 2010, at 1:33 PM, Nagelberg, Kallin wrote:
I've been doing work evaluating Solr for use on a high-traffic
website for some time and things are looking positive. I have some
concerns from my higher-ups that I need to address. I have suggested
that we use a single index in order to keep things simple, but there
are suggestions to split our documents amongst different indexes.
The primary motivation for this split is a worry about potential
index corruption. I.e., if we only have one index and it becomes
corrupt what do we do? I never considered this to be an issue since
we would have backups etc., but I think they have had issues with
other search technology in the past, where one big index resulted in
frequent corruption that was difficult to recover from. Do you think this
is a concern with Solr? If so, what would you suggest to mitigate
the risk?
My second question involves general deployment strategy. We will
expect about 50 million documents, each on average a few paragraphs,
and our website receives maybe 10 million hits a day. Can anyone
provide an idea of # of servers, clustering/replication setup etc.
that might be appropriate for this scenario? I'm interested to hear
what others' experiences are with similar situations.
Thanks,
-Kallin Nagelberg