Hey, I'm looking for some feedback on the following setup.
Due to the architects decision I will be working with NFS not Solr's own 
distribution scripts.

A few Solr indexing machines use Multicore to divide the 300.000 Users to 1000 
shards.
For several reasons we have to go with per user sharding (as you can see 300 
per shard) Updates come in with about 166 updates per hour on each shard. So 
not a problem.

The question lies more in this concept: I set up a few Query Slaves, using NFS 
readonly mounts.
I do not use the index directory for the readonly slaves. I patched the slaves 
to use the most recent snapshot directory to avoid all the nasty nfs issues. 
(only a quick and dirty hack for testing) On a not yet defined interval I do a 
snapshot on the masters and send a http commit to the slave, so a new reader 
on the fresh snapshot is opened.
This seems to work without trouble so far, but I've not done extensive 
testing.

To take this a step further (only an idea yet). I let the slaves work on the 
real index, as long as I do not optimize. Because the directory structure is 
not changing as long as I do not optimize, I can send commits to the slaves. 
Before I optimize I take a snapshot, send them a special "commit" to make them 
fall back to the most recent snapshot dir, optimize the index and send them a 
real commit when done.
Even though a little trickier I would be more up to date with the query 
slaves.

So if you have any design comments or see major or minor flaws, feedback would 
be very welcome.

I do not use live data yet, this is the experimental stage. But I'll give 
feedback on how it performs and what issues I run into. There's also the faint 
chance of letting this setup (or a "fixed" one) run on the real user data, 
which would be roughly 20TB of usable data for indexing. This would be really 
interesting :-)

Have a nice week
Nico


Reply via email to