I think it comes with some caveats, but it is now workable (although it
may not give great performance), assuming you're using Lucene 2.3
(maybe 2.2?) or later. I would definitely search the Lucene archives
for NFS, paying particular attention to Mike McCandless' comments.
On Jun 30, 2008, at 1:08 PM, Bill Au wrote:
Isn't using Lucene over NFS *not* recommended?
Bill
On Mon, Jun 30, 2008 at 4:27 AM, Nico Heid <[EMAIL PROTECTED]> wrote:
Hey, I'm looking for some feedback on the following setup.
Due to the architect's decision I will be working with NFS, not
Solr's own distribution scripts.
A few Solr indexing machines use Multicore to divide the 300,000
users across 1000 shards.
For several reasons we have to go with per-user sharding (as you can
see, 300 users per shard). Updates come in at about 166 per hour on
each shard, so that is not a problem.
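For a rough sense of scale, the figures above can be worked through as
below. This is only a sketch: it assumes every shard sees the quoted
166 updates per hour, and shard_for_user is a hypothetical mapping,
since the message does not say how users are assigned to shards.

```python
# Figures quoted in the message above.
USERS = 300_000
SHARDS = 1_000
UPDATES_PER_SHARD_PER_HOUR = 166

# Users handled by each shard.
users_per_shard = USERS // SHARDS

# Aggregate update load, assuming every shard sees the same rate.
total_updates_per_hour = SHARDS * UPDATES_PER_SHARD_PER_HOUR
total_updates_per_second = total_updates_per_hour / 3600


def shard_for_user(user_id: int, shards: int = SHARDS) -> int:
    """Hypothetical assignment of a numeric user id to a shard (simple modulo)."""
    return user_id % shards
```

Per shard the load is small, but summed over all shards it comes to
roughly 46 updates per second for the indexing machines as a whole.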
The question lies more in this concept: I set up a few query slaves,
using read-only NFS mounts.
I do not use the index directory for the read-only slaves. I patched
the slaves to use the most recent snapshot directory to avoid all the
nasty NFS issues.
(This is only a quick and dirty hack for testing.) At a not yet defined
interval I take a snapshot on the masters and send an HTTP commit to
the slave, so that a new reader is opened on the fresh snapshot.
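The two pieces described above (picking the newest snapshot directory
and sending a commit over HTTP) could look roughly like this. It is a
sketch, not the actual patch: it assumes snapshot directories are named
snapshot.<timestamp> with a fixed-width timestamp, and that the slave
accepts an XML <commit/> message posted to its update URL. The function
names are made up for illustration.

```python
import os
import urllib.request

# Assumed naming scheme: snapshot.<yyyymmddHHMMSS>
SNAPSHOT_PREFIX = "snapshot."


def latest_snapshot(data_dir):
    """Return the path of the newest snapshot directory, or None if there is none."""
    names = [
        name
        for name in os.listdir(data_dir)
        if name.startswith(SNAPSHOT_PREFIX)
        and os.path.isdir(os.path.join(data_dir, name))
    ]
    if not names:
        return None
    # Fixed-width timestamps sort chronologically as plain strings.
    return os.path.join(data_dir, max(names))


def build_commit_request(update_url):
    """Build (but do not send) an HTTP POST carrying an XML <commit/> message."""
    return urllib.request.Request(
        update_url,
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
```

Passing the result of build_commit_request to urllib.request.urlopen
would actually send the commit; the URL of the slave's update handler
depends on the installation.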
This seems to work without trouble so far, but I've not done extensive
testing.
To take this a step further (only an idea so far): I let the slaves
work on the real index, as long as I do not optimize. Because the
directory structure does not change as long as I do not optimize, I
can send commits to the slaves.
Before I optimize I take a snapshot, send them a special "commit" to
make them fall back to the most recent snapshot dir, optimize the
index, and send them a real commit when done.
Even though this is a little trickier, the query slaves would be more
up to date.
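The ordering of that optimize cycle can be sketched as follows. The
master and slave objects and all of their method names are hypothetical
stand-ins for whatever actually triggers each step; nothing here is a
Solr API.

```python
def optimize_with_fallback(master, slaves):
    """Run the optimize cycle in the order described in the message.

    `master` and `slaves` are hypothetical objects; none of these
    method names come from Solr itself.
    """
    # 1. Freeze a consistent copy of the index before it is rewritten.
    master.take_snapshot()

    # 2. The special "commit": move every slave onto the snapshot so no
    #    reader is open on the live index while it changes.
    for slave in slaves:
        slave.fall_back_to_snapshot()

    # 3. Optimize the live index. If this raises, the slaves simply
    #    stay on the snapshot.
    master.optimize()

    # 4. The real commit: slaves reopen their readers on the live index.
    for slave in slaves:
        slave.commit()
```

The point of the ordering is that every slave has left the live index
before the optimize starts, and none returns until it has finished.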
So if you have any design comments or see major or minor flaws,
feedback would be very welcome.
I do not use live data yet; this is the experimental stage. But I'll
give feedback on how it performs and what issues I run into. There's
also the faint chance of letting this setup (or a "fixed" one) run on
the real user data, which would be roughly 20 TB of usable data for
indexing. This would be really interesting :-)
Have a nice week
Nico
--------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ