Otis,

Thanks for the responses; you've mostly nailed it down. Please see my
thoughts below.

> > One more question - is it worth it to try to keep the whole index in
> > memory and shard when it doesn't fit anymore? For me it seems like a bit
> > of overhead, but I may be very wrong here.
> > What's a recommended ratio of the parts to keep in RAM and on the HDDs?
> 
> It's well worth trying to keep the index buffered (i.e. in memory). Yes,
> once you can't fit the hot parts of the index in RAM it's time to think
> about sharding (or buying more RAM).  However, it's not as simple as
> looking at the index size and RAM size, as not all parts of the index need
> to be cached.

Fair enough. So, we'll get as much RAM as possible and will keep an eye
on the performance.
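
To keep an eye on that, I am thinking of something like the rough sketch
below. It totals the index files per extension, so we can compare the parts
we believe are hot against the RAM left over for the OS cache. Which
extensions count as hot is purely my assumption and will depend on our
queries; the function names and the path in the usage line are made up for
illustration.

```python
import os
from collections import defaultdict


def index_size_by_extension(index_dir):
    """Return {extension: total_bytes} for the files directly in index_dir."""
    sizes = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            # Files without an extension are grouped under "(none)".
            ext = os.path.splitext(name)[1] or "(none)"
            sizes[ext] += os.path.getsize(path)
    return dict(sizes)


def fits_in_ram(sizes, hot_extensions, ram_bytes):
    """True if the files we guess to be hot fit into ram_bytes."""
    hot_bytes = sum(size for ext, size in sizes.items() if ext in hot_extensions)
    return hot_bytes <= ram_bytes
```

For example, fits_in_ram(index_size_by_extension("/path/to/index"),
{".tis", ".frq"}, 8 * 2**30) would check a guessed hot set against 8 GB.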

> 
> > > > 2) How should we organize our clusters to ensure redundancy?
> > > >
> > > > Should we have 2 or more identical Masters (means that all the
> > > > updates/optimisations/etc. are done for every one of them)?
> > > >
> > > > An alternative, afaik, is to reconfigure one slave to become the new
> > > > Master, how hard is that?
> > >
> > > I don't have a good answer here, maybe someone else can chime in.  I
> > > know master failover is a concern, but I'm not sure how others handle
> > > it right now.  Would be good to have people share their approach.
> > > That being said, it seems reasonable to me to have identical masters.
> >
> > I found this thread related to this issue:
> >
> > http://www.nabble.com/High-Availability-deployment-to13094489.html#a13098729
> >
> > I guess, it depends on how easy we can fill the gap between the last
> > commit and the time of the Master going down. Most likely, we'll have to
> > have 2 Masters.
> 
> Or you could simply have 2 masters and index the same data on both of
> them.  Then, in case #1 fails, you simply get your slaves to start copying
> from the #2.  You could have slaves talk to the master via a LB VIP, so a
> change from #1 to #2 can be done in LB quickly and slaves don't have to be
> changed.  Or you could have masters keep the index on some sort of shared
> storage (e.g. SAN).

I was thinking about the first solution you mentioned - 2 masters,
identical indexes.
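
Just to check my understanding of the failover, here is a minimal sketch of
the decision the LB (or a script that repoints the slaves) would make: take
the first master that still answers. The host names are hypothetical, and
is_healthy is a placeholder for whatever check we end up using, not an
existing Solr call.

```python
def pick_master(masters, is_healthy):
    """Return the first master that passes the health check, or None."""
    for master in masters:
        if is_healthy(master):
            return master
    return None


# Hypothetical hosts, listed in order of preference.
MASTERS = ["master1.example.com", "master2.example.com"]

# Stand-in health check that pretends master #1 is down.
active = pick_master(MASTERS, lambda host: host != "master1.example.com")
```

Since both masters index the same data, the slaves should not care which one
they end up pulling snapshots from.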

> > As we are going to have just one index, so the only way to use it that I
> > see is to configure a Master on Core1 and a Slave on core 2, or 2 slaves
> > on 2 cores.
> >
> > Do I miss something here?
> 
> It sounds like you are talking about a single server hosting the master
> and slave(s) on the same server.
> That's not what you want to do, though.  Master and slave(s) live each on
> their own server.  But I think you are aware of this.
> You don't need to think about Solr Multicore functionality if you have but
> a single index.

OK, that clarifies things, thank you.

> > > > 4) Does it make much difference to get a more powerful Master?
> > > >
> > > > Or, on the contrary, as slaves will be queried more often, they
> > > > should be the better ones? Maybe just the HDDs for the slaves should
> > > > be as fast as possible?
> > >
> > > Depends on where your bottlenecks are.  Are you getting a lot of
> > > queries or a lot of updates?
> >
> > Both, but more queries than updates. Means we shouldn't neglect slaves,
> > I guess?
> 
> Definitely don't neglect slaves.  Neglecting slaves means neglecting your
> customers.
> Your customers don't care so much if your indexing is not super fast, esp.
> if you index in batches, but they do notice when they have to wait 2
> seconds for results instead of 300 ms.

Right, so the slaves should be the more powerful ones perhaps.

> > Our initial idea is to send batch updates several times per day rather
> > than individual real-time updates, commit and run optimization after
> > that, as advised here:
> >
> > http://wiki.apache.org/solr/CollectionDistribution#head-cf174eea2524ae45171a8486a13eea8b6f511f8b
> > <
> > batch-like updates to the collection and/or once a day.>>
> >
> > Once the index is optimized, the slaves will get it when they pull next
> > time. So there will be only few (or none) incremental updates. However,
> > the new snapshots will appear not very often, so it shouldn't be a
> > problem for several slaves to get them, correct?
> 
> That's right.  You'll have to test things out, see what your index size
> is, see how long optimization takes, and then decide how often to do it.
> You could also set mergeFactor to a very low number and have nearly
> optimized indices at all times at the expense of indexing speed.

The idea with mergeFactor sounds interesting.
Anyhow, we'll have to do a lot of tweaking, I'm sure about that. :-)
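
To get a feel for the mergeFactor trade-off, I tried a toy model. It is
only a sketch under my own assumptions: every buffer flush writes one
segment, and as soon as mergeFactor segments sit on the same level they are
merged into one segment on the next level. The real merge logic in Lucene
also looks at segment sizes, so the numbers are indicative at best, and
segments_after is a name I made up.

```python
def segments_after(flushes, merge_factor):
    """Segment count after `flushes` flushes in a simplified log-merge model."""
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    levels = []  # levels[i] = number of segments currently on level i
    for _ in range(flushes):
        level = 0
        while True:
            if level == len(levels):
                levels.append(0)
            levels[level] += 1
            if levels[level] < merge_factor:
                break
            # merge_factor segments on this level: merge them into a
            # single segment and carry it up to the next level.
            levels[level] = 0
            level += 1
    return sum(levels)
```

In this model, 999 flushes leave 27 segments with a mergeFactor of 10 but
only 8 with a mergeFactor of 2, at the price of many more merges along the
way.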

Thanks,
Andrey.
