hi all.

complete noob as to solrcloud here. almost-non-noob on solr in general.

we're experiencing growing pains with our data and i'm thinking through a
move to solrcloud as a result. i'm hoping to find out whether that seems like
a good strategy or whether we need to get other areas handled first before
introducing new complexities.

here's a rundown of things:
- we are on a 30g ram aws instance
- we have ~30g tucked away in the ../solr/server/ dir
- our largest core is 6.8g w/ ~25 segments at any given time. this is also
the core that our business directly runs off of, users interact with, etc.
- 5g is for a logs-type dataset that analytics can be built off of to
help inform the primary core above
- 3g are taken up by 3 different third party sources that we use solr to
warehouse and have available for query for the sake of linking items in our
primary core to these cores for data enrichment
- several others take up < 1g each
- and then we have dev- and demo- flavors for some of these

we had been operating on a 16gb machine till a few weeks ago (actually
bumped it while at lucene revolution bc i hadn't noticed how far we'd
outgrown the available cache till the week before!). the machine copes
much better now when doing an import or running our heavier operations and
no longer buckles under them like it had been doing.

we have no master/slave replication. all of our data is 'replicated' by the
fact that it exists in mysql. if solr were to go down it'd be a nice big
fire but one we could recover from within a couple hours by simply
reimporting.

i'd like to have a more sophisticated setup in place for fault tolerance
than that, of course. i'd also like to see our heavy, many-query
operations be speedier and better able to handle several multi-threaded
runs at once w/ ease.

is this a matter of getting still more ram on the machine? cpus for faster
processing? splitting up the read/write operations between master/slave?
going full steam into a solrcloud configuration?

one more note. per discussion at the conference i'm combing through our
configs to make sure we trim any fat we can. i also want to get
optimization scheduled more regularly to help out w/ segmentation and
garbage heap. not sure how far those two alone will get us, though.
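
for the optimization piece, here's a rough python sketch of the kind of thing
i'd put on a cron job. it just hits a core's update handler with
optimize=true. the host, port, core name ('primary'), and the maxSegments
value are all placeholders made up for illustration, not our real setup, and
the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(base_url, core, max_segments=1):
    """Build the update-handler url that asks solr to optimize one core."""
    params = urlencode({
        "optimize": "true",           # request a forced merge of the core's segments
        "maxSegments": max_segments,  # merge down to at most this many segments
        "wt": "json",
    })
    return f"{base_url.rstrip('/')}/{core}/update?{params}"


def optimize_core(base_url, core, max_segments=1, timeout=3600):
    """Send the optimize request and return the raw response body."""
    with urlopen(optimize_url(base_url, core, max_segments), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# example call (placeholder host and core name), e.g. from a nightly cron job:
#   optimize_core("http://localhost:8983/solr", "primary", max_segments=5)
```

the idea would be to run it in a quiet window, since an optimize rewrites
segments and is heavy on disk i/o while it runs.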

thanks for any thoughts!

--
John Blythe
