On 8/21/2013 6:23 PM, dmarini wrote:
Shawn, thanks for your reply. All of these suggestions look like good ideas
and I will follow up. We are running Solr via the Jetty process on Windows,
with all of our zookeepers on the same boxes as the clouds. The reason
for this is that we're on EC2 servers, so it gets ultra expensive to have a 6
box setup just to have zookeepers on separate boxes from the Solr instances.

You can have zookeeper on the same host as Solr; that's no problem. You should drop to just three total zookeepers, one per node, and use the chroot method to keep each cloud's data separate within that single ensemble. You can probably run zookeeper with a max heap of 256MB, but it likely would never need more than 512MB. It doesn't use much memory at all.
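
As a rough illustration of the chroot idea, every cloud points at the same three zookeepers and differs only in the path appended once at the end of the zkHost string. The host names and chroot paths below are made up for the example, and the helper function is hypothetical, not part of Solr:

```python
def zk_host(servers, chroot, port=2181):
    """Build a zkHost string: comma-separated host:port pairs, then one chroot path."""
    if not chroot.startswith("/"):
        raise ValueError("chroot must start with '/'")
    hosts = ",".join(f"{server}:{port}" for server in servers)
    # The chroot is appended once, after the last host, not after each host.
    return hosts + chroot


# Hypothetical host names and chroot paths (one chroot per cloud).
SERVERS = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
CLOUDS = ["/cloud1", "/cloud2", "/cloud3"]

for chroot in CLOUDS:
    print(zk_host(SERVERS, chroot))
```

Each printed string is what you would hand to the matching Solr instance as its zkHost value.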

Each of our Windows boxes has 8GB of RAM, with roughly 35 - 40% of it still
seemingly free. Is there a tool or some way we can identify for certain if
we're running into memory issues? I like your zookeeper idea and I didn't
know that this was feasible. I will get a test bed set up that way soon. As
for indexes, each cloud has multiple collections but we're looking at the
largest entire cloud (multiple indexes) being about 200MB, each collection
is between 50 and 100MB and I don't see them getting much bigger than that
per index (but I do see more indexes being added to the clouds).

With indexes that small, I would run each Jetty/Solr with a max heap of 1GB. With three of them per server, that will mean that Solr is using 3GB of RAM, leaving 5GB for the OS disk cache. You could probably bump that to 1.5 or 2GB and still be OK.
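
The arithmetic behind those numbers is simple enough to sketch. This assumes the 8GB boxes and three Solr instances per server mentioned above; the function name and the optional zookeeper term are only for illustration. Max heap is not the whole footprint of a Java process, so treat the results as an upper bound on what is left over:

```python
def os_cache_gb(total_ram_gb, heap_gb, instances, zk_heap_gb=0.0):
    """RAM left for the OS disk cache after the Java heaps are subtracted."""
    return total_ram_gb - heap_gb * instances - zk_heap_gb


# Three Jetty/Solr instances on an 8GB box, at the heap sizes discussed above.
for heap in (1.0, 1.5, 2.0):
    left = os_cache_gb(8, heap, 3)
    print(f"{heap}GB heap x 3 leaves {left}GB for the OS disk cache")
```

Passing `zk_heap_gb=0.25` accounts for a zookeeper with a 256MB heap on the same box.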

Is there a definitive advantage to running Solr on a Linux box
over Windows? I need to be able to justify the time and effort it will take
to get up to speed on an unfamiliar OS if we're going to go that route, but
if there's a good enough reason I don't see why not.

Linux manages memory better than Windows, and ext4 is a much better filesystem than NTFS. If you are familiar with Windows, there's nothing wrong with continuing to use it, except for the fact that you have to give Microsoft a few hundred bucks per machine for a server OS when you take it into production. You can run Linux for free.

-- Would it be helpful to have the zookeeper ensemble on a different disk
drive than the clouds?
-- Can the chattiness of all of the replication and zookeeper communication
for multiple clouds/collections cause any of these issues (We do have some
collections that are in constant flux with 1 - 5 requests each second, which
we gather up and send to Solr in batches of 250 documents or a 10 second
flush)?

It never hurts to have things separated so they are on different disks, but SolrCloud will put hardly any load on zookeeper, so I don't think it matters much. It is Solr itself that will take that load.
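
For reference, the batching you describe (send at 250 documents or after 10 seconds, whichever comes first) can be sketched roughly as follows. This is not your code, only an illustration; `send` is a hypothetical callable standing in for whatever posts a list of documents to Solr, and the class name is made up:

```python
import time


class Batcher:
    """Buffers documents and flushes at max_docs, or once the oldest is max_age seconds old."""

    def __init__(self, send, max_docs=250, max_age=10.0, clock=time.monotonic):
        self.send = send          # hypothetical: posts a list of documents to Solr
        self.max_docs = max_docs
        self.max_age = max_age
        self.clock = clock        # injectable so the logic can be tested without sleeping
        self.docs = []
        self.first_at = None      # arrival time of the oldest buffered document

    def add(self, doc):
        if not self.docs:
            self.first_at = self.clock()
        self.docs.append(doc)
        self.flush_if_due()

    def flush_if_due(self):
        """Call this from a periodic timer too, so quiet periods still get flushed."""
        if not self.docs:
            return
        too_many = len(self.docs) >= self.max_docs
        too_old = self.clock() - self.first_at >= self.max_age
        if too_many or too_old:
            self.flush()

    def flush(self):
        if self.docs:
            batch, self.docs = self.docs, []
            self.first_at = None
            self.send(batch)
```

At 1 - 5 requests each second, the 10 second limit will normally trigger long before the 250 document limit does.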

Thanks,
Shawn
