Thank you Erick. I've set the RamBufferSize to 1G; perhaps higher would
be beneficial. One more data point: if I restart a node, more often than
not it goes into recovery, beats up the network for a while, and then
goes green. This happens even if I do no indexing between restarts, and
sometimes it takes longer than 20 minutes. Is that expected?
-Joe
On 11/21/2017 3:43 PM, Erick Erickson wrote:
bq: We are doing lots of soft commits for NRT search...
It's not surprising that this is slower than local storage, especially
if you have any autowarming going on. Opening new searchers will need
to read data from disk for the new segments, and HDFS may be slower
here.
As far as the commit interval goes, an under-appreciated detail is that
when RAMBufferSizeMB is exceeded (default 100M last I knew), new
segments are written _anyway_; they're just a little invisible. That is,
the segments are closed (IIUC at least), but the segments_n file isn't
updated to point at them. So I don't think that very long interval is
helping with that problem....
Evidence to the contrary trumps my understanding of course.
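For reference, that buffer is set in solrconfig.xml's <indexConfig>
section, something like:

    <indexConfig>
      <!-- flush the in-memory buffer to a new segment once it exceeds
           this size, commit or no commit (default is 100) -->
      <ramBufferSizeMB>1024</ramBufferSizeMB>
    </indexConfig>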
About starting all these collections up at once and the Overseer queue:
I've seen this in similar situations. There are a _lot_ of messages
flying back and forth for each replica on startup, and the Overseer
processing was historically very inefficient, so that queue could get
into the 100s of K; I've seen some pathological situations where it's
over 1M, and at that point bringing up Solr took a very long time.
SOLR-10524 (Solr 6.6) made this a lot better. There are still a lot of
messages written in a case like yours, but at least the Overseer has a
much better chance to keep up....
Erick
On Tue, Nov 21, 2017 at 12:24 PM, Hendrik Haddorp
<hendrik.hadd...@gmx.net> wrote:
We sometimes also have replicas not recovering. If one replica is left
active, the easiest fix is to delete the failing replica and create a new
one. When all replicas are down, it helps most of the time to restart one
of the nodes that contains a replica in the down state. If that also
doesn't get the replica to recover, I would check the logs of that node
and also those of the overseer node. I have seen the same issue on Solr
using local storage. The main HDFS-related issues we have had so far were
those lock files, and that if you delete and recreate collections/cores
it sometimes happens that the data was not cleaned up in HDFS, which then
causes a conflict.
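For reference, the delete/recreate is just two Collections API calls,
something like this (collection, shard, and replica names below are
placeholders):

    curl 'http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=myColl&shard=shard1&replica=core_node5'
    curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=myColl&shard=shard1'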
Hendrik
On 21.11.2017 21:07, Joe Obernberger wrote:
We've never run an index this size in anything but HDFS, so I have no
comparison. What we've been doing is keeping two main collections - all
data, and the last 30 days of data. Then we handle queries based on date
range. The 30 day index is significantly faster.
My main concern right now is that 6 of the 100 shards are not coming back
because they have no leader. I've never seen this error before. Any ideas?
CLUSTERSTATUS shows all three replicas with state 'down'.
Thanks!
-joe
On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
We actually also have some performance issues with HDFS at the moment. We
are doing lots of soft commits for NRT search. Those seem to be slower
than with local storage. Our investigation hasn't gotten very far yet,
however.
We have a setup with 2000 collections, each with one shard and a
replication factor of 2 or 3. When we restart nodes too quickly, it
causes problems with the overseer queue, which can lead to the queue
getting out of control and Solr pretty much dying. We are still on Solr
6.3; 6.6 has some improvements and should handle these actions faster. I
would check what you see for
"/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The critical
part is the "overseer_queue_size" value. If this goes up to about 10000
it is pretty much game over on our setup. In that case it seems to be
best to stop all nodes, clear the queue in ZK, and then restart the nodes
one by one with a gap of about 5 minutes. That normally recovers pretty
well.
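For example (the ZK host is a placeholder, zkcli.sh ships under
server/scripts/cloud-scripts in the Solr distribution, and jq is just for
pulling the value out of the JSON):

    # check the queue size
    curl -s 'http://localhost:8983/solr/admin/collections?action=OVERSEERSTATUS&wt=json' \
        | jq '.overseer_queue_size'

    # with all nodes stopped, clear the overseer queue in ZooKeeper
    server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd clear /overseer/queue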
regards,
Hendrik
On 21.11.2017 20:12, Joe Obernberger wrote:
We set the hard commit time long because we were having performance
issues with HDFS, and thought that since the block size is 128M, having a
longer hard commit made sense. That was our hypothesis anyway. Happy to
switch it back and see what happens.
I don't know what caused the cluster to go into recovery in the first
place. We had a server die over the weekend, but it's just one out of ~50.
Every shard is 3x replicated (and 3x replicated in HDFS...so 9 copies). It
was at this point that we noticed lots of network activity, with most of
the shards stuck in this recovery, fail, retry loop. That is when we
decided to shut it down, resulting in zombie lock files.
I tried using the FORCELEADER call, which completed but doesn't seem to
have had any effect on the shards that have no leader. Kinda out of ideas
for that problem. If I can get the cluster back up, I'll try a lower hard
commit time. Thanks again Erick!
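(For reference, the FORCELEADER call takes the collection and shard,
e.g.:

    curl 'http://localhost:8983/solr/admin/collections?action=FORCELEADER&collection=myColl&shard=shard21'

with placeholder names above.)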
-Joe
On 11/21/2017 2:00 PM, Erick Erickson wrote:
Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...
I need to back up a bit. Once nodes are in this state it's not
surprising that they need to be forcefully killed. I was more thinking
about how they got in this situation in the first place. _Before_ you
get into the nasty state how are the Solr nodes shut down? Forcefully?
Your hard commit interval is far longer than it needs to be, resulting in
much larger tlog files etc. I usually set this to 15-60 seconds with
local disks; I'm not quite sure whether longer intervals are helpful on
HDFS. What this means is that you can spend up to 30 minutes _replaying
the tlogs_ when you restart Solr! If Solr is killed, it may not have had
a chance to fsync the segments and may have to replay on startup.
If you have openSearcher set to false, the hard commit operation is not
horribly expensive; it just fsyncs the current segments and opens new
ones. It won't be a total cure, but I bet reducing this interval would
help a lot.
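Something like this in solrconfig.xml is what I mean (15 seconds here,
tune to taste):

    <autoCommit>
      <maxTime>15000</maxTime>           <!-- hard commit every 15 seconds -->
      <openSearcher>false</openSearcher> <!-- don't open a new searcher on hard commit -->
    </autoCommit>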
Also, if you stop indexing, there's no need to wait 30 minutes if you
issue a manual commit, something like .../collection/update?commit=true.
Just reducing the hard commit interval will shorten the wait between
stopping indexing and restarting all by itself, if you don't want to
issue the manual commit.
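E.g. (the collection name is a placeholder):

    curl 'http://localhost:8983/solr/myCollection/update?commit=true'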
Best,
Erick
On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
<hendrik.hadd...@gmx.net> wrote:
Hi,
I see the write.lock issue as well when Solr has not been stopped
gracefully. The write.lock files are then left in HDFS, as they do not
get removed automatically when the client disconnects, unlike an
ephemeral node in ZooKeeper. Unfortunately, Solr also does not realize
that it should be owning the lock, as it is marked as the owner in the
state stored in ZooKeeper, and it is not willing to retry, which is why
you need to restart the whole Solr instance after the cleanup. I added
some logic to my Solr startup script which scans the lock files in HDFS,
compares that with the state in ZooKeeper, and then deletes all lock
files that belong to the node that I'm starting.
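Roughly, the scan part looks like this (a simplified sketch; /solr as the
HDFS index root is specific to our setup, and the ZooKeeper ownership
check is elided here):

    #!/bin/bash
    # Simplified sketch: list leftover write.lock files under the Solr
    # root in HDFS. The real script checks in ZooKeeper which cores belong
    # to the node being started and only deletes the locks for those.
    SOLR_HDFS_ROOT=/solr

    hdfs dfs -ls -R "$SOLR_HDFS_ROOT" | awk '$NF ~ /write\.lock$/ {print $NF}' |
    while read -r lock; do
        echo "leftover lock: $lock"
        # after verifying ownership in ZooKeeper:
        # hdfs dfs -rm "$lock"
    done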
regards,
Hendrik
On 21.11.2017 14:07, Joe Obernberger wrote:
Hi All - we have a system with 45 physical boxes running Solr 6.6.1 using
HDFS as the index store. The current index size is about 31TBytes; with
3x replication that takes up 93TBytes of disk. Our main collection is
split across 100 shards with 3 replicas each. The issue that we're
running into is when restarting the Solr 6 cluster. The shards go into
recovery and start to utilize nearly all of their network interfaces. If
we start too many of the nodes at once, the shards will go into a
recovery, fail, retry loop and never come up. The errors are related to
HDFS not responding fast enough, along with warnings from the DFSClient.
If we stop a node when this is happening, the script will force a stop
(180 second timeout), and upon restart we have lock files (write.lock)
inside of HDFS.
The process at this point is to start one node, find the lock files, wait
for it to come up completely (hours), stop it, delete the write.lock
files, and restart. Usually this second restart is faster, but it can
still take 20-60 minutes.
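For what it's worth, we find the leftover lock files with something like
this (where /solr stands in for the index root in HDFS):

    hdfs dfs -ls -R /solr | grep write.lock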
The smaller indexes recover much faster (less than 5 minutes). Should we
not have used so many replicas with HDFS? Is there a better way we should
have built the Solr 6 cluster?
Thank you for any insight!
-Joe