Hi All,

To close this off, I'm sad to report that we've come to an end with Solr on
HDFS.

Here's what we finally did:
 - Created two brand-new, identical SolrCloud clusters, one on HDFS and one
on local disk (creation sketch below).
 - 1 replica per node; each node has 16GB RAM.
 - Added documents.
 - Compared start-up times for a single node after a graceful shutdown.
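
(If anyone wants to replicate the setup: the clusters were plain SolrCloud,
so creating the test collection is just the Collections API. A rough sketch;
host, collection name, and counts are placeholders:)

  curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=test&numShards=1&replicationFactor=3"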

What we observe:
 - On startup, the replica transitions from "Gone" to "Down" fairly quickly
(a few seconds).
 - The replica then spends some time in the "Down" state before transitioning
to "Recovering".
 - The replica stays in "Recovering" for some time before transitioning to
"Active". (A sketch for polling these states follows.)

Results for 75M docs in the replica, replica size 28.5GB:

  - HDFS
     - Time in "Down": 4m 49s
     - Time in "Recovering": 2m 30s
     - Total time to restart: 7m 9s

  - Local Disk
     - Time in "Down": 0m 5s
     - Time in "Recovering": 0m 8s
     - Total time to restart: 0m 13s


Results for 100M docs in the replica, replica size 37GB:

  - HDFS
     - Time in "Down": 8m 30s
     - Time in "Recovering": 5m 19s
     - Total time to restart: 13m 49s

  - Local Disk
     - Time in "Down": 0m 4s
     - Time in "Recovering": 0m 10s
     - Total time to restart: 0m 14s


Conclusions:
 - As the index size grows, restart times for Solr on HDFS grow with it; we
see no such trend on local disk.

Notes:
 - HDFS in our environment is FINE. The network is FINE. We have HBase
servers running on the same ESXi hosts as Solr, they access the same HDFS
filesystem, and HBase bandwidth regularly exceeds 2GB/s. All latencies are
sub-millisecond.
 - The values reported above are averages. There's some variance to the
results, but the averages are representative of the times we're seeing.

Thanks for reading!

Kyle



On Mon, 10 Dec 2018 at 14:14, lstusr 5u93n4 <lstusr...@gmail.com> wrote:

> Hi Guys,
>
> >  What OS is it on?
> CentOS 7
>
> >  With your indexes in HDFS, the HDFS software running
> > inside Solr also needs heap memory to operate, and is probably going to
> > set aside part of the heap for caching purposes.
> We still have the solr.hdfs.blockcache.slab.count parameter set to the
> default of 1, but we're going to tune this a bit and see what happens.
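>
> (For reference, that knob lives in the directoryFactory section of
> solrconfig.xml. A sketch of what we might try; the HDFS URI and values are
> placeholders, and each slab is 128MB of off-heap memory, so slab.count=32
> would reserve ~4GB and needs -XX:MaxDirectMemorySize sized to match:)
>
>   <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
>     <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
>     <bool name="solr.hdfs.blockcache.enabled">true</bool>
>     <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
>     <int name="solr.hdfs.blockcache.slab.count">32</int>
>   </directoryFactory>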
>
> >  but for this setup, I'd definitely want a LOT more than 16GB.
> So where would you start? We can easily double the number of servers to 6,
> and put one replica on each (we'll probably do this anyway). Would you go
> bigger than 6 x 16GB? Keep in mind, even with our little 3 x 16GB we haven't
> had performance problems... This thread kind of diverged that way, but really
> the initial issue was just that the whole index seems to be read on startup.
> (Which I fully understand may be resource related, but I have yet to try to
> reproduce it on a smaller scale to confirm/deny.)
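>
> (If we do bump the heap, I assume it's just solr.in.sh on each node; a
> sketch, sizes being guesses:)
>
>   SOLR_HEAP="12g"
>   # extra off-heap room for the HDFS block cache, if we raise slab.count:
>   SOLR_OPTS="$SOLR_OPTS -XX:MaxDirectMemorySize=8g"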
>
> > As Solr runs, it writes a GC log.  Can you share all of the GC log files
> > that Solr has created?  There should not be any proprietary information
> > in those files.
>
> This I can do. Actually, I've collected a lot of things, redacted any
> private info, and gathered it all here as a series of logs / screenshots.
>
> So what I did:
>  - 16:49 GMT -- stopped Solr on one node (node 4) using bin/solr stop,
> keeping the others alive. Captured the Solr log as it was stopping, and
> uploaded it here:
>      - https://pastebin.com/raw/UhSTdb1h
>
>  - 17:00 GMT -- restarted Solr on the same node (the other two stayed up
> the whole time) and let it run for an hour. Captured the Solr logs since
> startup here:
>     - https://pastebin.com/raw/S4Z9XVrG
>
>  - Observed the outbound network traffic from HDFS to this particular Solr
> instance during this time, screenshotted it, and put the image here (times
> are in EST for that screenshot):
>     - https://imagebin.ca/v/4PY63LAMSVV1
>
>  - Screenshotted the resource usage on the node according to the Solr UI:
>    - https://imagebin.ca/v/4PY6dYddWGXn
>
>  - Captured the GC logs for the 20 mins after restart, and pasted here:
>    - https://pastebin.com/raw/piswTy1M
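>
> (In shell terms, the sequence on node 4 was roughly the following; the port
> and paths are stock defaults, so adjust as needed:)
>
>   bin/solr stop -p 8983    # 16:49 GMT, graceful stop
>   bin/solr start -p 8983   # 17:00 GMT, restart; ZK_HOST in solr.in.sh
>                            # keeps it in cloud mode
>   # solr.log and the GC logs accumulate under server/logs/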
>
> Some notes:
>  - the main collection (the big one) is called "main"
>  - there is a second collection on the system called "history", but it has
> 0 documents.
>  - I redacted any private info in the logs... if there are inconsistencies
> it might be due to this manual process (but I think it's okay)
>
> Thanks!
>
> Kyle
>
> On Mon, 10 Dec 2018 at 12:43, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 12/7/2018 8:54 AM, Erick Erickson wrote:
>> > Here's the trap: _Indexing_ doesn't take much memory. The memory is
>> > bounded by ramBufferSizeMB, which defaults to 100.
>>
>> This statement is completely true.  But it hides one detail:  A large
>> amount of indexing will allocate this buffer repeatedly.  So although
>> indexing doesn't take a huge amount of memory space at any given moment,
>> the total amount of memory allocated by large indexing will be enormous,
>> keeping the garbage collector busy.  This is particularly true when
>> segment merging happens.
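>>
>> (For context: that buffer is configured per core in the <indexConfig>
>> section of solrconfig.xml; the value below is the default.)
>>
>>   <indexConfig>
>>     <ramBufferSizeMB>100</ramBufferSizeMB>
>>   </indexConfig>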
>>
>> Going over the whole thread:
>>
>> Graceful shutdown on Solr 7.5 (for non-Windows operating systems) should
>> allow up to three minutes for Solr to shut down normally before it
>> hard-kills the instance.  On Windows it only waits 5 seconds, which is
>> not enough.  What OS is it on?
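>>
>> (On *nix, that wait is SOLR_STOP_WAIT in solr.in.sh, 180 seconds by
>> default; it can be raised if shutdown needs longer, e.g.:)
>>
>>   SOLR_STOP_WAIT=300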
>>
>> The problems you've described do sound like your Solr instances are
>> experiencing massive GC pauses.  This can make *ALL* Solr activity take
>> a long time, including index recovery operations.  Increasing the heap
>> size MIGHT alleviate these problems.
>>
>> If every machine is handling 700GB of index data and 1.4 billion docs
>> (assuming one third of the 2.1 billion docs per shard replica, two
>> replicas per machine), you're going to need a lot of heap memory for
>> Solr to run well.  With your indexes in HDFS, the HDFS software running
>> inside Solr also needs heap memory to operate, and is probably going to
>> set aside part of the heap for caching purposes.  I thought I saw
>> something in the thread about a 6GB heap size.  This is probably way too
>> small.  For everything you've described, I have to agree with Erick ...
>> 16GB total memory is VERY undersized.  It's likely unrealistic to have
>> enough memory for the whole index ... but for this setup, I'd definitely
>> want a LOT more than 16GB.
>>
>> As Solr runs, it writes a GC log.  Can you share all of the GC log files
>> that Solr has created?  There should not be any proprietary information
>> in those files.
>>
>> Thanks,
>> Shawn
>>
>>
