Re: Solr+HDFS

Joseph Obernberger Fri, 05 Feb 2016 09:31:44 -0800

Thank you, Shawn.  It sounds like increasing the autoSoftCommit maxTime would
be a good idea.  I assume we should also increase autoCommit along with it?
All of our collections (just 2 at the moment) have the same settings.  The
data directory is in HDFS, and every shard lives under the same root data
directory, with each core in its own core_nodeN subdirectory (see the listing
below).  The two collections have different root directories.
----------------
[root@hades logs]# hadoop fs -ls /solr5.2
Found 2 items
drwxr-xr-x   - solr hadoop          0 2015-10-05 12:54 /solr5.2/IMAGEDATA
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:54 /solr5.2/DOCUMENTS


[root@hades logs]# hadoop fs -ls /solr5.2/DOCUMENTS
Found 27 items
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:08 /solr5.2/DOCUMENTS/core_node1
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:35 /solr5.2/DOCUMENTS/core_node10
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node11
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node12
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node13
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node14
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node15
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node16
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node17
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node18
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node19
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:08 /solr5.2/DOCUMENTS/core_node2
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node20
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node21
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node22
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node23
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node24
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node25
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:44 /solr5.2/DOCUMENTS/core_node26
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:54 /solr5.2/DOCUMENTS/core_node27
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:08 /solr5.2/DOCUMENTS/core_node3
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:21 /solr5.2/DOCUMENTS/core_node4
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:34 /solr5.2/DOCUMENTS/core_node5
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:34 /solr5.2/DOCUMENTS/core_node6
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:35 /solr5.2/DOCUMENTS/core_node7
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:35 /solr5.2/DOCUMENTS/core_node8
drwxr-xr-x   - solr hadoop          0 2015-06-09 15:35 /solr5.2/DOCUMENTS/core_node9
-----------------
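
Shawn's question further down the thread is whether two or more cores share a data directory. A listing like the one above shows one subdirectory per core, but a listing alone does not prove how each core is actually configured. The following is a minimal Python sketch of checking a core-to-data-directory mapping for duplicates; the `find_shared_data_dirs` helper and the example mapping are hypothetical, and in practice the mapping would have to be gathered from each core's real configuration.

```python
from collections import defaultdict


def find_shared_data_dirs(core_to_data_dir):
    """Return {data_dir: [cores, ...]} for every directory used by more than one core.

    `core_to_data_dir` is a hypothetical mapping of core name -> data directory
    path, e.g. collected by hand from each core's configuration.
    """
    cores_by_dir = defaultdict(list)
    for core, data_dir in core_to_data_dir.items():
        # Strip trailing slashes so "/a/b" and "/a/b/" count as the same directory.
        cores_by_dir[data_dir.rstrip("/")].append(core)
    return {d: sorted(cores) for d, cores in cores_by_dir.items() if len(cores) > 1}


if __name__ == "__main__":
    # Layout modeled on the listing above: one subdirectory per core.
    layout = {f"core_node{i}": f"/solr5.2/DOCUMENTS/core_node{i}" for i in range(1, 28)}
    print(find_shared_data_dirs(layout))  # prints {}

    # A deliberately misconfigured example: two cores pointing at one directory.
    layout["core_node2"] = "/solr5.2/DOCUMENTS/core_node1/"
    print(find_shared_data_dirs(layout))
    # prints {'/solr5.2/DOCUMENTS/core_node1': ['core_node1', 'core_node2']}
```

An empty result only means the mapping you fed in has no duplicates; it is only as trustworthy as the way the mapping was collected.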

Right now we are not running any replicas.

-Joe

On Fri, Feb 5, 2016 at 10:43 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 2/5/2016 8:11 AM, Joseph Obernberger wrote:
> > Thank you for the reply, Scott.  We have the commit settings as:
> > <autoCommit>
> >   <maxTime>60000</maxTime>
> >   <openSearcher>false</openSearcher>
> > </autoCommit>
> > <autoSoftCommit>
> >   <maxTime>15000</maxTime>
> > </autoSoftCommit>
> >
> > Is that 50% disk space rule across the entire HDFS cluster or on an
> > individual spindle?
>
> That autoSoftCommit maxTime is pretty small.  Frequent commits can be a
> source of problems if the actual commits take anywhere near (or longer
> than) the maxTime value to complete.  If your commits are taking
> significantly less than 15 seconds to complete, then it probably isn't
> anything to worry about.
>
> The rule with disk space and Solr/Lucene is that you must have enough
> free disk space for your largest index to triple in size temporarily,
> and it's actually recommended to have three times the disk space of
> *all* your indexes, not just the largest.  Most of the time the largest
> merge you'll see will double the disk space, but in some unusual edge
> cases, it can triple.
>
> I have no idea how disk space works with HDFS when individual data nodes
> become full.  Someone else will have to tackle that question, and it
> might need to be answered by the Hadoop project rather than here.
>
> With autoCommit at 60 seconds, your transaction logs should remain small
> and there shouldn't be very many of them, so I really have no idea what
> might be happening with those.  Do you have this same
> autoCommit/autoSoftCommit config on every Solr collection?
>
> Erick's note about AlreadyBeingCreatedException may be relevant.  Are
> you possibly sharing a data directory between two or more Solr cores?
> This can't normally be done, and even if you configure the locking
> mechanism to allow it, it's NOT recommended, especially with SolrCloud.
> In SolrCloud, all replicas will write to the index.  If two replicas try
> to write to the same index, then that index will become corrupted and
> unusable.
>
> Thanks,
> Shawn
>
>
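
Shawn's point about commit frequency comes down to comparing how long a commit actually takes with the configured maxTime. A minimal Python sketch of that comparison follows; the `commit_interval_risk` helper, the 50% "near maxTime" threshold, and the sample durations are assumptions for illustration only, and real commit durations would have to be measured on the actual cluster.

```python
def commit_interval_risk(commit_durations_ms, max_time_ms, near_fraction=0.5):
    """Classify measured commit durations relative to a commit maxTime.

    near_fraction is an assumed threshold: a commit taking at least this
    fraction of maxTime is treated as "near" the commit interval.
    """
    if not commit_durations_ms:
        raise ValueError("need at least one measured commit duration")
    worst = max(commit_durations_ms)
    if worst >= max_time_ms:
        return "longer than maxTime: likely a problem"
    if worst >= near_fraction * max_time_ms:
        return "near maxTime: consider a larger maxTime"
    return "well under maxTime: probably fine"


if __name__ == "__main__":
    soft_commit_max_time_ms = 15000  # autoSoftCommit maxTime from the quoted config
    # Hypothetical measurements, not taken from this cluster.
    print(commit_interval_risk([800, 1200, 950], soft_commit_max_time_ms))
    # prints well under maxTime: probably fine
    print(commit_interval_risk([9000, 16000], soft_commit_max_time_ms))
    # prints longer than maxTime: likely a problem
```

Using the worst observed duration rather than the average is a deliberately conservative choice: one slow commit is enough to run into the next scheduled one.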

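The disk-space rule Shawn describes is simple arithmetic. The Python sketch below applies it under one reading of the rule, which is an assumption: a merge can temporarily grow the largest index to three times its size, so free space should be at least twice that index, and the recommended total capacity is three times the combined size of all indexes. The `disk_space_check` helper and the index sizes are hypothetical, and, as Shawn notes, how this maps onto individual HDFS data nodes is a separate question.

```python
def disk_space_check(index_sizes_gb, free_gb, total_capacity_gb):
    """Apply the 'largest index may triple' rule to hypothetical index sizes (GB)."""
    largest = max(index_sizes_gb)
    combined = sum(index_sizes_gb)
    # Growing from 1x to 3x needs 2x the largest index in additional free space.
    min_free_gb = 2 * largest
    # Recommended: total capacity of three times all indexes together.
    recommended_capacity_gb = 3 * combined
    return {
        "min_free_gb": min_free_gb,
        "enough_free_for_largest_merge": free_gb >= min_free_gb,
        "recommended_capacity_gb": recommended_capacity_gb,
        "meets_recommendation": total_capacity_gb >= recommended_capacity_gb,
    }


if __name__ == "__main__":
    # Hypothetical sizes for two collections; not measured from this cluster.
    print(disk_space_check([400, 150], free_gb=900, total_capacity_gb=1450))
    # prints {'min_free_gb': 800, 'enough_free_for_largest_merge': True,
    #         'recommended_capacity_gb': 1650, 'meets_recommendation': False}
```

The example shows how a cluster can satisfy the minimum rule for the largest merge while still falling short of the more conservative "three times all indexes" recommendation.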