Great to know. Thank you very much for your assistance!

On Tue, Jan 12, 2016 at 10:34 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> bq: Do you know, is using the API the recommended way of handling
> collections? As opposed to putting collection folders containing
> "core.properties" file and "conf" folders (containing "schema.xml" and
> "solrconfig.xml", etc) all in the Solr home location?
>
> Absolutely and certainly DO use the Collections API to create
> collections. DO NOT just try to create individual cores at various
> places on your disk and hope that Solr does the right thing. Solr tries,
> but as you've already discovered there are edge cases.
>
> Ditto for the Admin API. You _can_ use it, but unless you get everything
> exactly correct you'll have problems.
>
> Unless you're at the end of all possibilities, use the Collections API
> every time.
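>
> A create call along these lines is all it takes (the collection name,
> shard count, and config name below are placeholders, so substitute your
> own):
>
>   curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=my-collection&numShards=1&replicationFactor=1&collection.configName=my-conf'
>
> router.name defaults to compositeId, so you don't even need to pass it.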
>
> Best,
> Erick
>
> On Tue, Jan 12, 2016 at 10:30 AM, Douglas Rapp <dougma...@gmail.com> wrote:
> > As an update, I went ahead and used the Collections API and deleted the
> > existing one, and then recreated it (specifying the compositeId
> > router), and when I tried out MRIT, I didn't have any problems
> > whatsoever with the number of reducers (and was able to cut the
> > indexing time by over half!!). I'm guessing that the issue was not with
> > the router, but rather with how the collection was getting created. Do
> > you know, is using the API the recommended way of handling collections?
> > As opposed to putting collection folders containing "core.properties"
> > file and "conf" folders (containing "schema.xml" and "solrconfig.xml",
> > etc) all in the Solr home location?
> >
> > Thanks,
> > Doug
> >
> > On Tue, Jan 12, 2016 at 9:26 AM, Douglas Rapp <dougma...@gmail.com> wrote:
> >
> >> I'm actually not specifying any router, and assumed the "implicit" one
> >> was the default. The only resource I can find for setting the document
> >> router is when creating a new collection via the Collections API,
> >> which I am not using. What I do is define several options in the
> >> "solrconfig.xml" file, then sync the conf directory with ZooKeeper,
> >> specifying the collection name.
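> >>
> >> The sync itself is just a zkcli upconfig, roughly like this (the path
> >> to zkcli.sh varies by install, so treat it as a placeholder):
> >>
> >>   ./zkcli.sh -zkhost my-zk-host -cmd upconfig -confdir ./conf -confname my-collection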
> >>
> >> Then, when I start up Solr, it grabs the config from ZooKeeper,
> >> creates the HDFS directories (if not already present), and sets up the
> >> collection automatically. At that point, I can use MRIT to generate
> >> the indexes. Is that improper? Is there a way to specify the document
> >> router in solrconfig.xml?
> >>
> >> Your other questions:
> >> 1) Yes, the indexes are hosted directly in HDFS. As are the input data
> >> files.
> >> 2) Yes, I am using the --go-live option
> >>
> >> Here is the syntax I am using:
> >>
> >> hadoop jar ../../lib/*.jar org.apache.solr.hadoop.MapReduceIndexerTool
> >> hdfs://mhats-hadoop-master:54310/data --morphline-file
> >> my-morphline-file.conf --output-dir
> >> hdfs://mhats-hadoop-master:54310/solr/staging --log4j ../log4j.properties
> >> --zk-host my-zk-host --collection my-collection --go-live
> >>
> >> Thanks,
> >> Doug
> >>
> >> On Mon, Jan 11, 2016 at 5:22 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>
> >>> Hmm, it looks like you created your collection with the "implicit"
> >>> router. Does the same thing happen when you use the default
> >>> compositeId router?
> >>>
> >>> Note, this should be OK with either, this is just to gather more info.
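> >>>
> >>> One quick way to see which router the collection actually has is to
> >>> pull clusterstate.json out of ZooKeeper and look at the "router"
> >>> entry for your collection, something like (the zkcli.sh path is a
> >>> placeholder):
> >>>
> >>>   ./zkcli.sh -zkhost my-zk-host -cmd get /clusterstate.json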
> >>>
> >>> Other questions:
> >>> 1> Are you running MRIT over Solr indexes that are actually hosted on
> >>> HDFS?
> >>> 2> Are you using the --go-live option?
> >>>
> >>> Actually, can you show us the entire command you use to invoke MRIT?
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Mon, Jan 11, 2016 at 4:18 PM, Douglas Rapp <dougma...@gmail.com> wrote:
> >>>
> >>> > Hello,
> >>> >
> >>> > I am using Solr 4.10.4 in SolrCloud mode, but so far with only a
> >>> > single instance (so just a single shard - not very cloud-like...).
> >>> >
> >>> > I have been experimenting with the MapReduceIndexerTool to handle
> >>> > batch indexing of CSV files in HDFS. I got it working on a weaker
> >>> > single-node Hadoop test system, so I have been trying to do some
> >>> > performance testing on a 4-node Hadoop cluster (1 NameNode, 3
> >>> > DataNodes) with better hardware. The issue that I have come across
> >>> > is that the job will only finish successfully if I specify a single
> >>> > reducer (using the "--reducers 1" option upon invoking the tool).
> >>> >
> >>> > If the tool is invoked without specifying a number for
> >>> > mappers/reducers, it appears that it tries to utilize the maximum
> >>> > number available. In my case, it tries to use 16 mappers and 6
> >>> > reducers. I have tried specifying many different combinations, and
> >>> > what I have found is that I can tweak the number of mappers to just
> >>> > about anything, but reducers must stay at "1" or else the job
> >>> > fails. That also explains why I never saw this pop up on the first
> >>> > system - looking closer at it, it defaults to only 1 reducer there.
> >>> > If I try to increase it, I get the same failure. When the job
> >>> > fails, I get the following stack trace:
> >>> >
> >>> > 6602 [main] WARN org.apache.hadoop.mapred.YarnChild - Exception
> >>> > running child : org.kitesdk.morphline.api.MorphlineRuntimeException:
> >>> > java.lang.IllegalStateException: No matching slice found! The slice
> >>> > seems unavailable. docRouterClass:
> >>> > org.apache.solr.common.cloud.ImplicitDocRouter
> >>> >     at org.kitesdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:73)
> >>> >     at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:213)
> >>> >     at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:86)
> >>> >     at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:54)
> >>> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> >>> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> >>> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> >>> >     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> >>> >     at java.security.AccessController.doPrivileged(Native Method)
> >>> >     at javax.security.auth.Subject.doAs(Subject.java:415)
> >>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> >>> >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> >>> > Caused by: java.lang.IllegalStateException: No matching slice
> >>> > found! The slice seems unavailable. docRouterClass:
> >>> > org.apache.solr.common.cloud.ImplicitDocRouter
> >>> >     at org.apache.solr.hadoop.SolrCloudPartitioner.getPartition(SolrCloudPartitioner.java:120)
> >>> >     at org.apache.solr.hadoop.SolrCloudPartitioner.getPartition(SolrCloudPartitioner.java:49)
> >>> >     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
> >>> >     at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
> >>> >     at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
> >>> >     at org.apache.solr.hadoop.morphline.MorphlineMapper$MyDocumentLoader.load(MorphlineMapper.java:138)
> >>> >     at org.apache.solr.morphlines.solr.LoadSolrBuilder$LoadSolr.doProcess(LoadSolrBuilder.java:129)
> >>> >     at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
> >>> >     at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
> >>> >     at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
> >>> >     at org.apache.solr.morphlines.solr.SanitizeUnknownSolrFieldsBuilder$SanitizeUnknownSolrFields.doProcess(SanitizeUnknownSolrFieldsBuilder.java:94)
> >>> >     at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
> >>> >     at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
> >>> >     at org.kitesdk.morphline.stdio.ReadCSVBuilder$ReadCSV.doProcess(ReadCSVBuilder.java:124)
> >>> >     at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:93)
> >>> >     at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
> >>> >     at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
> >>> >     at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
> >>> >     at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:201)
> >>> >     ... 10 more
> >>> >
> >>> > When I try searching online for "No matching slice found", the only
> >>> > results I get back are from the source code... I can't seem to find
> >>> > anything to lead me in the right direction.
> >>> >
> >>> > Looking at the MapReduceIndexerTool more closely, it says that when
> >>> > using more than one reducer per output shard (so in my case, >1) it
> >>> > will utilize the "mtree" merge algorithm to merge the results held
> >>> > among several mini-shards. I'm guessing this might have something
> >>> > to do with it, but I can't find any other information on how this
> >>> > might be further tweaked or debugged.
> >>> >
> >>> > I can provide any additional information (environment settings,
> >>> > config files, etc) on request. Any help would be appreciated.
> >>> >
> >>> > Thanks,
> >>> > Doug