Great to know. Thank you very much for your assistance!

On Tue, Jan 12, 2016 at 10:34 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> bq: Do you know, is using the API the recommended way of handling
> collections? As opposed to putting collection folders containing
> "core.properties" file and "conf" folders (containing "schema.xml" and
> "solrconfig.xml", etc) all in the Solr home location?
>
> Absolutely and certainly DO use the Collections API to create
> collections. DO NOT just try to create individual cores at various
> places on your disk and hope that Solr does the right thing. Solr tries,
> but as you've already discovered there are edge cases.
>
> Ditto for the Admin API. You _can_ use it, but unless you get everything
> exactly correct you'll have problems.
>
> Unless you're at the end of all possibilities, use the Collections API
> every time.
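>
> A create call along these lines is all it takes (the collection name,
> shard count, and config name below are placeholders, so substitute your
> own):
>
>   curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=my-collection&numShards=1&replicationFactor=1&collection.configName=my-conf'
>
> router.name defaults to compositeId, so you don't even need to pass it.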
>
> Best,
> Erick
>
> On Tue, Jan 12, 2016 at 10:30 AM, Douglas Rapp <dougma...@gmail.com> wrote:
> > As an update, I went ahead and used the Collections API and deleted the
> > existing one, and then recreated it (specifying the compositeId
> > router), and when I tried out MRIT, I didn't have any problems
> > whatsoever with the number of reducers (and was able to cut the
> > indexing time by over half!!). I'm guessing that the issue was not with
> > the router, but rather with how the collection was getting created. Do
> > you know, is using the API the recommended way of handling collections?
> > As opposed to putting collection folders containing "core.properties"
> > file and "conf" folders (containing "schema.xml" and "solrconfig.xml",
> > etc) all in the Solr home location?
> >
> > Thanks,
> > Doug
> >
> > On Tue, Jan 12, 2016 at 9:26 AM, Douglas Rapp <dougma...@gmail.com> wrote:
> >
> >> I'm actually not specifying any router, and assumed the "implicit" one
> >> was the default. The only resource I can find for setting the document
> >> router is when creating a new collection via the Collections API,
> >> which I am not using. What I do is define several options in the
> >> "solrconfig.xml" file, then sync the conf directory with ZooKeeper,
> >> specifying the collection name.
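> >>
> >> The sync itself is just a zkcli upconfig, roughly like this (the path
> >> to zkcli.sh varies by install, so treat it as a placeholder):
> >>
> >>   ./zkcli.sh -zkhost my-zk-host -cmd upconfig -confdir ./conf -confname my-collection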
> >>
> >> Then, when I start up Solr, it grabs the config from ZooKeeper,
> >> creates the HDFS directories (if not already present), and sets up the
> >> collection automatically. At that point, I can use MRIT to generate
> >> the indexes. Is that improper? Is there a way to specify the document
> >> router in solrconfig.xml?
> >>
> >> Your other questions:
> >> 1) Yes, the indexes are hosted directly in HDFS. As are the input data
> >> files.
> >> 2) Yes, I am using the --go-live option
> >>
> >> Here is the syntax I am using:
> >>
> >> hadoop jar ../../lib/*.jar org.apache.solr.hadoop.MapReduceIndexerTool
> >> hdfs://mhats-hadoop-master:54310/data --morphline-file
> >> my-morphline-file.conf --output-dir
> >> hdfs://mhats-hadoop-master:54310/solr/staging --log4j ../log4j.properties
> >> --zk-host my-zk-host --collection my-collection --go-live
> >>
> >> Thanks,
> >> Doug
> >>
> >> On Mon, Jan 11, 2016 at 5:22 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>
> >>> Hmm, it looks like you created your collection with the "implicit"
> >>> router. Does the same thing happen when you use the default
> >>> compositeId router?
> >>>
> >>> Note, this should be OK with either, this is just to gather more info.
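> >>>
> >>> One quick way to see which router the collection actually has is to
> >>> pull clusterstate.json out of ZooKeeper and look at the "router"
> >>> entry for your collection, something like (the zkcli.sh path is a
> >>> placeholder):
> >>>
> >>>   ./zkcli.sh -zkhost my-zk-host -cmd get /clusterstate.json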
> >>>
> >>> Other questions:
> >>> 1> Are you running MRIT over Solr indexes that are actually hosted on
> >>> HDFS?
> >>> 2> Are you using the --go-live option?
> >>>
> >>> Actually, can you show us the entire command you use to invoke MRIT?
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Mon, Jan 11, 2016 at 4:18 PM, Douglas Rapp <dougma...@gmail.com> wrote:
> >>>
> >>> > Hello,
> >>> >
> >>> > I am using Solr 4.10.4 in SolrCloud mode, but so far with only a
> >>> > single instance (so just a single shard - not very cloud-like...).
> >>> >
> >>> > I have been experimenting with the MapReduceIndexerTool to handle
> >>> > batch indexing of CSV files in HDFS. I got it working on a weaker
> >>> > single-node Hadoop test system, so I have been trying to do some
> >>> > performance testing on a 4-node Hadoop cluster (1 NameNode, 3
> >>> > DataNodes) with better hardware. The issue that I have come across
> >>> > is that the job will only finish successfully if I specify a single
> >>> > reducer (using the "--reducers 1" option upon invoking the tool).
> >>> >
> >>> > If the tool is invoked without specifying a number for
> >>> > mappers/reducers, it appears that it tries to utilize the maximum
> >>> > number available. In my case, it tries to use 16 mappers and 6
> >>> > reducers. I have tried specifying many different combinations, and
> >>> > what I have found is that I can tweak the number of mappers to just
> >>> > about anything, but reducers must stay at "1" or else the job
> >>> > fails. That also explains why I never saw this pop up on the first
> >>> > system - looking closer at it, it defaults to only 1 reducer there.
> >>> > If I try to increase it, I get the same failure. When the job
> >>> > fails, I get the following stack trace:
> >>> >
> >>> > 6602 [main] WARN org.apache.hadoop.mapred.YarnChild - Exception
> >>> > running child : org.kitesdk.morphline.api.MorphlineRuntimeException:
> >>> > java.lang.IllegalStateException: No matching slice found! The slice
> >>> > seems unavailable. docRouterClass:
> >>> > org.apache.solr.common.cloud.ImplicitDocRouter
> >>> >     at org.kitesdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:73)
> >>> >     at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:213)
> >>> >     at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:86)
> >>> >     at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:54)
> >>> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> >>> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> >>> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> >>> >     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> >>> >     at java.security.AccessController.doPrivileged(Native Method)
> >>> >     at javax.security.auth.Subject.doAs(Subject.java:415)
> >>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> >>> >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> >>> > Caused by: java.lang.IllegalStateException: No matching slice
> >>> > found! The slice seems unavailable. docRouterClass:
> >>> > org.apache.solr.common.cloud.ImplicitDocRouter
> >>> >     at org.apache.solr.hadoop.SolrCloudPartitioner.getPartition(SolrCloudPartitioner.java:120)
> >>> >     at org.apache.solr.hadoop.SolrCloudPartitioner.getPartition(SolrCloudPartitioner.java:49)
> >>> >     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
> >>> >     at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
> >>> >     at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
> >>> >     at org.apache.solr.hadoop.morphline.MorphlineMapper$MyDocumentLoader.load(MorphlineMapper.java:138)
> >>> >     at org.apache.solr.morphlines.solr.LoadSolrBuilder$LoadSolr.doProcess(LoadSolrBuilder.java:129)
> >>> >     at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
> >>> >     at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
> >>> >     at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
> >>> >     at org.apache.solr.morphlines.solr.SanitizeUnknownSolrFieldsBuilder$SanitizeUnknownSolrFields.doProcess(SanitizeUnknownSolrFieldsBuilder.java:94)
> >>> >     at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
> >>> >     at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
> >>> >     at org.kitesdk.morphline.stdio.ReadCSVBuilder$ReadCSV.doProcess(ReadCSVBuilder.java:124)
> >>> >     at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:93)
> >>> >     at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
> >>> >     at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
> >>> >     at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
> >>> >     at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:201)
> >>> >     ... 10 more
> >>> >
> >>> > When I try searching online for "No matching slice found", the only
> >>> > results I get back are from the source code... I can't seem to find
> >>> > anything to lead me in the right direction.
> >>> >
> >>> > Looking at the MapReduceIndexerTool more closely, it says that when
> >>> > using more than one reducer per output shard (so in my case, >1) it
> >>> > will utilize the "mtree" merge algorithm to merge the results held
> >>> > among several mini-shards. I'm guessing this might have something
> >>> > to do with it, but I can't find any other information on how this
> >>> > might be further tweaked or debugged.
> >>> >
> >>> > I can provide any additional information (environment settings,
> >>> > config files, etc) on request. Any help would be appreciated.
> >>> >
> >>> > Thanks,
> >>> > Doug