Re: MapReduceIndexerTool Indexing

2016-01-05 Thread Erick Erickson
MRIT is not designed for that scenario, so you simply can't. What people usually do is have a process whereby, after the initial bulk load, there is some way their system-of-record "knows" what new docs have been added since and indexes only those. Flume is sometimes used if you have access. Best
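A minimal sketch of the "system-of-record knows what is new" idea described above, assuming each source record carries a last-modified timestamp and that the time of the initial bulk load is stored as a checkpoint. The names `SourceDoc`, `DeltaSelector`, and `selectNewSince` are hypothetical and are not part of MRIT, Solr, or Flume; the selected documents would then be sent to Solr through whatever regular indexing path the site uses.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical record from a system-of-record; field names are illustrative only.
final class SourceDoc {
    final String id;
    final Instant lastModified;

    SourceDoc(String id, Instant lastModified) {
        this.id = id;
        this.lastModified = lastModified;
    }
}

final class DeltaSelector {
    // Returns only the documents modified strictly after the checkpoint,
    // i.e. the ones a follow-up (non-MRIT) indexing pass would need to send.
    static List<SourceDoc> selectNewSince(List<SourceDoc> docs, Instant checkpoint) {
        List<SourceDoc> delta = new ArrayList<>();
        for (SourceDoc doc : docs) {
            if (doc.lastModified.isAfter(checkpoint)) {
                delta.add(doc);
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        // Assumed checkpoint: the moment the initial bulk load finished.
        Instant bulkLoadTime = Instant.parse("2016-01-01T00:00:00Z");
        List<SourceDoc> docs = Arrays.asList(
            new SourceDoc("a", Instant.parse("2015-12-30T10:00:00Z")),
            new SourceDoc("b", Instant.parse("2016-01-02T08:30:00Z")),
            new SourceDoc("c", Instant.parse("2016-01-04T12:00:00Z")));
        // Print the IDs that still need indexing.
        for (SourceDoc doc : selectNewSince(docs, bulkLoadTime)) {
            System.out.println(doc.id);
        }
    }
}
```

With the sample data in `main`, only documents "b" and "c" are newer than the checkpoint, so only those two would be handed to the incremental indexing step.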

Re: MapReduceIndexerTool Indexing

2016-01-04 Thread vidya
Hi, I would like to index only new data, not data that has already been indexed (delta indexing). How can I achieve this using MRIT? Thanks in advance

Re: MapReduceIndexerTool Indexing

2016-01-04 Thread Erick Erickson
Yes it does. MRIT is intended for initial bulk loads. It takes whatever it's pointed at and indexes it. Additionally, it does not update documents. If the same document (by ID) is indexed twice, you'll wind up with two copies in your results. Best, Erick On Mon, Jan 4, 2016 at 5:00 AM, vidya wr
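The duplicate-ID behaviour described above can be illustrated with a toy model. This is only a sketch in plain Java collections, not MRIT or Solr code: the class `DuplicateIdDemo` and both method names are hypothetical. It contrasts an append-only load, where a repeated ID yields two copies, with update-by-ID semantics, where the later document replaces the earlier one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DuplicateIdDemo {
    // Append-only model: every indexed document is kept, even if its ID repeats.
    static int countAfterAppend(List<String> ids) {
        List<String> index = new ArrayList<>();
        for (String id : ids) {
            index.add(id);
        }
        return index.size();
    }

    // Upsert model: a later document with the same ID replaces the earlier one.
    static int countAfterUpsert(List<String> ids) {
        Map<String, String> index = new LinkedHashMap<>();
        for (String id : ids) {
            index.put(id, id);
        }
        return index.size();
    }

    public static void main(String[] args) {
        // "doc1" is submitted twice, as when the same input is bulk-loaded again.
        List<String> ids = Arrays.asList("doc1", "doc2", "doc1");
        System.out.println("append-only: " + countAfterAppend(ids));
        System.out.println("upsert:      " + countAfterUpsert(ids));
    }
}
```

In this model the append-only count is 3 and the upsert count is 2, which is why pointing a bulk loader at data that has already been indexed leaves duplicates behind.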

Re: MapReduceIndexerTool

2015-07-15 Thread Erick Erickson
Charles: bq: My understanding is that this is actually somewhat slower than the standard indexing path... Yes and no. If you just use a single thread, you're right it'll be slower since it has to copy a bunch of stuff around. Then at the end, the --go-live step copies the built index to Solr the

RE: MapReduceIndexerTool

2015-07-15 Thread Reitzel, Charles
The OP asked about MapReduceIndexerTool. My understanding is that this is actually somewhat slower than the standard indexing path and is recommended only if the site is already invested in the Hadoop infrastructure. E.g. input files are already distributed on the Hadoop/Search cluster via HD

Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1

2014-04-11 Thread Dmitry Kan
Thanks Shawn. Perhaps the comment on luceneMatchVersion in the example schema.xml could be changed to reflect / clarify this? That comment made me think that the parameter affects the index side of things too (i.e. the index format version). I.e. I would appreciate seeing there things lik

Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1

2014-04-11 Thread Shawn Heisey
On 4/11/2014 12:42 AM, Dmitry Kan wrote: > Thanks! So solr 4.7 does not seem to respect the luceneMatchVersion on the > binary (index) level. Or perhaps, I misunderstand the meaning of the > luceneMatchVersion. luceneMatchVersion does not dictate the index format. It is a way to signal things lik

Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1

2014-04-10 Thread Dmitry Kan
Thanks! So Solr 4.7 does not seem to respect the luceneMatchVersion on the binary (index) level. Or perhaps I misunderstand the meaning of luceneMatchVersion. This is what I see when loading the index from HDFS via Luke and launching the Index Checker tool: [clip] Segments file=segments_2 numSeg

Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1

2014-04-10 Thread Wolfgang Hoschek
There’s no such other location in there. BTW, you can disable the mtree merge via --reducers=-2 (or --reducers=0 in old versions). Wolfgang. On Apr 10, 2014, at 3:44 PM, Dmitry Kan wrote: > a correction: actually when I tested the above change I had so little data, > that it didn't trigger su

Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1

2014-04-10 Thread Dmitry Kan
A correction: when I tested the above change I actually had so little data that it didn't trigger sub-shard slicing, and thus no merging of the slices. Still, it looks as if somewhere in the map-reduce contrib code there is a "link" to which Lucene version to use. Wolfgang, do you happen to know where th

Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1

2014-04-10 Thread Dmitry Kan
Thanks for responding, Wolfgang. Changing to LUCENE_43: IndexWriterConfig writerConfig = new IndexWriterConfig(Version.LUCENE_43, null); didn't affect the index format version because, I believe, if the format of the index to merge is of a higher version (4.1 in this case), it will merge

Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1

2014-04-09 Thread Wolfgang Hoschek
There is a current limitation in that the code doesn’t actually look into solrconfig.xml for the version. We should fix this, indeed. See https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/map-reduce/src/java/org/apache/solr/hadoop/TreeMergeOutputFormat.java#L100-101 Wolfgang. On Apr