MRIT is not designed for that scenario, so you simply can't.
What people usually do is have a process whereby, after
the initial bulk load, there is some way their system-of-record
"knows" what new docs have been added since and
indexes only those. Flume is sometimes used if you have
access.
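
The "knows what is new" step can be sketched as a filter on a last-modified timestamp. This is only an illustration under assumptions: `DeltaSelector`, `SourceDoc`, its `lastModified` field, and `selectNewSince` are hypothetical names, not part of MRIT or Solr, and a real system-of-record may track changes some other way (a change log, a sequence number, and so on).

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: picks the documents that changed after the last indexing run.
class DeltaSelector {

    // Hypothetical view of one record in the system-of-record.
    static final class SourceDoc {
        final String id;
        final Instant lastModified;

        SourceDoc(String id, Instant lastModified) {
            this.id = id;
            this.lastModified = lastModified;
        }
    }

    // Returns only the documents modified strictly after lastIndexed.
    static List<SourceDoc> selectNewSince(List<SourceDoc> all, Instant lastIndexed) {
        List<SourceDoc> delta = new ArrayList<>();
        for (SourceDoc doc : all) {
            if (doc.lastModified.isAfter(lastIndexed)) {
                delta.add(doc);
            }
        }
        return delta;
    }
}
```

Only the returned documents would then be sent down the regular indexing path; the timestamp of the last successful run has to be stored somewhere outside the index for this to work.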
Best
Hi
I would like to index only new data, not data that has already been indexed
(delta indexing). How can I achieve this using MRIT?
Thanks in advance
Yes it does. MRIT is intended for initial bulk loads. It takes whatever
it's pointed at and indexes it.
Additionally, it does not update documents. If the same document (by
ID) is indexed twice, you'll wind up with two copies in your results.
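
To make the difference concrete, here is a toy in-memory model. It is not Solr or MRIT code; `ToyIndex`, `append`, and `upsert` are hypothetical names. It only contrasts append-only writes, which is the behavior described above, with an update-by-ID write.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: contrasts append-only writes with update-by-ID writes.
class ToyIndex {
    private final List<String> appendedIds = new ArrayList<>();
    private final Map<String, String> byId = new HashMap<>();

    // Append-only: every call adds another copy, even for an ID seen before.
    void append(String id, String body) {
        appendedIds.add(id);
    }

    // Update-by-ID: a second call with the same ID replaces the first copy.
    void upsert(String id, String body) {
        byId.put(id, body);
    }

    // Number of copies of the given ID in the append-only store.
    int appendedCopies(String id) {
        int count = 0;
        for (String seen : appendedIds) {
            if (seen.equals(id)) {
                count++;
            }
        }
        return count;
    }

    // Number of distinct documents in the update-by-ID store.
    int upsertedSize() {
        return byId.size();
    }
}
```

Writing the same ID twice through `append` leaves two copies, while writing it twice through `upsert` leaves one.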
Best,
Erick
On Mon, Jan 4, 2016 at 5:00 AM, vidya wr
Charles:
bq: My understanding is that this is actually somewhat slower than
the standard indexing path...
Yes and no. If you just use a single thread, you're right, it'll be
slower since it has to copy a bunch of stuff around. Then at the end,
the --go-live step copies the built index to Solr
the
The OP asked about MapReduceIndexerTool. My understanding is that this is
actually somewhat slower than the standard indexing path and is recommended
only if the site is already invested in the Hadoop infrastructure. E.g. input
files are already distributed on the Hadoop/Search cluster via HD
Thanks Shawn,
perhaps the comment on the luceneMatchVersion in the example schema.xml
could be changed to reflect / clarify this?
This comment made me think that the parameter affects the index side
of things too (that is, the index format version). I.e. I would appreciate seeing
there things lik
On 4/11/2014 12:42 AM, Dmitry Kan wrote:
> Thanks! So Solr 4.7 does not seem to respect the luceneMatchVersion at the
> binary (index) level. Or perhaps I misunderstand the meaning of
> luceneMatchVersion.
luceneMatchVersion does not dictate the index format. It is a way to
signal things lik
Thanks! So Solr 4.7 does not seem to respect the luceneMatchVersion at the
binary (index) level. Or perhaps I misunderstand the meaning of
luceneMatchVersion.
This is what I see when loading the index from HDFS via Luke and launching the
Index Checker tool:
[clip]
Segments file=segments_2 numSeg
There's no such other location in there. BTW, you can disable the mtree merge
via --reducers=-2 (or --reducers=0 in old versions).
Wolfgang.
On Apr 10, 2014, at 3:44 PM, Dmitry Kan wrote:
> a correction: actually when I tested the above change I had so little data,
> that it didn't trigger su
a correction: actually when I tested the above change I had so little data,
that it didn't trigger sub-shard slicing and thus merging of the slices.
Still, it looks as if somewhere in the map-reduce contrib code there is a
"link" to which Lucene version to use.
Wolfgang, do you happen to know where th
Thanks for responding, Wolfgang.
Changing to LUCENE_43:
IndexWriterConfig writerConfig = new IndexWriterConfig(Version.LUCENE_43, null);
didn't affect the index format version because, I believe, if the
format of the index being merged is of a higher version (4.1 in this case),
it will merge
There is a current limitation in that the code doesn’t actually look into
solrconfig.xml for the version. We should fix this, indeed. See
https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/map-reduce/src/java/org/apache/solr/hadoop/TreeMergeOutputFormat.java#L100-101
Wolfgang.
On Apr