Hi guys,

I've used the MapReduceIndexerTool [1] to import data into Solr
and seem to have stumbled upon something. I followed the tutorial [2] and
managed to import data into a SolrCloud cluster using the MapReduce job.
I then ran the job a second time in order to update some of the existing
documents. The job itself was successful, but the documents kept the
same field values as before.
In order to update some fields for the existing IDs, I extracted the
records from the sample Avro file
(examples/test-documents/sample-statuses-20120906-141433-medium.avro),
updated some of the field values while keeping the same IDs, and
repackaged the Avro file. After this I ran the MapReduceIndexerTool again
and, although it completed successfully, the records were not updated.
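For reference, this is roughly what the edit step looked like. The Avro
dump/rebuild itself was done with the avro-tools jar (its "tojson" and
"fromjson" commands); the in-between step of rewriting field values while
preserving the IDs is sketched below with nothing but the stdlib. Field
names and values here are placeholders, not the actual schema:

```python
# Hedged sketch: avro-tools "tojson" emits one JSON object per line; after
# editing, "fromjson" (with the original schema) rebuilds the .avro file.
# This helper rewrites selected fields on each JSON-line record while
# leaving the "id" field untouched.
import json

def update_records(json_lines, new_values):
    """Apply new_values to each record; ids are preserved as-is."""
    updated = []
    for line in json_lines:
        record = json.loads(line)
        record.update(new_values)  # only the listed fields change
        updated.append(json.dumps(record))
    return updated

# Example with two placeholder records:
lines = ['{"id": "1", "text": "old"}', '{"id": "2", "text": "old"}']
print(update_records(lines, {"text": "new"}))
```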
I've tried this several times. Even with a few documents the result is the
same - the documents are not being updated with the new values. Instead,
the old field values are kept.
If I manually delete the old document from Solr and then run the job,
the document is inserted with the new values.
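In case it helps to reproduce: the manual delete was done through Solr's
standard update handler. A minimal sketch (the base URL, collection name,
and document ID are placeholders for my setup):

```python
# Hedged sketch: remove a stale document by ID via Solr's update handler
# before re-running the indexer job. Assumes a reachable Solr instance.
from urllib import request

def delete_by_id_payload(doc_id):
    """Build the XML body for a Solr delete-by-id update request."""
    return "<delete><id>%s</id></delete>" % doc_id

def delete_document(solr_base_url, doc_id):
    req = request.Request(
        solr_base_url + "/update?commit=true",
        data=delete_by_id_payload(doc_id).encode("utf-8"),
        headers={"Content-Type": "text/xml"},
    )
    return request.urlopen(req)

# Placeholder URL and ID; adjust for your cluster:
# delete_document("http://localhost:8983/solr/collection1", "123456789")
```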

Do you guys have any experience with this tool? Is this behavior by
design, or am I missing something? Can it be overridden to force an
update? Any feedback is gladly appreciated.

Thanks,
Constantin

[1]
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_mapreduceindexertool.html#csug_topic_6_1

[2]
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_batch_index_to_solr_servers_using_golive.html
