Hi guys,

I've used the MapReduceIndexerTool [1] to import data into Solr and seem to have stumbled upon something. Following the tutorial [2], I managed to import data into a SolrCloud cluster using the MapReduce job. I then ran the job a second time in order to update some of the existing documents. The job itself was successful, but the documents kept the same field values as before.

To update some fields for the existing IDs, I decoded the Avro sample file (examples/test-documents/sample-statuses-20120906-141433-medium.avro), changed some field values while keeping the same IDs, and re-encoded the file. After this I ran the MapReduceIndexerTool again and, although the job succeeded, the records were not updated. I've tried this several times; even with only a few documents the result is the same: the documents keep their old field values instead of picking up the new ones. If I manually delete the old document from Solr and then run the job, the document is inserted with the new values.
Do you guys have any experience with this tool? Is this behavior by design, or am I missing something? Can it be overridden to force an update?

Any feedback is gladly appreciated.

Thanks,
Constantin

[1] http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_mapreduceindexertool.html#csug_topic_6_1
[2] http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_batch_index_to_solr_servers_using_golive.html