Thanks, Wolfgang! Appreciate your support. Is there any plan to make it possible to update/delete existing SOLR docs using the MapReduceIndexerTool? Is such a thing even possible given the way it works behind the curtains?
Costi

On Tue, May 6, 2014 at 3:58 PM, Wolfgang Hoschek <whosc...@cloudera.com> wrote:

> Yes, this is a known issue. Repeatedly running the MapReduceIndexerTool on
> the same set of input files can result in duplicate entries in the Solr
> collection. This occurs because currently the tool can only insert
> documents and cannot update or delete existing Solr documents.
>
> Wolfgang.
>
> On May 6, 2014, at 3:08 PM, Costi Muraru <costimur...@gmail.com> wrote:
>
> > Hi guys,
> >
> > I've used the MapReduceIndexerTool [1] in order to import data into SOLR
> > and seem to have stumbled upon something. I've followed the tutorial [2]
> > and managed to import data into a SolrCloud cluster using the map reduce
> > job. I ran the job a second time in order to update some of the existing
> > documents. The job itself was successful, but the documents maintained
> > the same field values as before.
> > In order to update some fields for the existing IDs, I've decompiled the
> > AVRO sample file
> > (examples/test-documents/sample-statuses-20120906-141433-medium.avro),
> > updated some of the fields with new values, while maintaining the same
> > IDs, and packaged the AVRO back. After this I ran the MapReduceIndexerTool
> > and, although successful, the records were not updated.
> > I've tried this several times. Even with a few documents the result is
> > the same - the documents are not being updated with the new values.
> > Instead, the old field values are kept.
> > If I manually delete the old document from SOLR and after this I run the
> > job, the document is inserted with the new values.
> >
> > Do you guys have any experience with this tool? Is this something by
> > design / Am I missing something? Can this behavior be overridden to force
> > an update? Any feedback is greatly appreciated.
> >
> > Thanks,
> > Constantin
> >
> > [1]
> > http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_mapreduceindexertool.html#csug_topic_6_1
> >
> > [2]
> > http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_batch_index_to_solr_servers_using_golive.html
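
The workaround mentioned in the quoted message (manually deleting the old documents from Solr before re-running the job) could be scripted roughly as below. This is only a sketch, not part of the MapReduceIndexerTool: it builds a delete-by-ID request in Solr's JSON update format and leaves the sending to you. The host, the collection name "collection1", and the document IDs are assumptions for illustration, and the exact update path may differ between Solr versions.

```python
import json
import urllib.request


def build_delete_request(solr_url, collection, doc_ids):
    """Build (but do not send) a Solr delete-by-ID request.

    solr_url and collection are placeholders for your own deployment.
    """
    # Solr's JSON update format accepts a list of IDs under "delete".
    payload = json.dumps({"delete": list(doc_ids)}).encode("utf-8")
    # commit=true asks Solr to commit right after applying the delete.
    url = "{}/{}/update?commit=true".format(solr_url.rstrip("/"), collection)
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, collection, and IDs, for illustration only.
req = build_delete_request(
    "http://localhost:8983/solr", "collection1", ["id-1", "id-2"]
)
# To actually send it against a live server: urllib.request.urlopen(req)
```

Once the deletes are committed, re-running the indexing job should insert the documents with the new field values, as described in the original message.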