This (almost) sounds like https://issues.apache.org/jira/browse/SOLR-2492, which 
was fixed in Solr 3.4. Are you on an earlier version?

But maybe not, because you're seeing the deleted-document count increment, and 
prior to this bug fix (I think) the deleted counter wasn't getting incremented 
either.

Perhaps this is a related bug that only happens when the deletes are added via 
a transformer?  Try a query like this without a transformer:

select uniqueID as '$deleteDocById' from table where uniqueID = '1-devpeter-1';

Does this work?  If so, you've probably stumbled on a new bug related to 
SOLR-2492.

In any case, the workaround (probably) is to manually issue a commit after 
doing your deletes.  Or, combine your deletes with add/updates in the same DIH 
run and it should commit automatically as configured.
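
For the manual commit, something along these lines might do as a rough sketch. 
It posts a <commit/> message to the XML update handler. The base URL 
(http://localhost:8983/solr) and the helper names buildCommitRequest and 
commitSolr are only assumptions for illustration; adjust the host, port and 
core path to match your install. It also assumes a JavaScript runtime with a 
global fetch (e.g. Node 18+), not the Rhino engine DIH uses for transformers.

```javascript
// Hypothetical helper: builds the URL and options for an explicit commit.
// The base URL passed in is an assumption about where Solr is listening.
function buildCommitRequest(solrBaseUrl) {
  const base = solrBaseUrl.replace(/\/+$/, ''); // drop any trailing slashes
  return {
    url: base + '/update',
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'text/xml; charset=utf-8' },
      body: '<commit/>', // Solr XML update message requesting a commit
    },
  };
}

// Hypothetical helper: sends the commit and fails loudly on a non-2xx reply.
async function commitSolr(solrBaseUrl) {
  const { url, options } = buildCommitRequest(solrBaseUrl);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error('Solr commit failed with HTTP ' + response.status);
  }
  return response.status;
}
```

You would call commitSolr('http://localhost:8983/solr') once the DIH run that 
issued the deletes has finished, then search again to see whether the document 
is gone.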

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Peter Boudreau [mailto:pe...@makeshop.jp] 
Sent: Friday, March 09, 2012 2:22 AM
To: solr-user@lucene.apache.org
Subject: Solr DIH and $deleteDocById

Hello everyone,

I've got Solr DIH up and running with no problems as far as importing data, but 
I'm now trying to add some functionality to our delta import to delete invalid 
records.

The special command $deleteDocById seems to provide what I'm looking for, and 
just for testing purposes until I get things working, I set up a simple 
transformer to delete just one document with a specific ID:

<script>
<![CDATA[ 
    function deleteBadDocs(row) {
        var uniqueID = row.get('unique_id');
        if(uniqueID == '1-devpeter-1') { 
            row.put('$deleteDocById', uniqueID); 
        }
        return row; 
    }
]]>
</script>

When I run DIH with this, sure enough, it tells me that 1 document was deleted:

Indexing completed. Added/Updated: 4755 documents. Deleted 1 documents. 

But then when I search the index, the document is still there.  I've been 
googling this for a while now, and found a number of references saying that you 
need to commit or optimize after this in order for the deletes to take effect, 
but I was under the impression that DIH both commits and optimizes by default, 
so shouldn't it be getting committed and optimized automatically by DIH?  I 
even tried explicitly setting the commit= and optimize= flags to true, but 
the deleted document was still in the index when I searched.  I also 
tried restarting Solr, but the deleted document was still there.

Could anyone help me understand why this document which is being reported as 
deleted still shows up in the index?

Also, there is one thing which I'm unclear on after reading the Solr wiki:

$deleteDocById : Delete a doc from Solr with this id. The value has to be the 
uniqueKey value of the document. Note that this command can only delete docs 
already committed to the index. 

I was starting to think that maybe $deleteDocById was only preventing documents 
from entering the index, and not deleting existing documents which were already 
in the index, but if I understand this correctly, $deleteDocById should be able 
to delete a document which was already in the index *before* running DIH, right?

Any help would be very much appreciated.

Thanks in advance,

Peter
