Thanks Koji. The problem seems to be that the TemplateTransformer is not applied when a delete is performed.
...
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: entry rows obtained : 0
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: entry rows obtained : 1223
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: entry
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder deleteAll
INFO: Deleting stale documents
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 787
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 786
...

There are entries with ids 787 and 786 in the database, and both are marked as deleted. The deletedPkQuery returns the right number of deleted documents and the right rows from the database, but the delete fails because Solr deletes by the plain numeric id (787) instead of by the uuid built from the template (entry-787). The same happens with blogs.

Matti

2010/12/4 Koji Sekiguchi <k...@r.email.ne.jp>:
> (10/11/17 20:18), Matti Oinas wrote:
>>
>> Solr does not delete documents from the index although delta-import says
>> it has deleted n documents from the index. I'm using version 1.4.1.
>>
>> The schema looks like
>>
>> <fields>
>>   <field name="uuid" type="string" indexed="true" stored="true" required="true" />
>>   <field name="type" type="int" indexed="true" stored="true" required="true" />
>>   <field name="blog_id" type="int" indexed="true" stored="true" />
>>   <field name="entry_id" type="int" indexed="false" stored="true" />
>>   <field name="content" type="textgen" indexed="true" stored="true" />
>> </fields>
>> <uniqueKey>uuid</uniqueKey>
>>
>>
>> Relevant fields from the database tables:
>>
>> TABLE: blogs and entries both have
>>
>> Field: id
>> Type: int(11)
>> Null: NO
>> Key: PRI
>> Default: NULL
>> Extra: auto_increment
>> ------------------------------------
>> Field: modified
>> Type: datetime
>> Null: YES
>> Key:
>> Default: NULL
>> Extra:
>> ------------------------------------
>> Field: status
>> Type: tinyint(1) unsigned
>> Null: YES
>> Key:
>> Default: NULL
>> Extra:
>>
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
>> <dataConfig>
>>   <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver".../>
>>   <document>
>>     <entity name="blog"
>>             pk="id"
>>             query="SELECT id,description,1 as type FROM blogs WHERE status=2"
>>             deltaImportQuery="SELECT id,description,1 as type FROM blogs WHERE status=2 AND id='${dataimporter.delta.id}'"
>>             deltaQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}'< modified AND status=2"
>>             deletedPkQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}'<= modified AND status=3"
>>             transformer="TemplateTransformer">
>>       <field column="uuid" name="uuid" template="blog-${blog.id}" />
>>       <field column="id" name="blog_id" />
>>       <field column="description" name="content" />
>>       <field column="type" name="type" />
>>     </entity>
>>     <entity name="entry"
>>             pk="id"
>>             query="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
>>             deltaImportQuery="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND f.id='${dataimporter.delta.id}'"
>>             deltaQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE '${dataimporter.last_index_time}'< b.modified AND b.status=2"
>>             deletedPkQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE b.status!=2 AND '${dataimporter.last_index_time}' < b.modified"
>>             transformer="HTMLStripTransformer,TemplateTransformer">
>>       <field column="uuid" name="uuid" template="entry-${entry.id}" />
>>       <field column="id" name="entry_id" />
>>       <field column="blog_id" name="blog_id" />
>>       <field column="content" name="content" stripHTML="true" />
>>       <field column="type" name="type" />
>>     </entity>
>>   </document>
>> </dataConfig>
>>
>> Full import and delta import work without problems when it comes to
>> adding new documents to the index, but when a blog is deleted (status is
>> set to 3 in the database), the Solr report after delta import is something
>> like "Indexing completed. Added/Updated: 0 documents. Deleted 81
>> documents.". The problem is that the documents can still be found in the
>> Solr index.
>>
>> 1. UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;
>>
>> 2. delta-import =>
>>
>> <str name="">
>> Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.
>> </str>
>> <str name="Committed">2010-11-17 13:00:50</str>
>> <str name="Optimized">2010-11-17 13:00:50</str>
>>
>> So Solr says it has deleted the documents and that the index was also
>> optimized and committed after the operation.
>>
>> 3. Search: blog_id:26 still returns 1 document with type 1 (blog) and
>> 80 documents with type 2 (entry).
>>
>
> Hi Matti,
>
> Can you see something like the following "Completed DeletedRowKey for
> Entity" and then "Deleting document: ID-1" in your Solr log?
>
> (sample messages from my Solr log)
> Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
> INFO: Completed DeletedRowKey for Entity: product rows obtained : 2
> :
> Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder deleteAll
> INFO: Deleting stale documents
> Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
> INFO: Deleting document: OVEN-2
> :
>
> If you cannot find these messages, I think some setting is incorrect
> (but I couldn't find anything incorrect in your data-config.xml...).
>
> Koji
> --
> http://www.rondhuit.com/en/
>
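A possible workaround, offered as an untested sketch: the Dec 7 log suggests that DIH deletes using the raw value of the pk column returned by deletedPkQuery, without running the TemplateTransformer, so Solr is asked to delete 787 instead of entry-787 (Koji's log, by contrast, shows a pk value that is already the full id, OVEN-2). Assuming that is the whole story, building the uuid inside deletedPkQuery itself should make the delete id match the uuid uniqueKey. The fragment below assumes MySQL's CONCAT() and keeps the alias id so the column still matches pk="id". Everything except the two deletedPkQuery attributes is copied from the config quoted above, with < written as &lt; because XML does not allow a raw < inside an attribute value.

```xml
<document>
  <!-- Sketch: deletedPkQuery returns the uuid value itself (e.g. blog-26) under
       the pk alias "id", because deleted keys are not passed through the
       TemplateTransformer. Adds/updates still use the template as before. -->
  <entity name="blog"
          pk="id"
          query="SELECT id,description,1 as type FROM blogs WHERE status=2"
          deltaImportQuery="SELECT id,description,1 as type FROM blogs WHERE status=2 AND id='${dataimporter.delta.id}'"
          deltaQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}' &lt; modified AND status=2"
          deletedPkQuery="SELECT CONCAT('blog-', id) AS id FROM blogs WHERE '${dataimporter.last_index_time}' &lt;= modified AND status=3"
          transformer="TemplateTransformer">
    <field column="uuid" name="uuid" template="blog-${blog.id}" />
    <field column="id" name="blog_id" />
    <field column="description" name="content" />
    <field column="type" name="type" />
  </entity>
  <!-- Same idea for entries: deleted keys come back as entry-<id>. -->
  <entity name="entry"
          pk="id"
          query="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
          deltaImportQuery="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND f.id='${dataimporter.delta.id}'"
          deltaQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE '${dataimporter.last_index_time}' &lt; b.modified AND b.status=2"
          deletedPkQuery="SELECT CONCAT('entry-', f.id) AS id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE b.status!=2 AND '${dataimporter.last_index_time}' &lt; b.modified"
          transformer="HTMLStripTransformer,TemplateTransformer">
    <field column="uuid" name="uuid" template="entry-${entry.id}" />
    <field column="id" name="entry_id" />
    <field column="blog_id" name="blog_id" />
    <field column="content" name="content" stripHTML="true" />
    <field column="type" name="type" />
  </entity>
</document>
```

If this works, the delta-import log should show "Deleting document: entry-787" rather than "Deleting document: 787", and a search for blog_id:26 should no longer return the 81 documents.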