The problem was an incorrect pk definition in data-config.xml:

<entity name="blog" pk="id" .......
    <field column="uuid" name="uuid" template="blog-${blog.id}" />
    <field column="id" name="blog_id" />

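For reference, a minimal sketch of the blog entity with the pk pointing at uuid instead of id. The only intended change is the pk value; every other attribute is copied from the data-config.xml quoted further down in this thread, and the thread does not say whether anything else was touched. The `&lt;` escapes are an assumption made to keep the fragment well-formed XML; the quoted original shows a bare `<` there.

```xml
<!-- Sketch only: blog entity from the quoted data-config.xml, pk changed from "id" to "uuid" -->
<entity name="blog"
        pk="uuid"
        query="SELECT id,description,1 as type FROM blogs WHERE status=2"
        deltaImportQuery="SELECT id,description,1 as type FROM blogs
                          WHERE status=2 AND id='${dataimporter.delta.id}'"
        deltaQuery="SELECT id FROM blogs
                    WHERE '${dataimporter.last_index_time}' &lt; modified AND status=2"
        deletedPkQuery="SELECT id FROM blogs
                        WHERE '${dataimporter.last_index_time}' &lt;= modified AND status=3"
        transformer="TemplateTransformer">
    <!-- uuid is built by TemplateTransformer and matches <uniqueKey>uuid</uniqueKey> in the schema -->
    <field column="uuid" name="uuid" template="blog-${blog.id}" />
    <field column="id" name="blog_id" />
    <field column="description" name="content" />
    <field column="type" name="type" />
</entity>
```

The entry entity would presumably get the same one-attribute change.
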
The pk attribute needs to be the same as the Solr uniqueKey, so in my case
changing the pk value from id to uuid solved the problem.

2010/12/7 Matti Oinas <matti.oi...@gmail.com>:
> Thanks Koji.
>
> The problem seems to be that the template transformer is not used when
> a delete is performed.
>
> ...
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
> INFO: Completed ModifiedRowKey for Entity: entry rows obtained : 0
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
> INFO: Completed DeletedRowKey for Entity: entry rows obtained : 1223
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
> INFO: Completed parentDeltaQuery for Entity: entry
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder deleteAll
> INFO: Deleting stale documents
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
> INFO: Deleting document: 787
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
> INFO: Deleting document: 786
> ...
>
> There are entries with id 787 and 786 in the database and those are
> marked as deleted. The query returns the right number of deleted
> documents and the right rows from the database, but the delete fails
> because Solr is using the plain numeric id when deleting a document.
> The same happens with blogs also.
>
> Matti
>
>
> 2010/12/4 Koji Sekiguchi <k...@r.email.ne.jp>:
>> (10/11/17 20:18), Matti Oinas wrote:
>>>
>>> Solr does not delete documents from index although delta-import says
>>> it has deleted n documents from index. I'm using version 1.4.1.
>>>
>>> The schema looks like
>>>
>>> <fields>
>>>     <field name="uuid" type="string" indexed="true" stored="true" required="true" />
>>>     <field name="type" type="int" indexed="true" stored="true" required="true" />
>>>     <field name="blog_id" type="int" indexed="true" stored="true" />
>>>     <field name="entry_id" type="int" indexed="false" stored="true" />
>>>     <field name="content" type="textgen" indexed="true" stored="true" />
>>> </fields>
>>> <uniqueKey>uuid</uniqueKey>
>>>
>>>
>>> Relevant fields from database tables:
>>>
>>> TABLE: blogs and entries both have
>>>
>>>   Field: id
>>>    Type: int(11)
>>>    Null: NO
>>>     Key: PRI
>>> Default: NULL
>>>   Extra: auto_increment
>>> ------------------------------------
>>>   Field: modified
>>>    Type: datetime
>>>    Null: YES
>>>     Key:
>>> Default: NULL
>>>   Extra:
>>> ------------------------------------
>>>   Field: status
>>>    Type: tinyint(1) unsigned
>>>    Null: YES
>>>     Key:
>>> Default: NULL
>>>   Extra:
>>>
>>>
>>> <?xml version="1.0" encoding="UTF-8" ?>
>>> <dataConfig>
>>>     <dataSource type="JdbcDataSource"
>>>                 driver="com.mysql.jdbc.Driver".../>
>>>     <document>
>>>         <entity name="blog"
>>>                 pk="id"
>>>                 query="SELECT id,description,1 as type FROM blogs WHERE status=2"
>>>                 deltaImportQuery="SELECT id,description,1 as type FROM blogs
>>>                     WHERE status=2 AND id='${dataimporter.delta.id}'"
>>>                 deltaQuery="SELECT id FROM blogs
>>>                     WHERE '${dataimporter.last_index_time}'< modified AND status=2"
>>>                 deletedPkQuery="SELECT id FROM blogs
>>>                     WHERE '${dataimporter.last_index_time}'<= modified AND status=3"
>>>                 transformer="TemplateTransformer">
>>>             <field column="uuid" name="uuid" template="blog-${blog.id}" />
>>>             <field column="id" name="blog_id" />
>>>             <field column="description" name="content" />
>>>             <field column="type" name="type" />
>>>         </entity>
>>>         <entity name="entry"
>>>                 pk="id"
>>>                 query="SELECT f.id as id,f.content,f.blog_id,2 as type
>>>                     FROM entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
>>>                 deltaImportQuery="SELECT f.id as id,f.content,f.blog_id,2 as type
>>>                     FROM entries f,blogs b WHERE f.blog_id=b.id AND
>>>                     f.id='${dataimporter.delta.id}'"
>>>                 deltaQuery="SELECT f.id as id FROM entries f JOIN blogs b ON
>>>                     b.id=f.blog_id WHERE '${dataimporter.last_index_time}'< b.modified
>>>                     AND b.status=2"
>>>                 deletedPkQuery="SELECT f.id as id FROM entries f JOIN blogs b ON
>>>                     b.id=f.blog_id WHERE b.status!=2 AND
>>>                     '${dataimporter.last_index_time}' < b.modified"
>>>                 transformer="HTMLStripTransformer,TemplateTransformer">
>>>             <field column="uuid" name="uuid" template="entry-${entry.id}" />
>>>             <field column="id" name="entry_id" />
>>>             <field column="blog_id" name="blog_id" />
>>>             <field column="content" name="content" stripHTML="true" />
>>>             <field column="type" name="type" />
>>>         </entity>
>>>     </document>
>>> </dataConfig>
>>>
>>> Full import and delta import work without problems when it comes to
>>> adding new documents to the index, but when a blog is deleted (status
>>> is set to 3 in the database), the Solr report after delta import is
>>> something like "Indexing completed. Added/Updated: 0 documents.
>>> Deleted 81 documents.". The problem is that the documents are still
>>> found in the Solr index.
>>>
>>> 1. UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;
>>>
>>> 2. delta-import =>
>>>
>>> <str name="">
>>> Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.
>>> </str>
>>> <str name="Committed">2010-11-17 13:00:50</str>
>>> <str name="Optimized">2010-11-17 13:00:50</str>
>>>
>>> So Solr says it has deleted the documents and that the index was also
>>> optimized and committed after the operation.
>>>
>>> 3. Search; blog_id:26 still returns 1 document with type 1 (blog) and
>>> 80 documents with type 2 (entry).
>>>
>>
>> Hi Matti,
>>
>> Can you see something like the following "Completed DeletedRowKey for
>> Entity" and then "Deleting document: ID-1" in your Solr log?
>>
>> (sample messages from my Solr log)
>> Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
>> INFO: Completed DeletedRowKey for Entity: product rows obtained : 2
>> :
>> Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder deleteAll
>> INFO: Deleting stale documents
>> Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
>> INFO: Deleting document: OVEN-2
>> :
>>
>> If you cannot find these messages, I think some setting is incorrect
>> (but I couldn't find any incorrect ones in your data-config.xml...).
>>
>> Koji
>> --
>> http://www.rondhuit.com/en/
>>
>