The problem was an incorrect pk definition in data-config.xml:

<entity name="blog"
        pk="id"
        .......
        <field column="uuid" name="uuid" template="blog-${blog.id}" />
        <field column="id" name="blog_id" />

The pk attribute needs to be the same as the Solr uniqueKey (uuid in my
schema), so in my case changing the pk value from id to uuid solved the
problem.
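
As a rough sketch, the blog entity after that change would look something like the following. This assumes that only the pk attribute differs and that every query and field mapping stays exactly as in the original config quoted below; whether the queries need further adjustment in other setups is not covered here.

```xml
<!-- Sketch only: identical to the original blog entity except for pk -->
<entity name="blog"
        pk="uuid"
        query="SELECT id,description,1 as type FROM blogs WHERE status=2"
        deltaImportQuery="SELECT id,description,1 as type FROM blogs WHERE status=2 AND id='${dataimporter.delta.id}'"
        deltaQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}'&lt; modified AND status=2"
        deletedPkQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}'&lt;= modified AND status=3"
        transformer="TemplateTransformer">
    <!-- uuid is built by TemplateTransformer and corresponds to <uniqueKey>uuid</uniqueKey> in schema.xml -->
    <field column="uuid" name="uuid" template="blog-${blog.id}" />
    <field column="id" name="blog_id" />
    <field column="description" name="content" />
    <field column="type" name="type" />
</entity>
```

The entry entity in the quoted config also declares pk="id", so presumably it needs the same change.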


2010/12/7 Matti Oinas <matti.oi...@gmail.com>:
> Thanks Koji.
>
> The problem seems to be that the template transformer is not used when
> the delete is performed.
>
> ...
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
> INFO: Completed ModifiedRowKey for Entity: entry rows obtained : 0
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
> INFO: Completed DeletedRowKey for Entity: entry rows obtained : 1223
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
> INFO: Completed parentDeltaQuery for Entity: entry
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder deleteAll
> INFO: Deleting stale documents
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
> INFO: Deleting document: 787
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
> INFO: Deleting document: 786
> ...
>
> There are entries with ids 787 and 786 in the database, and they are
> marked as deleted. The query returns the right number of deleted
> documents and the right rows from the database, but the delete fails
> because Solr uses the plain numeric id when deleting a document, not
> the templated uuid. The same happens with blogs.
>
> Matti
>
>
> 2010/12/4 Koji Sekiguchi <k...@r.email.ne.jp>:
>> (10/11/17 20:18), Matti Oinas wrote:
>>>
>>> Solr does not delete documents from index although delta-import says
>>> it has deleted n documents from index. I'm using version 1.4.1.
>>>
>>> The schema looks like
>>>
>>>  <fields>
>>>     <field name="uuid" type="string" indexed="true" stored="true" required="true" />
>>>     <field name="type" type="int" indexed="true" stored="true" required="true" />
>>>     <field name="blog_id" type="int" indexed="true" stored="true" />
>>>     <field name="entry_id" type="int" indexed="false" stored="true" />
>>>     <field name="content" type="textgen" indexed="true" stored="true" />
>>>  </fields>
>>>  <uniqueKey>uuid</uniqueKey>
>>>
>>>
>>> Relevant fields from database tables:
>>>
>>> TABLE: blogs and entries both have
>>>
>>>   Field: id
>>>    Type: int(11)
>>>    Null: NO
>>>     Key: PRI
>>> Default: NULL
>>>   Extra: auto_increment
>>> ------------------------------------
>>>   Field: modified
>>>    Type: datetime
>>>    Null: YES
>>>     Key:
>>> Default: NULL
>>>   Extra:
>>> ------------------------------------
>>>   Field: status
>>>    Type: tinyint(1) unsigned
>>>    Null: YES
>>>     Key:
>>> Default: NULL
>>>   Extra:
>>>
>>>
>>> <?xml version="1.0" encoding="UTF-8" ?>
>>> <dataConfig>
>>>        <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver".../>
>>>        <document>
>>>                <entity name="blog"
>>>                        pk="id"
>>>                        query="SELECT id,description,1 as type FROM blogs WHERE status=2"
>>>                        deltaImportQuery="SELECT id,description,1 as type FROM blogs WHERE status=2 AND id='${dataimporter.delta.id}'"
>>>                        deltaQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}'&lt; modified AND status=2"
>>>                        deletedPkQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}'&lt;= modified AND status=3"
>>>                        transformer="TemplateTransformer">
>>>                        <field column="uuid" name="uuid" template="blog-${blog.id}" />
>>>                        <field column="id" name="blog_id" />
>>>                        <field column="description" name="content" />
>>>                        <field column="type" name="type" />
>>>                </entity>
>>>                <entity name="entry"
>>>                        pk="id"
>>>                        query="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
>>>                        deltaImportQuery="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND f.id='${dataimporter.delta.id}'"
>>>                        deltaQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE '${dataimporter.last_index_time}'&lt; b.modified AND b.status=2"
>>>                        deletedPkQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE b.status!=2 AND '${dataimporter.last_index_time}' &lt; b.modified"
>>>                        transformer="HTMLStripTransformer,TemplateTransformer">
>>>                        <field column="uuid" name="uuid" template="entry-${entry.id}" />
>>>                        <field column="id" name="entry_id" />
>>>                        <field column="blog_id" name="blog_id" />
>>>                        <field column="content" name="content" stripHTML="true" />
>>>                        <field column="type" name="type" />
>>>                </entity>
>>>        </document>
>>> </dataConfig>
>>>
>>> Full import and delta import work without problems when it comes to
>>> adding new documents to the index, but when a blog is deleted (status
>>> is set to 3 in the database), the Solr report after delta import is
>>> something like "Indexing completed. Added/Updated: 0 documents.
>>> Deleted 81 documents.". The problem is that the documents are still
>>> found in the Solr index.
>>>
>>> 1. UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;
>>>
>>> 2. delta-import =>
>>>
>>> <str name="">
>>> Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.
>>> </str>
>>> <str name="Committed">2010-11-17 13:00:50</str>
>>> <str name="Optimized">2010-11-17 13:00:50</str>
>>>
>>> So Solr says it has deleted the documents and that the index was also
>>> optimized and committed after the operation.
>>>
>>> 3. Search; blog_id:26 still returns 1 document with type 1 (blog) and
>>> 80 documents with type 2 (entry).
>>>
>>
>> Hi Matti,
>>
>> Can you see something like the following "Completed DeletedRowKey for Entity"
>> and then "Deleting document: ID-1" in your solr log?
>>
>> (sample messages from my Solr log)
>> Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
>> INFO: Completed DeletedRowKey for Entity: product rows obtained : 2
>>  :
>> Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder deleteAll
>> INFO: Deleting stale documents
>> Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
>> INFO: Deleting document: OVEN-2
>>  :
>>
>> If you cannot find these messages, I think some setting is incorrect
>> (but I couldn't find anything wrong in your data-config.xml...).
>>
>> Koji
>> --
>> http://www.rondhuit.com/en/
>>
>
