First off, starting to play with the DataImportHandler (DIH) more... very cool.

I have a rather simple case where I am indexing an RSS feed that contains articles. For one or more articles, I have an entry in a database that contains the URL of the article and a rating.

My config is appended below in [1]. The DB looks like [2]. The RSS feed comes from my blog (heh, it's convenient and I control it).

Now, my question. Let's say I have an initial set of ratings for a feed. I then do a full import of the articles on that feed. Everything is peachy so far. Then I get a new rating for an article that I've already indexed, so the child entity (named "rating") has a delta. However, when I run the delta-import, it doesn't pick up any changes, since, I believe, the parent hasn't changed. Either that, or I am doing something wrong. It seems akin to the parentDeltaQuery problem, but, of course, there is no parent query since there is no parent table, in the DB sense, or at least not how I see it. The relevant logs are in [3].
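
To make the scenario concrete, here is a minimal sketch of the condition I expect the delta to pick up. It is not DIH code: it uses Python with an in-memory SQLite table as a stand-in for the Postgres table in [2], and the article URL, the timestamps, and the changed_ratings helper are all made up for illustration.

```python
import sqlite3

# Stand-in for the Postgres "feeds" table in [2]; SQLite is used only to
# keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feeds ("
    " feed TEXT PRIMARY KEY,"
    " rating REAL,"
    " last_modified TEXT)"
)


def changed_ratings(conn, link, last_index_time):
    """Apply the same filter as the deltaQuery on the "rating" entity."""
    return conn.execute(
        "SELECT rating FROM feeds WHERE feed = ? AND last_modified > ?",
        (link, last_index_time),
    ).fetchall()


# Hypothetical article URL and timestamps, for illustration only.
link = "http://example.com/article-1"
conn.execute(
    "INSERT INTO feeds VALUES (?, ?, ?)",
    (link, 3.0, "2008-10-01 09:00:00"),
)

# Plays the role of ${dataimporter.last_index_time} after the full import.
last_index_time = "2008-10-02 12:00:00"

# Nothing has changed since the full import, so no rows are expected.
before = changed_ratings(conn, link, last_index_time)

# A new rating arrives for the already-indexed article.
conn.execute(
    "UPDATE feeds SET rating = ?, last_modified = ? WHERE feed = ?",
    (4.5, "2008-10-03 11:00:00", link),
)

# The updated row now passes the last_modified filter.
after = changed_ratings(conn, link, last_index_time)
print(before, after)
```

The sketch only shows that the row matches once the article link is known. It says nothing about how DIH resolves ${solrFeed.link} during a delta-import, which is the part I am unsure about.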

Is this case handled? If not, any suggestions for alternatives? Any help would be appreciated.

Thanks,
Grant

[1]
<dataConfig>
<dataSource name="ratings" driver="org.postgresql.Driver" url="jdbc:postgresql://localhost:5432/db" user="user" />
  <dataSource name="rss" type="HttpDataSource" encoding="UTF-8"/>
        <document>
                <entity name="solrFeed"
                                pk="link"
                                
                                url="http://lucene.grantingersoll.com/category/solr/feed"
                                processor="XPathEntityProcessor"
                                forEach="/rss/channel | /rss/channel/item"
            dataSource="rss"
        transformer="DateFormatTransformer">
                        <field column="source" xpath="/rss/channel/title" commonField="true" />
                        <field column="source-link" xpath="/rss/channel/link" commonField="true" />
                        <field column="title" xpath="/rss/channel/item/title" />
                        <field column="link" xpath="/rss/channel/item/link" />
                        <field column="description" xpath="/rss/channel/item/description" />
                        <field column="category" xpath="/rss/channel/item/category" />
      <field column="content" xpath="/rss/channel/item/content" />
      <!-- 'Sun, 18 May 2008 11:23:11 +0000' -->
<field column="date" xpath="/rss/channel/item/pubDate" dateTimeFormat="EEE, dd MMM yyyy HH:mm:ss Z" />
                        
                        <entity name="rating"
                                pk="feed"
                                query="select rating from feeds where feed = '${solrFeed.link}'"
                                deltaQuery="select rating from feeds where feed = '${solrFeed.link}' AND last_modified > '${dataimporter.last_index_time}'"
                                dataSource="ratings">
        <field column="rating" name="rating"/>
      </entity>
    </entity>
        </document>
</dataConfig>

 [2]
 \d feeds
                  Table "public.feeds"
    Column     |            Type             | Modifiers
---------------+-----------------------------+-----------
 feed          | character varying(4096)     | not null
 rating        | double precision            |
 last_modified | timestamp without time zone |
Indexes:
    "feeds_pkey" PRIMARY KEY, btree (feed)

[3]
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 8
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: rating rows obtained : 0
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running DeletedRowKey() for Entity: rating
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: rating rows obtained : 0
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: rating
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: solrFeed
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: solrFeed rows obtained : 0
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running DeletedRowKey() for Entity: solrFeed
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: solrFeed rows obtained : 0
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: solrFeed
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.SolrWriter persistStartTime
INFO: Wrote last indexed time to dataimport.properties
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Delta Import completed successfully
