First off, I've been starting to play with the DataImportHandler (DIH)
more, and it's very cool.
I have a rather simple case where I am indexing an RSS feed that
contains articles. For one or more articles, I have an entry in a
database that contains the URL of the article and a rating.
My config is appended below in [1]. The DB looks like [2]. The RSS
feed comes from my blog (heh, it's convenient and I control it).
Now, my question. Let's say I have an initial set of ratings for a
feed. I then do a full import of the articles on that feed.
Everything is peachy so far. Then I get a new rating for an existing
article that I've already indexed, so the child entity (named "rating")
has a delta. However, when I run the delta-import, it doesn't pick
up any changes because, I believe, the parent hasn't changed. Either
that, or I am doing something wrong. It seems akin to the
parentDeltaQuery problem but, of course, there is no parent query,
since there is no parent table, in the DB sense at least, as I see it.
The relevant logs are in [3].
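To make the comparison concrete: in the all-database case, as I understand the DIH docs, a changed child row is mapped back to its parent with parentDeltaQuery, roughly as in the sketch below. The "item" and "feature" tables and their columns are hypothetical and not part of my setup. My parent is an RSS feed rather than a table, so I don't see what the equivalent would be.

```xml
<!-- Sketch only: hypothetical "item" (parent) and "feature" (child) tables -->
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/db" user="user" />
  <document>
    <!-- Parent entity: its deltaQuery returns the keys of changed parent rows -->
    <entity name="item" pk="id"
            query="select * from item"
            deltaQuery="select id from item where last_modified &gt; '${dataimporter.last_index_time}'">
      <!-- Child entity: parentDeltaQuery maps a changed child row back to its parent key -->
      <entity name="feature" pk="item_id"
              query="select description from feature where item_id = '${item.id}'"
              deltaQuery="select item_id from feature where last_modified &gt; '${dataimporter.last_index_time}'"
              parentDeltaQuery="select id from item where id = '${feature.item_id}'" />
    </entity>
  </document>
</dataConfig>
```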
Is this case handled? If not, any suggestions for alternatives? Any
help would be appreciated.
Thanks,
Grant
[1]
<dataConfig>
<dataSource name="ratings" driver="org.postgresql.Driver"
url="jdbc:postgresql://localhost:5432/db" user="user" />
<dataSource name="rss" type="HttpDataSource" encoding="UTF-8"/>
<document>
<entity name="solrFeed"
pk="link"
url="http://lucene.grantingersoll.com/category/solr/feed"
processor="XPathEntityProcessor"
forEach="/rss/channel | /rss/channel/item"
dataSource="rss"
transformer="DateFormatTransformer">
<field column="source" xpath="/rss/channel/title"
commonField="true" />
<field column="source-link" xpath="/rss/channel/link"
commonField="true" />
<field column="title" xpath="/rss/channel/item/title" />
<field column="link" xpath="/rss/channel/item/link" />
<field column="description"
xpath="/rss/channel/item/description" />
<field column="category"
xpath="/rss/channel/item/category" />
<field column="content" xpath="/rss/channel/item/content" />
<!-- 'Sun, 18 May 2008 11:23:11 +0000' -->
<field column="date" xpath="/rss/channel/item/pubDate"
dateTimeFormat="EEE, dd MMM yyyy HH:mm:ss Z" />
<entity name="rating" pk="feed"
query="select rating from feeds where feed = '${solrFeed.link}'"
deltaQuery="select rating from feeds where feed = '${solrFeed.link}' AND last_modified > '${dataimporter.last_index_time}'"
dataSource="ratings"
>
<field column="rating" name="rating"/>
</entity>
</entity>
</document>
</dataConfig>
[2]
\d feeds
                 Table "public.feeds"
    Column     |            Type             | Modifiers
---------------+-----------------------------+-----------
 feed          | character varying(4096)     | not null
 rating        | double precision            |
 last_modified | timestamp without time zone |
Indexes:
    "feeds_pkey" PRIMARY KEY, btree (feed)
[3]
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 8
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: rating rows obtained : 0
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running DeletedRowKey() for Entity: rating
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: rating rows obtained : 0
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: rating
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: solrFeed
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: solrFeed rows obtained : 0
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running DeletedRowKey() for Entity: solrFeed
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: solrFeed rows obtained : 0
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: solrFeed
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.SolrWriter persistStartTime
INFO: Wrote last indexed time to dataimport.properties
Oct 3, 2008 11:39:09 AM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Delta Import completed successfully