I have two questions:
1. I am pulling data from two data sources using the DataImportHandler (DIH)
with the deltaQuery functionality. Because the data sources are queried
sequentially, some data from my second data source gets re-indexed
unnecessarily. Hopefully this helps illustrate my problem:
Assume last_index_time is 0.
At time = 1, data source 1 is polled with a query that includes
"last_modified > '${dataimporter.last_index_time}'". This pulls
data for the time interval [0,1] and takes 1 time interval.
At time = 2, data source 2 is polled with the same query. This pulls
data for the time interval [0,2] and also takes 1 time interval.
At time = 3, the import finishes and last_index_time is set to 1
(the time the import started).
The next time I run the DIH, I will be unnecessarily re-indexing data that
appeared in data source 2 in the interval [1,2].
Ideally, I'd like to have access to something like
${dataimporter.current_index_time}, so I could restrict my delta query to:
"last_modified> '${dataimporter.last_index_time}' AND last_modified <
'${dataimporter.current_index_time}'"
Is this available?
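To make the request concrete, this is roughly the configuration I would like to be able to write. It is only a sketch: ${dataimporter.current_index_time} is the hypothetical variable I am asking about (as far as I know it does not exist), and the data source names, JDBC URLs, and entity names are placeholders.

```xml
<dataConfig>
  <!-- Two named data sources; the names and URLs are placeholders. -->
  <dataSource name="ds1" driver="org.hsqldb.jdbcDriver"
              url="jdbc:hsqldb:/temp/example/ex1" user="sa" />
  <dataSource name="ds2" driver="org.hsqldb.jdbcDriver"
              url="jdbc:hsqldb:/temp/example/ex2" user="sa" />
  <document name="products">
    <!-- HYPOTHETICAL: current_index_time is the variable being requested,
         not one I know to exist. It would hold the time this import started. -->
    <entity name="item1" dataSource="ds1" pk="ID" query="select * from item"
            deltaQuery="select id from item
                        where last_modified > '${dataimporter.last_index_time}'
                        and last_modified &lt; '${dataimporter.current_index_time}'" />
    <!-- Same window for the second source, even though it is polled later. -->
    <entity name="item2" dataSource="ds2" pk="ID" query="select * from item"
            deltaQuery="select id from item
                        where last_modified > '${dataimporter.last_index_time}'
                        and last_modified &lt; '${dataimporter.current_index_time}'" />
  </document>
</dataConfig>
```

With both entities bounded by the same upper timestamp, data source 2 would stop at time 1 in the example above, so the next run would not pick up the rows from [1,2] a second time. One of the two bounds would probably need to be inclusive so that rows stamped exactly at the boundary are not skipped.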
2. I have a transient table that I query with the DIH to load my index.
After loading values into the index, I want to delete them from the
transient table. Is there a way to do this from the DIH? I tried stuffing a
delete statement into the deltaQuery attribute, but that didn't work:
<dataConfig>
  <dataSource driver="org.hsqldb.jdbcDriver"
              url="jdbc:hsqldb:/temp/example/ex" user="sa" />
  <document name="products">
    <entity name="item" pk="ID" query="select * from item"
            deltaQuery="select id from item where last_modified >
            '${dataimporter.last_index_time}'; delete from item where last_modified &lt;
            '${dataimporter.last_index_time}'">
    </entity>
  </document>
</dataConfig>
--
View this message in context:
http://www.nabble.com/DataImportHandler-current_index_time---post-completion-action-tp18498832p18498832.html
Sent from the Solr - User mailing list archive at Nabble.com.