I think delta imports only work on the parent entity, and cached child entities
will load in full even if you only need to look up a few rows for the delta.
Others might know a way to get this to work, though.
Here are two possible workarounds.
On the child entity, specify:
<entity processor="SqlEntityProcessor" name="media_tags_map"
cacheImpl="${cache.impl}" />
For a full import, pass the parameter cache.impl=SortedMapBackedCache.
For delta imports, leave it blank. This (I think) will give you a cache for
the full import and no cache for the deltas.
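As a rough sketch of how the two requests could differ, the snippet below builds the request URLs for a full import (with cache.impl set) and a delta import (without it). The host, port and core name are hypothetical, and it assumes the handler is registered at /dataimport. Depending on the DIH version, the placeholder in the config may need to be written as ${dataimporter.request.cache.impl} for the request parameter to resolve.

```python
from urllib.parse import urlencode

# Hypothetical handler URL; adjust host, port and core name to your setup.
DIH_URL = "http://localhost:8983/solr/collection1/dataimport"


def import_url(command, cache_impl=None):
    """Build a DIH request URL, adding cache.impl only when a cache is wanted."""
    params = {"command": command}
    if cache_impl:
        params["cache.impl"] = cache_impl
    return DIH_URL + "?" + urlencode(params)


# Full import: the child entity is cached via SortedMapBackedCache.
full_url = import_url("full-import", cache_impl="SortedMapBackedCache")
# Delta import: cache.impl is left out, so no cache is requested.
delta_url = import_url("delta-import")

print(full_url)
print(delta_url)
```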
Another workaround is to include a subquery on your delta import like this:
Select * from table ${delta.subquery}
When it is a delta import, pass the parameter: delta.subquery=where
blah in (select blah from parent_table ...)
This will cause it to cache only the entries needed for that delta import.
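Because the subquery contains spaces and parentheses, it has to be URL-encoded when passed as a request parameter. A minimal sketch, again with a hypothetical handler URL; the clause is only the placeholder from above ("blah" and parent_table are stand-ins, and the elided condition is not filled in):

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical handler URL; adjust host, port and core name to your setup.
DIH_URL = "http://localhost:8983/solr/collection1/dataimport"


def delta_import_url(subquery):
    """Build a delta-import URL that carries the WHERE clause as delta.subquery."""
    params = {"command": "delta-import", "delta.subquery": subquery}
    return DIH_URL + "?" + urlencode(params)


# Placeholder clause; replace the column and table names with real ones.
clause = "where blah in (select blah from parent_table)"
url = delta_import_url(clause)

# Decode the query string again to confirm the clause survives encoding.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["delta.subquery"][0])
```

For a full import the parameter is simply omitted, so the query runs without the extra WHERE clause.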
James Dyer
Ingram Content Group
(615) 213-4311
-----Original Message-----
From: [email protected] [mailto:[email protected]] On
Behalf Of David Larochelle
Sent: Monday, September 23, 2013 5:22 PM
To: solr-user
Subject: Using CachedSqlEntityProcessor with delta imports in DIH
I'm trying to use the CachedSqlEntityProcessor on a child entity that also
has a delta query.
Full imports and delta imports of the parent entity work fine; however, delta
imports for the child entity have no effect. If I remove the
processor="CachedSqlEntityProcessor" attribute from the child entity, the
delta import works flawlessly but the full import is very slow.
Here's my data-config.xml:
<dataConfig>
<xi:include href="db-connection.xml"
xmlns:xi="http://www.w3.org/2001/XInclude"/>
<document>
<entity name="story_sentences"
pk="story_sentences_id"
query="select story_sentences_id || '_ss' as id, 'ss' as
field_type, * from story_sentences"
deltaImportQuery="select story_sentences_id || '_ss' as id,
'ss' as field_type, * from story_sentences
where story_sentences_id=${dataimporter.delta.id}"
deltaQuery="SELECT story_sentences_id as id, story_sentences_id
from story_sentences where db_row_last_updated >
'${dih.last_index_time}' ">
<entity name="media_tags_map"
pk="media_tags_map_id"
query="select tags_id as tags_id_media, * from media_tags_map"
cacheKey="media_id"
cacheLookup="story_sentences.media_id"
processor="CachedSqlEntityProcessor"
deltaQuery="select media_tags_map_id, media_id::varchar from
media_tags_map where db_row_last_updated > '${dih.last_index_time}' "
parentDeltaQuery="select story_sentences_id as id from
story_sentences where media_id = ${media_tags_map.media_id}"
>
</entity>
</entity>
</document>
</dataConfig>
I need to be able to run delta imports based on the media_tags_map table in
addition to the story_sentences table.
Any idea why delta imports for media_tags_map won't work when the
CachedSqlEntityProcessor is used?
I've searched extensively but can't find an example that uses both
CachedSqlEntityProcessor and deltaQuery on the sub-entity or any
explanation of why the above configuration won't work as expected.
--
Thanks,
David