On 17.09.2010, at 05:40, Lance Norskog wrote:

> Database optimization is not like program optimization- it is wildly 
> unpredictable.

Well, an RDBMS that cannot treat true != false as a no-op during the planning 
stage doesn't even cover the basics of optimization.
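
For illustration, after variable substitution the planner sees roughly the
following (timestamp value made up):

    -- clean=true: the first predicate is constantly true, so the timestamp
    -- filter is irrelevant and the whole table is read
    SELECT * FROM foo WHERE 'true' != 'false' OR last_updated > '2010-09-16 05:40:00'

    -- clean=false: the first predicate is constantly false, so only the
    -- timestamp filter remains
    SELECT * FROM foo WHERE 'false' != 'false' OR last_updated > '2010-09-16 05:40:00'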

In any RDBMS, though, this approach is much more efficient than first reading 
out the ids of the changed rows. Furthermore, it gets rid of an essentially 
redundant query definition, which improves readability and maintainability.
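
For comparison, the conventional setup needs three query definitions, roughly
like this (column names are just placeholders matching the example below):

    <entity name="person" pk="id"
            query="SELECT * FROM foo"
            deltaQuery="SELECT id FROM foo
                        WHERE last_updated > '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT * FROM foo WHERE id = '${dataimporter.delta.id}'">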

> What bugs me about the delta approach is using the last time DIH ran, rather 
> than a timestamp from the DB. Oh well. Also, with SOLR-1499 you can query 
> Solr directly to see what it has.

Yeah, it would be nice to be able to tell DIH to store the timestamp in some 
table. In other words, there should be a way to run arbitrary SQL before and 
after the import, with the new last-update timestamp available to those 
statements.
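
A sketch of what I mean, assuming a simple hand-made status table (all names
made up; DIH cannot run these statements for you today):

    -- bookkeeping table holding the last successful index time per entity
    CREATE TABLE dih_status (
        entity     VARCHAR(64) PRIMARY KEY,
        last_index TIMESTAMP NOT NULL
    );

    -- to be run by DIH after a successful import, with the new timestamp
    -- substituted in
    UPDATE dih_status SET last_index = '2010-09-17 05:40:00' WHERE entity = 'person';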

> 
> Lukas Kahwe Smith wrote:
>> Hi,
>> 
>> I think I have mentioned this approach before on this list, but I really 
>> think that the deltaQuery approach currently explained as the "way to do 
>> updates" is far from ideal. It seems to add a lot of redundant queries.
>> 
>> I therefore propose to merge the initial import and delta queries using the 
>> approach below:
>> 
>>         <entity name="person" query="SELECT * FROM foo
>>             WHERE '${dataimporter.request.clean}' != 'false'
>>                OR last_updated > '${dataimporter.last_index_time}'">
>> 
>> Using this approach, when clean = true the "last_updated > 
>> '${dataimporter.last_index_time}'" condition should be optimized out by any 
>> sane RDBMS. And if clean = false, it basically triggers the delta query part 
>> to be evaluated.
>> 
>> Is there any downside to this approach? Should this be added to the wiki?
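
If this goes on the wiki it is probably worth showing how to trigger it as
well; assuming the handler is registered at /dataimport it would be roughly:

    # full rebuild (index is cleared first)
    http://localhost:8983/solr/dataimport?command=full-import&clean=true

    # incremental update through the very same query
    http://localhost:8983/solr/dataimport?command=full-import&clean=false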

Lukas Kahwe Smith
m...@pooteeweet.org


