On 17.09.2010, at 05:40, Lance Norskog wrote:

> Database optimization is not like program optimization - it is wildly
> unpredictable.
Well, an RDBMS that cannot handle true != false as a no-op during the
planning stage doesn't even get the basics of optimization right. And this
approach is much more efficient than first reading out the IDs of the changed
rows, in any RDBMS. Furthermore, it gets rid of an essentially redundant
query definition, which improves readability and maintainability.

> What bugs me about the delta approach is using the last time DIH ran,
> rather than a timestamp from the DB. Oh well. Also, with SOLR-1499 you can
> query Solr directly to see what it has.

Yeah, it would be nice to be able to tell DIH to store the timestamp in some
table. That is, there should be a way to run arbitrary SQL before and after
the import, with the new last-update timestamp that is about to be stored
available to those statements.

>
> Lukas Kahwe Smith wrote:
>> Hi,
>>
>> I think I have mentioned this approach before on this list, but I really
>> think that the deltaQuery approach which is currently explained as the
>> "way to do updates" is far from ideal. It seems to add a lot of redundant
>> queries.
>>
>> I therefore propose to merge the initial import and delta queries using
>> the approach below:
>>
>> <entity name="person" query="SELECT * FROM foo
>>     WHERE '${dataimporter.request.clean}' != 'false' OR last_updated >
>>     '${dataimporter.last_index_time}'">
>>
>> Using this approach, when clean = true the "last_updated >
>> '${dataimporter.last_index_time}'" condition should be optimized out by
>> any sane RDBMS. And if clean = false, only the delta part of the query
>> takes effect.
>>
>> Is there any downside to this approach? Should this be added to the wiki?

Lukas Kahwe Smith
m...@pooteeweet.org
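
PS: To make the clean behaviour concrete, here is a rough sketch of what the
merged query resolves to once DIH has substituted its variables (the table,
column and timestamp values are just placeholders taken from the example
above):

    -- clean=true: the flag comparison is constant-true, so the planner can
    -- drop the timestamp predicate and the statement becomes a full import
    SELECT * FROM foo
    WHERE 'true' != 'false' OR last_updated > '2010-09-17 05:40:00';

    -- clean=false: the flag comparison is constant-false, so only rows
    -- changed since the last index run are selected, i.e. a plain delta
    SELECT * FROM foo
    WHERE 'false' != 'false' OR last_updated > '2010-09-17 05:40:00';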
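
And regarding storing the timestamp in the DB: what I have in mind is being
able to run something along these lines after a successful import (the
dih_status table and handler column are made up for illustration, and NOW()
just stands in for the timestamp DIH is about to record):

    -- hypothetical bookkeeping statement run once the import finishes
    UPDATE dih_status
       SET last_index_time = NOW()
     WHERE handler = 'person';

Ideally DIH would expose the exact timestamp it records internally as
${dataimporter.last_index_time}, so that value could be used here instead
of NOW().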