Another feature missing in DIH is the ability to pass parameters into your queries. If one could pass a named or positional parameter to an entity query, it would give a lot of freedom to optimize delta or full-load queries. One could even get creative with entity and delta queries that take ranges, or that are passed timestamps that depend on external sources.
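
As a rough sketch of what a parameterized entity query might look like, here is one that reuses the ${dataimporter.request.*} placeholder style from the quoted proposal below. The parameter names from_id and to_id, the id column, and the request URL in the comment are assumptions made up for illustration; nothing in this thread confirms them.

```xml
<!-- Sketch only. from_id and to_id are hypothetical request parameters,
     assumed to arrive on a request such as
     /dataimport?command=full-import&from_id=1&to_id=1000 -->
<entity name="person"
        query="SELECT * FROM foo
               WHERE id BETWEEN ${dataimporter.request.from_id}
                            AND ${dataimporter.request.to_id}"/>
```

The same placeholder style could in principle carry an externally supplied timestamp instead of an id range, which is the kind of freedom described above.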
My 2 cents since we are on the topic.

Thanks,
Paul Dhaliwal

On Thu, Sep 16, 2010 at 10:55 PM, Lukas Kahwe Smith <m...@pooteeweet.org> wrote:

> On 17.09.2010, at 05:40, Lance Norskog wrote:
>
> > Database optimization is not like program optimization- it is wildly unpredictable.
>
> well an RDBMS that cannot handle true != false as a NOP during the planning stage doesn't even do basics in optimization.
>
> But this approach is so much more efficient than the approach of reading out the id's of the changed rows in any RDBMS. Furthermore it gets rid of an essentially redundant query definition which improves readability and maintainability.
>
> > What bugs me about the delta approach is using the last time DIH ran, rather than a timestamp from the DB. Oh well. Also, with SOLR-1499 you can query Solr directly to see what it has.
>
> Yeah, it would be nice to be able to tell DIH to store the timestamp in some table. Aka there should be a way to run arbitrary SQL before and after and the to be stored new last update timestamp should be available.
>
> > Lukas Kahwe Smith wrote:
> >> Hi,
> >>
> >> I think i have mentioned this approach before on this list, but I really think that the deltaQuery approach which is currently explained as the "way to do updates" is far from ideal. It seems to add a lot of redundant queries.
> >>
> >> I therefore propose to merge the initial import and delta queries using the below approach:
> >>
> >> <entity name="person" query="SELECT * FROM foo
> >> WHERE '${dataimporter.request.clean}' != 'false' OR last_updated> '${dataimporter.last_index_time}'">
> >>
> >> Using this approach when clean = true the "last_updated> '${dataimporter.last_index_time}" should be optimized out by any sane RDBMS. And if clean = false it basically triggers the delta query part to be evaluated.
> >>
> >> Is there any downside to this approach? Should this be added to the wiki?
>
> Lukas Kahwe Smith
> m...@pooteeweet.org