On 01.06.2010, at 23:35, Chris Hostetter wrote:

> : > http://lucene.472066.n3.nabble.com/StackOverflowError-during-Delta-Import-td811053.html#a824780
>
> yeah, i remember that thread -- it really seems like a driver issue, but
> understandable that "fixing the driver" is probably more out of scope than
> "working around in solr"
>
> : I never did find a "good" solution to that bug however I did come up with a
> : workaround. I noticed if I removed my deletedPkQuery then the delta-import
> : would work as expected. Obviously I still have the need to delete items out
> : of the index during indexing so I wanted to subclass the DataImportHandler
> : to first update all documents then I would delete all the documents that my
> : deletedPkQuery would have deleted.
>
> i'm not a DIH expert, but have you considered the possibility of having two
> distinct "entities" declared in your config, that both refer to the same
> logical entity -- one that you use for the delta importing, and one that
> you use for the deletedPkQuery ?
>
> I'm not sure if it would work, but based on another recent thread i saw, i
> think it might...

To me the entire delta-query approach makes no sense, but I digress. Here is a
cut-down version of the config I use to do full imports, deletes and updates:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="${dataimporter.request.source_dsn}"
              batchSize="-1"
              user="${dataimporter.request.user}"
              password="${dataimporter.request.password}"/>
  <document>
    <entity name="deletedentity"
            query="SELECT NULL"
            pk="id"
            deletedPkQuery="SELECT e.id AS `$deleteDocById` FROM deletedentity AS e"/>
    <entity name="entity"
            query="SELECT e.id, e.status, e.name FROM entity AS e WHERE ('${dataimporter.request.clear}' != 'false' OR e.updated_at > '${dataimporter.last_index_time}')"/>
  </document>
</dataConfig>

As you can see, I have parameterized the DSN information. I also have one
query defined for the deletes and another one that serves both the full import
and the updates.

If clear is set to anything but false, the first half of the OR is true, so the
WHERE condition evaluates to true for every row and updated_at is effectively
ignored in pretty much any decent RDBMS. If it's false, then updated_at is
checked against the last index time as per usual.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org
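
For anyone who wants to check the WHERE condition outside of Solr, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration under assumptions: the table contents and timestamps are made up, SQLite stands in for MySQL, and the two DIH placeholders are passed as bound parameters, whereas DIH substitutes them into the query text.

```python
import sqlite3

# Same shape as the "entity" query in the config. The two "?" markers stand
# in for ${dataimporter.request.clear} and ${dataimporter.last_index_time}.
QUERY = """
SELECT e.id, e.status, e.name
FROM entity AS e
WHERE (? != 'false' OR e.updated_at > ?)
ORDER BY e.id
"""


def build_db():
    """Create an in-memory table with one stale row and one fresh row (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entity ("
        "id INTEGER PRIMARY KEY, status TEXT, name TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO entity VALUES (?, ?, ?, ?)",
        [
            (1, "active", "old row", "2010-05-01 00:00:00"),
            (2, "active", "new row", "2010-06-02 00:00:00"),
        ],
    )
    return conn


def select_ids(conn, clear, last_index_time):
    """Return the ids the import query would pick up for the given parameters."""
    return [row[0] for row in conn.execute(QUERY, (clear, last_index_time))]


conn = build_db()
last_index_time = "2010-06-01 00:00:00"  # hypothetical last import time

# clear is anything but 'false': every row is selected (full import).
full_ids = select_ids(conn, "true", last_index_time)

# clear is 'false': only rows updated after the last import are selected.
delta_ids = select_ids(conn, "false", last_index_time)

print(full_ids)
print(delta_ids)
```

With clear set to "true" both rows come back. With "false" only the row updated after the last index time does, which is the behaviour described above.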