Hello, I have a very large dataset (> 1 Mrecords) on the RDBMS which I want my Solr application to pull data from.
Problem is that the document fields which I have to index aren't in the same table, but I have to join records with two other tables. Well, in fact they are views, but I don't think that this makes any difference. That's the data import handler that I've actually written: <?xml version="1.0"?> <dataConfig> <dataSource type="JdbcDataSource" driver="net.sourceforge.jtds.jdbc.Driver" url="jdbc:jtds:sqlserver://YSQLDEV01BLQ/YooxProcessCluster1" instance="SVCSQLDEV" /> <document name="Products"> <entity name="fd" query="SELECT * FROM clust_w_fast_dump ORDER BY endeca_id;"> <entity name="fd2" query="SELECT macrocolor_id, color_descr, gsize_descr, size_descr FROM clust_w_fast_dump2_ByMarkets WHERE endeca_id='${fd.Endeca_ID}' ORDER BY endeca_id;"/> <entity name="cpd" query="SELECT DepartmentCode, Ranking, DepartmentPriceRangeCode FROM clust_w_CatalogProductsDepartments_ByMarket WHERE endeca_id='${fd.Endeca_ID}' ORDER BY endeca_id;"/> <entity name="env" query="SELECT Environment FROM clust_w_Environment WHERE endeca_id='${fd.Endeca_ID}' ORDER BY endeca_id;"/> </entity> </document> </dataConfig> It works, but it takes 1'38" to parse 100 records: it means 1 rec/s! That means that digesting the whole dataset would take 1 Ms (=> 12 days). The problem is that for each record in "fd", Solr makes three distinct SELECT on the other three tables. Of course, this is absolutely inefficient. Is there a way to have Solr loading every record in the four tables and join them when they are already loaded in memory? TIA