I am starting to look at Solr's Data Import Handler framework and am quite impressed with it so far. I am trying to reduce the number of SQL queries issued to the database and came across CachedSqlEntityProcessor.
In the following example:

<entity name="x" query="select * from x">
  <entity name="y" query="select * from y where xid=${x.id}" processor="CachedSqlEntityProcessor">
  </entity>
</entity>

I like the concept of having multiple entity blocks for clarity, but for DB efficiency, why wouldn't I use a single entity whose SQL statement is "select * from X,Y where x.id=y.xid" and have two fields pointing at X and Y columns?

My main question, though, is how CachedSqlEntityProcessor helps in this case, since I want to keep the multiple entity blocks for cleanliness. If I have 500,000 X records, how many SQL queries would be executed in the second entity block (y)? 500,000?

If there is any more detailed information about the number of queries executed in different circumstances, the memory overhead, or the way the data is brought from the database into Java, it would be much appreciated, as this is important for my application.

Thanks in advance!
Amit