On 7/23/2015 10:55 AM, cbuxbaum wrote:
> Say we have 1000000 party records. Then the child SQL will be run
> 1000000 times (once for each party record). Isn't there a way to just
> run the child SQL on all of the party records at once with a join,
> using a GROUP BY and ORDER BY on the PARTY_ID? Then the results from
> that query could easily be placed in SOLR according to the primary
> key (party_id). Is there some part of the Data Import Handler that
> operates that way?
Using a well-crafted SQL JOIN is almost always going to be better for dataimport than nested entities. The heavy lifting is done by the database server, using code that's extremely well-optimized for exactly that kind of lifting.

Doing what you describe with a parent entity and one nested entity (that is not cached) will result in 1000001 total SQL queries. A million SQL queries, no matter how fast each one is, will be slow. If you can do everything in a single SQL query with a JOIN, then Solr will make exactly one SQL query to the server for a full-import.

For my own dataimport, I use a view that was defined on the MySQL server by the dbadmin. The view does all the JOINs we require.

Solr's dataimport handler doesn't have any intelligence to do the join locally. It would be cool if it did, but somebody would have to write the code to teach it how. Because the DB server itself can already do JOINs, and it can do them VERY well, there's really no reason to teach that trick to Solr.
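To make the difference concrete, here's a rough (untested) sketch of both approaches in a DIH data-config.xml. The table and column names (party, child, party_id, name, detail) are made up for illustration, so adjust everything to your actual schema.

The nested form, which issues one child query per parent row:

    <entity name="party" query="SELECT party_id, name FROM party">
      <entity name="child"
              query="SELECT detail FROM child
                     WHERE party_id = '${party.party_id}'"/>
    </entity>

The JOIN form, which issues exactly one query for the whole import:

    <entity name="party"
            query="SELECT p.party_id, p.name, c.detail
                   FROM party p
                   JOIN child c ON c.party_id = p.party_id
                   ORDER BY p.party_id">
      <field column="party_id" name="id"/>
    </entity>

One thing to watch: each row the root entity's query returns becomes a separate Solr document, so if the JOIN produces multiple rows per party_id, you'll get multiple documents per party. That's where the GROUP BY you mentioned comes in (or something like MySQL's GROUP_CONCAT, or a view that pre-aggregates), so each party_id comes back as a single row.

Thanks,
Shawn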