Gustav, DIH should give you the same results in both scenarios. The performance trade-offs depend on your data. In your case, it looks like there is a 1-to-1 or many-to-1 relationship between "item" and "member", so use the SQL Join. You'll get all of your data in one query and you'll be using your rbdms for what it does best.
But in the case there was a 1-to-many relationship between "item" and "member", and especially if each "item" has several "member" rows, you might get better performance using the child entity setup. Although by default DIH is going to do an "n+1" select on member. For every row in item, it will issue a separate query to the db. Also, DIH does not use prepared statements, so this might be a bad choice. To work around this, specify "cacheImpl='SortedMapBackedCache'" on the child entity (this is the same as using CachedSqlEntityProcessor instead of SqlEntityProcessor). Do not include a "where" clause in this child entity. Instead, specify "cacheKey='memberId'" and "cacheLookup='item.memberId'". DIH will now pull down your entire "member" table in 1 query and cache it in memory, then it can do fast hash joins against "item". But if your "member" table is too big to fit into memory, then you need to use a disk-backed cache instead of SortedMapBackedCache. For that, see https://issues.apache.org/jira/browse/SOLR-2948 and https://issues.apache.org/jira/browse/SOLR-2613 . James Dyer Ingram Content Group (615) 213-4311 -----Original Message----- From: Gustav [mailto:xbihy...@sharklasers.com] Sent: Thursday, April 25, 2013 9:17 AM To: solr-user@lucene.apache.org Subject: What is the difference between a Join Query and Embedded Entities in Solr DIH? Hello guys, i saw this thread on stackoverflow, but still not satisfied with the answers. I am trying to index data across multiple tables using Solr's Data Import Handler. The official wiki on the DIH suggests using embedded entities to link multiple tables like so: <document> <entity name="item" pk="id" query="SELECT * FROM item"> <entity name="member" pk="memberid" query="SELECT * FROM member WHERE memberid='${item.memberid}'> </entity> </entity> </document> Another way that works is: <document> <entity name="item" pk="id" query="SELECT * FROM item INNER JOIN member ON item.memberid=member.memberid"> </entity> </document> Are these two methods functionally different? Is there a performance difference? Another though would be that, if using join tables in MySQL, using the SQL query method with multiple joins could cause multiple documents to be indexed instead of one. -- View this message in context: http://lucene.472066.n3.nabble.com/What-is-the-difference-between-a-Join-Query-and-Embedded-Entities-in-Solr-DIH-tp4058923.html Sent from the Solr - User mailing list archive at Nabble.com.