On Tue, Nov 25, 2008 at 1:52 PM, Amit Nithian <[EMAIL PROTECTED]> wrote:
> I like the concept of having multiple entity blocks for clarity but why
> wouldn't I have (for DB efficiency), the following as one entity's SQL
> statement "select * from X,Y where x.id=y.xid" and have two fields pointing
> at X and Y columns?

You can certainly do that. However, it becomes a problem when you need field
X or Y to be multi-valued. That query returns repeated rows, and
DataImportHandler has no way to figure out what to put where. In the nested
entities approach, the multiple values come from a nested entity, which can
very easily be represented as a List. If you do not have multi-valued fields,
then you can go for the single-query approach.

> My main question though is how the CachedSQLEntityProcessor helps in this
> case for I want to use the multiple entity blocks for cleanliness. If I have
> 500,000 X records, how many SQL queries in the second entity block (y) would
> get executed, 500000?

For each row fetched from the parent entity, the query for its nested entity
is executed after replacing the variables with known values. So without
caching, 500,000 X rows mean 500,000 executions of the y query.

When the nested entity has few records in the database, it is more efficient
to use CachedSqlEntityProcessor, which executes the query only once and keeps
all the returned rows in memory. After that, for each row returned by the
parent entity, the cached entity only needs to do a lookup in the cache, which
is quite fast. Since all rows are stored in memory, you trade memory for the
number of queries to the db when you use CachedSqlEntityProcessor.

http://wiki.apache.org/solr/DataImportHandler#head-4465e39677ec06e4b14fd6a574434bac6e4d01e1

--
Regards,
Shalin Shekhar Mangar.
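
To make the trade-off above concrete, here is a rough sketch in Python with an in-memory SQLite database. It is not DataImportHandler code, only a simulation of the three strategies discussed: the flat join, one nested query per parent row, and a single cached query. The tables `x` and `y`, their columns, and all function names are assumptions made up for the illustration.

```python
import sqlite3
from collections import defaultdict


def make_db(num_x=5, ys_per_x=2):
    """Build a toy database: each x row has ys_per_x child rows in y (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE y (id INTEGER PRIMARY KEY, xid INTEGER, val TEXT)")
    for i in range(1, num_x + 1):
        conn.execute("INSERT INTO x (id, name) VALUES (?, ?)", (i, f"x{i}"))
        for j in range(ys_per_x):
            conn.execute("INSERT INTO y (xid, val) VALUES (?, ?)", (i, f"y{i}-{j}"))
    return conn


def joined_row_count(conn):
    """Flat join: a parent with several y rows shows up once per y row."""
    rows = conn.execute("SELECT x.id, y.val FROM x, y WHERE x.id = y.xid").fetchall()
    return len(rows)


def import_nested(conn):
    """One child query per parent row (the uncached nested-entity behaviour)."""
    queries = 1  # the parent query
    docs = []
    for xid, name in conn.execute("SELECT id, name FROM x ORDER BY id").fetchall():
        rows = conn.execute(
            "SELECT val FROM y WHERE xid = ? ORDER BY id", (xid,)
        ).fetchall()
        queries += 1
        docs.append({"id": xid, "name": name, "y": [r[0] for r in rows]})
    return docs, queries


def import_cached(conn):
    """Run the child query once, keep every row in memory, then look up by key."""
    queries = 2  # one parent query plus one child query
    cache = defaultdict(list)
    for xid, val in conn.execute("SELECT xid, val FROM y ORDER BY id"):
        cache[xid].append(val)
    docs = []
    for xid, name in conn.execute("SELECT id, name FROM x ORDER BY id").fetchall():
        docs.append({"id": xid, "name": name, "y": list(cache.get(xid, []))})
    return docs, queries


if __name__ == "__main__":
    conn = make_db()
    print("joined rows:", joined_row_count(conn))
    print("nested queries:", import_nested(conn)[1])
    print("cached queries:", import_cached(conn)[1])
```

Both import functions build the same documents with `y` as a list. The nested version issues one extra query per parent row, while the cached version always issues two queries but holds the whole `y` table in memory.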