On Tue, Nov 25, 2008 at 1:52 PM, Amit Nithian <[EMAIL PROTECTED]> wrote:

>
> I like the concept of having multiple entity blocks for clarity but why
> wouldn't I have (for DB efficiency), the following as one entity's SQL
> statement "select * from X,Y where x.id=y.xid" and have two fields pointing
> at X and Y columns?


You can certainly do that. However, it becomes a problem when you need field
X or Y to be multi-valued. That query returns repeated rows, and
DataImportHandler has no way to figure out what to put where. In the
nested entities approach, the multiple values come from a nested entity,
whose rows can very easily be represented as a List. If you do not have
multi-valued fields, then you can go with the single-query approach.
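
To make the difference concrete, here is a small sketch in Python with an
in-memory SQLite database. It is not DataImportHandler code; the tables x and
y, their columns, and the sample rows are made up for illustration. It only
shows why a flat join repeats the parent row while a nested lookup yields a
list of child values.

```python
import sqlite3

# Hypothetical schema and data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE y (xid INTEGER, tag TEXT);
    INSERT INTO x VALUES (1, 'first'), (2, 'second');
    INSERT INTO y VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")

# Flat join: one row per (x, y) pair, so the parent with id 1 appears twice.
joined = conn.execute(
    "SELECT x.id, x.name, y.tag FROM x, y "
    "WHERE x.id = y.xid ORDER BY x.id, y.tag"
).fetchall()

# Nested approach: one record per parent row, child values gathered in a list.
nested = []
for xid, name in conn.execute("SELECT id, name FROM x ORDER BY id").fetchall():
    tags = [
        tag
        for (tag,) in conn.execute(
            "SELECT tag FROM y WHERE xid = ? ORDER BY tag", (xid,)
        )
    ]
    nested.append({"id": xid, "name": name, "tag": tags})

print(joined)
print(nested)
```

In the joined result, something still has to decide that the two rows for id 1
belong to one document. In the nested result that grouping is already there.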


> My main question though is how the
> CachedSqlEntityProcessor helps in this case for I want to use the multiple
> entity blocks for cleanliness. If I have 500,000 X records, how many SQL
> queries in the second entity block (y) would get executed, 500000?


For each row fetched from the parent entity, the query for its nested entity
is executed after replacing the variables with known values, so with 500,000
X records the nested query runs 500,000 times. When the nested entity has few
records in the database, it is more efficient to use
CachedSqlEntityProcessor, which executes the query only once and keeps all
the returned rows in memory. After that, for each row returned by the parent
entity, the cached entity only needs to do a lookup in the cache, which is
quite fast. Since all rows are stored in memory, you trade memory for the
number of queries to the db when you use CachedSqlEntityProcessor.
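
The trade-off can be sketched in a few lines of Python with SQLite. This is
not how DataImportHandler is implemented; the function names (build_db,
per_row, cached), the tables, and the data are assumptions chosen to mirror
the description above. It only counts the queries issued against the child
table under each strategy.

```python
import sqlite3
from collections import defaultdict


def build_db(n_parents=5):
    """Create a hypothetical parent table x and child table y in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE y (xid INTEGER, val TEXT)")
    conn.executemany("INSERT INTO x VALUES (?)", [(i,) for i in range(n_parents)])
    conn.executemany(
        "INSERT INTO y VALUES (?, ?)", [(i, f"v{i}") for i in range(n_parents)]
    )
    return conn


def per_row(conn):
    """Run one child query per parent row (the uncached behaviour)."""
    child_queries = 0
    docs = {}
    for (xid,) in conn.execute("SELECT id FROM x").fetchall():
        rows = conn.execute("SELECT val FROM y WHERE xid = ?", (xid,)).fetchall()
        child_queries += 1
        docs[xid] = [val for (val,) in rows]
    return docs, child_queries


def cached(conn):
    """Run the child query once, then answer every parent row from memory."""
    cache = defaultdict(list)
    for xid, val in conn.execute("SELECT xid, val FROM y"):
        cache[xid].append(val)
    child_queries = 1
    docs = {}
    for (xid,) in conn.execute("SELECT id FROM x").fetchall():
        docs[xid] = cache.get(xid, [])  # in-memory lookup, no SQL issued
    return docs, child_queries


conn = build_db(5)
docs_a, queries_a = per_row(conn)
docs_b, queries_b = cached(conn)
print(queries_a, queries_b, docs_a == docs_b)
```

Both strategies build the same documents. The cached one holds every child
row in memory, which is why it suits a nested entity with few records.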

http://wiki.apache.org/solr/DataImportHandler#head-4465e39677ec06e4b14fd6a574434bac6e4d01e1


-- 
Regards,
Shalin Shekhar Mangar.
