You can really go either way. Empty fields are OK. Having lots of cores seems
harder to maintain. Searching against a small core will be faster than
searching against a single core/index with all the data, but you can use 'fq'
(a filter query, e.g. on your "type" field) to restrict a search to one type
and make things really fast. The numbers you quote are not really big. If you
need to search by name across types, I would go with a single index.
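
To make the single-index option concrete, here is a rough sketch of how the two name searches might be built as Solr select URLs. The base URL, the helper names, and the "company" type value are assumptions for illustration only; "name" and "type" are the fields mentioned in the question. It only builds the URLs and does not contact a server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed location of the select handler; adjust to your own installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def quote_phrase(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_query_url(name, doc_type=None, rows=10):
    """Build a select URL that searches the name field.

    If doc_type is given, an fq parameter restricts results to that type.
    If it is omitted, the search runs across all types in the index.
    """
    params = [("q", "name:" + quote_phrase(name)), ("rows", str(rows))]
    if doc_type is not None:
        # fq narrows the result set without changing relevance scores.
        params.append(("fq", "type:" + quote_phrase(doc_type)))
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    # Search by name within one (hypothetical) type, then across all types.
    for url in (build_query_url("acme", doc_type="company"),
                build_query_url("acme")):
        print(url)
        print(parse_qs(urlsplit(url).query))
```

The same index serves both cases: the only difference between "within a type" and "across all types" is whether the fq parameter is present.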

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: karthik c <karthik...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Thursday, March 19, 2009 7:14:08 AM
> Subject: large number of cores
> 
> Hi guys,
> 
> We need to index data of a large number of types. I was wondering if it is
> better to create separate cores for each type or add everything to one core
> with a "type" field ?
> 
> Here are some more details:
> The database: Currently we have around 200 types of data. The data for each
> type is stored in a separate mysql table. Each type has its own set of
> fields, though they all share a name field and a globally unique id field.
> The volume of data under each type varies from around 30 records to around
> 1.5 million records.
> 
> The queries: We will need to support the following kinds of queries:
>   1. search by name within a type
>   2. perform faceted filtering on all fields within a type
>   3. search by name across all types
> 
> We have currently created separate cores for each type. We also wrote a
> small tool to create cores for each type and trigger a full-import for each
> of them. I am not sure if this is the right approach though. Also, the number of
> types may increase by quite a bit in the future.
> 
> My concerns with having such a large number of cores are:
> 1. Does Solr support such a large number of cores ?
> 2. Will searching across all cores be fast/effective with such a large
> number of cores ?
> 3. We ran into an issue where there were too many open file handles and had
> to increase the file open limit in the OS.
> 4. Triggering the full-import for a lot of cores at once results in some
> cores not being indexed fully. Manually re-triggering the import for these
> cores seems to fix the problem though.
> 
> My concerns about using a single core are:
> 1. The schema will now contain fields for all types. So most fields will be
> empty in most documents.
> 2. Will searching within a type be slower when compared to having the type
> in a separate core ?
> 
> Thanks,
> karthik c
> http://cantspellathing.blogspot.com
