Hi guys,

We need to index data of a large number of types. I was wondering whether it
is better to create a separate core for each type or to add everything to one
core with a "type" field?

Here are some more details:
The database: Currently we have around 200 types of data. The data for each
type is stored in a separate MySQL table. Each type has its own set of
fields, though they all share a name field and a globally unique id field.
The volume of data under each type varies from around 30 records to around
1.5 million records.

The queries: We will need to support the following kinds of queries:
  1. search by name within a type
  2. perform faceted filtering on all fields within a type
  3. search by name across all types

We have currently created separate cores for each type. We also wrote a
small tool to create cores for each type and trigger a full-import for each
of them. I am not sure if this is the right approach though. Also, the number of
types may increase by quite a bit in the future.
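
For reference, the kind of tool described above could be sketched roughly as follows. This is not the actual tool, only an illustration of the requests it would need to issue: a CoreAdmin CREATE call per type, then a DataImportHandler full-import per core. The base URL, the core names, and the assumption that the import handler is registered at "/dataimport" in each core's solrconfig.xml are all hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical Solr base URL.
BASE_URL = "http://localhost:8983/solr"


def create_core_url(core):
    # CoreAdmin CREATE; assumes the instance directory is named after the core.
    params = {"action": "CREATE", "name": core, "instanceDir": core}
    return BASE_URL + "/admin/cores?" + urlencode(params)


def full_import_url(core):
    # Assumes the DataImportHandler is mapped to /dataimport for this core.
    return BASE_URL + "/" + core + "/dataimport?" + urlencode({"command": "full-import"})


def import_status_url(core):
    # The same handler reports progress when called with command=status.
    return BASE_URL + "/" + core + "/dataimport?" + urlencode({"command": "status"})


def plan_requests(type_names):
    """List the requests, in order, needed to set up and index one core per type."""
    requests = []
    for core in type_names:
        requests.append(create_core_url(core))
        requests.append(full_import_url(core))
    return requests


if __name__ == "__main__":
    for url in plan_requests(["product", "customer"]):
        print(url)
```

The sketch only builds the URLs; actually sending them (for example with urllib.request.urlopen) and polling the status URL until an import finishes is left out.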

My concerns with having such a large number of cores are:
1. Does Solr support such a large number of cores?
2. Will searching across all cores be fast/effective with such a large
number of cores?
3. We ran into an issue where there were too many open file handles and had
to increase the open file limit in the OS.
4. Triggering the full-import for a lot of cores at once results in some
cores not being indexed fully. Manually re-triggering the import for these
cores seems to fix the problem though.
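
To illustrate concern 2, here is a minimal sketch of what a name search across all cores might look like, assuming it is done with Solr's distributed search "shards" parameter (that this is the right mechanism here is an assumption). The host and the generated core names "type_0" to "type_199" are placeholders. Distributed search expects document ids to be unique across shards, which the globally unique id field should satisfy.

```python
from urllib.parse import urlencode

# Hypothetical host and port serving all the cores.
HOST = "localhost:8983"


def shards_value(host, cores):
    # The shards parameter is a comma-separated list of host:port/path entries.
    return ",".join(f"{host}/solr/{core}" for core in cores)


def cross_core_params(name, host, cores):
    # One name query, fanned out to every core listed in "shards".
    return {"q": f'name:"{name}"', "shards": shards_value(host, cores)}


def cross_core_url(name, host, cores):
    # Send the request to the first core; it coordinates the other shards.
    return f"http://{host}/solr/{cores[0]}/select?" + urlencode(
        cross_core_params(name, host, cores)
    )


if __name__ == "__main__":
    cores = [f"type_{i}" for i in range(200)]  # placeholder core names
    url = cross_core_url("widget", HOST, cores)
    # With around 200 cores the URL becomes long, and every such search
    # touches every core.
    print(len(url))
```

The length of the resulting URL and the fact that each cross-type search hits every core are part of what makes this concern hard to judge without measuring.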

My concerns about using a single core are:
1. The schema will now contain fields for all types. So most fields will be
empty in most documents.
2. Will searching within a type be slower when compared to having the type
in a separate core?

Thanks,
karthik c
http://cantspellathing.blogspot.com
