Hi guys,

We need to index data of a large number of types. I was wondering whether it is better to create a separate core for each type, or to add everything to one core with a "type" field?
Here are some more details.

The database: We currently have around 200 types of data. The data for each type is stored in a separate MySQL table. Each type has its own set of fields, though they all share a name field and a globally unique id field. The volume of data per type varies from around 30 records to around 1.5 million records.

The queries: We will need to support the following kinds of queries:
1. search by name within a type
2. perform faceted filtering on all fields within a type
3. search by name across all types
(Rough sketches of what these queries look like are at the end of this mail.)

We have currently created a separate core for each type, and we also wrote a small tool that creates the cores and triggers a full-import for each of them (also sketched at the end of this mail). I am not sure this is the right approach though. Also, the number of types may grow quite a bit in the future.

My concerns with having such a large number of cores are:
1. Does Solr support such a large number of cores?
2. Will searching across all cores be fast/effective with so many cores?
3. We ran into an issue where there were too many open file handles and had to raise the open-file limit in the OS.
4. Triggering the full-import for a lot of cores at once results in some cores not being indexed fully. Manually re-triggering the import for those cores seems to fix the problem though.

My concerns about using a single core are:
1. The schema will have to contain the fields of all types, so most fields will be empty in most documents.
2. Will searching within a type be slower than it is with the type in its own core?

Thanks,
karthik c
http://cantspellathing.blogspot.com
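
P.S. To make the query patterns concrete, here is a rough SolrJ sketch of the three kinds of searches, first against the current per-type cores and then against a hypothetical single core with a "type" field. The core names ("books", "movies", "music"), the field names and the localhost URL are placeholders, and it assumes HttpSolrServer from SolrJ 3.6+ (older releases would use CommonsHttpSolrServer instead):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QuerySketch {
    public static void main(String[] args) throws SolrServerException {
        // 1. Search by name within a type: query that type's core directly.
        HttpSolrServer books = new HttpSolrServer("http://localhost:8983/solr/books");
        QueryResponse byName = books.query(new SolrQuery("name:dickens"));

        // 2. Faceted filtering on the fields of that type.
        SolrQuery faceted = new SolrQuery("name:dickens");
        faceted.setFacet(true);
        faceted.addFacetField("author", "year");   // placeholder per-type fields
        QueryResponse withFacets = books.query(faceted);

        // 3. Search by name across all types: a distributed search over every
        //    core via the shards parameter (built from the full list of cores).
        SolrQuery acrossCores = new SolrQuery("name:dickens");
        acrossCores.set("shards",
            "localhost:8983/solr/books,localhost:8983/solr/movies,localhost:8983/solr/music");
        QueryResponse acrossTypes = books.query(acrossCores);

        // Single-core alternative: everything in one core with a "type" field.
        HttpSolrServer all = new HttpSolrServer("http://localhost:8983/solr/everything");
        SolrQuery withinType = new SolrQuery("name:dickens");
        withinType.addFilterQuery("type:books");       // restrict to one type
        QueryResponse oneType = all.query(withinType); // drop the fq to search all types
    }
}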
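
And here is roughly what the core-creation/full-import step of our tool boils down to, again as a SolrJ sketch with placeholder names; it assumes each core's instanceDir already contains the solrconfig.xml, schema.xml and DataImportHandler config generated for its table (our actual tool differs in the details):

import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class ProvisionCores {
    public static void main(String[] args) throws SolrServerException, IOException {
        // CoreAdmin requests go to the base Solr URL, not to an individual core.
        HttpSolrServer admin = new HttpSolrServer("http://localhost:8983/solr");

        // Placeholder list standing in for the ~200 MySQL tables / types.
        List<String> types = Arrays.asList("books", "movies", "music");

        for (String type : types) {
            // Create one core per type (instanceDir named after the type here).
            CoreAdminRequest.createCore(type, type, admin);

            // Trigger a DataImportHandler full-import on the new core. The call
            // returns immediately and the import itself runs in the background.
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("command", "full-import");
            QueryRequest importReq = new QueryRequest(params);
            importReq.setPath("/dataimport");

            HttpSolrServer core = new HttpSolrServer("http://localhost:8983/solr/" + type);
            core.request(importReq);
        }
    }
}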