Thanks Otis. Will try out using a single index.

karthik c
http://cantspellathing.blogspot.com

On Thu, Mar 19, 2009 at 11:24 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:

> You can really go either way. Empty fields are OK. Having lots of cores
> seems harder to maintain. Searching against a small core will be faster
> than searching against a single core/index with all data, but you can use
> 'fq' to make things really fast. The numbers you quote are not really big.
> If you need to search by name across types, I would go with a single index.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: karthik c <karthik...@gmail.com>
> > To: solr-user@lucene.apache.org
> > Sent: Thursday, March 19, 2009 7:14:08 AM
> > Subject: large number of cores
> >
> > Hi guys,
> >
> > We need to index data of a large number of types. I was wondering if it
> > is better to create separate cores for each type or add everything to
> > one core with a "type" field?
> >
> > Here are some more details:
> >
> > The database: Currently we have around 200 types of data. The data for
> > each type is stored in a separate mysql table. Each type has its own set
> > of fields, though they all share a name field and a globally unique id
> > field. The volume of data under each type varies from around 30 records
> > to around 1.5 million records.
> >
> > The queries: We will need to support the following kinds of queries:
> > 1. search by name within a type
> > 2. perform faceted filtering on all fields within a type
> > 3. search by name across all types
> >
> > We have currently created separate cores for each type. We also wrote a
> > small tool to create cores for each type and trigger a full-import for
> > each of them. I am not sure if this is the right approach though. Also,
> > the number of types may increase by quite a bit in the future.
> >
> > My concerns with having such a large number of cores are:
> > 1. Does Solr support such a large number of cores?
> > 2. Will searching across all cores be fast/effective with such a large
> >    number of cores?
> > 3. We ran into an issue where there were too many open file handles and
> >    had to increase the file open limit in the OS.
> > 4. Triggering the full-import for a lot of cores at once results in some
> >    cores not being indexed fully. Manually re-triggering the import for
> >    these cores seems to fix the problem though.
> >
> > My concerns about using a single core are:
> > 1. The schema will now contain fields for all types. So most fields will
> >    be empty in most documents.
> > 2. Will searching within a type be slower when compared to having the
> >    type in a separate core?
> >
> > Thanks,
> > karthik c
> > http://cantspellathing.blogspot.com
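
For anyone following the thread, here is a rough sketch of what the single-index approach with 'fq' could look like from the client side. It only builds the request URL; nothing here was posted in the thread. The host, port and path, the field names "name" and "type", and the helper names are all assumptions for illustration. The parameters q, fq, facet and facet.field are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Hypothetical location of the single combined index (an assumption).
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def quote_phrase(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_query_url(name, doc_type=None, facet_fields=()):
    """Build a select URL that searches by name, optionally within one type.

    Assumes the schema has a "name" field and a "type" field that records
    which source table each document came from.
    """
    params = [("q", "name:" + quote_phrase(name))]
    if doc_type is not None:
        # The filter query restricts results to one type without affecting
        # scoring, and Solr can cache it separately from the main query.
        params.append(("fq", "type:" + quote_phrase(doc_type)))
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Query 1: search by name within a type.
within_type = build_query_url("solr", doc_type="book")

# Query 2: faceted filtering on (hypothetical) fields of that type.
faceted = build_query_url("solr", doc_type="book", facet_fields=["author", "year"])

# Query 3: search by name across all types (no fq at all).
across_types = build_query_url("solr")
```

The three calls at the bottom map onto the three kinds of queries listed in the original message. Leaving out fq gives the cross-type search, which is the case that is awkward to do with one core per type.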