Welcome Michael,

On 4/12/06, Bryzek.Michael <[EMAIL PROTECTED]> wrote:
> * Integrated support for partitioning - database tables can be
>   partitioned for scalability reasons. The most common scenario for
>   us is to partition off data for our largest customers. For
>   example, imagine a users table:
>
>     * user_id
>     * email_address
>     * site_id
>
>   where site_id refers to the customer to whom the user
>   belongs. Some sites aggregate data... i.e. one of our customers
>   may have 100 sites. When indexing, we create a separate index to
>   store only data for a given site. This precomputes one of our more
>   expensive computations for search - a filter for all users that
>   belong to a given site.
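
If I'm reading the per-site filter right, it amounts to something like
the rough sketch below. This is plain Python rather than Lucene or Solr
code, and the sample rows and the names build_site_filters and
restrict_to_site are made up purely for illustration:

```python
from collections import defaultdict

# Hypothetical rows mirroring the users table described above:
# (user_id, email_address, site_id).
USERS = [
    (1, "a@example.com", 10),
    (2, "b@example.com", 10),
    (3, "c@example.com", 20),
]


def build_site_filters(users):
    """Precompute one filter (a set of user_ids) per site_id."""
    filters = defaultdict(set)
    for user_id, _email, site_id in users:
        filters[site_id].add(user_id)
    return dict(filters)


def restrict_to_site(hits, site_filters, site_id):
    """Keep only the hits that belong to the given site, preserving order."""
    allowed = site_filters.get(site_id, set())
    return [user_id for user_id in hits if user_id in allowed]


# One filter per distinct site_id, built once at indexing time.
SITE_FILTERS = build_site_filters(USERS)
```

At query time the precomputed set for a site is simply intersected with
the raw hits, e.g. restrict_to_site(hits, SITE_FILTERS, 10).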

So the number of filters is equal to the number of sites?
How many sites are there?

> * Decoupled infrastructure - we wanted the ability to fully scale
>   our search application independent of our database application

That makes total sense... we do the same thing.

> * High speed indexing - we initially moved data from the database to
>   Lucene via XML documents. We found that to index even a 100k
>   documents, it was much faster to move the data in CSV files
>   (smaller files, less intensive processing).

Support for indexing from CSV files as well as simple pulling from a
database is on our "todo" list: http://wiki.apache.org/solr/TaskList

> IDEAS:
>
> Looking through SOLR, I've identified the following main categories of
> change. I would love to hear comments and feedback from this group.

It would be nice to make any changes as general as possible, while
still solving your particular problem.

I think I understand many of the internal changes you outlined, but
I'm not sure yet exactly what problem you are trying to solve, and how
the multiple indices will be used.

- How would one identify which index (or SolrCore) an update is
  targeted to?
- What is the relationship between the multiple indices... do queries
  ever go across multiple indices, or would there be an "objectType"
  parameter passed in as part of the query?
- What is the purpose of multiple indices... is it so search results
  are always restricted to a single site, but it's not practical to
  have that many Solr instances? It looks like the indices are
  partitioned along the lines of object type, and not site-id though.

-Yonik