Welcome Michael,

On 4/12/06, Bryzek.Michael <[EMAIL PROTECTED]> wrote:
>   * Integrated support for partitioning - database tables can be
>     partitioned for scalability reasons. The most common scenario for
>     us is to partition off data for our largest customers. For
>     example, imagine a users table:
>
>      * user_id
>      * email_address
>      * site_id
>
>     where site_id refers to the customer to whom the user
>     belongs. Some sites aggregate data... i.e. one of our customers
>     may have 100 sites. When indexing, we create a separate index to
>     store only data for a given site. This precomputes one of our more
>     expensive computations for search - a filter for all users that
>     belong to a given site.

So the number of filters is equal to the number of sites?  How many
sites are there?
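
For reference, here is how I currently read the per-site filter, as a minimal Python sketch. This is not Solr or Lucene code; the function names and sample rows are made up for illustration, and only the three field names come from your example. It assumes a filter is simply the set of user_ids belonging to a site, and that a customer with many sites gets the union of its sites' filters.

```python
from collections import defaultdict

# Hypothetical sample rows: (user_id, email_address, site_id)
USERS = [
    (1, "a@example.com", 10),
    (2, "b@example.com", 10),
    (3, "c@example.com", 20),
    (4, "d@example.com", 30),
]

def build_site_filters(users):
    """Precompute, for each site_id, the set of user_ids that belong to it."""
    filters = defaultdict(set)
    for user_id, _email, site_id in users:
        filters[site_id].add(user_id)
    return dict(filters)

def customer_filter(site_filters, site_ids):
    """Union the per-site filters for a customer that owns several sites."""
    allowed = set()
    for site_id in site_ids:
        allowed |= site_filters.get(site_id, set())
    return allowed

site_filters = build_site_filters(USERS)
```

If that reading is right, one filter is kept per site, which is why the number of sites matters.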

>   * Decoupled infrastructure - we wanted the ability to fully scale
>     our search application independent of our database application

That makes total sense... we do the same thing.

>   * High speed indexing - we initially moved data from the database to
>     Lucene via XML documents. We found that to index even a 100k
>     documents, it was much faster to move the data in CSV files
>     (smaller files, less intensive processing).

Support for indexing from CSV files as well as simple pulling from a
database is on our "todo" list: http://wiki.apache.org/solr/TaskList
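
As a rough idea of the CSV side, the sketch below turns each row into a field-name-to-value mapping that an indexer could consume. It is only an illustration in Python using the standard csv module; the sample data and the csv_to_documents name are hypothetical, and it assumes the file has a header row naming the fields. It does not reflect any existing Solr interface.

```python
import csv
import io

# Hypothetical CSV export with a header row naming the fields.
SAMPLE = (
    "user_id,email_address,site_id\n"
    "1,a@example.com,10\n"
    "2,b@example.com,20\n"
)

def csv_to_documents(stream):
    """Yield one dict per CSV row, keyed by the header names."""
    for row in csv.DictReader(stream):
        yield dict(row)

# All values arrive as strings; type conversion is left to the indexer.
docs = list(csv_to_documents(io.StringIO(SAMPLE)))
```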

> IDEAS:
>
> Looking through SOLR, I've identified the following main categories of
> change. I would love to hear comments and feedback from this group.

It would be nice to make any changes as general as possible, while
still solving your particular problem.

I think I understand many of the internal changes you outlined, but
I'm not sure yet exactly what problem you are trying to solve, or how
the multiple indices will be used.
- How would one identify which index (or SolrCore) an update is targeted to?
- What is the relationship between the multiple indices... do queries
ever go across multiple indices, or would there be an "objectType"
parameter passed in as part of the query?
- What is the purpose of multiple indices... is it so search results
are always restricted to a single site, but it's not practical to have
that many Solr instances?  It looks like the indices are partitioned
along the lines of object type, and not site_id, though.

-Yonik
