Makes sense, but probably overkill for my requirements. I wasn't really talking 275 * 200,000; more likely the total would be something like four million documents. I was under the assumption that a single machine, or a simple distributed index, should be able to handle that. Is that wrong?
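Concretely, what I had in mind for the single-index route is just to tag each document with its source site and filter at query time, roughly like the sketch below (illustrative only: the "site" field, the host URL, and the SolrJ client usage are assumptions on my side, not something that exists yet):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocumentList;

    public class SiteFilterSearch {
        public static void main(String[] args) throws Exception {
            // One Solr instance holding all ~4M documents; each document
            // carries an indexed "site" field identifying its source site.
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

            SolrQuery query = new SolrQuery("user search terms");
            // Restrict the search to any combination of sites with a
            // filter query; the filter is cached independently of the
            // main query, so repeated site combinations stay cheap.
            query.addFilterQuery("site:(siteA OR siteB OR siteC)");
            query.setRows(10);

            QueryResponse response = server.query(query);
            SolrDocumentList hits = response.getResults();
            System.out.println("Matches across selected sites: "
                + hits.getNumFound());
        }
    }
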
-ds

On Wed, Mar 26, 2008 at 2:05 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Dietrich,
>
> I don't think there are established practices in the open (yet). You could
> design your application with a site(s)->shard mapping and then, knowing which
> sites are involved in the query, search only the relevant shards. This will
> be efficient, but it would require careful management on your part.
>
> Putting everything in a single index would just not work with "normal"
> machines, I think.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Dietrich <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, March 26, 2008 10:47:55 AM
> Subject: Re: How to index multiple sites with option of combining results in search
>
> I understand that, and that makes sense. But, coming back to the
> original question:
>
> > > When performing searches,
> > > I need to be able to search against any combination of sites.
> > > Does anybody have suggestions what the best practice for a scenario
> > > like that would be, considering both indexing and querying
> > > performance? Put everything into one index and filter when performing
> > > the queries, or creating a separate index for each one and combining
> > > results when performing the query?
>
> Are there any established best practices for that?
>
> -ds
>
> On Tue, Mar 25, 2008 at 11:25 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> > Dietrich,
> >
> > I pointed to SOLR-303 because 275 * 200,000 looks like too big a
> > number for a single machine to handle.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > ----- Original Message ----
> > From: Dietrich <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Tuesday, March 25, 2008 7:00:17 PM
> > Subject: Re: How to index multiple sites with option of combining results in search
> >
> > On Tue, Mar 25, 2008 at 6:12 PM, Otis Gospodnetic
> > <[EMAIL PROTECTED]> wrote:
> > > Sounds like SOLR-303 is a must for you.
> > Why? I see the benefits of using a distributed architecture in
> > general, but why do you recommend it specifically for this scenario?
> > > Have you looked at Nutch?
> > I don't want to (or need to) use a crawler. I am using a crawler-based
> > system now, and it does not offer the flexibility I need when it comes
> > to custom schemas and faceting.
> >
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > > ----- Original Message ----
> > > From: Dietrich <[EMAIL PROTECTED]>
> > > To: solr-user@lucene.apache.org
> > > Sent: Tuesday, March 25, 2008 4:15:23 PM
> > > Subject: How to index multiple sites with option of combining results in search
> > >
> > > I am planning to index 275+ different sites with Solr, each of which
> > > might have anywhere up to 200,000 documents. When performing searches,
> > > I need to be able to search against any combination of sites.
> > > Does anybody have suggestions what the best practice for a scenario
> > > like that would be, considering both indexing and querying
> > > performance? Put everything into one index and filter when performing
> > > the queries, or creating a separate index for each one and combining
> > > results when performing the query?
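P.S. If we do outgrow a single index later, I imagine the site(s)->shard mapping you describe could be handled on our side roughly like the sketch below. This is only a sketch: it assumes the SOLR-303 distributed search where the query carries a "shards" parameter, and the shard host names, the site-to-shard assignments, and the SolrJ usage are all made up for illustration.

    import java.util.*;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ShardedSiteSearch {
        // Hypothetical mapping from site name to the shard that indexes it;
        // in practice this would live in a config file or database.
        private static final Map<String, String> SITE_TO_SHARD =
            new HashMap<String, String>();
        static {
            SITE_TO_SHARD.put("siteA", "shard1.example.com:8983/solr");
            SITE_TO_SHARD.put("siteB", "shard1.example.com:8983/solr");
            SITE_TO_SHARD.put("siteC", "shard2.example.com:8983/solr");
        }

        public static void main(String[] args) throws Exception {
            List<String> requestedSites = Arrays.asList("siteA", "siteC");

            // Collect only the shards that actually hold the requested sites.
            Set<String> shards = new LinkedHashSet<String>();
            for (String site : requestedSites) {
                shards.add(SITE_TO_SHARD.get(site));
            }

            SolrQuery query = new SolrQuery("user search terms");
            // Distributed search (SOLR-303): list the shards to fan out to.
            query.set("shards", join(shards, ","));
            // Still filter by site, since a shard may also hold sites
            // that were not requested.
            query.addFilterQuery("site:(" + join(requestedSites, " OR ") + ")");

            // Send the request to any node that can act as the aggregator.
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://shard1.example.com:8983/solr");
            QueryResponse response = server.query(query);
            System.out.println("Total hits: "
                + response.getResults().getNumFound());
        }

        private static String join(Collection<String> parts, String sep) {
            StringBuilder sb = new StringBuilder();
            for (String p : parts) {
                if (sb.length() > 0) sb.append(sep);
                sb.append(p);
            }
            return sb.toString();
        }
    }
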