Re: Indexing off of the production servers

2013-05-06 Thread Erick Erickson
Nope. In the normal update flow there is no replication in the sense of copying the indexed document. The _raw_ document is forwarded to all replicas, and by the time the replicas return, the raw document has been written to each replica's own transaction log. "Replication" implies the _indexed_ f
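The update flow Erick describes can be sketched as a toy, in-memory model. This is a minimal sketch under stated assumptions, not SolrCloud code. The class names `Replica` and `Leader` are hypothetical, and lowercasing stands in for real analysis. The model shows the point of the reply: the leader forwards the raw document, and every replica appends it to its own transaction log and indexes it independently. No indexed form is copied between nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica (hypothetical): a transaction log of raw docs plus a local index."""
    name: str
    tlog: list = field(default_factory=list)    # raw documents, in arrival order
    index: dict = field(default_factory=dict)   # id -> locally "indexed" form

    def apply_update(self, raw_doc: dict) -> bool:
        # Write the *raw* document to this replica's own transaction log ...
        self.tlog.append(dict(raw_doc))
        # ... then index it locally; lowercasing stands in for real analysis.
        self.index[raw_doc["id"]] = {k: str(v).lower() for k, v in raw_doc.items()}
        return True  # acknowledgement back to the leader


@dataclass
class Leader(Replica):
    """Toy leader (hypothetical): applies the update locally, then forwards the raw doc."""
    followers: list = field(default_factory=list)

    def handle_update(self, raw_doc: dict) -> bool:
        ok = self.apply_update(raw_doc)
        # Forward the raw document (never the indexed form) to every other replica.
        acks = [replica.apply_update(raw_doc) for replica in self.followers]
        # Once all replicas have returned, the doc is in every transaction log.
        return ok and all(acks)


followers = [Replica("replica2"), Replica("replica3")]
leader = Leader("replica1", followers=followers)
leader.handle_update({"id": "doc1", "title": "Hello"})
print([len(r.tlog) for r in [leader, *followers]])  # [1, 1, 1]
```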

RE: Indexing off of the production servers

2013-05-06 Thread David Parks
Dave -----Original Message----- From: Furkan KAMACI [mailto:furkankam...@gmail.com] Sent: Monday, May 06, 2013 9:44 PM To: solr-user@lucene.apache.org Subject: Re: Indexing off of the production servers Hi Erick; Thanks for your answer. I have read that at somewhere: I believe "redirect"

Re: Indexing off of the production servers

2013-05-06 Thread Furkan KAMACI
Hi Erick; Thanks for your answer. I read this somewhere: I believe "redirect" from replica to leader would happen only at index time, so a doc first gets indexed on the leader and from there it's replicated to non-leader shards. Is that true? I want to get things clear in my mind otherw

Re: Indexing off of the production servers

2013-05-06 Thread Shawn Heisey
On 5/6/2013 7:55 AM, Andre Bois-Crettez wrote: > Excellent idea ! > And it is possible to use collection aliasing with the CREATEALIAS to > make this transparent for the query side. > > ex. with 2 collections named : > collection_1 > collection_2 > > /collections?action=CREATEALIAS&name=collectio
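Here is a minimal sketch of building the CREATEALIAS request Andre and Shawn discuss, using only Python's standard library. The base URL `http://localhost:8983/solr` and the alias name `live_index` are hypothetical; the alias name in the quoted message is cut off. The function only builds the URL and sends nothing.

```python
from urllib.parse import urlencode


def create_alias_url(base_url: str, alias: str, collections: list) -> str:
    """Build a Collections API CREATEALIAS URL; nothing is sent."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        # CREATEALIAS takes a comma-separated list of target collections.
        "collections": ",".join(collections),
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical flow: queries go through the alias while the other collection is rebuilt.
print(create_alias_url("http://localhost:8983/solr", "live_index", ["collection_1"]))
# After re-indexing collection_2 off to the side, point the alias at it instead.
print(create_alias_url("http://localhost:8983/solr", "live_index", ["collection_2"]))
```

The thread's idea is to index into whichever collection is not live and then repoint the alias, so the query side never changes. This sketch assumes that issuing CREATEALIAS again for an existing alias name repoints it. Check the Collections API documentation for your Solr version.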

Re: Indexing off of the production servers

2013-05-06 Thread Andre Bois-Crettez
ic might be right in that it's not worth the effort if there isn't some existing strategy. Dave -----Original Message----- From: Furkan KAMACI [mailto:furkankam...@gmail.com] Sent: Monday, May 06, 2013 7:06 PM To: solr-user@lucene.apache.org Subject: Re: Indexing off of the product

Re: Indexing off of the production servers

2013-05-06 Thread Erick Erickson
-----Original Message----- > From: Furkan KAMACI [mailto:furkankam...@gmail.com] > Sent: Monday, May 06, 2013 7:06 PM > To: solr-user@lucene.apache.org > Subject: Re: Indexing off of the production servers > > Hi Erick; > > I think that even if you use Map/Reduce you will not par

Re: Indexing off of the production servers

2013-05-06 Thread Erick Erickson
ought maybe something like that >> followed into solr cloud. Erick might be right in that it's not worth the >> effort if there isn't some existing strategy. >> >> Dave >> >> >> -----Original Message----- >> From: Furkan KAMACI [mailto:furkan

Re: Indexing off of the production servers

2013-05-06 Thread Upayavira
omething like that > followed into solr cloud. Erick might be right in that it's not worth the > effort if there isn't some existing strategy. > > Dave > > > -----Original Message----- > From: Furkan KAMACI [mailto:furkankam...@gmail.com] > Sent: Monday, May 06,

Re: Indexing off of the production servers

2013-05-06 Thread Furkan KAMACI
ke that > followed into solr cloud. Erick might be right in that it's not worth the > effort if there isn't some existing strategy. > > Dave > > > -----Original Message----- > From: Furkan KAMACI [mailto:furkankam...@gmail.com] > Sent: Monday, May 06, 2013 7:06 PM

RE: Indexing off of the production servers

2013-05-06 Thread David Parks
the effort if there isn't some existing strategy. Dave -----Original Message----- From: Furkan KAMACI [mailto:furkankam...@gmail.com] Sent: Monday, May 06, 2013 7:06 PM To: solr-user@lucene.apache.org Subject: Re: Indexing off of the production servers Hi Erick; I think that even if you use Map/

Re: Indexing off of the production servers

2013-05-06 Thread Furkan KAMACI
Hi Erick; I think that even if you use Map/Reduce, you will not parallelize your indexing any further, because indexing parallelizes only as far as the number of leaders in your SolrCloud, right? 2013/5/6 Erick Erickson > The only problem with using Hadoop (or whatever) is that you > need to be sure th
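The assumption in Furkan's message, that indexing parallelism is capped by the number of shard leaders, can be modeled as one concurrent stream per leader. This is a hypothetical sketch. `NUM_SHARDS`, `shard_for`, and `send_batch_to_leader` are illustrative names. CRC32 is a stand-in hash, not Solr's router, and no HTTP request is made.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 5  # hypothetical cluster: one leader per shard


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Stand-in hash (CRC32), NOT Solr's router; it only gives a stable assignment.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def send_batch_to_leader(shard: int, docs: list) -> tuple:
    # Hypothetical placeholder for an HTTP update request to that shard's leader.
    return shard, len(docs)


def index_in_parallel(docs: list) -> dict:
    batches = defaultdict(list)
    for doc in docs:
        batches[shard_for(doc["id"])].append(doc)
    # One worker per leader: under the message's assumption this is the ceiling.
    with ThreadPoolExecutor(max_workers=len(batches) or 1) as pool:
        results = pool.map(lambda item: send_batch_to_leader(*item), batches.items())
        return dict(results)


print(index_in_parallel([{"id": f"doc{i}"} for i in range(100)]))
```

Whether the ceiling really is one stream per leader is exactly what the message asks. A real client can also send several concurrent update requests to the same leader.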

Re: Indexing off of the production servers

2013-05-06 Thread Erick Erickson
The only problem with using Hadoop (or whatever) is that you need to be sure that documents end up on the same shard, which means that you have to use the same routing mechanism that SolrCloud uses. The custom doc routing may help here. My very first question, though, would be whether this is n
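Erick's point is that an external indexer must route each document exactly as SolrCloud would. The sketch below assumes the default compositeId router for plain ids (no `!` prefix). That router hashes the uniqueKey's UTF-8 bytes with 32-bit MurmurHash3 (seed 0) and picks the shard whose signed hash range contains the result. The names `murmur3_x86_32`, `to_signed32`, and `route` are hypothetical. Prefix-based custom routing is not modeled, and the shard ranges are passed in rather than read from the cluster state.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 (x86, 32-bit); returns an unsigned 32-bit value."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF

    def rotl(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & mask

    h = seed & mask
    nblocks = len(data) // 4
    for i in range(nblocks):  # body: 4-byte little-endian blocks
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = rotl((k * c1) & mask, 15)
        h ^= (k * c2) & mask
        h = (rotl(h, 13) * 5 + 0xE6546B64) & mask

    tail = data[4 * nblocks:]  # remaining 0-3 bytes
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = rotl((k * c1) & mask, 15)
        h ^= (k * c2) & mask

    h ^= len(data)  # finalization mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def to_signed32(x: int) -> int:
    # Shard ranges are compared as signed 32-bit ints.
    return x - (1 << 32) if x >= (1 << 31) else x


def route(doc_id: str, ranges: list) -> int:
    """Return the index of the (start, end) range containing the id's hash."""
    h = to_signed32(murmur3_x86_32(doc_id.encode("utf-8"), seed=0))
    for shard, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return shard
    raise ValueError(f"hash {h} of {doc_id!r} is not covered by any range")


# Hypothetical two-shard layout: negative hashes on shard 0, the rest on shard 1.
two_shards = [(-(1 << 31), -1), (0, (1 << 31) - 1)]
print(route("doc1", two_shards))
```

Because the hash and the ranges are both deterministic, any external job that reproduces them sends a given id to the same shard every time. That consistency is what the message asks for.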

Re: Indexing off of the production servers

2013-05-06 Thread Furkan KAMACI
1-2) Your aim in using Hadoop is probably to run Map/Reduce jobs. When you use Map/Reduce jobs, you split your workload, process the pieces, and then the reduce step combines the results. Let me explain the new SolrCloud architecture. You start your SolrCloud with a numShards parameter. Let's assume that you have 5 sh
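With numShards=5, the 32-bit hash space is divided into five contiguous ranges, one per shard. The sketch below does an even integer split. `partition_hash_range` is a hypothetical helper, and Solr's own range boundaries may be rounded slightly differently. Treat the printed values as illustrative, not as what a real cluster will show.

```python
def partition_hash_range(num_shards: int) -> list:
    """Split the signed 32-bit hash space into num_shards contiguous inclusive ranges."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    lo, hi = -(1 << 31), (1 << 31) - 1
    size = hi - lo + 1  # 2**32 possible hash values
    ranges = []
    for i in range(num_shards):
        start = lo + (size * i) // num_shards
        end = lo + (size * (i + 1)) // num_shards - 1
        ranges.append((start, end))
    return ranges


for shard, (start, end) in enumerate(partition_hash_range(5), start=1):
    # Print the unsigned bit patterns in hex, similar in style to cluster-state ranges.
    print(f"shard{shard}: {start & 0xFFFFFFFF:08x}-{end & 0xFFFFFFFF:08x}")
```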