Substring and Case Insensitive Search

2014-08-19 Thread Nishanth S
Hi,

I am very new to Solr. How can I make search on a string field case-insensitive
and also support substring matching?

Thanks,
Nishanth
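
A common way to get both case-insensitive and substring matching is to copy the
string field into an analyzed field that lowercases the value and indexes
n-grams. Below is a minimal schema.xml sketch for Solr 4.x; the field and type
names are made up for illustration and are not from this thread:

    <!-- Index-time: lowercase the whole value and emit its substrings as n-grams.
         Query-time: lowercase only, so the query term is matched against the grams. -->
    <fieldType name="text_substring" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Hypothetical fields: "name" is the original string field, "name_substr" the searchable copy -->
    <field name="name_substr" type="text_substring" indexed="true" stored="false"/>
    <copyField source="name" dest="name_substr"/>

A query such as name_substr:ishan would then match "Nishanth" regardless of
case; the gram sizes trade index size against the longest substring that can be
matched directly.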


Pointing solr cloud to multiple index directories.

2014-12-22 Thread Nishanth S
Hey folks,

I have 5 drives in my machine which are mounted at 5 different
locations (/d/1, /d/2, /d/3...). How can I point Solr to write to all of these
directories?


Thanks,
Nishanth


Pointing-solr-cloud-to-multiple-index-directories

2014-12-30 Thread Nishanth S
Thanks Eric and Shawn. Here is why I am trying to do this; I may be missing
something here since this is relatively new to me. I appreciate your help and
time. I will elaborate on what I am trying to achieve here.


I am trying to install SolrCloud, and my machines typically have 5 drives
which are mounted at 5 different locations (/d/1, /d/2, /d/3...). Each of these
drives is 3 TB in size, which should give a total of 15 TB for Solr index
storage. However, when I specify the Solr directory in solrconfig.xml, I can
point to only one specific directory (say I specify /d/1, then I am using only
3 TB of storage).
What would be the preferred approach in this case?

1. Create symlinks.
2. Since I will have replicationFactor=2 (r=2), specify the dataDir for each
core and point it to a different drive on the machine (see the sketch after
this message).
3. Use LVM?

Thanks,
Nishanth
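
For option 2, a core's data directory can be pointed at any mount by overriding
dataDir. A minimal solrconfig.xml sketch (the path and core layout are examples,
not from this thread; the same override can also be set per core through a
dataDir entry in core.properties):

    <!-- Put this core's index and tlog on /d/2; each core hosted on the machine
         would point at a different mount. -->
    <dataDir>/d/2/solr/collection1_shard1_replica1/data</dataDir>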


Re: Pointing-solr-cloud-to-multiple-index-directories

2014-12-31 Thread Nishanth S
Hi folks,

Can you help?

-Nishanth

On Tue, Dec 30, 2014 at 11:20 PM, Nishanth S 
wrote:

> Thanks Eric and Shawn.Here is why I am trying to do so.I may be missing
> something here since this is relatively new to me.Appreciate your help and
> time.* I  will elaborate on what I am trying to acheive here.*
>
>
> I am trying to install solr cloud and my machines typically have 5 drives
> which are mounted to 5 different locations(/d/1 ,/d/2,/d/3..).Each of these
> drives are 3 Tb in size which should give a total of 15 Tb for solr index
> storage.How ever when I specify the solr dir in solrconfig.xml ,I can point
> only to one specific directory(say I specify /d/1 I am using only 3 Tb of
> storage).
> What would be the preferd approach in this case?
>
> 1.Create symlink
> 2.Since I will r=2 I can specify the datadir and point to different drive
> in the machine
> 3.use lvm?
>
> Thanks,
> Nishanth
>
>


Running Multiple Solr Instances

2015-01-05 Thread Nishanth S
Hi folks,

I am running multiple Solr instances (Solr 4.10.3 on Tomcat 8). There are
3 physical machines, and I have 4 Solr instances running on each machine
on ports 8080, 8081, 8082 and 8083. The setup is fine up to this point. Now I
want to point each of these instances to a different index directory. The
drives in the machines are mounted as /d/1, /d/2, /d/3, /d/4, etc. Now if I
define /d/1 as the Solr home, all Solr index directories are created in /d/1
while the other drives remain unused. So how do I configure Solr to
make use of all the drives so that I can get maximum storage for Solr? I
would really appreciate any help in this regard.

Thanks,
Nishanth


Re: Running Multiple Solr Instances

2015-01-06 Thread Nishanth S
Thanks a lot guys. As a beginner, these are very helpful for me.

Thanks,
Nishanth

On Tue, Jan 6, 2015 at 5:12 AM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> I would do one of either:
>
> 1. Set a different Solr home for each instance. I'd use the
> -Dsolr.solr.home=/d/2 command line switch when launching Solr to do so.
>
> 2. RAID 10 the drives. If you expect the Solr instances to get uneven
> traffic, pooling the drives will allow a given Solr instance to share the
> capacity of all of them.
>
>
> On 1/5/15 23:31, Nishanth S wrote:
>
>> Hi folks,
>>
>> I  am running  multiple solr instances  (Solr 4.10.3 on tomcat 8).There
>> are
>> 3 physical machines and  I have 4 solr instances running  on each machine
>> on ports  8080,8081,8082 and 8083.The set up is well up to this point.Now
>> I
>> want to point each of these instance to a different  index directories.The
>> drives in the machines are mounted as d/1,d/2,d/3 ,d/4 etc.Now if I define
>> /d/1 as  the solr home all solr index directories  are created in /d/1
>> where as the other drives remain un used.So how do I configure solr to
>>   make use of all the drives so that I can  get maximum storage for solr.I
>> would really appreciate any help in this regard.
>>
>> Thanks,
>> Nishanth
>>
>>
>


Re: Running Multiple Solr Instances

2015-01-07 Thread Nishanth S
Hey Ganesh,

This was not for clustering. I do not think you would need clustering with
SolrCloud. With SolrCloud, when you create a collection from scratch it
creates the data directories under the Solr home. Now if your drives are mounted
as /d/1, /d/2, etc., you would want to use all the storage available for
indexes apart from any system resources. So we just created the collection,
then shut down Solr and used symlinks to point to all the mounts.

Thanks,
Nishanth

On Tue, Jan 6, 2015 at 11:29 AM,  wrote:

>  Nishanth,
>
> 1.   I understand you are implementing clustering for the web apps,
> which means running the same application on multiple different instances on
> one or more machines.
>
> 2.   If each of your web apps starts pointing to a different index
> directory, how will it switch to the next web app with a different index if
> the search term is not found in the first index directory?
>
> 3.   Or will the web app collect the results sequentially from all the
> index directories and present the resulting collection to the user?
>
> Please share your thoughts
>
> Thanks
>
> G
>
> -Original Message-
> From: Nishanth S [mailto:nishanth.2...@gmail.com]
> Sent: Tuesday, January 06, 2015 12:17 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Running Multiple Solr Instances
>
> Thanks a lot guys.As a begineer these are very helpful fo rme.
>
> Thanks,
> Nishanth
>
> On Tue, Jan 6, 2015 at 5:12 AM, Michael Della Bitta <
> michael.della.bi...@appinions.com> wrote:
>
> > I would do one of either:
> >
> > 1. Set a different Solr home for each instance. I'd use the
> > -Dsolr.solr.home=/d/2 command line switch when launching Solr to do so.
> >
> > 2. RAID 10 the drives. If you expect the Solr instances to get uneven
> > traffic, pooling the drives will allow a given Solr instance to share
> > the capacity of all of them.
> >
> > On 1/5/15 23:31, Nishanth S wrote:
> >
> >> Hi folks,
> >>
> >> I  am running  multiple solr instances  (Solr 4.10.3 on tomcat 8).There
> >> are 3 physical machines and  I have 4 solr instances running  on each
> >> machine on ports  8080,8081,8082 and 8083.The set up is well up to
> >> this point.Now I want to point each of these instance to a different
> >> index directories.The drives in the machines are mounted as
> >> d/1,d/2,d/3 ,d/4 etc.Now if I define
> >> /d/1 as  the solr home all solr index directories  are created in
> >> /d/1 where as the other drives remain un used.So how do I configure
> >> solr to make use of all the drives so that I can  get maximum storage for
> >> solr.I would really appreciate any help in this regard.
> >>
> >> Thanks,
> >> Nishanth
>


Determining the Number of Solr Shards

2015-01-07 Thread Nishanth S
Hi All,

I am working on coming up with a Solr architecture layout for my use
case. We are a very write-heavy application with no downtime tolerance, and
we have lower SLAs on reads compared with writes. I am looking at around
12K TPS with an average Solr document size in the range of 6 kB. I
would like to go with 3 replicas for that extra fault tolerance, and I am
trying to identify the number of shards. The machines are monstrous, with
around 100 GB of RAM and more than 24 cores each. Is there a way to
arrive at the desired number of shards in this case? Any pointers would be
helpful.


Thanks,
Nishanth


Re: Determining the Number of Solr Shards

2015-01-07 Thread Nishanth S
Thanks Shawn and Walter. Yes, those are 12,000 writes/second. Reads for the
moment would be around 1,000 reads/second. I guess finding out the right
number of shards would be my starting point.

Thanks,
Nishanth


On Wed, Jan 7, 2015 at 6:28 PM, Walter Underwood 
wrote:

> This is described as “write heavy”, so I think that is 12,000
> writes/second, not queries.
>
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/
>
>
> On Jan 7, 2015, at 5:16 PM, Shawn Heisey  wrote:
>
> > On 1/7/2015 3:29 PM, Nishanth S wrote:
> >> I  am working on coming up with a solr architecture layout  for my use
> >> case.We are a very write heavy application with  no down time tolerance
> and
> >> have low SLAs on reads when compared with writes.I am looking at around
> >> 12K tps with average index size of solr document in the range of 6kB.I
> >> would like to go with 3 replicas for that extra fault tolerance and
> trying
> >> to identify the number  of shards.The machines are monsterous and have
> >> around 100 GB of RAM and  more than 24 cores on each.Is there a way to
> >> come at the number of  desired shards in this case.Any pointers would be
> >> helpful.
> >
> > This is one of those questions that's nearly impossible to answer
> > without field trials that have a production load on a production index.
> > Minor changes to either config or schema can have a major impact on the
> > query load Solr will support.
> >
> >
> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >
> > A query load of 12000 queries per second is VERY high.  That is likely
> > to require a **LOT** of hardware, because you're going to need a lot of
> > replicas.  Because each server will be handling quite a lot of
> > simultaneous queries, the best results will come from having only one
> > replica (solr core) per server.
> >
> > Generally you'll get better results for a high query load if you don't
> > shard your index, but depending on how many docs you have, you might
> > want to shard.  You haven't said how many docs you have.
> >
> > The key to excellent performance with Solr is to make sure that the
> > system never hits the disk to read index data -- for 12000 queries per
> > second, the index must be fully cached in RAM.  If Solr must go to the
> > actual disk, query performance will drop significantly.
> >
> > Thanks,
> > Shawn
> >
>
>


Re: Determining the Number of Solr Shards

2015-01-08 Thread Nishanth S
Thanks guys for your inputs. I would be looking at around 100 TB of total
index size with 5,100 million documents for a period of 30 days before
we purge the indexes. I may have estimated it slightly on the higher side,
but that's roughly where I feel we would be.

Thanks,
Nishanth
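
As a back-of-the-envelope check on those figures: 5,100 million documents at
roughly 6 kB each comes to about 30 TB per copy of the data, or on the order of
90 TB across 3 replicas if the 100 TB estimate is meant to include replication.
Likewise, 5.1 billion documents arriving over 30 days averages out to roughly
2,000 documents per second, so the 12K TPS mentioned earlier presumably
describes the peak rather than the average ingest rate.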

On Wed, Jan 7, 2015 at 7:50 PM, Shawn Heisey  wrote:

> On 1/7/2015 7:14 PM, Nishanth S wrote:
> > Thanks Shawn and Walter.Yes those are 12,000 writes/second.Reads  for the
> > moment would be in the 1000 reads/second. Guess finding out the right
> > number  of  shards would be my starting point.
>
> I don't think indexing 12000 docs per second would be too much for Solr
> to handle, as long as you architect the indexing application properly.
> You would likely need to have several indexing threads or processes that
> index in parallel.  Solr is fully thread-safe and can handle several
> indexing requests at the same time.  If the indexing application is
> single-threaded, indexing speed will not reach its full potential.
>
> Be aware that indexing at the same time as querying will reduce the
> number of queries per second that you can handle.  In an environment
> where both reads and writes are heavy like you have described, more
> shards and/or more replicas might be required.
>
> For the query side ... even 1000 queries per second is a fairly heavy
> query rate.  You're likely to need at least a few replicas, possibly
> several, to handle that.  The type and complexity of the queries you do
> will make a big difference as well.  To handle that query level, I would
> still recommend only running one shard replica on each server.  If you
> have three shards and three replicas, that means 9 Solr servers.
>
> How many documents will you have in total?  You said they are about 6KB
> each ... but depending on the fieldType definitions (and the analysis
> chain for TextField types), 6KB might be very large or fairly small.
>
> Do you have any idea how large the Solr index will be with all your
> documents?  Estimating that will require indexing a significant
> percentage of your documents with the actual schema and config that you
> will use in production.
>
> If I know how many documents you have, how large the full index will be,
> and can see an example of the more complex queries you will do, I can
> make *preliminary* guesses about the number of shards you might need.  I
> do have to warn you that it will only be a guess.  You'll have to
> experiment to see what works best.
>
> Thanks,
> Shawn
>
>


Connection Reset Errors with Solr 4.4

2015-01-20 Thread Nishanth S
Hello All,

We are running SolrCloud 4.4 with 30 shards and 3 replicas, with real-time
indexing, on RHEL 6.5. The indexing rate is 3K TPS now. We are running into an
issue with replicas going into recovery mode due to connection reset
errors. Soft commit time is 2 minutes and auto commit is set to 5 minutes. I
have seen that replicas do a full index recovery, which takes a long
time (days). Below is the error trace that I see. I would really appreciate
any help in this case.

org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at: http://xxx:8083/solr/log_pn_shard20_replica2
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:435)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:196)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
    at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:717)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:522)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
    ... 9 more


Thanks,
Nishanth


Re: Connection Reset Errors with Solr 4.4

2015-01-20 Thread Nishanth S
Thank you Mike. Sure enough, we are running into the same issue you
mentioned. Is there a quick fix for this other than the patch? I do not see
the tlogs getting replayed at all. It is doing a full index recovery from
the leader, and our index size is around 200 GB. Would lowering the autocommit
settings help (so that the replica would go for a tlog replay, as the tlogs I
see are not huge)?

Thanks,
Nishanth

On Tue, Jan 20, 2015 at 10:46 AM, Mike Drob  wrote:

> Are we sure this isn't SOLR-6931?
>
> On Tue, Jan 20, 2015 at 11:39 AM, Nishanth S 
> wrote:
>
> > Hello All,
> >
> > We are running solr cloud 4.4 with 30 shards and 3 replicas with real
> time
> > indexing on rhel 6.5.The indexing rate is 3K Tps now.We are running into
> an
> > issue with replicas going into recovery mode  due to connection reset
> > errors.Soft commit time is 2 min and auto commit is set as 5 minutes.I
> have
> > seen that replicas do a full index recovery which takes a long
> > time(days).Below is the error trace that  I see.I would really appreciate
> > any help in this case.
> >
> > g.apache.solr.client.solrj.SolrServerException: IOException occured when
> > talking to server at: http://xxx:8083/solr/log_pn_shard20_replica2
> > at
> >
> >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:435)
> > at
> >
> >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
> > at
> >
> >
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
> > at
> >
> >
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > at
> >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.net.SocketException: Connection reset
> > at java.net.SocketInputStream.read(SocketInputStream.java:196)
> > at java.net.SocketInputStream.read(SocketInputStream.java:122)
> > at
> >
> >
> org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
> > at
> >
> >
> org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
> > at
> >
> >
> org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
> > at
> >
> >
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
> > at
> >
> >
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
> > at
> >
> >
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
> > at
> >
> >
> org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
> > at
> >
> >
> org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
> > at
> >
> >
> org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
> > at
> >
> >
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
> > at
> >
> >
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
> > at
> >
> >
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:717)
> > at
> >
> >
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:522)
> > at
> >
> >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
> > at
> >
> >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
> > at
> >
> >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
> > at
> >
> >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
> > ... 9 more
> >
> >
> > Thanks,
> > Nishanth
> >
>


Solr Recovery process

2015-01-21 Thread Nishanth S
Hello Everyone,

I am hitting a few issues with Solr replicas going into recovery and then
doing a full index copy. I am trying to understand the Solr recovery
process. I have read a few blogs on this and saw that when the leader notifies
a replica to recover (in my case it is due to connection resets), it will
try to do a peer sync first, and if the missed updates are more than 100 it
will do a full index copy from the leader. I am trying to understand what
peer sync is and where the tlog comes into the picture. Are tlogs replayed only
during server restart? Can someone help me with this?

Thanks,
Nishanth


Re: Solr Recovery process

2015-01-21 Thread Nishanth S
Thank you Shalin. So in a system where the indexing rate is more than 5K TPS
or so, the replica will never be able to recover through the peer sync
process. In my case I have mostly seen step 3, where a full copy happens,
and if the index size is huge it takes a very long time for replicas to
recover. Is there a way we can configure the number of missed updates for
peer sync?

Thanks,
Nishanth

On Wed, Jan 21, 2015 at 4:47 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Hi Nishanth,
>
> The recovery happens as follows:
>
> 1. PeerSync is attempted first. If the number of new updates on leader is
> less than 100 then the missing documents are fetched directly and indexed
> locally. The tlog tells us the last 100 updates very quickly. Other uses of
> the tlog are for durability of updates and of course, startup recovery.
> 2. If the above step fails then replication recovery is attempted. A hard
> commit is called on the leader and then the leader is polled for the latest
> index version and generation. If the leader's version and generation are
> greater than local index's version/generation then the difference of the
> index files between leader and replica are fetched and installed.
> 3. If the above fails (because leader's version/generation is somehow equal
> or more than local) then a full index recovery happens and the entire index
> from the leader is fetched and installed locally.
>
> There are some other details involved in this process too but probably not
> worth going into here.
>
> On Wed, Jan 21, 2015 at 5:13 PM, Nishanth S 
> wrote:
>
> > Hello Everyone,
> >
> > I am hitting a few issues with solr replicas going into recovery and then
> > doing a full index copy.I am trying to understand the solr recovery
> > process.I have read a few blogs  on this and saw  that when leader
> notifies
> > a replica to  recover(in my case it is due to connection resets) it will
> > try to do a peer sync first and  if the missed updates are more than 100
> it
> > will do a full index copy from the leader.I am trying to understand what
> > peer sync is and where does tlog come into picture.Are tlogs replayed
> only
> > during server restart?.Can some one  help me with this?
> >
> > Thanks,
> > Nishanth
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Replicas fall into recovery mode right after update

2015-01-23 Thread Nishanth S
Can you tell us what version of Solr you are using and what causes your
replicas to go into recovery?

On Fri, Jan 23, 2015 at 8:40 PM, gouthsmsimhadri 
wrote:

> I'm working with a cluster of SolrCloud servers at a configuration of 10
> shards and 4 replicas on each shard in a stress environment.
> Planned production configuration is 10 shards and 15 replicas on each
> shard.
>
> Current commit settings are as follows
>
> 
> 50
> 18
> 
>
> 
> 200
> 18
> false
> 
>
>
> The application requires to index approximately 90 Million docs which is
> indexed in two ways
> a)  Full indexing. It takes 4 hours to index 90 Million docs and the
> rate of
> docs coming to the searcher is around 6000 per second
> b)  Incremental indexing. It takes an hour to index delta changes.
> Roughly
> there are 3 million changes and rate of docs coming to the searchers is
> 2500
> per second
>
> I use two collections for example collection1 and collection2
> Each collection has system settings at 12 GB of available RAM and quad core
> Intel(R) Xeon(R) CPU X5570  @ 2.93GHz
>
> Full indexing is always performed on a collection which is not serving live
> traffic and Once job is completed we swap collection so the collection with
> latest data serves traffic and other is inactive.
>
> The other mode of incremental indexing  is performed  always on the
> collection which is serving live traffic.
>
> The problem is that within about 10 minutes of indexing being triggered, the
> replicas go into recovery mode. This happens on all the shards. In about 20
> minutes or more, the rest of the replicas start to fall into recovery mode. In
> about half an hour, all replicas except the leader are in recovery mode.
>
> I cannot throttle the indexing load as that will increase our overall
> indexing time. So to overcome this issue, I remove all the replicas before
> the indexing is started and then add them after the indexing completes.
>
> The behavior (replicas falling into recovery mode) in incremental mode of
> indexing is troublesome, as I cannot remove replicas during incremental
> indexing since it serves live traffic. I tried to throttle the speed at
> which documents are indexed, but with no success, as the cluster still goes
> into recovery.
>
> If I leave the cluster as-is, the indexing eventually completes and it also
> recovers after a while, but since this is serving live traffic I just cannot
> let these replicas go into recovery mode, since it also degrades the search
> performance (from the tests performed).
>
> I tried different commit settings like the below
> a)  No auto soft commit, no auto hard commit and a commit triggered at
> the
> end of indexing
> b)  No auto soft commit, yes auto hard commit and a commit in the end
> of
> indexing
> c)  Yes auto soft commit , no auto hard commit
> d)  Yes auto soft commit , yes auto hard commit
> e)  Different frequency setting for commits for above
>
> Unfortunately all the above yields the same behavior . The replicas still
> goes in recovery
>
> I have increased the zookeeper timeout from 30 seconds to 5 minutes and the
> problem persists.
>
> Is there any setting that would fix this issue ?
>
>
>
>
> -
>  -goutham
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-Replicas-fall-into-recovery-mode-right-after-update-tp4181706.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr Recovery process

2015-01-26 Thread Nishanth S
Thank you Ram.

On Mon, Jan 26, 2015 at 1:49 AM, Ramkumar R. Aiyengar <
andyetitmo...@gmail.com> wrote:

> https://issues.apache.org/jira/browse/SOLR-6359 has a patch which allows
> this to be configured, it has not gone in as yet.
>
> Note that the current design of the UpdateLog causes it to be less
> efficient if the number is bumped up too much, but certainly worth
> experimenting with.
> On 22 Jan 2015 02:47, "Nishanth S"  wrote:
>
> > Thank you Shalin.So in a system where the indexing rate is more than 5K
> TPS
> > or so the replica  will never be able to recover   through peer sync
> > process.In  my case I have mostly seen  step 3 where a full copy happens
> > and  if the index size is huge it takes a very long time for replicas to
> > recover.Is there a way we can  configure the  number of missed updates
> for
> > peer sync.
> >
> > Thanks,
> > Nishanth
> >
> > On Wed, Jan 21, 2015 at 4:47 PM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> > > Hi Nishanth,
> > >
> > > The recovery happens as follows:
> > >
> > > 1. PeerSync is attempted first. If the number of new updates on leader
> is
> > > less than 100 then the missing documents are fetched directly and
> indexed
> > > locally. The tlog tells us the last 100 updates very quickly. Other
> uses
> > of
> > > the tlog are for durability of updates and of course, startup recovery.
> > > 2. If the above step fails then replication recovery is attempted. A
> hard
> > > commit is called on the leader and then the leader is polled for the
> > latest
> > > index version and generation. If the leader's version and generation
> are
> > > greater than local index's version/generation then the difference of
> the
> > > index files between leader and replica are fetched and installed.
> > > 3. If the above fails (because leader's version/generation is somehow
> > equal
> > > or more than local) then a full index recovery happens and the entire
> > index
> > > from the leader is fetched and installed locally.
> > >
> > > There are some other details involved in this process too but probably
> > not
> > > worth going into here.
> > >
> > > On Wed, Jan 21, 2015 at 5:13 PM, Nishanth S 
> > > wrote:
> > >
> > > > Hello Everyone,
> > > >
> > > > I am hitting a few issues with solr replicas going into recovery and
> > then
> > > > doing a full index copy.I am trying to understand the solr recovery
> > > > process.I have read a few blogs  on this and saw  that when leader
> > > notifies
> > > > a replica to  recover(in my case it is due to connection resets) it
> > will
> > > > try to do a peer sync first and  if the missed updates are more than
> > 100
> > > it
> > > > will do a full index copy from the leader.I am trying to understand
> > what
> > > > peer sync is and where does tlog come into picture.Are tlogs replayed
> > > only
> > > > during server restart?.Can some one  help me with this?
> > > >
> > > > Thanks,
> > > > Nishanth
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Shalin Shekhar Mangar.
> > >
> >
>
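
For reference, the knob added by SOLR-6359 is exposed as numRecordsToKeep (and
maxNumLogsToKeep) on the update log. A minimal solrconfig.xml sketch, assuming a
Solr version in which that change has landed; the values are illustrative only:

    <updateHandler class="solr.DirectUpdateHandler2">
      <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
        <!-- Keep more recent updates in the tlog so PeerSync can cover gaps larger
             than the default 100; larger values mean bigger tlogs and slower replay. -->
        <int name="numRecordsToKeep">1000</int>
        <int name="maxNumLogsToKeep">10</int>
      </updateLog>
    </updateHandler>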


Removing a stored field from solrcloud 4.4

2015-01-30 Thread Nishanth S
Hello,

I have a field which is indexed and stored in the Solr schema (4.4, SolrCloud).
This field is relatively large, and I plan to only index the field
and not store it. Is there a need to re-index the documents once this
change is made?

Thanks,
Nishanth
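
For illustration, the schema.xml change would look something like the following;
the field name and type are made up, not from this thread. Documents already in
the index keep their stored values until they are re-indexed, so the space
saving only materializes once the existing documents are re-indexed:

    <!-- Before: value is both indexed and kept in the stored fields -->
    <field name="large_payload" type="text_general" indexed="true" stored="true"/>

    <!-- After: index-only; newly indexed documents carry no stored copy -->
    <field name="large_payload" type="text_general" indexed="true" stored="false"/>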


Re: Solr Logging files get high

2015-02-03 Thread Nishanth S
I feel the tlog size is perfectly fine, since your hard commit interval is
low. You can try increasing your hard commit and soft commit values. A soft
commit of 1 second is very low. Soft commit is about visibility of
documents, so you can try to increase this as far as your SLAs allow.

-Nishanth
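
As a rough illustration of that suggestion, the relevant solrconfig.xml section
might look like the following; the intervals are examples only and should be
chosen from your own visibility SLAs and indexing rate rather than copied as-is:

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- Hard commit: flushes segments to disk and rolls the transaction log;
           openSearcher=false keeps it cheap because no new searcher is opened. -->
      <autoCommit>
        <maxTime>60000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <!-- Soft commit: controls when new documents become visible to searches;
           tens of seconds instead of 1 second greatly reduces commit overhead. -->
      <autoSoftCommit>
        <maxTime>30000</maxTime>
      </autoSoftCommit>
    </updateHandler>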

On Mon, Feb 2, 2015 at 10:51 PM, Nitin Solanki  wrote:

> Hi Michael Della and Michael Sokolov,
>
> *size of tlog :-*
> 56K/mnt/nitin/solr/node1/solr/wikingram_shard3_replica1/data/tlog/
> 56K/mnt/nitin/solr/node1/solr/wikingram_shard7_replica1/data/tlog/
> 56K/mnt/nitin/solr/node2/solr/wikingram_shard4_replica1/data/tlog/
> 52K/mnt/nitin/solr/node2/solr/wikingram_shard8_replica1/data/tlog/
> 52K/mnt/nitin/solr/node3/solr/wikingram_shard1_replica1/data/tlog/
> 52K/mnt/nitin/solr/node3/solr/wikingram_shard5_replica1/data/tlog/
> 56K/mnt/nitin/solr/node4/solr/wikingram_shard2_replica1/data/tlog/
> 48K/mnt/nitin/solr/node4/solr/wikingram_shard6_replica1/data/tlog/
>
> *Size of logs :-*
> 755M/mnt/nitin/solr/node1/logs/
> 729M/mnt/nitin/solr/node2/logs/
> 729M/mnt/nitin/solr/node3/logs/
> 729M/mnt/nitin/solr/node4/logs/
>
> Which log is reducing performance? I am committing quite frequently: after 1
> second I perform a soft commit, and after 15 seconds I perform a hard commit.
> I indexed 2 GB of data, and you can see the size of the tlogs that I pasted
> above. Is this tlog size reasonable for 2 GB of indexed data, or is it high?
> The main question: will the size of these logs harm Solr's performance?
>
>
>
>
> On Mon, Feb 2, 2015 at 10:27 PM, Michael Della Bitta <
> michael.della.bi...@appinions.com> wrote:
>
> > Good call, it could easily be the tlog Nitin is talking about.
> >
> > As for which definition of high, I was making assumptions as well. :)
> >
> > Michael Della Bitta
> > Senior Software Engineer, appinions inc.
> >
> > On Mon, Feb 2, 2015 at 11:51 AM, Michael Sokolov <
> > msoko...@safaribooksonline.com> wrote:
> >
> > > I was tempted to suggest rehab -- but seriously it wasn't clear if
> Nitin
> > > meant the log files Michael is referring to, or the transaction log
> > > (tlog).  If it's the transaction log, the solution is more frequent
> hard
> > > commits.
> > >
> > > -Mike
> > >
> > > On 2/2/2015 11:48 AM, Michael Della Bitta wrote:
> > >
> > >> If you'd like to reduce the amount of lines Solr logs, you need to
> edit
> > >> the
> > >> file example/resources/log4j.properties in Solr's home directory.
> Change
> > >> lines that say INFO to WARN.
> > >>
> > >> Michael Della Bitta
> > >> Senior Software Engineer, appinions inc.
> > >>
> > >> On Mon, Feb 2, 2015 at 7:42 AM, Nitin Solanki 
> > >> wrote:
> > >>
> > >>  Hi,
> > >>> My Solr logs directory has been getting very large. Is this a serious
> > >>> problem, or does it harm my Solr performance for both indexing and
> > >>> searching?
> > >>>
> > >>>
> > >
> >
>


Re: Transaction logs not getting deleted

2015-02-08 Thread Nishanth S
Are you saying that you are not able to index at all for the past 2 days? Can
you tell me whether the leaders for all shards are up? There can be large tlog
files when you set auto commit to a higher value. Check in the logs whether
the tlog is getting replayed. I have used only 4.4, and with it the tlog replay
was slow. One thing we did was to shut down the Solr instance, remove the tlog
from the recovering replica, and restart (for huge tlogs). It then did a full
copy of the index from the leader. How fast that is depends on your index size
and I/O. I am not sure if there is a better solution to this.

This is a very nice blog that I read...
http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/


-Nishanth

On Sat, Feb 7, 2015 at 10:54 AM, vidit.asthana 
wrote:

> Dear Experts,
>
> I have a solrcloud setup - 8 machines, 7 collections(replicationFactor=2,
> numShards=8). Transaction log for one of the replica of a collection is not
> getting deleted and has grown to ~4GB.
>
> Here's the stats for this collection:
>
> *Solr Version:* 4.10.0
> *NumDocs:* 33.5 million
> *Softcommit duration:* 2 minutes
> *Hardcommit duration:* 30 mins
> *Indexing rate:* variable(avg: 2000/sec). Indexing is paused from last 2
> days.
>
> *Size of data directory of this particular buggy replica(shard1_replica1):*
> 20 GB(index dir = 16GB, tlog dir = 4 GB)
> *Size of data directory of second replica of this shard(shard1_replica2):*
> 16 GB(index dir = 16GB, tlog dir = 13 MB)
>
> *Number of files in tlog dir of buggy replica(shard1_replica1) =* 171 (from
> tlog.882 to tlog.0001172)
>
> Size of most files in tlog directory is 4K. Out of 171 files, 70 are
> greater
> than 1 MB, 10 files are larger than 100MB. Largest tlog file is 935MB.
> Oldest tlog file is 42MB. Latest tlog file in 6MB.
>
> *Number of files in tlog directory of second replica of this
> shard(shard1_replica2) = * 3 (tlog.0001169,
> tlog.0001171 and tlog.0001173)
>
>
> Let me know how can I fix this issue and delete old transaction logs of
> this
> particular replica. I have encountered the same issue previously with 4.8.1
> as well, where tlog kept growing to 20 GB(it was a test collection, so we
> just deleted it at that time).
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Transaction-logs-not-getting-deleted-tp4184635.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>