new data structure for some fields

2015-12-21 Thread Abhishek Mishra
Hello all

I am facing a requirement where an id p1 is associated with some
category_ids c1,c2,c3,c4, each carrying an integer b1,b2,b3,b4. We need to
sort Solr query results on b1/b2/b3/b4 depending on the given category_id.
Right now we map the category_ids into a multi-valued attribute,
[c1,c2,c3,c4], and query against it. But now we also need to find which
integer b1,b2,b3... is associated with the given category, and sort the
whole result set on it.


Sorry for any typos.

Regards
Abhishek


Re: new data structure for some fields

2015-12-21 Thread Abhishek Mishra
Hi Binoy,
Thanks for the reply. By sort I mean sorting the result set on the integer
value given for the queried category.
For any document, say with id P1,
the categories associated are c1,c2,c3,c4 (using a multivalued field).
In the new implementation, a number is similarly associated with each
category, say c1---b1, c2---b2, c3---b3, c4---b4.
Now when we query Solr for the ids that have c1 in their categories
(q=category_id:c1), I want the result of that query sorted on the number
(b) associated with c1 throughout the result set.

The number of associations is usually less than 20 (i.e. an id can't be
mapped to more than 20 category_ids).


On Mon, Dec 21, 2015 at 3:59 PM, Binoy Dalal  wrote:

> When you say sort, do you mean search on the basis of category and
> integers? Or score the docs based on their category and integer values?
>
> Also, for any given document, how many categories or integers are
> associated with it?
>
> --
> Regards,
> Binoy Dalal
>


Re: new data structure for some fields

2015-12-21 Thread Abhishek Mishra
Hi Binoy,
That will not work, since category and integer are a one-to-one mapping: if
category_id is multivalued, the integer field is multivalued too. You need
some mechanism to identify which integer to pick for the category_id used
in the search; only then can you sort on it.

On Mon, Dec 21, 2015 at 5:27 PM, Binoy Dalal  wrote:

> Small edit:
> The sort parameter in the solrconfig goes in the request handler
> declaration that you're using. So if it's /select, put it in the
> <lst name="defaults"> list.
>
> On Mon, 21 Dec 2015, 17:21 Binoy Dalal  wrote:
>
> > OK. You will only be able to sort based on the integers if the integer
> > field is single-valued, i.e. only one integer is associated with one
> > category id.
> >
> > To do this you have to use the sort parameter.
> > You can either specify it in your solrconfig.xml like so:
> > <str name="sort">integer asc</str>
> > (field name followed by the order - asc/desc)
> >
> > Or you can specify it along with your query by appending it like so:
> > /select?q=query&sort=integer%20asc
> >
> > If you want to apply these sorting rules to all docs, then specify the
> > sorting in your solrconfig. If you only want it for a certain subset,
> > then apply the parameter from code at the app level.
> >
> --
> Regards,
> Binoy Dalal
>
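
[Editor's note] One common way to model the one-to-one category-to-integer
mapping discussed in this thread is a dynamic field per category (e.g. a
rank_<category>_i integer field), which keeps the sort key single-valued
for any given category query. The sketch below is illustrative only, not
from the thread: it assumes a SolrJ 6+ style client, a *_i dynamic int
field in the schema, and hypothetical core and field names (the idea is
the same on the 5.x SolrJ API).

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CategoryRankExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical core URL; adjust to your setup.
    HttpSolrClient solr = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/products").build();

    // Document p1 carries its categories in the multivalued field and
    // each category's integer in its own single-valued dynamic field.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "p1");
    doc.addField("category_id", "c1");
    doc.addField("category_id", "c2");
    doc.addField("rank_c1_i", 7); // b1 for category c1
    doc.addField("rank_c2_i", 3); // b2 for category c2
    solr.add(doc);
    solr.commit();

    // Query one category and sort on that category's integer only.
    SolrQuery q = new SolrQuery("category_id:c1");
    q.setSort("rank_c1_i", SolrQuery.ORDER.asc);
    System.out.println(solr.query(q).getResults());
    solr.close();
  }
}

Since an id maps to fewer than 20 category_ids (as stated above), the
number of dynamic fields per document stays small.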


Need a group custom function(fieldcollapsing)

2016-03-14 Thread Abhishek Mishra
Hi all
We are running on Solr 5.2.1. A new requirement has come up: we need the
data ordered according to a custom algorithm, applied to the result
obtained from the query. The best we can do is use
group.field/group.main/group.func, with group.func calling a custom
function that runs the algorithm. My doubt is where the custom function
code needs to go, i.e. in which file. I found an article related to this,
https://dzone.com/articles/how-write-custom-solr
but it does not explain where to put the code either.


Regards,
Abhishek
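
[Editor's note] Custom functions of this kind are normally packaged as a
ValueSourceParser plugin: a Java class compiled into its own jar, placed on
the core's classpath (e.g. the core's lib/ directory, or a <lib> directive
in solrconfig.xml), and registered in solrconfig.xml. The sketch below is a
minimal illustration under those assumptions; the package, class, and
function names are hypothetical, and the pass-through body stands in for
the real algorithm.

package com.example.solr; // hypothetical package

import org.apache.lucene.queries.function.ValueSource;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

// Registered in solrconfig.xml with:
//   <valueSourceParser name="myalgo" class="com.example.solr.MyAlgoParser"/>
// after which it can be referenced as group.func=myalgo(some_field)
public class MyAlgoParser extends ValueSourceParser {
  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    ValueSource arg = fp.parseValueSource(); // the field or nested function argument
    // Placeholder: passes the argument through unchanged. A real
    // implementation returns a custom ValueSource that computes the
    // algorithm's value per document.
    return arg;
  }
}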


Re: Need a group custom function(fieldcollapsing)

2016-03-15 Thread Abhishek Mishra
Any update on this???



edismax parsing confusion

2017-04-03 Thread Abhishek Mishra
Hi all
I am running a Solr query with these parameters:

bf: "sum(product(new_popularity,100),if(exists(third_price),50,0))"
qf: "test_product^5 category_path_tf^4 product_id gender"
q: "handbags between rs150 and rs 400"
defType: "edismax"

The parsed query comes out as below.

For q:
(+(DisjunctionMaxQuery((category_path_tf:handbags^4.0 | gender:handbag |
test_product:handbag^5.0 | product_id:handbags))
DisjunctionMaxQuery((category_path_tf:between^4.0 | gender:between |
test_product:between^5.0 | product_id:between))
+DisjunctionMaxQuery((category_path_tf:rs150^4.0 | gender:rs150 |
test_product:rs150^5.0 | product_id:rs150))
+DisjunctionMaxQuery((category_path_tf:rs^4.0 | gender:rs |
test_product:rs^5.0 | product_id:rs))
DisjunctionMaxQuery((category_path_tf:400^4.0 | gender:400 |
test_product:400^5.0 | product_id:400))) DisjunctionMaxQuery(("":"handbags
between rs150 ? rs 400")) (DisjunctionMaxQuery(("":"handbags between"))
DisjunctionMaxQuery(("":"between rs150")) DisjunctionMaxQuery(("":"rs
400"))) (DisjunctionMaxQuery(("":"handbags between rs150"))
DisjunctionMaxQuery(("":"between rs150")) DisjunctionMaxQuery(("":"rs150 ?
rs")) DisjunctionMaxQuery(("":"? rs 400")))
FunctionQuery(sum(product(float(new_popularity),const(100)),if(exists(float(third_price)),const(50),const(0)/no_coord

But with the dismax parser it works as expected:

(+(DisjunctionMaxQuery((category_path_tf:handbags^4.0 | gender:handbag |
test_product:handbag^5.0 | product_id:handbags))
DisjunctionMaxQuery((category_path_tf:between^4.0 | gender:between |
test_product:between^5.0 | product_id:between))
DisjunctionMaxQuery((category_path_tf:rs150^4.0 | gender:rs150 |
test_product:rs150^5.0 | product_id:rs150))
DisjunctionMaxQuery((product_id:and))
DisjunctionMaxQuery((category_path_tf:rs^4.0 | gender:rs |
test_product:rs^5.0 | product_id:rs))
DisjunctionMaxQuery((category_path_tf:400^4.0 | gender:400 |
test_product:400^5.0 | product_id:400))) DisjunctionMaxQuery(("":"handbags
between rs150 ? rs 400"))
FunctionQuery(sum(product(float(new_popularity),const(100)),if(exists(float(third_price)),const(50),const(0)/no_coord


*According to me, the difference between dismax and edismax is some extra
features plus the handling of boosting functions.*



Regards,
Abhishek
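
[Editor's note] The + (required) markers in the edismax parse above sit on
the clauses adjacent to the word "and", which is consistent with edismax
treating lowercase "and" as the boolean AND operator - something dismax
never does, as its parse (product_id:and, i.e. "and" searched as a plain
term) shows. In the Solr 5.x era this edismax behavior was controlled by
the lowercaseOperators parameter, which then defaulted to true. A hedged
SolrJ sketch of making the query behave like dismax in this respect, with
hypothetical core name:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class EdismaxOperatorExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical core URL; adjust to your setup.
    HttpSolrClient solr = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/products").build();

    SolrQuery q = new SolrQuery("handbags between rs150 and rs 400");
    q.set("defType", "edismax");
    q.set("qf", "test_product^5 category_path_tf^4 product_id gender");
    q.set("lowercaseOperators", "false"); // treat "and" as a plain term, like dismax
    q.set("mm", "1");                     // make minimum-should-match explicit
    q.set("debugQuery", "true");          // response then includes the parsed query
    System.out.println(solr.query(q).getResults());
    solr.close();
  }
}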


Re: edismax parsing confusion

2017-04-04 Thread Abhishek Mishra
Hello guys,
Sorry for the late response. @Steve: I am using Solr 5.2.
@Greg: I am using the default mm from the config file (as far as I know,
the default mm is 1).

Regards,
Abhishek

On Tue, Apr 4, 2017 at 5:27 AM, Greg Pendlebury 
wrote:

> eDismax uses 'mm', so knowing what that has been set to is important, or if
> it has been left unset/default you would need to consider whether 'q.op'
> has been set. Or the default operator from the config file.
>
> Ta,
> Greg
>
>
> On 3 April 2017 at 23:56, Steve Rowe  wrote:
>
> > Hi Abhishek,
> >
> > Which version of Solr are you using?
> >
> > I can see that the parsed queries are different, but they’re also very
> > similar, and there’s a lot of detail there - can you be more specific
> about
> > what the problem is?
> >
> > --
> > Steve
> > www.lucidworks.com
> >


Inconsistent recovery status of replicas

2020-12-07 Thread Abhishek Mishra
Hello guys
I am using SolrCloud 7.7 on Kubernetes. When adding a replica we sometimes
see inconsistency: after a successful addition the node goes into recovery
status, and while recovery sometimes takes 2-3 minutes, it sometimes takes
more than an hour. We are getting the error below.
We have 4 shards, each with around 7GB of data. Looking at system metrics,
we see that bandwidth usage between the leader and the new replica node is
high. Do we have any way to rate-limit that exchange, like the
configuration we had for it in master-slave (maxMbpersec or something like
that)?

Error

> 2020-12-01 13:40:34.983 ERROR 
> (recoveryExecutor-4-thread-1-processing-n:solr-olxid-statefulset-pull-9.solr-olxid-statefulset-headless.relevance:8983_solr
>  x:olxid-20200531_d6e431ec_shard2_replica_p3955 c:olxid-20200531_d6e431ec 
> s:shard2 r:core_node3956) [c:olxid-20200531_d6e431ec s:shard2 r:core_node3956 
> x:olxid-20200531_d6e431ec_shard2_replica_p3955] o.a.s.c.RecoveryStrategy 
> Error while trying to 
> recover:org.apache.solr.client.solrj.SolrServerException: Timeout occured 
> while waiting response from server at: 
> http://solr-olxid-statefulset-tlog-7.solr-olxid-statefulset-headless.relevance:8983/solr/olxid-20200531_d6e431ec_shard2_replica_t139
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:654)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
>   at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
>   at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:287)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:215)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.doReplicateOnlyRecovery(RecoveryStrategy.java:382)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:328)
>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:307)
>   at 
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.net.SocketTimeoutException: Read timed out
>   at java.base/java.net.SocketInputStream.socketRead0(Native Method)
>   at 
> java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
>   at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
>   at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
>   at 
> org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at 
> org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at 
> org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
>   at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
>   at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>   at 
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>   at 
> org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>   at 
> org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
>   at 
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>   at 
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>   at 
> org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:120)
>   at 
> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>   at 
> org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
>   at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
>   at 
> org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
>   at 
> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>

Migrating from solr 7.7 to solr 8.6 issues

2020-12-07 Thread Abhishek Mishra
We are trying to migrate from Solr 7.7 to Solr 8.6 on Kubernetes, using
zookeeper-3.4.13. When adding a replica to the cluster, the call returns a
500 status code, while in the background the replica is sometimes added
successfully and sometimes ends up as an inactive node. We are using
HTTP/2 without SSL.

Error:

>  {

  "responseHeader":{
"status":500,
"QTime":307},
  "failure":{

"solr-pklatest-statefulset-pull-0.solr-pklatest-statefulset-headless.relevance:8983_solr":"org.apache.solr.client.solrj.SolrServerException:IOException
occured when talking to server at: null"},
  "Operation addreplica caused
exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
ADDREPLICA failed to create replica",
  "exception":{
"msg":"ADDREPLICA failed to create replica",
"rspCode":500},
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
"msg":"ADDREPLICA failed to create replica",
"trace":"org.apache.solr.common.SolrException: ADDREPLICA failed to
create replica\n\tat
org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:65)\n\tat
org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:286)\n\tat
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:257)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)\n\tat
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:854)\n\tat
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:818)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:566)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat
org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat
org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat
org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThr
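
[Editor's note] The failing call above corresponds to the Collections API
ADDREPLICA operation. A hedged SolrJ sketch of issuing it - collection,
shard, and ZooKeeper host names are hypothetical - which throws the same
SolrException shown in the trace when the replica cannot be created:

import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.response.CollectionAdminResponse;
import org.apache.solr.common.cloud.Replica;

public class AddReplicaExample {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient solr = new CloudSolrClient.Builder(
        Collections.singletonList("zk-0.zk-headless:2181"),
        Optional.empty()).build()) {
      // Add a PULL replica to shard1 of a hypothetical collection.
      CollectionAdminResponse rsp = CollectionAdminRequest
          .addReplicaToShard("mycollection", "shard1", Replica.Type.PULL)
          .process(solr);
      System.out.println("status: " + rsp.getStatus()); // 0 on success
    }
  }
}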

solrcloud with EKS kubernetes

2020-12-08 Thread Abhishek Mishra
Hello guys,
We are facing some issues (like timeouts etc.) that are very inconsistent.
By any chance, could this be related to EKS? We are using Solr 7.7 and
zookeeper 3.4.13. Should we move to ECS?

Regards,
Abhishek


Re: solrcloud with EKS kubernetes

2020-12-13 Thread Abhishek Mishra
Hi Houston,
Sorry for the late reply. Each shard is around 9GB.
Yes, we are giving the pods enough resources; we are currently
using c5.4xlarge.
Xms and Xmx are 16GB; the machine has 32 GB and 16 cores.
No, I haven't run it outside Kubernetes, but I do have colleagues who ran
the same setup on 7.2 and didn't face any issues with it.
The storage volume is gp2, 50GB.
It's not the search queries where we face inconsistencies or timeouts;
some internal admin APIs seem to have issues, so adding a new replica to
the cluster sometimes results in inconsistencies, like recovery taking
more than an hour.

Regards,
Abhishek

On Thu, Dec 10, 2020 at 10:23 AM Houston Putman 
wrote:

> Hello Abhishek,
>
> It's really hard to provide any advice without knowing any information
> about your setup/usage.
>
> Are you giving your Solr pods enough resources on EKS?
> Have you run Solr in the same configuration outside of kubernetes in the
> past without timeouts?
> What type of storage volumes are you using to store your data?
> Are you using headless services to connect your Solr Nodes, or ingresses?
>
> If this is the first time that you are using this data + Solr
> configuration, maybe it's just that your data within Solr isn't optimized
> for the type of queries that you are doing.
> If you have run it successfully in the past outside of Kubernetes, then I
> would look at the resources that you are giving your pods and the storage
> volumes that you are using.
> If you are using Ingresses, that might be causing slow connections between
> nodes, or between your client and Solr.
>
> - Houston
>


Re: solrcloud with EKS kubernetes

2020-12-23 Thread Abhishek Mishra
Hi Jonathan,
Merry Christmas.
Thanks for the suggestion. To manage IOPS, can we do something along the
lines of rate limiting?

Regards,
Abhishek


On Thu, Dec 17, 2020 at 5:07 AM Jonathan Tan  wrote:

> Hi Abhishek,
>
> We're running Solr Cloud 8.6 on GKE.
> 3 node cluster, running 4 cpus (configured) and 8gb of min & max JVM
> configured, all with anti-affinity so they never exist on the same node.
> It's got 2 collections of ~13documents each, 6 shards, 3 replicas each,
> disk usage on each node is ~54gb (we've got all the shards replicated to
> all nodes)
>
> We're also using a 200gb zonal SSD, which *has* been necessary just so that
> we've got the right IOPS & bandwidth. (That's approximately 6000 IOPS for
> read & write each, and 96MB/s for read & write each)
>
> Various lessons learnt...
> You definitely don't want them ever on the same kubernetes node. From a
> resilience perspective, yes, but also when one SOLR node gets busy, they
> tend to all get busy, so now you'll have resource contention. Recovery can
> also get very busy and resource intensive, and again, sitting on the same
> node is problematic. We also saw the need to move to SSDs because of how
> IOPS bound we were.
>
> Did I mention use SSDs? ;)
>
> Good luck!
>


How pull replica works

2021-01-05 Thread Abhishek Mishra
I want to know how a pull replica actually replicates from the leader.
Does it internally use an admin API to get data from the leader in
batches?

Regards,
Abhishek


Re: How pull replica works

2021-01-07 Thread Abhishek Mishra
Thanks, Tomas. It was really helpful.
Regards,
Abhishek

On Thu, Jan 7, 2021 at 7:03 AM Tomás Fernández Löbbe 
wrote:

> Hi Abhishek,
> The pull replicas use the "/replication" endpoint to copy full segment
> files (sections of the index) from the leader. It works in a similar way to
> the legacy leader/follower replication. This[1] talk tries to explain the
> different replica types and how they work.
>
> HTH,
>
> Tomás
>
> [1] https://www.youtube.com/watch?v=C8C9GRTCSzY
>
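
[Editor's note] To make Tomás's description concrete: /replication is an
ordinary request handler, so the calls a pull replica makes can also be
issued by hand. A hedged sketch (hypothetical host and core names, SolrJ
6+, Java 9+) that asks the leader for its index generation and then for
the file list of that generation - the same metadata a pull replica
fetches before copying segment files with command=filecontent:

import java.util.Map;

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.MapSolrParams;
import org.apache.solr.common.util.NamedList;

public class ReplicationPeek {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient leader = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/mycore").build()) { // hypothetical core
      // Ask the leader for its current index version and generation.
      NamedList<Object> version = leader.request(new GenericSolrRequest(
          SolrRequest.METHOD.GET, "/replication",
          new MapSolrParams(Map.of("command", "indexversion"))));
      long generation = (Long) version.get("generation");

      // List the segment files of that generation -- what a pull replica
      // would copy when it detects a new version on the leader.
      NamedList<Object> files = leader.request(new GenericSolrRequest(
          SolrRequest.METHOD.GET, "/replication",
          new MapSolrParams(Map.of(
              "command", "filelist",
              "generation", Long.toString(generation)))));
      System.out.println(files.get("filelist"));
    }
  }
}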


Re: Solr background merge in case of pull replicas

2021-01-07 Thread Abhishek Mishra
Hi Kshitij

Here is my guess: pull replicas replicate segments from the tlog replicas,
so whenever a merge happens on the tlog it reduces the number of segments,
which is a bigger change than the usual case of simply adding a new
segment. AFAIK adding/deleting segments is something of a stop-the-world
moment; that could be the reason for the increase in response time.

Regards,
Abhishek

On Thu, Jan 7, 2021 at 12:43 PM kshitij tyagi 
wrote:

> Hi,
>
> I am not querying the tlog replicas. The Solr version is 8.6, with a setup
> of 2 tlog and 4 pull replicas.
>
> Why should pull replicas be affected during background segment merges?
>
> Regards,
> kshitij
>
> On Wed, Jan 6, 2021 at 9:48 PM Ritvik Sharma 
> wrote:
>
> > Hi
> > It may be caused by rebalancing, with querying not available on the
> > tlog at that moment.
> > You can check the tlog logs and the pull logs when you are facing this
> > issue.
> >
> > May I know which version of Solr you are using, and what is the ratio of
> > tlog to pull nodes?
> >
> > On Wed, 6 Jan 2021 at 2:46 PM, kshitij tyagi 
> > wrote:
> >
> > > Hi,
> > >
> > > I am having a tlog + pull replica SolrCloud setup.
> > >
> > > 1. I am observing that whenever a background segment merge is triggered
> > > automatically, I see high response times on all of my Solr nodes.
> > >
> > > As far as I know merges must be happening on the tlog replicas, and
> > > hence the increased response time there; I am not able to understand
> > > why my pull replicas are affected during background index merges.
> > >
> > > Can someone give some insights on this? What is affecting my pull
> > > replicas during index merges?
> > >
> > > Regards,
> > > kshitij
> > >
> >
>


Re: solrcloud with EKS kubernetes

2021-01-14 Thread Abhishek Mishra
Hi Jonathan,
That was really helpful. Some of the metrics were crossing thresholds,
like network bandwidth.

Regards,
Abhishek

On Sat, Dec 26, 2020 at 7:54 PM Jonathan Tan  wrote:

> Hi Abhishek,
>
> Merry Christmas to you too!
> I think it's really a question regarding your indexing speed NFRs.
>
> Have you had a chance to take a look at your IOPS & write bytes/second
> graphs for that host & PVC?
>
> I'd suggest that's the first thing to go look at, so that you can find out
> whether you're actually IOPS bound or not.
> If you are, then it becomes a question of *how* you're indexing, and
> whether that can be "slowed down" or not.
>
>
>
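
[Editor's note] Jonathan's point above about whether indexing "can be
slowed down" is usually implemented on the client side; to my knowledge
the Solr versions discussed here have no built-in indexing throttle. A
minimal sketch of pacing updates with fixed-size batches and a sleep
between them - core name, batch size, and delay are hypothetical tuning
knobs:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class PacedIndexer {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient solr = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/mycore").build()) { // hypothetical core
      List<SolrInputDocument> batch = new ArrayList<>();
      for (int i = 0; i < 100_000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Integer.toString(i));
        batch.add(doc);
        if (batch.size() == 500) {   // send fixed-size batches...
          solr.add(batch);
          batch.clear();
          Thread.sleep(200);         // ...with a pause to cap write IOPS
        }
      }
      if (!batch.isEmpty()) solr.add(batch);
      solr.commit();
    }
  }
}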