Re: Query modification

2013-10-20 Thread Sidharth
Hi,

I am also using QueryComponent to perform a similar modification to the
query. I modify the query in the component's process() method.
The problem I am facing is that after modifying the query and setting it on
the response builder, I call super.process(rb).

This call takes around 100ms and degrades the component's performance.
Is process() the right place to do this, and do we need to call
super.process() at all?
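
For reference, a simplified sketch of the setup I'm describing (not my
actual component; the rewriting helper is a placeholder):

import java.io.IOException;
import org.apache.lucene.search.Query;
import org.apache.solr.handler.component.QueryComponent;
import org.apache.solr.handler.component.ResponseBuilder;

// Simplified sketch: the query is rewritten inside process() and then handed
// to super.process(), which is what actually executes the search.
public class QueryRewriteComponent extends QueryComponent {

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    Query rewritten = rewrite(rb.getQuery());  // hypothetical rewriting helper
    rb.setQuery(rewritten);
    // The ~100ms measured around this call is largely the search itself.
    super.process(rb);
  }

  private Query rewrite(Query q) {
    return q;  // placeholder for the real modification logic
  }
}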

Regards,
Sidharth.



Understanding Performance of Function Query

2019-04-09 Thread Sidharth Negi
Hi,

I'm working with the "edismax" and "function-query" parsers in Solr and am
having difficulty understanding whether the query time taken by
"function-query" makes sense. The query I'm trying to optimize looks as
follows:

q={!func sum($q1,$q2,$q3)} where q1,q2,q3 are edismax queries.
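
For completeness, the way I issue the combined request from SolrJ looks
roughly like this (a sketch; the collection URL, qf fields, and search terms
are placeholders):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FuncQueryExample {
  public static void main(String[] args) throws Exception {
    SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();

    SolrQuery query = new SolrQuery();
    // Outer function query summing the scores of the three edismax sub-queries.
    query.set("q", "{!func}sum($q1,$q2,$q3)");
    query.set("q1", "{!edismax qf=title}laptop");
    query.set("q2", "{!edismax qf=description}laptop");
    query.set("q3", "{!edismax qf=brand}laptop");
    query.set("debug", "true");  // to compare per-component timings

    QueryResponse rsp = client.query(query);
    System.out.println("QTime: " + rsp.getQTime() + "ms");
    client.close();
  }
}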

The QTime returned by the edismax queries is well under 50ms, but it seems
that the function query is the rate-determining step, since the combined
query above takes around 200-300ms. I also analyzed the performance of the
function query using only constants.

The QTime results for different q are as follows:

   - 97ms for q={!func} sum(10,20)
   - 109ms for q={!func} sum(10,20,30)
   - 127ms for q={!func} sum(10,20,30,40)
   - 145ms for q={!func} sum(10,20,30,40,50)

Does this trend make sense? Are function-queries expected to be this slow?

What makes edismax queries so much faster?

What can I do to optimize my original query (which has edismax subqueries
q1,q2,q3) to work under 100ms?

I originally posted this question on StackOverflow with no success, so any
help here would be appreciated.


Issue: Unable to write response, client closed connection or we are shutting down (org.eclipse.jetty.io.EofException: Closed)

2018-02-19 Thread Sidharth Aggarwal
Hello Team,

We are getting the error below while downloading indexed data (basically
tagging them):



o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down
org.eclipse.jetty.io.EofException: Closed
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:620)
at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:55)
at org.apache.solr.response.QueryResponseWriterUtil$1.write(QueryResponseWriterUtil.java:54)
at java.io.OutputStream.write(OutputStream.java:116)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
at org.apache.solr.util.FastWriter.flush(FastWriter.java:140)
at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:154)
at org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:93)
at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:73)
at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:809)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:538)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)




Kernel version: Linux qa-solr-lx21 4.4.103-92.56-default #1 SMP Wed Dec 27 16:24:31 UTC 2017 (2fd2155) x86_64 x86_64 x86_64 GNU/Linux



Java version: java version "1.8.0_131", Java(TM) SE Runtime Environment (build 1.8.0_131-b11)

Solr version: 6.6

CPUs: 6



Please help us rectify this issue.


Regards,
Sidharth Aggarwal | Senior IT Operations Specialist - Internal Platforms
McKinsey & Company, Inc. | Vatika Business Park | Sector - 49 Sohna Road | 
Gurgaon 122018 | India
T +91 124 333 1378 | M +91 9278987563  | Internal 871 1378



Query Elevation Component

2020-02-03 Thread Sidharth Negi
Hi,

I want to use the Solr Query Elevation Component. Let's say I want to
elevate "doc_id" when a user inputs the query "qwerty". I am able to get a
prototype to work by adding these values to elevate.xml and hitting the
Solr API with q="qwerty".

However, in our service, where I want to plug this in, the 'q' parameter
isn't as clean and looks more like q="'qwerty' (field1:value1)
(field2:value2)".

Any suggestions on the best way to go about this?

Thanks


Max number of documents in update request

2020-07-07 Thread Sidharth Negi
Hi,

Could someone help me with the best way to determine the maximum number of
docs I can send in a single update call to Solr in a master/slave
architecture?

Thanks!


Re: Max number of documents in update request

2020-07-07 Thread Sidharth Negi
Thanks. This was useful, really appreciate it! :)
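
For anyone finding this thread later: the batching-plus-threads approach
described in the replies quoted below might look roughly like this in SolrJ
(a sketch; the URL, batch size, thread count, and document source are
placeholders):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
  // Placeholders: master URL/core, batch size, and thread count
  // (roughly 2 x the CPU count of the Solr master node).
  private static final String SOLR_URL = "http://localhost:8983/solr/mycore";
  private static final int BATCH_SIZE = 1000;
  private static final int THREADS = 12;

  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient.Builder(SOLR_URL).build();
    ExecutorService pool = Executors.newFixedThreadPool(THREADS);

    List<SolrInputDocument> batch = new ArrayList<>(BATCH_SIZE);
    for (int i = 0; i < 100_000; i++) {        // stand-in for the real document source
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Integer.toString(i));
      batch.add(doc);
      if (batch.size() == BATCH_SIZE) {
        final List<SolrInputDocument> toSend = batch;
        batch = new ArrayList<>(BATCH_SIZE);
        pool.submit(() -> client.add(toSend));  // each task sends one batch
      }
    }
    if (!batch.isEmpty()) {
      client.add(batch);
    }

    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    client.commit();                            // single commit at the end
    client.close();
  }
}

The idea is that while Solr is processing one batch, other threads are
already preparing and sending the next ones, so the master stays busy.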

On Tue, Jul 7, 2020, 8:07 PM Walter Underwood  wrote:

> Agreed, I do something between 20 and 1000. If the master node is not
> handling any search traffic, use twice as many client threads as there are
> CPUs in the node. That should get you close to 100% CPU utilization.
> One thread will be waiting while a batch is being processed and another
> thread will be sending the next batch so there is no pause in processing.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Jul 7, 2020, at 6:12 AM, Erick Erickson 
> wrote:
> >
> > As many as you can send before blowing up.
> >
> > Really, the question is not answerable. 1K docs? 1G docs? 1 field or 500?
> >
> > And I don’t think it’s a good use of time to pursue much. See:
> >
> > https://lucidworks.com/post/really-batch-updates-solr-2/
> >
> > If you’re looking at trying to maximize throughput, adding
> > client threads that send Solr documents is a better approach.
> >
> > All that said, I usually just pick 1,000 and don’t worry about it.
> >
> > Best,
> > Erick
> >
> >> On Jul 7, 2020, at 8:59 AM, Sidharth Negi 
> wrote:
> >>
> >> Hi,
> >>
> >> Could someone help me with the best way to go about determining the
> maximum
> >> number of docs I can send in a single update call to Solr in a master /
> >> slave architecture.
> >>
> >> Thanks!
> >
>
>


Re: Understanding Performance of Function Query

2019-04-17 Thread Sidharth Negi
This does indeed reduce the time, but it doesn't quite do what I wanted. This
approach penalizes docs based on the "coord" factor, which scales a
BooleanQuery's score by matchingClauses/totalClauses. In other words, a doc
that scores 5 on just one sub-query (and nothing on the others) would now end
up with a score of 5/3, since only one of the three clauses matches.

1. I wonder why the above query works at all. I can't find this query
syntax anywhere in any docs or books on Solr; can you point me to your
source for this syntax?

2. Which parser is used to parse the larger query? The parsedQuery field
(with debug=true) gives no information about the parser used for the outer
query.

3. What if I did not want to sum the scores of q1, q2, q3, but rather
wanted to use their values in some other way (e.g. sqrt(q1) + sqrt(q2) +
0.6*q3)? Is there no way of cleanly implementing a pipeline of computations
over sub-query scores?

On Tue, Apr 9, 2019 at 7:40 PM Erik Hatcher  wrote:

> maybe something like q=
>
> ({!edismax  v=$q1} OR {!edismax  v=$q2} OR {!edismax ...
> v=$q3})
>
>  and setting q1, q2, q3 as needed (or all to the same maybe with different
> qf’s and such)
>
>   Erik
>
> > On Apr 9, 2019, at 09:12, sidharth228  wrote:
> >
> > I did infact use "bf" parameter for individual edismax queries.
> >
> > However, the reason I can't condense these edismax queries into a single
> > edismax query is because each of them uses different fields in "qf".
> >
> > Basically what I'm trying to do is this: each of these edismax queries
> (q1,
> > q2, q3) has a logic, and scores docs using it. I am then trying to
> combine
> > the scores (to get an overall score) from these scores later by summing
> > them.
> >
> > What options do I have of implementing this?
> >
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Understanding Performance of Function Query

2019-05-09 Thread Sidharth Negi
To those interested: I was able to disable the coord factor by overriding it
in a custom Similarity packaged as a separate jar. With coord disabled, the
overall query effectively sums the scores from the multiple edismax
sub-queries.
However, I'd still be interested in other methods that can go beyond a
direct sum and apply other logic to the scores, e.g. sqrt(q1) + sqrt(q2) +
0.6*q3.
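
For reference, the override is roughly of this shape (a minimal sketch,
assuming ClassicSimilarity on Lucene 6.x, where coord() still exists; the
class name is a placeholder):

import org.apache.lucene.search.similarities.ClassicSimilarity;

// Minimal sketch: a Similarity that disables the coord factor so that a
// BooleanQuery's score is a plain sum of its matching clauses' scores.
public class NoCoordSimilarity extends ClassicSimilarity {

  @Override
  public float coord(int overlap, int maxOverlap) {
    // Default would be overlap / (float) maxOverlap; returning 1.0 removes
    // the matchingClauses/totalClauses penalty.
    return 1.0f;
  }
}

It is then registered via a <similarity> element in the schema, with the jar
on Solr's lib path.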

On Wed, Apr 17, 2019 at 6:20 PM Sidharth Negi 
wrote:

> This does indeed reduce the time. but doesn't quite do what I wanted. This
> approach penalizes the docs based on "coord" factor. In other words, for a
> doc with scores=5 on just one query (and nothing on others), the resulting
> score would now be 5/3 since only one clause matches.
>
> 1. I wonder why does the above query work at all? I can't find the above
> query syntax anywhere in any docs or books on Solr, can you point me to
> your source for this syntax?
>
> 2. Which parser is used to parse the larger query? No info about the
> parser used for the larger query is given from parsedQuery field. (using
> debug=true)
>
> 3. What if I did not want to sum (the scores of q1, q2, q3) but rather
> wanted to use their values in some other way (eg. sqrt(q1) + sqrt(q2) +
> 0.6*q3). Is there no way of cleanly implementing a flow of computations to
> be done on sub-query scores?
>
> On Tue, Apr 9, 2019 at 7:40 PM Erik Hatcher 
> wrote:
>
>> maybe something like q=
>>
>> ({!edismax  v=$q1} OR {!edismax  v=$q2} OR {!edismax ...
>> v=$q3})
>>
>>  and setting q1, q2, q3 as needed (or all to the same maybe with
>> different qf’s and such)
>>
>>   Erik
>>
>> > On Apr 9, 2019, at 09:12, sidharth228  wrote:
>> >
>> > I did infact use "bf" parameter for individual edismax queries.
>> >
>> > However, the reason I can't condense these edismax queries into a single
>> > edismax query is because each of them uses different fields in "qf".
>> >
>> > Basically what I'm trying to do is this: each of these edismax queries
>> (q1,
>> > q2, q3) has a logic, and scores docs using it. I am then trying to
>> combine
>> > the scores (to get an overall score) from these scores later by summing
>> > them.
>> >
>> > What options do I have of implementing this?
>> >
>> >
>> >
>> >
>> > --
>> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
>


Replicate Now Not Working

2019-07-23 Thread Sidharth Negi
Hi,

The "replicateNow" button in the admin UI doesn't seem to work since the
"schema.xml" (which I modified on slave) is not being updated to reflect
that of the master. I have used this button before and it has always cloned
index right away. Any ideas on what could be the possible reason for this?

The master and slave have proper "/replication" handlers and "schema.xml"
is in the confFiles.

Master's Solrconfig:
---
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
  </lst>
</requestHandler>

Slave's Solrconfig:
-
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">MASTER_URL</str>
    <str name="pollInterval">01:00:00</str>
  </lst>
</requestHandler>

Thanks!


Re: Replicate Now Not Working

2019-07-23 Thread Sidharth Negi
Ah nevermind, I managed to resolve the issue.

It seems that replication only happens if the index changes. I noticed that
both master and slave had the same index version, since I had only changed
the schema.

When I modified a random field of a random document, the index versions of
master and slave became different, and replication worked as usual.

Is this common knowledge that I missed somehow?

Thanks!



On Tue, Jul 23, 2019, 7:10 PM Erick Erickson 
wrote:

> Are you sure that you’re _using_ schema.xml and not managed-schema? the
> default has changed. If no explicit entry is made in solrconfig.xml to
> define , you’ll be using managed-schema, not schema.xml.
>
> Best,
> Erick
>
> > On Jul 23, 2019, at 5:51 AM, Sidharth Negi 
> wrote:
> >
> > Hi,
> >
> > The "replicateNow" button in the admin UI doesn't seem to work since the
> > "schema.xml" (which I modified on slave) is not being updated to reflect
> > that of the master. I have used this button before and it has always
> cloned
> > index right away. Any ideas on what could be the possible reason for
> this?
> >
> > The master and slave have proper "/replication" handlers and "schema.xml"
> > is in the confFiles.
> >
> > Master's Solrconfig:
> > ---
> > <requestHandler name="/replication" class="solr.ReplicationHandler">
> >   <lst name="master">
> >     <str name="replicateAfter">commit</str>
> >     <str name="replicateAfter">startup</str>
> >     <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
> >   </lst>
> > </requestHandler>
> >
> > Slave's Solrconfig:
> > -
> > <requestHandler name="/replication" class="solr.ReplicationHandler">
> >   <lst name="slave">
> >     <str name="masterUrl">MASTER_URL</str>
> >     <str name="pollInterval">01:00:00</str>
> >   </lst>
> > </requestHandler>
> >
> > Thanks!
>
>


Re: Searches across Cores

2019-08-09 Thread Sidharth Negi
Hi,

If the number of cores spanned is low, I guess firing parallel queries and
taking the union or intersection of the results should work, since the
schemas are the same. Do you notice any perceivable difference in
performance?
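
Something along these lines (a rough sketch; the core names, the query, and
the merge step are placeholders):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;

public class MultiCoreSearch {
  public static void main(String[] args) throws Exception {
    List<String> cores = Arrays.asList("core1", "core2", "core3");
    ExecutorService pool = Executors.newFixedThreadPool(cores.size());
    Set<String> mergedIds = ConcurrentHashMap.newKeySet();  // union of matching ids

    List<Future<?>> futures = new ArrayList<>();
    for (String core : cores) {
      futures.add(pool.submit(() -> {
        try (SolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/" + core).build()) {
          SolrQuery q = new SolrQuery("title:foo");  // same query against each core
          q.setFields("id");
          for (SolrDocument doc : client.query(q).getResults()) {
            mergedIds.add((String) doc.getFieldValue("id"));
          }
        }
        return null;
      }));
    }
    for (Future<?> f : futures) {
      f.get();  // wait for all cores and surface any per-core failure
    }
    pool.shutdown();
    System.out.println("Matched " + mergedIds.size() + " unique ids across cores");
  }
}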

Best,
Sidharth

On Fri, Aug 9, 2019 at 2:54 PM Komal Motwani 
wrote:

> Hi,
>
>
>
> I have a use case where I would like a query to span across Cores
> (Multi-Core); all the cores involved do have same schema. I have started
> using solr just recently and have been trying to find ways to achieve this
> but couldn’t find any solution so far (Distributed searches, shards are not
> what I am looking for). I remember in one of the tech talks, there was a
> mention of this feature to be included in future releases. Appreciate any
> pointers to help me progress further.
>
>
>
> Thanks,
>
> Komal Motwani
>


Analysing Multivalued Fields

2019-12-30 Thread Sidharth Negi
Hi,

Is there a way to analyze how the individual values of a multivalued field
are tokenized and processed during indexing?

The "Analysis" page in the admin UI treats my comma-separated values as a
single value: it filters out the commas and behaves as if I had specified
one value.
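
What I'm after is roughly the per-value equivalent of this (a sketch using
SolrJ's FieldAnalysisRequest; the core URL, field name, and values are
placeholders):

import java.util.Arrays;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.FieldAnalysisRequest;
import org.apache.solr.client.solrj.response.FieldAnalysisResponse;

public class MultiValuedAnalysis {
  public static void main(String[] args) throws Exception {
    SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

    // Analyze each value of the multivalued field separately, one request per value.
    List<String> values = Arrays.asList("red widget", "blue gadget", "green gizmo");
    for (String value : values) {
      FieldAnalysisRequest req = new FieldAnalysisRequest()
          .addFieldName("tags")        // placeholder multivalued field
          .setFieldValue(value);
      FieldAnalysisResponse rsp = req.process(client);
      // Walk the index-time analysis chain for this single value.
      FieldAnalysisResponse.Analysis analysis = rsp.getFieldNameAnalysis("tags");
      analysis.getIndexPhases().forEach(phase ->
          System.out.println(value + " -> " + phase.getClassName()
              + " : " + phase.getTokens()));
    }
    client.close();
  }
}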

Thanks in advance!