Re: Solr cloud questions

2019-08-16 Thread Ere Maijala
Does your web application, by any chance, allow deep paging or something
like that which requires returning rows at the end of a large result
set? Something like a query where you could have parameters like
&rows=10&start=1000000 ? That can easily cause OOM with Solr when using
a sharded index. It would typically require a large number of rows to be
returned and combined from all shards just to get the few rows to return
in the correct order.

For the above example with 8 shards, Solr would have to fetch 1 000 010
rows from each shard. That's over 8 million rows in total! Even if each
is just an identifier, that's a lot of memory for an operation that
seems so simple on the surface.
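To make the arithmetic concrete, here is a quick sketch of the per-shard cost (the shard count and paging parameters are the ones from the example above):

```python
# Deep-paging cost in a sharded index: to serve start=1000000&rows=10,
# every shard must return its own top (start + rows) candidate rows so the
# coordinating node can merge them into the correct global order.
shards = 8
start, rows = 1_000_000, 10

per_shard = start + rows      # rows fetched from each shard
total = shards * per_shard    # rows the coordinating node must merge

print(per_shard)  # 1000010
print(total)      # 8000080
```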

If this is the case, you'll need to prevent the web application from
issuing such queries. This may mean supporting paging only within the
first 10 000 results. A typical requirement is also to be able to see
the last results of a query, but that can be accomplished by allowing
sorting in both ascending and descending order.
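As an aside, for clients that legitimately need to walk deep into a result set, Solr's cursorMark parameter (available since Solr 4.7) avoids this per-shard cost. A sketch of the request pattern, assuming the uniqueKey field is named "id":

```
# first page: cursorMark=* ; the sort must include the uniqueKey as a tiebreaker
q=*:*&rows=10&sort=id asc&cursorMark=*

# each response includes a nextCursorMark value; send it back for the next page
q=*:*&rows=10&sort=id asc&cursorMark=<nextCursorMark from previous response>
```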

Regards,
Ere

Kojo wrote on 14 Aug 2019 at 16:20:
> Shawn,
> 
> Only my web application access this solr. at a first look at http server
> logs I didnt find something different.  Sometimes I have a very big crawler
> access to my servers, this was my first bet.
> 
> No scheduled crons running at this time too.
> 
> I think that I will reconfigure my boxes with two solr nodes each instead
> of four and increase heap to 16GB. This box only run Solr and has 64Gb.
> Each Solr will use 16Gb and the box will still have 32Gb for the OS. What
> do you think?
> 
> This is a production server, so I will plan to migrate.
> 
> Regards,
> Koji
> 
> 
> On Tue, 13 Aug 2019 at 12:58, Shawn Heisey wrote:
> 
>> On 8/13/2019 9:28 AM, Kojo wrote:
>>> Here are the last two gc logs:
>>>
>>>
>> https://send.firefox.com/download/6cc902670aa6f7dd/#Ee568G9vUtyK5zr-nAJoMQ
>>
>> Thank you for that.
>>
>> Analyzing the 20MB gc log actually looks like a pretty healthy system.
>> That log covers 58 hours of runtime, and everything looks very good to me.
>>
>> https://www.dropbox.com/s/yu1pyve1bu9maun/gc-analysis-kojo.png?dl=0
>>
>> But the small log shows a different story.  That log only covers a
>> little more than four minutes.
>>
>> https://www.dropbox.com/s/vkxfoihh12brbnr/gc-analysis-kojo2.png?dl=0
>>
>> What happened at approximately 10:55:15 PM on the day that the smaller
>> log was produced?  Whatever happened caused Solr's heap usage to
>> skyrocket and require more than 6GB.
>>
>> Thanks,
>> Shawn
>>
> 

-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland


using let() with other streaming expressions

2019-08-16 Thread Viktors Belovs
Dear Solr Comunity,

Recently I've been working with the 'let()' expression.
I ran into a sort of trouble when trying to combine it with different
streaming expressions, as well as when trying to re-assign variables.

As an example:
let(
  a=search(techproducts, q="cat:electronics", fl="id, manu, price", sort="id asc"),
  b=search(techproducts, q="cat:electronics", fl="id, popularity, _version_", sort="id asc"),
  c=innerJoin(a, b, on=id)
)

In the case of re-assigning variables:
let(
  a=search(techproducts, q="cat:electronics", fl="id, manu, price", sort="id asc"),
  b=a,
  c=innerJoin(a, b, on=id)
)

According to the documentation (I'm using Solr v8.1), it's possible to store
any kind of value with the 'let()' function, but it seems its usage is
strictly limited to specific mathematical operations.

I was wondering if there is a way to reduce the verbosity and (potentially)
improve the performance of streaming expressions when constructing complex
combinations of them.

I assume 'let()' isn't suited for such purposes, but perhaps there is an
alternative way to do such a thing.
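For what it's worth, when the goal is simply to join two streams rather than to do math on them, the stream sources can be nested directly inside innerJoin() without let(). A minimal sketch using the same techproducts example (innerJoin requires both streams to be sorted on the join key, which the sort="id asc" clauses satisfy):

```
innerJoin(
  search(techproducts, q="cat:electronics", fl="id, manu, price", sort="id asc"),
  search(techproducts, q="cat:electronics", fl="id, popularity, _version_", sort="id asc"),
  on="id"
)
```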

Regards,
Viktors

Re: "Missing" Docs in Solr

2019-08-16 Thread Zheng Lin Edwin Yeo
Hi,

Did you encounter any error message during those occasions where you get 0
hits returned?

Regards,
Edwin

On Fri, 16 Aug 2019 at 06:02, Brian Lininger 
wrote:

> Hi All,
> I'm seeing some odd behavior that I'm hoping someone might have encountered
> before.  We're using Solr 6.6.6 and very infrequently (happened twice in
> the past year) we're getting 0 hits returned for a query that I know should
> have results.  We've hit this issue once over the past year in 2 separate
> collections (both with a single shard), each with several million
> documents, where a query will return 0 hits.  I see a similar query run
> 5-10s later and it will get the expected # of hits (~1M hits) so I know
> that we haven't reindexed a million docs between the two queries.  Besides
> that I can see that between the 2 queries we only added 150-200 docs with a
> single commit so I don't see how that could affect the results in this
> manner.
>
> We have a moderate indexing load during the time we see this, we seen much
> higher indexing loads without issue but it's also not idle either.  I've
> spent a bunch of time trying to reproduce this, tinkering with queries
> because I assumed that the problem had to be with the search query and not
> with Solr.  Search times for both queries (those with 0 hits and those with
> 10k+ hits) are taking 30-40ms.
>
> Anyone run into something like this?  Any ideas on something to look for?
>
> Thanks,
> Brian Lininger
>


HttpShardHandlerFactory

2019-08-16 Thread Mark Robinson
Hello,

I am trying to understand the socket timeout and connection timeout in
the HttpShardHandlerFactory:

<shardHandlerFactory class="HttpShardHandlerFactory">
  <int name="connTimeout">10</int>
  <int name="socketTimeout">20</int>
</shardHandlerFactory>

1. Could someone please help me understand the effect of using such low
values (10 ms and 20 ms, as given above) inside my /select handler?

2. What are the guidelines for setting these parameters? Should they be low
or high?

3. How can I test the effect of this chunk of configuration after adding it
to my /select handler? I.e., I want to make sure the snippet above is
actually working. That is why I chose such low values: I thought that when I
fired a query I would see both timeout errors in the logs. But I did not!
Or is it that the socket only times out if no request arrives within those
windows (10 ms, 20 ms), and the connection is then lost? To test that, should
I drive a load of, say, 100 TPS with these low values, then increase the
values to maybe 1000 ms and 1500 ms respectively, and look for fewer timeout
error messages?

I am trying to understand how these parameters can be put to good use.

Thanks!
Mark


Re: "Missing" Docs in Solr

2019-08-16 Thread Alexandre Rafalovitch
I would take the server log for those 10 seconds (plus buffer) and really
try to see if something happens in that period.

I am thinking of an unexpected commit, a large index merge, or an alias
switch. That may help you narrow down the kind of error.

Another option is whether you got empty result or a connection error. I am
thinking firewall that held on but then dropped a connection.

Both of these are unlikely, but since you seem to be stuck...

Regards,
Alex

On Thu, Aug 15, 2019, 6:02 PM Brian Lininger, 
wrote:

> Hi All,
> I'm seeing some odd behavior that I'm hoping someone might have encountered
> before.  We're using Solr 6.6.6 and very infrequently (happened twice in
> the past year) we're getting 0 hits returned for a query that I know should
> have results.  We've hit this issue once over the past year in 2 separate
> collections (both with a single shard), each with several million
> documents, where a query will return 0 hits.  I see a similar query run
> 5-10s later and it will get the expected # of hits (~1M hits) so I know
> that we haven't reindexed a million docs between the two queries.  Besides
> that I can see that between the 2 queries we only added 150-200 docs with a
> single commit so I don't see how that could affect the results in this
> manner.
>
> We have a moderate indexing load during the time we see this, we seen much
> higher indexing loads without issue but it's also not idle either.  I've
> spent a bunch of time trying to reproduce this, tinkering with queries
> because I assumed that the problem had to be with the search query and not
> with Solr.  Search times for both queries (those with 0 hits and those with
> 10k+ hits) are taking 30-40ms.
>
> Anyone run into something like this?  Any ideas on something to look for?
>
> Thanks,
> Brian Lininger
>


Re: Solr is very slow with term vectors

2019-08-16 Thread Vignan Malyala
I want response times below 3 seconds.
And FYI, I'm already using 32 cores.
My cache is already full too, and obviously the same requests don't recur in
my case.


On Fri 16 Aug, 2019, 11:47 AM Jörn Franke,  wrote:

> How much response time do you require?
> I think you have to solve the issue in your code by introducing higher
> parallelism during calculation and potentially more cores.
>
> Maybe you can also precalculate what you do, cache it and use during
> request the precalculated values.
>
> > On 16 Aug 2019 at 05:08, Vignan Malyala wrote:
> >
> > Hi
> > Any solution for this? Taking around 50 seconds to get response.
> >
> >> On Mon 12 Aug, 2019, 3:28 PM Vignan Malyala, 
> wrote:
> >>
> >> Hi Doug / Walter,
> >>
> >> I'm just using this methodology.
> >> PFB link of my sample code.
> >> https://github.com/saaay71/solr-vector-scoring
> >>
> >> The only issue is speed of response for 1M records.
> >>
> >> On Mon, Aug 12, 2019 at 12:24 AM Walter Underwood <
> wun...@wunderwood.org>
> >> wrote:
> >>
> >>> tf.idf was invented because cosine similarity is too much computation.
> >>> tf.idf gives similar results much, much faster than cosine distance.
> >>>
> >>> I would expect cosine similarity to be slow. I would also expect
> >>> retrieving 1 million records to be slow. Doing both of those in one
> minute
> >>> is pretty good.
> >>>
> >>> As Kernighan and Plauger said in 1978, “Don’t diddle code to make it
> >>> faster—find a better algorithm.”
> >>>
> >>> https://en.wikipedia.org/wiki/The_Elements_of_Programming_Style
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wun...@wunderwood.org
> >>> http://observer.wunderwood.org/  (my blog)
> >>>
>  On Aug 11, 2019, at 10:40 AM, Doug Turnbull <
> >>> dturnb...@opensourceconnections.com> wrote:
> 
>  Hi Vignan,
> 
>  We need to see more details / code of what your query parser plugin
> does
>  exactly with term vectors, we can't really help you without more
> >>> details.
>  Is it open source? Can you share a minimal example that recreates the
>  problem?
> 
>  On Sun, Aug 11, 2019 at 1:19 PM Vignan Malyala 
> >>> wrote:
> 
> > Hi guys,
> >
> > I made my custom qparser plugin in Solr for scoring. The plugin only
> >>> does
> > cosine similarity of vectors for each record. I use term vectors
> here.
> > Results are fine!
> >
> > BUT, Solr response is very slow with term vectors. It takes around 55
> > seconds for each request for 100 records.
> > How do I make it faster to get my results in ms ?
> > Please respond soon as its lil urgent.
> >
> > Note: All my values are stored and indexed. I am not using Solr
> Cloud.
> >
> 
> 
>  --
>  *Doug Turnbull **| CTO* | OpenSource Connections
>  , LLC | 240.476.9983
>  Author: Relevant Search 
>  This e-mail and all contents, including attachments, is considered to
> be
>  Company Confidential unless explicitly stated otherwise, regardless
>  of whether attachments are marked as such.
> >>>
> >>>
>


Re: Solr is very slow with term vectors

2019-08-16 Thread Jörn Franke
Is your custom query parser multithreaded and leverages all cores?

> On 16 Aug 2019 at 13:12, Vignan Malyala wrote:
> 
> I want response time below 3 seconds.
> And fyi I'm already using 32 cores.
> My cache is already full too and obviously same requests don't occur in my
> case.
> 
> 
>> On Fri 16 Aug, 2019, 11:47 AM Jörn Franke,  wrote:
>> 
>> How much response time do you require?
>> I think you have to solve the issue in your code by introducing higher
>> parallelism during calculation and potentially more cores.
>> 
>> Maybe you can also precalculate what you do, cache it and use during
>> request the precalculated values.
>> 
>>> On 16 Aug 2019 at 05:08, Vignan Malyala wrote:
>>> 
>>> Hi
>>> Any solution for this? Taking around 50 seconds to get response.
>>> 
 On Mon 12 Aug, 2019, 3:28 PM Vignan Malyala, 
>> wrote:
 
 Hi Doug / Walter,
 
 I'm just using this methodology.
 PFB link of my sample code.
 https://github.com/saaay71/solr-vector-scoring
 
 The only issue is speed of response for 1M records.
 
 On Mon, Aug 12, 2019 at 12:24 AM Walter Underwood <
>> wun...@wunderwood.org>
 wrote:
 
> tf.idf was invented because cosine similarity is too much computation.
> tf.idf gives similar results much, much faster than cosine distance.
> 
> I would expect cosine similarity to be slow. I would also expect
> retrieving 1 million records to be slow. Doing both of those in one
>> minute
> is pretty good.
> 
> As Kernighan and Plauger said in 1978, “Don’t diddle code to make it
> faster—find a better algorithm.”
> 
> https://en.wikipedia.org/wiki/The_Elements_of_Programming_Style
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Aug 11, 2019, at 10:40 AM, Doug Turnbull <
> dturnb...@opensourceconnections.com> wrote:
>> 
>> Hi Vignan,
>> 
>> We need to see more details / code of what your query parser plugin
>> does
>> exactly with term vectors, we can't really help you without more
> details.
>> Is it open source? Can you share a minimal example that recreates the
>> problem?
>> 
>> On Sun, Aug 11, 2019 at 1:19 PM Vignan Malyala 
> wrote:
>> 
>>> Hi guys,
>>> 
>>> I made my custom qparser plugin in Solr for scoring. The plugin only
> does
>>> cosine similarity of vectors for each record. I use term vectors
>> here.
>>> Results are fine!
>>> 
>>> BUT, Solr response is very slow with term vectors. It takes around 55
>>> seconds for each request for 100 records.
>>> How do I make it faster to get my results in ms ?
>>> Please respond soon as its lil urgent.
>>> 
>>> Note: All my values are stored and indexed. I am not using Solr
>> Cloud.
>>> 
>> 
>> 
>> --
>> *Doug Turnbull **| CTO* | OpenSource Connections
>> , LLC | 240.476.9983
>> Author: Relevant Search 
>> This e-mail and all contents, including attachments, is considered to
>> be
>> Company Confidential unless explicitly stated otherwise, regardless
>> of whether attachments are marked as such.
> 
> 
>> 


Re: HttpShardHandlerFactory

2019-08-16 Thread Shawn Heisey

On 8/16/2019 3:51 AM, Mark Robinson wrote:

I am trying to understand the socket time out and connection time out in
the HttpShardHandlerFactory:-


<shardHandlerFactory class="HttpShardHandlerFactory">
  <int name="connTimeout">10</int>
  <int name="socketTimeout">20</int>
</shardHandlerFactory>



The shard handler is used when that Solr instance needs to make 
connections to another Solr instance (which could be itself, as odd as 
that might sound).  It does not apply to the requests that you make from 
outside Solr.



1. Could someone please help me understand the effect of using such low
values (10 ms and 20 ms, as given above) inside my /select handler?


A connection timeout of 10 milliseconds *might* result in connections 
not establishing at all.  This is translated down to the TCP socket as 
the TCP connection timeout -- the time limit imposed on making the TCP 
connection itself, which, as I understand it, means the completion of 
the "SYN", "SYN/ACK", and "ACK" sequence.  If the two endpoints of the 
connection are on a LAN, you might never see a problem from this -- LAN 
connections are very low latency.  But if they are across the Internet, 
they might never work.


The socket timeout of 20 milliseconds means that if the connection goes 
idle for 20 milliseconds, it will be forcibly closed.  So if it took 25 
milliseconds for the remote Solr instance to respond, this Solr instance 
would have given up and closed the connection.  It is extremely common 
for requests to take 100, 500, 2000, or more milliseconds to respond.



2. What are the guidelines for setting these parameters? Should they be low
or high?


I would probably use a value of about 5000 (five seconds) for the 
connection timeout if everything's on a local LAN.  I might go as high 
as 15 seconds if there's a high latency network between them, but five 
seconds is probably long enough too.


For the socket timeout, you want a value that's considerably longer than 
you expect requests to ever take.  Probably somewhere between two and 
five minutes.
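Putting those two recommendations together, a shard handler configuration along these lines would be a reasonable starting point (the values are the ones suggested above: five seconds to connect, two minutes of socket idle time):

```xml
<shardHandlerFactory class="HttpShardHandlerFactory">
  <!-- 5 seconds to establish the TCP connection -->
  <int name="connTimeout">5000</int>
  <!-- 2 minutes of socket idle time before giving up -->
  <int name="socketTimeout">120000</int>
</shardHandlerFactory>
```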



3. How can I test the effect of this chunk of configuration after adding it
to my /select handler? I.e., I want to make sure the snippet above is
actually working. That is why I chose such low values: I thought that when I
fired a query I would see both timeout errors in the logs. But I did not!
Or is it that the socket only times out if no request arrives within those
windows (10 ms, 20 ms), and the connection is then lost? To test that, should
I drive a load of, say, 100 TPS with these low values, then increase the
values to maybe 1000 ms and 1500 ms respectively, and look for fewer timeout
error messages?


If you were running a multi-server SolrCloud setup (or a single-server 
setup with multiple shards and/or replicas), you probably would see 
problems from values that low.  But if Solr never has any need to make 
connections to satisfy a request, then the values will never take effect.


If you want to control these values for requests made from outside Solr, 
you will need to do it in your client software that is making the request.


Thanks,
Shawn


Re: Solr is very slow with term vectors

2019-08-16 Thread Vignan Malyala
How do I check that in Solr? Can anyone share a link on how threading is
implemented in Solr?

On Fri 16 Aug, 2019, 4:52 PM Jörn Franke,  wrote:

> Is your custom query parser multithreaded and leverages all cores?
>
> > On 16 Aug 2019 at 13:12, Vignan Malyala wrote:
> >
> > I want response time below 3 seconds.
> > And fyi I'm already using 32 cores.
> > My cache is already full too and obviously same requests don't occur in
> my
> > case.
> >
> >
> >> On Fri 16 Aug, 2019, 11:47 AM Jörn Franke, 
> wrote:
> >>
> >> How much response time do you require?
> >> I think you have to solve the issue in your code by introducing higher
> >> parallelism during calculation and potentially more cores.
> >>
> >> Maybe you can also precalculate what you do, cache it and use during
> >> request the precalculated values.
> >>
> >>> On 16 Aug 2019 at 05:08, Vignan Malyala wrote:
> >>>
> >>> Hi
> >>> Any solution for this? Taking around 50 seconds to get response.
> >>>
>  On Mon 12 Aug, 2019, 3:28 PM Vignan Malyala, 
> >> wrote:
> 
>  Hi Doug / Walter,
> 
>  I'm just using this methodology.
>  PFB link of my sample code.
>  https://github.com/saaay71/solr-vector-scoring
> 
>  The only issue is speed of response for 1M records.
> 
>  On Mon, Aug 12, 2019 at 12:24 AM Walter Underwood <
> >> wun...@wunderwood.org>
>  wrote:
> 
> > tf.idf was invented because cosine similarity is too much
> computation.
> > tf.idf gives similar results much, much faster than cosine distance.
> >
> > I would expect cosine similarity to be slow. I would also expect
> > retrieving 1 million records to be slow. Doing both of those in one
> >> minute
> > is pretty good.
> >
> > As Kernighan and Plauger said in 1978, “Don’t diddle code to make it
> > faster—find a better algorithm.”
> >
> > https://en.wikipedia.org/wiki/The_Elements_of_Programming_Style
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >> On Aug 11, 2019, at 10:40 AM, Doug Turnbull <
> > dturnb...@opensourceconnections.com> wrote:
> >>
> >> Hi Vignan,
> >>
> >> We need to see more details / code of what your query parser plugin
> >> does
> >> exactly with term vectors, we can't really help you without more
> > details.
> >> Is it open source? Can you share a minimal example that recreates
> the
> >> problem?
> >>
> >> On Sun, Aug 11, 2019 at 1:19 PM Vignan Malyala <
> dsmsvig...@gmail.com>
> > wrote:
> >>
> >>> Hi guys,
> >>>
> >>> I made my custom qparser plugin in Solr for scoring. The plugin
> only
> > does
> >>> cosine similarity of vectors for each record. I use term vectors
> >> here.
> >>> Results are fine!
> >>>
> >>> BUT, Solr response is very slow with term vectors. It takes around
> 55
> >>> seconds for each request for 100 records.
> >>> How do I make it faster to get my results in ms ?
> >>> Please respond soon as its lil urgent.
> >>>
> >>> Note: All my values are stored and indexed. I am not using Solr
> >> Cloud.
> >>>
> >>
> >>
> >> --
> >> *Doug Turnbull **| CTO* | OpenSource Connections
> >> , LLC | 240.476.9983
> >> Author: Relevant Search 
> >> This e-mail and all contents, including attachments, is considered
> to
> >> be
> >> Company Confidential unless explicitly stated otherwise, regardless
> >> of whether attachments are marked as such.
> >
> >
> >>
>


Re: Solr is very slow with term vectors

2019-08-16 Thread Jörn Franke
You would have to implement that yourself. I don't think that Solr threads the query 
parser magically for you, but maybe some people have more insight on this topic.

> On 16 Aug 2019 at 15:42, Vignan Malyala wrote:
> 
> How do I check that in solr? Can anyone share link on implementation of
> threads in solr?
> 
>> On Fri 16 Aug, 2019, 4:52 PM Jörn Franke,  wrote:
>> 
>> Is your custom query parser multithreaded and leverages all cores?
>> 
>>> On 16 Aug 2019 at 13:12, Vignan Malyala wrote:
>>> 
>>> I want response time below 3 seconds.
>>> And fyi I'm already using 32 cores.
>>> My cache is already full too and obviously same requests don't occur in
>> my
>>> case.
>>> 
>>> 
 On Fri 16 Aug, 2019, 11:47 AM Jörn Franke, 
>> wrote:
 
 How much response time do you require?
 I think you have to solve the issue in your code by introducing higher
 parallelism during calculation and potentially more cores.
 
 Maybe you can also precalculate what you do, cache it and use during
 request the precalculated values.
 
> On 16 Aug 2019 at 05:08, Vignan Malyala wrote:
> 
> Hi
> Any solution for this? Taking around 50 seconds to get response.
> 
>> On Mon 12 Aug, 2019, 3:28 PM Vignan Malyala, 
 wrote:
>> 
>> Hi Doug / Walter,
>> 
>> I'm just using this methodology.
>> PFB link of my sample code.
>> https://github.com/saaay71/solr-vector-scoring
>> 
>> The only issue is speed of response for 1M records.
>> 
>> On Mon, Aug 12, 2019 at 12:24 AM Walter Underwood <
 wun...@wunderwood.org>
>> wrote:
>> 
>>> tf.idf was invented because cosine similarity is too much
>> computation.
>>> tf.idf gives similar results much, much faster than cosine distance.
>>> 
>>> I would expect cosine similarity to be slow. I would also expect
>>> retrieving 1 million records to be slow. Doing both of those in one
 minute
>>> is pretty good.
>>> 
>>> As Kernighan and Plauger said in 1978, “Don’t diddle code to make it
>>> faster—find a better algorithm.”
>>> 
>>> https://en.wikipedia.org/wiki/The_Elements_of_Programming_Style
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
 On Aug 11, 2019, at 10:40 AM, Doug Turnbull <
>>> dturnb...@opensourceconnections.com> wrote:
 
 Hi Vignan,
 
 We need to see more details / code of what your query parser plugin
 does
 exactly with term vectors, we can't really help you without more
>>> details.
 Is it open source? Can you share a minimal example that recreates
>> the
 problem?
 
 On Sun, Aug 11, 2019 at 1:19 PM Vignan Malyala <
>> dsmsvig...@gmail.com>
>>> wrote:
 
> Hi guys,
> 
> I made my custom qparser plugin in Solr for scoring. The plugin
>> only
>>> does
> cosine similarity of vectors for each record. I use term vectors
 here.
> Results are fine!
> 
> BUT, Solr response is very slow with term vectors. It takes around
>> 55
> seconds for each request for 100 records.
> How do I make it faster to get my results in ms ?
> Please respond soon as its lil urgent.
> 
> Note: All my values are stored and indexed. I am not using Solr
 Cloud.
> 
 
 
 --
 *Doug Turnbull **| CTO* | OpenSource Connections
 , LLC | 240.476.9983
 Author: Relevant Search 
 This e-mail and all contents, including attachments, is considered
>> to
 be
 Company Confidential unless explicitly stated otherwise, regardless
 of whether attachments are marked as such.
>>> 
>>> 
 
>> 


Solr crash | GC issue

2019-08-16 Thread Rohan Kasat
Hi All,

I have a SolrCloud setup of 3 Solr servers on version 7.5.
24 GB of heap memory is allocated to each Solr server, and I have around 655 GB
of index data to be searched.

For the last 2-3 days, the Solr servers have been crashing, and I can see that
the heap memory is almost full while the CPU usage is just 1%.

I am attaching the GC logs from the 3 servers. Can you please help in
analyzing the logs and comment on what to improve?

https://gist.github.com/rohankasat/cee8203c0c12983d9839b7a59047733b

-- 

*Regards,Rohan Kasat*


Re: Solr cloud questions

2019-08-16 Thread Kojo
Ere,
thanks for the advice. I don't have this specific use case, but I am doing
some operations that I think could be risky, since this is the first time I
am using them.

There is a page that groups by one specific attribute of documents
distributed across shards. I am using composite IDs to allow grouping
correctly, but I don't know the performance cost of this task. This page
groups and lists these attributes like "snippets", and it allows paging.

I am doing some graph queries too, using streaming. As far as I can observe,
these features are not causing the problem I described.

Thank you,
Koji






On Fri, 16 Aug 2019 at 04:34, Ere Maijala wrote:

> Does your web application, by any chance, allow deep paging or something
> like that which requires returning rows at the end of a large result
> set? Something like a query where you could have parameters like
> &rows=10&start=1000000 ? That can easily cause OOM with Solr when using
> a sharded index. It would typically require a large number of rows to be
> returned and combined from all shards just to get the few rows to return
> in the correct order.
>
> For the above example with 8 shards, Solr would have to fetch 1 000 010
> rows from each shard. That's over 8 million rows! Even if it's just
> identifiers, that's a lot of memory required for an operation that seems
> so simple from the surface.
>
> If this is the case, you'll need to prevent the web application from
> issuing such queries. This may mean something like supporting paging
> only among the first 10 000 results. Typical requirement may also be to
> be able to see the last results of a query, but this can be accomplished
> by allowing sorting in both ascending and descending order.
>
> Regards,
> Ere
>
> Kojo wrote on 14 Aug 2019 at 16:20:
> > Shawn,
> >
> > Only my web application access this solr. at a first look at http server
> > logs I didnt find something different.  Sometimes I have a very big
> crawler
> > access to my servers, this was my first bet.
> >
> > No scheduled crons running at this time too.
> >
> > I think that I will reconfigure my boxes with two solr nodes each instead
> > of four and increase heap to 16GB. This box only run Solr and has 64Gb.
> > Each Solr will use 16Gb and the box will still have 32Gb for the OS. What
> > do you think?
> >
> > This is a production server, so I will plan to migrate.
> >
> > Regards,
> > Koji
> >
> >
> > On Tue, 13 Aug 2019 at 12:58, Shawn Heisey wrote:
> >
> >> On 8/13/2019 9:28 AM, Kojo wrote:
> >>> Here are the last two gc logs:
> >>>
> >>>
> >>
> https://send.firefox.com/download/6cc902670aa6f7dd/#Ee568G9vUtyK5zr-nAJoMQ
> >>
> >> Thank you for that.
> >>
> >> Analyzing the 20MB gc log actually looks like a pretty healthy system.
> >> That log covers 58 hours of runtime, and everything looks very good to
> me.
> >>
> >> https://www.dropbox.com/s/yu1pyve1bu9maun/gc-analysis-kojo.png?dl=0
> >>
> >> But the small log shows a different story.  That log only covers a
> >> little more than four minutes.
> >>
> >> https://www.dropbox.com/s/vkxfoihh12brbnr/gc-analysis-kojo2.png?dl=0
> >>
> >> What happened at approximately 10:55:15 PM on the day that the smaller
> >> log was produced?  Whatever happened caused Solr's heap usage to
> >> skyrocket and require more than 6GB.
> >>
> >> Thanks,
> >> Shawn
> >>
> >
>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
>


Re: "Missing" Docs in Solr

2019-08-16 Thread Brian Lininger
Yeah, I thought of those same problems at first and expected to find
something but no luck.
There are no errors in the Solr log for the hour before/after the time that
we saw the issue; the only warnings I see are "PERFORMANCE WARNING:
Overlapping onDeckSearchers=2", but these are for other collections and,
as I understand it, that is really just a load issue, not a potential
functional issue.

We're getting a valid response sent back from Solr (the search is logged
with 0 hits) & SolrJ, so it doesn't seem to be a network issue.  We're not
using aliases, but that shouldn't be a problem as updates to aliases are
atomic as I understand them.  GC's also are fine during that period.

It's really weird

On Fri, Aug 16, 2019 at 3:51 AM Alexandre Rafalovitch 
wrote:

> I would take the server log for those 10 seconds (plus buffer) and really
> try to see if something happens in that period.
>
> I am thinking an unexpected commit, index large, alias switch. That may
> help you to narrow down the kind of error.
>
> Another option is whether you got empty result or a connection error. I am
> thinking firewall that held on but then dropped a connection.
>
> Both of these are unlikely but since you seem to be stuck
>
> Regards,
> Alex
>
> On Thu, Aug 15, 2019, 6:02 PM Brian Lininger, 
> wrote:
>
> > Hi All,
> > I'm seeing some odd behavior that I'm hoping someone might have
> encountered
> > before.  We're using Solr 6.6.6 and very infrequently (happened twice in
> > the past year) we're getting 0 hits returned for a query that I know
> should
> > have results.  We've hit this issue once over the past year in 2 separate
> > collections (both with a single shard), each with several million
> > documents, where a query will return 0 hits.  I see a similar query run
> > 5-10s later and it will get the expected # of hits (~1M hits) so I know
> > that we haven't reindexed a million docs between the two queries.
> Besides
> > that I can see that between the 2 queries we only added 150-200 docs
> with a
> > single commit so I don't see how that could affect the results in this
> > manner.
> >
> > We have a moderate indexing load during the time we see this, we seen
> much
> > higher indexing loads without issue but it's also not idle either.  I've
> > spent a bunch of time trying to reproduce this, tinkering with queries
> > because I assumed that the problem had to be with the search query and
> not
> > with Solr.  Search times for both queries (those with 0 hits and those
> with
> > 10k+ hits) are taking 30-40ms.
> >
> > Anyone run into something like this?  Any ideas on something to look for?
> >
> > Thanks,
> > Brian Lininger
> >
>


-- 


*Brian Lininger*
Technical Architect, Infrastructure & Search
*Veeva Systems *
brian.linin...@veeva.com
www.veeva.com

*This email and the information it contains are intended for the intended
recipient only, are confidential and may be privileged information exempt
from disclosure by law.*
*If you have received this email in error, please notify us immediately by
reply email and delete this message from your computer.*
*Please do not retain, copy or distribute this email.*


RE: Solr crash | GC issue

2019-08-16 Thread Paul Russell
For quick analysis we use https://gceasy.io

 

Very informative and quick turnaround. 

 

Paul

 

--- Begin Message ---
Hi All,

I have a SolrCloud setup of 3 Solr servers on version 7.5.
24GB of heap memory is allocated to each Solr server, and I have around 655 GB
of index data to be searched.

For the last 2-3 days the Solr servers have been crashing, and I can see that
the heap memory is almost full while the CPU usage is just 1%.

I am attaching the gc logs from the 3 servers. Can you please help analyze
the logs and suggest improvements?

https://gist.github.com/rohankasat/cee8203c0c12983d9839b7a59047733b

-- 

*Regards,*
*Rohan Kasat*
--- End Message ---


Re: Solr crash | GC issue

2019-08-16 Thread Shawn Heisey

On 8/16/2019 8:23 AM, Rohan Kasat wrote:

I have a Solr Cloud setup of 3 solr servers 7.5 version.
24GB heap memory is allocated to each solr server and i have around 655 GB
of data in indexes to be searched for.

Few last 2-3 days, the solr servers are crashing and am able to see the
heap memory is almost full but the CPU usage is just 1 %.

I am attaching the gc logs from 3 servers. Can you please help in analyzing
yje logs and comments to improve

https://gist.github.com/rohankasat/cee8203c0c12983d9839b7a59047733b


These three GC logs do not indicate that all the heap is used.

The peak heap usage during these GC logs is 18.86GB, 19.42GB, and 
18.91GB.  That's quite a bit below the 24GB max.


There are some very long GC pauses recorded.  Increasing the heap size 
MIGHT help with that, or it might not.


The typical way that Solr appears to "crash" is when an OutOfMemoryError 
exception is thrown, at which time a Solr instance that is running on an 
OS like Linux will kill itself with a -9 signal.  This scripting is not 
present when starting on Windows.


An OOME can be thrown for a resource other than memory, so despite the 
exception name, it might not actually be memory that has been depleted. 
The exception will need to be examined to learn why it was thrown.


GC logs do not indicate the cause of OOME.  If that information is 
logged at all, and it might not be, it will be in solr.log.


Looking at the GC logs to see how your Solr is laid out... the following 
command might find the cause, if it was logged, and if the relevant log 
has not been rotated out:


grep -r OutOfMemory /apps/solr/solr_data/logs/*

At the very least it might help you find out which log file to 
investigate further.


Thanks,
Shawn


Re: Solr is very slow with term vectors

2019-08-16 Thread Jan Høydahl
I bet your main issue is assuming that this particular plugin is the only way 
to solve your ranking requirements.
I would advise you to start looking into the various built-in Similarities and 
try to tweak one of those, and/or add more ranking signals to your solution; 
perhaps see if re-ranking the top 1000 hits is good enough, etc. Not knowing 
anything about what led you to that custom, badly performing 3rd-party plugin 
in the first place, it is hard to guess, but take 10 steps back and 
reconsider that choice.
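As a rough sketch of the re-ranking idea (not from this thread; the query terms are placeholders, but `rq`/`rqq`, `reRankQuery`, `reRankDocs`, and `reRankWeight` are the documented parameters of Solr's `{!rerank}` parser), the request parameters could be assembled like this:

```python
from urllib.parse import urlencode

# Hypothetical example: run a cheap main query, then re-rank only its
# top 1000 hits with a more expensive query, weighted 3x.
params = {
    "q": "memory",                                                 # main query (placeholder)
    "rq": "{!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=3}",
    "rqq": "(ddr OR dimm)",                                        # re-rank query (placeholder)
    "fl": "id,score",
}

query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to a `/select` request; only the top `reRankDocs` hits pay the cost of the expensive query, which is why re-ranking is usually far cheaper than scoring every matching document.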

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 16. aug. 2019 kl. 15:50 skrev Jörn Franke :
> 
> You would have to implement that I don’t think that Solr is threading the 
> query parser magically for you, but maybe some people have more insight on 
> this topic.
> 
>> Am 16.08.2019 um 15:42 schrieb Vignan Malyala :
>> 
>> How do I check that in solr? Can anyone share link on implementation of
>> threads in solr?
>> 
>>> On Fri 16 Aug, 2019, 4:52 PM Jörn Franke,  wrote:
>>> 
>>> Is your custom query parser multithreaded and leverages all cores?
>>> 
 Am 16.08.2019 um 13:12 schrieb Vignan Malyala :
 
 I want response time below 3 seconds.
 And fyi I'm already using 32 cores.
 My cache is already full too and obviously same requests don't occur in
>>> my
 case.
 
 
> On Fri 16 Aug, 2019, 11:47 AM Jörn Franke, 
>>> wrote:
> 
> How much response time do you require?
> I think you have to solve the issue in your code by introducing higher
> parallelism during calculation and potentially more cores.
> 
> Maybe you can also precalculate what you do, cache it and use during
> request the precalculated values.
> 
>> Am 16.08.2019 um 05:08 schrieb Vignan Malyala :
>> 
>> Hi
>> Any solution for this? Taking around 50 seconds to get response.
>> 
>>> On Mon 12 Aug, 2019, 3:28 PM Vignan Malyala, 
> wrote:
>>> 
>>> Hi Doug / Walter,
>>> 
>>> I'm just using this methodology.
>>> PFB link of my sample code.
>>> https://github.com/saaay71/solr-vector-scoring
>>> 
>>> The only issue is speed of response for 1M records.
>>> 
>>> On Mon, Aug 12, 2019 at 12:24 AM Walter Underwood <
> wun...@wunderwood.org>
>>> wrote:
>>> 
 tf.idf was invented because cosine similarity is too much
>>> computation.
 tf.idf gives similar results much, much faster than cosine distance.
 
 I would expect cosine similarity to be slow. I would also expect
 retrieving 1 million records to be slow. Doing both of those in one
> minute
 is pretty good.
 
 As Kernighan and Plauger said in 1978, "Don't diddle code to make it
 faster—find a better algorithm."
 
 https://en.wikipedia.org/wiki/The_Elements_of_Programming_Style
 
 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)
 
> On Aug 11, 2019, at 10:40 AM, Doug Turnbull <
 dturnb...@opensourceconnections.com> wrote:
> 
> Hi Vignan,
> 
> We need to see more details / code of what your query parser plugin
> does
> exactly with term vectors, we can't really help you without more
 details.
> Is it open source? Can you share a minimal example that recreates
>>> the
> problem?
> 
> On Sun, Aug 11, 2019 at 1:19 PM Vignan Malyala <
>>> dsmsvig...@gmail.com>
 wrote:
> 
>> Hi guys,
>> 
>> I made my custom qparser plugin in Solr for scoring. The plugin
>>> only
 does
>> cosine similarity of vectors for each record. I use term vectors
> here.
>> Results are fine!
>> 
>> BUT, Solr response is very slow with term vectors. It takes around
>>> 55
>> seconds for each request for 100 records.
>> How do I make it faster to get my results in ms ?
>> Please respond soon as its lil urgent.
>> 
>> Note: All my values are stored and indexed. I am not using Solr
> Cloud.
>> 
> 
> 
> --
> *Doug Turnbull **| CTO* | OpenSource Connections
> , LLC | 240.476.9983
> Author: Relevant Search 
> This e-mail and all contents, including attachments, is considered
>>> to
> be
> Company Confidential unless explicitly stated otherwise, regardless
> of whether attachments are marked as such.
 
 
> 
>>> 



Re: Solr cloud questions

2019-08-16 Thread Shawn Heisey

On 8/15/2019 8:14 AM, Kojo wrote:

I am starting to think that my setup has more than one problem.
As I said before, I am not balancing the load across my Solr nodes, and I have
eight nodes. All of my web application requests go to one Solr node, the
only one that dies. If I distribute the load across the other nodes, is it
possible that these problems may end?

Even if I downsize the SolrCloud setup to 2 boxes with 2 nodes each, with fewer
shards than the 16 shards that I have now, I would like to know your
opinion on the question above.


Based on those GC logs, we have 58 hours of good steady operation, 
followed by something bad.  Something happened in those few minutes that 
*didn't* happen in the previous 58 hours.


You could try increasing the heap beyond 6GB, but depending on what went 
wrong, that might not help.  And as Erick was hinting at, large heaps 
can create their own problems.


The better option is to figure out what's happening when it all goes bad 
and keep that from happening.  Load balancing might help, or it might 
cause whatever's happening on the one node to happen to all your nodes.


Thanks,
Shawn


Re: "Missing" Docs in Solr

2019-08-16 Thread Alexandre Rafalovitch
Are there several 0-result queries in a row as an anomaly, or really just one?

You could even add SolrJ code to rerun a 0-result query with full debug on,
if it is a rare enough event.

Regards,
 Alex

On Fri, Aug 16, 2019, 12:05 PM Brian Lininger, 
wrote:

> Yeah, I thought of those same problems at first and expected to find
> something but no luck.
> There are no errors in the solr log for the hour before/after the time that
> we saw the issue, the only warnings I see are "PERFORMANCE WARNING:
> Overlapping onDeckSearchers=2" but these are for other Collections and
> as I understand it, this is really just a load issue not a potential
> functional issue.
>
> We're getting a valid response sent back from Solr (the search is logged
> with 0 hits) & SolrJ, so it doesn't seem to be a network issue.  We're not
> using aliases, but that shouldn't be a problem as updates to aliases are
> atomic as I understand them.  GC's also are fine during that period.
>
> It's really weird
>
> On Fri, Aug 16, 2019 at 3:51 AM Alexandre Rafalovitch 
> wrote:
>
> > I would take the server log for those 10 seconds (plus buffer) and really
> > try to see if something happens in that period.
> >
> > I am thinking an unexpected commit, index large, alias switch. That may
> > help you to narrow down the kind of error.
> >
> > Another option is whether you got empty result or a connection error. I
> am
> > thinking firewall that held on but then dropped a connection.
> >
> > Both of these are unlikely but since you seem to be stuck
> >
> > Regards,
> > Alex
> >
> > On Thu, Aug 15, 2019, 6:02 PM Brian Lininger, 
> > wrote:
> >
> > > Hi All,
> > > I'm seeing some odd behavior that I'm hoping someone might have
> > encountered
> > > before.  We're using Solr 6.6.6 and very infrequently (happened twice
> in
> > > the past year) we're getting 0 hits returned for a query that I know
> > should
> > > have results.  We've hit this issue once over the past year in 2
> separate
> > > collections (both with a single shard), each with several million
> > > documents, where a query will return 0 hits.  I see a similar query run
> > > 5-10s later and it will get the expected # of hits (~1M hits) so I know
> > > that we haven't reindexed a million docs between the two queries.
> > Besides
> > > that I can see that between the 2 queries we only added 150-200 docs
> > with a
> > > single commit so I don't see how that could affect the results in this
> > > manner.
> > >
> > > We have a moderate indexing load during the time we see this, we seen
> > much
> > > higher indexing loads without issue but it's also not idle either.
> I've
> > > spent a bunch of time trying to reproduce this, tinkering with queries
> > > because I assumed that the problem had to be with the search query and
> > not
> > > with Solr.  Search times for both queries (those with 0 hits and those
> > with
> > > 10k+ hits) are taking 30-40ms.
> > >
> > > Anyone run into something like this?  Any ideas on something to look
> for?
> > >
> > > Thanks,
> > > Brian Lininger
> > >
> >
>
>
> --
>
>
> *Brian Lininger*
> Technical Architect, Infrastructure & Search
> *Veeva Systems *
> brian.linin...@veeva.com
> www.veeva.com
>
>


Re: "Missing" Docs in Solr

2019-08-16 Thread Brian Lininger
It's just a single query that results in 0 hits. I had the same thought of
adding code to retry the query when we get 0 hits (assuming that we expect
there to be hits); that's likely going to be the interim solution, so that we
can get more info when this occurs. It's hard to triage when it's only
happened 2 times in 7+ years of Solr usage ;)
Thanks!
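That interim retry could look something like the following sketch (the query callable and logger are stand-ins for the real SolrJ/HTTP client; retrying with `debug=all`, per Alexandre's suggestion, is so the anomalous request gets fully logged):

```python
import logging

log = logging.getLogger("search")

def search_with_retry(run_query, params, retries=1):
    """Run a query that is expected to have hits; if it returns 0 hits,
    retry with full debug enabled so the anomaly gets captured."""
    results = run_query(params)
    attempts = 0
    while results["numFound"] == 0 and attempts < retries:
        attempts += 1
        log.warning("0 hits for %s; retrying with debug=all", params)
        results = run_query({**params, "debug": "all"})
    return results

# Tiny stand-in for the real client call: returns 0 hits once,
# then the expected result, mimicking the observed anomaly.
calls = []
def fake_query(params):
    calls.append(params)
    return {"numFound": 0 if len(calls) == 1 else 1_000_000}

out = search_with_retry(fake_query, {"q": "*:*"})
print(out["numFound"], len(calls))  # prints: 1000000 2
```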

On Fri, Aug 16, 2019 at 9:58 AM Alexandre Rafalovitch 
wrote:

> is there several 0 results in a row as an anomaly. Or really just one?
>
> You could nearly add SolrJ code to rerun 0-result query with full debug on
> if it is a rare enough event.
>
> Regards,
>  Alex
>
> On Fri, Aug 16, 2019, 12:05 PM Brian Lininger, 
> wrote:
>
> > Yeah, I thought of those same problems at first and expected to find
> > something but no luck.
> > There are no errors in the solr log for the hour before/after the time
> that
> > we saw the issue, the only warnings I see are "PERFORMANCE WARNING:
> > Overlapping onDeckSearchers=2" but these are for other Collections
> and
> > as I understand it, this is really just a load issue not a potential
> > functional issue.
> >
> > We're getting a valid response sent back from Solr (the search is logged
> > with 0 hits) & SolrJ, so it doesn't seem to be a network issue.  We're
> not
> > using aliases, but that shouldn't be a problem as updates to aliases are
> > atomic as I understand them.  GC's also are fine during that period.
> >
> > It's really weird
> >
> > On Fri, Aug 16, 2019 at 3:51 AM Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >
> > > I would take the server log for those 10 seconds (plus buffer) and
> really
> > > try to see if something happens in that period.
> > >
> > > I am thinking an unexpected commit, index large, alias switch. That may
> > > help you to narrow down the kind of error.
> > >
> > > Another option is whether you got empty result or a connection error. I
> > am
> > > thinking firewall that held on but then dropped a connection.
> > >
> > > Both of these are unlikely but since you seem to be stuck
> > >
> > > Regards,
> > > Alex
> > >
> > > On Thu, Aug 15, 2019, 6:02 PM Brian Lininger, <
> brian.linin...@veeva.com>
> > > wrote:
> > >
> > > > Hi All,
> > > > I'm seeing some odd behavior that I'm hoping someone might have
> > > encountered
> > > > before.  We're using Solr 6.6.6 and very infrequently (happened twice
> > in
> > > > the past year) we're getting 0 hits returned for a query that I know
> > > should
> > > > have results.  We've hit this issue once over the past year in 2
> > separate
> > > > collections (both with a single shard), each with several million
> > > > documents, where a query will return 0 hits.  I see a similar query
> run
> > > > 5-10s later and it will get the expected # of hits (~1M hits) so I
> know
> > > > that we haven't reindexed a million docs between the two queries.
> > > Besides
> > > > that I can see that between the 2 queries we only added 150-200 docs
> > > with a
> > > > single commit so I don't see how that could affect the results in
> this
> > > > manner.
> > > >
> > > > We have a moderate indexing load during the time we see this, we seen
> > > much
> > > > higher indexing loads without issue but it's also not idle either.
> > I've
> > > > spent a bunch of time trying to reproduce this, tinkering with
> queries
> > > > because I assumed that the problem had to be with the search query
> and
> > > not
> > > > with Solr.  Search times for both queries (those with 0 hits and
> those
> > > with
> > > > 10k+ hits) are taking 30-40ms.
> > > >
> > > > Anyone run into something like this?  Any ideas on something to look
> > for?
> > > >
> > > > Thanks,
> > > > Brian Lininger
> > > >
> > >
> >
> >
> > --
> >
> >
> > *Brian Lininger*
> > Technical Architect, Infrastructure & Search
> > *Veeva Systems *
> > brian.linin...@veeva.com
> > www.veeva.com
> >
> >
>


-- 


*Brian Lininger*
Technical Architect, Infrastructure & Search
*Veeva Systems *
brian.linin...@veeva.com
www.veeva.com



Re: Solr is very slow with term vectors

2019-08-16 Thread Walter Underwood
First, time fetching one million records with all the fields you need, both for 
display and for re-ranking. If that is slow, then no amount of cosine code 
tweaking will make it fast.
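A minimal sketch of that timing check (the fetch callable here is a stand-in for whatever client call actually retrieves the records from Solr):

```python
import time

def time_fetch(fetch, *args):
    """Wall-clock a record fetch, to check whether retrieval itself
    (not the scoring plugin) dominates the response time."""
    start = time.perf_counter()
    rows = fetch(*args)
    return rows, time.perf_counter() - start

# Stand-in fetch; in practice this would be the Solr request that
# returns every field needed for display and re-ranking.
rows, secs = time_fetch(lambda n: list(range(n)), 1_000_000)
print(f"fetched {len(rows)} rows in {secs:.3f}s")
```

If the fetch alone already takes tens of seconds, the cosine-scoring code is not the bottleneck.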

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Aug 16, 2019, at 9:23 AM, Jan Høydahl  wrote:
> 
> I bet your main issue is assuming that this particular plugin is the only way 
> to solve your ranking requirements.
> I would advise you to start looking into the various built-in Similarities 
> and instead try to tweak one of those, and/or adding more ranking signals to 
> your solution, perhaps see if ReRanking on top 1000 hits is good enough etc. 
> Not knowing anything about what lead you to that custom bad-performing 3rd 
> party plugin in the first place, it is hard to guess, but take 10 steps back 
> and re-consider that choice.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
>> 16. aug. 2019 kl. 15:50 skrev Jörn Franke :
>> 
>> You would have to implement that I don’t think that Solr is threading the 
>> query parser magically for you, but maybe some people have more insight on 
>> this topic.
>> 
>>> Am 16.08.2019 um 15:42 schrieb Vignan Malyala :
>>> 
>>> How do I check that in solr? Can anyone share link on implementation of
>>> threads in solr?
>>> 
 On Fri 16 Aug, 2019, 4:52 PM Jörn Franke,  wrote:
 
 Is your custom query parser multithreaded and leverages all cores?
 
> Am 16.08.2019 um 13:12 schrieb Vignan Malyala :
> 
> I want response time below 3 seconds.
> And fyi I'm already using 32 cores.
> My cache is already full too and obviously same requests don't occur in
 my
> case.
> 
> 
>> On Fri 16 Aug, 2019, 11:47 AM Jörn Franke, 
 wrote:
>> 
>> How much response time do you require?
>> I think you have to solve the issue in your code by introducing higher
>> parallelism during calculation and potentially more cores.
>> 
>> Maybe you can also precalculate what you do, cache it and use during
>> request the precalculated values.
>> 
>>> Am 16.08.2019 um 05:08 schrieb Vignan Malyala :
>>> 
>>> Hi
>>> Any solution for this? Taking around 50 seconds to get response.
>>> 
 On Mon 12 Aug, 2019, 3:28 PM Vignan Malyala, 
>> wrote:
 
 Hi Doug / Walter,
 
 I'm just using this methodology.
 PFB link of my sample code.
 https://github.com/saaay71/solr-vector-scoring
 
 The only issue is speed of response for 1M records.
 
 On Mon, Aug 12, 2019 at 12:24 AM Walter Underwood <
>> wun...@wunderwood.org>
 wrote:
 
> tf.idf was invented because cosine similarity is too much
 computation.
> tf.idf gives similar results much, much faster than cosine distance.
> 
> I would expect cosine similarity to be slow. I would also expect
> retrieving 1 million records to be slow. Doing both of those in one
>> minute
> is pretty good.
> 
> As Kernighan and Plauger said in 1978, "Don't diddle code to make it
> faster—find a better algorithm."
> 
> https://en.wikipedia.org/wiki/The_Elements_of_Programming_Style
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Aug 11, 2019, at 10:40 AM, Doug Turnbull <
> dturnb...@opensourceconnections.com> wrote:
>> 
>> Hi Vignan,
>> 
>> We need to see more details / code of what your query parser plugin
>> does
>> exactly with term vectors, we can't really help you without more
> details.
>> Is it open source? Can you share a minimal example that recreates
 the
>> problem?
>> 
>> On Sun, Aug 11, 2019 at 1:19 PM Vignan Malyala <
 dsmsvig...@gmail.com>
> wrote:
>> 
>>> Hi guys,
>>> 
>>> I made my custom qparser plugin in Solr for scoring. The plugin
 only
> does
>>> cosine similarity of vectors for each record. I use term vectors
>> here.
>>> Results are fine!
>>> 
>>> BUT, Solr response is very slow with term vectors. It takes around
 55
>>> seconds for each request for 100 records.
>>> How do I make it faster to get my results in ms ?
>>> Please respond soon as its lil urgent.
>>> 
>>> Note: All my values are stored and indexed. I am not using Solr
>> Cloud.
>>> 
>> 
>> 
>> --
>> *Doug Turnbull **| CTO* | OpenSource Connections
>> , LLC | 240.476.9983
>> Author: Relevant Search 

Re: Solr crash | GC issue

2019-08-16 Thread Rohan Kasat
Thanks Shawn and Paul.
I tried using https://gceasy.io/ but was not able to get much out of it.

I see the OOM file getting created with "not much heap space" as the error.
Shawn, I have tried your CMS settings too, and will now try increasing the
heap memory; hopefully it works this time.
Is there anything specific I should be checking?

Regards,
Rohan Kasat




On Fri, Aug 16, 2019 at 12:23 PM Shawn Heisey  wrote:

> On 8/16/2019 8:23 AM, Rohan Kasat wrote:
> > I have a Solr Cloud setup of 3 solr servers 7.5 version.
> > 24GB heap memory is allocated to each solr server and i have around 655
> GB
> > of data in indexes to be searched for.
> >
> > Few last 2-3 days, the solr servers are crashing and am able to see the
> > heap memory is almost full but the CPU usage is just 1 %.
> >
> > I am attaching the gc logs from 3 servers. Can you please help in
> analyzing
> > yje logs and comments to improve
> >
> > https://gist.github.com/rohankasat/cee8203c0c12983d9839b7a59047733b
>
> These three GC logs do not indicate that all the heap is used.
>
> The peak heap usage during these GC logs is 18.86GB, 19.42GB, and
> 18.91GB.  That's quite a bit below the 24GB max.
>
> There are some very long GC pauses recorded.  Increasing the heap size
> MIGHT help with that, or it might not.
>
> The typical way that Solr appears to "crash" is when an OutOfMemoryError
> exception is thrown, at which time a Solr instance that is running on an
> OS like Linux will kill itself with a -9 signal.  This scripting is not
> present when starting on Windows.
>
> An OOME can be thrown for a resource other than memory, so despite the
> exception name, it might not actually be memory that has been depleted.
> The exception will need to be examined to learn why it was thrown.
>
> GC logs do not indicate the cause of OOME.  If that information is
> logged at all, and it might not be, it will be in solr.log.
>
> Looking at the GC logs to see how your Solr is laid out... the following
> command might find the cause, if it was logged, and if the relevant log
> has not been rotated out:
>
> grep -r OutOfMemory /apps/solr/solr_data/logs/*
>
> At the very least it might help you find out which log file to
> investigate further.
>
> Thanks,
> Shawn
>


-- 

*Regards,*
*Rohan Kasat*


Re: Solr crash | GC issue

2019-08-16 Thread Shawn Heisey

On 8/16/2019 11:59 AM, Rohan Kasat wrote:

I see the OOM file getting created with "not much heap space" as the error


Can you get the precise error cause?  I haven't ever seen that 
particular text before.  If you can paste the entire error (which will 
be many lines), that can be helpful.



Shawn, i have tried your CMS settings too and now will try increasing the
heap memory, hope it works this time.


Changing GC tuning can never fix an OOME problem.  The only way to fix 
it is to increase the resource that's running out or adjust things so 
less of that resource is needed.


Thanks,
Shawn


Re: Solr crash | GC issue

2019-08-16 Thread Rohan Kasat
Thanks Shawn.
I saw that error when Solr crashed last time. I am waiting to see if it
happens again so I can capture the complete error log.

Regards,
Rohan Kasat

On Fri, Aug 16, 2019 at 2:36 PM Shawn Heisey  wrote:

> On 8/16/2019 11:59 AM, Rohan Kasat wrote:
> > I see the OOM file getting created with "not much heap space" as the
> error
>
> Can you get the precise error cause?  I haven't ever seen that
> particular text before.  If you can paste the entire error (which will
> be many lines), that can be helpful.
>
> > Shawn, i have tried your CMS settings too and now will try increasing the
> > heap memory, hope it works this time.
>
> Changing GC tuning can never fix an OOME problem.  The only way to fix
> it is to increase the resource that's running out or adjust things so
> less of that resource is needed.
>
> Thanks,
> Shawn
>
-- 

*Regards,*
*Rohan Kasat*


Re: using let() with other streaming expressions

2019-08-16 Thread Joel Bernstein
Yes, the examples you show will fail because the "let" expression reads
streams into an in-memory List. All the Streaming Expressions expect a
TupleStream to be passed in rather than a List.

There is an undocumented function that turns a List of tuples back into a
Stream. The function is called "stream".

Here is the syntax:

let(
  a=search(techproducts, q="cat:electronics", fl="id, manu, price",
sort="id asc"),
  b=search(techproducts, q="cat:electronics", fl="id, popularity,
_version_", sort="id asc"),
  c=innerJoin(stream(a),stream(b), on=id)
)
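As an illustration only (not part of Joel's reply), a small helper that assembles such a `let()` expression as a string and sanity-checks it before it is sent to the `/stream` endpoint:

```python
def let_expr(bindings, body):
    """Compose a Solr let() streaming expression from named
    sub-expressions plus a final body expression."""
    assigns = ",\n  ".join(f"{name}={expr}" for name, expr in bindings.items())
    return f"let(\n  {assigns},\n  {body}\n)"

expr = let_expr(
    {
        "a": 'search(techproducts, q="cat:electronics", fl="id, manu, price", sort="id asc")',
        "b": 'search(techproducts, q="cat:electronics", fl="id, popularity, _version_", sort="id asc")',
    },
    # Wrap the stored tuple lists back into streams before joining.
    "c=innerJoin(stream(a),stream(b), on=id)",
)
print(expr)
```

Building the expression programmatically makes it easy to verify balanced parentheses and consistent sort fields before Solr parses it.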




Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Aug 16, 2019 at 4:30 AM Viktors Belovs 
wrote:

> Dear Solr Community,
>
> Recently I've been working with the 'let()' expression.
> And I got into a sort of trouble when I was trying to combine it with
> different streaming expressions, as well as when trying to re-assign
> variables.
>
> As an example:
> let(
>   a=search(techproducts, q="cat:electronics", fl="id, manu, price",
> sort="id asc"),
>   b=search(techproducts, q="cat:electronics", fl="id, popularity,
> _version_", sort="id asc"),
>   c=innerJoin(a, b, on=id)
> )
>
> In case with re-assigning the variables:
> let(
>   a=search(techproducts, q="cat:electronics", fl="id, manu, price",
> sort="id asc"),
>   b=a,
>   c=innerJoin(a, b, on=id)
> )
>
> According to the documentation (for Solr v8.1, the version I use), it's
> possible to store any kind of value with 'let()', but it seems the usage of
> this function is strictly limited to specific mathematical operations.
>
> I was wondering if there is a possible way to reduce the verbosity and
> (potentially) improve the performance of the streaming expressions, while
> constructing complex combinations of different streaming expressions.
>
> I assume 'let()' isn't suited for such purposes, but perhaps there is an
> alternative way to do such a thing.
>
> Regards,
> Viktors