Re: Solr is very slow with term vectors

Walter Underwood Fri, 16 Aug 2019 10:22:42 -0700

First, time how long it takes to fetch one million records with all the fields
you need, both for display and for re-ranking. If that alone is slow, then no
amount of tweaking the cosine code will make it fast.
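
One way to take that measurement is sketched below in Python. It is only a
rough baseline under stated assumptions: the core URL and the field names
`id` and `vector` are hypothetical, and it pages over HTTP with cursorMark, so
it includes network and JSON overhead that a query parser running inside Solr
would not pay.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical values: adjust the core URL and field list to your own setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"
FIELDS = "id,vector"


def fetch_page(cursor_mark, rows=10000):
    """Fetch one page of results with Solr's cursorMark deep paging."""
    params = urllib.parse.urlencode({
        "q": "*:*",
        "fl": FIELDS,
        "rows": rows,
        "sort": "id asc",  # cursorMark requires a sort on the uniqueKey field
        "cursorMark": cursor_mark,
        "wt": "json",
    })
    with urllib.request.urlopen(f"{SOLR_SELECT_URL}?{params}") as resp:
        return json.load(resp)


def time_full_fetch(fetch=fetch_page):
    """Page through every document; return (doc_count, elapsed_seconds)."""
    start = time.perf_counter()
    cursor, total = "*", 0
    while True:
        page = fetch(cursor)
        total += len(page["response"]["docs"])
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor
    return total, time.perf_counter() - start
```

Calling `time_full_fetch()` against a live core returns the document count and
the elapsed seconds. If that number is already near the 50-second mark, the
retrieval itself is the bottleneck rather than the scoring.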


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Aug 16, 2019, at 9:23 AM, Jan Høydahl <jan....@cominvent.com> wrote:
> 
> I bet your main issue is assuming that this particular plugin is the only way 
> to solve your ranking requirements.
> I would advise you to start looking into the various built-in Similarities 
> and instead try to tweak one of those, and/or adding more ranking signals to 
> your solution, perhaps see if ReRanking on top 1000 hits is good enough etc. 
> Not knowing anything about what led you to that custom bad-performing 3rd 
> party plugin in the first place, it is hard to guess, but take 10 steps back 
> and re-consider that choice.
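
The re-ranking suggestion above can be sketched as request parameters for
Solr's rerank query parser, built here in Python. This is an illustration
under assumptions, not a tested configuration: the first-pass query
`title:solr` is hypothetical, and the `{!vp ...}` string is an assumed syntax
for the vector plugin, so substitute whatever your own parser accepts.

```python
from urllib.parse import urlencode


def rerank_params(main_query, rerank_query, rerank_docs=1000, rerank_weight=1.0):
    """Build Solr request parameters that re-score only the top N hits."""
    return {
        "q": main_query,  # cheap first-pass query scored by a built-in Similarity
        "rq": f"{{!rerank reRankQuery=$rqq reRankDocs={rerank_docs} "
              f"reRankWeight={rerank_weight}}}",
        "rqq": rerank_query,  # the expensive query, applied to the top hits only
    }


params = rerank_params(
    "title:solr",  # hypothetical first-pass query
    '{!vp f=vector vector="0.1,4.75,0.3"}',  # assumed syntax for the vector plugin
)
query_string = urlencode(params)  # append to the core's /select URL
```

With this shape the expensive scoring runs on 1000 documents per request
instead of one million; whether the top 1000 from the first pass are good
enough candidates is something only relevance testing on your data can show.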
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
>> On 16 Aug 2019, at 15:50, Jörn Franke <jornfra...@gmail.com> wrote:
>> 
>> You would have to implement that yourself. I don’t think that Solr is threading
>> the query parser magically for you, but maybe some people have more insight on
>> this topic.
>> 
>>> On 16.08.2019, at 15:42, Vignan Malyala <dsmsvig...@gmail.com> wrote:
>>> 
>>> How do I check that in Solr? Can anyone share a link on implementing
>>> threads in Solr?
>>> 
>>>> On Fri 16 Aug, 2019, 4:52 PM Jörn Franke, <jornfra...@gmail.com> wrote:
>>>> 
>>>> Is your custom query parser multithreaded, and does it leverage all cores?
>>>> 
>>>>> On 16.08.2019, at 13:12, Vignan Malyala <dsmsvig...@gmail.com> wrote:
>>>>> 
>>>>> I want response time below 3 seconds.
>>>>> And fyi I'm already using 32 cores.
>>>>> My cache is already full too, and obviously the same requests don't occur
>>>>> in my case.
>>>>> 
>>>>> 
>>>>>> On Fri 16 Aug, 2019, 11:47 AM Jörn Franke, <jornfra...@gmail.com> wrote:
>>>>>> 
>>>>>> How much response time do you require?
>>>>>> I think you have to solve the issue in your code by introducing higher
>>>>>> parallelism during calculation and potentially more cores.
>>>>>> 
>>>>>> Maybe you can also precalculate what you do, cache it and use during
>>>>>> request the precalculated values.
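
A minimal sketch of what precalculation could look like for cosine scoring,
in Python. It assumes dense numeric vectors and uses hypothetical toy data; it
is not how the plugin in this thread works. The idea is to scale every
document vector to unit length once, at index time, so that each request only
needs one dot product per document.

```python
import math


def normalize(vec):
    """Scale a vector to unit length; done once per document, at index time."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(a, b):
    """Full cosine similarity: one dot product plus two norms per comparison."""
    dot_ab = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot_ab / (na * nb) if na and nb else 0.0


def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy data standing in for stored document vectors.
docs = {"doc1": [1.0, 2.0, 3.0], "doc2": [3.0, 0.0, 4.0]}
unit_docs = {doc_id: normalize(v) for doc_id, v in docs.items()}  # precomputed once

query = normalize([1.0, 0.0, 1.0])  # normalized once per request
scores = {doc_id: dot(query, v) for doc_id, v in unit_docs.items()}
```

This removes the per-document norm computation from the request path, but it
still touches every document, so on its own it is unlikely to close the gap
between 50 seconds and 3 seconds.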
>>>>>> 
>>>>>>> On 16.08.2019, at 05:08, Vignan Malyala <dsmsvig...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi
>>>>>>> Any solution for this? It is taking around 50 seconds to get a response.
>>>>>>> 
>>>>>>>> On Mon 12 Aug, 2019, 3:28 PM Vignan Malyala, <dsmsvig...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi Doug / Walter,
>>>>>>>> 
>>>>>>>> I'm just using this methodology.
>>>>>>>> Please find below a link to my sample code.
>>>>>>>> https://github.com/saaay71/solr-vector-scoring
>>>>>>>> 
>>>>>>>> The only issue is speed of response for 1M records.
>>>>>>>> 
>>>>>>>> On Mon, Aug 12, 2019 at 12:24 AM Walter Underwood <wun...@wunderwood.org> wrote:
>>>>>>>> 
>>>>>>>>> tf.idf was invented because cosine similarity is too much computation.
>>>>>>>>> tf.idf gives similar results much, much faster than cosine distance.
>>>>>>>>> 
>>>>>>>>> I would expect cosine similarity to be slow. I would also expect
>>>>>>>>> retrieving 1 million records to be slow. Doing both of those in one
>>>>>>>>> minute is pretty good.
>>>>>>>>> 
>>>>>>>>> As Kernighan and Plauger said in 1978, “Don’t diddle code to make it
>>>>>>>>> faster—find a better algorithm.”
>>>>>>>>> 
>>>>>>>>> https://en.wikipedia.org/wiki/The_Elements_of_Programming_Style
>>>>>>>>> 
>>>>>>>>> wunder
>>>>>>>>> Walter Underwood
>>>>>>>>> wun...@wunderwood.org
>>>>>>>>> http://observer.wunderwood.org/  (my blog)
>>>>>>>>> 
>>>>>>>>>> On Aug 11, 2019, at 10:40 AM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Vignan,
>>>>>>>>>> 
>>>>>>>>>> We need to see more details / code of what your query parser plugin
>>>>>>>>>> does exactly with term vectors; we can't really help you without more
>>>>>>>>>> details.
>>>>>>>>>> Is it open source? Can you share a minimal example that recreates the
>>>>>>>>>> problem?
>>>>>>>>>> 
>>>>>>>>>> On Sun, Aug 11, 2019 at 1:19 PM Vignan Malyala <dsmsvig...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi guys,
>>>>>>>>>>> 
>>>>>>>>>>> I made my custom qparser plugin in Solr for scoring. The plugin only
>>>>>>>>>>> does cosine similarity of vectors for each record. I use term vectors
>>>>>>>>>>> here.
>>>>>>>>>>> Results are fine!
>>>>>>>>>>> 
>>>>>>>>>>> BUT, Solr response is very slow with term vectors. It takes around 55
>>>>>>>>>>> seconds for each request for 1,000,000 records.
>>>>>>>>>>> How do I make it faster, to get my results in milliseconds?
>>>>>>>>>>> Please respond soon, as it's a little urgent.
>>>>>>>>>>> 
>>>>>>>>>>> Note: All my values are stored and indexed. I am not using Solr Cloud.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> *Doug Turnbull **| CTO* | OpenSource Connections
>>>>>>>>>> <http://opensourceconnections.com>, LLC | 240.476.9983
>>>>>>>>>> Author: Relevant Search <http://manning.com/turnbull>
>>>>>>>>>> This e-mail and all contents, including attachments, is considered to be
>>>>>>>>>> Company Confidential unless explicitly stated otherwise, regardless
>>>>>>>>>> of whether attachments are marked as such.
>>>>>>>>> 
>>>>>>>>> 
>>>>>> 
>>>> 
>
