Re: Search by similarity?

Erick Erickson Mon, 28 Aug 2017 09:36:05 -0700

What are the results of adding &debug=query to the URL? The parsed
query will be especially illuminating.
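
For instance, here is a minimal Python sketch of building that request with
debug=query added. The host, core name and parameters are copied from the
query quoted below; the helper name build_debug_url is just something I made
up for illustration. Note that urlencode also escapes the "%" in mm=99% as
%25, which a hand-typed URL may not do.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, params):
    """Return base_url with params plus debug=query, properly URL-encoded.

    build_debug_url is a hypothetical helper, not part of Solr or any client.
    """
    query = dict(params)          # copy so the caller's dict is not modified
    query["debug"] = "query"      # ask Solr to include the parsed query
    return base_url + "?" + urlencode(query)


# Parameters taken from the request quoted further down in this thread.
url = build_debug_url(
    "http://localhost:8983/solr/SciLit/select",
    {
        "defType": "dismax",
        "indent": "on",
        "mm": "99%",
        "q": 'title:"title-123123123-end"',
        "wt": "json",
    },
)
print(url)
```

Open the printed URL in a browser (or with curl) and look at the parsed query
in the debug section of the response.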


Best,
Erick

On Mon, Aug 28, 2017 at 4:37 AM, Emir Arnautovic
<emir.arnauto...@sematext.com> wrote:
> Hi Darko,
>
> The issue is wrong expectations: title-1-end is parsed into 3 tokens
> (guessing), and mm=99% of 3 tokens is 2.97, which is rounded down to 2. Since
> all your documents have the 'title' and 'end' tokens, all of them match. If
> you want to round up, you can use mm=-1%; that will result in zero matches
> (or one match if you do not filter out the original document).
>
> You have to play with your tokenizers and define what similarity match
> percentage means for you (if you want to stick with mm).
>
> Regards,
> Emir
>
>
>
> On 28.08.2017 09:17, Darko Todoric wrote:
>>
>> Hm... I cannot make this DisMax work on my Solr...
>>
>> In Solr I have documents with these titles:
>>  - "title-1-end"
>>  - "title-2-end"
>>  - "title-3-end"
>>  - ...
>>  - ...
>>  - "title-312-end"
>>
>> and when I run the query
>> http://localhost:8983/solr/SciLit/select?defType=dismax&indent=on&mm=99%&q=title:"title-123123123-end"&wt=json
>> I get all documents from Solr :\
>> What am I doing wrong?
>>
>> Also, I don't know if it affects the results, but on the "title" field I use
>> "WhitespaceTokenizerFactory".
>>
>> Kind regards,
>> Darko
>>
>>
>> On 08/25/2017 06:38 PM, Junte Zhang wrote:
>>>
>>> If you already have the title of the document, then you could run that
>>> title as a new query against the whole index and exclude the source document
>>> from the results as a filter.
>>>
>>> You could use the DisMax query parser:
>>> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
>>>
>>> And then set the minimum match ratio of the OR clauses to 90%.
>>>
>>> /JZ
>>>
>>> -----Original Message-----
>>> From: Darko Todoric [mailto:todo...@mdpi.com]
>>> Sent: Friday, August 25, 2017 5:49 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Search by similarity?
>>>
>>> Hi,
>>>
>>>
>>> I have 90.000.000 documents in Solr and I need to compare the "title" of a
>>> given document against them and get all documents with more than 80%
>>> similarity. PHP has "similar_text", but loading 90m documents into an
>>> array is not a smart idea...
>>> Can I run some query in Solr which will give me the documents with more
>>> than 80% similarity?
>>>
>>>
>>> Kind regards,
>>> Darko Todoric
>>>
>>> --
>>> Darko Todoric
>>> Web Engineer, MDPI DOO
>>> Veljka Dugosevica 54, 11060 Belgrade, Serbia
>>> +381 65 43 90 620
>>> www.mdpi.com
>>>
>>> Disclaimer: The information and files contained in this message are
>>> confidential and intended solely for the use of the individual or entity to
>>> whom they are addressed.
>>> If you have received this message in error, please notify me and delete
>>> this message from your system.
>>> You may not copy this message in its entirety or in part, or disclose its
>>> contents to anyone.
>>>
>>
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
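
To make the mm arithmetic Emir describes concrete, here is a rough Python
sketch of how a percentage mm turns into a required clause count, as I
understand the documented behaviour: a positive percentage is rounded down,
and a negative percentage gives the share of clauses that may be missing
(also rounded down). The function name min_should_match is made up for this
example, and this is not Solr's actual code.

```python
def min_should_match(num_clauses, percent):
    """Return how many optional clauses must match for a percentage mm.

    Hypothetical illustration of the rounding rule, not Solr's implementation.
    percent > 0: that share of the clauses must match, rounded down.
    percent < 0: that share of the clauses may be missing, rounded down.
    """
    if percent >= 0:
        required = (num_clauses * percent) // 100
    else:
        may_be_missing = (num_clauses * -percent) // 100
        required = num_clauses - may_be_missing
    # Keep the result within the valid range 0..num_clauses.
    return max(0, min(required, num_clauses))


# Assuming the title is split into 3 tokens, as Emir guesses:
print(min_should_match(3, 99))   # 3 * 0.99 = 2.97, rounded down
print(min_should_match(3, -1))   # 3 * 0.01 = 0.03 may be missing, rounded down
```

With 3 clauses, mm=99% requires only 2 of them, while mm=-1% requires all 3.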
