What are the results of adding &debug=query to the URL? The parsed query will be especially illuminating.
Best,
Erick

On Mon, Aug 28, 2017 at 4:37 AM, Emir Arnautovic <emir.arnauto...@sematext.com> wrote:
> Hi Darko,
>
> The issue is a wrong expectation: title-1-end is parsed into 3 tokens
> (I'm guessing), and mm=99% of 3 tokens is 2.97, which is rounded down to 2.
> Since all your documents have the 'title' and 'end' tokens, they all match.
> If you want to round up, you can use mm=-1%; that will result in zero
> matches (or one match if you do not filter out the original document).
>
> You will have to experiment with your tokenizers and define what a
> similarity match percentage means (if you want to stick with mm).
>
> Regards,
> Emir
>
> On 28.08.2017 09:17, Darko Todoric wrote:
>> Hm... I cannot make this DisMax work on my Solr...
>>
>> In Solr I have documents with these titles:
>> - "title-1-end"
>> - "title-2-end"
>> - "title-3-end"
>> - ...
>> - "title-312-end"
>>
>> and when I run the query
>> http://localhost:8983/solr/SciLit/select?defType=dismax&indent=on&mm=99%&q=title:"title-123123123-end"&wt=json
>> I get all documents from Solr :\
>> What am I doing wrong?
>>
>> Also, I don't know if it affects the results, but on the "title" field I use
>> "WhitespaceTokenizerFactory".
>>
>> Kind regards,
>> Darko
>>
>> On 08/25/2017 06:38 PM, Junte Zhang wrote:
>>> If you already have the title of the document, then you could run that
>>> title as a new query against the whole index and exclude the source
>>> document from the results as a filter.
>>>
>>> You could use the DisMax query parser:
>>> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
>>>
>>> And then set the minimum match ratio of the OR clauses to 90%.
>>>
>>> /JZ
>>>
>>> -----Original Message-----
>>> From: Darko Todoric [mailto:todo...@mdpi.com]
>>> Sent: Friday, August 25, 2017 5:49 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Search by similarity?
>>>
>>> Hi,
>>>
>>> I have 90,000,000 documents in Solr and I need to compare the "title" of
>>> a given document against the others and get all documents with more than
>>> 80% similarity. PHP has "similar_text", but it's not so smart to load 90M
>>> documents into an array...
>>> Can I run some query in Solr which will give me the documents with more
>>> than 80% similarity?
>>>
>>> Kind regards,
>>> Darko Todoric
>>>
>>> --
>>> Darko Todoric
>>> Web Engineer, MDPI DOO
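
To make the mm arithmetic from Emir's reply concrete, here is a minimal Python sketch of how a percentage mm value turns into a required clause count. The function name min_should_match and the three-token split are assumptions for illustration only; the real tokens depend on the field's analyzer, which is why the parsed query from debug=query is worth checking first.

```python
def min_should_match(num_clauses: int, mm_percent: int) -> int:
    """Required number of matching optional clauses for a percentage mm.

    Sketch of the documented rule: a positive percentage of the clause
    count is rounded down; a negative percentage is the share of clauses
    that may be missing, rounded down before it is subtracted from the
    total.
    """
    if mm_percent >= 0:
        # Integer division rounds down, e.g. 3 * 99 // 100 == 2.
        return (num_clauses * mm_percent) // 100
    may_be_missing = (num_clauses * -mm_percent) // 100
    return num_clauses - may_be_missing


# Assumption: the analyzer splits the title into three tokens.
tokens = ["title", "123123123", "end"]

print(min_should_match(len(tokens), 99))  # 2: 'title' and 'end' suffice
print(min_should_match(len(tokens), -1))  # 3: every token must match
```

With only three clauses, any positive percentage below 100 lets one token go unmatched, so short titles behave very differently from long ones under the same mm setting.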