I reviewed the dismax docs, and dismax doesn't support the fieldname:term portion of the Lucene syntax. To restrict a search to a field and still use mm, you can either:

A) use edismax exactly as you're currently trying to use dismax, or
B) use dismax, with the following changes:
   * remove the title: portion of the query and just pass q="title-123123123-end"
   * set qf=title
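For illustration, here is a rough sketch of what the two requests might look like. It only builds the URLs; it does not contact Solr. The host, the SciLit core name and mm=99% are copied from the request quoted further down in this thread, while the helper names edismax_url and dismax_url and the debug=query parameter are my own additions, so treat it as a starting point rather than a tested recipe.

```python
from urllib.parse import urlencode

# Select handler URL copied from the request quoted in this thread; adjust as needed.
BASE_URL = "http://localhost:8983/solr/SciLit/select"


def edismax_url(title: str, mm: str = "99%") -> str:
    """Option A: keep the title:"..." syntax in q and switch defType to edismax."""
    params = {
        "defType": "edismax",
        "q": f'title:"{title}"',
        "mm": mm,
        "wt": "json",
        "debug": "query",  # hypothetical extra, handy for checking parsedquery
    }
    return f"{BASE_URL}?{urlencode(params)}"


def dismax_url(title: str, mm: str = "99%") -> str:
    """Option B: stay on dismax, drop the field prefix from q and name the field in qf."""
    params = {
        "defType": "dismax",
        "q": f'"{title}"',
        "qf": "title",
        "mm": mm,
        "wt": "json",
        "debug": "query",  # hypothetical extra, handy for checking parsedquery
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(edismax_url("title-123123123-end"))
    print(dismax_url("title-123123123-end"))
```

urlencode takes care of escaping the quotes, the colon and the percent sign in mm, which are easy to get wrong when pasting the URL by hand.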
On Tue, Aug 29, 2017 at 10:25 AM Josh Lincoln <josh.linc...@gmail.com> wrote:

> Darko,
> Can you use edismax instead?
>
> When using dismax, solr is parsing the title field as if it's a query
> term. E.g. the query seems to be interpreted as
> title "title-123123123-end"
> (note the lack of a colon)...which results in querying all your qf fields
> for both "title" and "title-123123123-end"
> I haven't used dismax in a very long time, so I don't know if this is
> intentional, but it's not what I expected.
>
> I'm able to reproduce the issue in 6.4.2 using the default techproducts
> Notice that in the below the parsedquery expands to both text:title and
> text:name (df=text)
> http://localhost:8983/solr/techproducts/select?indent=on&q=title:"name"&wt=json&debug=true&defType=dismax
> rawquerystring: "title:"name"",
> querystring: "title:"name"",
> parsedquery: "(+(DisjunctionMaxQuery(((text:title)^1.0))
> DisjunctionMaxQuery(((text:name)^1.0))) ())/no_coord",
> parsedquery_toString: "+(((text:title)^1.0) ((text:name)^1.0)) ()"
>
> But it's not an issue if you use edismax
> http://localhost:8983/solr/techproducts/select?indent=on&q=title:"name"&wt=json&debug=true&defType=edismax
> rawquerystring: "title:"name"",
> querystring: "title:"name"",
> parsedquery: "(+title:name)/no_coord",
> parsedquery_toString: "+title:name",
>
> On Tue, Aug 29, 2017 at 8:44 AM Darko Todoric <todo...@mdpi.com> wrote:
>
>> Hi Erick,
>>
>> "debug":{ "rawquerystring":"title:\"title-123123123-end\"",
>> "querystring":"title:\"title-123123123-end\"",
>> "parsedquery":"(+(DisjunctionMaxQuery(((author_full:title)^7.0 |
>> (abstract:titl)^2.0 | (title:titl)^3.0 | (keywords:titl)^5.0 |
>> (authors:title)^4.0 | (doi:title:)^1.0))
>> DisjunctionMaxQuery(((author_full:\"title 123123123 end\"~1)^7.0 |
>> (abstract:\"titl 123123123 end\"~1)^2.0 | (title:\"titl 123123123
>> end\"~1)^3.0 | (keywords:\"titl 123123123 end\"~1)^5.0 |
>> (authors:\"title 123123123 end\"~1)^4.0 |
(doi:title-123123123-end)^1.0)))~1 ())/no_coord", >> "parsedquery_toString":"+((((author_full:title)^7.0 | >> (abstract:titl)^2.0 | (title:titl)^3.0 | (keywords:titl)^5.0 | >> (authors:title)^4.0 | (doi:title:)^1.0) ((author_full:\"title 123123123 >> end\"~1)^7.0 | (abstract:\"titl 123123123 end\"~1)^2.0 | (title:\"titl >> 123123123 end\"~1)^3.0 | (keywords:\"titl 123123123 end\"~1)^5.0 | >> (authors:\"title 123123123 end\"~1)^4.0 | >> (doi:title-123123123-end)^1.0))~1) ()", "explain":{ "23251":"\n16.848969 >> = sum of:\n 16.848969 = sum of:\n 16.848969 = max of:\n 16.848969 = >> weight(abstract:titl in 23194) [], result of:\n 16.848969 = >> score(doc=23194,freq=1.0 = termFreq=1.0\n), product of:\n 2.0 = boost\n >> 5.503748 = idf(docFreq=74, docCount=18297)\n 1.5306814 = tfNorm, >> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 = >> parameter b\n 186.49593 = avgFieldLength\n 28.444445 = fieldLength\n >> 3.816711E-5 = weight(title:titl in 23194) [], result of:\n 3.816711E-5 = >> score(doc=23194,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n >> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm, >> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 = >> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n", >> "20495":"\n16.169483 = sum of:\n 16.169483 = sum of:\n 16.169483 = max >> of:\n 16.169483 = weight(abstract:titl in 20489) [], result of:\n >> 16.169483 = score(doc=20489,freq=1.0 = termFreq=1.0\n), product of:\n >> 2.0 = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.468952 = >> tfNorm, computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 >> = parameter b\n 186.49593 = avgFieldLength\n 40.96 = fieldLength\n >> 3.816711E-5 = weight(title:titl in 20489) [], result of:\n 3.816711E-5 = >> score(doc=20489,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n >> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm, >> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 
0.75 = >> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n", >> "28227":"\n15.670726 = sum of:\n 15.670726 = sum of:\n 15.670726 = max >> of:\n 15.670726 = weight(abstract:titl in 28156) [], result of:\n >> 15.670726 = score(doc=28156,freq=2.0 = termFreq=2.0\n), product of:\n >> 2.0 = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.4236413 = >> tfNorm, computed from:\n 2.0 = termFreq=2.0\n 1.2 = parameter k1\n 0.75 >> = parameter b\n 186.49593 = avgFieldLength\n 163.84 = fieldLength\n >> 3.816711E-5 = weight(title:titl in 28156) [], result of:\n 3.816711E-5 = >> score(doc=28156,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n >> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm, >> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 = >> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n", >> "20375":"\n15.052014 = sum of:\n 15.052014 = sum of:\n 15.052014 = max >> of:\n 15.052014 = weight(abstract:titl in 20369) [], result of:\n >> 15.052014 = score(doc=20369,freq=1.0 = termFreq=1.0\n), product of:\n >> 2.0 = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.3674331 = >> tfNorm, computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 >> = parameter b\n 186.49593 = avgFieldLength\n 64.0 = fieldLength\n >> 3.816711E-5 = weight(title:titl in 20369) [], result of:\n 3.816711E-5 = >> score(doc=20369,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n >> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm, >> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 = >> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n", >> "20381":"\n15.052014 = sum of:\n 15.052014 = sum of:\n 15.052014 = max >> of:\n 15.052014 = weight(abstract:titl in 20375) [], result of:\n >> 15.052014 = score(doc=20375,freq=1.0 = termFreq=1.0\n), product of:\n >> 2.0 = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.3674331 = >> tfNorm, computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter 
k1\n 0.75 >> = parameter b\n 186.49593 = avgFieldLength\n 64.0 = fieldLength\n >> 3.816711E-5 = weight(title:titl in 20375) [], result of:\n 3.816711E-5 = >> score(doc=20375,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n >> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm, >> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 = >> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n", >> "29030":"\n13.699375 = sum of:\n 13.699375 = sum of:\n 13.699375 = max >> of:\n 13.699375 = weight(abstract:titl in 28959) [], result of:\n >> 13.699375 = score(doc=28959,freq=2.0 = termFreq=2.0\n), product of:\n >> 2.0 = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.2445496 = >> tfNorm, computed from:\n 2.0 = termFreq=2.0\n 1.2 = parameter k1\n 0.75 >> = parameter b\n 186.49593 = avgFieldLength\n 256.0 = fieldLength\n >> 3.816711E-5 = weight(title:titl in 28959) [], result of:\n 3.816711E-5 = >> score(doc=28959,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n >> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm, >> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 = >> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n", >> "31444":"\n13.699375 = sum of:\n 13.699375 = sum of:\n 13.699375 = max >> of:\n 13.699375 = weight(abstract:titl in 31373) [], result of:\n >> 13.699375 = score(doc=31373,freq=2.0 = termFreq=2.0\n), product of:\n >> 2.0 = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.2445496 = >> tfNorm, computed from:\n 2.0 = termFreq=2.0\n 1.2 = parameter k1\n 0.75 >> = parameter b\n 186.49593 = avgFieldLength\n 256.0 = fieldLength\n >> 3.816711E-5 = weight(title:titl in 31373) [], result of:\n 3.816711E-5 = >> score(doc=31373,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n >> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm, >> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 = >> parameter b\n 3.0 = avgFieldLength\n 4.0 = 
fieldLength\n", >> "30621":"\n13.096554 = sum of:\n 13.096554 = sum of:\n 13.096554 = max >> of:\n 13.096554 = weight(abstract:titl in 30550) [], result of:\n >> 13.096554 = score(doc=30550,freq=1.0 = termFreq=1.0\n), product of:\n >> 2.0 = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.189785 = >> tfNorm, computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 >> = parameter b\n 186.49593 = avgFieldLength\n 113.77778 = fieldLength\n >> 3.816711E-5 = weight(title:titl in 30550) [], result of:\n 3.816711E-5 = >> score(doc=30550,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n >> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm, >> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 = >> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n", >> "32067":"\n13.096554 = sum of:\n 13.096554 = sum of:\n 13.096554 = max >> of:\n 13.096554 = weight(abstract:titl in 31996) [], result of:\n >> 13.096554 = score(doc=31996,freq=1.0 = termFreq=1.0\n), product of:\n >> 2.0 = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.189785 = >> tfNorm, computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 >> = parameter b\n 186.49593 = avgFieldLength\n 113.77778 = fieldLength\n >> 3.816711E-5 = weight(title:titl in 31996) [], result of:\n 3.816711E-5 = >> score(doc=31996,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n >> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm, >> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 = >> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n", >> "1935":"\n11.583146 = sum of:\n 11.583146 = sum of:\n 11.583146 = max >> of:\n 11.583146 = weight(abstract:titl in 1934) [], result of:\n >> 11.583146 = score(doc=1934,freq=1.0 = termFreq=1.0\n), product of:\n 2.0 >> = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.0522962 = >> tfNorm, computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 >> = parameter b\n 186.49593 = 
avgFieldLength\n 163.84 = fieldLength\n
>> 3.816711E-5 = weight(title:titl in 1934) [], result of:\n 3.816711E-5 =
>> score(doc=1934,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n
>> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm,
>> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 =
>> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n"},
>> "QParser":"DisMaxQParser", "altquerystring":null, "boostfuncs":null,
>>
>> Kind regards,
>> Darko Todoric
>>
>> On 08/28/2017 06:35 PM, Erick Erickson wrote:
>> > What are the results of adding &debug=query to the URL? The parsed
>> > query will be especially illuminating.
>> >
>> > Best,
>> > Erick
>> >
>> > On Mon, Aug 28, 2017 at 4:37 AM, Emir Arnautovic
>> > <emir.arnauto...@sematext.com> wrote:
>> >> Hi Darko,
>> >>
>> >> The issue is the wrong expectations: title-1-end is parsed to 3 tokens
>> >> (guessing) and mm=99% of 3 tokens is 2.99 and it is rounded down to 2. Since
>> >> all your documents have 'title' and 'end' tokens, all match. If you want to
>> >> round up, you can use mm=-1% - that will result in zero (or one match if you
>> >> do not filter out original document).
>> >>
>> >> You have to play with your tokenizers and define what is similarity match
>> >> percentage (if you want to stick with mm).
>> >>
>> >> Regards,
>> >> Emir
>> >>
>> >> On 28.08.2017 09:17, Darko Todoric wrote:
>> >>> Hm... I cannot make that this DisMax work on my Solr...
>> >>>
>> >>> In solr I have document with title:
>> >>> - "title-1-end"
>> >>> - "title-2-end"
>> >>> - "title-3-end"
>> >>> - ...
>> >>> - ...
>> >>> - "title-312-end"
>> >>>
>> >>> and when I make query
>> >>> "*http://localhost:8983/solr/SciLit/select?defType=dismax&indent=on&mm=99%&q=title:"title-123123123-end"&wt=json*'
>> >>> I get all documents from solr :\
>> >>> What I doing wrong?
>> >>>
>> >>> Also, I don't know if affecting results, but on "title" field I use
>> >>> "WhitespaceTokenizerFactory".
>> >>>
>> >>> Kind regards,
>> >>> Darko
>> >>>
>> >>> On 08/25/2017 06:38 PM, Junte Zhang wrote:
>> >>>> If you already have the title of the document, then you could run that
>> >>>> title as a new query against the whole index and exclude the source document
>> >>>> from the results as a filter.
>> >>>>
>> >>>> You could use the DisMax query parser:
>> >>>> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
>> >>>>
>> >>>> And then set the minimum match ratio of the OR clauses to 90%.
>> >>>>
>> >>>> /JZ
>> >>>>
>> >>>> -----Original Message-----
>> >>>> From: Darko Todoric [mailto:todo...@mdpi.com]
>> >>>> Sent: Friday, August 25, 2017 5:49 PM
>> >>>> To: solr-user@lucene.apache.org
>> >>>> Subject: Search by similarity?
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> I have 90.000.000 documents in Solr and I need to compare "title" of this
>> >>>> document and get all documents with more than 80% similarity. PHP have
>> >>>> "similar_text" but it's not so smart inserting 90m documents in the array...
>> >>>> Can I do some query in Solr which will give me the more the 80%
>> >>>> similarity?
>> >>>>
>> >>>> Kind regards,
>> >>>> Darko Todoric
>> >>>>
>> >>>> --
>> >>>> Darko Todoric
>> >>>> Web Engineer, MDPI DOO
>> >>>> Veljka Dugosevica 54, 11060 Belgrade, Serbia
>> >>>> +381 65 43 90 620
>> >>>> www.mdpi.com
>> >>>>
>> >>>> Disclaimer: The information and files contained in this message are
>> >>>> confidential and intended solely for the use of the individual or entity to
>> >>>> whom they are addressed.
>> >>>> If you have received this message in error, please notify me and delete
>> >>>> this message from your system.
>> >>>> You may not copy this message in its entirety or in part, or disclose its
>> >>>> contents to anyone.