I reviewed the dismax docs, and dismax doesn't support the fieldname:term
portion of the Lucene syntax.
To restrict a search to a single field and still use mm, you can either:
A) use edismax exactly as you're currently trying to use dismax
B) use dismax with the following changes:
* remove the title: portion of the query and just pass
q="title-123123123-end"
* set qf=title
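A minimal Python sketch of the two options, building the request URLs with the standard library's `urllib.parse.urlencode`. The host and the `SciLit` core name are taken from the URL quoted further down in this thread; the function names are hypothetical. The `q` value keeps the surrounding double quotes, as in the suggestion above. A side benefit of letting `urlencode` build the query string is that the `%` in `mm=99%` is percent-encoded as `%25`, which a hand-typed URL does not do.

```python
from urllib.parse import urlencode

# Assumption: host and core name copied from the URL used later in this thread.
SOLR_SELECT = "http://localhost:8983/solr/SciLit/select"


def edismax_url(title_value: str, mm: str) -> str:
    """Option A: keep the field-qualified query, switch defType to edismax."""
    params = {
        "defType": "edismax",
        "q": f'title:"{title_value}"',  # edismax understands fieldname:term
        "mm": mm,
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


def dismax_url(title_value: str, mm: str) -> str:
    """Option B: stay on dismax, drop the "title:" prefix and use qf instead."""
    params = {
        "defType": "dismax",
        "q": f'"{title_value}"',  # no fieldname: prefix, dismax would treat it as a term
        "qf": "title",            # restrict matching to the title field
        "mm": mm,
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(edismax_url("title-123123123-end", "99%"))
    print(dismax_url("title-123123123-end", "99%"))
```

Either URL can be pasted into a browser or curl; option B only searches the fields listed in `qf`, so adding more fields there widens the search again.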

On Tue, Aug 29, 2017 at 10:25 AM Josh Lincoln <josh.linc...@gmail.com>
wrote:

> Darko,
> Can you use edismax instead?
>
> When using dismax, Solr parses the title field name as if it were a query
> term, i.e. the query seems to be interpreted as
> title "title-123123123-end"
> (note the missing colon), which results in querying all your qf fields
> for both "title" and "title-123123123-end".
> I haven't used dismax in a very long time, so I don't know if this is
> intentional, but it's not what I expected.
>
> I'm able to reproduce the issue in 6.4.2 using the default techproducts
> example. Notice that in the output below, the parsedquery expands to both
> text:title and text:name (df=text):
> http://localhost:8983/solr/techproducts/select?indent=on&q=title:"name"&wt=json&debug=true&defType=dismax
> rawquerystring: "title:"name"",
> querystring: "title:"name"",
> parsedquery: "(+(DisjunctionMaxQuery(((text:title)^1.0))
> DisjunctionMaxQuery(((text:name)^1.0))) ())/no_coord",
> parsedquery_toString: "+(((text:title)^1.0) ((text:name)^1.0)) ()"
>
> But it's not an issue if you use edismax:
> http://localhost:8983/solr/techproducts/select?indent=on&q=title:"name"&wt=json&debug=true&defType=edismax
> rawquerystring: "title:"name"",
> querystring: "title:"name"",
> parsedquery: "(+title:name)/no_coord",
> parsedquery_toString: "+title:name",
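To check quickly whether a request was really restricted to one field, one can pull the field names out of the `parsedquery` string. The sketch below is a rough heuristic (a regex over Lucene's string form, with a hypothetical helper name), not a real query parser; values that themselves contain a colon, such as the `doi:title:` clause in the output Darko posted, will produce spurious matches.

```python
import re

# Heuristic: in Lucene's string form a field-qualified clause looks like
# "field:value", with the field name starting with a letter or underscore.
FIELD_RE = re.compile(r"([A-Za-z_][\w.]*):")


def fields_in_parsed_query(parsed_query: str) -> set:
    """Return the set of field names mentioned in a parsedquery string."""
    return set(FIELD_RE.findall(parsed_query))


if __name__ == "__main__":
    dismax = ("(+(DisjunctionMaxQuery(((text:title)^1.0)) "
              "DisjunctionMaxQuery(((text:name)^1.0))) ())/no_coord")
    edismax = "(+title:name)/no_coord"
    print(fields_in_parsed_query(dismax))   # {'text'}
    print(fields_in_parsed_query(edismax))  # {'title'}
```

With the two techproducts outputs above, the dismax query only ever touches `text` (the df field), while the edismax query is confined to `title`.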
>
>
>
> On Tue, Aug 29, 2017 at 8:44 AM Darko Todoric <todo...@mdpi.com> wrote:
>
>> Hi Erick,
>>
>> "debug":{ "rawquerystring":"title:\"title-123123123-end\"",
>> "querystring":"title:\"title-123123123-end\"",
>> "parsedquery":"(+(DisjunctionMaxQuery(((author_full:title)^7.0 |
>> (abstract:titl)^2.0 | (title:titl)^3.0 | (keywords:titl)^5.0 |
>> (authors:title)^4.0 | (doi:title:)^1.0))
>> DisjunctionMaxQuery(((author_full:\"title 123123123 end\"~1)^7.0 |
>> (abstract:\"titl 123123123 end\"~1)^2.0 | (title:\"titl 123123123
>> end\"~1)^3.0 | (keywords:\"titl 123123123 end\"~1)^5.0 |
>> (authors:\"title 123123123 end\"~1)^4.0 |
>> (doi:title-123123123-end)^1.0)))~1 ())/no_coord",
>> "parsedquery_toString":"+((((author_full:title)^7.0 |
>> (abstract:titl)^2.0 | (title:titl)^3.0 | (keywords:titl)^5.0 |
>> (authors:title)^4.0 | (doi:title:)^1.0) ((author_full:\"title 123123123
>> end\"~1)^7.0 | (abstract:\"titl 123123123 end\"~1)^2.0 | (title:\"titl
>> 123123123 end\"~1)^3.0 | (keywords:\"titl 123123123 end\"~1)^5.0 |
>> (authors:\"title 123123123 end\"~1)^4.0 |
>> (doi:title-123123123-end)^1.0))~1) ()", "explain":{ "23251":"\n16.848969
>> = sum of:\n 16.848969 = sum of:\n 16.848969 = max of:\n 16.848969 =
>> weight(abstract:titl in 23194) [], result of:\n 16.848969 =
>> score(doc=23194,freq=1.0 = termFreq=1.0\n), product of:\n 2.0 = boost\n
>> 5.503748 = idf(docFreq=74, docCount=18297)\n 1.5306814 = tfNorm,
>> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 =
>> parameter b\n 186.49593 = avgFieldLength\n 28.444445 = fieldLength\n
>> 3.816711E-5 = weight(title:titl in 23194) [], result of:\n 3.816711E-5 =
>> score(doc=23194,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n
>> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm,
>> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 =
>> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n",
>> "20495":"\n16.169483 = sum of:\n 16.169483 = sum of:\n 16.169483 = max
>> of:\n 16.169483 = weight(abstract:titl in 20489) [], result of:\n
>> 16.169483 = score(doc=20489,freq=1.0 = termFreq=1.0\n), product of:\n
>> 2.0 = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.468952 =
>> tfNorm, computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75
>> = parameter b\n 186.49593 = avgFieldLength\n 40.96 = fieldLength\n
>> 3.816711E-5 = weight(title:titl in 20489) [], result of:\n 3.816711E-5 =
>> score(doc=20489,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n
>> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm,
>> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 =
>> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n",
>> "28227":"\n15.670726 = sum of:\n 15.670726 = sum of:\n 15.670726 = max
>> of:\n 15.670726 = weight(abstract:titl in 28156) [], result of:\n
>> 15.670726 = score(doc=28156,freq=2.0 = termFreq=2.0\n), product of:\n
>> 2.0 = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.4236413 =
>> tfNorm, computed from:\n 2.0 = termFreq=2.0\n 1.2 = parameter k1\n 0.75
>> = parameter b\n 186.49593 = avgFieldLength\n 163.84 = fieldLength\n
>> 3.816711E-5 = weight(title:titl in 28156) [], result of:\n 3.816711E-5 =
>> score(doc=28156,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n
>> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm,
>> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 =
>> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n",
>> "20375":"\n15.052014 = sum of:\n 15.052014 = sum of:\n 15.052014 = max
>> of:\n 15.052014 = weight(abstract:titl in 20369) [], result of:\n
>> 15.052014 = score(doc=20369,freq=1.0 = termFreq=1.0\n), product of:\n
>> 2.0 = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.3674331 =
>> tfNorm, computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75
>> = parameter b\n 186.49593 = avgFieldLength\n 64.0 = fieldLength\n
>> 3.816711E-5 = weight(title:titl in 20369) [], result of:\n 3.816711E-5 =
>> score(doc=20369,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n
>> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm,
>> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 =
>> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n",
>> "20381":"\n15.052014 = sum of:\n 15.052014 = sum of:\n 15.052014 = max
>> of:\n 15.052014 = weight(abstract:titl in 20375) [], result of:\n
>> 15.052014 = score(doc=20375,freq=1.0 = termFreq=1.0\n), product of:\n
>> 2.0 = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.3674331 =
>> tfNorm, computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75
>> = parameter b\n 186.49593 = avgFieldLength\n 64.0 = fieldLength\n
>> 3.816711E-5 = weight(title:titl in 20375) [], result of:\n 3.816711E-5 =
>> score(doc=20375,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n
>> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm,
>> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 =
>> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n",
>> "29030":"\n13.699375 = sum of:\n 13.699375 = sum of:\n 13.699375 = max
>> of:\n 13.699375 = weight(abstract:titl in 28959) [], result of:\n
>> 13.699375 = score(doc=28959,freq=2.0 = termFreq=2.0\n), product of:\n
>> 2.0 = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.2445496 =
>> tfNorm, computed from:\n 2.0 = termFreq=2.0\n 1.2 = parameter k1\n 0.75
>> = parameter b\n 186.49593 = avgFieldLength\n 256.0 = fieldLength\n
>> 3.816711E-5 = weight(title:titl in 28959) [], result of:\n 3.816711E-5 =
>> score(doc=28959,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n
>> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm,
>> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 =
>> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n",
>> "31444":"\n13.699375 = sum of:\n 13.699375 = sum of:\n 13.699375 = max
>> of:\n 13.699375 = weight(abstract:titl in 31373) [], result of:\n
>> 13.699375 = score(doc=31373,freq=2.0 = termFreq=2.0\n), product of:\n
>> 2.0 = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.2445496 =
>> tfNorm, computed from:\n 2.0 = termFreq=2.0\n 1.2 = parameter k1\n 0.75
>> = parameter b\n 186.49593 = avgFieldLength\n 256.0 = fieldLength\n
>> 3.816711E-5 = weight(title:titl in 31373) [], result of:\n 3.816711E-5 =
>> score(doc=31373,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n
>> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm,
>> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 =
>> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n",
>> "30621":"\n13.096554 = sum of:\n 13.096554 = sum of:\n 13.096554 = max
>> of:\n 13.096554 = weight(abstract:titl in 30550) [], result of:\n
>> 13.096554 = score(doc=30550,freq=1.0 = termFreq=1.0\n), product of:\n
>> 2.0 = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.189785 =
>> tfNorm, computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75
>> = parameter b\n 186.49593 = avgFieldLength\n 113.77778 = fieldLength\n
>> 3.816711E-5 = weight(title:titl in 30550) [], result of:\n 3.816711E-5 =
>> score(doc=30550,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n
>> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm,
>> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 =
>> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n",
>> "32067":"\n13.096554 = sum of:\n 13.096554 = sum of:\n 13.096554 = max
>> of:\n 13.096554 = weight(abstract:titl in 31996) [], result of:\n
>> 13.096554 = score(doc=31996,freq=1.0 = termFreq=1.0\n), product of:\n
>> 2.0 = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.189785 =
>> tfNorm, computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75
>> = parameter b\n 186.49593 = avgFieldLength\n 113.77778 = fieldLength\n
>> 3.816711E-5 = weight(title:titl in 31996) [], result of:\n 3.816711E-5 =
>> score(doc=31996,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n
>> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm,
>> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 =
>> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n",
>> "1935":"\n11.583146 = sum of:\n 11.583146 = sum of:\n 11.583146 = max
>> of:\n 11.583146 = weight(abstract:titl in 1934) [], result of:\n
>> 11.583146 = score(doc=1934,freq=1.0 = termFreq=1.0\n), product of:\n 2.0
>> = boost\n 5.503748 = idf(docFreq=74, docCount=18297)\n 1.0522962 =
>> tfNorm, computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75
>> = parameter b\n 186.49593 = avgFieldLength\n 163.84 = fieldLength\n
>> 3.816711E-5 = weight(title:titl in 1934) [], result of:\n 3.816711E-5 =
>> score(doc=1934,freq=1.0 = termFreq=1.0\n), product of:\n 3.0 = boost\n
>> 1.4457239E-5 = idf(docFreq=34584, docCount=34584)\n 0.88 = tfNorm,
>> computed from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 =
>> parameter b\n 3.0 = avgFieldLength\n 4.0 = fieldLength\n"},
>> "QParser":"DisMaxQParser", "altquerystring":null, "boostfuncs":null,
>>
>> Kind regards,
>> Darko Todoric
>>
>> On 08/28/2017 06:35 PM, Erick Erickson wrote:
>> > What are the results of adding &debug=query to the URL? The parsed
>> > query will be especially illuminating.
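Erick's suggestion can be scripted. A sketch assuming a reachable Solr and the JSON response writer; `debug_query_url` and `fetch_parsed_query` are hypothetical helper names, and reading the `parsedquery` entry of the `debug` section follows the debug output shown in this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def debug_query_url(select_url: str, params: dict) -> str:
    """Add debug=query and wt=json to an existing set of request parameters."""
    merged = dict(params, debug="query", wt="json")
    return select_url + "?" + urlencode(merged)


def fetch_parsed_query(select_url: str, params: dict) -> str:
    """Run the request against a live Solr and return the parsedquery string."""
    with urlopen(debug_query_url(select_url, params)) as resp:
        body = json.load(resp)
    return body["debug"]["parsedquery"]


if __name__ == "__main__":
    # Requires a running Solr with the techproducts example loaded.
    print(fetch_parsed_query(
        "http://localhost:8983/solr/techproducts/select",
        {"defType": "dismax", "q": 'title:"name"'},
    ))
```

`debug=query` limits the debug section to the query-parsing details, which keeps the response much smaller than `debug=true` with its per-document explain output.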
>> >
>> > Best,
>> > Erick
>> >
>> > On Mon, Aug 28, 2017 at 4:37 AM, Emir Arnautovic
>> > <emir.arnauto...@sematext.com> wrote:
>> >> Hi Darko,
>> >>
>> >> The issue is the wrong expectation: title-1-end is parsed into 3 tokens
>> >> (I'm guessing), and mm=99% of 3 tokens is 2.97, which is rounded down
>> >> to 2. Since all your documents have the 'title' and 'end' tokens, they
>> >> all match. If you want to round up, you can use mm=-1%, which will
>> >> result in zero matches (or one, if you do not filter out the original
>> >> document).
>> >>
>> >> You have to play with your tokenizers and define what the similarity
>> >> match percentage should be (if you want to stick with mm).
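The rounding Emir describes can be reproduced with a small function. This is a sketch of my understanding of Solr's documented mm rules for a single integer or percentage (positive percentages are rounded down; a negative value says how many clauses may be missing), with the result clamped to the number of optional clauses; conditional specs such as `3<90%` are not handled. It also fits the `~1` in Darko's parsed query: dismax produced two top-level clauses (the stray `title` term and the phrase), and 99% of 2 rounds down to 1, so a match on `title` alone is enough.

```python
def min_should_match(optional_clauses: int, spec: str) -> int:
    """Resolve a single mm value ("2", "-1", "99%", "-1%") to a clause count.

    Conditional specs such as "3<90%" are not supported and raise ValueError.
    """
    spec = spec.strip()
    if spec.endswith("%"):
        percent = int(spec[:-1])
        # Truncate toward zero, i.e. round the magnitude down.
        calc = abs(optional_clauses * percent) // 100
        # Negative percent: that share of the clauses may be missing.
        result = optional_clauses - calc if percent < 0 else calc
    else:
        value = int(spec)
        # Negative integer: that many clauses may be missing.
        result = optional_clauses + value if value < 0 else value
    # Clamp to the range [0, optional_clauses].
    return max(0, min(result, optional_clauses))


if __name__ == "__main__":
    for n, spec in [(3, "99%"), (2, "99%"), (3, "-1%")]:
        print(n, spec, min_should_match(n, spec))
```

For 3 clauses, `99%` gives 2 and `-1%` gives 3; for 2 clauses, `99%` gives 1.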
>> >>
>> >> Regards,
>> >> Emir
>> >>
>> >>
>> >>
>> >> On 28.08.2017 09:17, Darko Todoric wrote:
>> >>> Hm... I can't get DisMax to work on my Solr...
>> >>>
>> >>> In Solr I have documents with titles:
>> >>>   - "title-1-end"
>> >>>   - "title-2-end"
>> >>>   - "title-3-end"
>> >>>   - ...
>> >>>   - ...
>> >>>   - "title-312-end"
>> >>>
>> >>> and when I run the query
>> >>> http://localhost:8983/solr/SciLit/select?defType=dismax&indent=on&mm=99%&q=title:"title-123123123-end"&wt=json
>> >>> I get all documents from Solr :\
>> >>> What am I doing wrong?
>> >>>
>> >>> Also, I don't know if it affects the results, but on the "title" field I
>> >>> use "WhitespaceTokenizerFactory".
>> >>>
>> >>> Kind regards,
>> >>> Darko
>> >>>
>> >>>
>> >>> On 08/25/2017 06:38 PM, Junte Zhang wrote:
>> >>>> If you already have the title of the document, then you could run that
>> >>>> title as a new query against the whole index and exclude the source
>> >>>> document from the results as a filter.
>> >>>>
>> >>>> You could use the DisMax query parser:
>> >>>>
>> >>>> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
>> >>>>
>> >>>> And then set the minimum match ratio of the OR clauses to 90%.
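Junte's approach, sketched as a set of request parameters. Assumptions: the unique key field is called `id` (hypothetical; use your schema's uniqueKey), and double quotes are simply stripped from the title so they cannot unbalance the dismax query. Titles containing other dismax operators, such as a leading `+` or `-` on a word, may need further cleaning.

```python
from urllib.parse import urlencode


def similar_titles_params(source_id: str, source_title: str, mm: str = "90%") -> dict:
    """Request parameters for 'documents whose title resembles this one'."""
    return {
        "defType": "dismax",
        # Reuse the source title as the query text; stray quotes are removed.
        "q": source_title.replace('"', " "),
        "qf": "title",               # match against the title field only
        "mm": mm,                    # share of title terms that must match
        "fq": f'-id:"{source_id}"',  # exclude the source document (field name assumed)
        "wt": "json",
    }


if __name__ == "__main__":
    params = similar_titles_params("23251", "title-1-end")
    print("http://localhost:8983/solr/SciLit/select?" + urlencode(params))
```

Putting the exclusion in `fq` rather than `q` keeps it out of scoring and lets Solr cache the filter independently of the title text.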
>> >>>>
>> >>>> /JZ
>> >>>>
>> >>>> -----Original Message-----
>> >>>> From: Darko Todoric [mailto:todo...@mdpi.com]
>> >>>> Sent: Friday, August 25, 2017 5:49 PM
>> >>>> To: solr-user@lucene.apache.org
>> >>>> Subject: Search by similarity?
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>>
>> >>>> I have 90 million documents in Solr, and I need to compare the "title"
>> >>>> of a given document against the others and get every document with
>> >>>> more than 80% similarity. PHP has "similar_text", but loading 90M
>> >>>> documents into an array isn't very smart... Is there a query I can run
>> >>>> in Solr that will give me the documents with more than 80% similarity?
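Once Solr has narrowed the index down to a short candidate list (for example with an mm-based dismax query), an exact character-level similarity can be computed on just those candidates instead of all 90M documents. This sketch swaps in Python's `difflib.SequenceMatcher.ratio()`, which is not guaranteed to give the same percentages as PHP's `similar_text`; the 0.8 threshold mirrors the 80% in the question, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] from difflib's matching-blocks ratio (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def filter_similar(source_title, candidates, threshold=0.8):
    """Keep candidates whose similarity exceeds the threshold, best first."""
    scored = [(c, title_similarity(source_title, c)) for c in candidates]
    kept = [pair for pair in scored if pair[1] > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(filter_similar("title-1-end", ["title-2-end", "xyz"]))
```

Keeping the Solr query loose (a lower mm) and the re-ranking strict trades a few more candidates per request for a similarity measure that is independent of tokenization.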
>> >>>>
>> >>>>
>> >>>> Kind regards,
>> >>>> Darko Todoric
>> >>>>
>> >>>> --
>> >>>> Darko Todoric
>> >>>> Web Engineer, MDPI DOO
>> >>>> Veljka Dugosevica 54, 11060 Belgrade, Serbia
>> >>>> +381 65 43 90 620
>> >>>> www.mdpi.com
>> >>>>
>> >>>> Disclaimer: The information and files contained in this message are
>> >>>> confidential and intended solely for the use of the individual or
>> >>>> entity to whom they are addressed.
>> >>>> If you have received this message in error, please notify me and
>> >>>> delete this message from your system.
>> >>>> You may not copy this message in its entirety or in part, or
>> >>>> disclose its contents to anyone.
>> >>>>
>> >> --
>> >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> >> Solr & Elasticsearch Support * http://sematext.com/
>> >>
>>
