Re: Solr search engine configuration

Erick Erickson Sun, 11 Mar 2018 11:32:59 -0700

bq: I tried the query with and without the &defType=edismax parameter but I'm
getting the EXACT same results. Does that mean some configuration error?

Well, not an error at all, this line:
 <str name="QParser">ExtendedDismaxQParser</str>

Means you're using edismax. If that happens both with and without
&defType, then your request handler in solrconfig.xml has edismax
defined as a default. Look for an entry like:

<requestHandler name="/select" class="solr.SearchHandler">
   <lst name="defaults">
         <str name="defType">edismax</str>
         <!-- ...any other defaults... -->
   </lst>
</requestHandler>

So any search you send to Solr like
http://blah blah/solr/collection/select?

will use edismax if no defType overrides it on the URL.

-------
Let's talk about what "exact match" means ;)

Exact match "dieren zaak". Does "Exact match" here mean it would or
would not be an exact match on "dieren zaak somethingelse"?

If you do NOT consider the above "exact match", the usual trick is to
use a copyField directive to a field that uses KeywordTokenizerFactory
(probably) followed by LowerCaseFilterFactory etc.
KeywordTokenizerFactory takes the entire input field as a _single_
token, then you can transform it various ways, things like folding
accents, lowercasing and the like if desired.
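
As a rough sketch of that copyField approach in schema.xml (the names
"text_exact", "title_exact" and the source field "title" are made up
for illustration, substitute whatever your schema actually uses):

```xml
<!-- Whole-field "exact match" type: the entire input becomes one token -->
<fieldType name="text_exact" class="solr.TextField">
  <analyzer>
    <!-- KeywordTokenizerFactory emits the whole field value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Optional normalization so "Dieren Zaak" still matches "dieren zaak" -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical destination field, searched but not returned -->
<field name="title_exact" type="text_exact" indexed="true" stored="false"/>

<!-- "title" is an assumed source field name -->
<copyField source="title" dest="title_exact"/>
```

You'd then give title_exact a high boost in qf. Depending on your Solr
version and the sow setting, an unquoted multi-word query may be split
on whitespace before it reaches this field, so you may need to send it
as a quoted phrase.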

If you DO consider the above "exact match", take a look at the pf, pf2
and pf3 parameters in edismax. They're all about forming phrases,
bigrams and trigrams respectively for this form of "exact match".
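
Note that these are word-level phrases and shingles, not the character
grams that NGramFilterFactory produces. A minimal sketch of what the
handler defaults might look like, assuming a hypothetical non-grammed
field called "title_text" (the boosts are placeholders, only the 1000
comes from your requirements list):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Fields searched term by term; "title_text" is an assumed name -->
    <str name="qf">title_text</str>
    <!-- Boost docs where the whole query appears as one phrase -->
    <str name="pf">title_text^1000</str>
    <!-- Boost docs where adjacent word pairs from the query appear as phrases -->
    <str name="pf2">title_text^600</str>
    <!-- Same idea for adjacent word triples -->
    <str name="pf3">title_text^600</str>
  </lst>
</requestHandler>
```

With something like this, q=dieren zaak still matches documents that
contain either word, but documents containing the phrase "dieren zaak"
get the extra pf boost on top.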

Exact match "dierenzaak". This one is tricky. There's little out of
the box (OOB) that understands that "dieren zaak" is equivalent to
"dierenzaak". I know that in German there's prior art on
"decompounding" filters; I don't know about Dutch. Further, given my
total lack of understanding of the rules of either language, I don't
know whether such filters do "compounding" too, i.e. understanding
that "dieren zaak" is equivalent to "dierenzaak". Can't help much
there.
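
For what it's worth, one thing to experiment with is Solr's
DictionaryCompoundWordTokenFilterFactory, which isn't tied to a
particular language: it splits tokens using a word list you supply. A
minimal sketch, assuming a hypothetical Dutch word list
compound-words-nl.txt that contains at least "dieren" and "zaak" (the
type name and size settings are illustrative and untested for Dutch):

```xml
<fieldType name="text_nl_decompound" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Keeps the original token and adds any dictionary sub-words found in it.
         compound-words-nl.txt is an assumed file you would have to provide. -->
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="compound-words-nl.txt"
            minWordSize="5"
            minSubwordSize="3"
            maxSubwordSize="15"
            onlyLongestMatch="true"/>
  </analyzer>
</fieldType>
```

With a chain like this, "dierenzaak" should be indexed as the original
token plus the sub-words found in the dictionary, which is what would
let a search for "dieren zaak" find it. Check the result on the
analysis page before trusting it.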

For a start I'd get rid of the gramming until I'd explored other
alternatives. Gramming is generally a good thing for pre-and-post
wildcards, i.e. matching *some*. Since you're concerned with
relevance, I suspect that gramming will make your task harder.

And if you haven't discovered the admin UI/analysis page, I recommend
you spend some time with it (hint: un-check the "verbose" checkbox).
As you play with various combinations of tokenizers and filters it'll
give you a much better understanding of what the effects of various
combinations are.

If only human language followed strict rules ;)

Professor: "In English, two negatives are allowed and mean a positive,
but two positives don't mean a negative."
Bored voice from the back: "Yeah, right".

Erick

On Sun, Mar 11, 2018 at 5:19 AM, PeterKerk <petervdk...@hotmail.com> wrote:
> Thanks! That provides me with some more insight, I altered the search query
> to "dieren zaak" to see how queries consisting of more than 1 word are
> handled.
> I see that words are tokenized into groups of 3, I think because of my
> NGramFilterFactory with minGramSize of 3.
>
> <lst name="debug">
>         <str name="rawquerystring">
>         (title_search_global:(dieren zaak) OR
>         description_search_global:(dieren zaak))
>         </str>
>         <str name="querystring">
>         (title_search_global:(dieren zaak) OR
>         description_search_global:(dieren zaak))
>         </str>
>         <str name="parsedquery">
>         (+(((title_search_global:die title_search_global:ier
>         title_search_global:ere title_search_global:ren
>         title_search_global:dier title_search_global:iere
>         title_search_global:eren title_search_global:diere
>         title_search_global:ieren title_search_global:dieren)
>         (title_search_global:zaa title_search_global:aak
>         title_search_global:zaak))
>         (((description_search_global:dier
>         description_search_global:diere
>         description_search_global:dieren)/no_coord)
>         description_search_global:zaak)))/no_coord
>         </str>
>         <str name="parsedquery_toString">
>         +(((title_search_global:die title_search_global:ier
>         title_search_global:ere title_search_global:ren
>         title_search_global:dier title_search_global:iere
>         title_search_global:eren title_search_global:diere
>         title_search_global:ieren title_search_global:dieren)
>         (title_search_global:zaa title_search_global:aak
>         title_search_global:zaak))
>         ((description_search_global:dier
>         description_search_global:diere
>         description_search_global:dieren)
>         description_search_global:zaak))
>         </str>
>         <str name="QParser">ExtendedDismaxQParser</str>
>         <null name="altquerystring"/>
>         <null name="boost_queries"/>
>         <arr name="parsed_boost_queries"/>
>         <null name="boostfuncs"/>
>         <arr name="filter_queries">
>                 <str>(lang:"nl" OR lang:"all")</str>
>         </arr>
>         <arr name="parsed_filter_queries">
>                 <str>lang:nl lang:all</str>
>         </arr>
> </lst>
>
>
> I tried the query with and without the &defType=edismax parameter but I'm
> getting the EXACT same results. Does that mean some configuration error?
>
> I'm not sure how to progress from here. Can you see if your presumption that
> I'm mixing two different parsers is correct? My schema.xml is here:
> http://www.telefonievergelijken.nl/schema.xml
>
>
> Related: do you know of the existence of any sample schema.xml config that
> would be usable for a search engine? Seems like something so obvious to
> float around out there. I feel that would go a long way.
>
>
>
> Not sure if it matters but my requirements are:
>
> Exact match "dieren zaak" boost result with 1000
> Exact match "dierenzaak" boost result with 900
> Exact match "dieren" or "zaak" boost result with 600
>
> Partial match "huisdierenzaak" or "huisdieren zaak" boost result with 500
> Stem match "dier" boost result with 100
> Stem partial match "huisdier" boost result with 70
> Other partial matches "die" boost result with 10
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
