Re: edge ngram/find as you type sorting

Erick Erickson Wed, 25 Mar 2020 07:40:02 -0700

Why do you want to deal with score at all? An explicit sort
overrides score-based ordering. Well, unless you 
specify score as a secondary sort. But since you’re
sorting by length anyway, trying to score
based on proximity to the end does nothing.


The weirdness you’re going to get here, though, is
that the order of the results will not be alphabetical.
Say you have two docs, one with abcd and one with 
abce. Now say you search on abc. Whether abcd or 
abce comes first is indeterminate.
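
To make the tie concrete, here is a minimal Python sketch, not Solr code. The
helper names (edge_ngrams, search_sorted_by_length) are made up for
illustration, and the stable in-memory sort only stands in for whatever
internal order the index happens to return equal-length matches in.

```python
def edge_ngrams(value, min_gram=1, max_gram=25):
    """Lowercase the whole value (keyword tokenizer + lowercase filter),
    then return its leading prefixes, as an edge n-gram filter would."""
    token = value.lower()
    upper = min(len(token), max_gram)
    return {token[:n] for n in range(min_gram, upper + 1)}


def search_sorted_by_length(docs, query):
    """Return docs whose edge n-grams contain the lowercased query,
    shortest first.

    Python's sort is stable, so equal-length matches keep their input
    order. That is an assumption of this sketch, standing in for the
    index order that decides ties when no secondary sort is given.
    """
    q = query.lower()
    hits = [d for d in docs if q in edge_ngrams(d)]
    return sorted(hits, key=len)


# Same documents, different "index order": the tie flips.
print(search_sorted_by_length(["abce", "abcd", "abcdef"], "abc"))
print(search_sorted_by_length(["abcd", "abce", "abcdef"], "abc"))
```

The two calls return the same set of matches, but abcd and abce swap places
because nothing in the sort key distinguishes them.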

If you simply stored the keyword-lowercased value
in a copyfield and sorted on _that_, you wouldn’t have
this problem. But if you’re really worried about space,
that might not be an option.
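
A rough Python sketch of the difference between the two orderings, again with
hypothetical helper names. Sorting on the lowercased full value alone is
alphabetical, not shortest-first; using it as a tie-breaker after length keeps
the shortest-first behavior and makes ties deterministic. In Solr terms I would
expect this to correspond to something like sort=qt_len asc, slug asc with the
fields from your schema, but treat that as an assumption to verify rather than
a tested recipe.

```python
def sort_alphabetical(values):
    """Order by the lowercased full value only (the keyword + lowercase field)."""
    return sorted(values, key=str.lower)


def sort_length_then_alpha(values):
    """Order by length first, then by lowercased value to break ties."""
    return sorted(values, key=lambda s: (len(s), s.lower()))


matches = ["abce", "ABCD", "abcdefg", "abcz", "abc"]

# Alphabetical: deterministic, but "abcdefg" lands before the shorter "abce".
print(sort_alphabetical(matches))

# Length, then alphabetical: shortest first, ties broken deterministically.
print(sort_length_then_alpha(matches))
```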

Best,
Erick

> On Mar 25, 2020, at 9:49 AM, matthew sporleder <msporle...@gmail.com> wrote:
> 
> Where I landed:
> 
>  <fieldType name="string_ci" class="solr.TextField"
> sortMissingLast="true" omitNorms="false">
>     <analyzer>
>          <tokenizer class="solr.KeywordTokenizerFactory"/>
>          <filter class="solr.LowerCaseFilterFactory" />
>     </analyzer>
>  </fieldType>
> 
> <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
> <analyzer type="index">
>   <tokenizer class="solr.KeywordTokenizerFactory"/>
>   <filter class="solr.LowerCaseFilterFactory" />
>   <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> maxGramSize="25" />
> </analyzer>
> <analyzer type="query">
>   <tokenizer class="solr.KeywordTokenizerFactory"/>
>   <filter class="solr.LowerCaseFilterFactory"/>
> </analyzer>
> </fieldType>
> 
> 
>  <field name="slug" type="string_ci" indexed="true" stored="true"
> multiValued="false" />
>  <field name="fayt" type="edgytext" indexed="true" stored="false"
> omitNorms="false" omitTermFreqAndPositions="false" multiValued="true"
> />
>  <field name="qt_len" type="int" indexed="true" stored="true"
> multiValued="false" />
> 
> ---
> 
> I can then do a search for
> 
> q=fayt:my_article_slu&sort=qt_len asc
> 
> to get the shortest/most exact find-as-you-type match.  I couldn't get
> around all results having the same score (can I boost proximity to the
> end of a string?) in the edge ngram search but I am hoping this is the
> fastest way to do this type of search since I can avoid wildcards
> "my_article_slu*" and stuff.
> 
> More suggestions welcome and thanks for the help.  I will re-index
> with omitNorms=true again to see if I can save a little space.
> 
> 
> 
> 
> 
> On Tue, Mar 24, 2020 at 11:39 AM matthew sporleder <msporle...@gmail.com> 
> wrote:
>> 
>> Okay I appreciate you responding.
>> 
>> Switching "slug" from "string_ci" to class="solr.StrField" accomplished
>> about the same results, which makes sense to me now :)
>> 
>> The previous definition of string_ci was:
>>  <fieldType name="string_ci" class="solr.TextField"
>> sortMissingLast="true" omitNorms="true">
>>     <analyzer>
>>          <tokenizer class="solr.KeywordTokenizerFactory"/>
>>          <filter class="solr.LowerCaseFilterFactory" />
>>     </analyzer>
>>  </fieldType>
>> 
>> So lowercase + KeywordTokenizerFactory;
>> 
>> I am trying again with omitNorms=false  to see if I can get the more
>> "exact" matches to score better this time around.
>> 
>> 
>> On Tue, Mar 24, 2020 at 9:54 AM Erick Erickson <erickerick...@gmail.com> 
>> wrote:
>>> 
>>> Won’t work. String types are totally unanalyzed. Your string_ci fieldType 
>>> is what I was looking for.
>>> 
>>> No, you shouldn’t kill the lowercasefilter unless you want all of your 
>>> searches to be case-sensitive.
>>> 
>>> So you should try:
>>> 
>>> q=edgy_text:whatever&sort=string_ci asc
>>> 
>>> Please use the admin>>pick_core>>analysis page when thinking about changing 
>>> your schema, it’ll answer a _lot_ of these questions immediately.
>>> 
>>> Best,
>>> Erick
>>> 
>>>> On Mar 24, 2020, at 8:37 AM, matthew sporleder <msporle...@gmail.com> 
>>>> wrote:
>>>> 
>>>> Oh maybe a schema bug!
>>>> 
>>>> my string_ci:
>>>> <fieldType name="string_ci" class="solr.TextField"
>>>> sortMissingLast="true" omitNorms="true">
>>>>    <analyzer>
>>>>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>         <filter class="solr.LowerCaseFilterFactory" />
>>>>    </analyzer>
>>>> </fieldType>
>>>> 
>>>> going to try this instead:
>>>> <fieldType name="string_lctoken" class="solr.StrField"
>>>> sortMissingLast="true" omitNorms="true">
>>>>    <analyzer>
>>>>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>         <filter class="solr.LowerCaseFilterFactory" />
>>>>    </analyzer>
>>>> </fieldType>
>>>> 
>>>> Then I can probably kill the lowercasefilter on edgytext:
>>>> 
>>>> 
>>>> 
>>>> On Tue, Mar 24, 2020 at 7:44 AM Erick Erickson <erickerick...@gmail.com> 
>>>> wrote:
>>>>> 
>>>>> Sort by the full field. You’ll need to copy to a field with 
>>>>> keywordTokenizer and lowercaseFilter (string_ci? assuming it’s not really 
>>>>> a “string” type).
>>>>> 
>>>>> Best,
>>>>> Erick
>>>>> 
>>>>>> On Mar 24, 2020, at 7:10 AM, matthew sporleder <msporle...@gmail.com> 
>>>>>> wrote:
>>>>>> 
>>>>>> I have added an edge ngram field to my index and get decent results
>>>>>> with partial words but the results appear randomly sorted and all
>>>>>> contain the same score.  Ideally I would like to sort by shortest
>>>>>> ngram match within my other qualifiers.
>>>>>> 
>>>>>> Is there a canonical solution to this?
>>>>>> 
>>>>>> Thanks,
>>>>>> Matt
>>>>>> 
>>>>>> p.s. I mostly followed
>>>>>> https://lucidworks.com/post/auto-suggest-from-popular-queries-using-edgengrams/
>>>>>> 
>>>>>> schema bits:
>>>>>> 
>>>>>> <fieldType name="edgytext" class="solr.TextField" 
>>>>>> positionIncrementGap="100">
>>>>>> <analyzer type="index">
>>>>>> <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>>> <filter class="solr.LowerCaseFilterFactory"/>
>>>>>> <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>>>>>> maxGramSize="25" />
>>>>>> </analyzer>
>>>>>> 
>>>>>> <field name="slug" type="string_ci" indexed="true" stored="true"
>>>>>> multiValued="false" />
>>>>>> 
>>>>>> <field name="fayt" type="edgytext" indexed="true" stored="false"
>>>>>> omitNorms="false" omitTermFreqAndPositions="true" multiValued="true"
>>>>>> />
>>>>>> 
>>>>>> 
>>>>>> <copyField source="slug" dest="fayt" maxChars="65" />
>>>>> 
>>>
