My original goal was to avoid indexing the string length because I wanted edge ngram to "score" based on how "exact" the match was:
q=abc
"abc" has a high score
"abcd" has a lower score
"abcde" has an even lower score

You say sorting by the original field will do that, but in practice it
is not happening, so I am probably missing something. I *am* getting a
close version of what I said above by sorting on the length, which I
added to the index.

Searching for my keyword-lowercase field:abc* + sorting by length is
also working, so maybe I can skip the edge ngram field entirely and just
do that, but I was hoping to trade some disk space for performance. This
field will get queried a lot.

On Wed, Mar 25, 2020 at 10:39 AM Erick Erickson <erickerick...@gmail.com> wrote:
>
> Why do you want to deal with score at all? Sorting
> overrides score-based sorting. Well, unless you
> specify score as a secondary sort. But since you’re
> sorting by length anyway, trying to score
> based on proximity to the end does nothing.
>
> The weirdness you’re going to get here, though, is
> that the order of the results will not be alphabetical.
> Say you have two docs, one with abcd and one with
> abce. Now say you search on abc. Whether abcd or
> abce comes first is indeterminate.
>
> If you simply stored the keyword-lowercased value
> in a copyfield and sorted on _that_, you wouldn’t have
> this problem. But if you’re really worried about space,
> that might not be an option.
>
> Best,
> Erick
>
> > On Mar 25, 2020, at 9:49 AM, matthew sporleder <msporle...@gmail.com> wrote:
> >
> > Where I landed:
> >
> > <fieldType name="string_ci" class="solr.TextField"
> >            sortMissingLast="true" omitNorms="false">
> >   <analyzer>
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory" />
> >   </analyzer>
> > </fieldType>
> >
> > <fieldType name="edgytext" class="solr.TextField"
> >            positionIncrementGap="100">
> >   <analyzer type="index">
> >     <filter class="solr.LowerCaseFilterFactory" />
> >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> >             maxGramSize="25" />
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > <field name="slug" type="string_ci" indexed="true" stored="true"
> >        multiValued="false" />
> > <field name="fayt" type="edgytext" indexed="true" stored="false"
> >        omitNorms="false" omitTermFreqAndPositions="false"
> >        multiValued="true" />
> > <field name="qt_len" type="int" indexed="true" stored="true"
> >        multiValued="false" />
> >
> > ---
> >
> > I can then do a search for
> >
> > q=fayt:my_article_slu&sort=qt_len asc
> >
> > to get the shortest/most exact find-as-you-type match. I couldn't get
> > around all results having the same score (can I boost proximity to the
> > end of a string?) in the edge ngram search but I am hoping this is the
> > fastest way to do this type of search since I can avoid wildcards
> > "my_article_slu*" and stuff.
> >
> > More suggestions welcome and thanks for the help. I will re-index
> > with omitNorms=true again to see if I can save a little space.
> >
> >
> > On Tue, Mar 24, 2020 at 11:39 AM matthew sporleder <msporle...@gmail.com>
> > wrote:
> >>
> >> Okay I appreciate you responding.
> >>
> >> Switching "slug" from "string_ci" to class="solr.StrField" accomplished
> >> about the same results, which makes sense to me now :)
> >>
> >> The previous definition of string_ci was:
> >> <fieldType name="string_ci" class="solr.TextField"
> >>            sortMissingLast="true" omitNorms="true">
> >>   <analyzer>
> >>     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >>     <filter class="solr.LowerCaseFilterFactory" />
> >>   </analyzer>
> >> </fieldType>
> >>
> >> So lowercase + KeywordTokenizerFactory.
> >>
> >> I am trying again with omitNorms=false to see if I can get the more
> >> "exact" matches to score better this time around.
> >>
> >>
> >> On Tue, Mar 24, 2020 at 9:54 AM Erick Erickson <erickerick...@gmail.com>
> >> wrote:
> >>>
> >>> Won’t work. String types are totally unanalyzed. Your string_ci fieldType
> >>> is what I was looking for.
> >>>
> >>> No, you shouldn’t kill the lowercasefilter unless you want all of your
> >>> searches to be case-sensitive.
> >>>
> >>> So you should try:
> >>>
> >>> q=edgy_text:whatever&sort=string_ci asc
> >>>
> >>> Please use the admin>>pick_core>>analysis page when thinking about
> >>> changing your schema, it’ll answer a _lot_ of these questions immediately.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>>> On Mar 24, 2020, at 8:37 AM, matthew sporleder <msporle...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> Oh maybe a schema bug!
> >>>>
> >>>> my string_ci:
> >>>> <fieldType name="string_ci" class="solr.TextField"
> >>>>            sortMissingLast="true" omitNorms="true">
> >>>>   <analyzer>
> >>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >>>>     <filter class="solr.LowerCaseFilterFactory" />
> >>>>   </analyzer>
> >>>> </fieldType>
> >>>>
> >>>> going to try this instead:
> >>>> <fieldType name="string_lctoken" class="solr.StrField"
> >>>>            sortMissingLast="true" omitNorms="true">
> >>>>   <analyzer>
> >>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >>>>     <filter class="solr.LowerCaseFilterFactory" />
> >>>>   </analyzer>
> >>>> </fieldType>
> >>>>
> >>>> Then I can probably kill the lowercasefilter on edgytext:
> >>>>
> >>>>
> >>>> On Tue, Mar 24, 2020 at 7:44 AM Erick Erickson <erickerick...@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Sort by the full field. You’ll need to copy to a field with
> >>>>> keywordTokenizer and lowercaseFilter (string_ci? assuming it’s not
> >>>>> really a “string”) type.
> >>>>>
> >>>>> Best,
> >>>>> Erick
> >>>>>
> >>>>>> On Mar 24, 2020, at 7:10 AM, matthew sporleder <msporle...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>> I have added an edge ngram field to my index and get decent results
> >>>>>> with partial words but the results appear randomly sorted and all
> >>>>>> contain the same score. Ideally I would like to sort by shortest
> >>>>>> ngram match within my other qualifiers.
> >>>>>>
> >>>>>> Is there a canonical solution to this?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Matt
> >>>>>>
> >>>>>> p.s. I mostly followed
> >>>>>> https://lucidworks.com/post/auto-suggest-from-popular-queries-using-edgengrams/
> >>>>>>
> >>>>>> schema bits:
> >>>>>>
> >>>>>> <fieldType name="edgytext" class="solr.TextField"
> >>>>>>            positionIncrementGap="100">
> >>>>>>   <analyzer type="index">
> >>>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> >>>>>>             maxGramSize="25" />
> >>>>>>   </analyzer>
> >>>>>>
> >>>>>> <field name="slug" type="string_ci" indexed="true" stored="true"
> >>>>>>        multiValued="false" />
> >>>>>>
> >>>>>> <field name="fayt" type="edgytext" indexed="true" stored="false"
> >>>>>>        omitNorms="false" omitTermFreqAndPositions="true"
> >>>>>>        multiValued="true" />
> >>>>>>
> >>>>>>
> >>>>>> <copyField source="slug" dest="fayt" maxChars="65" />
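
For anyone reading this thread later, the ordering discussed above (shortest match first, with equal-length matches such as abcd and abce broken alphabetically) can be sketched outside Solr. This is only a rough Python simulation of the idea, not Solr code: the function names edge_ngrams and find_as_you_type are made up for illustration, the gram sizes mirror the minGramSize/maxGramSize values from the schema, and the in-memory list stands in for the index.

```python
from typing import List


def edge_ngrams(value: str, min_gram: int = 1, max_gram: int = 25) -> List[str]:
    """Return the leading-edge n-grams of the lowercased value.

    Hypothetical helper that imitates a keyword tokenizer followed by a
    lowercase filter and an edge n-gram filter.
    """
    lowered = value.lower()
    upper = min(max_gram, len(lowered))
    return [lowered[:n] for n in range(min_gram, upper + 1)]


def find_as_you_type(query: str, slugs: List[str]) -> List[str]:
    """Return slugs with an edge n-gram equal to the lowercased query."""
    needle = query.lower()
    hits = [s for s in slugs if needle in edge_ngrams(s)]
    # Primary key: length of the value (stands in for sorting on qt_len asc).
    # Secondary key: the lowercased value, so equal-length hits such as
    # "abcd" and "abce" come back in a stable, alphabetical order.
    return sorted(hits, key=lambda s: (len(s), s.lower()))


if __name__ == "__main__":
    docs = ["abce", "abcde", "ABC", "abcd", "xyz"]
    print(find_as_you_type("abc", docs))  # ['ABC', 'abcd', 'abce', 'abcde']
```

In Solr terms the secondary key would presumably be a second sort clause, something like sort=qt_len asc,slug asc, but that is an assumption and was not tested anywhere in this thread.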