The ngramming is a time/space tradeoff. Typically, if you restrict the wildcards to have three or more “real” characters, performance is fine. One real character (e.g. a*) will be your worst case. I’ve seen requiring two characters in the prefix work well too. It Depends (tm).
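As a rough illustration of why short prefixes are the expensive case, here is a small Python sketch. It models the term dictionary as a sorted list, which is an assumption made purely for illustration and not how Lucene stores terms. The helper names `count_prefix_matches` and `has_min_prefix` are hypothetical; the second shows one way a client could reject wildcards with too few leading characters before they ever reach Solr.

```python
from bisect import bisect_left

def count_prefix_matches(sorted_terms, prefix):
    """Count the terms in a sorted list that start with `prefix`.

    Stand-in for term enumeration: the more terms share the prefix,
    the more clauses a prefix query would have to expand to.
    """
    lo = bisect_left(sorted_terms, prefix)
    # "\uffff" is used as an upper bound that sorts after ordinary characters.
    hi = bisect_left(sorted_terms, prefix + "\uffff")
    return hi - lo

def has_min_prefix(wildcard, min_chars=3):
    """Return True if `wildcard` has at least `min_chars` literal
    characters before its first '*' or '?'."""
    literal = wildcard.split("*", 1)[0].split("?", 1)[0]
    return len(literal) >= min_chars

# Toy term list (made up for this example).
terms = sorted(["lamp", "list", "lova", "lovable", "lovage",
                "lovan", "lovanox", "love"])

for prefix in ("l", "lov", "lova"):
    print(prefix, count_prefix_matches(terms, prefix))

print(has_min_prefix("a*"))            # too short with the default of 3
print(has_min_prefix("what_is_lo*"))   # long enough
```

On a real index the counts for a one-character prefix can be very large, which is why the minimum of two or three characters is worth measuring against your own data.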
Conceptually what happens here is that Lucene has to enumerate all of the terms that start with the prefix and create a ginormous OR clause. The term enumeration will take longer the more terms there are. Things are more efficient than that, but still...

So make sure you’re testing with a real corpus. Having a test index with just a few terms will be misleading.

Best,
Erick

> On Mar 25, 2020, at 9:37 PM, matthew sporleder <msporle...@gmail.com> wrote:
>
> Okay confirmed-
> I am getting a more predictable result set after adding an additional field:
> <fieldType name="string_alpha" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.PatternReplaceFilterFactory" pattern="\p{Punct}" replacement=""/>
>   </analyzer>
> </fieldType>
>
> q=slug:what_is_lo*&fl=slug&rows=1000&wt=csv&sort=slug_alpha%20asc
>
> So it appears I can skip edge ngram entirely using this method as
> slug:foo* appears to be the exact same results as fayt:foo, but I have
> the cost of the alphaOnly field :)
>
> I will try to figure out some benchmarks or something to decide how to go.
>
> Thanks again for the help so far.
>
>
> On Wed, Mar 25, 2020 at 2:39 PM Erick Erickson <erickerick...@gmail.com> wrote:
>>
>> You’re getting the correct sorted order… The underscore character is
>> confusing you.
>>
>> The ASCII code for underscore is 0x5F (%5F), which sorts before any
>> lowercase letter (though after the uppercase ones).
>>
>> See the alphaOnlySort type for a way to remove this, although the output
>> there can also be confusing.
>>
>> Best,
>> Erick
>>
>>> On Mar 25, 2020, at 1:30 PM, matthew sporleder <msporle...@gmail.com> wrote:
>>>
>>> What_is_Lov_Holtz_known_for
>>> What_is_lova_after_it_harddens
>>> What_is_Lova_Moor's_birthday
>>> What_is_lovable_in_Spanish
>>> What_is_lovage
>>> What_is_Lovagny's_population
>>> What_is_lovan_for
>>> What_is_lovanox
>>> What_is_lovarstan_for
>>> What_is_Lovasatin
>>
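For anyone following the sort-order part of this thread, the effect of the underscore can be reproduced outside Solr. The sketch below is plain Python, not Solr code: `lower_key` only lowercases, while `alpha_key` lowercases and strips ASCII punctuation, roughly mimicking the LowerCaseFilterFactory plus PatternReplaceFilterFactory chain quoted above. Both function names are made up for this example, and it assumes `\p{Punct}` covers the same ASCII punctuation set as Python's `string.punctuation`.

```python
import re
import string

# A few slugs taken from the list quoted in the thread.
slugs = [
    "What_is_lovage",
    "What_is_Lov_Holtz_known_for",
    "What_is_lovable_in_Spanish",
    "What_is_Lova_Moor's_birthday",
]

# Character class matching any single ASCII punctuation character.
PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")

def lower_key(slug):
    """Sort key that only lowercases; underscores are kept."""
    return slug.lower()

def alpha_key(slug):
    """Sort key that lowercases and removes punctuation (illustrative only)."""
    return PUNCT.sub("", slug.lower())

print(sorted(slugs, key=lower_key))
print(sorted(slugs, key=alpha_key))
```

With underscores kept, "Lov_Holtz" sorts first because "_" (0x5F) comes before "a" (0x61). With punctuation stripped, the key becomes "whatislovholtz...", so the same slug moves to the end of this small list.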