I was using text_general when I wrote my email. I have since tried a couple of the other stock field types (text_en, text_en_splitting, text_en_splitting_tight) and have now begun writing my own. I'm starting to wonder whether simply using one of the above, such as text_general, with a fuzzy distance (probably just ~1) would be the best fit. Another example: an indexed value of "Phasaix" (a typo in the original data) returns nothing when searched for with the correct spelling, "Phasix". Adding ~1 works in that case. For some reason it doesn't in the case of the 1234-L and 1234-LT example.
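To make the two cases concrete, here is a rough Python sketch (not Solr code) that computes plain Levenshtein distances. It assumes, and this is only an assumption about my setup, that the fuzzy term is compared against the individual tokens left in the index after splitting rather than against the original value. split_alnum is a hypothetical stand-in for the word-delimiter splitting, not the real filter.

```python
import re


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def split_alnum(value):
    """Hypothetical stand-in for word-delimiter splitting:
    lowercase, then keep runs of letters and runs of digits as separate tokens."""
    return re.findall(r"[a-z]+|[0-9]+", value.lower())


def best_token_distance(query, indexed_value):
    """Smallest edit distance between the raw (lowercased) query term
    and any single token produced from the indexed value."""
    return min(levenshtein(query.lower(), tok) for tok in split_alnum(indexed_value))


if __name__ == "__main__":
    print(levenshtein("Phasix", "Phasaix"))          # whole-word comparison
    print(split_alnum("1234-LT"))                    # tokens after splitting
    print(best_token_distance("1234-L", "1234-LT"))  # query term vs. split tokens
```

If that assumption holds, "Phasix" is one edit from the indexed "Phasaix", whereas "1234-L" is two edits from the closest token of "1234-LT" (the "1234"), which would be outside ~1. The analysis page should show whether that is what is actually happening.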
Thanks for any insight,

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Mon, Feb 1, 2016 at 3:30 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Likely you also have WordDelimiterFilterFactory in
> your fieldType, that's what will split on alphanumeric
> transitions.
>
> So you should be able to use wildcards here, i.e. 1234L*
>
> However, that'll only work if you have preserveOriginal set in
> WordDelimiterFilterFactory in your indexing chain.
>
> And just to make life "interesting", there are some peculiarities
> with parsing wildcards at query time, so be sure to see the
> admin/analysis page....
>
> Best,
> Erick
>
> On Mon, Feb 1, 2016 at 12:20 PM, John Blythe <j...@curvolabs.com> wrote:
> > Hi there
> >
> > I have a catch-all field called 'text' that I copy my item description,
> > manufacturer name, and the item's catalog number into. I'm having an issue
> > with keeping the broadness of the tokenizers in place whilst still allowing
> > some good precision in the case of very specific queries.
> >
> > The results are generally good. But, for instance, the products named 1234L
> > and 1234LT aren't behaving how I would like. If I search 1234 they both
> > show. If I search 1234L only the first one is returned. I'm guessing this
> > is due to the splitting of the numeric and string portions. The "1234" and
> > the "L" both hit in the first case ("1234" and "L") but the L is of no
> > value in the "1234" and "LT" indexed item.
> >
> > What is the best way around this so that a small Levenshtein distance, for
> > instance, is picked up?