I was using text_general when I wrote that email. I have since tried a
couple of the other stock field types (text_en, text_en_splitting,
text_en_splitting_tight) and have now begun writing my own. I began to
wonder whether simply using one of the above, such as text_general, with a
fuzzy distance (probably just ~1) would be the best fit. Another example:
an indexed value of "Phasaix" (a typo in the original data) returns nothing
when searched for with the correct spelling, "Phasix". Adding ~1 in that
case works. For some reason it doesn't in the case of the 1234-L and
1234-LT example.
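
One possible explanation for the difference, sketched in Python. This is an
illustration under assumptions, not a reproduction of the actual analysis
chain: split_tokens is a hypothetical stand-in for whatever the field type
does at index time (lowercase, then split into runs of letters or digits),
and the sketch assumes the fuzzy term is compared whole against each indexed
token rather than being split first. It also uses plain Levenshtein distance,
which may differ from the engine's matching when transpositions are involved.

```python
import re


def levenshtein(a, b):
    """Plain edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def split_tokens(text):
    """Hypothetical index-time analysis: lowercase, then keep each run of
    letters or digits as its own token (punctuation is dropped)."""
    return re.findall(r"[a-z]+|[0-9]+", text.lower())


def fuzzy_hits(query_term, indexed_value, max_edits=1):
    """Indexed tokens within max_edits of the whole, unsplit query term
    (assumption: the fuzzy term is not split at query time)."""
    q = query_term.lower()
    return [t for t in split_tokens(indexed_value)
            if levenshtein(q, t) <= max_edits]


if __name__ == "__main__":
    print(fuzzy_hits("Phasix", "Phasaix"))   # single token, one edit away
    print(fuzzy_hits("1234-L", "1234-LT"))   # compared against fragments
```

If those assumptions hold, "Phasaix" is indexed as a single token that is one
edit from "Phasix", whereas "1234-LT" is indexed as the fragments "1234" and
"lt", neither of which is within one edit of the whole term "1234-l". The
admin/analysis page should show whether the real field type behaves this way.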

Thanks for any insight-

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Mon, Feb 1, 2016 at 3:30 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Likely you also have WordDelimiterFilterFactory in
> your fieldType, that's what will split on alphanumeric
> transitions.
>
> So you should be able to use wildcards here, i.e. 1234L*
>
> However, that'll only work if you have preserveOriginal set in
> WordDelimiterFilterFactory in your indexing chain.
>
> And just to make life "interesting", there are some peculiarities
> with parsing wildcards at query time, so be sure to see the
> admin/analysis page....
>
> Best,
> Erick
>
> On Mon, Feb 1, 2016 at 12:20 PM, John Blythe <j...@curvolabs.com> wrote:
> > Hi there
> >
> > I have a catch-all field called 'text' that I copy my item description,
> > manufacturer name, and the item's catalog number into. I'm having an issue
> > with keeping the broadness of the tokenizers in place whilst still allowing
> > some good precision in the case of very specific queries.
> >
> > The results are generally good. But, for instance, the products named 1234L
> > and 1234LT aren't behaving how I would like. If I search 1234 they both
> > show. If I search 1234L only the first one is returned. I'm guessing this
> > is due to the splitting of the numeric and string portions. The "1234" and
> > the "L" both hit in the first case ("1234" and "L") but the L is of no
> > value in the "1234" and "LT" indexed item.
> >
> > What is the best way around this so that a small Levenshtein distance, for
> > instance, is picked up?
>
