You need to play with the (many) parameters for WordDelimiterFilterFactory.
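
To make that concrete, here is a minimal sketch of the kind of field type I mean. The field type name, the whitespace tokenizer and the lowercase filter are my assumptions, not something taken from your schema; only the WordDelimiterFilterFactory attributes matter here.

```xml
<!-- Hypothetical field type: the name, tokenizer and lowercase filter are assumed for illustration -->
<fieldType name="text_partnumber" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- preserveOriginal="0": do not keep the raw token (the one carrying the dot and hyphen) -->
    <!-- catenateNumbers="1": also emit the joined number parts as a single token -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="1"
            catenateAll="0"
            preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Same settings at query time, so query tokens line up with the indexed ones -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="1"
            catenateAll="0"
            preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The admin UI analysis page will show what each attribute does to your actual input, so treat the values above as a starting point rather than a recommendation.
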

For instance, you have preserveOriginal set to 1. That's what's
generating the token with the dot.

You have catenateAll and catenateNumbers set to zero. That means that
someone searching for 61149008 won't get a hit.

The fact that the dot is in the tokens generated doesn't really matter
as long as the query tokens produced will match.

I think you're getting a bit off track by focusing on the hyphen and
dot; you're only seeing them in the index at all because you have
preserveOriginal set to 1. Let's say that you set preserveOriginal to
0 and catenateNumbers to 1. Then you'd get:
61149
008
61149008

in your index. No dots, no hyphens.

Note your _query_ analysis also has catenateNumbers as 1 and
preserveOriginal as 0. The user searches for 61149-008 and the emitted
tokens are in the index and you're OK. The user searches for 61149008
and gets a hit there too. The dot is irrelevant.

Now, all that said, if that isn't comfortable you could certainly add
PatternReplaceFilterFactory, but really WDFF is designed for this kind
of thing. I think you'll be just fine if you play with the options
enough to understand the nuances, which can be tricky, I'll admit.

Best,
Erick

On Fri, Nov 24, 2017 at 7:13 AM, Sergio García Maroto <[email protected]> wrote:
> Yes. You are right. I understand now.
> Let me explain my issue a bit better with the exact problem i have.
>
> I have this text "Information number 61149-008."
> Using the tokenizers and filters described previously i get this list of
> tokens.
> information
> number
> 61149-008.
> 61149
> 008
>
> Basically last token "61149-008." gets tokenized as
> 61149-008.
> 61149
> 008
> User is searching for "61149-008" without dot, so this is not a match.
> I don't want to change the tokenization on the query to avoid altering the
> matches for other cases.
>
> I would like to delete the dot at the end. Basically generate this extra
> token
> information
> number
> 61149-008.
> 61149
> 008
> 61149-008
>
> Not sure if what I am saying make sense or there is other way to do this
> right.
>
> Thanks a lot
> Sergio
>
>
> On 24 November 2017 at 15:31, Shawn Heisey <[email protected]> wrote:
>
>> On 11/24/2017 2:32 AM, marotosg wrote:
>>
>>> Hi Shaw.
>>> Thanks for your reply. Actually my issue is with the last token. It looks
>>> like for the last token of a string. It keeps the dot.
>>>
>>> In your case Testing. This is a test. Test.
>>>
>>> Keeps the "Test."
>>>
>>> Is there any reason I can't see for that behauviour?
>>>
>>
>> I am really not sure what you're saying here.
>>
>> Every token is duplicated, one has the dot and one doesn't. This is what
>> you wanted based on what I read in your initial email.
>>
>> Making a guess as to what you're asking about this time: If you're
>> noticing that there isn't a "Test" as the last token on the line for WDF,
>> then I have to tell you that it actually is there, the display was simply
>> too wide for the browser window. Scrolling horizontally would be required
>> to see the whole thing.
>>
>> Thanks,
>> Shawn
>>
>>
