Re: Split words with period in between into separate tokens

2016-10-12 Thread Derek Poh
Why didn't I think of that? That's another alternative. Thank you for your suggestion; I appreciate it. On 10/13/2016 5:41 AM, Georg Sorst wrote: You could use a PatternReplaceCharFilter before your tokenizer to replace the dot with a space character. Derek Poh wrote on Wed., 12 Oct. 2016 1

Re: Split words with period in between into separate tokens

2016-10-12 Thread Georg Sorst
You could use a PatternReplaceCharFilter before your tokenizer to replace the dot with a space character. Derek Poh wrote on Wed., 12 Oct. 2016 11:38: > It seems LetterTokenizerFactory also splits on numbers and discards them. The field does have values with numbers in them, so it is not a
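A rough Python simulation of the suggested approach (not Solr code): `char_filter` stands in for a PatternReplaceCharFilter that rewrites the raw text before tokenization, and `whitespace_tokenize` is a simplified stand-in for the tokenizer that follows it. Both names are hypothetical. The sketch also shows a choice the thread does not discuss: replacing every dot also splits decimals such as "3.5", whereas a narrower pattern limited to dots with a letter on each side (an assumption, not from the thread) leaves them intact.

```python
import re

# Pattern a PatternReplaceCharFilter could be given (assumption): any literal dot.
ANY_DOT = r"\."
# Narrower variant (hypothetical): only dots with a letter on both sides,
# so decimals such as "3.5" are left untouched.
DOT_BETWEEN_LETTERS = r"(?<=[A-Za-z])\.(?=[A-Za-z])"


def char_filter(text: str, pattern: str, replacement: str = " ") -> str:
    # Rewrite the raw text before any tokenizer sees it,
    # which is the role a char filter plays in an analysis chain.
    return re.sub(pattern, replacement, text)


def whitespace_tokenize(text: str) -> list[str]:
    # Simplified stand-in for the tokenizer that runs after the char filter.
    return text.split()


text = "Co.Ltd A200 3.5mm"
print(whitespace_tokenize(char_filter(text, ANY_DOT)))
# ['Co', 'Ltd', 'A200', '3', '5mm']
print(whitespace_tokenize(char_filter(text, DOT_BETWEEN_LETTERS)))
# ['Co', 'Ltd', 'A200', '3.5mm']
```

Because the replacement happens before tokenization, digits survive (unlike with a letter-only tokenizer); the pattern alone decides which dots become token boundaries.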

Re: Split words with period in between into separate tokens

2016-10-12 Thread Derek Poh
It seems LetterTokenizerFactory also splits on numbers and discards them. The field does have values with numbers in them, so it is not applicable. Thank you. On 10/12/2016 4:22 PM, Dheerendra Kulkarni wrote: You can use LetterTokenizerFactory instead. Regards, Dheerendra Kulkarni On Wed,
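To illustrate why LetterTokenizerFactory was ruled out, here is a Python approximation (not Lucene's implementation) of its documented behavior: a token is a maximal run of letters, and every other character, digits included, acts as a separator and is dropped. `letter_tokenize` is a hypothetical name, and `str.isalpha()` only approximates Java's `Character.isLetter()`.

```python
def letter_tokenize(text: str) -> list[str]:
    # Approximates LetterTokenizer: emit maximal runs of letters;
    # dots, spaces and digits all end the current token and are discarded.
    tokens: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch.isalpha():
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(letter_tokenize("Co.Ltd"))             # ['Co', 'Ltd']
print(letter_tokenize("Model A200 Co.Ltd"))  # ['Model', 'A', 'Co', 'Ltd']
```

"Co.Ltd" splits as wanted, but "A200" loses its digits, which matches the objection above for a field whose values contain numbers.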

Re: Split words with period in between into separate tokens

2016-10-12 Thread Dheerendra Kulkarni
You can use LetterTokenizerFactory instead. Regards, Dheerendra Kulkarni On Wed, Oct 12, 2016 at 6:24 AM, Derek Poh wrote: > Hi > > How can I split words with a period in between into separate tokens? > E.g. "Co.Ltd" => "Co", "Ltd". > > I am using StandardTokenizerFactory and it does not replace pe