Hi Ahmet,
OK, thanks for your advice.
Regards,
Edwin
On 25 November 2017 at 10:23, Ahmet Arslan wrote:
>
>
> Hi Zheng,
>
> UAX29UET (UAX29URLEmailTokenizer) recognizes URLs and e-mail addresses. It does
> not split them; it keeps each one as a single token.
>
> StandardTokenizer produces two or more tokens for such an entity.
>
> Please t
Hi Rick,
Neither tokenizer splits on the hyphen in an email address like this one:
solr-user@lucene.apache.org
Both tokenizers keep the entire email address intact as a single token.
Regards,
Edwin
On 24 November 2017 at 20:19, Rick Leir wrote:
> Edwin
> There is a spec for which characte
Hi Zheng,
UAX29UET (UAX29URLEmailTokenizer) recognizes URLs and e-mail addresses. It does
not split them; it keeps each one as a single token.
StandardTokenizer produces two or more tokens for such an entity.
Please try both on the Analysis page and use whichever one suits your requirements.
Ahmet
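To make Ahmet's distinction concrete, here is a rough Python illustration. It is not Lucene code: `split_tokenizer` loosely imitates a word-boundary tokenizer that breaks an address apart, while `email_aware_tokenizer` first pulls out anything matching a simplified email pattern and emits it as one token. The pattern is an assumption and is much looser than what UAX29URLEmailTokenizer actually recognizes; both function names are hypothetical.

```python
import re

# Assumed, simplified email pattern: word chars, dots, "+" and "-" before the
# "@", then two or more dot-separated domain labels. Not Lucene's grammar.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Rough stand-in for word-boundary splitting: runs of word characters, with
# internal dots allowed so "lucene.apache.org" stays together.
WORD = re.compile(r"\w+(?:\.\w+)*")


def split_tokenizer(text):
    """Approximates a tokenizer that breaks an address into several tokens."""
    return WORD.findall(text)


def email_aware_tokenizer(text):
    """Emits each email-like match as one token and splits the rest into words."""
    tokens = []
    pos = 0
    for match in EMAIL.finditer(text):
        tokens.extend(WORD.findall(text[pos:match.start()]))
        tokens.append(match.group())
        pos = match.end()
    tokens.extend(WORD.findall(text[pos:]))
    return tokens


text = "Write to solr-user@lucene.apache.org today"
print(split_tokenizer(text))        # ['Write', 'to', 'solr', 'user', 'lucene.apache.org', 'today']
print(email_aware_tokenizer(text))  # ['Write', 'to', 'solr-user@lucene.apache.org', 'today']
```

For the sample address, the split version yields `solr`, `user` and `lucene.apache.org`, which is the kind of multi-token output Ahmet describes for StandardTokenizer. The Analysis screen in the Solr admin UI is still the place to confirm what the real tokenizers emit.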
On Friday, November 24, 2017, 11:46:57 A
Edwin
There is a spec for which characters are acceptable in the local part of an email
address (the part before the @), and another spec for the characters allowed in a
domain name. I suspect you will have more success with a tokenizer that is
specialized for email, but I have not looked at
UAX29URLEmailTokenizerFactory. Does ClassicTokenizerFactory spli
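Rick's point about the two specs can be sketched as two small validators. This is a simplified Python reading, assuming the unquoted "dot-atom" form of the local part from RFC 5322 (quoted local parts are ignored) and the letters-digits-hyphen rule for domain labels. The function names are hypothetical, and requiring at least one dot in the domain is an extra assumption. Note that `-` is legal in both halves, which fits Edwin's observation that neither tokenizer splits `solr-user@lucene.apache.org` at the hyphen.

```python
import re

# Characters allowed in an unquoted ("dot-atom") local part per RFC 5322.
# Quoted local parts such as "john doe"@example.com are not handled here.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
LOCAL_PART = re.compile(ATEXT + r"+(?:\." + ATEXT + r"+)*")

# One domain label: letters, digits and hyphens, 1 to 63 characters, not
# starting or ending with a hyphen. Internationalized names are out of scope.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_local_part(local: str) -> bool:
    """True if `local` is runs of atext separated by single dots."""
    return LOCAL_PART.fullmatch(local) is not None


def is_valid_domain(domain: str) -> bool:
    """True if every dot-separated label is valid (assumes at least two labels)."""
    labels = domain.split(".")
    return len(labels) >= 2 and all(LABEL.fullmatch(label) for label in labels)


def is_plausible_email(address: str) -> bool:
    """Split at the last '@' and check each half against its own grammar."""
    local, at, domain = address.rpartition("@")
    return at == "@" and is_valid_local_part(local) and is_valid_domain(domain)


for candidate in ["solr-user@lucene.apache.org",    # True
                  "solr-user.@lucene.apache.org",   # False: trailing dot in local part
                  "user@-bad.example.com"]:         # False: label starts with a hyphen
    print(candidate, is_plausible_email(candidate))
```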
Hi,
I am indexing email addresses into Solr via EML files. Currently, I am
using ClassicTokenizerFactory with LowerCaseFilterFactory. However, I
found that we could also use UAX29URLEmailTokenizerFactory with
LowerCaseFilterFactory.
Does anyone have any recommendation on which Tokenizer is bet
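Edwin's setup pairs a tokenizer with LowerCaseFilterFactory. In Solr, an analyzer runs one tokenizer followed by a chain of token filters, and unless separate index and query analyzers are configured the same chain is used for both, so a lowercased address in the index still matches a mixed-case query. The toy Python model below only illustrates that ordering: `tokenize`, `lowercase_filter` and `analyze` are hypothetical names, not Solr or Lucene APIs, and the email pattern is a simplified assumption.

```python
import re
from typing import Callable, List, Sequence

# Assumed, simplified email pattern tried first; otherwise fall back to words.
TOKEN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+|\w+")

TokenFilter = Callable[[List[str]], List[str]]


def tokenize(text: str) -> List[str]:
    """Toy tokenizer: keeps email-like strings whole, splits everything else into words."""
    return TOKEN.findall(text)


def lowercase_filter(tokens: List[str]) -> List[str]:
    """Toy counterpart of a lowercase token filter."""
    return [token.lower() for token in tokens]


def analyze(text: str, filters: Sequence[TokenFilter]) -> List[str]:
    """Run the tokenizer once, then apply each filter in order."""
    tokens = tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


# The same chain on the "index" side and the "query" side.
indexed = analyze("From: Solr-User@Lucene.Apache.org", [lowercase_filter])
query = analyze("solr-user@LUCENE.apache.org", [lowercase_filter])
print(indexed)                     # ['from', 'solr-user@lucene.apache.org']
print(set(query) <= set(indexed))  # True
```

Because lowercasing runs after tokenization, the filter only has to normalize case; whether the address survives as one token is decided entirely by the tokenizer, which is the choice Edwin is asking about.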