RE: StandardTokenizer and domain names containing digits

2012-04-23 Thread Steven A Rowe
Steven A Rowe syr.edu> writes: > StandardTokenizer in Lucene/Solr v3.1+ implements the Word Boundary rules from Unicode 6.0.0 Standard Annex #29, a.k.a. UAX#29: <http://www.unicode.org/reports/tr29/tr29-17.html#Word_Bounda

Re: StandardTokenizer and domain names containing digits

2012-04-23 Thread Alex Willmer
Steven A Rowe syr.edu> writes: > StandardTokenizer in Lucene/Solr v3.1+ implements the Word Boundary rules from Unicode 6.0.0 Standard Annex #29, a.k.a. UAX#29: . These rules don't include recognition of URLs or domain nam
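To make the quoted point concrete, here is a minimal Python sketch of a small subset of the UAX#29 word-boundary rules (WB5 through WB12). It is not Lucene's implementation. It assumes ASCII-only input and covers only ASCII letters (ALetter), ASCII digits (Numeric) and '.' (MidNumLet). Every other character is treated as a break point, and the function names are hypothetical. Under these rules, a '.' between two letters or between two digits stays inside a word. A '.' between a digit and a letter is a boundary, so a domain label that ends in a digit splits the name.

```python
from typing import List


def char_class(ch: str) -> str:
    # Simplified, ASCII-only mapping onto UAX#29 Word_Break values (assumption:
    # real UAX#29 covers all of Unicode and many more classes).
    if ch.isascii() and ch.isalpha():
        return "ALetter"
    if ch.isascii() and ch.isdigit():
        return "Numeric"
    if ch == ".":
        return "MidNumLet"
    return "Other"


def is_boundary(classes: List[str], i: int) -> bool:
    """True if this rule subset allows a break between positions i-1 and i."""
    prev, cur = classes[i - 1], classes[i]
    prev2 = classes[i - 2] if i >= 2 else None
    nxt = classes[i + 1] if i + 1 < len(classes) else None
    word_chars = {"ALetter", "Numeric"}
    if prev in word_chars and cur in word_chars:
        return False  # WB5, WB8, WB9, WB10: letters and digits stick together
    if prev == "ALetter" and cur == "MidNumLet" and nxt == "ALetter":
        return False  # WB6: letter . letter
    if prev2 == "ALetter" and prev == "MidNumLet" and cur == "ALetter":
        return False  # WB7: letter . letter
    if prev2 == "Numeric" and prev == "MidNumLet" and cur == "Numeric":
        return False  # WB11: digit . digit
    if prev == "Numeric" and cur == "MidNumLet" and nxt == "Numeric":
        return False  # WB12: digit . digit
    return True  # WB14: break everywhere else


def tokenize(text: str) -> List[str]:
    if not text:
        return []
    classes = [char_class(c) for c in text]
    segments, start = [], 0
    for i in range(1, len(text)):
        if is_boundary(classes, i):
            segments.append(text[start:i])
            start = i
    segments.append(text[start:])
    # Keep only segments containing a letter or digit (drop spaces, lone dots).
    return [s for s in segments if any(c.isascii() and c.isalnum() for c in s)]


print(tokenize("www.example.com"))  # ['www.example.com']
print(tokenize("www.site1.com"))    # ['www.site1', 'com']
```

With this rule subset, "www.example.com" stays a single token. In "www.site1.com", the "1" before the second '.' is Numeric and the "c" after it is ALetter, so neither WB6/WB7 nor WB11/WB12 applies and the name breaks there.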

RE: StandardTokenizer and domain names containing digits

2012-04-19 Thread Steven A Rowe
Hi Alex, TL;DR: Try adding WordDelimiterFilter to your analyzer(s). StandardTokenizer in Lucene/Solr v3.1+ implements the Word Boundary rules from Unicode 6.0.0 Standard Annex #29, a.k.a. UAX#29: . These rules don't include reco