Re: Solr, ICUTokenizer with Latin-break-only-on-whitespace

2013-06-20 Thread Jonathan Rochkind
Thank you... I started out writing an email with screenshots proving that it wasn't working for me in 4.3.0... and of course, having to confirm every single detail in order to say I confirmed it... I realized it was a mistake on my part, not testing what I thought I was testing. Does indeed ap

Re: Solr, ICUTokenizer with Latin-break-only-on-whitespace

2013-06-20 Thread Shawn Heisey
On 6/20/2013 1:26 PM, Jonathan Rochkind wrote: I want, for instance, "C++ Language" to be tokenized into "C++", "Language". But the ICUTokenizer, even with the rulefiles="Latn:Latin-break-only-on-whitespace.rbbi", with the rbbi file from the Solr 4.3 source [1]. But the ICUTokenizer, even wi

Solr, ICUTokenizer with Latin-break-only-on-whitespace

2013-06-20 Thread Jonathan Rochkind
(to solr-user, CC'ing author I'm responding to) I found the solr-user listserv contribution at https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201305.mbox/%3c51965e70.6070...@elyograg.org%3E which explains a way you can supply custom rulefiles to ICUTokenizer, in this case to tell i
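
For reference, the setup discussed in this thread can be sketched as a schema.xml fragment. This is only a minimal sketch, not the poster's actual configuration: the field type name text_icu_ws is hypothetical, and it assumes Latin-break-only-on-whitespace.rbbi has been copied from the Solr source into the core's conf directory. The rulefiles value itself is the one quoted in the thread.

```xml
<!-- Hypothetical field type name; only the tokenizer line reflects the thread. -->
<fieldType name="text_icu_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- "Latn" is the four-letter script code the rule file applies to.
         The .rbbi file is assumed to sit in the core's conf directory. -->
    <tokenizer class="solr.ICUTokenizerFactory"
               rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
  </analyzer>
</fieldType>
```

Running "C++ Language" through the Analysis screen of the admin UI against a field of this type is probably the quickest way to check whether it comes out as "C++" and "Language", and to confirm you are testing the field type you think you are testing.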