It does indeed appear that the use of the "_cz" suffix is a mistake: those suffixes are supposed to be language codes, and the language code for Czech is "cs", while "cz" is the country code. Sure, there tends to be a one-to-one relationship between language and country, but clearly that is not as absolute as a casual observer might assume.
I think it's worth a Jira - text types should use language codes, not country codes.

-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:35 PM, Eduard Moraru <enygma2...@gmail.com> wrote:

> Hi,
>
> First of all, a bit of a disclaimer: I am not a Czech language speaker, at
> all.
>
> We are using Solr's dynamic fields in our project (XWiki), and we have
> recently noticed a problem [1] with the Czech language.
>
> Basically, our mapping says something like this:
>
> <dynamicField name="*_cz" type="text_cz" indexed="true" stored="true"
> multiValued="true" />
>
> ...but at runtime, we ask for the language code "cs" (which is the ISO
> language code for Czech [2]) and it obviously fails (due to the mapping).
>
> Now, we can easily fix this on our end by fixing the mapping to
> name="*_cs",
> but what we are really wondering now is why does Lucene/Solr use "cz"
> (country code) instead of "cs" (language code) in both its "text_cz" field
> and its "stopwords_cz.txt" file?
>
> Is that a mistake on the Solr/Lucene side? Is it some kind of convention?
> Is it going to be fixed?
>
> Thanks,
> Eduard
>
> ----------
> [1] http://jira.xwiki.org/browse/XWIKI-11897
> [2] http://en.wikipedia.org/wiki/Czech_language
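
For reference, the local workaround Eduard describes might look roughly like the fragment below. This is only a sketch: it assumes the schema still defines a field type named "text_cz" (as in the mapping quoted above), and it changes only the dynamic field's name pattern so that lookups by the language code "cs" match.

```xml
<!-- Sketch of a schema.xml workaround, not an official fix.
     The name pattern uses the language code "cs"; the type attribute
     still points at the existing "text_cz" field type, which is assumed
     to be defined elsewhere in the same schema. -->
<dynamicField name="*_cs" type="text_cz" indexed="true" stored="true"
              multiValued="true" />
```

Documents already indexed under "*_cz" field names would presumably need to be reindexed after a change like this.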