Hi Norberto,

On 08/14/2008 at 8:10 AM, Norberto Meijome wrote:
> > On 8/13/08 9:16 AM, "Steven A Rowe" <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Norberto,
> > >
> > > https://issues.apache.org/jira/browse/LUCENE-1343
>
> hi Steve,
> thanks for the pointer. this is a Lucene entry... I thought the
> Latin-filter was a SOLR feature? I, for one, definitely meant a SOLR filter.

A fair portion of Solr is a set of wrappers over Lucene functionality.
ISOLatin1AccentFilterFactory, for example, wraps Lucene's
ISOLatin1AccentFilter. Here is the entirety of the Solr code:

  public class ISOLatin1AccentFilterFactory extends BaseTokenFilterFactory {
    public ISOLatin1AccentFilter create(TokenStream input) {
      return new ISOLatin1AccentFilter(input);
    }
  }

Of course, BaseTokenFilterFactory brings more to the party, but my point
is that adding Lucene filters to Solr is generally a trivial exercise - a
Solr ...FilterFactory around LUCENE-1343 would not be much longer than the
few lines listed above, since the configuration aspects are already
handled by BaseTokenFilterFactory.

> Given what Walter rightly pointed out about differences in language, I suspect
> it would be a SOLR-level thing - fieldType name="textDE" language="DE" would
> apply the filter of unicode chars to {ascii?} with the appropriate mapping
> for German, etc.
>
> Or is this that Lucene would / should take care of ?

The kind of filter Walter is talking about - a generalized language-aware
character normalization Solr/Lucene filter - does not yet exist. My guess
is that if/when it does materialize, both the Solr and the Lucene projects
will want to have it.

Historically, most functionality shared by Solr and Lucene is eventually
hosted by Lucene, since Solr has a Lucene dependency, but not vice-versa.

So, yes, Solr would be responsible for hosting the configuration for such
a filter, but acting on that configuration would be Lucene's
responsibility, assuming that Lucene would (eventually) host the filter
and Solr would host a factory over the filter.

Steve
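
P.S. In case it helps to see the wrapper pattern run on its own, here is a
minimal, self-contained sketch. It does not use the real Lucene or Solr
classes: every type whose name starts with "Sketch" is a hypothetical
stand-in I made up for illustration, and the folding step uses
java.text.Normalizer rather than the mapping tables of ISOLatin1AccentFilter
or LUCENE-1343. The only point is that the factory is a thin shell around
the filter.

```java
import java.text.Normalizer;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Lucene's TokenStream: returns null when exhausted.
interface SketchTokenStream {
    String next();
}

// Hypothetical stand-in for Solr's BaseTokenFilterFactory (configuration omitted).
abstract class SketchFilterFactory {
    abstract SketchTokenStream create(SketchTokenStream input);
}

// Simple source stream over a fixed list of tokens.
final class SketchListStream implements SketchTokenStream {
    private final Iterator<String> tokens;

    SketchListStream(List<String> tokens) {
        this.tokens = tokens.iterator();
    }

    public String next() {
        return tokens.hasNext() ? tokens.next() : null;
    }
}

// Stand-in for the Lucene-side filter. Assumption: accents are removed by
// NFD decomposition followed by dropping combining marks, which is NOT how
// the real Lucene filters are implemented.
final class SketchFoldingFilter implements SketchTokenStream {
    private final SketchTokenStream input;

    SketchFoldingFilter(SketchTokenStream input) {
        this.input = input;
    }

    public String next() {
        String token = input.next();
        if (token == null) {
            return null;
        }
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}

// The Solr-side part: a factory whose only job is to wrap the filter.
final class SketchFoldingFilterFactory extends SketchFilterFactory {
    SketchTokenStream create(SketchTokenStream input) {
        return new SketchFoldingFilter(input);
    }
}

class FactorySketch {
    public static void main(String[] args) {
        SketchTokenStream stream = new SketchFoldingFilterFactory()
            .create(new SketchListStream(List.of("caf\u00e9", "na\u00efve")));
        // Print each folded token in order.
        for (String t = stream.next(); t != null; t = stream.next()) {
            System.out.println(t);
        }
    }
}
```

All of the actual work lives in the filter class; the factory adds one
method. That is why hosting the filter in Lucene and a factory in Solr
costs so little on the Solr side.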