Re: Question about solr.WordDelimiterFilterFactory

2012-04-12 Thread Jian Xu

Re: Question about solr.WordDelimiterFilterFactory

2012-04-12 Thread Erick Erickson
WordDelimiterFilterFactory will _almost_ do what you want by setting things like catenateWords=0 and catenateNumbers=1, _except_ that the punctuation will be removed. So

12.34 -> 1234
ab,cd -> ab cd

Is that "close enough"? Otherwise, writing a simple Filter is probably the way to go.

Best
Erick
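The per-token decision that such a "simple Filter" would make can be sketched in plain Java. This is a minimal, hypothetical illustration and does not use the Lucene API: a real filter would extend Lucene's TokenFilter and do this work inside incrementToken(). The class name NumberPreservingSplitter, the assumption that the input has already been split on whitespace, and the definition of a "number" (digits with optional internal '.' or ',' separators) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper showing the per-token logic of a "simple Filter":
 * numeric tokens such as 12.34 or 1,000 pass through intact, while other
 * tokens are split on punctuation. Not the Lucene TokenFilter API.
 */
public class NumberPreservingSplitter {

    // Assumption: a "number" is digits with optional internal '.' or ',' separators.
    private static final Pattern NUMBER = Pattern.compile("\\d+(?:[.,]\\d+)*");

    // Any run of characters that is neither a letter nor a digit acts as a delimiter.
    private static final Pattern DELIMITER = Pattern.compile("[^\\p{L}\\p{N}]+");

    public static List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (NUMBER.matcher(token).matches()) {
                out.add(token);              // keep the punctuation inside numbers
            } else {
                for (String part : DELIMITER.split(token)) {
                    if (!part.isEmpty()) {   // split() yields "" for a leading delimiter
                        out.add(part);       // emit word parts, similar to WDF's word splitting
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tokens as a whitespace tokenizer might produce them (assumption).
        System.out.println(filter(List.of("12.34", "ab,cd")));
    }
}
```

With these inputs the sketch keeps "12.34" whole and splits "ab,cd" into "ab" and "cd". Note that a token carrying sentence punctuation, such as "12.34." at the end of a sentence, does not match the number pattern and would be split into "12" and "34"; a real filter would need to decide how to handle that case.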

Question about solr.WordDelimiterFilterFactory

2012-04-11 Thread Jian Xu
Hello,

I am new to Solr/Lucene, and I have been tasked with indexing a large number of documents. Some of these documents contain decimal points. I am looking for a way to index these documents so that adjacent numeric characters (such as [0-9.,]) are treated as a single token. For example, 12.34 => "12.34" 12,
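The tokenization asked for here can be stated precisely as a regular expression. The sketch below is a hypothetical, plain java.util.regex illustration, not a Solr or Lucene API. It assumes that a numeric token is a run of digits joined by '.' or ',' (so a trailing comma or period used as sentence punctuation is not part of the number) and that letters form separate word tokens; those rules are assumptions, since the original examples are cut off. Solr's solr.PatternTokenizerFactory, with its group attribute set to 0, emits each regex match as a token, so a pattern along these lines might be usable there directly; check the documentation for your Solr version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the requested tokenization: digits joined by '.'
 * or ',' become one token, runs of letters become word tokens, and every
 * other character separates tokens.
 */
public class NumericRunTokenizer {

    // The number alternative only extends across '.' or ',' when another digit follows,
    // so "12.34," yields "12.34" and the trailing comma is dropped.
    private static final Pattern TOKEN = Pattern.compile("\\d+(?:[.,]\\d+)*|\\p{L}+");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());   // each match is one token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12.34 ab,cd"));
    }
}
```

For "12.34 ab,cd" this produces the tokens "12.34", "ab" and "cd", which keeps the decimal number intact while still splitting the comma-joined words.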