Hello, I am new to Solr/Lucene. I have been tasked with indexing a large number of documents, some of which contain decimal points. I am looking for a way to index these documents so that adjacent numeric characters (e.g. [0-9.,]) are treated as a single token. For example:
12.34 => "12.34" 12,345 => "12,345" However, "," and "." should be treated as usual when around non-digital characters. For example, ab,cd => "ab" "cd". It is so that searching for "12.34" will match "12.34" not "12 34". Searching for "ab.cd" should match both "ab.cd" and "ab cd". After doing some research on solr, It seems that there is a build-in analyzer called solr.WordDelimiterFilter that supports a "types" attribute which map special characters as different delimiters. However, it isn't exactly what I want. It doesn't provide context check such as "," or "." must surround by digital characters, etc. Does anyone have any experience configuring solr to meet this requirements? Is writing my own plugin necessary for this simple thing? Thanks in advance! -Jian