Hello,

I am new to Solr/Lucene. I have been tasked with indexing a large number of 
documents, some of which contain numbers with decimal points or commas. I am 
looking for a way to index these documents so that adjacent numeric characters 
(runs of [0-9.,]) are treated as a single token. For example,

12.34 => "12.34"
12,345 => "12,345"

However, "," and "." should be treated as usual when around non-digital 
characters. For example,

ab,cd => "ab" "cd".

The goal is that searching for "12.34" will match "12.34" but not "12 34", 
while searching for "ab.cd" will match both "ab.cd" and "ab cd".

After doing some research on Solr, it seems there is a built-in token filter, 
solr.WordDelimiterFilterFactory, that supports a "types" attribute for mapping 
individual characters to different character classes (ALPHA, DIGIT, etc.). 
However, it isn't exactly what I want: there is no context check, such as 
requiring that "," or "." be surrounded by digit characters before they are 
kept.
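
For reference, this is roughly the kind of configuration I have been 
experimenting with (the field type name "text_num" and the mapping file name 
"wdfftypes.txt" are just placeholders I made up):

    <fieldType name="text_num" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- remap '.' and ',' to DIGIT via wdfftypes.txt so they no longer
             trigger splits; the problem is that the remapping applies
             everywhere, not only between digits -->
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0"
                splitOnCaseChange="1"
                types="wdfftypes.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

with wdfftypes.txt containing:

    # treat '.' and ',' (escaped as \u002C) like digit characters
    . => DIGIT
    \u002C => DIGIT

With this, "12.34" and "12,345" survive as single tokens, but "ab.cd" is also 
affected because the mapping has no notion of the surrounding characters.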

Does anyone have experience configuring Solr to meet these requirements? Is 
writing my own plugin necessary for something this simple?

Thanks in advance!

-Jian
