Thanks for the suggestions, I'll try the WordDelimiterFilterFactory. My aim is not a perfect analysis, just a way to quickly search for words in the whole history of a codebase. :)
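
A rough sketch of the effect in question: once tokens are split on punctuation, a dotted class name is indexed as separate parts, so a search for the last part can match. This is plain Python, not Solr code. `split_on_punctuation` is a hypothetical helper that only approximates a punctuation-splitting filter such as WordDelimiterFilterFactory; the real filter has more options that are not modeled here.

```python
import re


def split_on_punctuation(text):
    """Split text into lowercase tokens at every run of non-alphanumeric characters.

    Hypothetical helper for illustration only; it is not part of Solr or Lucene.
    """
    return [part.lower() for part in re.split(r"[^A-Za-z0-9]+", text) if part]


# Without splitting, the whole dotted name would be a single token.
tokens = split_on_punctuation("MyLib.Helpers.MyClass")
print(tokens)               # ['mylib', 'helpers', 'myclass']
print("myclass" in tokens)  # True: a search for MyClass can now match
```

Lowercasing both the indexed tokens and the query term is an assumption made here to keep the match case-insensitive.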

--
Gian Maria Ricci
Mobile: +39 320 0136949
<http://mvp.microsoft.com/en-us/mvp/Gian%20Maria%20Ricci-4025635>
<http://www.linkedin.com/in/gianmariaricci>
<https://twitter.com/alkampfer>
<http://feeds.feedburner.com/AlkampferEng>
<skype://alkampferaok/>

From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, June 13, 2013 1:24 PM
To: solr-user@lucene.apache.org; Gian Maria Ricci
Subject: Re: analyzer for Code

Well, WordDelimiterFilterFactory would split on the punctuation, so you could add it to the analyzer chain along with StandardAnalyzer. You could use one of the regex filters to break up tokens that make it through the analyzer as you see fit.

But in general, this will be a bunch of compromises since programming languages are, shall we say, not standard <G>

Best
Erick

On Thu, Jun 13, 2013 at 4:19 AM, Gian Maria Ricci <alkamp...@nablasoft.com> wrote:

I did a little search around and did not find anything interesting. Does anyone know if some analyzers exist to better index source code (e.g. C#, C++, Java, etc.)?

Standard analyzer is quite good, but I wish to know if there are some more specific analyzers that can do better indexing. E.g., I did a little try with C# and the full class name was indexed without splitting by dots. So MyLib.Helpers.MyClass becomes one token, and when I search for MyClass I do not find matches.

Thanks in advance.