Thanks for the suggestions, I'll try the WordDelimiterFilterFactory. My
aim is not a perfect analysis, just a way to quickly search for words
in the whole history of a codebase. :)

 

--

Gian Maria Ricci

Mobile: +39 320 0136949

 <http://mvp.microsoft.com/en-us/mvp/Gian%20Maria%20Ricci-4025635>
<http://www.linkedin.com/in/gianmariaricci>
<https://twitter.com/alkampfer>   <http://feeds.feedburner.com/AlkampferEng>
<skype://alkampferaok/> 

 

 

From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, June 13, 2013 1:24 PM
To: solr-user@lucene.apache.org; Gian Maria Ricci
Subject: Re: analyzer for Code

 

Well, WordDelimiterFilterFactory would split on the punctuation, so
you could add it to the analyzer chain along with StandardAnalyzer.

 

You could use one of the regex filters to break up tokens that make it
through the analyzer as you see fit.
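
As a rough illustration of the kind of splitting discussed here, the
following is a minimal Python sketch, not Solr's implementation. It assumes
that tokens should be broken on any punctuation and on lower-to-upper case
changes, then lowercased. The function name split_identifier is
hypothetical and only used for this example.

```python
import re


def split_identifier(token):
    """Split a code token on punctuation and on lower-to-upper case changes."""
    parts = []
    # First break on any run of non-alphanumeric characters (dots, underscores, ...).
    for chunk in re.split(r"[^A-Za-z0-9]+", token):
        if not chunk:
            continue
        # Then mark each lowercase/digit -> uppercase boundary with a space and split there.
        spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", chunk)
        parts.extend(spaced.split())
    # Lowercase everything so that matching is case-insensitive.
    return [part.lower() for part in parts]


if __name__ == "__main__":
    print(split_identifier("MyLib.Helpers.Myclass"))
```

With this kind of splitting, the class name from the original question is
indexed as separate parts, so a search for the class name alone can match.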

 

But in general, this will be a bunch of compromises since programming
languages are, shall we say, not standard <G>

 

Best
Erick

 

On Thu, Jun 13, 2013 at 4:19 AM, Gian Maria Ricci <alkamp...@nablasoft.com>
wrote:

I searched around a little and did not find anything interesting. Does anyone
know whether any analyzers exist that index source code better (e.g. C#, C++,
Java, etc.)?

 

The standard analyzer is quite good, but I would like to know whether there
are more specific analyzers that index code better. For example, I did a quick
test with C#, and the full class name was indexed without being split on the
dots. So MyLib.Helpers.Myclass becomes one token, and when I search for
MyClass I do not find any matches.

 

Thanks in advance.

 
