There's a particular confusion I've had with the Solr schema and plugins. Though this stuff is "obvious" to the gurus, from looking around I gather I wasn't alone in my confusion.
I believe I understand it now and want to capture it on the Wiki, but I'm double-checking first... and maybe the gurus will have some additional comments?

Two Syntaxes AND Two Plugin Sets

There is an abbreviated syntax for specifying plugins in the schema, and a more powerful syntax that is preferred. Also, Solr supports its own Solr-specific plugins, and is also compatible with Lucene plugins.

These two differences tend to coincide. Solr plugins use the more modern, longer, more powerful syntax, whereas Lucene plugins generally must use the abbreviated syntax OR go through a custom adapter class (see below).

Two Syntaxes for Defining Field Type Plugins:

Abbreviated Syntax:

  <fieldType name=... class=...>
    <analyzer class="....SomeAnalyzer" />
    <!-- Do not put additional plugins here -->
  </fieldType>

Modern Syntax:

  <fieldType name=... class=...>
    <analyzer>
      <tokenizer class="....SomeTokenizer" />
      <filter class="....SomeFilter" />
      <!-- other filters ... -->
    </analyzer>
  </fieldType>

Of course, with the newer syntax you can have multiple <analyzer> blocks, one for index time and one for search time. And the filters can take options, etc.

This is confusing because the <analyzer> tag can EITHER have a class= attribute OR nested subelements, usually <tokenizer> and <filter>. You should not do both!

Further, the main <fieldType> element also takes a class attribute, which is required, but this is a separate class (...could use some narrative as to why....)

Two Common Sources of Plugins:

When looking at schema configurations you find online, it's very important to notice the prefix of each class name. Classes starting with "org.apache.solr.analysis." or the shorthand "solr." are Solr-specific, and will use the longhand syntax. Classes starting with "org.apache.lucene.analysis."
are NOT native Solr plugins and must EITHER use the shorthand syntax (which limits your functionality), or go through a custom adapter class.

This is generally a good thing. There are quite a few Lucene plugins out there, and Solr can use any of them "out of the box" without the need to break out a Java compiler. However, when used in this compatibility mode, you give up some functionality. And you can't just use the longer syntax with the Lucene plugins; the advanced syntax isn't directly compatible with them (at this time). If you want the advantages of the long form syntax, you need to use a Lucene-to-Solr adapter class, often called a "factory" class. There is a nice thread about the adapter class you need (link below).

Examples of Right and Wrong Configurations

Asian-language Solr users will often want to use the CJK processor (CJK = Chinese, Japanese and Korean). They will typically use the base Lucene plugin, but in various configurations.

Examples using CJK Plugins:

  <!-- Correct Short form using Lucene compatible syntax -->
  <fieldType name="text_cjk" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
  </fieldType>

  <!-- Incorrect attempt to use long form with Lucene plugins -->
  <fieldType name="text_cjk" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
    <!-- Wrong: won't be used! -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- ... other filters ... -->
  </fieldType>

  <!-- Correct Long Form syntax for Lucene plugins THAT HAVE AN ADAPTER -->
  <fieldType name="text_cjk" class="solr.TextField">
    <analyzer>
      <!-- This ONLY works if you have an adapter class -->
      <tokenizer class="solr.CJKTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <!-- ... other filters ... -->
    </analyzer>
  </fieldType>
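As mentioned earlier, the long form also lets a field type declare separate index-time and query-time analyzers. Below is a rough sketch of what that might look like for CJK text. It makes the same assumption as the last example: that an adapter such as solr.CJKTokenizerFactory is available in your installation. The query-time SynonymFilterFactory line, the synonyms.txt file, and the field name body_cjk are hypothetical, chosen only for illustration. The sketch also shows where the required <fieldType> class (solr.TextField here) sits relative to the analyzer classes: a <field> refers to the field type by name, and the field type carries the analyzers.

```xml
<!-- Fragment of schema.xml: a field type plus a field that uses it -->
<types>
  <!-- class= here names the field type implementation, not an analyzer -->
  <fieldType name="text_cjk" class="solr.TextField">
    <!-- Applied when documents are indexed -->
    <analyzer type="index">
      <!-- Assumes a CJK adapter (factory) class is available -->
      <tokenizer class="solr.CJKTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <!-- Applied when query text is analyzed -->
    <analyzer type="query">
      <tokenizer class="solr.CJKTokenizerFactory"/>
      <!-- Hypothetical: query-time synonym expansion from synonyms.txt -->
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <!-- Hypothetical field name; type= refers to the fieldType name above -->
  <field name="body_cjk" type="text_cjk" indexed="true" stored="true"/>
</fields>
```

Both analyzers use the same tokenizer, so indexed text and query text are split the same way. Only the filter chains differ.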
Later on in the thread, the discussion evolves into whether or not to make an "uber" Lucene class loader, and the performance impact that might have. The thread is here:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg04487.html

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513