Hi, I'm trying to figure out whether SOLR is the component I need and, if so, whether I'm asking the right questions :)
I need to index a large set of multilingual documents against a project-specific taxonomy. From what I've read, SOLR should be perfect for this. However, I'm not sure my approach is correct.

I've been able to run the example Solr setup and index the documents that come with it. Now I want to add my taxonomy (in English first), and this is where I'm stumbling (or not understanding the documentation).

To do this, I understand that I need to define a field to store the result of the taxonomy analysis. I also need to define the analysis steps used to generate the values for this field (lowercasing, synonyms, stemming, etc.).

In the file solr/conf/schema.xml, inside <types>, I've added:

  <fieldType name="Taxonomy" class="solr.TextField" indexed="true">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="ontology-synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
    </analyzer>
  </fieldType>

and:

  <field name="taxonomy" type="Taxonomy" indexed="true" stored="true" required="true" multiValued="true"/>

I am able to test my fieldType through the /solr/admin/analysis.jsp page, and it seems to do what I expect. When I now add a test document containing several words from the keepwords.txt file, the result seems to indicate that it was processed correctly.

How can I see the details of what has been indexed for my file?

Also, I don't know how to perform a search based on the taxonomy. Any pointers would be greatly appreciated.

Thanks in advance,
CPH