Well, the LukeRequestHandler lets you peek at the index, see: http://wiki.apache.org/solr/LukeRequestHandler
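
If you'd rather script the request than paste it into a browser, something like the sketch below should do. It rests on assumptions that are not from your setup: the stock example server on localhost:8983, the Luke handler registered at /admin/luke, and JSON output via wt=json. The helper names luke_url and fetch_luke are made up for illustration. The field name "taxonomy" is the one from your schema.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: stock example Solr on localhost:8983 with the Luke handler
# registered at /admin/luke. Adjust for your own install.
SOLR_BASE = "http://localhost:8983/solr"


def luke_url(field, num_terms=20, base=SOLR_BASE):
    """Build a LukeRequestHandler URL asking for the top terms of one field."""
    params = [
        ("fl", field),            # restrict the report to this field
        ("numTerms", num_terms),  # how many top terms to list
        ("wt", "json"),           # assumption: JSON response writer is enabled
    ]
    return base + "/admin/luke?" + urlencode(params)


def fetch_luke(field, num_terms=20, base=SOLR_BASE):
    """Fetch and decode the Luke report (needs a running Solr instance)."""
    with urlopen(luke_url(field, num_terms, base)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only prints the URL; call fetch_luke("taxonomy") against a live server.
    print(luke_url("taxonomy"))
```

The terms listed in that report are what actually got indexed for the field, after your analysis chain has run.
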
Warning: it'll take a bit for this to make lots of sense. You can get a copy of
Luke (google Lucene Luke), which the above is based on, point it at your
index and have at it.

One bit of warning though. It'll be easy to confuse what you stored (which is
just a raw copy of your input) with what you indexed (which is what's
searched on). If you're looking at either tool and what you see looks
suspiciously like your raw data, look further to see if you can find the
terms...

To answer your question about searching, it all depends (tm). What do you
mean by Taxonomy? Different people use that term...er...differently. Some
example inputs and how searching should behave in your problem space would
be very helpful.

HTH
Erick

On Tue, Mar 9, 2010 at 7:53 PM, CP Hennessy <cp.henne...@openapp.ie> wrote:

> Hi,
> I'm trying to figure out if SOLR is the component I need and if so that
> I'm asking the right questions :)
>
> I need to index a large set of multilingual documents against a project
> specific taxonomy.
>
> From what I've read SOLR should be perfect for this.
>
> However I'm not sure that my approach is correct. I've been able to run the
> example solr setup and index the given documents.
>
> Now I want to add my taxonomy (in English first), and this is where I'm
> stumbling (or not understanding the documentation).
>
> To do this I understand that I need to define a field to store the result of
> the taxonomy analysis. I also need to define the analysis steps used to
> generate the values for this field ( lowercase, synonyms, stemming, etc).
>
> In the file solr/conf/schema.xml in the <types> I've added :
>
> <fieldType name="Taxonomy" class="solr.TextField" indexed="True">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="ontology-synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
>   </analyzer>
> </fieldType>
>
> and
>
> <field name="taxonomy" type="Taxonomy" indexed="true" stored="true" required="true" multiValued="true"/>
>
> I am able to test my fieldType thru the /solr/admin/analysis.jsp page and it
> seems to be doing what I expect.
>
> When I now add a test document containing several words from the
> keepwords.txt file the result seems to indicate that it was processed
> correctly.
>
> How can I get the details of what has been indexed for my file?
>
> Also I do not know how to perform a search based on the taxonomy ?
>
> Any pointers would be greatly appreciated.
>
> Thanks in advance,
> CPH