Hi,

I may not have stated my aim clearly enough, or I may be using the wrong terms, so I'll restate what I want to be able to do:
- I have a fixed set of words and phrases, some of which I expect to find in
  the documents I want to process. I call this set my taxonomy.
- I have many documents to process, and the primary thing I want to do with
  them is match their content against my taxonomy.
- For every document processed, I need to retrieve the matches with the
  taxonomy.

If I were able to do the above I'd be quite happy.

Is this the type of thing that Solr is well suited to? If so, is Luke the
right mechanism to retrieve the matched taxonomy terms?

Thanks,
CPH

On Wed 10 Mar 2010 01:17:25 Erick Erickson wrote:
> Well, the LukeRequestHandler lets you peek at the
> index, see:
> http://wiki.apache.org/solr/LukeRequestHandler
>
> warning: it'll take a bit for this to make lots of sense.
>
> You can get a copy of Luke (google Lucene Luke), which is
> what the above is based on, point it at your index and
> have at it.
>
> One bit of warning though. It'll be easy to confuse
> what you stored (which is just a raw copy of
> your input) with what you indexed (which is
> what's searched on). If you're looking at either tool
> and what you see looks suspiciously like
> your raw data, look further to see if you can find
> the terms...
>
> To answer your question about searching, it all depends
> (tm). What do you mean by Taxonomy? Different
> people use that term...er...differently. Some example
> inputs and how searching should behave in your
> problem space would be very helpful.
>
> HTH
> Erick
>
> On Tue, Mar 9, 2010 at 7:53 PM, CP Hennessy <cp.henne...@openapp.ie> wrote:
> > Hi,
> >
> > I'm trying to figure out if SOLR is the component I need and, if so, that
> > I'm asking the right questions :)
> >
> > I need to index a large set of multilingual documents against a project
> > specific taxonomy.
> >
> > From what I've read SOLR should be perfect for this.
> >
> > However I'm not sure that my approach is correct. I've been able to run
> > the example solr setup and index the given documents.
> >
> > Now I want to add my taxonomy (in English first), and this is where I'm
> > stumbling (or not understanding the documentation).
> >
> > To do this I understand that I need to define a field to store the result
> > of the taxonomy analysis. I also need to define the analysis steps used to
> > generate the values for this field (lowercase, synonyms, stemming, etc).
> >
> > In the file solr/conf/schema.xml in the <types> I've added:
> >
> > <fieldType name="Taxonomy" class="solr.TextField" indexed="True">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SynonymFilterFactory"
> >             synonyms="ontology-synonyms.txt" ignoreCase="true" expand="true"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English"/>
> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >     <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"
> >             ignoreCase="true"/>
> >   </analyzer>
> > </fieldType>
> >
> > and
> >
> > <field name="taxonomy" type="Taxonomy" indexed="true" stored="true"
> >        required="true" multiValued="true"/>
> >
> > I am able to test my fieldType through the /solr/admin/analysis.jsp page
> > and it seems to be doing what I expect.
> >
> > When I now add a test document containing several words from the
> > keepwords.txt file the result seems to indicate that it was processed
> > correctly.
> >
> > How can I get the details of what has been indexed for my file?
> >
> > Also I do not know how to perform a search based on the taxonomy?
> >
> > Any pointers would be greatly appreciated.
> >
> > Thanks in advance,
> > CPH
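
To make the goal above concrete, here is a minimal sketch in plain Python (not Solr code) of what I expect the quoted index-time chain to produce for one document: split on whitespace, lowercase, and keep only tokens that appear in the taxonomy. The function name, the word list, and the sample text are all made up for illustration. Synonym expansion and stemming are left out, and the de-duplication here is across the whole document, which is my simplification and not what RemoveDuplicatesTokenFilterFactory does (that filter only drops duplicate tokens at the same position).

```python
def analyze(text, keep_words):
    """Rough stand-in for the quoted chain: whitespace tokenize,
    lowercase, keep only taxonomy words, list each match once."""
    keep = {w.lower() for w in keep_words}   # like ignoreCase="true"
    seen = set()
    matches = []
    for token in text.split():               # whitespace tokenization only
        term = token.lower()
        if term in keep and term not in seen:
            seen.add(term)
            matches.append(term)             # first-seen order is preserved
    return matches


# Hypothetical taxonomy; a real one would be read from keepwords.txt.
KEEP_WORDS = ["solr", "taxonomy", "index"]

doc = "Solr can index documents against a Taxonomy and Solr is fast"
print(analyze(doc, KEEP_WORDS))
```

Since the matching is done token by token, a multi-word phrase in the taxonomy would never match in a sketch like this, and a token with punctuation attached (such as "Solr,") would not match either, because nothing strips the punctuation.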