Luke won't help you "retrieve the matched taxonomy"; it just lets you look at your index and run queries against it...
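For what it's worth, here is a minimal sketch of peeking at the index through the LukeRequestHandler from a script. The host, port and `/solr/admin/luke` path are assumptions based on the stock example setup, `taxonomy` is the field name used further down this thread, and `luke_url` is a hypothetical helper. The `fl` and `numTerms` parameters are the ones described on the LukeRequestHandler wiki page.

```python
from urllib.parse import urlencode

# Assumed base URL of the stock example Solr instance; adjust for your setup.
SOLR_BASE = "http://localhost:8983/solr"


def luke_url(field, num_terms=20, base=SOLR_BASE):
    """Build a LukeRequestHandler URL listing the top indexed terms of one field."""
    params = {
        "fl": field,            # restrict the report to this field
        "numTerms": num_terms,  # how many top terms to list for the field
        "wt": "json",           # response format
    }
    return base + "/admin/luke?" + urlencode(params)


if __name__ == "__main__":
    # Open the printed URL in a browser, or fetch it with urllib.request.urlopen.
    print(luke_url("taxonomy"))
```

This only shows which terms exist in the field across the index; it does not by itself tell you which taxonomy entries matched a particular document.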
WARNING: I haven't personally used the MoreLikeThis functionality, but it sounds like that's at least in the ballpark if you consider your Taxonomy a document and want the list of documents that are similar... I don't know how you'd get the matches, though... er...@notverymuchhelp.

2010/3/10 CP Hennessy <cp.henne...@openapp.ie>

> Hi,
>
> I may not have stated my aim clearly enough or in case I'm using the wrong
> terms, I'll restate what I want to be able to do:
>
> - I have a fixed set of words and phrases some of which I expect to find in
>   the documents I want to process. This set I call my taxonomy.
>
> - I have many documents to process from which the primary thing I want to be
>   able to do is to match the content with my taxonomy.
>
> - For every document processed I need to retrieve the matches with the
>   taxonomy.
>
> If I was able to do the above I'd be quite happy.
>
> Is this the type of thing that solr is well matched to do ?
>
> If so, then is luke the right mechanism to retrieve the matched taxonomy
> terms.
>
> Thanks,
> CPH
>
>
> On Wed 10 Mar 2010 01:17:25 Erick Erickson wrote:
> > Well, the LukeRequestHandler lets you peek at the
> > index, see:
> > http://wiki.apache.org/solr/LukeRequestHandler
> >
> > warning: it'll take a bit for this to make lots of sense.
> >
> > You can get a copy of Luke (google Lucene Luke) for
> > what the above is based on, point it at your index and
> > have at it.
> >
> > One bit of warning though. It'll be easy to confuse
> > what you stored (which is just a raw copy of
> > your input) with what you indexed (which is
> > what's searched on). If you're looking at either tool
> > and what you see looks suspiciously like
> > your raw data, look further to see if you can find
> > the terms...
> >
> > To answer your question about searching, it all depends
> > (tm). What do you mean by Taxonomy? Different
> > people use that term...er...differently.
> > Some example inputs and how searching should behave in your
> > problem space would be very helpful.
> >
> > HTH
> > Erick
> >
> > On Tue, Mar 9, 2010 at 7:53 PM, CP Hennessy <cp.henne...@openapp.ie> wrote:
> > > Hi,
> > >
> > > I'm trying to figure out if SOLR is the component I need and if so that
> > > I'm asking the right questions :)
> > >
> > > I need to index a large set of multilingual documents against a project
> > > specific taxonomy.
> > >
> > > From what I've read SOLR should be perfect for this.
> > >
> > > However I'm not sure that my approach is correct. I've been able to run
> > > the example solr setup and index the given documents.
> > >
> > > Now I want to add my taxonomy (in English first), and this is where I'm
> > > stumbling (or not understanding the documentation).
> > >
> > > To do this I understand that I need to define a field to store the result
> > > of the taxonomy analysis. I also need to define the analysis steps used to
> > > generate the values for this field ( lowercase, synonyms, stemming, etc).
> > >
> > > In the file solr/conf/schema.xml in the <types> I've added :
> > >
> > >     <fieldType name="Taxonomy" class="solr.TextField" indexed="True">
> > >       <analyzer type="index">
> > >         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >         <filter class="solr.LowerCaseFilterFactory"/>
> > >         <filter class="solr.SynonymFilterFactory" synonyms="ontology-synonyms.txt" ignoreCase="true" expand="true"/>
> > >         <filter class="solr.SnowballPorterFilterFactory" language="English"/>
> > >         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> > >         <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
> > >       </analyzer>
> > >     </fieldType>
> > >
> > > and
> > >
> > >     <field name="taxonomy" type="Taxonomy" indexed="true" stored="true" required="true" multiValued="true"/>
> > >
> > > I am able to test my fieldType thru the /solr/admin/analysis.jsp page and
> > > it seems to be doing what I expect.
> > >
> > > When I now add a test document containing several words from the
> > > keepwords.txt file the result seems to indicate that it was processed
> > > correctly.
> > >
> > > How can I get the details of what has been indexed for my file?
> > >
> > >
> > > Also I do not know how to perform a search based on the taxonomy ?
> > >
> > > Any pointers would be greatly appreciated.
> > >
> > > Thanks in advance,
> > > CPH
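
One possible way to approach the two open questions in the original post (what was indexed for a given document, and how to search on the taxonomy) is sketched below. It assumes the stock example setup at `http://localhost:8983/solr` with the standard `/select` handler and a unique key field named `id`; the helper names, document id and search term are made up for illustration. Faceting on the `taxonomy` field for a single document reports the indexed terms rather than the stored raw input, which is the stored-versus-indexed distinction warned about earlier in the thread.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed example URL


def indexed_terms_url(doc_id, field="taxonomy"):
    """URL that facets on `field` for one document, listing its indexed terms."""
    params = [
        ("q", "id:%s" % doc_id),   # assumes the unique key field is `id`
        ("rows", 0),               # only the facet counts are of interest
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", 1),     # skip terms this document does not contain
        ("wt", "json"),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


def search_by_term_url(term, field="taxonomy"):
    """URL that finds documents whose `field` contains the given term."""
    params = [
        ("q", "%s:%s" % (field, term)),
        ("fl", "id,score"),        # return only the id and relevance score
        ("wt", "json"),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(indexed_terms_url("doc1"))      # hypothetical document id
    print(search_by_term_url("ontology")) # hypothetical taxonomy term
```

Depending on how query-time analysis ends up configured for the field type, the search term may need to be given in its analyzed form (lowercased and stemmed) to match what was indexed. Neither helper escapes Solr query syntax characters in its arguments.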