Hi,

I may not have stated my aim clearly enough, or I may be using the wrong
terms, so I'll restate what I want to be able to do:

- I have a fixed set of words and phrases, some of which I expect to find in
the documents I want to process. I call this set my taxonomy.

- I have many documents to process, and the primary thing I want to do is
match their content against my taxonomy.

- For every document processed, I need to retrieve the taxonomy terms it
matched.

If I were able to do the above, I'd be quite happy.
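
To make the goal concrete, here is a minimal Python sketch of the behaviour
I'm after. It is only an illustration and has nothing to do with how Solr
works internally; the function name, the sample taxonomy and the sample
document are all made up for this example.

```python
import re


def find_taxonomy_matches(text, taxonomy):
    """Return the taxonomy entries that occur in text.

    Matching is case-insensitive and on whole words, so a multi-word
    phrase only matches when its words appear next to each other.
    """
    # Normalise the document to lowercase words separated by single spaces.
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    matches = []
    for entry in taxonomy:
        # Normalise each taxonomy entry the same way as the document.
        needle = " ".join(re.findall(r"\w+", entry.lower()))
        pattern = r"(?<!\w)" + re.escape(needle) + r"(?!\w)"
        if needle and re.search(pattern, normalized):
            matches.append(entry)
    return matches


# Hypothetical taxonomy and document, for illustration only.
taxonomy = ["solar panel", "wind turbine", "grid"]
document = "The new Solar Panel feeds the grid directly."
print(find_taxonomy_matches(document, taxonomy))
```

For the sample document I'd want to get back "solar panel" and "grid", but
not "wind turbine". No synonyms or stemming are handled in this sketch.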

Is this the type of thing that Solr is well suited to?

If so, is Luke the right mechanism to retrieve the matched taxonomy
terms?
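
For reference, this is roughly how I'd expect to ask the LukeRequestHandler
about a single document, going by the wiki page you linked. It is a sketch
under assumptions: the base URL is the one from the example setup, and
"doc-1" is a made-up document id. It only builds the request URL and does
not contact a server.

```python
from urllib.parse import urlencode


def luke_url(base_url, doc_id):
    """Build a LukeRequestHandler URL for one document.

    The "id" parameter is the document's uniqueKey value.
    """
    query = urlencode({"id": doc_id})
    return f"{base_url}/admin/luke?{query}"


# Assumed base URL of the example Solr setup and a hypothetical document id.
print(luke_url("http://localhost:8983/solr", "doc-1"))
```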

Thanks,
CPH


On Wed 10 Mar 2010 01:17:25 Erick Erickson wrote:
> Well, the LukeRequestHandler lets you peek at the
> index, see:
> http://wiki.apache.org/solr/LukeRequestHandler
> 
> warning: it'll take a bit for this to make lots of sense.
> 
> You can get a copy of Luke (google Lucene Luke) for
> what the above is based on, point it at your index and
> have at it.
> 
> One bit of warning though. It'll be easy to confuse
> what you stored (which is just a raw copy of
> your input) with what you indexed (which is
> what's searched on). If you're looking at either tool
> and what you see looks suspiciously like
> your raw data, look further to see if you can find
> the terms...
> 
> To answer your question about searching, it all depends
> (tm). What do you mean by Taxonomy? Different
> people use that term...er...differently. Some example
> inputs and how searching should behave in your
> problem space would be very helpful.
> 
> HTH
> Erick
> 
> On Tue, Mar 9, 2010 at 7:53 PM, CP Hennessy <cp.henne...@openapp.ie> wrote:
> > Hi,
> > 
> >  I'm trying to figure out if SOLR is the component I need and if so that
> > I'm asking the right questions :)
> > 
> > I need to index a large set of multilingual documents against a project
> > specific taxonomy.
> > 
> > From what I've read SOLR should be perfect for this.
> > 
> > However I'm not sure that my approach is correct. I've been able to run
> > the example solr setup and index the given documents.
> > 
> > Now I want to add my taxonomy (in English first), and this is where I'm
> > stumbling (or not understanding the documentation).
> > 
> > To do this I understand that I need to define a field to store the result
> > of the taxonomy analysis. I also need to define the analysis steps used to
> > generate the values for this field (lowercase, synonyms, stemming, etc).
> > 
> > In the file solr/conf/schema.xml in the <types> I've added :
> >    <fieldType name="Taxonomy" class="solr.TextField" indexed="True">
> >      <analyzer type="index">
> >        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >        <filter class="solr.LowerCaseFilterFactory"/>
> >        <filter class="solr.SynonymFilterFactory"
> >                synonyms="ontology-synonyms.txt" ignoreCase="true" expand="true"/>
> >        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
> >        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >        <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"
> >                ignoreCase="true"/>
> >      </analyzer>
> >    </fieldType>
> > 
> > and
> > 
> >   <field name="taxonomy" type="Taxonomy" indexed="true" stored="true"
> >          required="true" multiValued="true"/>
> > 
> > I am able to test my fieldType thru the /solr/admin/analysis.jsp page and
> > it seems to be doing what I expect.
> > 
> > When I now add a test document containing several words from the
> > keepwords.txt file the result seems to indicate that it was processed
> > correctly.
> > 
> > How can I get the details of what has been indexed for my file?
> > 
> > 
> > Also I do not know how to perform a search based on the taxonomy ?
> > 
> > Any pointers would be greatly appreciated.
> > 
> > Thanks in advance,
> > CPH
