Re: Using SOLR

Erick Erickson Wed, 10 Mar 2010 17:25:44 -0800

Luke won't help you "retrieve the matched taxonomy",
it just lets you look at your index and run queries against
it....
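
If you'd rather peek at the indexed terms over HTTP than in the Luke GUI,
something like the following builds a LukeRequestHandler URL. This is only
a sketch: the host, port and /solr/admin/luke path assume the stock example
setup, luke_url is a made-up helper name, and "taxonomy" is the field from
your schema. The fl and numTerms parameters are the handler's own.

```python
from urllib.parse import urlencode


def luke_url(field, num_terms=20, base="http://localhost:8983/solr"):
    """Build a LukeRequestHandler URL listing the top indexed terms of a field.

    Assumptions: `base` points at the stock example Solr instance and the
    handler is registered at /admin/luke.
    """
    params = {"fl": field, "numTerms": num_terms, "wt": "json"}
    return base + "/admin/luke?" + urlencode(params)


# Paste the printed URL into a browser or fetch it with any HTTP client.
print(luke_url("taxonomy"))
```

The terms that come back are what was indexed, not the stored raw text.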


WARNING: I haven't personally used MoreLikeThis
functionality, but it sounds like that's at least in
the ballpark if you consider your Taxonomy a document
and want the list of documents that are similar...

I don't know how you'd get the matches though...
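
For what it's worth, the matching step you describe can be sketched
outside Solr entirely. The snippet below is plain Python, not a Solr API:
find_taxonomy_matches and the sample data are invented for illustration,
and it only lowercases and splits on non-word characters (no synonyms,
no stemming), so it is much cruder than your analyzer chain.

```python
import re


def find_taxonomy_matches(text, taxonomy):
    """Return the taxonomy entries (words or phrases) that occur in text."""
    # Normalise the document: lowercase, split on non-word characters.
    tokens = re.findall(r"\w+", text.lower())
    # Pad with spaces so entries only match on whole-token boundaries.
    padded = " " + " ".join(tokens) + " "
    matches = []
    for entry in taxonomy:
        entry_tokens = re.findall(r"\w+", entry.lower())
        if not entry_tokens:
            continue  # skip empty entries
        needle = " " + " ".join(entry_tokens) + " "
        if needle in padded:
            matches.append(entry)
    return matches


# Hypothetical sample data, for illustration only.
taxonomy = ["search engine", "index", "stemming"]
doc = "A Search Engine builds an index of every document."
print(find_taxonomy_matches(doc, taxonomy))  # ['search engine', 'index']
```

Since there is no stemming here, "indexes" would not match "index".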

er...@notverymuchhelp.


2010/3/10 CP Hennessy <cp.henne...@openapp.ie>

> Hi,
>
> I may not have stated my aim clearly enough or in case I'm using the wrong
> terms, I'll restate what I want to be able to do:
>
> - I have a fixed set of words and phrases some of which I expect to find in
> the documents I want to process. This set I call my taxonomy.
>
> - I have many documents to process from which the primary thing I want to be
> able to do is to match the content with my taxonomy.
>
> - For every document processed I need to retrieve the matches with the
> taxonomy.
>
> If I was able to do the above I'd be quite happy.
>
> Is this the type of thing that Solr is well suited to?
>
> If so, is Luke the right mechanism to retrieve the matched taxonomy
> terms?
>
> Thanks,
> CPH
>
>
> On Wed 10 Mar 2010 01:17:25 Erick Erickson wrote:
> > Well, the LukeRequestHandler lets you peek at the
> > index, see:
> > http://wiki.apache.org/solr/LukeRequestHandler
> >
> > warning: it'll take a bit for this to make lots of sense.
> >
> > You can get a copy of Luke (google Lucene Luke) for
> > what the above is based on, point it at your index and
> > have at it.
> >
> > One bit of warning though. It'll be easy to confuse
> > what you stored (which is just a raw copy of
> > your input) with what you indexed (which is
> > what's searched on). If you're looking at either tool
> > and what you see looks suspiciously like
> > your raw data, look further to see if you can find
> > the terms...
> >
> > To answer your question about searching, it all depends
> > (tm). What do you mean by Taxonomy? Different
> > people use that term...er...differently. Some example
> > inputs and how searching should behave in your
> > problem space would be very helpful.
> >
> > HTH
> > Erick
> >
> > On Tue, Mar 9, 2010 at 7:53 PM, CP Hennessy <cp.henne...@openapp.ie> wrote:
> > > Hi,
> > >
> > >  I'm trying to figure out if SOLR is the component I need and if so
> > > that I'm asking the right questions :)
> > >
> > > I need to index a large set of multilingual documents against a project
> > > specific taxonomy.
> > >
> > > From what I've read SOLR should be perfect for this.
> > >
> > > However I'm not sure that my approach is correct. I've been able to run
> > > the example solr setup and index the given documents.
> > >
> > > Now I want to add my taxonomy (in English first), and this is where I'm
> > > stumbling (or not understanding the documentation).
> > >
> > > To do this I understand that I need to define a field to store the
> > > result of the taxonomy analysis. I also need to define the analysis steps
> > > used to generate the values for this field (lowercase, synonyms, stemming,
> > > etc).
> > >
> > > In the file solr/conf/schema.xml, in the <types> section, I've added:
> > >    <fieldType name="Taxonomy" class="solr.TextField" indexed="True">
> > >      <analyzer type="index">
> > >        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >        <filter class="solr.LowerCaseFilterFactory"/>
> > >        <filter class="solr.SynonymFilterFactory"
> > >                synonyms="ontology-synonyms.txt" ignoreCase="true" expand="true"/>
> > >        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
> > >        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> > >        <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"
> > >                ignoreCase="true"/>
> > >      </analyzer>
> > >    </fieldType>
> > >
> > > and
> > >
> > >   <field name="taxonomy" type="Taxonomy" indexed="true" stored="true"
> > >          required="true" multiValued="true"/>
> > >
> > > I am able to test my fieldType through the /solr/admin/analysis.jsp page
> > > and it seems to be doing what I expect.
> > >
> > > When I now add a test document containing several words from the
> > > keepwords.txt file, the result seems to indicate that it was processed
> > > correctly.
> > >
> > > How can I get the details of what has been indexed for my file?
> > >
> > > Also, I do not know how to perform a search based on the taxonomy.
> > >
> > > Any pointers would be greatly appreciated.
> > >
> > > Thanks in advance,
> > > CPH
>
