Hi,
I'm trying to figure out whether SOLR is the component I need and, if so, 
whether I'm asking the right questions :)

I need to index a large set of multilingual documents against a 
project-specific taxonomy.

From what I've read, SOLR should be well suited to this.

However, I'm not sure that my approach is correct. I've been able to run the 
example Solr setup and index the documents that come with it.

Now I want to add my taxonomy (in English first), and this is where I'm 
stumbling (or failing to understand the documentation).

To do this, I understand that I need to define a field to store the result of 
the taxonomy analysis. I also need to define the analysis steps used to 
generate the values for this field (lowercasing, synonyms, stemming, etc.).

In solr/conf/schema.xml, inside the <types> section, I've added:

    <fieldType name="Taxonomy" class="solr.TextField" indexed="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="ontology-synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
      </analyzer>
    </fieldType>

and, in the <fields> section:

    <field name="taxonomy" type="Taxonomy" indexed="true" stored="true" required="true" multiValued="true"/>

I am able to test my fieldType through the /solr/admin/analysis.jsp page, and 
it seems to be doing what I expect.
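For comparison with the analysis.jsp output, here is a rough pure-Python approximation of what the index analyzer above does, one filter per step. It is only a sketch under stated assumptions: synonym groups are single words, tokens produced by the synonym expansion of one input token are treated as sitting at the same position, and the SnowballPorterFilterFactory step is skipped entirely because there is no standard-library stemmer. The function name `analyze` and the sample terms are hypothetical.

```python
# Rough approximation of the "Taxonomy" index analyzer (illustration only).
# Assumptions: single-word synonym groups, no stemming step.

def analyze(text, synonym_groups, keep_words):
    """Return the tokens that would survive the chain, grouped by input token."""
    # ignoreCase="true", expand="true": every term maps to its whole group.
    synonyms = {}
    for group in synonym_groups:
        lowered = [term.lower() for term in group]
        for term in lowered:
            synonyms[term] = lowered
    keep = {word.lower() for word in keep_words}

    positions = []
    for token in text.split():                   # WhitespaceTokenizerFactory
        token = token.lower()                    # LowerCaseFilterFactory
        expanded = synonyms.get(token, [token])  # SynonymFilterFactory
        # SnowballPorterFilterFactory: omitted in this sketch.
        deduped = list(dict.fromkeys(expanded))  # RemoveDuplicates (same position)
        kept = [t for t in deduped if t in keep] # KeepWordFilterFactory
        if kept:
            positions.append(kept)
    return positions


if __name__ == "__main__":
    groups = [["ontology", "taxonomy"], ["car", "automobile"]]
    keep = ["taxonomy", "ontology", "automobile"]
    print(analyze("The Taxonomy of a Car", groups, keep))
```

Because the real chain stems before KeepWordFilterFactory runs, the entries in keepwords.txt would presumably have to match the stemmed forms; this sketch cannot show that effect.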

When I then add a test document containing several words from keepwords.txt, 
the response seems to indicate that it was processed correctly.

How can I see the details of what was indexed for my document?


Also, I don't know how to perform a search based on the taxonomy. How would I do that?
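Presumably the starting point is a fielded query (field name, colon, term) sent to the standard select handler. The sketch below only builds such a URL and does not contact a server. The host and port (`localhost:8983`, the usual example Jetty setup), the `id` field in the `fl` list, the function name `taxonomy_query_url`, and the sample term are all assumptions for illustration.

```python
from urllib.parse import urlencode

# Hypothetical base URL for the example setup's select handler.
SOLR_SELECT = "http://localhost:8983/solr/select"


def taxonomy_query_url(term, rows=10):
    """Build a select URL whose query is restricted to the taxonomy field."""
    params = {
        "q": f"taxonomy:{term}",  # fielded query on the taxonomy field
        "fl": "id,taxonomy",      # fields to return (stored values)
        "rows": rows,             # maximum number of hits to return
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


if __name__ == "__main__":
    print(taxonomy_query_url("ontology"))
```

Since the field is stored, the `taxonomy` values returned would be the original input text, not the analyzed tokens.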

Any pointers would be greatly appreciated.

Thanks in advance,
CPH
