> Thank you.
> 
> While interesting what I'm really after is a programmatic
> way to get at
> multi-word terms and their frequencies from a given
> document.  
> 
> Is this possible?
> 

What do you mean by a programmatic way? Do you mean without indexing? And by 
multi-word terms you mean phrases, right? Like "tap water"?

You can use this field type to index your documents:

<fieldType name="shingle_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
  </analyzer>
</fieldType>
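
The handler defaults further down refer to shingle_text_field, so you also 
need a field of this type in schema.xml. A minimal sketch, assuming your 
content lives in a hypothetical source field called "text" (adjust to your 
own schema); termVectors="true" is only needed if you want per-document 
frequencies from TermVectorComponent:

<!-- "text" is an assumed source field name, not something from your schema -->
<field name="shingle_text_field" type="shingle_text" indexed="true" stored="false" termVectors="true"/>
<copyField source="text" dest="shingle_text_field"/>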

Then, if you register TermsComponent in solrconfig.xml like this:

<searchComponent name="termsComponent"
                 class="org.apache.solr.handler.component.TermsComponent"/>

<requestHandler name="/terms"
                class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
    <str name="terms.fl">shingle_text_field</str>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
  </arr>
</requestHandler>

http://localhost:8983/solr/terms will give you the multi-word terms sorted by 
frequency (by default the count is the number of documents containing the 
term). You can also use TermVectorComponent to get the frequencies of the 
multi-word terms in a particular document. 
Additionally, admin/schema.jsp shows the top n terms if you want them.
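
For the per-document case, here is a rough sketch of how TermVectorComponent 
could be registered, in the same style as the /terms handler above. The 
component name "tvComponent" and the handler path "/tvrh" are just names I 
picked, and this assumes shingle_text_field is declared with 
termVectors="true" in schema.xml:

<!-- component and handler names are illustrative; rename as you like -->
<searchComponent name="tvComponent"
                 class="org.apache.solr.handler.component.TermVectorComponent"/>

<requestHandler name="/tvrh"
                class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

With something like that in place, a request such as

http://localhost:8983/solr/tvrh?q=id:SOME_ID&tv.fl=shingle_text_field&tv.tf=true

should list the two-word terms of the matching document along with their 
frequencies. SOME_ID is a placeholder, and "id" is assumed to be your 
uniqueKey field.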


