> Thank you.
>
> While interesting what I'm really after is a programmatic way to get at
> multi-word terms and their frequencies from a given document.
>
> Is this possible?

What do you mean by a programmatic way? Do you mean without indexing? And by multi-word terms you mean phrases, right? Like "tap water"?

If so, you can index your documents with this field type:

    <fieldType name="shingle_text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
      </analyzer>
    </fieldType>

Then register the TermsComponent in solrconfig.xml (shingle_text_field below is assumed to be a field declared with the shingle_text type):

    <searchComponent name="termsComponent" class="org.apache.solr.handler.component.TermsComponent"/>

    <requestHandler name="/terms" class="org.apache.solr.handler.component.SearchHandler">
      <lst name="defaults">
        <bool name="terms">true</bool>
        <str name="terms.fl">shingle_text_field</str>
      </lst>
      <arr name="components">
        <str>termsComponent</str>
      </arr>
    </requestHandler>

With that in place, http://localhost:8983/solr/terms will give you the multi-word terms sorted by frequency. Note that the count TermsComponent reports is the number of documents containing each term, not the total number of occurrences.

You can also use the TermVectorComponent to get the frequencies of multi-word terms within a particular document; the field needs termVectors="true" for that. Additionally, admin/schema.jsp shows the top n terms if you want.
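
If it helps to see what the analyzer chain above would put in the index, here is a rough Python sketch of the same idea, run against a single document without Solr. It only approximates the behaviour: the regex stands in for LowerCaseTokenizerFactory (split on non-letters, lowercase) and the pairing step for ShingleFilterFactory with maxShingleSize="2" and outputUnigrams="false". The function names and the sample text are made up for illustration.

```python
import re
from collections import Counter


def bigram_shingles(text):
    """Return the two-word shingles of text, in order of appearance."""
    # Approximate LowerCaseTokenizerFactory: keep runs of letters, lowercase them.
    tokens = [t.lower() for t in re.findall(r"[^\W\d_]+", text)]
    # Approximate ShingleFilterFactory (maxShingleSize=2, outputUnigrams=false):
    # emit every pair of adjacent tokens and no single words.
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def shingle_frequencies(text):
    """Count how often each two-word shingle occurs in one document."""
    return Counter(bigram_shingles(text))


# Hypothetical sample document.
doc = "Tap water is safe. Bottled water costs more than tap water."
freqs = shingle_frequencies(doc)

# Print the shingles from most to least frequent.
for term, count in freqs.most_common():
    print(count, term)
```

Like the tokenizer in the field type, this sketch ignores sentence boundaries, so pairs such as "safe bottled" show up as shingles too. These are per-document occurrence counts, which is closer to what TermVectorComponent gives you than to the counts from /terms.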