Hi,

I am looking for the best way to find the terms with the highest frequency (terms in the text field) for a given subset of documents. My first thought was to run a facet search sorted by count, where the query defines the subset of documents and facet.field is the text field. This gives me the result, but it is very slow. These are my params:

<lst name="params">
  <str name="facet">true</str>
  <str name="facet.offset">0</str>
  <str name="facet.mincount">3</str>
  <str name="indent">on</str>
  <str name="facet.limit">500</str>
  <str name="facet.method">enum</str>
  <str name="wt">xml</str>
  <str name="rows">0</str>
  <str name="version">2.2</str>
  <str name="facet.sort">count</str>
  <str name="q">in_subset:1</str>
  <str name="facet.field">text</str>
</lst>
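For anyone who wants to reproduce the request, the parameters above can be turned into a URL roughly like this. It is only a sketch: the host, port and /select path are placeholders I made up for illustration, and build_facet_url is a hypothetical helper name. Only the parameter names and values come from my request.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint, not my real host or handler path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# Parameters exactly as listed in the request above.
FACET_PARAMS = [
    ("q", "in_subset:1"),        # query that selects the subset
    ("rows", "0"),               # no documents needed, only facet counts
    ("facet", "true"),
    ("facet.field", "text"),
    ("facet.method", "enum"),
    ("facet.sort", "count"),
    ("facet.mincount", "3"),
    ("facet.limit", "500"),
    ("facet.offset", "0"),
    ("wt", "xml"),
    ("indent", "on"),
    ("version", "2.2"),
]


def build_facet_url(base_url=SOLR_SELECT_URL, params=FACET_PARAMS):
    """Return the full request URL with URL-encoded query parameters."""
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_facet_url())
```

Timing this URL against the same URL with facet=false should show how much of the response time is spent on faceting alone.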
The index contains 7M documents; the subset is about 200K. A simple query for the subset takes around 100 ms, but the facet search takes 40 s. Am I doing something wrong?

If facet search is not the correct approach, I thought about using something like org.apache.lucene.misc.HighFreqTerms, but I'm not sure how to do this in Solr. Should I implement a request handler that executes this kind of code?

Thanks for any help.
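P.S. In case it helps to pin down the result I am after, here is a rough sketch in plain Python. It has nothing to do with Solr or Lucene internals; it assumes each document in the subset is already available as a list of tokens, and top_terms is a hypothetical name. It counts, for every term, how many documents of the subset contain it, which is how I understand the facet counts, and then applies the same mincount and limit as in my request.

```python
from collections import Counter


def top_terms(subset_docs, min_count=3, limit=500):
    """Return up to `limit` (term, doc_count) pairs, highest count first.

    subset_docs: iterable of token lists, one list per document (assumed input).
    doc_count:   number of documents in the subset that contain the term.
    """
    doc_counts = Counter()
    for tokens in subset_docs:
        # set() so a term repeated inside one document is counted once
        doc_counts.update(set(tokens))
    ranked = [(term, n) for term, n in doc_counts.most_common() if n >= min_count]
    return ranked[:limit]


if __name__ == "__main__":
    docs = [["solr", "facet", "solr"], ["solr", "index"], ["solr", "facet"], ["facet"]]
    print(top_terms(docs, min_count=3))
```

If what matters is the total number of occurrences rather than the number of documents, dropping the set() call would give that instead.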