Thanks for your response. With TermsComponent, I am able to get a list of all terms in a field that have a document frequency under a certain threshold. However, I was wondering whether I could instead pass in a list of terms and get back only the terms from that list whose document frequency in a field is under a certain threshold. I can't find an easy way to do this. Do you know if it is possible?

Thanks,
Steve

-----Original Message-----
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
Sent: Saturday, August 1, 2015 6:35 AM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Do not match on high frequency terms

It seems like you need to develop a custom query or query parser. Regarding SolrJ: you can try to call TermsComponent:
http://wiki.apache.org/solr/TermsComponent
https://cwiki.apache.org/confluence/display/solr/The+Terms+Component

I'm not sure exactly how to call TermsComponent in SolrJ; I just found
https://lucene.apache.org/solr/5_2_1/solr-solrj/org/apache/solr/client/solrj/response/TermsResponse.html
to read its response.

On Fri, Jul 31, 2015 at 11:31 PM, Swedish, Steve <steve.swed...@noblis.org> wrote:

> Hello,
>
> I'm hoping someone might be able to help me out with this as I do not
> have very much solr experience. Basically, I am wondering if it is
> possible to not match on terms that have a document frequency above a
> certain threshold. For my situation, a stop word list will be
> unrealistic to maintain, so I was wondering if there may be an
> alternative solution using term document frequency to identify common terms.
>
> What would actually be ideal is if I could somehow use the
> CommonTermsQuery. The problem I ran across when looking at this option
> was that the CommonTermsQuery seems to only work for queries on one
> field at a time (unless I'm mistaken). However, I have a query of the
> structure
> q=(field1:(blah) AND (field2:(blah) OR field3:(blah))) OR field1:(blah) OR (field2:(blah) AND field3:(blah)).
> If there are any ideas on how to use the CommonTermsQuery with this
> query structure, that would be great.
>
> If it's possible to extract the document frequency for terms in my
> query before the query is run, allowing me to remove the high
> frequency terms from the query first, that could also be a valid
> solution. I'm using solrj as well, so a solution that works with solrj
> would be appreciated.
>
> Thanks,
> Steve
>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
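
For reference, the filtering asked about at the top of this thread (keep only those terms from a given list whose document frequency in a field is under a threshold) could be done on the client side once the document frequencies are known. The following is only a minimal sketch in plain Java. It assumes the term-to-frequency map has already been filled in from a TermsComponent response (for example by reading the terms and frequencies out of a SolrJ TermsResponse). The class name LowFrequencyTerms, the method filterByDocFreq, and the sample numbers are all hypothetical and not part of Solr or SolrJ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of Solr or SolrJ.
class LowFrequencyTerms {

    /**
     * Returns the candidate terms whose document frequency is strictly
     * below maxDocFreq, preserving the order of the candidate list.
     *
     * Assumption: a term missing from docFreqs does not occur in the
     * field, so it is treated as having a document frequency of 0.
     */
    static List<String> filterByDocFreq(List<String> candidates,
                                        Map<String, Long> docFreqs,
                                        long maxDocFreq) {
        List<String> kept = new ArrayList<>();
        for (String term : candidates) {
            long docFreq = docFreqs.getOrDefault(term, 0L);
            if (docFreq < maxDocFreq) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up frequencies standing in for values read from a
        // TermsComponent response for one field.
        Map<String, Long> docFreqs = new HashMap<>();
        docFreqs.put("the", 98000L);
        docFreqs.put("solr", 120L);
        docFreqs.put("lucene", 75L);

        List<String> queryTerms = Arrays.asList("the", "solr", "lucene");
        System.out.println(filterByDocFreq(queryTerms, docFreqs, 1000L));
    }
}
```

The surviving terms could then be used to rebuild the query for each field before it is sent, which matches the "remove the high frequency terms from the query first" idea in the original message. Document frequencies are per field, so a separate map would be needed for field1, field2, and field3.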