RE: Do not match on high frequency terms

Swedish, Steve Mon, 03 Aug 2015 14:29:26 -0700

Thanks for your response. With TermsComponent, I am able to get a list of all 
terms in a field that have a document frequency under a certain threshold. 
However, I was wondering if I could instead pass in a list of terms and get 
back only the terms from that list whose document frequency in a field is 
under a certain threshold. I can't find an easy way to do this. Do you know 
if it is possible?


Thanks,
Steve
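
For illustration, one client-side fallback for the question above is to fetch
the low-frequency terms as already described and intersect them with the
candidate list. This is only a minimal sketch under stated assumptions: the
class and method names (RareTermFilter, keepRareTerms) are hypothetical, the
set of rare terms is assumed to have been read from a TermsComponent response
beforehand, and the sample terms are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RareTermFilter {

    /**
     * Returns the candidate terms that also appear in rareTerms,
     * preserving the order of the candidate list.
     *
     * rareTerms is assumed to hold the terms of one field whose document
     * frequency is under the chosen threshold (hypothetical input, e.g.
     * collected from a TermsComponent response).
     */
    public static List<String> keepRareTerms(List<String> candidates,
                                             Set<String> rareTerms) {
        List<String> kept = new ArrayList<>();
        for (String term : candidates) {
            if (rareTerms.contains(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for a real TermsComponent result.
        Set<String> rareTerms = new HashSet<>(Arrays.asList("solr", "lucene"));
        List<String> candidates = Arrays.asList("the", "solr", "zookeeper");

        // "the" and "zookeeper" are dropped because they are not in rareTerms.
        System.out.println(keepRareTerms(candidates, rareTerms)); // prints [solr]
    }
}
```

Note that a candidate missing from the rare-term set is dropped whether it is
too frequent or simply absent from the index. This approach also pulls the
whole low-frequency term list to the client, which may be impractical for
large fields. Later Solr reference guides describe a terms.list parameter for
requesting document frequencies of specific terms; whether it is available in
a given Solr version should be verified before relying on it.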

-----Original Message-----
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: Saturday, August 1, 2015 6:35 AM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Do not match on high frequency terms

It seems like you need to develop a custom query or query parser. Regarding
SolrJ: you can try calling the TermsComponent; see
http://wiki.apache.org/solr/TermsComponent and
https://cwiki.apache.org/confluence/display/solr/The+Terms+Component. I'm not 
sure exactly how to call TermsComponent from SolrJ; I only found 
https://lucene.apache.org/solr/5_2_1/solr-solrj/org/apache/solr/client/solrj/response/TermsResponse.html
for reading its response.

On Fri, Jul 31, 2015 at 11:31 PM, Swedish, Steve <steve.swed...@noblis.org>
wrote:

> Hello,
>
> I'm hoping someone might be able to help me out with this, as I do not 
> have very much Solr experience. Basically, I am wondering if it is 
> possible to not match on terms that have a document frequency above a 
> certain threshold. For my situation, a stop word list would be 
> unrealistic to maintain, so I was wondering if there may be an 
> alternative solution using term document frequency to identify common terms.
>
> What would actually be ideal is if I could somehow use the 
> CommonTermsQuery. The problem I ran across when looking at this option 
> was that the CommonTermsQuery seems to only work for queries on one 
> field at a time (unless I'm mistaken). However, I have a query of the 
> structure
> q=(field1:(blah) AND (field2:(blah) OR field3:(blah))) OR field1:(blah) OR (field2:(blah) AND field3:(blah)).
> If there are any ideas on how to use the CommonTermsQuery with this query 
> structure, that would be great.
>
> If it's possible to extract the document frequency for terms in my 
> query before the query is run, allowing me to remove the high 
> frequency terms from the query first, that could also be a valid 
> solution. I'm using SolrJ as well, so a solution that works with SolrJ would 
> be appreciated.
>
> Thanks,
> Steve
>



--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
