PS - I found that termfreq() actually returns the raw tf, i.e. an integer count for each document. However, I then have to fetch the per-document values and add them up on my end.
Unfortunately totaltermfreq() sums the similarity-modified tf values. Is there a way to just get the sum of the termfreq() values?

Akos (Aki) Balogh
M: 617-682-0066
Co-Founder, MarketMuse
https://www.MarketMuse.com

On Wed, Feb 4, 2015 at 4:58 PM, Aki Balogh <a...@marketmuse.com> wrote:

> Is there a way to set solr to only return raw tf (i.e. by maybe turning
> off the DefaultSimilarity), so I could use ttf() to get the sum of raw tf
> values?
>
> Or do I need to parse each tf value, square it and add them up in
> post-processing?
>
> Thx,
> Aki
>
> On Wed, Feb 4, 2015 at 4:39 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
> wrote:
>
>> Hi,
>>
>> So you want raw tf. tf method implemented as square root of raw tf. So
>> you can re-obtain it by reverse operation.
>> 1.424 * 1.424 = 2.02 = int = 2
>>
>> Ahmet
>>
>> On Wednesday, February 4, 2015 11:31 PM, Aki Balogh <a...@marketmuse.com>
>> wrote:
>> Hi Ahmet,
>>
>> Thank you for your idea, very helpful. I can indeed get tf values through
>> the tf and ttf function queries.
>>
>> Since tf uses Similarity, I'm getting back some floats (i.e. "dog occurs
>> 1.424 times"), when I was expecting ints.
>> Is there a way to get back ints (simple word count)?
>>
>> Thanks,
>> Aki
>>
>> On Wed, Feb 4, 2015 at 3:41 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
>> wrote:
>>
>> > Hi Aki,
>> >
>> > How about tf function query?
>> > https://cwiki.apache.org/confluence/display/solr/Function+Queries
>> >
>> > Ahmet
>> >
>> > On Wednesday, February 4, 2015 7:59 PM, Aki Balogh <a...@marketmuse.com>
>> > wrote:
>> > I'm using solr TermVectorComponent to get term frequencies for specific
>> > terms in a corpus. I.e. I query for "q=dog" and want to get back term
>> > frequencies for "dog" in the corpus.
>> >
>> > However, when I request term frequencies, I get back ALL term frequencies
>> > for ALL matching documents, which is generating a massive response and
>> > wasting I/O.
>> >
>> > Instead, I would like to get tf for ONLY the terms that are an exact match
>> > to the term in my query.
>> >
>> > Word count like this seems like it would be a common use case, but I didn't
>> > see it in the code.
>> >
>> > http://grepcode.com/file_/repo1.maven.org/maven2/org.dspace.dependencies.solr/dspace-solr-core/1.4.0.1/org/apache/solr/handler/component/TermVectorComponent.java#78
>> >
>> > Is there a way to get this behavior without having to modify the source
>> > code?
>> >
>> > Thanks,
>> > Aki
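For anyone finding this thread later, here is a rough Python sketch of the client-side summing discussed above. It is not a server-side answer: it assumes you request termfreq() as an aliased pseudo-field in fl and add up the values from the parsed JSON response yourself. The field name "text", the alias "raw_tf", and all three helper names are made up for illustration. The last helper covers the "square it" route Ahmet suggested, which only holds if the similarity in use computes tf as the square root of the raw count.

```python
from typing import Dict, Iterable, Mapping


def build_params(field: str, term: str, alias: str = "raw_tf", rows: int = 1000) -> Dict[str, object]:
    """Build Solr query parameters that return only the id and the raw
    term frequency of `term` for each matching document.

    `field` and `alias` are hypothetical names; adjust them to your schema.
    """
    return {
        "q": f"{field}:{term}",
        # Aliased function query in fl: one raw tf value per returned document.
        "fl": f"id,{alias}:termfreq({field},'{term}')",
        "rows": rows,
        "wt": "json",
    }


def sum_raw_tf(docs: Iterable[Mapping[str, object]], alias: str = "raw_tf") -> int:
    """Add up the per-document termfreq() values from response['response']['docs']."""
    return sum(int(doc.get(alias, 0)) for doc in docs)


def raw_tf_from_similarity(tf_value: float) -> int:
    """Recover the raw count from a tf() value, assuming tf = sqrt(raw count)."""
    return round(tf_value * tf_value)
```

The params dict can be passed to whatever HTTP client you already use against the select handler. Keep in mind that the sum only covers the documents actually returned, so rows has to be large enough (or you have to page through the results).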