PS - I found that termfreq() actually returns the raw tf, i.e. an integer
for each document. However, I have to fetch the response and add the
values up on my end.

Unfortunately totaltermfreq() sums the similarity-modified tf values.

Is there a way to just get the sum of the termfreq() values?
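
For reference, the client-side workaround described above can be sketched as follows. This is only a rough illustration, not Solr's API: it assumes the response was requested as JSON and that each document carries the per-document count under a hypothetical key "tf_dog" (for instance via an fl alias on termfreq()). The helper names are made up for this example. The second helper covers the other route mentioned in this thread, squaring a tf() value to recover the raw count, which assumes tf is the square root of the raw tf as stated below.

```python
def raw_tf_from_sqrt_tf(tf_value):
    """Recover the raw count from a tf() value, assuming tf = sqrt(raw tf)."""
    return int(round(tf_value * tf_value))


def sum_term_frequencies(response, key):
    """Sum one per-document integer value over the docs of a parsed response."""
    docs = response.get("response", {}).get("docs", [])
    # Documents that lack the key are counted as zero occurrences.
    return sum(int(doc.get(key, 0)) for doc in docs)


# Hypothetical parsed JSON response; the ids and counts are invented.
sample_response = {
    "response": {
        "numFound": 3,
        "docs": [
            {"id": "1", "tf_dog": 2},
            {"id": "2", "tf_dog": 0},
            {"id": "3", "tf_dog": 5},
        ],
    }
}

total = sum_term_frequencies(sample_response, "tf_dog")
print(total)
```

Note that this only adds up the documents present in the returned page, so the request would need a rows value large enough to cover every matching document (or paging on the client side).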


Akos (Aki) Balogh
M: 617-682-0066
Co-Founder, MarketMuse
https://www.MarketMuse.com

On Wed, Feb 4, 2015 at 4:58 PM, Aki Balogh <a...@marketmuse.com> wrote:

> Is there a way to set Solr to return only the raw tf (e.g. by turning
> off the DefaultSimilarity), so I could use ttf() to get the sum of raw tf
> values?
>
> Or do I need to parse each tf value, square it and add them up in
> post-processing?
>
>
> Thx,
> Aki
>
> On Wed, Feb 4, 2015 at 4:39 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
> wrote:
>
>> Hi,
>>
>> So you want raw tf. The tf method is implemented as the square root of
>> the raw tf, so you can recover it with the reverse operation (squaring):
>> 1.424 * 1.424 = 2.03, which rounds to the int 2
>>
>> Ahmet
>>
>>
>>
>>
>> On Wednesday, February 4, 2015 11:31 PM, Aki Balogh <a...@marketmuse.com>
>> wrote:
>> Hi Ahmet,
>>
>> Thank you for your idea, very helpful. I can indeed get tf values through
>> the tf and ttf function queries.
>>
>> Since tf uses Similarity, I'm getting back some floats (e.g. "dog occurs
>> 1.424 times"), when I was expecting ints.
>> Is there a way to get back ints (simple word count)?
>>
>> Thanks,
>> Aki
>>
>>
>>
>> On Wed, Feb 4, 2015 at 3:41 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
>> wrote:
>>
>> > Hi Aki,
>> >
>> > How about tf function query?
>> > https://cwiki.apache.org/confluence/display/solr/Function+Queries
>> >
>> > Ahmet
>> >
>> >
>> >
>> > On Wednesday, February 4, 2015 7:59 PM, Aki Balogh <a...@marketmuse.com>
>> > wrote:
>> > I'm using solr TermVectorComponent to get term frequencies for specific
>> > terms in a corpus. I.e. I query for "q=dog" and want to get back term
>> > frequencies for "dog" in the corpus.
>> >
>> > However, when I request term frequencies, I get back ALL term
>> > frequencies for ALL matching documents, which is generating a massive
>> > response and wasting I/O.
>> >
>> > Instead, I would like to get tf for ONLY the terms that are an exact
>> > match to the term in my query.
>> >
>> > Word count like this seems like it would be a common use case, but I
>> > didn't see it in the code.
>> >
>> > http://grepcode.com/file_/repo1.maven.org/maven2/org.dspace.dependencies.solr/dspace-solr-core/1.4.0.1/org/apache/solr/handler/component/TermVectorComponent.java#78
>> >
>> > Is there a way to get this behavior without having to modify the source
>> > code?
>> >
>> > Thanks,
>> > Aki
>> >
>>
>
>
