Hello,
Earlier, I was trying to retrieve the total token count per index:
http://lucene.472066.n3.nabble.com/how-to-retrieve-total-token-count-per-collection-index-td4000161.html
Now I would like to have a token (word) count within the document set
(resulting from a query),
both for the matching word and as the sum of all tokens in the matching documents.
The ultimate goal is to be able to compute relative frequencies of
terms on a per-token basis instead of a per-article basis.
So if I search for the word "Haus" within a subcollection (defined by a
separate query) and the word appears 2 times in matching doc A and 5 times
in doc B, I need a hit count of 7, not 2.
In addition, if the subcollection contains the documents
A with 300 tokens (i.e. running words, not distinct terms),
B with 100 tokens,
C with 50 tokens,
I also need this second sum, i.e. 450.
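
To make the two numbers concrete, here is a minimal sketch in Python using only the toy figures from the example above. The variable names are purely illustrative and have nothing to do with Solr itself.

```python
# Toy data from the example above: tokens per document and
# occurrences of "Haus" per document (illustrative names only).
tokens_per_doc = {"A": 300, "B": 100, "C": 50}
haus_per_doc = {"A": 2, "B": 5}

# First number: token-based hit count (not the number of matching docs).
hit_count = sum(haus_per_doc.values())

# Second number: all running words in the subcollection.
total_tokens = sum(tokens_per_doc.values())

# Relative frequency on a per-token basis.
relative_frequency = hit_count / total_tokens

print(hit_count, total_tokens, relative_frequency)
```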
I plan to get the second number by first
preprocessing each document to count its tokens,
storing that number in a separate field,
and then applying the StatsComponent,
which will deliver the sum for a given query/subcollection.
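
The plan above could be sketched roughly as follows. This is only an illustration under assumptions: the field name "token_count" is hypothetical and would have to be declared as a numeric field in the schema, and the naive whitespace split will not necessarily match what the Solr analyzer for the text field produces.

```python
from urllib.parse import urlencode


def count_tokens(text):
    # Naive whitespace tokenization; an assumption, since the Solr
    # analyzer for the text field may split words differently.
    return len(text.split())


def with_token_count(doc, text_field="inhalt", count_field="token_count"):
    # "token_count" is a hypothetical numeric field, not part of any
    # existing schema; the original document is left unmodified.
    enriched = dict(doc)
    enriched[count_field] = count_tokens(doc.get(text_field, ""))
    return enriched


def stats_params(filter_query, count_field="token_count"):
    # rows=0 because only the statistics are needed, not the documents.
    return urlencode({
        "q": "*:*",
        "fq": filter_query,
        "stats": "true",
        "stats.field": count_field,
        "rows": 0,
    })


doc = with_token_count({"id": "A", "inhalt": "Das Haus ist ein altes Haus"})
print(doc["token_count"])
print(stats_params("docsrc:falter"))
```

The "sum" entry of the stats result for that field would then be the total number of tokens in the subcollection.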
For the first number, I could use the termfreq() function,
but that gives me only the term frequency per document.
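
For reference, summing the per-document values on the client side could look roughly like this. It is a sketch under assumptions: the response is taken to be Solr JSON output (wt=json) that has already been parsed, and each document is assumed to carry its term frequency under the hypothetical key "tf" (for example via a function alias in fl, if the Solr version at hand supports that). The sample response is hand-written, not real output.

```python
def sum_field(response, field):
    # Sum a numeric per-document value over all returned documents;
    # documents lacking the field contribute 0.
    docs = response["response"]["docs"]
    return sum(doc.get(field, 0) for doc in docs)


# Hand-written sample in the shape of a parsed Solr JSON response.
sample = {
    "response": {
        "numFound": 2,
        "docs": [
            {"id": "A", "tf": 2},
            {"id": "B", "tf": 5},
        ],
    }
}

print(sum_field(sample, "tf"))
```

For a large result set this would also mean paging through all rows, which is what I would like to avoid.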
So, before I iterate over the whole result to sum it up,
I wonder whether the StatsComponent would be able to perform the counting
over a dynamically computed value as well (the result of the function).
I tried this:
/solr/select/?fq=docsrc:falter&q={!func}tf(inhalt,'haus')&stats=true&stats.field=score&rows=10&indent=true&fl=score&debugQuery=true
but got the error:
<str name="msg">Field type
text_de{class=org.apache.solr.schema.TextField,analyzer=org.apache.solr.analysis.TokenizerChain,args={positionIncrementGap=100}}
is not currently supported</str>
Or is there any other way?
If I understand correctly, none of tf(), idf(), or sttf() would be of
any help here either.
Thanks in advance.
Best,
matej