Hello,

Have you tried implementing your own Collector and passing it to IndexSearcher.search()? A Collector is handed a reference to the current Scorer (via setScorer()), so it can presumably read the term frequency for the current document from the TermQuery's scorer: org.apache.lucene.search.TermScorer.freq(). The collector can then simply sum these tf values.
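To make the summing idea concrete, here is a minimal, library-free sketch in Java. It deliberately does not use Lucene's Collector API: TfSummingCollector and its collect(termFreq, tokenCount) method are hypothetical stand-ins for a collector that would read the tf from the scorer on each hit. The numbers are the ones from the question quoted below, with the added assumption that doc C contains "Haus" zero times.

```java
// Hypothetical stand-in for a hit collector. This is NOT Lucene's Collector API;
// it only models "called once per matching document, accumulate totals".
final class TfSummingCollector {
    private long totalTermFreq = 0; // sum of tf over all collected documents
    private long totalTokens = 0;   // sum of token counts over all collected documents

    // In a real collector the tf would come from the scorer and the token
    // count from a stored field; here both are passed in directly.
    void collect(int termFreq, int tokenCount) {
        totalTermFreq += termFreq;
        totalTokens += tokenCount;
    }

    long totalTermFreq() { return totalTermFreq; }

    long totalTokens() { return totalTokens; }

    // Relative frequency of the term on a token basis (0.0 for an empty result).
    double relativeFrequency() {
        return totalTokens == 0 ? 0.0 : (double) totalTermFreq / totalTokens;
    }
}

final class TfSumDemo {
    static TfSummingCollector run() {
        // {tf of "Haus", token count} for docs A, B, C.
        // Assumption: doc C does not contain the word at all (tf = 0).
        int[][] docs = { {2, 300}, {5, 100}, {0, 50} };
        TfSummingCollector collector = new TfSummingCollector();
        for (int[] doc : docs) {
            collector.collect(doc[0], doc[1]);
        }
        return collector;
    }

    public static void main(String[] args) {
        TfSummingCollector c = run();
        // prints "7 / 450"
        System.out.println(c.totalTermFreq() + " / " + c.totalTokens());
    }
}
```

With these inputs the hit count is 7 rather than 2, and the token total is 450, which matches the two sums asked for in the question.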
Be aware that there is a small problem with doing the same when the query has a few disjunction clauses.

On Tue, Nov 20, 2012 at 11:55 PM, tech.vronk <t...@vronk.net> wrote:

> Hello,
>
> earlier, I was trying to retrieve the total token count per index
> <http://lucene.472066.n3.nabble.com/how-to-retrieve-total-token-count-per-collection-index-td4000161.html>.
>
> now, I would like to have a token (word) count within the document-set
> (resulting of a query),
> both for the matching word and as sum of all tokens of matching documents.
>
> The ultimate goal is to be able to compute relative frequencies of terms,
> on token-base instead of per article base.
>
> so if I search for word "Haus" within a subcollection (defined by a
> separate query) and the word appears in a matching doc A 2 times and doc B
> 5 times, i need as hit-count: 7 not 2.
>
> + if the subcollection contains documents
> A with 300 tokens (i.e. running words, not different terms)
> B with 100 tokens
> C with 50 tokens
>
> I also need this second sum, i.e. 450.
>
> I plan to get the second number by first
> preprocessing the document counting the tokens
> storing the number in a separate field,
> then applying the statsComponent,
> which will deliver me the sum for given query/subcollection.
>
> for the first number, i could use the termfreq() function,
> but that gives me only the term frequency per document.
>
> So, before I iterate over the whole result, to sum it,
> I wonder, if the statsComponent would be able to perform the counting also
> over a dynamic field (the result of the function).
>
> I tried this:
> /solr/select/?fq=docsrc:falter&q={!func}tf(inhalt,'haus')&stats=true&stats.field=score&rows=10&indent=true&fl=score&debugQuery=true
>
> but got the error:
> <str name="msg">Field type text_de{class=org.apache.solr.schema.TextField,analyzer=org.apache.solr.analysis.TokenizerChain,args={positionIncrementGap=100}} is not currently supported</str>
>
> Or is there any other way?
>
> If I understand it correctly, any of tf(), idf(), sttf(), wouldn't be of
> any help here neither.
>
> Thanks in advance
>
> best,
> matej

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>