Oh, I see. Essentially you want the sum of the term frequencies for every term across a subset of documents (instead of the document frequency, which is what the FacetComponent would give you). I don't know of an easy, out-of-the-box solution for this. I know the TermVectorComponent will give you the tf for every term in a document, but I'm not sure whether you can filter or sort on it. Maybe you can do something like https://issues.apache.org/jira/browse/LUCENE-2393 or what's suggested here: http://search-lucene.com/m/of5Fn1PUOHU/ but I have never used anything like that.
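To make the difference between the two counts concrete, here is a rough client-side sketch in Python. It is not a Solr feature and does not call Solr at all: it assumes you have already fetched the text (or the per-document tf values) for the documents that match your dateCreated filter, for example from your database. The names tokenize, total_term_frequency and document_frequency are made up for illustration, and the tokenizer is far cruder than the text_general analyzer.

```python
import re
from collections import Counter

# The three example "documents" from this thread (values of the 'content' field).
docs = [
    "I think, therefore I am like you",
    "You think too much",
    "I think that I think much as you",
]

def tokenize(text):
    # Hypothetical stand-in for the analyzer: lowercase, then keep runs of letters.
    return re.findall(r"[a-z]+", text.lower())

def total_term_frequency(documents):
    # Sum the per-document tf of each term, so repeats inside a document count.
    totals = Counter()
    for text in documents:
        totals.update(tokenize(text))
    return totals

def document_frequency(documents):
    # Count each term at most once per document (the "number of documents" view).
    counts = Counter()
    for text in documents:
        counts.update(set(tokenize(text)))
    return counts

ttf = total_term_frequency(docs)
df = document_frequency(docs)

# Top terms by total occurrences, ties broken alphabetically.
top = sorted(ttf.items(), key=lambda kv: (-kv[1], kv[0]))[:4]
print(top)  # [('i', 4), ('think', 4), ('you', 3), ('much', 2)]
```

On these three documents the summed tf for "think" is 4 while its document frequency is 3, which is exactly the gap you described. Pulling every matching document to the client will not scale to a large index, so treat this only as a statement of the computation you are after.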
Tomás

On Mon, Apr 1, 2013 at 9:58 PM, Andy Pickler <andy.pick...@gmail.com> wrote:

> I need "total number of occurrences" across all documents for each term.
> Imagine this...
>
> Post #1: "I think, therefore I am like you"
> Reply #1: "You think too much"
> Reply #2 "I think that I think much as you"
>
> Each of those "documents" are put into 'content'. Pretending I don't have
> stop words, the top term query (not considering dateCreated in this
> example) would result in something like...
>
> "think": 4
> "I": 4
> "you": 3
> "much": 2
> ...
>
> Thus, just a "number of documents" approach doesn't work, because if a word
> occurs more than one time in a document it needs to be counted that many
> times. That seemed to rule out faceting like you mentioned as well as the
> TermsComponent (which as I understand also only counts "documents").
>
> Thanks,
> Andy Pickler
>
> On Mon, Apr 1, 2013 at 4:31 PM, Tomás Fernández Löbbe <tomasflo...@gmail.com>
> wrote:
>
> > So you have one document per user comment? Why not use faceting plus
> > filtering on the "dateCreated" field? That would count "number of
> > documents" for each term (so, in your case, if a term is used twice in one
> > comment it would only count once). Is that what you are looking for?
> >
> > Tomás
> >
> >
> > On Mon, Apr 1, 2013 at 6:32 PM, Andy Pickler <andy.pick...@gmail.com>
> > wrote:
> >
> > > Our company has an application that is "Facebook-like" for usage by
> > > enterprise customers. We'd like to do a report of "top 10 terms entered by
> > > users over (some time period)". With that in mind I'm using the
> > > DataImportHandler to put all the relevant data from our database into a
> > > Solr 'content' field:
> > >
> > > <field name="content" type="text_general" indexed="true" stored="false"
> > > multiValued="false" required="true" termVectors="true"/>
> > >
> > > Along with the content is the 'dateCreated' for that content:
> > >
> > > <field name="dateCreated" type="tdate" indexed="true" stored="false"
> > > multiValued="false" required="true"/>
> > >
> > > I'm struggling with the TermVectorComponent documentation to understand how
> > > I can put together a query that answers the 'report' mentioned above. For
> > > each document I need each term counted however many times it is entered
> > > (content of "I think what I think" would report 'think' as used twice).
> > > Does anyone have any insight as to whether I'm headed in the right
> > > direction and then what my query would be?
> > >
> > > Thanks,
> > > Andy Pickler