Hi David,

Out of interest, what are you trying to accomplish by faceting over the story_text field? Is it generally the case that the story_text field will contain values that are repeated or categorize your documents somehow? From your description: "story_text is used to store free form text obtained by crawling new papers and blogs", it doesn't seem that way, so I'm not sure faceting is what you want in this situation.
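If the goal is simply cumulative word counts over the documents a query matches, one alternative to faceting is to count on the client side. The following is only a rough sketch under stated assumptions: it assumes the story_text values of the matching documents have already been fetched (for example by requesting only that field with fl=story_text), the function name cumulative_word_counts is made up for illustration, and the regex tokenizer is a stand-in that will not match the text_general analysis chain exactly.

```python
import re
from collections import Counter


def cumulative_word_counts(texts, top_n=10):
    """Sum word frequencies across an iterable of document texts.

    `texts` is assumed to hold the story_text values of the documents
    that matched the query. Tokenization here is a simple lowercase
    split on word characters, not Solr's text_general analysis.
    """
    counts = Counter()
    for text in texts:
        # Lowercase, then pull out runs of word characters as tokens.
        counts.update(re.findall(r"\w+", text.lower()))
    # most_common(None) returns every word, ordered by frequency.
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = ["The cat sat", "the dog sat down"]
    for word, freq in cumulative_word_counts(sample, top_n=3):
        print(word, freq)
```

For a few hundred matching documents this kind of counting is cheap; whether it holds up at hundreds of millions of documents is a separate question that this sketch does not address.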
Cheers,
Brendan

On Wed, May 22, 2013 at 9:49 PM, David Larochelle <dlaroche...@cyber.law.harvard.edu> wrote:

> I'm trying to quickly obtain cumulative word frequency counts over all
> documents matching a particular query.
>
> I'm running Solr 4.3.0 on a machine with 16 GB of RAM. My index is 2.5 GB
> and has around 350,000 documents.
>
> My schema includes the following fields:
>
> <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
> <field name="media_id" type="int" indexed="true" stored="true" required="true" multiValued="false" />
> <field name="story_text" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
>
> story_text is used to store free form text obtained by crawling new papers
> and blogs.
>
> Running faceted searches with the fc or fcs methods fails with the error
> "Too many values for UnInvertedField faceting on field story_text"
>
> http://localhost:8983/solr/query?q=id:106714828_6621&facet=true&facet.limit=10&facet.pivot=publish_date,story_text&rows=0&facet.method=fcs
>
> Running faceted search with the 'enum' method succeeds but takes a very
> long time.
>
> http://localhost:8983/solr/query?q=includes:foobar&facet=true&facet.limit=100&facet.pivot=media_id,includes&facet.method=enum&rows=0
> <http://localhost:8983/solr/query?q=includes:mccain&facet=true&facet.limit=100&facet.pivot=media_id,includes&facet.method=enum&rows=0>
>
> The frustrating thing is even if the query only returns a few hundred
> documents, it still takes 10 minutes or longer to get the cumulative word
> count results.
>
> Eventually we're hoping to build a system that will return results in a few
> seconds and scale to hundreds of millions of documents.
> Is there any way to get this level of performance out of Solr/Lucene?
>
> Thanks,
>
> David

--
Brendan Grainger
www.kuripai.com