Faceting by field then query

2014-03-26 Thread David Larochelle
I have the following schema I'd like to be able to facet by a field and then by queries. i.e. "facet_fields": {"media_id": ["1": { "sentence:foo": 102410, "sentence:bar": 29710}, "2": { "sentence:foo": 600, "sentence:bar": 220}, "3": { "sentence:foo": 80, "sentence:bar": 2330}]} However, when

Using CachedSqlEntityProcessor with delta imports in DIH

2013-09-23 Thread David Larochelle
I'm trying to use the CachedSqlEntityProcessor on a child entity that also has a delta query. Full imports and delta imports of the parent entity work fine; however, delta imports for the child entity have no effect. If I remove the processor="CachedSqlEntityProcessor" attribute from the child entit

Re: SolrCloud and Joins

2013-07-31 Thread David Larochelle
that). Then you reindex > them. With small documents like this, it is probably fairly fast. > > If you can't estimate how often the media sets will change or the size of > the changes, then you aren't ready to choose a design. > > wunder > > On Jul 29, 2013, at 8:41

Re: SolrCloud and Joins

2013-07-29 Thread David Larochelle
; wunder > > On Jul 29, 2013, at 7:58 AM, David Larochelle wrote: > > > I'm setting up SolrCloud with around 600 million documents. The basic > > structure of each document is: > > > > stories_id: integer, media_id: integer, sentence: text_en > > > >

SolrCloud and Joins

2013-07-29 Thread David Larochelle
I'm setting up SolrCloud with around 600 million documents. The basic structure of each document is: stories_id: integer, media_id: integer, sentence: text_en We have a number of stories from different media and we treat each sentence as a separate document because we need to run sentence level a

Re: Solr indexer and Hadoop

2013-06-26 Thread David Larochelle
Pardon my unfamiliarity with the Solr development process. Now that it's in the trunk, will it appear in the next 4.X release? -- David On Wed, Jun 26, 2013 at 9:42 AM, Erick Erickson wrote: > Well, it's been merged into trunk according to the comments, so > > Try it on trunk, help with

Re: Fast faceting over large number of distinct terms

2013-05-23 Thread David Larochelle
available and it > worked well. I limited my counting to top N (200?) hits. > > Otis > -- > Solr & ElasticSearch Support > http://sematext.com/ > > > > > > On Wed, May 22, 2013 at 10:54 PM, David Larochelle > wrote: > > The goal of the system is to

Re: Fast faceting over large number of distinct terms

2013-05-22 Thread David Larochelle
crawling new papers and blogs", it doesn't seem that way, so > I'm not sure faceting is what you want in this situation. > > Cheers, > Brendan > > > On Wed, May 22, 2013 at 9:49 PM, David Larochelle < > dlaroche...@cyber.law.harvard.edu> wrote: >

Re: Aggregate word counts over a subset of documents

2013-05-16 Thread David Larochelle
r these facets is constrained by the > current query. > > I think this maps to your requirement. > > Jason > > On May 16, 2013, at 12:29 PM, David Larochelle < > dlaroche...@cyber.law.harvard.edu> wrote: > > > Is there a way to get aggregate word coun

Aggregate word counts over a subset of documents

2013-05-16 Thread David Larochelle
Is there a way to get aggregate word counts over a subset of documents? For example, given the following data: { "id": "1", "category": "cat1", "includes": "The green car.", }, { "id": "2", "category": "cat1", "includes": "The red car.", }, { "id": "3", "c
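
To make the result being asked for concrete, here is a minimal client-side sketch in Python. It does not use Solr at all; it only illustrates what "aggregate word counts over a subset" would mean for the two complete documents shown in the question. The `word_counts` helper and the simple `\w+` tokenizer are assumptions made for illustration and are not part of the original post or of any Solr API.

```python
from collections import Counter
import re

# Sample data: the two complete documents from the question above.
# (The third document is truncated in the archive, so it is omitted.)
docs = [
    {"id": "1", "category": "cat1", "includes": "The green car."},
    {"id": "2", "category": "cat1", "includes": "The red car."},
]


def word_counts(documents, category):
    """Count lowercased words in the 'includes' field of matching documents.

    Hypothetical helper: the subset is selected by exact match on 'category'.
    """
    counts = Counter()
    for doc in documents:
        if doc["category"] != category:
            continue  # document is outside the requested subset
        # Naive tokenizer (an assumption): runs of word characters, lowercased.
        counts.update(re.findall(r"\w+", doc["includes"].lower()))
    return counts


counts = word_counts(docs, "cat1")
for word, n in sorted(counts.items()):
    print(word, n)
```

Counting on the client like this only makes sense for small result sets; it is shown here to pin down the desired output, not as a suggestion for how to do it at index scale.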