Tim, thanks for the response. I definitely owe you a beer next time you're in Austin.
I hadn't thought of your approach of turning things around. But, I don't think it will work because of some stuff I left out in my original email.

First, the relationship between Company and Article is many-to-many. I could easily adapt your approach to take this into account:

articleID_companyID (key)
articleID
companyID
publishDate
source
company_name
company_desc
contents

So, the search results return article/company pairs, and I group by companyID.

But, what makes this messier is that 1) companies contain many, many fields, all of which are filterable and 2) companies contain a number of multi-valued tuple fields; articles are just one example. For example, companies have lists of addresses, which have city, state, country, county, street address, latitude, longitude, #residentsInCounty, etc. Or, companies have products which have a number of fields. Modifying the above article/company pairs approach to become an article/company/address/product/... (plus many other fields) approach creates a combinatorial explosion of documents.

Sorry about not including these extra details in my original question. I think I simplified my original question a bit too much.

On Tue, Feb 26, 2013 at 12:17 PM, Timothy Potter <thelabd...@gmail.com> wrote:

> Hi Clint,
>
> Nice to see you on this list!
>
> What about treating each article as the indexed unit (i.e. each
> article is a document) with structure:
>
> articleID
> publishDate
> source
> company_name
> company_desc
> contents
>
> Then you can do grouping by company_name field.
>
> I happen to know you're very familiar with grouping in Solr so there
> must be a reason you're not approaching the problem from this angle.
>
> Cheers,
> Tim
>
> On Tue, Feb 26, 2013 at 10:32 AM, Clint Miller <clint.mill...@gmail.com>
> wrote:
> > Suppose I have companies that have articles associated with them. I want to
> > be able to search for companies based on the text in the articles.
> > For example, suppose the Company and Article classes look like this:
> >
> > Company
> > ----------------------
> > name: String
> > description: String
> > articles: Article[]
> >
> > Article
> > -------------------------
> > publishDate: Date
> > source: String (like Reuters or AP)
> > contents: String
> >
> > I want to be able to do the following types of operations:
> >
> > 1. Search for companies by name, description, or article contents with
> > name and description boosted higher than article contents.
> > 2. Facet on article sources.
> > 3. Filter on article sources.
> > 4. Boost companies with newer articles.
> >
> > One approach to representing this data is to use the following Solr fields:
> >
> > name_s
> > description_s
> > article_1_publish_date_tdt
> > article_1_source_s
> > article_1_contents_s
> > ...
> > article_publish_dates_tdts
> > article_sources_ss
> > article_contents_ss
> >
> > where article_publish_dates_tdts is a copyField of all the
> > *_publish_date_tdt fields, article_sources_ss is a copyField of all the
> > *_source_s fields, and article_contents_ss is a copyField of all the
> > *_contents_s fields.
> >
> > This structure allows me to do my first 2 types of operations easily
> > enough. To search on name, description, and contents, I just use qf set to
> > name_s, description_s, and article_contents_ss, with name_s and
> > description_s boosted accordingly. I can facet on article sources by using
> > article_sources_ss.
> >
> > But, I'm having trouble figuring out how to filter on article sources or
> > boost companies with newer articles. For example, say I search for "green
> > energy" and filter by source = 'Reuters'. If I use
> > "article_sources_ss:Reuters AND article_contents_ss:Green Energy" that
> > won't give correct results since a company may have 2 articles, one about
> > green energy from AP and another from Reuters that isn't about green energy
> > at all. Yet, that company would match the query.
> >
> > Similarly, based on Tim Potter's online presentation, I understand how to
> > boost based on recent dates if a company has only a single article and a
> > single date. But, I'm not sure how to do the boosting with a list of
> > articles and dates.
> >
> > Is this possible? Will I need to go down a path of writing custom
> > functions? If so, any pointers on a custom function approach I should use?
> >
> > Thank you very much for any help.
>
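
P.S. To make the two problems in this thread concrete, here is a rough Python sketch with no Solr involved. It only models the matching logic, and every name, article, and list size in it is made up for illustration. The first part reproduces the false match from my original question: with flattened multi-valued fields, a company matches source = Reuters AND contents containing "green energy" even though no single article satisfies both. The second part counts how many documents one company would need if every multi-valued list were crossed with the others.

```python
from math import prod

# Hypothetical data: one company with two articles (all values are made up).
company = {
    "name": "Acme Power",
    "articles": [
        {"source": "AP", "contents": "Acme invests in green energy"},
        {"source": "Reuters", "contents": "Acme reports quarterly earnings"},
    ],
}


def matches_flattened(company, source, phrase):
    """Mimic multi-valued fields: each clause may match a different article."""
    sources = [a["source"] for a in company["articles"]]
    contents = [a["contents"].lower() for a in company["articles"]]
    return source in sources and any(phrase in c for c in contents)


def matches_per_article(company, source, phrase):
    """Require a single article to satisfy both clauses."""
    return any(
        a["source"] == source and phrase in a["contents"].lower()
        for a in company["articles"]
    )


def flattened_doc_count(list_sizes):
    """Documents needed for one company if every multi-valued list is crossed."""
    return prod(list_sizes.values())


false_match = matches_flattened(company, "Reuters", "green energy")
true_match = matches_per_article(company, "Reuters", "green energy")
# Made-up sizes: 20 articles, 5 addresses, 10 products for one company.
docs = flattened_doc_count({"articles": 20, "addresses": 5, "products": 10})

print(false_match, true_match, docs)  # prints: True False 1000
```

With these made-up sizes a single company already turns into 1000 documents, and each additional multi-valued list multiplies that again.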