Hi Clint,

Nice to see you on this list!

What about treating each article as the indexed unit (i.e. each article is a document) with structure:

articleID
publishDate
source
company_name
company_desc
contents

Then you can do grouping by company_name field. I happen to know you're very familiar with grouping in Solr so there must be a reason you're not approaching the problem from this angle.

Cheers,
Tim

On Tue, Feb 26, 2013 at 10:32 AM, Clint Miller <clint.mill...@gmail.com> wrote:

> Suppose I have companies that have articles associated with them. I want to
> be able to search for companies based on the text in the articles. For
> example, suppose the Company and Article classes look like this:
>
> Company
> ----------------------
> name: String
> description: String
> articles: Article[]
>
> Article
> -------------------------
> publishDate: Date
> source: String (like Reuters or AP)
> contents: String
>
> I want to be able to do the following types of operations:
>
> 1. Search for companies by name, description, or article contents with
>    name and description boosted higher than article contents.
> 2. Facet on article sources.
> 3. Filter on article sources.
> 4. Boost companies with newer articles.
>
> One approach to representing this data is to use the following Solr fields:
>
> name_s
> description_s
> article_1_publish_date_tdt
> article_1_source_s
> article_1_contents_s
> ...
> article_publish_dates_tdts
> article_sources_ss
> article_contents_ss
>
> where article_publish_dates_tdts is a copyField of all the
> *_publish_date_tdt fields, article_sources_ss is a copyField of all the
> *_source_s fields, and article_contents_ss is a copyField of all the
> *_contents_s fields.
>
> This structure allows me to do my first 2 types of operations easily
> enough. To search on name, description, and contents, I just use qf set to
> name_s, description_s, and article_contents_ss, with name_s and
> description_s boosted accordingly. I can facet on article sources by using
> article_sources_ss.
>
> But, I'm having trouble figuring out how to filter on article sources or
> boost companies with newer articles. For example, say I search for "green
> energy" and filter by source = 'Reuters'. If I use
> "article_sources_ss:Reuters AND article_contents_ss:Green Energy" that
> won't give correct results since a company may have 2 articles, one about
> green energy from AP and another from Reuters that isn't about green energy
> at all. Yet, that company would match the query.
>
> Similarly, based on Tim Potter's online presentation, I understand how to
> boost based on recent dates if a company has only a single article and a
> single date. But, I'm not sure how to do the boosting with a list of
> articles and dates.
>
> Is this possible? Will I need to go down a path of writing custom
> functions? If so, any pointers on a custom function approach I should use?
>
> Thank you very much for any help.
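
To make the article-per-document suggestion at the top of this reply a bit more concrete, here is a rough sketch of the request parameters such a query might use. It only builds the URL; it does not talk to Solr. The endpoint, the core name "articles", the boost weights, and the recency constants are all assumptions for illustration, and it assumes company_name is a single-valued field that Solr can group on while contents is a tokenized text field.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the host and the core name "articles" are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/articles/select"


def build_company_search_params(text, source=None):
    """Build Solr params for one-article-per-document with grouping by company."""
    params = [
        ("q", text),
        ("defType", "edismax"),
        # Boost company name and description above article contents
        # (the weights 3 and 2 are arbitrary placeholders).
        ("qf", "company_name^3 company_desc^2 contents"),
        # Multiplicative recency boost per article, using the reciprocal
        # function on the article's age in milliseconds.
        ("boost", "recip(ms(NOW/DAY,publishDate),3.16e-11,1,1)"),
        # Collapse matching articles into one group per company.
        ("group", "true"),
        ("group.field", "company_name"),
        ("group.ngroups", "true"),
        # Facet on article source.
        ("facet", "true"),
        ("facet.field", "source"),
    ]
    if source is not None:
        # The filter applies to each article, so the text match and the
        # source match must hold for the same article.
        escaped = source.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", 'source:"%s"' % escaped))
    return params


params = build_company_search_params("green energy", source="Reuters")
url = SOLR_SELECT_URL + "?" + urlencode(params)
print(url)
```

Because each document is a single article, the "green energy from AP plus an unrelated Reuters article" false match described above should not occur, and each group is ranked by its best-scoring (here, recency-boosted) article. Note that facet counts in this form count articles, not companies.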