Hi Clint,

Nice to see you on this list!

What about treating each article as the indexed unit (i.e. each
article is its own document) with this structure:

articleID
publishDate
source
company_name
company_desc
contents

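Then you can group results by the company_name field. Because source
and publishDate are single values on each article document, a filter
on source or a boost on publishDate applies to the same article that
matched the text query.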
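
Here is a rough sketch of what I mean, in Python, not tested against a
real index. It flattens a company into one document per article and
builds the request parameters for a grouped edismax query. The
articleID scheme, the qf boost values, and the recip() constants are
placeholders I made up for illustration. It also assumes company_name
is a single-valued string field (grouping needs that), so you may want
a tokenized copy of it for searching.

```python
from urllib.parse import urlencode


def flatten_company(company):
    """Turn one company record into one Solr document per article."""
    docs = []
    for i, article in enumerate(company["articles"]):
        docs.append({
            # Hypothetical ID scheme: company name plus article position.
            "articleID": f"{company['name']}-{i}",
            "publishDate": article["publishDate"],
            "source": article["source"],
            "company_name": company["name"],
            "company_desc": company["description"],
            "contents": article["contents"],
        })
    return docs


def grouped_query_params(text, source=None):
    """Build request parameters for a search grouped by company."""
    params = [
        ("q", text),
        ("defType", "edismax"),
        # Placeholder boosts: name and description above article contents.
        ("qf", "company_name^3 company_desc^2 contents"),
        ("group", "true"),
        ("group.field", "company_name"),
        ("facet", "true"),
        ("facet.field", "source"),
        # Recency boost computed from each article's own publishDate.
        ("boost", "recip(ms(NOW,publishDate),3.16e-11,1,1)"),
    ]
    if source:
        # The filter applies per article, so it cannot match a company
        # through a different article from the same company.
        params.append(("fq", f'source:"{source}"'))
    return params


company = {
    "name": "Acme",
    "description": "Example company",
    "articles": [
        {"publishDate": "2013-02-01T00:00:00Z", "source": "Reuters",
         "contents": "Green energy plans"},
        {"publishDate": "2013-01-15T00:00:00Z", "source": "AP",
         "contents": "Quarterly results"},
    ],
}
docs = flatten_company(company)
query_string = urlencode(grouped_query_params("green energy", source="Reuters"))
```

One thing to watch: with plain grouping the facet counts on source
count articles, not companies.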

I happen to know you're very familiar with grouping in Solr so there
must be a reason you're not approaching the problem from this angle.

Cheers,
Tim

On Tue, Feb 26, 2013 at 10:32 AM, Clint Miller <clint.mill...@gmail.com> wrote:
> Suppose I have companies that have articles associated with them. I want to
> be able to search for companies based on the text in the articles. For
> example, suppose the Company and Article classes look like this:
>
> Company
> ----------------------
> name: String
> description: String
> articles: Article[]
>
> Article
> -------------------------
> publishDate: Date
> source: String (like Reuters or AP)
> contents: String
>
> I want to be able to do the following types of operations:
>
>    1. Search for companies by name, description, or article contents with
>    name and description boosted higher than article contents.
>    2. Facet on article sources.
>    3. Filter on article sources.
>    4. Boost companies with newer articles.
>
> One approach to representing this data is to use the following Solr fields:
>
> name_s
> description_s
> article_1_publish_date_tdt
> article_1_source_s
> article_1_contents_s
> ...
> article_publish_dates_tdts
> article_sources_ss
> article_contents_ss
>
> where article_publish_dates_tdts is a copyField of all the
> *_publish_date_tdt fields, article_sources_ss is a copyField of all the
> *_source_s fields, and article_contents_ss is a copyField of all the
> *_contents_s fields.
>
> This structure allows me to do my first 2 types of operations easily
> enough. To search on name, description, and contents, I just use qf set to
> name_s, description_s, and article_contents_ss, with name_s and
> description_s boosted accordingly. I can facet on article sources by using
> article_sources_ss.
>
> But, I'm having trouble figuring out how to filter on article sources or
> boost companies with newer articles. For example, say I search for "green
> energy" and filter by source = 'Reuters'. If I use
> "article_sources_ss:Reuters AND article_contents_ss:Green Energy" that
> won't give correct results since a company may have 2 articles, one about
> green energy from AP and another from Reuters that isn't about green energy
> at all. Yet, that company would match the query.
>
> Similarly, based on Tim Potter's online presentation, I understand how to
> boost based on recent dates if a company has only a single article and a
> single date. But, I'm not sure how to do the boosting with a list of
> articles and dates.
>
> Is this possible? Will I need to go down a path of writing custom
> functions? If so, any pointers on a custom function approach I should use?
>
> Thank you very much for any help.
