We currently use Solr to index various types of content in our
system, several of which users can comment on.  We would like to
issue a query on the top-level content that also searches the
attached comments but returns only unique top-level documents as
results, while still keeping the option to search and return
comments as an alternative type of search for the user.

The simplest example would probably be that of a blog.  The blog could
be indexed as follows:

id: blog_intId
title: blog title
content: blog content

And the associated comments:

id: comment_intId
title: comment title
content: comment content
parentId: blog_intId

Given this type of layout, how would I go about querying and returning
a list of blogs that contain the search text in either the blog's own
content or the content of any of its comments?
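
To make the desired behavior concrete, here is a minimal sketch in plain
Python.  It is not Solr code and says nothing about how Solr would do
this; the sample documents and the function names search_blogs and
search_comments are made up for illustration, and matching is a naive
substring test.

```python
# Hypothetical documents following the layout described above.
docs = [
    {"id": "blog_1", "title": "first", "content": "introducing solr"},
    {"id": "blog_2", "title": "second", "content": "unrelated notes"},
    {"id": "blog_3", "title": "third", "content": "nothing relevant"},
    {"id": "comment_1", "title": "re: second", "content": "solr handles this",
     "parentId": "blog_2"},
    {"id": "comment_2", "title": "re: second", "content": "agreed, solr works",
     "parentId": "blog_2"},
]


def search_blogs(docs, term):
    """Return unique blog ids where the blog or any of its comments matches."""
    term = term.lower()
    seen = set()
    result = []
    for doc in docs:
        if term not in doc["content"].lower():
            continue
        # A comment counts as a hit for its parent; a blog counts for itself.
        blog_id = doc.get("parentId", doc["id"])
        if blog_id not in seen:
            seen.add(blog_id)
            result.append(blog_id)
    return result


def search_comments(docs, term):
    """Return ids of matching comments only (the alternative search type)."""
    term = term.lower()
    return [d["id"] for d in docs
            if "parentId" in d and term in d["content"].lower()]
```

With this data, a blog search for "solr" should return blog_2 only once
even though two of its comments match, while a comment search returns
both comments.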

The only solutions I can come up with would be to:
1) Aggregate comment content into the blog's indexed content, allowing
me to query directly on the blog.  However, we are expecting the site
to generate many comments, on the order of hundreds and possibly
thousands.  This also has the downside of requiring duplicate content
in the index if we still want to let users search on and return
comments.
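
A rough sketch of what option 1 amounts to at indexing time, again in
plain Python rather than any Solr API.  The helper name
aggregate_comments and the sample data are assumptions for illustration
only.

```python
def aggregate_comments(blog, comments):
    """Return a copy of `blog` with its comments' content appended."""
    extra = [c["content"] for c in comments
             if c.get("parentId") == blog["id"]]
    merged = dict(blog)  # leave the original blog document untouched
    merged["content"] = " ".join([blog["content"]] + extra)
    return merged


# Hypothetical sample data.
blog = {"id": "blog_1", "title": "blog title", "content": "blog content"}
comments = [
    {"id": "comment_1", "title": "t", "content": "first comment",
     "parentId": "blog_1"},
    {"id": "comment_2", "title": "t", "content": "second comment",
     "parentId": "blog_1"},
    {"id": "comment_3", "title": "t", "content": "other thread",
     "parentId": "blog_2"},
]

# Documents for blog search: one aggregated document per blog.
blog_index = [aggregate_comments(blog, comments)]
# Documents for comment search: the comments stay separate documents,
# so their text ends up in the index twice.
comment_index = comments
```

Note that every new comment means rebuilding and reindexing the whole
aggregated blog document, which is part of why the expected comment
volume worries me.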

2) Use facets to get a list of parent items and issue an additional
query (or hit the database) to pull in the parent content.  Again,
this isn't ideal: we would have to page the results ourselves, because
Solr's facet parameters don't support an offset, which possibly
negates any optimizations Solr may have for paging regular queries.
It also forces a second round trip to either Solr or the database to
get summary content to display in the search results list.  And it
seems like a poor use case for the facet functionality in general.
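
The client-side work option 2 implies can be sketched like this, in
plain Python standing in for the application code.  Nothing here is a
Solr call; parent_facets, page and fetch_parents are hypothetical
helpers, and the fallback from parentId to id (so that a blog matching
on its own content is counted too) is my assumption.

```python
from collections import Counter


def parent_facets(matching_docs):
    """Count matches per top-level item, similar to facet counts."""
    counts = Counter(d.get("parentId", d["id"]) for d in matching_docs)
    # Highest count first, then id, so the order is stable across pages.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def page(facets, offset, limit):
    """Paging done by hand on the full facet list."""
    return facets[offset:offset + limit]


def fetch_parents(ids, store):
    """Second round trip: look up summary content for the paged ids."""
    return [store[i] for i in ids]


# Hypothetical result of the first query (matching documents only).
matches = [
    {"id": "comment_1", "parentId": "blog_2"},
    {"id": "comment_2", "parentId": "blog_2"},
    {"id": "blog_1"},
    {"id": "comment_3", "parentId": "blog_3"},
]
# Hypothetical stand-in for the database or a second Solr query.
store = {
    "blog_1": {"id": "blog_1", "title": "first"},
    "blog_2": {"id": "blog_2", "title": "second"},
    "blog_3": {"id": "blog_3", "title": "third"},
}

facets = parent_facets(matches)
second_page = page(facets, offset=1, limit=2)
summaries = fetch_parents([blog_id for blog_id, _ in second_page], store)
```

The whole facet list has to be in hand before any page can be cut from
it, and the summaries need a separate lookup, which are the two costs
described above.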

3) Plug into the Solr code and implement a custom request handler,
HitCollector, or ...?  I've spent some time digging into the Solr code
and I don't see any obvious place to plug in this type of
functionality.  Performance is a major concern as well, so I want to
ensure that I can get at and modify the results before Solr loads
any unnecessary content into memory.

Any thoughts on this would be much appreciated.  Any kind of kick
start, pointer, or place to dig into would be very helpful.

--
eric
