We are currently using solr to index various types of content in our system, several of which users can comment on. What we would like to do is issue a query against the top-level content that also searches the attached comments but returns only unique top-level documents as results, while still keeping the option to search and return the comments themselves as an alternative type of search for the user.
The simplest example would probably be that of a blog. The blog could be indexed as follows:

  id: blog_intId
  title: blog title
  content: blog content

And the associated comments:

  id: comment_intId
  title: comment title
  content: comment content
  parentId: blog_intId

Given this type of layout, how would I go about querying and returning a list of blogs which contain text in either the blog content or any of the comments' content?

The only solutions I can come up with would be to:

1) Aggregate comment content into the blog content index, allowing me to query directly on the blog. However we are expecting the site to generate many comments, along the lines of hundreds and possibly thousands. This also has the downside of requiring duplicate content in the index if we want to still permit users to search on and return comments.

2) Use facets to get a list of parent items and issue an additional query (or hit the database) to pull in the parent content. Again, this isn't an ideal solution since we would have to page the results ourselves since solr's facet parameters don't support an offset. This possibly negates any optimizations solr may have for paging regular queries. Also, it forces us to issue a second round trip to either solr or the database to get summary content to display in the search results list. It also seems like a poor use case for the facet functionality in general.

3) Plug into the solr code and implement a custom request handler, HitCollector, or ...? I've spent some time digging into the solr code and I don't see any obvious place to plug this type of functionality in. A major concern of mine is performance as well, so I want to ensure that I can get at and modify the results prior to solr loading any unnecessary content into memory.

Any thoughts on this are very appreciated. Any kind of kick start, pointer, or places to dig into would be very helpful.

-- eric
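
P.S. To make the desired result concrete, here is a rough Python sketch of the collapsing we would otherwise have to do ourselves on the client, roughly what option 2 amounts to once the matching documents are in hand. It works on plain in-memory dicts and makes no solr calls; the function name and the sample ids are made up for illustration, and only the id and parentId fields come from the layout above.

```python
def collapse_to_parents(hits, offset=0, limit=10):
    """Map ranked hits to unique top-level ids, then page the result."""
    seen = set()
    parents = []
    for hit in hits:
        # A comment points at its blog via parentId; a blog is its own parent.
        top_id = hit.get("parentId", hit["id"])
        if top_id not in seen:
            seen.add(top_id)
            parents.append(top_id)
    # Paging happens only after every hit has been walked.
    return parents[offset:offset + limit]


# Hypothetical ranked hits, as if returned by a query on the content field.
hits = [
    {"id": "comment_7", "parentId": "blog_2"},
    {"id": "blog_1"},
    {"id": "comment_9", "parentId": "blog_2"},
    {"id": "comment_3", "parentId": "blog_1"},
    {"id": "blog_5"},
]

first_page = collapse_to_parents(hits, offset=0, limit=2)
second_page = collapse_to_parents(hits, offset=2, limit=2)
```

The sketch has to walk the full hit list before it can page, which is exactly the cost I would like to push down into solr instead.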