Let's say I have a data model that involves books and bookshelves. I have tens of thousands of books and thousands of bookshelves, with a many-to-many relationship between them. All of the books are indexed by Solr.
I need to be able to query Solr and get all the books for a given bookshelf. I see two schema design options here:

1) Each book has a multi-valued field that contains a list of all its bookshelf IDs. Many books will have thousands of bookshelf IDs. In this case the query is simple: I just send Solr the bookshelf ID.

2) I send Solr a query listing every book on the bookshelf, e.g. q=book_id:(1+OR+2+OR+3 ....). Many bookshelves will have thousands of book IDs, so the query can get rather large.

Right now I am using option 2 and it seems to be working fine. I have had to crank 'maxBooleanClauses' right up, but it does seem to be pretty fast. Does anyone have an opinion?
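
To make the two options concrete, here is a rough Python sketch of the request parameters each one would send. The field names `bookshelf_ids` and `book_id` and the helper function names are placeholders I made up for illustration, not my real schema, and the sketch only builds the query strings; it does not talk to Solr.

```python
from urllib.parse import urlencode

# Hypothetical field names, assumed for illustration only.
BOOKSHELF_FIELD = "bookshelf_ids"  # multi-valued field on each book (option 1)
BOOK_ID_FIELD = "book_id"          # ID field of each book (option 2)


def option1_params(bookshelf_id, rows=10):
    """Option 1: match books whose multi-valued field contains the shelf ID."""
    return {"q": f"{BOOKSHELF_FIELD}:{bookshelf_id}", "rows": rows}


def option2_params(book_ids, rows=10):
    """Option 2: OR together the ID of every book on the shelf."""
    clause = " OR ".join(str(book_id) for book_id in book_ids)
    return {"q": f"{BOOK_ID_FIELD}:({clause})", "rows": rows}


if __name__ == "__main__":
    # Option 1 stays the same size no matter how many books are on the shelf.
    print(urlencode(option1_params(42)))
    # Option 2 grows by one boolean clause per book on the shelf.
    print(urlencode(option2_params([1, 2, 3])))
```

With option 2, the number of OR clauses equals the number of books on the shelf, which is why 'maxBooleanClauses' has to be raised for large shelves.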