Hello Solr users,

How would you design a filtered join scenario?
Say I have a bunch of movies (excuse any inaccuracies, this is an imagined scenario):

    curl -XPOST -H 'Content-Type: application/json' 'localhost:8983/solr/test/update?commitWithin=1000' --data-binary '
    [{
      "id": "1",
      "title": "Rambo 1",
      "release_date": "1978-01-01"
    },
    {
      "id": "2",
      "title": "Rambo 5",
      "release_date": "1998-01-01"
    },
    {
      "id": "3",
      "title": "300 Spartaaaaaans",
      "release_date": "2005-01-01"
    }]'

And a bunch of users of certain families who watched those movies:

    curl -XPOST -H 'Content-Type: application/json' 'localhost:8983/solr/test/update?commitWithin=1000' --data-binary '
    [{
      "id": "user_1",
      "name": "Jane",
      "family": "Smith",
      "born": "1990-01-01",
      "watched_movies": ["1", "3"]
    },
    {
      "id": "user_2",
      "name": "Joe",
      "family": "Smith",
      "born": "1970-01-01",
      "watched_movies": ["2"]
    },
    {
      "id": "user_3",
      "name": "Radu",
      "family": "Gheorghe",
      "born": "1985-01-01",
      "watched_movies": ["1", "2", "3"]
    }]'

They don't have to be in the same collection. The important question is how to get:

- movies watched by users of the Smith family
- that were released after those users were born
- including the matching users
- I'd like to be able to facet on movie metadata, but I don't need to facet on user metadata, just to be able to retrieve those fields

The above query should bring back Rambo 5 and 300, with Joe and Jane respectively. I wouldn't get Rambo 1, because although Jane watched it, the movie was released before she was born.

Here are some options that I have in mind:

1) Use the join query parser (or the newer XCJF) to do the join itself. Then have some sort of plugin pull the "born" value for each corresponding user (via some subquery) and filter the movies afterwards. Normalized, but likely painfully slow.

2) A similar approach to 1), but in a streaming expression. Again, normalized but slow (we're talking billions of movies, millions of users), and with limited support for facets.

3) Have some sort of denormalization. For example, pre-compute the matching users for every movie, then just use join/XCJF to do the actual join. This makes indexing/updates expensive and potentially complicated.

4) Normalization with nested documents. This is best for searches, but pretty much a no-go for indexing/updates. In this imaginary use-case, there are binge-watchers who might watch a billion movies in a week, making us reindex everything.

Do you see better ways?

Thanks in advance and best regards,
Radu
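
P.S. To pin down the expected result, the requirement can be sketched in memory with plain Python. This is only a reference for the semantics, not a Solr solution, and it ignores scale entirely. The names `movies`, `users` and `filtered_join` are made up for illustration, and the sketch assumes every user document carries a "name" field and that "after they were born" means strictly after.

```python
from datetime import date

# Sample data mirroring the documents above, with dates parsed into date objects.
movies = {
    "1": {"title": "Rambo 1", "release_date": date(1978, 1, 1)},
    "2": {"title": "Rambo 5", "release_date": date(1998, 1, 1)},
    "3": {"title": "300 Spartaaaaaans", "release_date": date(2005, 1, 1)},
}

users = [
    {"id": "user_1", "name": "Jane", "family": "Smith",
     "born": date(1990, 1, 1), "watched_movies": ["1", "3"]},
    {"id": "user_2", "name": "Joe", "family": "Smith",
     "born": date(1970, 1, 1), "watched_movies": ["2"]},
    {"id": "user_3", "name": "Radu", "family": "Gheorghe",
     "born": date(1985, 1, 1), "watched_movies": ["1", "2", "3"]},
]


def filtered_join(family):
    """Map movie id -> names of users of `family` who watched that movie
    and were born before it was released (hypothetical helper)."""
    result = {}
    for user in users:
        if user["family"] != family:
            continue  # filter on the user side of the join
        for movie_id in user["watched_movies"]:
            movie = movies[movie_id]
            # Per-pair condition: this is the part a plain join cannot express.
            if movie["release_date"] > user["born"]:
                result.setdefault(movie_id, []).append(user["name"])
    return result


if __name__ == "__main__":
    for movie_id, names in sorted(filtered_join("Smith").items()):
        print(movies[movie_id]["title"], "watched by", ", ".join(names))
```

The point of the sketch is that the date comparison is evaluated per (user, movie) pair, which is why filtering only the user side or only the movie side of a join is not enough.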