So long as the fields are indexed, I think performance should be OK. Personally, I would also look at using a single document per user with a multi-valued field for the recommendation IDs. Assuming only a small fraction of all recommendation IDs is ever presented to any single user, this schema would be physically much smaller.
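As a rough sketch of what I mean by one document per user (not tested against your data): the field names `seen_ids` and `doc_type`, the `user-` ID prefix, and both helper functions are made up for illustration, and I am assuming the user documents live in the same core as the recommendations, with recommendation document IDs matching the values stored in `seen_ids`. The filter uses the standard `{!join from=... to=...}` parser wrapped in a negated nested query.

```python
def user_doc(user_id, seen_ids):
    """Build one Solr document per user.

    `seen_ids` is a hypothetical multi-valued field holding the IDs of
    recommendations already shown to this user.
    """
    return {
        "id": f"user-{user_id}",          # hypothetical ID convention
        "doc_type": "user",               # hypothetical discriminator field
        "seen_ids": sorted(set(seen_ids)),  # de-duplicated, stable order
    }


def exclude_seen_params(user_id, text_query):
    """Build request parameters that filter out what the user has seen.

    The join collects the `seen_ids` values from the user's document and
    matches them against `id`; the leading '-' negates that set.
    """
    join = f"{{!join from=seen_ids to=id}}id:user-{user_id}"
    return {"q": text_query, "fq": f'-_query_:"{join}"'}


if __name__ == "__main__":
    print(user_doc(42, ["r9", "r3", "r3"]))
    print(exclude_seen_params(42, "title:solr"))
```

The returned dictionaries can be passed to whatever client you already use for indexing and querying. The point is only that each user costs one document, however many recommendations they have seen.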
I don't know the answer to your sharding question. The join query is available out of the box, so it should be quick work to set up a two-shard sample and test the distributed sub-query. That said, at the scales you are talking about, I question whether sharding is necessary. You can still use replication for load balancing without sharding.

-----Original Message-----
From: amid [mailto:a...@donanza.com]
Sent: Thursday, June 11, 2015 12:36 PM
To: solr-user@lucene.apache.org
Subject: RE: The best way to exclude "seen" results from search queries

Thanks a lot Charles, this seems to be what I'm looking for.

Do you know if join will still have good query performance with this number of documents and users? Also, are there any limitations on the Solr architecture when using the "join" method (i.e. sharding)?

Many thanks,
Ami