I'm wondering what the expected behavior is for the following scenario. We receive the same document in multiple formats, and we handle this by grouping, sorting each group by date received, and limiting each group to 1 result, so that we get only the most recent version of each document.
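For reference, the request can be sketched roughly as below. This is a minimal illustration, assuming the standard Solr result-grouping parameters (group, group.field, group.sort, group.limit) plus the top-level sort. The helper name build_group_query_params and the q=*:* match-all query are placeholders for illustration, not our exact query.

```python
from urllib.parse import urlencode


def build_group_query_params(group_field="docId", date_field="dateReceived"):
    """Return Solr parameters for 'most recent document per group'.

    The function name and the match-all query are illustrative assumptions.
    """
    return {
        "q": "*:*",                          # placeholder: match all documents
        "group": "true",                     # turn on result grouping
        "group.field": group_field,          # one group per docId value
        "group.sort": f"{date_field} desc",  # newest document first within a group
        "group.limit": "1",                  # keep only the top document per group
        "sort": f"{date_field} asc",         # order of the overall result set
    }


params = build_group_query_params()
query_string = urlencode(params)  # ready to append to a /select request URL
print(query_string)
```

The same parameters are sent whether the request goes to a single node or to a sharded setup, so any difference in ordering comes from how the grouped results are sorted and merged, not from the request itself.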
Here is an example. The id field is something like "identifier!date_format":

doc {
  id: doc1!20130618_formatX
  docId: doc1
  dateReceived: 20130618
}

doc {
  id: doc1!20130621_formatY
  docId: doc1
  dateReceived: 20130621
}

doc {
  id: doc2!20130619_formatX
  docId: doc2
  dateReceived: 20130619
}

In this case we want to group on docId so that all the doc1 docs are together and all the doc2 docs are together, sort within each group on dateReceived descending, limit each group to 1 to get the most recent doc in the group, and then sort the whole result set on dateReceived ascending. So we expect to get:

doc2!20130619_formatX
doc1!20130621_formatY

On a regular single-node Solr instance running Solr 4.3, everything described above works as expected.

When running on a sharded configuration with two nodes, the results are different. The grouping, the sort within each group, and the limit still behave as expected, but the overall sort on dateReceived does not. The results end up being:

doc1!20130621_formatY
doc2!20130619_formatX

It seems like this happens because the doc1 group contains another document with a dateReceived of 0618, which is somehow being used for the overall sort, with group.sort and group.limit applied only after that.

I realize there may be limitations to grouping and sorting in a sharded setup, but I wanted to know whether this is the correct behavior or whether I am doing something wrong.

Any help would be appreciated.

Thanks,
Bryan