If I recall correctly from past experience, when an ID is duplicated you get one document or the other, and which one you get is non-deterministic. Since this is an unsupported and untested configuration, I would also expect other things, such as facet counts, to be thrown off. Also, if the schemas use different fields for identity, or the collections assign different IDs to the same document, then of course you will likely get both showing up in the same results. That said, this may have changed: maybe it's now possible to get two documents with the same ID back, or the behavior has become deterministic in some way. AFAIK it's not a supported use case, so anything could have changed.
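
One way to find out whether two collections share IDs before putting them behind one alias is to compare the ID sets directly. The following is only a minimal Python sketch, assuming the uniqueKey field is named `id` and that the ID lists have already been fetched from each collection individually (not through the alias). The function names, base URL, and sample IDs are hypothetical; `q`, `fl`, `rows`, and `wt` are ordinary Solr query parameters.

```python
from urllib.parse import urlencode


def id_query_url(base_url, collection, rows=1000):
    """Build a select URL that returns only the id field of one collection.

    base_url and collection are placeholders for your own deployment.
    """
    params = urlencode({"q": "*:*", "fl": "id", "rows": rows, "wt": "json"})
    return f"{base_url}/{collection}/select?{params}"


def overlapping_ids(ids_a, ids_b):
    """Return the sorted IDs that are present in both collections."""
    return sorted(set(ids_a) & set(ids_b))


if __name__ == "__main__":
    # Hypothetical ID lists, standing in for the results of the two queries.
    collection_a_ids = ["document1", "document2"]
    collection_b_ids = ["document1", "document3"]
    print(id_query_url("http://localhost:8983/solr", "CollectionA"))
    print(overlapping_ids(collection_a_ids, collection_b_ids))
```

Any ID this reports exists in both collections and would need to be removed from one of them before the alias is used. For large collections a single `rows` value will not be enough, and the IDs would have to be paged through instead.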
In short, you probably should not alias two collections containing the same data into a single alias. Aliasing two collections with identical schema and **different** data is the expected use case for aliases that point to more than one collection. Schemas could be slightly different too, but results involving the non-matching fields will become hard to predict.

As a practical example of this, in Time Routed Aliases (TRAs) it's important never to send the same document with changes to the value of the routed field, as that will create two time slices (collections) with a document that has the same ID (see the very first warning here: https://solr.apache.org/guide/solr/latest/deployment-guide/aliases.html#routed-aliases ).

On Wed, Mar 15, 2023 at 5:02 PM David Smiley <dsmi...@apache.org> wrote:

> When aliasing across collections, it's up to you/the-user to ensure that
> they don't contain the same document (by ID). I don't believe this is
> supported at all. If you find information to the contrary, let us know. I
> could imagine some small code details to _do something_ if it could be
> detected in some cases but that isn't a substitute for truly
> working/supported.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Mar 7, 2023 at 5:34 AM Vinayak Hegde <vinayakph...@gmail.com>
> wrote:
>
> > Hello everyone,
> >
> > I hope this email finds you well. I am reaching out to discuss a strange
> > situation we are facing with result grouping.
> >
> > We currently have two collections, CollectionA and CollectionB, both of
> > which contain an identical document, document1. We have created a new alias
> > collection that includes both CollectionA and CollectionB.
> >
> > However, when attempting to perform result grouping on this new alias
> > collection, we are encountering an issue where two instances of document1
> > appear in the output.
> >
> > http://10.144.10.36:8983/solr/aliasCollection/select?q=id:document1&rows=40&group=true&group.field=fieldA&group.limit=20
> >
> > I have attempted to locate official documentation regarding this issue, but
> > have been unsuccessful. The closest resource I found was this link:
> >
> > https://markmail.org/message/2ykh7wyexbnquc6s?q=list:org.apache.lucene.solr-user
> > .
> >
> > Please let me know if you have any insights or suggestions on how to
> > resolve this issue.
> >
> > Thank you for your time and attention.
> >
> > Best regards,
> > Vinayak Hegde
> >
>

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)