From past experience, I recall that if the ID is duplicated you get one
document or the other, and which one you get is non-deterministic. Since this
is an unsupported and untested configuration, I would also expect other
things, such as facet counts, to be thrown off. Also, if the schemas use
different fields for identity, or the collections assign different IDs to the
same document, then of course you will likely get both showing up in the same
results. That said, this may have changed: maybe it's now possible to get two
documents with the same ID back, or the behavior has become deterministic in
some way. AFAIK it's not a supported use case, so anything could have changed.

In short, you probably should not alias two collections containing the same
data into a single alias. Aliasing two collections with identical schema
and **different** data is the expected use case for aliases that point to
more than one collection. Schemas could be slightly different too, but
results involving the non-matching fields will become hard to predict.
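
If you want to check for overlap before pointing an alias at two
collections, something like the following might help. It is only a rough
sketch: it assumes the uniqueKey field is named `id`, that the collections
are small enough to pull the IDs in one request, and the helper names
(`select_ids_url`, `fetch_ids`, `overlapping_ids`) are made up for this
example.

```python
import json
import urllib.parse
import urllib.request


def select_ids_url(base_url, collection, rows=1000):
    """Build a /select URL that returns only the `id` field (assumed uniqueKey)."""
    params = urllib.parse.urlencode(
        {"q": "*:*", "fl": "id", "rows": rows, "wt": "json"}
    )
    return f"{base_url}/{collection}/select?{params}"


def fetch_ids(base_url, collection, rows=1000):
    """Fetch up to `rows` IDs from one collection (needs a reachable Solr)."""
    with urllib.request.urlopen(select_ids_url(base_url, collection, rows)) as resp:
        body = json.load(resp)
    return [doc["id"] for doc in body["response"]["docs"]]


def overlapping_ids(ids_a, ids_b):
    """Return the sorted IDs that appear in both collections."""
    return sorted(set(ids_a) & set(ids_b))
```

With a running Solr you would call
`overlapping_ids(fetch_ids(base, "CollectionA"), fetch_ids(base, "CollectionB"))`
and only create the alias if the result is empty. For anything beyond a small
collection you would need to page through the IDs rather than rely on a
single `rows` limit.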

As a practical example of this, in Time Routed Aliases (TRAs) it's
important never to resend the same document with a changed value in the
routed field, as that will result in two time slices (collections) each
holding a document with the same ID (see the very first warning here:
https://solr.apache.org/guide/solr/latest/deployment-guide/aliases.html#routed-aliases).
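
To make the TRA hazard concrete, here is a toy model of it. This is not
Solr's routing code: it assumes day-sized time slices, the collection naming
pattern and the alias name `events` are illustrative, and `time_slice` and
`slices_holding` are hypothetical helpers.

```python
from datetime import datetime


def time_slice(alias, routed_value):
    """Toy routing rule: one slice (collection) per calendar day (assumed)."""
    return f"{alias}__TRA__{routed_value:%Y-%m-%d}"


def slices_holding(alias, updates):
    """Map each document ID to the set of slices it was sent to."""
    placement = {}
    for doc_id, routed_value in updates:
        placement.setdefault(doc_id, set()).add(time_slice(alias, routed_value))
    return placement


updates = [
    ("document1", datetime(2023, 3, 7)),
    ("document1", datetime(2023, 3, 15)),  # same ID, routed field changed
]
placement = slices_holding("events", updates)

# IDs that ended up in more than one slice are duplicated behind the alias.
duplicated = {doc_id: s for doc_id, s in placement.items() if len(s) > 1}
```

Here `duplicated` contains `document1` with two different slices, which is
the situation the warning in the guide is about: the second update does not
replace the first copy, because the first copy lives in a different
collection.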

On Wed, Mar 15, 2023 at 5:02 PM David Smiley <dsmi...@apache.org> wrote:

> When aliasing across collections, it's up to you/the-user to ensure that
> they don't contain the same document (by ID).  I don't believe this is
> supported at all. If you find information to the contrary, let us know.  I
> could imagine some small code details to _do something_ if it could be
> detected in some cases but that isn't a substitute for truly
> working/supported.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Mar 7, 2023 at 5:34 AM Vinayak Hegde <vinayakph...@gmail.com>
> wrote:
>
> > Hello everyone,
> > I hope this email finds you well. I am reaching out to discuss a strange
> > situation we are facing with result grouping.
> > We currently have two collections, CollectionA and CollectionB, both of
> > which contain an identical document, document1. We have created a new alias
> > collection that includes both CollectionA and CollectionB.
> > However, when attempting to perform result grouping on this new alias
> > collection, we are encountering an issue where two instances of document1
> > appear in the output.
> >
> >
> http://10.144.10.36:8983/solr/aliasCollection/select?q=id:document1&rows=40&group=true&group.field=fieldA&group.limit=20
> > I have attempted to locate official documentation regarding this issue, but
> > have been unsuccessful. The closest resource I found was this link:
> >
> >
> https://markmail.org/message/2ykh7wyexbnquc6s?q=list:org.apache.lucene.solr-user
> > .
> > Please let me know if you have any insights or suggestions on how to
> > resolve this issue.
> > Thank you for your time and attention.
> >
> > Best regards,
> > Vinayak Hegde
> >
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
