Martin,

You did not say your two starting objects were already sets. You said they were vectors of strings, and those strings may well include duplicates. For example, if I read in lots of text with a blank line between paragraphs, I would end up with many empty, identical elements. Just converting that into a set would shrink it.
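As a rough illustration of the point about duplicates, here is a small sketch. The toy vectors are made up for the example (they are not your data, only reused names), and it relies on the documented behavior that base::union() discards duplicated values:

```r
# Hypothetical toy data standing in for the real s1 and s2
s1 <- c("0.dk", "0.dk", "043.dk", "", "", "0it.dk")
s2 <- c("043.dk", "0606.dk")

length(s1)            # 6 elements, counting the repeats
length(unique(s1))    # 4 distinct strings
sum(duplicated(s1))   # 2 entries that repeat an earlier one

# union() drops duplicates, so the result can be shorter than s1
u <- base::union(s1, s2)
length(u)             # 5, which is less than length(s1)
```

If the same pattern holds for your real vectors, the union being shorter than s1 is expected rather than a bug.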
You have not said how you created or processed your initial two vectors. It is also possible that some entries were effectively deleted, as in removing the string an entry pointed to but leaving a null pointer of sorts (something like NA or an empty string), which would leave the vector longer than its useful contents. Your strings look like they may be filenames. Are they unique, especially if they are files in different folders/directories?

There are many ways to check, but using your method, try this:

length(base::union(s1, s1))

If that is smaller than length(s1), then s1 contains duplicates.

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Martin Møller Skarbiniks Pedersen
Sent: Sunday, January 31, 2021 3:57 PM
To: R mailing list <r-help@r-project.org>
Subject: [R] union of two sets are smaller than one set?

This is really puzzling me, and when I try to make a small example everything works as expected.

The problem:
I have these two large vectors of strings.

> str(s1)
 chr [1:766608] "0.dk" ...
> str(s2)
 chr [1:59387] "043.dk" "0606.dk" "0618.dk" "0888.dk" "0iq.dk" "0it.dk" ...

And I need to create the union-set of s1 and s2.
I expect the size of the union-set to be between 766608 and 766608+59387.
However, it is 681193, which is less than the number of elements in s1!

> length(base::union(s1, s2))
[1] 681193

Any hints?

Regards
Martin

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.