#34393: A filter query returns more items than the original queryset provides after applying INNER JOIN
---------------------------------------------------------------------------
Reporter:                 Ľuboš Mjachky
Owner:                    nobody
Type:                     Uncategorized
Status:                   new
Component:                Database layer (models, ORM)
Version:                  3.2
Severity:                 Normal
Keywords:                 filter query duplicate distinct
Triage Stage:             Unreviewed
Has patch:                0
Needs documentation:      0
Needs tests:              0
Patch needs improvement:  0
Easy pickings:            0
UI/UX:                    0
---------------------------------------------------------------------------
 In our project, we found that a filter() call returns more entries than
 the queryset it was applied to contains.

 The following piece of code is involved:
 {{{
 # qs.count() == 4
 scoped_repos = repo_viewset.get_queryset().values_list("pk", flat=True)
 filtered_content = qs.filter(repositories__in=scoped_repos)
 # filtered_content.count() == 8
 }}}

 The generated query:
 {{{
 SELECT * FROM "rpm_package"
 INNER JOIN "core_content"
   ON ("rpm_package"."content_ptr_id" = "core_content"."pulp_id")
 INNER JOIN "core_repositorycontent"
   ON ("core_content"."pulp_id" = "core_repositorycontent"."content_id")
 WHERE "core_repositorycontent"."repository_id" IN
   (c35b7039-2c2c-48e3-8f4f-b0eeabad8af1, ee39a78b-9dd5-4bdf-85d9-eb6406b6ef49)
 }}}
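
The row multiplication can be reproduced outside Django. The following is a minimal sketch using Python's sqlite3 module; the table names (`content`, `repositorycontent`), the repository ids, and the data are simplified stand-ins invented for illustration, not the actual Pulp schema. It assumes four content rows, each linked to two repositories.

```python
import sqlite3

# In-memory database with a hypothetical, simplified schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (id INTEGER PRIMARY KEY);
    CREATE TABLE repositorycontent (
        id INTEGER PRIMARY KEY,
        content_id INTEGER REFERENCES content(id),
        repository_id TEXT
    );
""")

# Four content rows, each present in two repositories (assumed data).
conn.executemany("INSERT INTO content (id) VALUES (?)", [(i,) for i in range(1, 5)])
conn.executemany(
    "INSERT INTO repositorycontent (content_id, repository_id) VALUES (?, ?)",
    [(c, r) for c in range(1, 5) for r in ("repo-a", "repo-b")],
)

# Size of the "original queryset".
base_count = conn.execute("SELECT COUNT(*) FROM content").fetchone()[0]

# Same shape as the generated query: one result row per matching link row.
joined_count = conn.execute("""
    SELECT COUNT(*) FROM content
    INNER JOIN repositorycontent ON content.id = repositorycontent.content_id
    WHERE repositorycontent.repository_id IN ('repo-a', 'repo-b')
""").fetchone()[0]

print(base_count, joined_count)
```

Under these assumptions the join yields one row per (content, repository) pair, so a content row that belongs to both repositories appears twice.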

 We noticed that the query is constructed with an INNER JOIN clause
 instead of a LEFT JOIN clause. The core_repositorycontent table contains
 many duplicates, but we believe that this should not be a problem.
 Appending distinct() to the queryset resolves the issue. See
 https://github.com/pulp/pulpcore/pull/3642.
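
To compare the options mentioned here, the sketch below runs three SQL variants against a hypothetical, simplified schema in sqlite3 (table names, repository ids, and data are invented for illustration): a LEFT JOIN, an INNER JOIN with DISTINCT, and an EXISTS subquery that avoids the join altogether.

```python
import sqlite3

# Hypothetical schema: 4 content rows, each linked to 2 repositories.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (id INTEGER PRIMARY KEY);
    CREATE TABLE repositorycontent (content_id INTEGER, repository_id TEXT);
""")
conn.executemany("INSERT INTO content VALUES (?)", [(i,) for i in range(1, 5)])
conn.executemany(
    "INSERT INTO repositorycontent VALUES (?, ?)",
    [(c, r) for c in range(1, 5) for r in ("repo-a", "repo-b")],
)


def count(sql):
    """Return the number of rows produced by the given SELECT statement."""
    return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]


join_template = """
    SELECT {select} FROM content
    {join} JOIN repositorycontent rc ON content.id = rc.content_id
    WHERE rc.repository_id IN ('repo-a', 'repo-b')
"""

# A LEFT JOIN still produces one row per matching link row.
left_join_rows = count(join_template.format(select="content.id", join="LEFT"))

# DISTINCT collapses the repeated content rows after the join.
distinct_rows = count(join_template.format(select="DISTINCT content.id", join="INNER"))

# EXISTS tests for a matching link row without multiplying content rows.
exists_rows = count("""
    SELECT content.id FROM content
    WHERE EXISTS (
        SELECT 1 FROM repositorycontent rc
        WHERE rc.content_id = content.id
          AND rc.repository_id IN ('repo-a', 'repo-b')
    )
""")

print(left_join_rows, distinct_rows, exists_rows)
```

In this toy setup, switching the join type does not change the row count; only DISTINCT or the subquery form brings it back to the number of content rows. In Django terms the DISTINCT variant corresponds to appending `.distinct()`, and the EXISTS variant is roughly what an `Exists(...)` expression with `OuterRef` would generate, though the exact ORM spelling depends on the project's models.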

 The question is whether this is a bug in Django (i.e., a filter query can
 return more elements than there are in the original queryset) or a problem
 on our side, in which case we should restructure the query in a specific
 way. Any advice is welcome.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/34393>