Customizable urlize
Hi Django. I'm currently working on a project where we want to use Django template filter *urlize*, but with a custom url formatter. The way the Django code works atm, we can't extend *urlize* customize output. That means we have to go non-DRY and copy code. Generally I like using class based coding for this kind of thing, but Djangos template filters seems to prefer functions and discourage instantiation. Therefore, it would be nice to solve this in a functional style, in line with current design. Wouldn't it be a good idea to extend django.utils.html.urlize, or am I looking at this the wrong way? The function signature would be smthng like this: def urlize( text, trim_url_limit=None, *# for compat reasons, though it would make sense to accomplish that with the formatter functions in the urlizetrunc filter.* nofollow=False, autoescape=False, email_formatter: callable = None, url_formatter: callable = None, ): ... Johan Schiff Betahaus (SE) -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/5d2877db-83e9-4bda-8087-98f5cd8fc57c%40googlegroups.com.
Implement QuerySet.__contains__?
Is there a specific reason Djangos QuerySet does not implement __contains__? It doesn't seem very complicated to me, but maybe I'm missing something. When checking if an object is in e queryset I always use code lite this: if queryset.filter(pk=obj.pk).exists(): The pythonic way would be: if obj in queryset: The way I understand it this latter version will fetch all objects from queryset, while the first one is a much leaner db query. So, is there a reason QuerySet doesn't have a __contains__ method, or would it be a good addition? My guess is that a lot of people use the "obj in queryset" syntax, believing it does "the right thing". //Johan -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/88c83b8d-658c-47cc-9162-fcfebebe9c4a%40googlegroups.com.
Re: Implement QuerySet.__contains__?
> > Roger Gammans > > -- > You received this message because you are subscribed to a topic in the > Google Groups "Django developers (Contributions to Django itself)" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/django-developers/NZaMq9BALrs/unsubscribe > . > To unsubscribe from this group and all its topics, send an email to > django-developers+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/django-developers/3928039038bac9b52279294f7efcac318dc80388.camel%40gammascience.co.uk > <https://groups.google.com/d/msgid/django-developers/3928039038bac9b52279294f7efcac318dc80388.camel%40gammascience.co.uk?utm_medium=email&utm_source=footer> > . > -- Vänligen, Johan Schiff Radkompaniet AB 072-229 61 19 -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAD69rUA1E1Vfdm7kuv32r1Ppyw%3D7CCqCzfSW9itda88th2Uw3g%40mail.gmail.com.
Re: Implement QuerySet.__contains__?
To answer my own question: No, I wasn't doing it correctly. I should have done a sanity check before posting. New timeit code and results at bottom, now using a smaller dataset (~900 objects). Notable: - An .exists() query is about 1/100 the time of full fetch in this case. This difference would ofc be much bigger if I did it on the 100 000+ objects. - If already prefetched, it should be worth it to search prefetched data if the dataset is <700 objects. It seems rare to prefetch that many objects. If so, a developer could easily be explicit and use .exists() method. *Based on this exact scenario: 100 times slower for using obj in queryset is quite a performance hit. On the other hand, an extra 1 % query time for a queryset that is later evaluated is marginal.* If a developer knows they want to evaluate the queryset later, it would be nice to have a .evaluate() that does self._fetch_all() and returns self. I think that's preferable over list(queryset)-method, because we don't need to create an extra list. # Old explicit fetch method queryset = Stuff.objects.all() if obj in list(queryset): pass # My suggestion queryset = Stuff.objects.all().evaluate() if obj in queryset: pass Updated timeit function and results: import timeit def time_fn(title, stmt, number=10_000): setup = [ 'from movies.models import Movie', 'qs=Movie.objects.all()', 'obj_first = qs.first()', 'obj_mid = qs[300]', 'obj_last = qs.last()', 'list(qs)', ] result_time = timeit.timeit(stmt, setup='\n'.join(setup), number=number) print(f'{title}: {result_time}') time_fn('Database fetch', 'qs._result_cache=None\nlist(qs)') time_fn('Prefetched obj_first in qs', 'obj_first in qs') time_fn('Prefetched obj_mid in qs', 'obj_mid in qs') time_fn('Prefetched obj_last in qs', 'obj_last in qs') time_fn('Database obj_first exists()', 'qs.filter(pk=obj_first.pk ).exists()') time_fn('Database obj_mid exists()', 'qs.filter(pk=obj_mid.pk).exists()') time_fn('Database obj_last exists()', 'qs.filter(pk=obj_last.pk).exists()') # Results Database fetch: 616.097227364 Prefetched obj_first in qs: 0.030961711003328674 Prefetched obj_mid in qs: 6.6988333979970776 Prefetched obj_last in qs: 24.18991441695 Database obj_first exists(): 6.468764332996216 Database obj_mid exists(): 6.167532913001196 Database obj_last exists(): 6.0190791100030765 Le mer. 3 juin 2020 à 10:52, Johan Schiff a écrit : > Thanks for great info. > > First, I'm leaning towards Aymeric's proposition here. I do recognize that > there is a lot to consider. > > This seems to be important: > >1. Developers must be able to explicitly choose methods to optimize >for their environment. (Considering database latency, dataset size, >prefetch preferred, etc.) >2. Behaviour should preferably be consistent across methods. (len(qs), >bool(qs), obj in qs) >3. Syntax should be pythonic. > > I think what Aymeric is proposing would fulfill those demands. > > I have done some timeit-tests on a model with 100 000+ records, based on > Rogers work, on a single host setup using postgresql. My finding is that a > separate database query is generally faster than current *obj in qs* > behaviour, > even on a prefetched queryset (unless your dataset is really small). This > tells me that we should prioritize .exists() query unless explicitly stated > (i.e. *obj in iter(qs)*), even when we have prefetched data. For len(qs) > and bool(qs) that should not be the case. > > It would be interesting to get timings from other setups, so I'm including > the code I used. (Also, am I doing this correctly?) > > import timeit > > def time_fn(title, stmt, number=1, prefetch=False): > setup = [ > 'from tickets.models import Ticket', > 'qs=Ticket.objects.all()', > 'obj_first = qs.first()', > 'obj_mid = qs[5]', > 'obj_last = qs.last()', > ] > if prefetch: > setup.append('list(qs)') > result_time = timeit.timeit(stmt, setup='\n'.join(setup), > number=number) > print(f'{title}: {result_time}') > > time_fn('Database fetch', 'list(qs)') > time_fn('Prefetched obj_first in qs', 'obj_first in qs', prefetch=True) > time_fn('Prefetched obj_mid in qs', 'obj_mid in qs', prefetch=True) > time_fn('Prefetched obj_last in qs', 'obj_last in qs', prefetch=True) > time_fn('Database obj_first
Re: Implement QuerySet.__contains__?
Shai does make a good point about changing a well documented behaviour. That argument wins me over. Adding .contains() and updating the documentation goes a long way to make it easier for developers. Best case would be that Dajngo does not make assumptions about what the developer wants, but to implement __len__(), __bool__() and __contains__(), we have to assume one method or the other. I still think the wrong call was made here, but changing it now is too much pain. *(Assuming queryset should be evaluated is probably correct in most cases, but sometimes adds a big performance hit when the code goes into production and the dataset grows - code that looks reasonable and works like a charm in dev will cause trouble in production. Assuming the opposite would have added a tiny performance hit in most cases, but avoided the big one.)* Since I'm new here: If people agree QuerySet.contains() should be added, how do we move forward? I'd be willing to write some code with tests and add a ticket, if that's helpful. Le ven. 5 juin 2020 à 11:42, אורי a écrit : > Hi, > > I'm thinking, maybe instead of: > > if obj in queryset: > > Users should write: > > if obj in list(queryset): > > Which I understand is already working? Why does the first line (without > list()) should work anyway? > > And if users want performance, why not write: > > if queryset.filter(pk=obj.pk).exists(): > אורי > u...@speedy.net > > > On Tue, Jun 2, 2020 at 11:57 AM Johan Schiff wrote: > >> Is there a specific reason Djangos QuerySet does not implement >> __contains__? It doesn't seem very complicated to me, but maybe I'm missing >> something. >> >> When checking if an object is in e queryset I always use code lite this: >> if queryset.filter(pk=obj.pk).exists(): >> >> The pythonic way would be: >> if obj in queryset: >> >> The way I understand it this latter version will fetch all objects from >> queryset, while the first one is a much leaner db query. >> >> So, is there a reason QuerySet doesn't have a __contains__ method, or >> would it be a good addition? My guess is that a lot of people use the "obj >> in queryset" syntax, believing it does "the right thing". >> >> //Johan >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Django developers (Contributions to Django itself)" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to django-developers+unsubscr...@googlegroups.com. >> To view this discussion on the web visit >> https://groups.google.com/d/msgid/django-developers/88c83b8d-658c-47cc-9162-fcfebebe9c4a%40googlegroups.com >> <https://groups.google.com/d/msgid/django-developers/88c83b8d-658c-47cc-9162-fcfebebe9c4a%40googlegroups.com?utm_medium=email&utm_source=footer> >> . >> > -- > You received this message because you are subscribed to a topic in the > Google Groups "Django developers (Contributions to Django itself)" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/django-developers/NZaMq9BALrs/unsubscribe > . > To unsubscribe from this group and all its topics, send an email to > django-developers+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/django-developers/CABD5YeE7_ORUfJcnt6f4aB4J0j-%3DyDK_BowEt_fefcaFMGdB1g%40mail.gmail.com > <https://groups.google.com/d/msgid/django-developers/CABD5YeE7_ORUfJcnt6f4aB4J0j-%3DyDK_BowEt_fefcaFMGdB1g%40mail.gmail.com?utm_medium=email&utm_source=footer> > . > -- Vänligen, Johan Schiff Radkompaniet AB 072-229 61 19 -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAD69rUBKOse%2Be2RedPT%2B-KWkcdC%3DiaG0zyt1Hakefp6gQ7vdLA%40mail.gmail.com.
QuerySet.contains() for .values() and .values_list()
https://github.com/django/django/pull/13038 Above pull request (Ticket #24141) has a discussion on how to handle qs.contains() for QuerySets with qs.values() or qs.values_list() calls. Current patch checks self._result_cache if exists and iterable class is ModelIterable, otherwise returns .filter(pk=obj.pk).exists(), i.e. hitting the database. These are the options I see: 1. As above. 2. Throw an exception if iterable class is not ModelIterable (when using .values or .values_list). 3. Accept dict lookup for .values and tuple lookup for .values_list querysets, so that the .contains() query matches what you get from the iterable. It seems kind of neat to be able to check if an object is in a QuerySet, even if you group that QuerySet using .values('group').annotate(sum=Sum('value')).order_by(). But that was never the case I saw for QuerySet.contains(), so I have no real preference here. Johan -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/5d77428e-a4a4-4bda-a3bb-2811a88e3797n%40googlegroups.com.