Customizable urlize

2020-01-07 Thread Johan Schiff
Hi Django.

I'm currently working on a project where we want to use Django template 
filter *urlize*, but with a custom url formatter. The way the Django code 
works atm, we can't extend *urlize* customize output. That means we have to 
go non-DRY and copy code. Generally I like using class based coding for 
this kind of thing, but Djangos template filters seems to prefer functions 
and discourage instantiation. Therefore, it would be nice to solve this in 
a functional style, in line with current design.

Wouldn't it be a good idea to extend django.utils.html.urlize, or am I 
looking at this the wrong way?

The function signature would be smthng like this:

def urlize(
text,
trim_url_limit=None,  *# for compat reasons, though it would make 
sense to accomplish that with the formatter functions in 
the urlizetrunc filter.*
nofollow=False,
autoescape=False,
email_formatter: callable = None,
url_formatter: callable = None,
    ):
    ...


Johan Schiff
Betahaus (SE)

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/5d2877db-83e9-4bda-8087-98f5cd8fc57c%40googlegroups.com.


Implement QuerySet.__contains__?

2020-06-02 Thread Johan Schiff
Is there a specific reason Djangos QuerySet does not implement 
__contains__? It doesn't seem very complicated to me, but maybe I'm missing 
something.

When checking if an object is in e queryset I always use code lite this:
if queryset.filter(pk=obj.pk).exists():

The pythonic way would be:
if obj in queryset:

The way I understand it this latter version will fetch all objects from 
queryset, while the first one is a much leaner db query.

So, is there a reason QuerySet doesn't have a __contains__ method, or would 
it be a good addition? My guess is that a lot of people use the "obj in 
queryset" syntax, believing it does "the right thing".

//Johan

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/88c83b8d-658c-47cc-9162-fcfebebe9c4a%40googlegroups.com.


Re: Implement QuerySet.__contains__?

2020-06-03 Thread Johan Schiff
>
> Roger Gammans 
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/django-developers/NZaMq9BALrs/unsubscribe
> .
> To unsubscribe from this group and all its topics, send an email to
> django-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/3928039038bac9b52279294f7efcac318dc80388.camel%40gammascience.co.uk
> <https://groups.google.com/d/msgid/django-developers/3928039038bac9b52279294f7efcac318dc80388.camel%40gammascience.co.uk?utm_medium=email&utm_source=footer>
> .
>


-- 
Vänligen, Johan Schiff
Radkompaniet AB
072-229 61 19

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAD69rUA1E1Vfdm7kuv32r1Ppyw%3D7CCqCzfSW9itda88th2Uw3g%40mail.gmail.com.


Re: Implement QuerySet.__contains__?

2020-06-03 Thread Johan Schiff
To answer my own question: No, I wasn't doing it correctly. I should have
done a sanity check before posting.

New timeit code and results at bottom, now using a smaller dataset (~900
objects).

Notable:

   - An .exists() query is about 1/100 the time of full fetch in this case.
   This difference would ofc be much bigger if I did it on the 100 000+
   objects.
   - If already prefetched, it should be worth it to search prefetched data
   if the dataset is <700 objects. It seems rare to prefetch that many
   objects. If so, a developer could easily be explicit and use .exists()
   method.

*Based on this exact scenario: 100 times slower for using obj in
queryset is quite a performance hit. On the other hand, an extra 1 % query
time for a queryset that is later evaluated is marginal.*

If a developer knows they want to evaluate the queryset later, it would be
nice to have a .evaluate() that does self._fetch_all() and returns self. I
think that's preferable over list(queryset)-method, because we don't need
to create an extra list.

# Old explicit fetch method
queryset = Stuff.objects.all()
if obj in list(queryset):
pass

# My suggestion
queryset = Stuff.objects.all().evaluate()
if obj in queryset:
pass



Updated timeit function and results:

import timeit

def time_fn(title, stmt, number=10_000):
setup = [
'from movies.models import Movie',
'qs=Movie.objects.all()',
'obj_first = qs.first()',
'obj_mid = qs[300]',
'obj_last = qs.last()',
'list(qs)',
]
result_time = timeit.timeit(stmt, setup='\n'.join(setup), number=number)
print(f'{title}: {result_time}')

time_fn('Database fetch', 'qs._result_cache=None\nlist(qs)')
time_fn('Prefetched obj_first in qs', 'obj_first in qs')
time_fn('Prefetched obj_mid in qs', 'obj_mid in qs')
time_fn('Prefetched obj_last in qs', 'obj_last in qs')
time_fn('Database obj_first exists()', 'qs.filter(pk=obj_first.pk
).exists()')
time_fn('Database obj_mid exists()', 'qs.filter(pk=obj_mid.pk).exists()')
time_fn('Database obj_last exists()', 'qs.filter(pk=obj_last.pk).exists()')

# Results

Database fetch: 616.097227364
Prefetched obj_first in qs: 0.030961711003328674
Prefetched obj_mid in qs: 6.6988333979970776
Prefetched obj_last in qs: 24.18991441695
Database obj_first exists(): 6.468764332996216
Database obj_mid exists(): 6.167532913001196
Database obj_last exists(): 6.0190791100030765


Le mer. 3 juin 2020 à 10:52, Johan Schiff  a écrit :

> Thanks for great info.
>
> First, I'm leaning towards Aymeric's proposition here. I do recognize that
> there is a lot to consider.
>
> This seems to be important:
>
>1. Developers must be able to explicitly choose methods to optimize
>for their environment. (Considering database latency, dataset size,
>prefetch preferred, etc.)
>2. Behaviour should preferably be consistent across methods. (len(qs),
>bool(qs), obj in qs)
>3. Syntax should be pythonic.
>
> I think what Aymeric is proposing would fulfill those demands.
>
> I have done some timeit-tests on a model with 100 000+ records, based on
> Rogers work, on a single host setup using postgresql. My finding is that a
> separate database query is generally faster than current *obj in qs* 
> behaviour,
> even on a prefetched queryset (unless your dataset is really small). This
> tells me that we should prioritize .exists() query unless explicitly stated
> (i.e. *obj in iter(qs)*), even when we have prefetched data. For len(qs)
> and bool(qs) that should not be the case.
>
> It would be interesting to get timings from other setups, so I'm including
> the code I used. (Also, am I doing this correctly?)
>
> import timeit
>
> def time_fn(title, stmt, number=1, prefetch=False):
> setup = [
> 'from tickets.models import Ticket',
> 'qs=Ticket.objects.all()',
> 'obj_first = qs.first()',
> 'obj_mid = qs[5]',
> 'obj_last = qs.last()',
> ]
> if prefetch:
> setup.append('list(qs)')
> result_time = timeit.timeit(stmt, setup='\n'.join(setup),
> number=number)
> print(f'{title}: {result_time}')
>
> time_fn('Database fetch', 'list(qs)')
> time_fn('Prefetched obj_first in qs', 'obj_first in qs', prefetch=True)
> time_fn('Prefetched obj_mid in qs', 'obj_mid in qs', prefetch=True)
> time_fn('Prefetched obj_last in qs', 'obj_last in qs', prefetch=True)
> time_fn('Database obj_first 

Re: Implement QuerySet.__contains__?

2020-06-05 Thread Johan Schiff
Shai does make a good point about changing a well documented behaviour.
That argument wins me over.

Adding .contains() and updating the documentation goes a long way to make
it easier for developers.

Best case would be that Dajngo does not make assumptions about what the
developer wants, but to implement __len__(), __bool__() and __contains__(),
we have to assume one method or the other. I still think the wrong call was
made here, but changing it now is too much pain. *(Assuming queryset should
be evaluated is probably correct in most cases, but sometimes adds a big
performance hit when the code goes into production and the dataset grows -
code that looks reasonable and works like a charm in dev will cause trouble
in production. Assuming the opposite would have added a tiny performance
hit in most cases, but avoided the big one.)*

Since I'm new here:
If people agree QuerySet.contains() should be added, how do we move forward?
I'd be willing to write some code with tests and add a ticket, if that's
helpful.

‪Le ven. 5 juin 2020 à 11:42, ‫אורי‬‎  a écrit :‬

> Hi,
>
> I'm thinking, maybe instead of:
>
> if obj in queryset:
>
> Users should write:
>
> if obj in list(queryset):
>
> Which I understand is already working? Why does the first line (without
> list()) should work anyway?
>
> And if users want performance, why not write:
>
> if queryset.filter(pk=obj.pk).exists():
> אורי
> u...@speedy.net
>
>
> On Tue, Jun 2, 2020 at 11:57 AM Johan Schiff  wrote:
>
>> Is there a specific reason Djangos QuerySet does not implement
>> __contains__? It doesn't seem very complicated to me, but maybe I'm missing
>> something.
>>
>> When checking if an object is in e queryset I always use code lite this:
>> if queryset.filter(pk=obj.pk).exists():
>>
>> The pythonic way would be:
>> if obj in queryset:
>>
>> The way I understand it this latter version will fetch all objects from
>> queryset, while the first one is a much leaner db query.
>>
>> So, is there a reason QuerySet doesn't have a __contains__ method, or
>> would it be a good addition? My guess is that a lot of people use the "obj
>> in queryset" syntax, believing it does "the right thing".
>>
>> //Johan
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Django developers (Contributions to Django itself)" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to django-developers+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/django-developers/88c83b8d-658c-47cc-9162-fcfebebe9c4a%40googlegroups.com
>> <https://groups.google.com/d/msgid/django-developers/88c83b8d-658c-47cc-9162-fcfebebe9c4a%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/django-developers/NZaMq9BALrs/unsubscribe
> .
> To unsubscribe from this group and all its topics, send an email to
> django-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/CABD5YeE7_ORUfJcnt6f4aB4J0j-%3DyDK_BowEt_fefcaFMGdB1g%40mail.gmail.com
> <https://groups.google.com/d/msgid/django-developers/CABD5YeE7_ORUfJcnt6f4aB4J0j-%3DyDK_BowEt_fefcaFMGdB1g%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>


-- 
Vänligen, Johan Schiff
Radkompaniet AB
072-229 61 19

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAD69rUBKOse%2Be2RedPT%2B-KWkcdC%3DiaG0zyt1Hakefp6gQ7vdLA%40mail.gmail.com.


QuerySet.contains() for .values() and .values_list()

2021-01-13 Thread Johan Schiff
https://github.com/django/django/pull/13038

Above pull request (Ticket #24141) has a discussion on how to handle 
qs.contains() for QuerySets with qs.values() or qs.values_list() calls.

Current patch checks self._result_cache if exists and iterable class is 
ModelIterable, otherwise returns .filter(pk=obj.pk).exists(), i.e. hitting 
the database.

These are the options I see:
1. As above.
2. Throw an exception if iterable class is not ModelIterable (when using 
.values or .values_list).
3. Accept dict lookup for .values and tuple lookup for .values_list 
querysets, so that the .contains() query matches what you get from the 
iterable.

It seems kind of neat to be able to check if an object is in a QuerySet, 
even if you group that QuerySet using 
.values('group').annotate(sum=Sum('value')).order_by(). But that was never 
the case I saw for QuerySet.contains(), so I have no real preference here.

Johan

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/5d77428e-a4a4-4bda-a3bb-2811a88e3797n%40googlegroups.com.