Re: Fellow Reports -- December 2019

2020-01-02 Thread Carlton Gibson
Hi all. 


Calendar Week 51 -- ending 22 December.


Triaged:

https://code.djangoproject.com/ticket/31103 -- Pagination documentation is 
missing some important detail (Accepted)
https://code.djangoproject.com/ticket/31102 -- Bug: Saving on a ModelAdmin 
with TabularInlines sometimes duplicates rows (needsinfo)
https://code.djangoproject.com/ticket/31101 -- {% csrf_token %} fails 
validation for xhtml (wontfix)
https://code.djangoproject.com/ticket/31100 -- Why baseformset method 
"non_form_errors" is not property (invalid)
https://code.djangoproject.com/ticket/31070 -- Add a check for URLconfs 
that mix named and unnamed capture groups (needsinfo)
https://code.djangoproject.com/ticket/31096 -- Massively improving 
ManyToMany caching when using in forms (needsinfo)
https://code.djangoproject.com/ticket/31083 -- Add select_related support 
for Site.objects.get_current (wontfix)



Reviewed:

https://github.com/django/django/pull/12037 -- Refs #28954 -- Remove 
remaining code and documentation for Jython.
https://github.com/django/django/pull/12174 -- Replaced "Same as 
…" text with the actual text.
https://github.com/django/django/pull/12226 -- Fixed typo in documentations.
https://code.djangoproject.com/ticket/30585 -- Add support for 
"translate" and "blocktranslate" tags.
https://github.com/django/django/pull/12214 -- Fixed Pytest command in 
upgrade documentation



Happy New Year!

Kind Regards,

Carlton



Re: Sounding out for GSoC 2020.

2020-01-02 Thread Carlton Gibson
Hi Aaryan, 

Happy New Year! Thanks for your interest, and your comment on my 
presentation :)

I will write up the project ideas shortly, and try to give some guidance, 
but I imagine the "secrets abstraction" idea wouldn't be too hard: look at, 
say, Vault and the AWS offering. What are the common operations? What's the 
simplest API that abstracts them? (cf. the cache API: it's really just set, 
get, delete...)
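
To make the shape concrete, here is a minimal sketch. Nothing below is an 
existing or proposed Django API; the names (SecretsBackend, EnvSecrets) are 
made up purely to illustrate how small the surface could be if it mirrors 
the cache API:

import os


class SecretsBackend:
    """Base API that a Vault- or AWS-backed implementation could also satisfy."""

    def get(self, name, default=None):
        raise NotImplementedError

    def set(self, name, value):
        raise NotImplementedError

    def delete(self, name):
        raise NotImplementedError


class EnvSecrets(SecretsBackend):
    """Simplest possible backend: secrets live in environment variables."""

    def get(self, name, default=None):
        return os.environ.get(name, default)

    def set(self, name, value):
        os.environ[name] = value

    def delete(self, name):
        os.environ.pop(name, None)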

Have a play: it's the idea that appeals to you that will have the most value. 
The distinction between "stronger" and "weaker" candidates is largely how 
self-directed you are. Yes, some programming tasks are genuinely difficult, 
but for many, it's amazing how far folks can get if they are able to roll 
up their sleeves and try to work it out.

If you do want input early, post on forum.djangoproject.com. 

Kind Regards,

Carlton


On Monday, 23 December 2019 14:01:52 UTC+1, Aaryan Dewan wrote:
>
> Hi Carlton.
>
> As I'm a Freshman in College, I'm totally new to the Open Source 
> community. I have been using Python for two years and recently, I learned 
> the Django Framework so I thought that it would be a great idea to 
> contribute to it during my summer break through GSoC. 
>
> I have read the Django contribution guide and have looked at the previous 
> years' projects. 
>
> This year (2020) I only seem to *understand* 2 of the proposed projects 
> (in this thread) namely:
>
>- Two Factor authentication
>- Django case studies
>
> Since you said in your post, "Are there easier things that we could take 
> 'weaker' candidates for?...", are there any proposed projects yet that are 
> of *low* complexity? I'm unable to judge the complexity of the proposed 
> projects here. If so, please let me know so that I can start working on 
> them ASAP.
>
> Also, you gave a great presentation at DjangoCon US :-)
>
> Regards,
>
> Aaryan
>
> On Tuesday, December 10, 2019 at 8:55:22 PM UTC+5:30, Carlton Gibson wrote:
>>
>> Hi all. 
>>
>> It's time to start thinking about Google Summer of Code (GSoC): whether 
>> we're going to participate, and what projects we might propose. 
>>
>> This year was interesting. Sage in particular did well putting together a 
>> cross-db JSONField, but he was probably under-mentored, since Mariusz has 
>> spent quite a bit of time reworking the PR and still has a bit to go 
>> before we can pull it in, hopefully for 3.1.
>>
>> So, one consideration we need to think about seriously is our capacity 
>> for mentoring. (This isn't just about the candidate's ability; Sage was 
>> able to implement all suggestions. We just didn't have as much capacity as 
>> we might have liked to think about the requirements and implementation, 
>> and there were four of us actively giving some time each... Anyhow, 
>> something to think about.) 
>>
>> Then it's projects. There are three that I have on my list that would 
>> require a "competent candidate":
>>
>> 1. Work on the migrations. Markus mentioned a particular ticket here 
>> but... 
>> 2. Make the parallel test runner work on Windows. ("fork" vs "spawn")
>>
>> And 3, and this is the big one: 
>>
>> 3. Add 2FA to Django. 
>>
>> This has been raised a few times: 
>>
>> * 
>> https://groups.google.com/d/topic/django-developers/T-kBSvry6z0/discussion
>> * 
>> https://groups.google.com/d/topic/django-developers/d92P2V0YrbI/discussion
>>  
>> * ... others... 
>>
>> If I'm honest, in 2020, it's the one "battery" I feel a little bit 
>> embarrassed we haven't got a story for. Maybe it's not possible in a 
>> GSoC-type scope, but... What would it look like? What can we leverage? Is 
>> it worth a go? 
>>
>> I'm looking at James, Florian, Joe, ... — who else has been keen here?
>> I'm also looking at the Technical Board, which I'm thinking has (will 
>> have) a new guiding role to come up with suggestions for the direction of 
>> Django. 
>>
>>
>> Other Projects: Are there other ideas? (Do you have one?) Are there 
>> easier things that we could take "weaker" candidates for? But with that is 
>> there a commitment for the mentoring help they'd need? 
>>
>> Anyhow, we have until January, so I'm just starting the discussion here. 
>>
>> Kind Regards,
>>
>> Carlton
>>
>>
>>
>>



Potential performance-related bug in Django's QuerySet.get().

2020-01-02 Thread Anudeep Samaiya
Hi everyone,

Happy New Year!!

OK, so I found that QuerySet.get() is very slow on large datasets when the 
lookup matches a very large number of objects. I made the following change in 
my local copy of the Django code and it improved performance significantly 
for very large datasets (the call returns in the blink of an eye). It didn't 
have any obvious effect on a table with around 10K records, and I don't have 
proper stats to prove the improvement.

So what was the issue?
QuerySet.get() raises two exceptions:
1. DoesNotExist
2. MultipleObjectsReturned

When multiple objects are found, QuerySet.get() raises an error reporting how 
many objects matched. To do this it evaluates the query, finding the length 
by iterating over the queryset, which creates a bottleneck. For small 
datasets this wasn't obvious, but for large datasets with more than 1 
million records it was slow. 

So instead I tried changing the counting to use QuerySet.count(), and only 
if count == 1 did I evaluate the query by calling QuerySet._fetch_all(). The 
results were much better than before.

So do you think this is the right way? Should I open a PR for the patch?


diff --git a/django/db/models/query.py b/django/db/models/query.py
index 38c1358..e442384 100644
--- a/django/db/models/query.py
+++ b/django/db/models/query.py
@@ -420,8 +420,9 @@ class QuerySet:
         if not clone.query.select_for_update or connections[clone.db].features.supports_select_for_update_with_limit:
             limit = MAX_GET_RESULTS
             clone.query.set_limits(high=limit)
-        num = len(clone)
+        num = clone.count()
         if num == 1:
+            clone._fetch_all()
             return clone._result_cache[0]
         if not num:
             raise self.model.DoesNotExist(

Thanks

Anudeep Samaiya



Re: Potential performance-related bug in Django's QuerySet.get().

2020-01-02 Thread Adam Johnson
Hi Anudeep

Your change makes get() perform an extra query for count() before it fetches 
the results. This would be a regression for most uses of get().

get() is not intended for use "when multiple objects exists in very big
numbers". You may want to perform a single query, something like
Model.objects.filter(id__in=[1, 2, 3, 4, ...]), and sort out your
duplicates in a loop in Python.
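
A rough sketch of that pattern (the Book model and its non-unique "code" 
field are hypothetical):

from collections import defaultdict

from myapp.models import Book  # hypothetical model with a non-unique "code" field

codes = ["A", "B", "C"]
books_by_code = defaultdict(list)
# One query for all candidates, instead of a .get() per code that may match
# many rows.
for book in Book.objects.filter(code__in=codes):
    books_by_code[book.code].append(book)

# Decide here how to handle any code with multiple matches, rather than
# relying on .get() to count every match and raise MultipleObjectsReturned.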

Hope that helps,

Adam



Re: Potential performance-related bug in Django's QuerySet.get().

2020-01-02 Thread charettes
Hey there!

The current code assumes that .get() will likely match one result, which 
should be the case most of the time, and it limits the number of possible 
results to prevent catastrophic matches.

Your patch has the side effect of performing an additional COUNT query for 
every single
QuerySet.get() call. It would also likely only perform better under certain 
circumstances
(e.g. large columns retrieved, possibility of using index only scan on 
COUNT). For these
reasons I don't think this is a good idea.
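
One rough way to observe the trade-off (a sketch assuming a configured 
project and a hypothetical Book model with a row at pk=1):

from django.db import connection
from django.test.utils import CaptureQueriesContext

from myapp.models import Book  # hypothetical model

with CaptureQueriesContext(connection) as ctx:
    Book.objects.get(pk=1)

# Today .get() issues a single SELECT capped at MAX_GET_RESULTS rows; with
# the proposed patch the same call would issue a COUNT plus a SELECT.
print(len(ctx.captured_queries))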

In summary, QuerySet.get() is currently optimized for correct usage of its 
API, while limiting the nefarious side effects of misuse, and this approach 
would optimize for the uncommon case.

Cheers,
Simon



Re: Potential performance-related bug in Django's QuerySet.get().

2020-01-02 Thread Anudeep Samaiya
Thanks for the replies; I completely agree with both of your explanations.

Thanks
Anudeep Samaiya



Re: remove_stale_contenttypes doesn't remove entries for renamed apps.

2020-01-02 Thread Adam Johnson
I guess an optional kwarg would be okay then.
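
For concreteness, a rough sketch of how such an opt-in flag could be wired 
into the command; the flag name --include-stale-apps and this wiring are 
illustrative assumptions, not a settled API:

from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = (
        "Delete stale content types, optionally including those left behind "
        "by removed or renamed apps (sketch of the proposed opt-in flag)."
    )

    def add_arguments(self, parser):
        parser.add_argument(
            "--include-stale-apps",
            action="store_true",
            default=False,  # off by default, as proposed below
            help=(
                "Also delete content types for apps that are no longer "
                "installed (e.g. renamed apps). Deletion is still confirmed "
                "interactively unless --no-input is passed."
            ),
        )

    def handle(self, *args, **options):
        include_stale_apps = options["include_stale_apps"]
        # ... the existing remove_stale_contenttypes logic would consult
        # include_stale_apps when collecting content types to delete ...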

On Thu, 2 Jan 2020 at 03:08, Javier Buzzi  wrote:

> @adam I agree with your points about data loss, but I can still see this
> as being beneficial; perhaps the approach was just too harsh. Perhaps
> adding a flag to the management command would get everyone on board? The
> flag would be off by default and only turned on if you know what you're
> doing and enable it. At any rate, I believe, from what I can see in the
> code, it will still prompt you to delete the items it finds (unless you
> pass --no-input), which helps prevent data loss.
>


-- 
Adam
