Re: QuerySet.iterator together with prefetch_related because of chunk_size

2020-05-20 Thread 'Taylor' via Django developers (Contributions to Django itself)
Checking in here. What are the next steps to make a decision?

On Wednesday, February 13, 2019 at 2:06:29 PM UTC-5, Taylor wrote:
>
> I agree that 2 makes sense and isn't a huge usability issue. It looks like 
> the PR is ready to go whenever this is decided 
> https://github.com/django/django/pull/10707/files
>
> On Monday, January 14, 2019 at 5:29:18 PM UTC-5, Adam Johnson wrote:
>>
>> I agree with your reasoning, and also favour option 2 now, especially 
>> since the default can break on sqlite.
>>
>> It would also be possible to go through a deprecation period to first 
>> raise a warning and not prefetch, before a later version raises an 
>> exception, which is probably kinder since previously it was documented 
>> that it was just a no-op.
>>
>> On Mon, 14 Jan 2019 at 19:03, charettes  wrote:
>>
>>> > This seems reasonable, but what would you do in the case chunk_size 
>>> > isn't explicitly defined - throw an exception?
>>>
>>> That would be the plan.
>>>
>>> > Currently it silently fails to prefetch which means N + 1 queries, so 
>>> > even prefetching for the default chunk_size of 2000 would be a huge gain 
>>> > in cases where chunk_size isn't defined.
>>>
>>> That's assuming related objects are actually accessed during iteration 
>>> over the results. If that's not currently the case, turning this on would 
>>> cause P * C extra queries.
>>>
>>> To summarize, here are the options we have:
>>>
>>> 1. Always turn on the prefetching feature.
>>> 2. Require chunk_size to be explicitly specified for a deprecation 
>>> period in order to enable the feature.
>>>
>>> Pros/Cons of 1. is that developers who didn't follow the documentation 
>>> and/or didn't verify that prefetching wasn't happening, and left some 
>>> lingering prefetch_related(), will probably see their code perform fewer 
>>> queries, at the cost of a slight increase in memory (given they discard 
>>> objects while iterating). This is also likely to break code because of 
>>> some backends' (e.g. SQLite) query-parameter limits (chunk_size=2000 > 
>>> 1000)[0]
>>>
>>> Pros/Cons of 2. is that most prefetch_related().iterator() usage won't 
>>> break for developers who followed the documentation, and will fail for the 
>>> others, requiring them to acknowledge what their code is going to start 
>>> doing.
>>>
>>> As I've expressed in my previous messages, I'm slightly in favor of 2, 
>>> even if it is likely to cause more breakage, because of the exceptional 
>>> situation where this API should have raised an exception from the 
>>> beginning.
>>>
>>> Cheers,
>>> Simon
>>>
>>> [0] https://code.djangoproject.com/ticket/27833
>>>
>>> On Monday, January 14, 2019 at 9:53:45 AM UTC-5, Adam Johnson wrote:

> ...what if we required chunk_size to be explicitly specified when the 
> queryset has prefetch lookups?


 This seems reasonable, but what would you do in the case chunk_size isn't 
 explicitly defined - throw an exception? Currently it silently fails to 
 prefetch which means N + 1 queries, so even prefetching for the default 
 chunk_size of 2000 would be a huge gain in cases where chunk_size isn't 
 defined.

 On Sun, 13 Jan 2019 at 02:05, charettes  wrote:

> Replying to my concerns about the P * C queries.
>
> Throwing that idea out there but what if we required chunk_size to be 
> explicitly specified when the queryset has prefetch lookups?
>
> That would at least force the developers to consider how many 
> prefetching queries iterating through the results would require. Plus 
> since 
> the argument was only recently introduced it is less likely to silently 
> cause changes under developers feet.
>
> Simon
>
> On Friday, October 26, 2018 at 8:12:02 PM UTC-4, charettes wrote:
>>
>> Josh, I agree that silently not working is problematic but it has 
>> been this way since prefetch_related() was introduced.
>>
>> Something to keep in mind as well is that silently turning it on 
>> would also perform P * C extra queries, where P is the number of prefetches 
>> requested through prefetch_related() and C the number of chunks the results 
>> contain. This is non-negligible, IMO.
>>
>> What I'd be in favor of is raising an exception on 
>> prefetch_related(*prefetches).iterator() in the next release at least to 
>> allow users using this pattern to adjust their code and then ship the 
>> optimization with all the warnings related to the interactions between 
>> prefetch_related(*prefetches) and iterator(chunk_size) in the next one.
>>
>> I'm not completely against skipping the exception release phase 
>> entirely, given there might be users out there accessing attributes 
>> expected to be prefetched on iterated instances and inadvertently 
>> performing tons of queries, but the exception phase just feels safer given 
>> iterator() is usually used in memory-constrained contexts.
>>
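The P * C estimate discussed in this thread can be made concrete with a bit of back-of-the-envelope arithmetic. The helper below is purely illustrative (not Django API): it counts the queries that prefetch_related() + iterator(chunk_size=...) would issue if prefetching were honored per chunk.

```python
import math

def total_queries(num_rows, chunk_size, num_prefetches):
    """Rough query count for qs.prefetch_related(...).iterator(chunk_size=...)
    if prefetching were honored: one query per chunk for the base rows,
    plus one query per prefetch lookup per chunk (the P * C term)."""
    chunks = math.ceil(num_rows / chunk_size)   # C
    return chunks + chunks * num_prefetches     # C + P * C

# 10,000 rows, default chunk_size=2000, two prefetch lookups:
# C = 5 chunks -> 5 base queries + 5 * 2 prefetch queries
print(total_queries(10_000, 2000, 2))  # -> 15
```

Compare with the status quo the thread describes: without prefetching, touching one relation on each of the 10,000 iterated rows is the classic N + 1 pattern, i.e. roughly 10,000 extra queries.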

Paid Contributions for specific feature in Django

2021-04-20 Thread 'Taylor' via Django developers (Contributions to Django itself)
Hi All,
I am not sure if this is the place to ask this question, so sorry in 
advance, but I am looking to pay for a feature request. I would like for 
someone to build a Django DB adapter for Snowflake DB. 
https://www.snowflake.com/

I would be happy if it were part of Django core or a separate package. 
Either way, it would be 100% open source after completion. Is there any 
process for this type of thing? If there isn't, is there anyone on here 
interested in this type of work?

Best,
Taylor

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/d2a082af-c014-47d9-909f-87298a2e117bn%40googlegroups.com.


Re: Snowflake db backend

2021-04-20 Thread 'Taylor' via Django developers (Contributions to Django itself)
Hey Everyone,

Sorry to open up an old thread. 

Tim - were you ever able to open source your Snowflake backend? We would 
love to use it and even devote resources (developers or funding for 
contractors) to improving it and getting the tests passing. At Cedar, we 
were planning on creating our own Snowflake backend, but it would be great 
to not duplicate work here. What are your thoughts?

Best,
Taylor

On Wednesday, January 27, 2021 at 1:08:13 AM UTC-8 f.apo...@gmail.com wrote:

> Hi Scott,
>
> Thank you for your response, this is very helpful.
>
> On Tuesday, January 26, 2021 at 11:38:18 PM UTC+1 foug...@apps.disney.com 
> wrote:
>
>> Snowflake does not support lastrowid.  So, we grab the last ID inserted 
>> with a 'SELECT MAX(pk_name) FROM table_name'.  This is obviously prone to 
>> failure.  Assigning an ID during the INSERT would provide similar results 
>> on all backends.
>>
>
> Uh, the 'SELECT MAX()' is not going to fly well, you are right. 
> Assigning an ID during INSERT has its own problems. In PostgreSQL it would 
> be possible because you could just select the next value from the created 
> sequence (more or less), but with IDENTITY columns this might get harder. I 
> do not think there is a sensible way to do this in MySQL at all. While 
> lastrowid support is a nice to have, Django should work (mostly?) without 
> it: 
> https://github.com/django/django/blob/464a4c0c59277056b5d3c1132ac1b4c6085aee08/django/db/models/sql/compiler.py#L1372-L1387
>  
> -- the requirement here is that your database is at least able to return 
> values after insert. From the looks of it, it does not, I think? Or put 
> differently: which ways does Snowflake offer to get an ID? Solely by 
> providing it directly in the INSERT?
>
>> The feature flag `supports_transactions` really means 
>> `supports_nested_transactions`.  Snowflake supports a single level of 
>> transaction, BEGIN + (ROLLBACK|COMMIT).  Multiple BEGINs contribute to the 
>> current (only) transaction.  Since I have to set this flag to False, no 
>> transactions are used, even ones that are supported and testing grinds to a 
>> crawl with all of the table truncations and repopulation.  Since Django 
>> *normally* operates in auto-commit mode, this isn't a huge deal. 
>>  Snowflake also doesn't support save points, but the feature flag seems to 
>> do what is expected when disabled.
>>
>
> Huh, which databases support nested BEGIN? As far as I am aware, Django 
> never nests BEGINs -- do you have an example for me? I'd very much like to 
> fix this. From a quick glance at the code, we only start a transaction if we 
> are not already in one: 
> https://github.com/django/django/blob/master/django/db/transaction.py#L196-L208
>
>> Snowflake does not support column references without a FROM or JOIN 
>> clause.  I've only encountered this used in the unit tests.
>>
>
> Ok, see below.
>
>> I've been able to work around most of the function differences by adding 
>> as_snowflake methods to the appropriate classes.  There are a few cases 
>> like JSON_OBJECT that don't translate well, which work, but not entirely as 
>> expected.
>>
>
> Perfect, this sounds as if our extension system works as intended in this 
> area.
>
>> A search for 'connection.vendor == ' in the core code shows one example of 
>> where core backends are given privileged access to core Django inner 
>> workings that 3rd-party backends don't get.
>>
>
> We have been working to get rid of those: 
> https://github.com/django/django/commit/275dd4ebbabbbe758c7219a3d666953d5a7b072f#diff-69f332030b6f25f8f4d95842548d847139ee2cc163aef18f1c8d83b90f3f9e5f
>  
> -- This is solely in 3.2, but Tim can suggest a workaround for earlier 
> versions (he was doing something similar in his cockroachdb tests before 
> 3.2).
>
>> Please take these comments as observations, not complaints.  I understand 
>> project evolution and the resistance to change something that doesn't seem 
>> broken.  Maybe keep some of these thoughts in mind when a change is made to 
>> the core backends or the compiler and consider how other backends will need 
>> to implement that new feature.
>>
>
> No offense taken at all. If somehow possible I'd like to encourage you to 
> work with us on your pain points. I think at least some of those are 
> addressed or at least addressable. I am happy to offer code and fixes, but 
> I'll need a few more details especially wrt transaction handling.
>
> Cheers,
> Florian
>
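For contrast with the 'SELECT MAX(pk_name)' workaround discussed above, here is the lastrowid mechanism Snowflake lacks, demonstrated with Python's bundled SQLite driver. This is purely illustrative (not Snowflake or Django backend code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)")

# Backends with lastrowid get the generated primary key straight from the
# driver, with no extra round-trip and no race against concurrent inserts.
cur = conn.execute("INSERT INTO book (title) VALUES (?)", ("Django",))
print(cur.lastrowid)  # -> 1

# The fallback pattern under discussion: a second query, which can return
# the wrong row if another session inserts between the two statements.
max_id = conn.execute("SELECT MAX(id) FROM book").fetchone()[0]
print(max_id)  # -> 1 here, but not guaranteed under concurrency
```

A backend without lastrowid must either have the database return the value (e.g. a RETURNING clause, where supported) or assign the ID client-side before the INSERT, which is the trade-off debated above.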

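On the transaction point above: Django issues one real BEGIN for the outer atomic() block and emulates inner atomic() blocks with savepoints where the backend supports them, rather than nesting BEGINs. A raw-SQL sketch of that pattern, using SQLite only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions by hand
conn.execute("CREATE TABLE t (x INTEGER)")

conn.execute("BEGIN")                      # outer atomic(): the one real BEGIN
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("SAVEPOINT sp1")              # inner atomic(): a savepoint, not BEGIN
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("ROLLBACK TO SAVEPOINT sp1")  # inner block rolls back alone
conn.execute("COMMIT")                     # outer commit keeps only the first row

print(conn.execute("SELECT x FROM t").fetchall())  # -> [(1,)]
```

This is why disabling savepoint support in a backend degrades the inner blocks, while `supports_transactions` should only need to describe the single-level BEGIN/COMMIT that Snowflake does offer.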


Re: Snowflake db backend

2021-04-20 Thread 'Taylor' via Django developers (Contributions to Django itself)
Sorry, I meant to write Scott, not Tim. I shouldn't write emails late at 
night.

- Taylor

