Re: Model-level validation

2022-10-07 Thread Carlton Gibson
> ... the duplication I was referring to is having both Forms and
> Serializers do validation.

That's a separate issue.

Can we merge various aspects of DRF into Django, so that it better handles
building JSON APIs? Yes, clearly. One step of that is better content type
handling, another is serializers. (There are others).
On the serializer front, it would be a question of making django.forms
better able to handle list-like (possibly do-able with FormSet) and nested
data, and so on.
Not a small project, but with things like django-readers, and Pydantic (and
django-ninja), and attrs/cattrs showing new ideas, re-thinking about
serialization in Django is about due.

But the issue is here:

> ... I also don't relish the thought of needing to use a Form or
> Serializer every time I alter a Model's data.

I'm like literally, "¿Qué? 😳" - Every single time you get data from an
untrusted source you simply **must** validate it before use. ("Filter
input, escape output", I was drilled.) That applies exactly the same to a
CSV file as it does to HTTP request data. (That your CSV is malformed is
axiomatic no? :)

If you want to enforce validation, with a single call, write a method (on a
manager likely) that encapsulates your update logic (and runs the
validation before save). Then always use that in your code. (That's long
been a recommended pattern.) But don't skip the validation layer on your
incoming data.

I would be -1 to `validate` kwarg to `save()` — that's every user ever
wondering *should I use it?* every time. (Same for a setting.)
Rather — is this a docs issue? — we should re-emphasise the importance of
the validation layer.
Then if folks want a convenience API to do both tasks, they're free to
write that for their models. (This is what Uri has done for Speedy Net.
It's not a bad pattern.)
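For illustration, a minimal framework-free sketch of the pattern described above: one entry point that validates and then saves, used everywhere instead of calling save directly. The `Contact` class, `ContactManager`, `create_validated`, and the in-memory `_rows` store are hypothetical stand-ins. In a Django project the same shape would typically be a custom manager method that calls the model's `full_clean()` before `save()`.

```python
from dataclasses import dataclass


class ValidationError(Exception):
    """Stand-in for a framework validation error (hypothetical)."""


@dataclass
class Contact:
    name: str
    email: str

    def full_clean(self) -> None:
        # Field-level checks; raise on the first problem found.
        if not self.name.strip():
            raise ValidationError("name must not be blank")
        if "@" not in self.email:
            raise ValidationError("email must contain '@'")


class ContactManager:
    """Single entry point for writes: validate first, then store."""

    def __init__(self) -> None:
        self._rows: list[Contact] = []  # in-memory stand-in for the database

    def create_validated(self, **fields: str) -> Contact:
        contact = Contact(**fields)
        contact.full_clean()  # validation always runs before the "save"
        self._rows.append(contact)
        return contact

    def count(self) -> int:
        return len(self._rows)


manager = ContactManager()
manager.create_validated(name="Ada", email="ada@example.com")

try:
    manager.create_validated(name="Bob", email="not-an-email")
except ValidationError as exc:
    rejected = str(exc)  # the invalid row never reaches the store
```

Callers that always go through `create_validated` cannot forget the validation step, and nothing about `save()` itself has to change.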






On Fri, 7 Oct 2022 at 04:34, Aaron Smith  wrote:

> James - to clarify, the duplication I was referring to is having both
> Forms and Serializers do validation. I often work with web apps where data
> for the same model can arrive via user input, serializer, or created in
> some backend process e.g. Celery. If forms/serializers are your validation
> layer, you need to duplicate it and worry about how to keep them from
> diverging over time as there's no single source of truth. I also don't
> relish the thought of needing to use a Form or Serializer every time I
> alter a Model's data.
>
> Perhaps we think about validation differently. I consider it to be
> critical to maintain complex systems with any kind of confidence, any time
> data is being created or changed, regardless of where that change comes
> from. Bugs can happen anywhere and validation is the best (only?) option to
> prevent data-related bugs.
> On Thursday, October 6, 2022 at 12:03:28 PM UTC-7 James Bennett wrote:
>
>> On Thu, Oct 6, 2022 at 9:00 AM Aaron Smith  wrote:
>>
>>> James - The problem with moving validation up the stack, i.e. to logical
>>> branches from Model (Form, Serializer) is that you must duplicate
>>> validation logic if your data comes from multiple sources or domains (web
>>> forms *and* API endpoints *and* CSVs polled from S3). Duplication leads
>>> to divergence leads to horrible data integrity bugs and no amount of test
>>> coverage can guarantee safety. Even if you consider Django to be "only a
>>> web framework" I would still argue that validation should be centralized in
>>> the data storage layer. Validity is a core property of data. Serialization
>>> and conversion changes between sources and is a different concern than
>>> validation.
>>>
>>
>> I would flip this around and point out that the duplication comes from
>> seeing the existing data conversion/validation layer and deciding not to
>> use it.
>>
>> There's nothing that requires you to pass in an HttpRequest instance to
>> use a form or a serializer -- you can throw a dict of data from any source
>> into one and have it convert/validate for you.  Those APIs are also
>> designed to be easy to check and easy to return useful error messages from
>> on failed validation, while a model's save() has no option other than to
>> throw an exception at you and demand you parse the details out of it
>> (because it was designed as part of an overall web framework that already
>> had the validation layer elsewhere).
>>
>> So I would argue, once again, that the solution to your problem is to use
>> the existing data conversion/validation utilities (forms or serializers)
>> regardless of the source of the data. If you refuse to, I don't think
>> that's Django's problem to solve.
>>
>>> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.co

Re: Model-level validation

2022-10-07 Thread אורי
אורי
u...@speedy.net


On Fri, Oct 7, 2022 at 10:01 AM Carlton Gibson wrote:

> > ... the duplication I was referring to is having both Forms and
> Serializers do validation.
>
> That's a separate issue.
>
> Can we merge various aspects of DRF into Django, so that it better handles
> building JSON APIs? Yes, clearly. One step of that is better content type
> handling, another is serializers. (There are others).
> On the serializer front, it would be a question of making django.forms
> better able to handle list-like (possibly do-able with FormSet) and nested
> data, and so on.
> Not a small project, but with things like django-readers, and
> Pydantic (and django-ninja), and attrs/cattrs showing new ideas,
> re-thinking about serialization in Django is about due.
>
> But the issue is here:
>
> > ... I also don't relish the thought of needing to use a Form or
> Serializer every time I alter a Model's data.
>
> I'm like literally, "¿Qué? 😳" - Every single time you get data from an
> untrusted source you simply **must** validate it before use. ("Filter
> input, escape output", I was drilled.) That applies exactly the same to a
> CSV file as it does to HTTP request data. (That your CSV is malformed is
> axiomatic no? :)
>
> If you want to enforce validation, with a single call, write a method (on
> a manager likely) that encapsulates your update logic (and runs the
> validation before save). Then always use that in your code. (That's long
> been a recommended pattern.) But don't skip the validation layer on your
> incoming data.
>
> I would be -1 to `validate` kwarg to `save()` — that's every user ever
> wondering *should I use it?* every time. (Same for a setting.)
> Rather — is this a docs issue? — we should re-emphasise the importance of
> the validation layer.
> Then if folks want a convenience API to do both tasks, they're free to
> write that for their models. (This is what Uri has done for Speedy Net.
> It's not a bad pattern.)
>

Thank you! 🍑

You might want to include such a solution in the docs, in case Django users
want to validate models.

My solution is taken from https://gist.github.com/glarrain/5448253



Re: Model-level validation

2022-10-07 Thread 'Barry Johnson' via Django developers (Contributions to Django itself)
I agree with James in several ways.   Our large Django application does 
rather extensive validation of data -- but I would argue strongly against 
embedding that validation in the base instance.save() logic.

(I would not argue against Django including a "ValidatingModel", derived 
from Model, that automatically runs defined validations as part of save(). 
 Then developers could choose which base they'd like to subclass when 
designing their objects.  Of course, anyone could simply create their own 
"ValidatingModel" class and derive everything from that class.)

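A rough sketch of the opt-in base class idea, written without any framework so it runs on its own. The `Model` base, its `saved` list, and the `Invoice` class are hypothetical stand-ins. In Django the equivalent would presumably be an abstract model whose `save()` calls `full_clean()` before delegating to the parent.

```python
class ValidationError(Exception):
    """Stand-in for a framework validation error (hypothetical)."""


class Model:
    """Hypothetical base: save() stores the object without validating."""

    saved = []  # class-level in-memory stand-in for a table

    def full_clean(self):
        # Subclasses override this with their own checks.
        pass

    def save(self):
        Model.saved.append(self)


class ValidatingModel(Model):
    """Opt-in base: validation always runs as part of save()."""

    def save(self):
        self.full_clean()
        super().save()


class Invoice(ValidatingModel):
    def __init__(self, total):
        self.total = total

    def full_clean(self):
        if self.total < 0:
            raise ValidationError("total must not be negative")


Invoice(total=10).save()

try:
    Invoice(total=-5).save()
except ValidationError:
    pass  # rejected before reaching the store
```

Classes that subclass `Model` directly keep the unvalidated `save()`, so the choice is made once per class rather than on every call.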
Reason 1 is that business logic validation often requires access to 
multiple model instances -- for example, uniqueness across a set of objects 
about to be updated.  (e.g., "Only one person can be marked as the primary 
contact").  Or internal consistency:   "If this record is of this type, 
then it cannot have any children of that type".  Or even referential 
integrity in some cases:  "The incoming data has a code that serves as the 
primary key in some other table.  Make sure that primary key exists."
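As a small sketch of why such rules do not fit a per-instance check, the "only one primary contact" rule can be written as a function over the whole batch. The dictionary keys `account_id` and `is_primary` and the function name are assumptions made for this example.

```python
from collections import Counter


def validate_single_primary(contacts):
    """Return account ids that have more than one primary contact.

    `contacts` is an iterable of dicts with the hypothetical keys
    "account_id" and "is_primary".
    """
    primaries = Counter(
        c["account_id"] for c in contacts if c["is_primary"]
    )
    return sorted(acct for acct, n in primaries.items() if n > 1)


batch = [
    {"account_id": 1, "name": "Ada", "is_primary": True},
    {"account_id": 1, "name": "Bob", "is_primary": True},  # conflict
    {"account_id": 2, "name": "Cy", "is_primary": True},
    {"account_id": 2, "name": "Di", "is_primary": False},
]

conflicts = validate_single_primary(batch)
```

Each record here is valid when looked at alone; the conflict only appears once the batch is examined together.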

Yes, you can encode all of those cross-instance validations into an 
instance-level check, but then that brings us to the second point: 
 Performance.  There are a number of types of validations that are best 
served by operating on sets or lists of instances at a time.   Again, 
consider a referential integrity validation:  If I'm about to bulk_create 
5000 instances, but need to confirm that the "xyz code" is valid for all of 
them, then I should run a query that selects the "xyz table" for all of the 
codes that are referenced within the 5000 items instead of doing 5000 
individual lookups within that table.   Yes, one can maintain and access 
caches of known-valid things, but those are awkward to manage from within 
the Model layer.  
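The set-based check described above might look roughly like this, using the standard library's `sqlite3` in place of the ORM. The `xyz` table, the `xyz_code` key, and `find_unknown_codes` are hypothetical names, and the chunking is only there to stay under the database's limit on bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz (code TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO xyz (code) VALUES (?)", [("A",), ("B",), ("C",)])


def find_unknown_codes(conn, codes, chunk_size=500):
    """Return the referenced codes that do not exist in the xyz table.

    Issues one query per chunk of distinct codes rather than one per item.
    """
    distinct = sorted(set(codes))
    known = set()
    for start in range(0, len(distinct), chunk_size):
        chunk = distinct[start:start + chunk_size]
        placeholders = ", ".join("?" for _ in chunk)
        rows = conn.execute(
            f"SELECT code FROM xyz WHERE code IN ({placeholders})", chunk
        )
        known.update(code for (code,) in rows)
    return sorted(set(distinct) - known)


# 5000 incoming items that reference only a handful of distinct codes.
incoming = [{"id": i, "xyz_code": "ABCD"[i % 4]} for i in range(5000)]
unknown = find_unknown_codes(conn, [item["xyz_code"] for item in incoming])
```

With four distinct codes this is a single query for the whole batch, where a per-instance check would have issued 5000.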

It's particularly difficult to write performant validations within the 
model when you're using .only() or .defer().   Unless the validation logic 
is able to detect that certain properties haven't been loaded from the 
database, then they would trigger extra queries retrieving values from the 
database solely for the purpose of validating that they are still correct 
(even though you aren't changing them).

Also on the performance front, there are times that removing the extra 
layer of validation is necessary and appropriate.  With well-tested code, 
once the incoming data has been validated and the 
transformation/operational logic is considered fully tested and accurate, 
then avoiding a second validation on the outbound data can result in a 
significant performance improvement.  If you're dealing with millions or 
billions of records at a time (as we do during data conversions), then 
those significant performance improvements are worthwhile.

Finally, Django supports the queryset .update() method.  Again, validations 
that run within the model instance won't even HAVE instances when using 
.update() -- the queryset manager would need to figure out how to do the 
necessary validation (and if it's a multi-field validation, good luck!)   
There are also cases where the use of raw SQL is appropriate, and one 
obviously cannot lean on instance-level validation in that case.
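To make the point about set-based updates concrete, here is a sketch using `sqlite3` directly. The `item` table, `ALLOWED_STATUSES`, and `bulk_set_status` are hypothetical. Because the statement never materializes per-row objects, the only place left for the check is the calling code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO item (status) VALUES (?)", [("new",)] * 3)

ALLOWED_STATUSES = {"new", "shipped", "cancelled"}  # hypothetical rule


def bulk_set_status(conn, status):
    """Validate the new value once, then apply a set-based UPDATE.

    No per-row objects are created, so any per-instance validation hook
    would never run; the check has to live here, in the calling logic.
    """
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"invalid status: {status!r}")
    cursor = conn.execute("UPDATE item SET status = ?", (status,))
    return cursor.rowcount


updated = bulk_set_status(conn, "shipped")

try:
    bulk_set_status(conn, "lost")
except ValueError:
    pass  # rejected before any row is touched

statuses = [row[0] for row in conn.execute("SELECT status FROM item")]
```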

Validation is indeed important -- but testing the validity of data belongs 
in the business logic layer, not in the database model layer.  Agreed that 
some types of validations can easily be encoded into the database model, 
but then you find yourselves writing two layers of validation ("one simple, 
the other more sophisticated")... and that makes things even more 
complex.  We do indeed use the model-level validations for single-field 
validations... but we invoke those validations from our business logic at 
the proper time, not during the time we're saving the data to the database.

baj

Barry Johnson
Epicor


On Thursday, October 6, 2022 at 2:47:19 AM UTC-5 James Bennett wrote:

> I see a lot of people mentioning that other ORMs do validation, but not 
> picking up on a key difference:
>
> Many ORMs are designed as standalone packages. For example, in Python 
> SQLAlchemy is a standalone DB/ORM package, and other languages have similar 
> popular ORMs.
>
> But Django's ORM isn't standalone. It's tightly integrated into Django, 
> and Django is a web framework. And once you focus *specifically* on the web 
> framework use case, suddenly things start going differently.
>
> For example: data on the web is "stringly-typed" (effectively, since HTTP 
> doesn't really have data types) and comes in via HTML's form mechanism or 
> other string-y formats like JSON or XML payloads. So you need not just data 
> *validation*, but data *conversion* which works for the web use case.
>
> And since the web use case inevitably involves supporting forms/payloads 
> that don't persist to a relational data store -- think of, for example, a 
> contact form that sends an email, or forms that store their results 
> client-side for things

Re: Model-level validation

2022-10-07 Thread Aaron Smith
Yes, every time you get data from an untrusted source you must validate 
it. As well as *every time you change model attributes, ever*. There seems 
to be a widespread frame of mind in Django that validation is something you 
only need to do with data from untrusted sources. As someone who has had 
to deal with the consequences of this pattern in mission critical systems, 
this terrifies me, and I consider it *extremely* harmful. Untrusted users 
are not the only place you can get bad data from. Bugs can happen anywhere, 
and no data source can be considered "safe". It happens *all the time*. 
Nothing is more dangerous than a developer who says "don't worry, I'll 
remember to do everything perfectly 100% of the time".  This is why 
model-level validation is the default in other ORMs. Django is not somehow 
immune to this fundamental property of software.

I am aware there are patterns to work around this in Django. My position is 
that skipping validation should be the rare edge case and not the easy 
naive path. Unless Django's stated purpose is to be a cute toy for making 
blogs, and robust infrastructure is off-label, but that's not what I see in 
the wild.
On Friday, October 7, 2022 at 12:01:30 AM UTC-7 carlton...@gmail.com wrote:

> > ... the duplication I was referring to is having both Forms and 
> Serializers do validation.
>
> That's a separate issue. 
>
> Can we merge various aspects of DRF into Django, so that it better handles 
> building JSON APIs? Yes, clearly. One step of that is better content type 
> handling, another is serializers. (There are others). 
> On the serializer front, it would be a question of making django.forms 
> better able to handle list-like (possibly do-able with FormSet) and nested 
> data, and so on. 
> Not a small project, but with things like django-readers, and 
> Pydantic (and django-ninja), and attrs/cattrs showing new ideas, 
> re-thinking about serialization in Django is about due. 
>
> But the issue is here: 
>
> > ... I also don't relish the thought of needing to use a Form or 
> Serializer every time I alter a Model's data.
>
> I'm like literally, "¿Qué? 😳" - Every single time you get data from an 
> untrusted source you simply **must** validate it before use. ("Filter 
> input, escape output", I was drilled.) That applies exactly the same to a 
> CSV file as it does to HTTP request data. (That your CSV is malformed is 
> axiomatic no? :) 
>
> If you want to enforce validation, with a single call, write a method (on 
> a manager likely) that encapsulates your update logic (and runs the 
> validation before save). Then always use that in your code. (That's long 
> been a recommended pattern.) But don't skip the validation layer on your 
> incoming data. 
>
> I would be -1 to `validate` kwarg to `save()` — that's every user ever 
> wondering *should I use it?* every time. (Same for a setting.)
> Rather — is this a docs issue? — we should re-emphasise the importance of 
> the validation layer. 
> Then if folks want a convenience API to do both tasks, they're free to 
> write that for their models. (This is what Uri has done for Speedy Net. 
> It's not a bad pattern.) 
>
> On Fri, 7 Oct 2022 at 04:34, Aaron Smith  wrote:
>
>> James - to clarify, the duplication I was referring to is having both 
>> Forms and Serializers do validation. I often work with web apps where data 
>> for the same model can arrive via user input, serializer, or created in 
>> some backend process e.g. Celery. If forms/serializers are your validation 
>> layer, you need to duplicate it and worry about how to keep them from 
>> diverging over time as there's no single source of truth. I also don't 
>> relish the thought of needing to use a Form or Serializer every time I 
>> alter a Model's data.
>>
>> Perhaps we think about validation differently. I consider it to be 
>> critical to maintain complex systems with any kind of confidence, any time 
>> data is being created or changed, regardless of where that change comes 
>> from. Bugs can happen anywhere and validation is the best (only?) option to 
>> prevent data-related bugs.
>> On Thursday, October 6, 2022 at 12:03:28 PM UTC-7 James Bennett wrote:
>>
>>> On Thu, Oct 6, 2022 at 9:00 AM Aaron Smith  wrote:
>>>
>>>> James - The problem with moving validation up the stack, i.e. to
>>>> logical branches from Model (Form, Serializer) is that you must duplicate
>>>> validation logic if your data comes from multiple sources or domains (web
>>>> forms *and* API endpoints *and* CSVs polled from S3). Duplication leads
>>>> to divergence leads to horrible data integrity bugs and no amount of test
>>>> coverage can guarantee safety. Even if you consider Django to be "only a
>>>> web framework" I would still argue that validation should be centralized
>>>> in the data storage layer. Validity is a core property of data.
>>>> Serialization and conversion ch

Re: Model-level validation

2022-10-07 Thread Mariusz Felisiak
> I am aware there are patterns to work around this in Django. My position
> is that skipping validation should be the rare edge case and not the easy
> naive path. Unless Django's stated purpose is to be a cute toy for making
> blogs, and robust infrastructure is off-label, but that's not what I see in
> the wild.

I think you're going a bit too far with your judgement and comparisons. 
It's already clear for everyone involved in this thread that you're firmly 
convinced that your way of doing things is the only right one. You don't 
need to emphasize it any more.

I can say that in the past 15+ years I made dozens of web apps (I've never 
written a blog) including critical workflows for international retailers, 
pharmaceutical companies, public sector etc. and I've never missed 
auto-validation in the ORM layer. Is this an argument in the discussion? 
Not really, IMO :) It's just the way it is, it's not something that can or 
should convince anyone that I'm right :)

Personally, I agree with James and I'm strongly against any auto-validation 
in the ORM. I'm also against extra settings and built-in subclasses of 
`Model` such as `ValidatingModel`, because they would be confusing for newcomers 
and increase the barrier of entry for developers. Django has to make design 
decisions and that's one of them.

Best,
Mariusz



Re: Why using django.contrib.sessions as the salt to encode session data? why not secret key?

2022-10-07 Thread Avantika gohane
Hey, Avantika this side.

On Mon, Oct 3, 2022, 5:21 PM Lokesh Sanapalli wrote:

> Hi,
>
> I was going through the code and got a question. I saw that we are using
> a hard-coded string, `django.contrib.sessions`, as the key salt to encode
> session data. Why not use the secret key? As the secret key is specific to
> the environment and project, it seems like a good candidate. Is it because
> the session data does not contain any sensitive info (it only contains the
> user id and other info) that this decision was made?
>
> Thanks & Regards,
> Lokesh Sanpalli
>

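One way to picture the question above is that a salt and a secret key need not be alternatives. As far as I understand Django's signing utilities, the hard-coded salt namespaces a signature by purpose, while the project's secret key is still mixed into the derived key. The sketch below is a simplified illustration of that idea using only `hashlib` and `hmac`; `salted_signature` and the example `SECRET_KEY` value are hypothetical, and this is not Django's actual implementation.

```python
import hashlib
import hmac

SECRET_KEY = "example-secret-key"  # hypothetical; a per-project secret


def salted_signature(key_salt: str, value: str, secret: str = SECRET_KEY) -> str:
    """Derive a per-purpose key from salt + secret, then HMAC the value."""
    derived_key = hashlib.sha256((key_salt + secret).encode()).digest()
    return hmac.new(derived_key, value.encode(), hashlib.sha256).hexdigest()


payload = "user_id=42"
session_sig = salted_signature("django.contrib.sessions", payload)
other_sig = salted_signature("some.other.feature", payload)
```

Under this scheme the same payload gets a different signature for each salt, so a value signed for one feature cannot be replayed against another, and changing the secret invalidates all of them.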


Re: Model-level validation

2022-10-07 Thread Aaron Smith
Mariusz - fair enough, I will consider my point made and apologies if it 
came off too strong. FWIW it's not just my opinion, it's shared by every 
developer (dozens) I've had this conversation with up until now. It's a 
stark contrast that makes me wonder how aware the core developers / old 
timers are of the broader user base's experience.

So you would object to a `VALIDATE_MODELS_BY_DEFAULT` setting, defaulted to 
False?
On Friday, October 7, 2022 at 8:55:24 AM UTC-7 Mariusz Felisiak wrote:

> > I am aware there are patterns to work around this in Django. My position 
> is that skipping validation should be the rare edge case and not the easy 
> naive path. Unless Django's stated purpose is to be a cute toy for making 
> blogs, and robust infrastructure is off-label, but that's not what I see in 
> the wild.
>
> I think you're going a bit too far with your judgement and comparisons. 
> It's already clear for everyone involved in this thread that you're firmly 
> convinced that your way of doing things is the only right one. You don't 
> need to emphasize it any more.
>
> I can say that in the past 15+ years I made dozens of web apps (I've never 
> written a blog) including critical workflows for international retailers, 
> pharmaceutical companies, public sector etc. and I've never missed 
> auto-validation in the ORM layer. Is this an argument in the discussion? 
> Not really, IMO :) It's just the way it is, it's not something that can or 
> should convince anyone that I'm right :)
>
> Personally, I agree with James and I'm strongly against any 
> auto-validation in the ORM. I'm also against extra settings and built-in 
> subclasses of `Model` such as `ValidatingModel`, because they would be confusing 
> for newcomers and increase the barrier of entry for developers. Django has 
> to make design decisions and that's one of them.
>
> Best,
> Mariusz
>



Re: Model-level validation

2022-10-07 Thread James Bennett
On Fri, Oct 7, 2022 at 6:21 PM Aaron Smith  wrote:

> Mariusz - fair enough, I will consider my point made and apologies if it
> came off too strong. FWIW it's not just my opinion, it's shared by every
> developer (dozens) I've had this conversation with up until now. It's a
> stark contrast that makes me wonder how aware the core developers / old
> timers are of the broader user base's experience.


I would wonder how many of these developers you've talked to are used to
working in Python.

The main standalone ORM package people use in Python is SQLAlchemy, which
*also* does not do validation in the ORM layer. The last time I worked with
Flask, the standard practice was to use Marshmallow to write serializers,
and these days the popular async frameworks like Starlite and FastAPI have
you write Pydantic models. In either case they fill the role of, say, a DRF
serializer -- they do the validation, and data type conversion at the
application boundaries, so that the ORM doesn't have to.
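A stdlib-only sketch of what such a boundary layer does, standing in for a form, serializer, or Pydantic model: it converts string inputs to typed values and reports errors per field, independent of where the data came from. `SignupData`, `parse_signup`, and the field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SignupData:
    """Typed result handed to the rest of the application."""
    email: str
    age: int
    start: date


def parse_signup(raw: dict) -> tuple:
    """Convert string inputs to typed values, collecting errors per field.

    Returns (SignupData, {}) on success or (None, errors) on failure.
    """
    errors = {}
    email = raw.get("email", "").strip()
    if "@" not in email:
        errors["email"] = "enter a valid email address"
    try:
        age = int(raw.get("age", ""))
        if age < 0:
            errors["age"] = "must not be negative"
    except ValueError:
        errors["age"] = "enter a whole number"
    try:
        start = date.fromisoformat(raw.get("start", ""))
    except ValueError:
        errors["start"] = "enter a date as YYYY-MM-DD"
    if errors:
        return None, errors
    return SignupData(email=email, age=age, start=start), {}


# The same function serves a form post, a JSON payload, or a CSV row.
ok, ok_errors = parse_signup(
    {"email": "ada@example.com", "age": "36", "start": "2022-10-07"}
)
bad, bad_errors = parse_signup(
    {"email": "nope", "age": "x", "start": "07/10/2022"}
)
```

The caller gets either a fully typed object or a dictionary of messages it can show to a user or write to a log, without having to parse an exception raised at save time.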

It's true that when you start branching out into other languages you'll
encounter ORMs which have validation built-in, like Entity Framework or
Hibernate, but you'll also more often encounter that in statically-typed
languages where the data conversion step has already been handled for you.
It's also not always clear that the ORM is the right place for validation,
since often the rules being enforced are ones that aren't actually enforced
at the DB level by constraints.

Either way, I think I've made the case for why Django doesn't and shouldn't
do this. You seem to have a strong reluctance to use either Django forms
(in a "vanilla" Django project) or DRF serializers (in a more "API"
project) to validate data from sources other than direct user-initiated
HTTP request, but I don't really get that -- the validation utilities are
there, and if you're not willing to use them that still is not Django's
problem to solve -- after all, someone else might be equally set in their
conviction that all the existing validation layers are the wrong way to do
things, and demand we add yet another one, and I doubt you'd be supportive
of that.

So I think Django should continue to be Django, and validation should
continue to be a layer independent of the ORM (which, as I originally
noted, it *has* to be in a web framework, since not every use case for
validation will end up touching the database). For that reason I'd be very
strongly against ever adding even an optional default enforcement of
model-level data validation.
