Re: Model-level validation

2022-10-06 Thread James Bennett
I see a lot of people mentioning that other ORMs do validation, but not
picking up on a key difference:

Many ORMs are designed as standalone packages. For example, in Python
SQLAlchemy is a standalone DB/ORM package, and other languages have similar
popular ORMs.

But Django's ORM isn't standalone. It's tightly integrated into Django, and
Django is a web framework. And once you focus *specifically* on the web
framework use case, suddenly things start going differently.

For example: data on the web is "stringly-typed" (effectively, since HTTP
doesn't really have data types) and comes in via HTML's form mechanism or
other string-y formats like JSON or XML payloads. So you need not just data
*validation*, but data *conversion* which works for the web use case.
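
To make the conversion/validation distinction concrete, here is a minimal, framework-agnostic sketch in plain Python. It is not Django's forms API; the function name `convert_and_validate` and the fields `quantity` and `ship_on` are made up for illustration. It takes a dict of strings, as a form post would deliver, converts each value to a Python type, and only then applies a range check.

```python
from datetime import date


def convert_and_validate(raw):
    """Turn string-only input into typed values, collecting per-field errors."""
    cleaned, errors = {}, {}

    # Conversion: the wire format only gives us strings.
    try:
        cleaned["quantity"] = int(raw.get("quantity", ""))
    except ValueError:
        errors["quantity"] = "Enter a whole number."

    try:
        cleaned["ship_on"] = date.fromisoformat(raw.get("ship_on", ""))
    except ValueError:
        errors["ship_on"] = "Enter a date as YYYY-MM-DD."

    # Validation: only meaningful once conversion has succeeded.
    if "quantity" in cleaned and cleaned["quantity"] < 1:
        errors["quantity"] = "Quantity must be at least 1."
        del cleaned["quantity"]

    return cleaned, errors
```

Note that nothing in the sketch touches a database, which is the point: a contact form or a preferences form needs exactly this step even when no model is involved.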

And since the web use case inevitably involves supporting forms/payloads
that don't persist to a relational data store -- think of, for example, a
contact form that sends an email, or forms that store their results
client-side for things like language or theme preferences -- you inevitably
end up needing to do data conversion and validation *independently of the
ORM*.

And at that point, you have to start asking tough questions about whether
it's worth having *two* conversion and validation layers, just because
"every other ORM has this, so we have to put one in the ORM".

Which basically is where Django is. Yes, there are utilities to do your
data conversion and validation in the ORM layer if you want to. But Django
is, first and foremost, a web framework, which needs to support the web use
case I've described above, and so its primary conversion/validation layer
can never be the ORM.

Personally, I wish model-level validation had never been added even as an
option, because in a web framework like Django it's conceptually the wrong
place to put the validation logic. Though that battle was lost many years
ago, I'd be *strongly* against trying to expand it or start forcing the ORM
to default to doing validation work that, in Django, properly belongs to
the forms layer (or to serializers if you use DRF).

So: Django ships with ModelForm, which does the hard work of auto-deriving
as much validation logic as possible from your model definition so you
don't have to repeat it. DRF ships with ModelSerializer, which does the
same thing for its validation/conversion layer. I would strongly urge
people to use them. Trying to force all that validation back into the model
layer misses the bigger picture of what Django is and how it works.
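
As a rough illustration of deriving checks from a single declarative definition, the sketch below uses a plain dataclass rather than a Django model. This is not how ModelForm is implemented; `Author`, `clean_for`, and the `max_length`/`min_value` metadata keys are assumptions chosen for the example.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Author:
    # Constraints are declared once, next to the field they apply to.
    name: str = field(metadata={"max_length": 20})
    age: int = field(metadata={"min_value": 0})


def clean_for(model_cls, raw):
    """Convert and check `raw` (a dict of strings) using the declared fields."""
    cleaned, errors = {}, {}
    for f in fields(model_cls):
        try:
            value = f.type(raw.get(f.name, ""))  # calls str(...) or int(...)
        except ValueError:
            errors[f.name] = f"Expected {f.type.__name__}."
            continue
        max_length = f.metadata.get("max_length")
        min_value = f.metadata.get("min_value")
        if max_length is not None and len(value) > max_length:
            errors[f.name] = f"At most {max_length} characters."
        elif min_value is not None and value < min_value:
            errors[f.name] = f"Must be at least {min_value}."
        else:
            cleaned[f.name] = value
    return cleaned, errors
```

Calling `clean_for(Author, {"name": "Ada", "age": "36"})` returns the converted values and an empty error dict; the rules themselves live only on `Author`.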

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAL13Cg9KHxksNOAVhcOQWS80%2BP5wJbE48V-Z17h15n-krfUVcA%40mail.gmail.com.


Re: Model-level validation

2022-10-06 Thread Aaron Smith
Uri - that's a great upgrade path (or should I say, non-upgrade path). 
Agree with `VALIDATE_MODELS_BY_DEFAULT`.

Rails also skips validations for some operations, like `update_column`, but 
those are prominently marked "use with caution", and the other ORMs I've 
used follow a similar pattern. For bulk_create, it sounds like there's a 
legitimate reason not to validate everything, so excluding it seems 
reasonable as long as there's a prominent "use with caution" statement in 
the docs.
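
For reference, the `validate` keyword and settings default discussed in this thread could behave roughly like the plain-Python sketch below. Nothing here exists in Django today: `Record`, its in-memory `_store`, and the module-level `VALIDATE_MODELS_BY_DEFAULT` flag are hypothetical stand-ins for a model, the database, and a project setting. It uses `None` as the keyword's default so that an explicit `True` or `False` can override the setting per call, which is one possible reading of the proposal.

```python
# Hypothetical stand-in for a project setting; no such Django setting exists.
VALIDATE_MODELS_BY_DEFAULT = False


class ValidationError(Exception):
    pass


class Record:
    """Toy stand-in for a model; `_store` stands in for the database."""

    _store = []

    def __init__(self, name):
        self.name = name

    def full_clean(self):
        if not self.name:
            raise ValidationError("name may not be blank")

    def save(self, validate=None):
        # None means "use the project-wide default"; an explicit True/False
        # overrides it for this call only.
        if validate is None:
            validate = VALIDATE_MODELS_BY_DEFAULT
        if validate:
            self.full_clean()
        Record._store.append(self)
```

With the flag left at `False`, `Record("").save()` stores the blank record as before, while `Record("").save(validate=True)` raises `ValidationError` and stores nothing.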
On Wednesday, October 5, 2022 at 8:35:36 PM UTC-7 Uri wrote:

>
> אורי
> u...@speedy.net
>
>
> On Thu, Oct 6, 2022 at 6:11 AM Aaron Smith wrote:
>
>> It sounds like there is little support for this being the default. But 
>> I'd like to propose something that might satisfy the different concerns:
>>
>> 1) A `validate` kwarg for `save()`, defaulted to `False`. This maintains 
>> backwards compatibility and also provides the validation behavior that 
>> users coming to Django from other frameworks likely expect, in a more 
>> user-friendly way than overriding `save()` to call `full_clean()`.
>>
>> And/or...
>>
>> 2) An optional Django setting (`VALIDATE_MODELS_DEFAULT`?) to change the 
>> default behavior to `True`. The `validate` kwarg above would override this 
>> per call, allowing unvalidated saves when necessary.
>>
>> These changes would be simple, backwards compatible, and give individual 
>> projects the choice to make Django behave like other ORMs with regard to 
>> validation. This being the Django developers mailing list I should not be 
>> surprised that most people here support the status quo, but in my personal 
>> experience, having had this conversation with dozens of coworkers over the 
>> years - 100% of them expressed a strong desire for Django to do this 
>> differently.
>>
>
> +1
>
> I would suggest having a setting "VALIDATE_MODELS_BY_DEFAULT", which is 
> true or false (true by default) and controls whether save() calls 
> full_clean(), with an option to pass "validate=True" or "validate=False" to 
> override this default. Maybe also allow changing the default for specific 
> models.
>
> This is similar to forms that have `def save(self, commit=True):`, and you 
> can call them with "commit=True" or "commit=False" to save or not save the 
> results to the database. I also suggest that VALIDATE_MODELS_BY_DEFAULT 
> should become true by default starting from some specific future version of 
> Django, so that users who don't want it will have to manually set it to 
> false.
>
> We should still remember that there are bulk actions such as bulk_create() 
> or update(), that bypass save() completely, so we have to decide how to 
> handle them if we want our data to be always validated.
>
> Uri Rodberg, Speedy Net.
>



Re: Model-level validation

2022-10-06 Thread Aaron Smith
James - The problem with moving validation up the stack, i.e. to logical 
branches from Model (Form, Serializer), is that you must duplicate 
validation logic if your data comes from multiple sources or domains (web 
forms *and* API endpoints *and* CSVs polled from S3). Duplication leads to 
divergence, which leads to horrible data integrity bugs, and no amount of 
test coverage can guarantee safety. Even if you consider Django to be "only 
a web framework" I would still argue that validation should be centralized 
in the data storage layer. Validity is a core property of data. 
Serialization and conversion change between sources and are a different 
concern from validation.



Re: Model-level validation

2022-10-06 Thread James Bennett
I would flip this around and point out that the duplication comes from
seeing the existing data conversion/validation layer and deciding not to
use it.

There's nothing that requires you to pass in an HttpRequest instance to use
a form or a serializer -- you can throw a dict of data from any source into
one and have it convert/validate for you.  Those APIs are also designed to
be easy to check and easy to return useful error messages from on failed
validation, while a model's save() has no option other than to throw an
exception at you and demand you parse the details out of it (because it was
designed as part of an overall web framework that already had the
validation layer elsewhere).
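
A minimal sketch of that idea, in plain Python rather than Django's forms or DRF's serializers: `SignupValidator` is a hypothetical form-like helper, and the `email`/`age` fields and sample data are invented for the example. It accepts a plain dict from any source and exposes collected errors instead of raising, so the same object can check a web-style payload and rows read from a CSV file.

```python
import csv
import io


class SignupValidator:
    """Form-like helper (hypothetical): takes a plain dict, exposes errors."""

    def __init__(self, data):
        self.data = data
        self.errors = {}
        self.cleaned_data = {}

    def is_valid(self):
        self.errors, self.cleaned_data = {}, {}
        email = (self.data.get("email") or "").strip()
        if "@" not in email:
            self.errors["email"] = "Enter a valid email address."
        else:
            self.cleaned_data["email"] = email.lower()
        try:
            age = int(self.data.get("age") or "")
        except ValueError:
            self.errors["age"] = "Enter a whole number."
        else:
            if age < 0:
                self.errors["age"] = "Age cannot be negative."
            else:
                self.cleaned_data["age"] = age
        return not self.errors


# The same validator handles a web-style payload and rows from a CSV file.
payload = {"email": "Ada@Example.com", "age": "36"}
csv_text = "email,age\nbob@example.com,41\nnot-an-email,abc\n"

web = SignupValidator(payload)
rows = [SignupValidator(row) for row in csv.DictReader(io.StringIO(csv_text))]
results = [(v.is_valid(), v.errors) for v in [web] + rows]
```

Each entry in `results` pairs a pass/fail flag with a per-field error dict, which is easier to report back to a caller than details parsed out of an exception.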

So I would argue, once again, that the solution to your problem is to use
the existing data conversion/validation utilities (forms or serializers)
regardless of the source of the data. If you refuse to, I don't think
that's Django's problem to solve.



Re: Model-level validation

2022-10-06 Thread Aaron Smith
James - to clarify, the duplication I was referring to is having both Forms 
and Serializers do validation. I often work with web apps where data for 
the same model can arrive via user input, arrive via a serializer, or be 
created in some backend process, e.g. Celery. If forms/serializers are your 
validation layer, you need to duplicate it and worry about how to keep the 
copies from diverging over time, as there's no single source of truth. I 
also don't relish the thought of needing to use a Form or Serializer every 
time I alter a Model's data.
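
One way to picture the "single source of truth" being asked for, as a plain-Python sketch with no Django involved: every entry point funnels through one write function that applies one rule set. All names here (`validate_article`, `save_article`, `handle_form`, `handle_api`, `nightly_import`, and the list used as a store) are hypothetical.

```python
import json


class InvalidData(Exception):
    def __init__(self, errors):
        super().__init__(errors)
        self.errors = errors


def validate_article(fields):
    """The one rule set; returns a dict of field name -> error message."""
    errors = {}
    if not fields.get("title"):
        errors["title"] = "Title is required."
    if len(fields.get("slug") or "") > 50:
        errors["slug"] = "Slug must be 50 characters or fewer."
    return errors


def save_article(fields, store):
    """The only write path, so every caller gets the same checks."""
    errors = validate_article(fields)
    if errors:
        raise InvalidData(errors)
    store.append(dict(fields))


def handle_form(post_data, store):   # user input
    save_article({"title": post_data.get("title", ""),
                  "slug": post_data.get("slug", "")}, store)


def handle_api(json_body, store):    # API payload
    save_article(json.loads(json_body), store)


def nightly_import(rows, store):     # background job
    for row in rows:
        save_article(row, store)
```

The three callers differ only in how they obtain the data; a rule added to `validate_article` applies to all of them at once.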

Perhaps we think about validation differently. I consider it critical to 
maintaining complex systems with any kind of confidence: data should be 
validated any time it is created or changed, regardless of where that 
change comes from. Bugs can happen anywhere and validation is the best 
(only?) option to prevent data-related bugs.