Re: New external package - is there any duplication

2019-05-21 Thread Adam Johnson
Hi Dan,

I think this has not got much of a response because it might be a bit niche
- it really depends where your teams are struggling to use RunSQL.

Looking for other packages at
https://djangopackages.org/grids/g/database-migration/ , nothing seems to
be related. They are all precursors/replacements for Django Migrations.
Maybe there are other packages somewhere on Django Packages though.

Some responses to your ideas:

1. Rather than subclass RunSQL to load sql files, just embed SQL in Python
with triple quoted strings? The checksumming would be unnecessary then,
presuming team members understand that (applied) migration files are
immutable.

2. You could "sync" a folder of (idemptotent) SQL files with a post_migrate
signal handler. For example, Django syncs the Content Type table with such
a handler:
https://github.com/django/django/blob/master/django/contrib/contenttypes/apps.py#L20
.
You could loop over all the SQL files in a given folder to check and run
them. They could contain custom idempotent SQL such as "CREATE OR REPLACE
FUNCTION" - presuming you're using RunSQL for DDL that Django Migrations
doesn't support. I've seen this approach a couple times for things like
stored procedures, though I'd prefer to use vanilla RunSQL in a migration
myself. At least one can avoid yet another easily forgotten management
command.

Thanks,

Adam



On Fri, 17 May 2019 at 19:08, Dan Davis  wrote:

>
> I am planning an external package to assist my developers in getting their
> special-purpose DDL out of the database and into git.   Some teams can
> handle it, but some teams could use help making sure that they use
> operations.RunSQL well and wisely.
>
> My biggest questions about what I'm doing are:
> * Are there any packages that already attempt to do what I'm thinking of?
> * What gotchas have I not considered?
> * What should it be called?   What should the management command be called?
> * How do I test it well?
>
> Here is a more detailed idea of the packages key contributions (
> https://github.com/danizen/django-sqlextras/):
>
> * Create a migration operation that is a subclass of RunSQL, but takes a
> forward object that consists of a path and an expected checksum, and a
> similar optional backwards checksum.
> * Create a management command that can build migrations using this
> operation (or you can do it yourself, of course).
>
> Feedback is welcome.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/73a56671-3904-4eef-b82a-d04fe175c993%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>


-- 
Adam

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAMyDDM0FhWyZrgZRVw1i_OGQ6qMGu8xArL8xwxWZbghC%3Ds6bsg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Proposal: Add additional password validators (auth/password_validation.py)

2019-05-21 Thread Adam Johnson
Hi Brad,

I appreciate no one has replied here for some time, and your ticket was
closed rapidly after your work creating the code. I can imagine it might be
a bit frustrating.

Thanks for trying to improve Django. I see you've submitted some other PR's
since, thanks for that work.


My thoughts on the password validators:


First, I'd like to say there has probably not been much thought on the
issue. The Django fellows, and everyone on this mailing list, have a lot on
their plates. The fellows have probably defaulted to "no" on these feature
request tickets simply to keep Django at a manageable size. We are
conservative in adding new code because it needs maintaining forever (or
however long Django will live!). You're right the API surface area is small
and stable, but new features all add up.

Also the suggestion to create a third party package has a precedent. The
Django ecosystem is maybe 10,000x more code than Django core - core simply
can't contain the long tail. A number of good ideas in Django have actually
been third party packages first, for example Andrew Godwin's South which
evolved into Django Migrations.

Additionally, releasing a package means that users (yourself included!) can
use the features *today*, rather than in 9 months' time when the next
Django release comes out. (...or 5 years time when your project actually
gets upgraded.)


Second, I would like to consider the password validation more closely. When
I think about the topic these days, I point to the NIST guidelines. They
went through a big change in 2016. I saw this reported a couple times in
security media - it was hailed as a forwards step towards more sane
password management across the industry. Here's an article summarizing them
in non-governmental language:
https://nakedsecurity.sophos.com/2016/08/18/nists-new-password-rules-what-you-need-to-know/
 .

in general the guidelines represent a move away from the attitude you
spirit you quoted in your PR: "more validation is almost always better"

I think the guideline against composition rules would be against the first
three validators you suggested in #30442, to quote the article:

No composition rules. What this means is, no more rules that force you to
> use particular characters or combinations, like those daunting conditions
> on some password reset pages that say, “Your password must contain one
> lowercase letter, one uppercase letter, one number, four symbols but not
> &%#@_, and the surname of at least one astronaut.”
>

Additionally, although the article doesn't mention it, I found in the
document NIST.SP.800-63b Appendix A a warning against trying to measure
entropy. This would argue against adding the entropy validator suggestions
in #27568 and #30442:

Complexity of user-chosen passwords has often been characterized using the
> information theory concept of entropy [Shannon]. While entropy can be
> readily calculated for data having deterministic distribution functions,
> estimating the entropy for user-chosen passwords is difficult and past
> efforts to do so have not been particularly accurate. For this reason, a
> different and somewhat simpler approach, based primarily on password
> length, is presented herein.


On the other hand, there is a recommendation to check against databases of
known crackable passwords. This would lead us to the pwned passwords
validator suggested in #30100 as a more powerful version of Django's
built-in CommonPasswordValidator. There are a couple reasons I can see
*not* to add it to Django core though:

   - The validator relies on a third party web service. If the service
   changes after a version of Django goes out of support, the validator will
   stop working and no future release of that version of Django will be issued
   to fix it.
   - There is already a package implementing this validator, by Django core
   team member James Bennett:
   https://pypi.org/project/pwned-passwords-django/ . I think packages
   maintained by James are of a quality only second to Django itself.

That said, whilst it might be "more sane" to move to the NIST guidelines,
Django can adopt some pragmatism. We developers can be forced to implement
validation to fit in with some regulatory guidelines. I think if there was
good evidence that a validator would be required by a large number of
Django projects, I believe it wouldn't be out of place in Django core. On
the other hand, a great way to prove this utility is to make a third party
package that becomes popular :)

Looking at Django Packages, I found only one third party package explicitly
offering password validators:
https://djangopackages.org/packages/p/django-password-validators/  . And
for its name, it only offers one validator at
current, UniquePasswordsValidator . It seems there isn't much state of the
art for password validators right now - but perhaps they're not a
particularly interesting topic for third party package development!


To conclude, I think you asked a number of good questions

Re: New external package - is there any duplication

2019-05-21 Thread Dan Davis
Thank you for the review, Adam.
I like the idea of the post-migrate signal, but it can be complicated, and
sometimes doing things in sequence is important:

With Oracle, if function B runs function A (and so on) then after we run:

CREATE OR REPLACE FUNCTION *A*(...) 


We would need to, in Oracle anyway, run:

ALTER FUNCTION *B *COMPILE REUSE SETTINGS;


Hard then to be an idempotent change then, even with create and replace.

On Tue, May 21, 2019 at 4:48 PM Adam Johnson  wrote:

> Hi Dan,
>
> I think this has not got much of a response because it might be a bit
> niche - it really depends where your teams are struggling to use RunSQL.
>
> Looking for other packages at
> https://djangopackages.org/grids/g/database-migration/ , nothing seems to
> be related. They are all precursors/replacements for Django Migrations.
> Maybe there are other packages somewhere on Django Packages though.
>
> Some responses to your ideas:
>
> 1. Rather than subclass RunSQL to load sql files, just embed SQL in Python
> with triple quoted strings? The checksumming would be unnecessary then,
> presuming team members understand that (applied) migration files are
> immutable.
>
> 2. You could "sync" a folder of (idemptotent) SQL files with a
> post_migrate signal handler. For example, Django syncs the Content Type
> table with such a handler:
> https://github.com/django/django/blob/master/django/contrib/contenttypes/apps.py#L20
>  .
> You could loop over all the SQL files in a given folder to check and run
> them. They could contain custom idempotent SQL such as "CREATE OR REPLACE
> FUNCTION" - presuming you're using RunSQL for DDL that Django Migrations
> doesn't support. I've seen this approach a couple times for things like
> stored procedures, though I'd prefer to use vanilla RunSQL in a migration
> myself. At least one can avoid yet another easily forgotten management
> command.
>
> Thanks,
>
> Adam
>
>
>
> On Fri, 17 May 2019 at 19:08, Dan Davis  wrote:
>
>>
>> I am planning an external package to assist my developers in getting
>> their special-purpose DDL out of the database and into git.   Some teams
>> can handle it, but some teams could use help making sure that they use
>> operations.RunSQL well and wisely.
>>
>> My biggest questions about what I'm doing are:
>> * Are there any packages that already attempt to do what I'm thinking of?
>> * What gotchas have I not considered?
>> * What should it be called?   What should the management command be
>> called?
>> * How do I test it well?
>>
>> Here is a more detailed idea of the packages key contributions (
>> https://github.com/danizen/django-sqlextras/):
>>
>> * Create a migration operation that is a subclass of RunSQL, but takes a
>> forward object that consists of a path and an expected checksum, and a
>> similar optional backwards checksum.
>> * Create a management command that can build migrations using this
>> operation (or you can do it yourself, of course).
>>
>> Feedback is welcome.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Django developers (Contributions to Django itself)" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to django-developers+unsubscr...@googlegroups.com.
>> To post to this group, send email to django-developers@googlegroups.com.
>> Visit this group at https://groups.google.com/group/django-developers.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/django-developers/73a56671-3904-4eef-b82a-d04fe175c993%40googlegroups.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> --
> Adam
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/CAMyDDM0FhWyZrgZRVw1i_OGQ6qMGu8xArL8xwxWZbghC%3Ds6bsg%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group a

Re: Proposal: Add additional password validators (auth/password_validation.py)

2019-05-21 Thread Brad Solomon
Thanks Adam--the thoroughness of your response makes me more, rather than
less, interested in contributing further to Django.  And I must admit you
have me beat on just about all points raised.  As a new contributor,
hearing your framing of the consideration that goes into feature requests
was helpful also.

Brad

On Tue, May 21, 2019 at 5:58 PM Adam Johnson  wrote:

> Hi Brad,
>
> I appreciate no one has replied here for some time, and your ticket was
> closed rapidly after your work creating the code. I can imagine it might be
> a bit frustrating.
>
> Thanks for trying to improve Django. I see you've submitted some other
> PR's since, thanks for that work.
>
>
> My thoughts on the password validators:
>
>
> First, I'd like to say there has probably not been much thought on the
> issue. The Django fellows, and everyone on this mailing list, have a lot on
> their plates. The fellows have probably defaulted to "no" on these feature
> request tickets simply to keep Django at a manageable size. We are
> conservative in adding new code because it needs maintaining forever (or
> however long Django will live!). You're right the API surface area is small
> and stable, but new features all add up.
>
> Also the suggestion to create a third party package has a precedent. The
> Django ecosystem is maybe 10,000x more code than Django core - core simply
> can't contain the long tail. A number of good ideas in Django have actually
> been third party packages first, for example Andrew Godwin's South which
> evolved into Django Migrations.
>
> Additionally, releasing a package means that users (yourself included!)
> can use the features *today*, rather than in 9 months' time when the next
> Django release comes out. (...or 5 years time when your project actually
> gets upgraded.)
>
>
> Second, I would like to consider the password validation more closely.
> When I think about the topic these days, I point to the NIST guidelines.
> They went through a big change in 2016. I saw this reported a couple times
> in security media - it was hailed as a forwards step towards more sane
> password management across the industry. Here's an article summarizing them
> in non-governmental language:
> https://nakedsecurity.sophos.com/2016/08/18/nists-new-password-rules-what-you-need-to-know/
>  .
>
> in general the guidelines represent a move away from the attitude you
> spirit you quoted in your PR: "more validation is almost always better"
>
> I think the guideline against composition rules would be against the first
> three validators you suggested in #30442, to quote the article:
>
> No composition rules. What this means is, no more rules that force you to
>> use particular characters or combinations, like those daunting conditions
>> on some password reset pages that say, “Your password must contain one
>> lowercase letter, one uppercase letter, one number, four symbols but not
>> &%#@_, and the surname of at least one astronaut.”
>>
>
> Additionally, although the article doesn't mention it, I found in the
> document NIST.SP.800-63b Appendix A a warning against trying to measure
> entropy. This would argue against adding the entropy validator suggestions
> in #27568 and #30442:
>
> Complexity of user-chosen passwords has often been characterized using the
>> information theory concept of entropy [Shannon]. While entropy can be
>> readily calculated for data having deterministic distribution functions,
>> estimating the entropy for user-chosen passwords is difficult and past
>> efforts to do so have not been particularly accurate. For this reason, a
>> different and somewhat simpler approach, based primarily on password
>> length, is presented herein.
>
>
> On the other hand, there is a recommendation to check against databases of
> known crackable passwords. This would lead us to the pwned passwords
> validator suggested in #30100 as a more powerful version of Django's
> built-in CommonPasswordValidator. There are a couple reasons I can see
> *not* to add it to Django core though:
>
>- The validator relies on a third party web service. If the service
>changes after a version of Django goes out of support, the validator will
>stop working and no future release of that version of Django will be issued
>to fix it.
>- There is already a package implementing this validator, by Django
>core team member James Bennett:
>https://pypi.org/project/pwned-passwords-django/ . I think packages
>maintained by James are of a quality only second to Django itself.
>
> That said, whilst it might be "more sane" to move to the NIST guidelines,
> Django can adopt some pragmatism. We developers can be forced to implement
> validation to fit in with some regulatory guidelines. I think if there was
> good evidence that a validator would be required by a large number of
> Django projects, I believe it wouldn't be out of place in Django core. On
> the other hand, a great way to prove this utility is to make a third party
> pack

Re: Faster Migrations! But at what cost?

2019-05-21 Thread Dan Davis
Migrations are very slow, so I don't run them during test runs, and run
tests with --keepdb even in CI/CD.  This is required for my environment
anyway, because we use Oracle heavily, a "schema" is the same thing as a
"user", and is what I used to think of as a "database" coming from MySQL
and PostgreSQL.

To make migrations testable, I wrote a management command "clearschema"
which can clear the schema of any database alias, or its test analog.  This
"clearschema" command is in a framework app that all the apps at my
workplace include, but I think only I and a couple other projects use it.

Anyway, it seems that if --keepdb is so necessary, that anything we can do
to speed up migrations during tests and during deployment would be
worthwhile, but we have to be very careful to make sure that migrations are
imported once, and that any caching scheme is idempotent - e.g. we can
throw away the cached data.

On Mon, May 20, 2019 at 11:26 PM Raphael Gaschignard 
wrote:

> Hi Markus, Simon,
>
>   Both of you, thank you for the detailed replies and status report on
> this sort of stuff.
>
> > Did you look into squashing these 500 migrations by any chance?
>
> Yeah so I'll go in and squash things, though (partly because effective
> squashing requires moving models around, though we're still at Django
> 1.11 so it might become easier, partly because of cross-app references)
> it's a decent amount of work.
>
> I do like the idea of reusing a "effectively production DB" for things,
> not least because it's actually acurate.
>
> OK so the Operation API is effectively documented. I am tempted to try
> modifying the `database_forwards` and `database_backwards` to not
> require `from_state` (probably through adding a hook that gets called
> _before_ `state_forwards` to allow capturing the small details of the
> old state). It might actually be possible to make this backwards
> compatible by porting Django migrations to use an `apps`-free workflow,
> but falling back to the old "re-render the universe" mechanism for
> operations that don't apply it.
>
> This is very "that xkcd comic about time spent versus time saved"
>
>  > will cause trouble with RunSQL and other operations that use related
> field or model attributes
>
> So one thing I felt like was an invariant in this code was that field
> sharing was expected? From the docstring of ModelState:
>
>  Note that while you are allowed to mutate .fields, you are not allowed
>  to mutate the Field instances inside there themselves - you must
> instead
>  assign new ones, as these are not detached during a clone.
>
> Also maybe you meant to refer to RunPython instead of RunSQL. But I get
> your point in general here. Related models can be a problem
>
> One throwaway idea would be to not allow related model/related field
> access in these models? There's already a lot of documentation related
> to not allowing general model methods (effectively establishing that
> "migration models are _not_ normal models"), so there's a bit of
> precedent. But beyond the general backwards incompatability, it might
> not actually even be obvious how one would implement this. And you kinda
> need this info for foreign keys and the like anyways.
>
>
> Working directly off of `ModelState` is interesting, and I think there
> might be a backwards-compatible way forward there, where we still allow
> for rendering on certain operations but hold off on it on the basic
> ones. Even in our large project, most of our migration operatiosn are
> dirt-simple, so if the core django migrations could work off of
> `ModelState` then we could get a fast path through there.
>
> Thanks for your input, both of you. I have a couple ideas now that I'm
> pretty tempted to try out, mainly around the "fast path and slow path"
> strategies that should offer backwards compatibility.
>
>   Raphael
>
> Markus Holtermann wrote on 2019/05/21 2:26:
> > Thanks Raphael for bringing this topic up and Simon for your input
> already.
> >
> > I just left a note on your PR:
> https://github.com/django/django/pull/11388#issuecomment-494076750 . I'll
> quote it here for ease of readability:
> >
> > As far as I can see right now, a similar caching happened as a first
> approach to the Django 1.8 release but cause significant problems,
> specifically with regards to relational fields. Relational fields
> (ForeignKey, OneToOneField, ManyToManyField) keep an instance reference to
> the related model in `.related_model` or the related fields in
> `.related_fields`. The problem now is, if you reuse a field (and you do
> because you're only calling `list()` on `self.fields` to copy but not
> deepcopy the list), you're screwing up references between the models that
> _will_ cause trouble with `RunSQL` and other operations that use related
> field or model attributes.
> >
> >
> https://github.com/django/django/blob/1d0bab0bfd77edcf1228d45bf654457a8ff1890d/django/db/models/fields/__init__.py#L495-L499
> >
> >  From my work on mig

Re: Faster Migrations! But at what cost?

2019-05-21 Thread Josh Smeaton
Just to add some stats to the conversation, our largest project has ~90 
apps (+3rd party), and 240 migration files. This is after we reset our 
migration history when migrating from 1.8 -> 1.11 (just over a year ago). 
We would have had well in excess of 800 migration files at that point.

To avoid running migrations in unit tests, we define:

class DisableMigrations(object):
def __contains__(self, item):
return True

def __getitem__(self, item):
return None

MIGRATION_MODULES = DisableMigrations()

Which effectively syncdb's rather than applying migrations. Caveat - you 
lose all data/runpython/runsql operations in your test environment.

Our dev/staging setup relies on exporting a sanitised version of 
production, so we only need to run incremental migrations since the last 
backup.



On Tuesday, 21 May 2019 13:26:39 UTC+10, Raphael Gaschignard wrote:
>
> Hi Markus, Simon, 
>
>   Both of you, thank you for the detailed replies and status report on 
> this sort of stuff. 
>
> > Did you look into squashing these 500 migrations by any chance? 
>
> Yeah so I'll go in and squash things, though (partly because effective 
> squashing requires moving models around, though we're still at Django 
> 1.11 so it might become easier, partly because of cross-app references) 
> it's a decent amount of work. 
>
> I do like the idea of reusing a "effectively production DB" for things, 
> not least because it's actually acurate. 
>
> OK so the Operation API is effectively documented. I am tempted to try 
> modifying the `database_forwards` and `database_backwards` to not 
> require `from_state` (probably through adding a hook that gets called 
> _before_ `state_forwards` to allow capturing the small details of the 
> old state). It might actually be possible to make this backwards 
> compatible by porting Django migrations to use an `apps`-free workflow, 
> but falling back to the old "re-render the universe" mechanism for 
> operations that don't apply it. 
>
> This is very "that xkcd comic about time spent versus time saved" 
>
>  > will cause trouble with RunSQL and other operations that use related 
> field or model attributes 
>
> So one thing I felt like was an invariant in this code was that field 
> sharing was expected? From the docstring of ModelState: 
>
>  Note that while you are allowed to mutate .fields, you are not 
> allowed 
>  to mutate the Field instances inside there themselves - you must 
> instead 
>  assign new ones, as these are not detached during a clone. 
>
> Also maybe you meant to refer to RunPython instead of RunSQL. But I get 
> your point in general here. Related models can be a problem 
>
> One throwaway idea would be to not allow related model/related field 
> access in these models? There's already a lot of documentation related 
> to not allowing general model methods (effectively establishing that 
> "migration models are _not_ normal models"), so there's a bit of 
> precedent. But beyond the general backwards incompatability, it might 
> not actually even be obvious how one would implement this. And you kinda 
> need this info for foreign keys and the like anyways. 
>
>
> Working directly off of `ModelState` is interesting, and I think there 
> might be a backwards-compatible way forward there, where we still allow 
> for rendering on certain operations but hold off on it on the basic 
> ones. Even in our large project, most of our migration operatiosn are 
> dirt-simple, so if the core django migrations could work off of 
> `ModelState` then we could get a fast path through there. 
>
> Thanks for your input, both of you. I have a couple ideas now that I'm 
> pretty tempted to try out, mainly around the "fast path and slow path" 
> strategies that should offer backwards compatibility. 
>
>   Raphael 
>
> Markus Holtermann wrote on 2019/05/21 2:26: 
> > Thanks Raphael for bringing this topic up and Simon for your input 
> already. 
> > 
> > I just left a note on your PR: 
> https://github.com/django/django/pull/11388#issuecomment-494076750 . I'll 
> quote it here for ease of readability: 
> > 
> > As far as I can see right now, a similar caching happened as a first 
> approach to the Django 1.8 release but cause significant problems, 
> specifically with regards to relational fields. Relational fields 
> (ForeignKey, OneToOneField, ManyToManyField) keep an instance reference to 
> the related model in `.related_model` or the related fields in 
> `.related_fields`. The problem now is, if you reuse a field (and you do 
> because you're only calling `list()` on `self.fields` to copy but not 
> deepcopy the list), you're screwing up references between the models that 
> _will_ cause trouble with `RunSQL` and other operations that use related 
> field or model attributes. 
> > 
> > 
> https://github.com/django/django/blob/1d0bab0bfd77edcf1228d45bf654457a8ff1890d/django/db/models/fields/__init__.py#L495-L499
>  
> > 
> >  From my work on migrations, I