#18392: Make MySQL backend default to utf8mb4 encoding
-------------------------------------+-------------------------------------
     Reporter:  EmilStenstrom        |                    Owner:  nobody
         Type:  New feature          |                   Status:  new
    Component:  Database layer       |                  Version:  1.4
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:  utf8mb4 mysql        |             Triage Stage:  Accepted
    Has patch:  1                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  1
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by Nick Pope):

 Replying to [comment:48 lambdaq]:
 > Lots of stuff had changed since the issue was first open ''nine'' years
 ago.

 Yes, it has. But it is still not necessarily straightforward.

 > 1. In v5.7, innodb indexes no longer limits the 767 bytes hardcap on
 utf8mb4 indexes.
 >
 
https://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-7.html#mysqld-5-7-7-feature
 > > When innodb_file_format is set to Barracuda, innodb_large_prefix=ON
 allows index key prefixes longer than 767 bytes (up to 3072 bytes) for
 tables that use a Compressed or Dynamic row format.

 Based on your linked release notes, MySQL 5.7.7 changed the ''defaults''
 of some settings to the following:

 {{{
 innodb_file_format=Barracuda  # Previous default was Antelope
 innodb_large_prefix=ON        # Previous default was OFF
 }}}

 These allow indexing strings up to 768 characters instead of 191
 characters for utf8mb4 which should eliminate the problem of . I don't
 think we need to worry about that new upper limit being an issue as 255
 was the previous cap anyway with utf8 (a.k.a. utf8mb3).

 It should be noted that these options are also deprecated as of that
 release and removed in
 
[https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-0.html#mysqld-8-0-0-feature
 8.0.0]. The release notes also state that using non-default values for the
 above settings in MySQL 5.7.7+ will log a deprecation warning.

 The underlying problem, however, is what
 [https://dev.mysql.com/doc/refman/5.7/en/innodb-row-format.html row
 format] is used. It is necessary for the row format of a table to be
 `COMPRESSED` or `DYNAMIC` for large key index support. These were
 unavailable with the previous default file format configuration (Antelope)
 and only `REDUNDANT` and `COMPACT` could be used.

 Note that even in MySQL 8.0.0+ all of these
 [https://dev.mysql.com/doc/refman/8.0/en/innodb-row-format.html row
 formats] can be used, __will not be changed automatically during upgrade__
 and thus the migration of existing projects from utf8mb3 → utf8mb4 (with
 255 → 191 characters) is still a potential problem.

 That said, the default value of [https://dev.mysql.com/doc/refman/5.7/en
 /innodb-parameters.html#sysvar_innodb_default_row_format
 innodb_default_row_format] added in MySQL 5.7.9 is `DYNAMIC`. (See
 [https://dev.mysql.com/doc/refman/5.7/en/innodb-row-format.html#innodb-
 row-format-defining here] also.) Note that in 5.6
 [https://dev.mysql.com/doc/refman/5.6/en/innodb-row-format.html#innodb-
 row-format-defining the default value] was `COMPACT`.

 I found the following [https://dev.mysql.com/doc/refman/5.7/en/upgrading-
 from-previous-series.html#upgrade-innodb-changes upgrade details] which
 state:

 > In MySQL 5.7.9, `DYNAMIC` replaces `COMPACT` as the implicit default row
 format for InnoDB tables. A new configuration option,
 `innodb_default_row_format`, specifies the default InnoDB row format.
 Permitted values include `DYNAMIC` (the default), `COMPACT`, and
 `REDUNDANT`.
 >
 > After upgrading to 5.7.9, any new tables that you create use the row
 format defined by `innodb_default_row_format` unless you explicitly define
 a row format (`ROW_FORMAT`).
 >
 > For existing tables that do not explicitly define a `ROW_FORMAT` option
 or that use `ROW_FORMAT=DEFAULT`, any operation that rebuilds a table also
 silently changes the row format of the table to the format defined by
 `innodb_default_row_format`. Otherwise, existing tables retain their
 current row format setting. For more information, see Defining the Row
 Format of a Table.

 So maybe things are not as bad as they seem after all.

 > 2. MySQL 8 will default use db-wide utf8mb4
 >
 >
 
https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-1.html#mysqld-8-0-1-charset
 > > Important Change: The default character set has changed from latin1 to
 utf8mb4. These system variables are affected:
 >
 >
 > IMHO Django's decision to hack MySQL's default settings to less
 supported utf8mb3 (aka the utf8) would be unwise. Maybe it was a proper
 compromise ''9 years ago'', but it will be a liability in the years to
 come.
 >
 > Maybe at least we can add some notes to alarm the readers?
 >
 >
 > CREATE DATABASE <dbname> CHARACTER SET utf8;
 >
 > This would be a huge misleading mistake on official Django doc.
 >
 > @felixxm @pope1ni

 It would be nice to sort this out, not that I'm volunteering. You should
 reignite the discussion on the DevelopersMailingList if you want to take
 this on.

 Given how things have changed maybe there is an easy path forward now and
 we could consider the following:

 - Require MySQL 5.7.9+. We can't drop 5.7 entirely as it is supported
 until October 2023. See SupportedDatabaseVersions.
 - Add a system check to ensure that the configuration options are set to
 expected values:
   {{{
   # For MySQL < 8.0.0:
   innodb_file_format=Barracuda
   innodb_large_prefix=ON
   # For all versions:
   innodb_default_row_format=DYNAMIC
   }}}
 - Add a system check that the `ROW_FORMAT` of all tables is `DEFAULT`,
 `DYNAMIC`, or `COMPRESSED`. (See [https://dev.mysql.com/doc/refman/5.7/en
 /innodb-row-format.html#innodb-row-format-detrmining here].)
 - Change the documentation and connection configuration to use `utf8mb4`.
 - Add plenty of warnings into the release notes detailing the restrictions
 and how to fix any issues in preparation for upgrading Django.

 One other question is how this all affects MariaDB which is supported by
 `django.db.backends.mysql`. I haven't looked into that.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:49>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/071.cc3a2b225d62a1f17b945bbf3541deb2%40djangoproject.com.

Reply via email to