On Jun 27, 2011, at 5:44 PM, Michael Blume wrote: > I see a variable saved_objects being written, but I don't see it being > accessed -- is this to ease future features, or am I missing a code path?
Thanks good catch. This was a remnant from an earlier iteration of this patch, in which I tried (unsuccessfully) to call save() on any objects created during the fixture load after foreign key checks had been re-enabled, in hopes that the UPDATE call this generated would trigger a constraint check. It did not, because MySQL ignores UPDATE if the data being set matches what's already in the DB. I have some other changes I'm making in response to some of Jacob's comments on the ticket, so this will be gone in the next version of the patch. > If I'm reading correctly, check_for_invalid_foreign_keys extends over all the > rows in a table. loaddata is called by syncdb and South's migrate, not just > when a db is set up, so this could easily wind up run over lots and lots of > non-fixture data. I don't know MySQL's performance characteristics that well > -- is this likely to be expensive? This is a good question. First, though it's true that loaddata is called in several places other than during tests, I would imagine it is rarely if ever called during a normal request/response cycle in production. I'm operating under the assumption that the performance of this patch is important but not as critical as it would be if this was a command used in production. So that caveat aside, I will try later to find a large data set I can run the query on to get an idea of what kind of performance hit the check entails. If anyone else has a large data set with two related tables, you can try it out as well and report your results. The basic structure of the SQL statement being run is as follows: SELECT REFERRING.`id`, REFERRING.`id_pointing_to_some_related_table ` FROM `some_table` as REFERRING LEFT JOIN `some_related_table` as REFERRED ON (REFERRING.`id_pointing_to_some_related_table` = REFERRED.`id`) WHERE REFERRING.`id_pointing_to_some_related_table ` IS NOT NULL AND REFERRED.`id` IS NULL And keep in mind this is being called for *each* relation in the table. So that'll run through all the rows once for every relation. Of course, it won't return anything unless you have bad data, so the i/o aspect should be minimal. If we notice an "unacceptable" performance hit the likely solution would be to scope that select statement to only rows that were added during the load data routine. Ideally I'm hoping to steer clear of that, because it entails more work and complexity in loaddata, since you'd have to track all the IDs that were added and then pass them along here. Wouldn't be a big deal but that's why I only wanted to do it if necessary. Thanks for your input and help testing this btw! Jim -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.