** Description changed:

+ SRU Justification:
+
+ Impact: For i386 PGDs are stored in a linked list. For this two elements of
+ struct page are (mis-)used. To have a backwards pointer, the private field is
+ assigned a pointer to the index field of the previous struct page. The main
+ problem there was that list_add and list_del operations accidentally were done
+ twice. Which leads to accesses to (after first list operation) innocent struct
+ pages.
+
+ Fix: This is a bit more than needed to fix the bug itself, but it will bring our
+ code more into a shape that resembles upstream (factually there is only a 2.6.18
+ upstream but that code did not do the double list access).
+
+ Testcase: Running a 32bit domU (64bit Hardy dom0, though that should not matter)
+ with the xen kernel and doing a lot of process starts (like the aslr qa
+ regression test does) would quite soon crash because the destructor of a PTE
+ (which incidentally is stored in index) was suddenly overwritten.
+
+ ---
+
  For months we have been working around a bug in ami-6836dc01, but this
  seems not to be reported any place. Is this a known issue? When we use
  ruby/puppet (from the Canonical repo) on an instance with this AMI (e.g.
  a c1.medium) or in some cases when using java applications the instance
  gets locked up.

  Our work-around is using kernel 2.6.27-22-xen instead - the person who
  created the fixed AMI used this method:

  - launch instance of ami-7e28ca17 (instance #1)
  - modprobe loop on instance #1
  - copy up creds, jdk and ec2-ami-tools to /dev/shm on instance #1
  - launch instance of ami-69d73000 (canonical-beta-us/ubuntu-intrepid-beta2-20090226-i386.manifest.xml) to grab kernel modules from (instance #2)
  - tar.gz /lib/modules/2.6.27-22-xen on instance #2
- - scp to instance #1 and untar in /lib/modules
+ - scp to instance #1 and untar in /lib/modules
  - rm -rf the old /lib/modules/2.6.24-10-xen dir on instance #1
  - edit quick-bundle script on instance #1 to hard-code AKI to aki-20c12649, ARI to ari-21c12648 (the AKI and ARI from instance #2).
- - hard-coded manifest name, bucket to whatever.
+ - hard-coded manifest name, bucket to whatever.
  - run pre-clean script on instance #1
  - run quick-bundle script on instance #1

  The console output from a locked instance is attached
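
The Impact paragraph describes a list threaded through two struct page
fields: index as the forward link and private as a pointer back to the
previous element's index field. The following is only a user-space model
of that layout, not the Xen kernel code. All names (fake_page,
model_list_add, model_list_del, run_scenario, FAKE_DTOR) are hypothetical,
and the exact sequence of calls in the real bug is an assumption. It
sketches one way that doubled add/del calls could leave a stale
back-pointer that later overwrites the index field of a page that has
already been reused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two struct page fields named in the report. */
struct fake_page {
    uintptr_t index;    /* forward link while the page is on the list */
    uintptr_t private_; /* address of the previous element's index field,
                           or of the list head for the first element */
};

/* List head; holds a struct fake_page pointer stored as an integer, or 0. */
static uintptr_t pgd_list_head;

/* Insert page at the head of the list. */
static void model_list_add(struct fake_page *page)
{
    struct fake_page *old_head = (struct fake_page *)pgd_list_head;

    page->index = pgd_list_head;
    if (old_head)
        old_head->private_ = (uintptr_t)&page->index;
    pgd_list_head = (uintptr_t)page;
    page->private_ = (uintptr_t)&pgd_list_head;
}

/* Unlink page by writing through its back-pointer. */
static void model_list_del(struct fake_page *page)
{
    struct fake_page *next = (struct fake_page *)page->index;
    uintptr_t *pprev = (uintptr_t *)page->private_;

    assert(pprev != NULL);
    *pprev = (uintptr_t)next;
    if (next)
        next->private_ = (uintptr_t)pprev;
}

/* Arbitrary marker standing in for a PTE destructor stored in index. */
#define FAKE_DTOR ((uintptr_t)0xdeadbeef)

/*
 * Assumed scenario: add a, add b `repeat` times, delete b `repeat` times,
 * reuse b for something else (index now holds FAKE_DTOR), then delete a.
 * Returns b.index as seen after a has been removed.
 */
uintptr_t run_scenario(int repeat)
{
    struct fake_page a = {0, 0}, b = {0, 0};
    int i;

    pgd_list_head = 0;
    model_list_add(&a);
    for (i = 0; i < repeat; i++)
        model_list_add(&b);
    for (i = 0; i < repeat; i++)
        model_list_del(&b);

    /* b is considered freed here and reused as an unrelated page. */
    b.index = FAKE_DTOR;
    b.private_ = 0;

    /* With repeat == 2, a.private_ still points at b.index. */
    model_list_del(&a);
    return b.index;
}
```

In this model, run_scenario(1) leaves the marker in b.index untouched,
while run_scenario(2) returns 0 because deleting a writes through a stale
pointer into the reused page. That matches the kind of overwrite the
Testcase paragraph reports, under the assumptions stated above.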
** Description changed:

  SRU Justification:

- Impact: For i386 PGDs are stored in a linked list. For this two elements of
- struct page are (mis-)used. To have a backwards pointer, the private field is
- assigned a pointer to the index field of the previous struct page. The main
- problem there was that list_add and list_del operations accidentally were done
- twice. Which leads to accesses to (after first list operation) innocent struct
- pages.
+ Impact: For i386 PGDs are stored in a linked list. For this two elements
+ of struct page are (mis-)used. To have a backwards pointer, the private
+ field is assigned a pointer to the index field of the previous struct
+ page. The main problem there was that list_add and list_del operations
+ accidentally were done twice. Which leads to accesses to (after first
+ list operation) innocent struct pages.

- Fix: This is a bit more than needed to fix the bug itself, but it will bring our
- code more into a shape that resembles upstream (factually there is only a 2.6.18
- upstream but that code did not do the double list access).
+ Fix: This is a bit more than needed to fix the bug itself, but it will
+ bring our code more into a shape that resembles upstream (factually
+ there is only a 2.6.18 upstream but that code did not do the double list
+ access).

- Testcase: Running a 32bit domU (64bit Hardy dom0, though that should not matter)
- with the xen kernel and doing a lot of process starts (like the aslr qa
- regression test does) would quite soon crash because the destructor of a PTE
- (which incidentally is stored in index) was suddenly overwritten.
+ Testcase: Running a 32bit domU (64bit Hardy dom0, though that should not
+ matter) with the xen kernel and doing a lot of process starts (like the
+ aslr qa regression test does) would quite soon crash because the
+ destructor of a PTE (which incidentally is stored in index) was suddenly
+ overwritten.

  ---

  For months we have been working around a bug in ami-6836dc01, but this
  seems not to be reported any place. Is this a known issue? When we use
  ruby/puppet (from the Canonical repo) on an instance with this AMI (e.g.
  a c1.medium) or in some cases when using java applications the instance
  gets locked up.

  Our work-around is using kernel 2.6.27-22-xen instead - the person who
  created the fixed AMI used this method:

  - launch instance of ami-7e28ca17 (instance #1)
  - modprobe loop on instance #1
  - copy up creds, jdk and ec2-ami-tools to /dev/shm on instance #1
  - launch instance of ami-69d73000 (canonical-beta-us/ubuntu-intrepid-beta2-20090226-i386.manifest.xml) to grab kernel modules from (instance #2)
  - tar.gz /lib/modules/2.6.27-22-xen on instance #2
  - scp to instance #1 and untar in /lib/modules
  - rm -rf the old /lib/modules/2.6.24-10-xen dir on instance #1
  - edit quick-bundle script on instance #1 to hard-code AKI to aki-20c12649, ARI to ari-21c12648 (the AKI and ARI from instance #2).
  - hard-coded manifest name, bucket to whatever.
  - run pre-clean script on instance #1
  - run quick-bundle script on instance #1

  The console output from a locked instance is attached

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/705562

Title:
  ami-6836dc01 8.04 32 bit AMI kernel lock bug