Hello,

2010/8/30 Diego Nieto Cid <dnie...@gmail.com>
>
> Short story: something is clearing kernel_page_dir.
>
Here's some information about this issue. After modifying qemu to generate an instruction trace and stop when a given physical memory location is modified, I found that interrupts are being nested too deeply. As a result, the stack used by their handlers keeps growing until it overflows and overwrites other variables. When this happens, the variable just before _intstack is destroyed first (the stack grows toward lower addresses, i.e. from the bottom of the listing below toward the top), and the overflow eventually reaches kernel_page_dir, which is the variable I was watching.

gnumach symbols:

  00188828 B net_kmsg_size
  0018882c B active_stacks
  00188830 B kernel_page_dir
  00188840 B ipc_notify_msg_accepted_template
  00188860 B virtual_space_end
  00188880 B proc_net_inode_operations
  001888d0 B _intstack
  001898d0 B sched_thread_id

Instruction (back)trace:

  0xc010665a - 0xc010664d * gnumach-uld-HEAD/i386/i386/spl.S:214-212
  0xc0106646 * gnumach-uld-HEAD/i386/i386/spl.S:210
  0xc0134bba - 0xc0134bb1 * gnumach-uld-HEAD/i386/i386at/interrupt.S:39-37
  0xc0134baf - 0xc0134ba4 * gnumach-uld-HEAD/i386/i386at/interrupt.S:35-30
  0xc01054cc * gnumach-uld-HEAD/i386/i386/locore.S:707
  0xc0105486 - 0xc010547f * gnumach-uld-HEAD/i386/i386/locore.S:648-647
  0xc010547e - 0xc010547c * gnumach-uld-HEAD/i386/i386/locore.S:645-643
  0xc0105446 - 0xc0105440 * gnumach-uld-HEAD/i386/i386/locore.S:630

  0xc0106668 - 0xc010664d * gnumach-uld-HEAD/i386/i386/spl.S:216-212
  0xc0106646 * gnumach-uld-HEAD/i386/i386/spl.S:210
  0xc0134bba - 0xc0134bb1 * gnumach-uld-HEAD/i386/i386at/interrupt.S:39-37
  0xc0134bad - 0xc0134ba4 * gnumach-uld-HEAD/i386/i386at/interrupt.S:34-30
  0xc01054cc * gnumach-uld-HEAD/i386/i386/locore.S:707
  0xc0105486 - 0xc010547f * gnumach-uld-HEAD/i386/i386/locore.S:648-647
  0xc010547e - 0xc010547c * gnumach-uld-HEAD/i386/i386/locore.S:645-643
  0xc0105416 - 0xc0105410 * gnumach-uld-HEAD/i386/i386/locore.S:626

(* means I assumed these addresses belonged to the kernel and subtracted 0xc0000000 before passing them to addr2line.)

It keeps alternating between these two blocks.
The first is the handler for interrupt 11 (assigned to the Ethernet card) and the second is interrupt 7.

Next, dde_pcnet32 appears:

  0x804b4cd ??:0 ( dde_pcnet32::outw_local )
  ...
  0x804b527 ??:0 ( dde_pcnet32::outw )
  ...
  0x804b4cd ??:0 ( dde_pcnet32::outw_local )
  ...
  0x804b507 ??:0 ( dde_pcnet32::outw )
  ...
  0x804bf83 ??:0 ( dde_pcnet32::pcnet32_wio_write_csr )
  ...
  0x804fb68 ??:0 ( dde_pcnet32::pcnet32_open )

The pcnet32_wio_write_csr call is located at the following line:

  dde_pcnet32/pcnet32.c:2298
  lp->a.write_csr(ioaddr, CSR0, CSR0_NORMAL);

Writing CSR0_NORMAL to the pcnet's CSR0 register sets the 'interrupt enable' bit, which allows the hardware to raise interrupts.

So why would so many interrupts be triggered? Their alternating pattern is also interesting. What is interrupt 7, and how is it related to 11?

The full trace, memory dumps and the qemu patch can be downloaded from (12M compressed, expanding to around 495M):

  http://web.fi.uba.ar/~mnieto/_borrar/archhurd/nested_interrupts.tar.xz