Hi again! I have some more information about this bug, which really seems to be a unionfs bug rather than an nfs-kernel-server bug. Is it possible to reassign it to unionfs (Sarge, version 1.0.11-1)?
Sorry for the long mail. The crash is:

-----------
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000014
 printing eip:
e12e83e5
*pde = 00000000
Oops: 0000 [#8]
PREEMPT
Modules linked in: unionfs i830 ipv6 nfsd exportfs lockd sunrpc ide_cd evdev pcspkr floppy parport_pc parport snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc gameport snd_mpu401_uart snd_rawmidi snd_seq_device snd pci_hotplug ehci_hcd uhci_hcd intel_agp agpgart dm_mod capability commoncap rfcomm l2cap hci_vhci hci_usb hci_uart bfusb firmware_class bluetooth i810_audio ac97_codec soundcore e1000 sr_mod cdrom sg ide_scsi 3c59x usbkbd usbcore genrtc ext3 jbd mbcache ide_generic piix ide_disk ide_core sd_mod ata_piix libata scsi_mod unix font vesafb cfbcopyarea cfbimgblt cfbfillrect
CPU:    0
EIP:    0060:[<e12e83e5>]    Not tainted
EFLAGS: 00010246   (2.6.8-2-686)
EIP is at unionfs_open+0x225/0x8de0 [unionfs]
eax: ce1c7f14   ebx: de28fbec   ecx: ce1c7f14   edx: ce1c7f14
esi: 00000000   edi: c558ae60   ebp: d2fbeb00   esp: de28fb2c
ds: 007b   es: 007b   ss: 0068
Process nfsd (pid: 2717, threadinfo=de28e000 task=de4961b0)
Stack: 0000000c 000000d0 00000323 00000004 e13ae08b e13b4a00 e13a86a7 e13aa100
       00000323 de28fbc8 d82f1d00 de28fc88 00000038 00000038 00000000 00000000
       d8120680 de28fc88 ce1c7f14 00000000 00000000 00000000 00000038 d8120680
Call Trace:
 [<c01556b7>] open_private_file+0xb7/0xd0
 [<e0af1845>] get_name+0x95/0x130 [exportfs]
 [<c016ce07>] d_find_alias+0x27/0x50
 [<e0af157c>] find_exported_dentry+0x57c/0x730 [exportfs]
 [<c016fcc3>] iput+0x63/0x90
 [<e0bc72e2>] xprt_destroy+0x42/0x60 [sunrpc]
 [<e0bc34c9>] rpc_destroy_client+0x69/0xe0 [sunrpc]
 [<e0bc8aaa>] rpc_release_task+0x12a/0x1b0 [sunrpc]
 [<e0bc83f0>] __rpc_execute+0x370/0x410 [sunrpc]
 [<c0139ec1>] __rmqueue+0xd1/0x110
 [<c013a345>] buffered_rmqueue+0xf5/0x1d0
 [<c013a730>] __alloc_pages+0x310/0x370
 [<c013a7c3>] __get_free_pages+0x33/0x40
 [<c013e857>] alloc_slabmgmt+0x57/0x70
 [<c0156e11>] invalidate_inode_buffers+0x11/0x80
 [<c021694b>] sock_destroy_inode+0x1b/0x20
 [<c016e8e3>] destroy_inode+0x43/0x70
 [<c016fcc3>] iput+0x63/0x90
 [<e0bcc85c>] svc_tcp_accept+0x2ec/0x420 [sunrpc]
 [<e0c23dd7>] exp_find_key+0x87/0xa0 [nfsd]
 [<e0af1ada>] export_decode_fh+0x5a/0x7a [exportfs]
 [<e0c1e320>] nfsd_acceptable+0x0/0x120 [nfsd]
 [<e0c1e64b>] fh_verify+0x20b/0x5a0 [nfsd]
 [<e0c1e320>] nfsd_acceptable+0x0/0x120 [nfsd]
 [<e0c2760d>] nfsd3_proc_getattr+0x7d/0xc0 [nfsd]
 [<e0c1c747>] nfsd_dispatch+0xd7/0x1e0 [nfsd]
 [<e0c1c670>] nfsd_dispatch+0x0/0x1e0 [nfsd]
 [<e0bcb451>] svc_process+0x4b1/0x620 [sunrpc]
 [<e0c1c4b6>] nfsd+0x206/0x3c0 [nfsd]
 [<e0c1c2b0>] nfsd+0x0/0x3c0 [nfsd]
 [<c01042ad>] kernel_thread_helper+0x5/0x18
Code: 8b 76 14 89 74 24 34 89 77 04 8b 5c 24 34 8b 84 24 8c 00 00
-----------

i.e., unionfs_open seems to be the culprit. unionfs_open+0x225 disassembled with -S gives:

-----------
        if (ret) {
   37363:  85 f6                 test   %esi,%esi
   37365:  74 7e                 je     373e5 <unionfs_open+0x225>
...   <------ I cut some text here
                        ASSERT2(ret->udi_bend <= ret->udi_bcount);
   373c4:  39 d3                 cmp    %edx,%ebx
   373c6:  0f 8f 31 8a 00 00     jg     3fdfd <unionfs_open+0x8c3d>
                        ASSERT2(ret->udi_bend <= sbmax(dent->d_sb));
   373cc:  8b 4c 24 48           mov    0x48(%esp),%ecx
   373d0:  8b 41 48              mov    0x48(%ecx),%eax
   373d3:  8b 80 4c 01 00 00     mov    0x14c(%eax),%eax
   373d9:  8b 40 18              mov    0x18(%eax),%eax
   373dc:  40                    inc    %eax
   373dd:  39 c3                 cmp    %eax,%ebx
   373df:  0f 8f c1 89 00 00     jg     3fda6 <unionfs_open+0x8be6>
   373e5:  8b 76 14              mov    0x14(%esi),%esi   <------ Crash is here
   373e8:  89 74 24 34           mov    %esi,0x34(%esp)
-----------

The module crashes since %esi is 0. I looked in the unionfs-source (file.c) to see if I could spot the place in the source, and it seems that the dbstart()-call

-----------
bstart = fbstart(file) = dbstart(dentry);
-----------

on line 823 is the problem.
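For what it's worth, the fault address fits this reading: the oops reports virtual address 00000014, and the crashing instruction is mov 0x14(%esi),%esi with %esi = 0, so the access is to a field 0x14 bytes into a structure whose base pointer is NULL. Below is a minimal userspace sketch of that arithmetic. The struct is a hypothetical stand-in (it is not the real struct unionfs_dentry_info layout; the padding is chosen only so that a field lands at 0x14), and field_address() and check_oops_address() are illustrative helpers, not anything from unionfs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct unionfs_dentry_info. The real layout is
 * NOT reproduced here; five 32-bit words of padding simply place the field
 * at byte offset 0x14, matching the displacement in the crashing instruction.
 */
struct fake_dentry_info {
	uint32_t pad[5];     /* bytes 0x00..0x13 */
	int32_t  udi_bstart; /* byte offset 0x14 */
};

/*
 * Address the CPU would touch when reading base->udi_bstart.
 * Pure arithmetic: nothing is dereferenced, so passing 0 is safe here.
 */
static uintptr_t field_address(uintptr_t base)
{
	return base + offsetof(struct fake_dentry_info, udi_bstart);
}

/* With a NULL base the access lands on the address reported in the oops. */
static void check_oops_address(void)
{
	assert(field_address(0) == 0x14);
}
```

With a NULL base, field_address(0) evaluates to 0x14, which is the address the kernel complained about.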
Looking at dbstart in unionfs.h (line 522), it's defined as

-----------
#define dbstart(dentry) __dbstart(dentry, __FILE__, __FUNCTION__, __LINE__)
static inline int __dbstart(const struct dentry *dentry, const char *file,
                            const char *function, int line)
{
        return dtopd(dentry)->udi_bstart;
}
-----------

and since dtopd is defined as

-----------
#define dtopd(dent) __dtopd(dent, 1, __FILE__, __FUNCTION__, __LINE__)
static inline struct unionfs_dentry_info *__dtopd(const struct dentry *dent,
                                                  int check, const char *file,
                                                  const char *function,
                                                  int line)
{
        struct unionfs_dentry_info *ret;

        PASSERT2(dent);
        ret = (struct unionfs_dentry_info *)(dent)->d_fsdata;
        /* We are really only interested in catching poison here. */
        if (ret) {
                PASSERT2(ret);
                if (check) {
                        if ((ret->udi_bend > ret->udi_bcount)
                            || (ret->udi_bend > sbmax(dent->d_sb))) {
                                printk("udi_bend = %d, udi_count = %d, sbmax = %d\n",
                                       ret->udi_bend, ret->udi_bcount,
                                       sbmax(dent->d_sb));
                        }
                        ASSERT2(ret->udi_bend <= ret->udi_bcount);
                        ASSERT2(ret->udi_bend <= sbmax(dent->d_sb));
                }
        }
        return ret;
}
-----------

I'm pretty sure that the branch at 37365 in the disassembly is taken: the "if (ret)" test fails, __dtopd returns NULL, and dtopd(dentry)->udi_bstart is therefore a NULL-pointer dereference. So (dent)->d_fsdata is apparently NULL.

This happens after a while with an NFS-exported unionfs filesystem, on both kernel 2.4.27-2-686 and 2.6.8-2-686 (the oops above is from the latter).

Hope this helps.

// Simon