The amd64 pmap keeps track of which CPU(s) a given pmap is in use on, so that when the pmap's page tables are modified it can send IPIs to the necessary processors to have them invalidate those pages in their caches. Unfortunately, there are two bugs in this: 1) the bitmaps used to do that aren't actually initialized to zero, so they get the 0xdeadbeef default fill, and 2) the cpu_fork() logic leaves the bit for the CPU that the parent was running on set in the child.
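To make the cost of a garbage bitmap concrete, here is a minimal user-space sketch, not the kernel code. The names toy_pmap, shootdown_targets and NCPUS are hypothetical and exist only for illustration; the only thing taken from the real pmap is the idea of one bit per CPU in pm_cpus.

```c
#include <stdint.h>

#define NCPUS 4	/* assumption: a 4-CPU machine */

/* Hypothetical stand-in for the one field of struct pmap that matters here. */
struct toy_pmap {
	uint32_t pm_cpus;	/* bit N set => pmap believed in use on CPU N */
};

/*
 * Count the CPUs other than 'self' that a shootdown would have to
 * interrupt, looking only at CPUs that exist (bits 0..NCPUS-1).
 */
int
shootdown_targets(const struct toy_pmap *pm, int self)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < NCPUS; cpu++) {
		if (cpu == self)
			continue;
		if (pm->pm_cpus & (UINT32_C(1) << cpu))
			n++;
	}
	return n;
}
```

In this model a pmap whose pm_cpus still holds the 0xdeadbeef fill has all four low bits set, so shootdown_targets(&pm, 0) reports 3 remote CPUs, while a pmap with only bit 0 set reports 0 and needs no IPI at all.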
Both bugs result in many superfluous IPIs. Fixing them reduces the number of shootpage and shootrange IPIs by a factor of 21 on my 4-CPU laptop, completely eliminating the IPI on over 7/8ths of the calls to those functions.

The diff below does the initialization, stops setting the bit in pmap_activate() unless it's the current proc, and adds a DIAGNOSTIC printf to pmap_destroy() to complain if a pmap's pm_cpus member isn't zero when it's being destroyed. It also eliminates the pm_flags member of the pmap, which is unused.

Philip Guenther

Index: include/pmap.h
===================================================================
RCS file: /cvs/src/sys/arch/amd64/include/pmap.h,v
retrieving revision 1.33
diff -u -p -r1.33 pmap.h
--- include/pmap.h	13 May 2010 19:27:24 -0000	1.33
+++ include/pmap.h	6 Sep 2010 02:09:42 -0000
@@ -318,8 +318,6 @@ struct pmap {
 					/* pointer to a PTP in our pmap */
 	struct pmap_statistics pm_stats;  /* pmap stats (lck by object lock) */
 
-	int pm_flags;			/* see below */
-
 	union descriptor *pm_ldt;	/* user-set LDT */
 	int pm_ldt_len;			/* number of LDT entries */
 	int pm_ldt_sel;			/* LDT selector */
Index: amd64/pmap.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/pmap.c,v
retrieving revision 1.55
diff -u -p -r1.55 pmap.c
--- amd64/pmap.c	13 May 2010 19:27:24 -0000	1.55
+++ amd64/pmap.c	6 Sep 2010 02:10:24 -0000
@@ -1020,7 +1020,7 @@ pmap_create(void)
 	}
 	pmap->pm_stats.wired_count = 0;
 	pmap->pm_stats.resident_count = 1;	/* count the PDP allocd below */
-	pmap->pm_flags = 0;
+	pmap->pm_cpus = 0;
 
 	/* init the LDT */
 	pmap->pm_ldt = NULL;
@@ -1075,6 +1075,12 @@ pmap_destroy(struct pmap *pmap)
 	 * reference count is zero, free pmap resources and then free pmap.
 	 */
 
+#ifdef DIAGNOSTIC
+	if (pmap->pm_cpus != 0)
+		printf("pmap_destroy: pmap %p cpus=0x%lx\n",
+		    (void *)pmap, pmap->pm_cpus);
+#endif
+
 	/*
 	 * remove it from global list of pmaps
 	 */
@@ -1129,15 +1135,16 @@ pmap_activate(struct proc *p)
 	pcb->pcb_pmap = pmap;
 	pcb->pcb_ldt_sel = pmap->pm_ldt_sel;
 	pcb->pcb_cr3 = pmap->pm_pdirpa;
-	if (p == curproc)
+	if (p == curproc) {
 		lcr3(pcb->pcb_cr3);
+
+		/*
+		 * mark the pmap in use by this processor.
+		 */
+		x86_atomic_setbits_ul(&pmap->pm_cpus, (1U << cpu_number()));
+	}
 	if (pcb == curpcb)
 		lldt(pcb->pcb_ldt_sel);
-
-	/*
-	 * mark the pmap in use by this processor.
-	 */
-	x86_atomic_setbits_ul(&pmap->pm_cpus, (1U << cpu_number()));
 }
 
 /*
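
For anyone who wants to see the pmap_activate() change in isolation, the following is a toy user-space model of the before and after behaviour, under stated assumptions rather than a copy of the kernel logic. The names model_pmap, model_activate_old and model_activate_new are hypothetical, the is_curproc argument stands in for the p == curproc test, and a plain OR replaces x86_atomic_setbits_ul().

```c
#include <stdint.h>

/* Hypothetical stand-in for the pm_cpus field of struct pmap. */
struct model_pmap {
	uint32_t pm_cpus;	/* one bit per CPU the pmap is loaded on */
};

/*
 * Old behaviour: the bit is set unconditionally, so activating a
 * pmap for a process that is not the current one still marks the
 * pmap as in use on the calling CPU.
 */
void
model_activate_old(struct model_pmap *pm, int cpu, int is_curproc)
{
	(void)is_curproc;	/* ignored: this is the bug being modelled */
	pm->pm_cpus |= UINT32_C(1) << cpu;
}

/*
 * New behaviour: the bit is set only when the process is the
 * current one, i.e. when the pmap is really loaded on this CPU.
 */
void
model_activate_new(struct model_pmap *pm, int cpu, int is_curproc)
{
	if (is_curproc)
		pm->pm_cpus |= UINT32_C(1) << cpu;
}
```

Calling both functions on a zeroed model_pmap with cpu 2 and is_curproc 0 leaves bit 2 set in the old model and nothing set in the new one, which corresponds to the stale parent-CPU bit in the child described above.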