Hello Gavin,

Yes, I tested the fix suggested by Dimitri, and it didn't work: the system hung during boot.
After that test, I tested in other directions.

- First, I searched for the build that introduced the regression. Builds 55 and 87 worked fine, and every build after snv_87 fails to start. I tested a precompiled kernel (bfu archive) in debug mode, and it failed with this assertion message:

    panic[cpu0]/thread=fffffffffbc3ae20: assertion failed: PFN_2_MEM_NODE(pp->p_pagenum) == mnode, file: ../../common/vm/vm_pagelist.c, line: 2922

  Looking at vm_pagelist.c near line 2922 made me think of some kind of kernel memory corruption.

- I then tried to isolate the changeset in b88 that broke the boot. I took the full HTML changelog of build 88 and picked out a short list of 6 changesets that could break the boot. I built 6 kernels based on snv_87, each with one candidate changeset added, and tested each one separately. The 3rd changeset (the one I listed earlier in this thread) was the one that broke the boot.

- I then ran more tests to confirm that finding:
  * When I bfu a precompiled snv_87 archive, the system boots and works without problems.
  * When I bfu a precompiled snv_88 archive, the system fails to boot.
  * When I compile snv_88 and bfu the result, the server fails to boot.
  * When I compile snv_87 and bfu the result, the server works fine.
  * When I compile snv_87 plus the above changeset, the bfu'ed server fails to boot.
  * When I compile snv_88 minus the above changeset, the bfu'ed server works fine.

- Then I started the kernel in debug mode and traced the execution. I noticed that whenever the function lgrp_plat_process_srat was called, the kernel crashed (much later in the boot process). I then modified lgrpplat.c to bypass this call. This modification was tested on builds 91, 99 and 103, and worked fine. I can send you a full debug log if you want. I know this is not a fix, but at least the system boots.
I still have to look further into the lgrp.c code to see the impact of not running lgrp_plat_process_srat, and why running it breaks the VM subsystem later in the boot process.

Thanks,
Guy

--
This message posted from opensolaris.org
_______________________________________________
opensolaris-code mailing list
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code
