Bugzilla – Bug 12313
SXCE b125 installation hangs on Supermicro Motherboard: X8DTi-F.
Last modified: 2009-11-05 02:22:54 UTC
I'm trying to install OpenSolaris build 125 on a Supermicro server with a Xeon E5504, but the installation hangs: the CD drive stops shortly after the GRUB menu is shown. The last line I can see is:

    installing busra, module id 38.

Because the CPU is Nehalem, I first tried the workaround described in Bug ID 6874223:

    Work Around
    Boot with '-kd', then do:
    [0]> ::bp pcplusmp`_init
    [0]> :c
    [0]> x2apic_enable/W 0
    [0]> :c

However, after setting the breakpoint at pcplusmp`_init and continuing, the system still hangs.

I then booted with -kdv and set the module debug mask (moddebug/W 80000000). With that, the hang appears to be inside the impl_bus_initialprobe function:

    impl_bus_initialprobe+0x47
    * After stepping out of this point, the installation gets stuck.

    static void
    impl_bus_initialprobe(void)
    {
        struct bus_probe *probe;

        /* load modules to install bus probes */
    #if defined(__xpv)
        if (DOMAIN_IS_INITDOMAIN(xen_info)) {
            if (modload("misc", "pci_autoconfig") < 0) {
                panic("failed to load misc/pci_autoconfig");
            }

            if (modload("drv", "isa") < 0) ★
                panic("failed to load drv/isa");
        }

        (void) modload("misc", "xpv_autoconfig");
    #else
        (void) modload("misc", "acpidev");

        if (modload("misc", "pci_autoconfig") < 0) {
            panic("failed to load misc/pci_autoconfig");
        }

        if (modload("drv", "isa") < 0)
            panic("failed to load drv/isa");
    #endif

        probe = bus_probes;
        while (probe) {
            /* run the probe functions */
            (*probe->probe)(0);    <--- Here, it seems to be stuck.
            probe = probe->next;
        }
    }

------- System Details -----------
Motherboard: X8DTi-F
Case: CSE-846A-R1200B
CPU: Xeon-E5504 2.00G/4MB
Memory: DDRIII 1333 2GB ECC/REG 128*8 "F" x6 (Total 12GB)
HDD: HDT721010SLA360 (1TB SATA300 7200) x24 (Total 24TB)
Raid Board: Adaptec ASR-52445 KIT
Please try booting with:

    -B acpidev-autoconfig=off -kdv
    [0]> prom_debug/W 1
    [0]> :c
    -B acpidev-autoconfig=off -kdv
    [0]> acpidev-autoconfig/W 1
    [0]> :c

After booting the unix kernel with acpidev-autoconfig=off, I tried to set the acpidev-autoconfig value from kmdb, but I could not:

    [0]> moddebug/W 80000000
    moddebug: 0 = 0x80000000
    [0]> ::bp acpidev`acpidev_boot_probe
    [0]> ::bp isa`_init
    [0]> acpi-autoconfig/W 1
    kmdb: failed to dereference symbol: unknown symbol name
    [0]>

By stepping over, I confirmed that the installation does not hang inside the acpidev`acpidev_boot_probe and acpidev_initialize functions, so I cannot say that the ACPI device code is at fault here. isa`_init can be stepped out of as well, so I guess the isa driver is not the issue either.

# Sequence
1) impl_bus_initialprobe
2) acpidev`_init -> OK
3) isa`_init -> OK
4) acpidev`acpi_dev_initialize -> OK
5) acpidev`acpidev_boot_probe -> OK

When stepping out of impl_bus_initialprobe, the installation hangs. So I guess the trigger is inside it, but I am not sure how to trace this further. Advice on how to proceed would be highly appreciated.

    /*
     * impl_bus_initialprobe
     *    Modload the prom simulator, then let it probe to verify existence
     *    and type of PCI support.
     */
    static void
    impl_bus_initialprobe(void)
    {
    [...]
        probe = bus_probes;
        while (probe) {
            /* run the probe functions */
            (*probe->probe)(0);
            probe = probe->next;
        }
    }
I also tried -B pci-reprog=off, but it did not resolve the hang.
We can try the breakpoints below to see which bus probe function it hangs in:

    [0]> ::bp acpidev`acpidev_boot_probe
    [0]> ::bp acpica`acpica_init
    [0]> ::bp pci_autoconfig`pci_enumerate
    [0]> ::bp isa`isa_enumerate
    [0]> e_ddi_instance_init:b

Then you can further narrow down where it hangs inside the probe function.
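To illustrate why one breakpoint per probe is enough to find the culprit, here is a minimal user-space sketch of the probe list walk quoted earlier in this bug. It is not kernel code: the `name` field, `demo_probe`, and `run_probes_traced` are hypothetical names added for tracing, and the struct is only a stand-in for the kernel's bus probe list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for the kernel's bus probe list. */
struct bus_probe {
    struct bus_probe *next;
    void (*probe)(int);
    const char *name;   /* assumption: added here only for tracing */
};

/* Counts how many times the demo probe has been invoked. */
static int probe_calls;

/* Hypothetical probe that returns immediately. */
static void
demo_probe(int reprogram)
{
    (void) reprogram;
    probe_calls++;
}

/*
 * Walk the list in the same order as the loop in impl_bus_initialprobe(),
 * logging each entry before it runs. If a probe never returns, the last
 * line written to the log names it. Returns the number of probes run.
 */
static int
run_probes_traced(struct bus_probe *head, FILE *log)
{
    int count = 0;

    for (struct bus_probe *p = head; p != NULL; p = p->next) {
        assert(p->probe != NULL);
        fprintf(log, "probe %d: %s\n", count, p->name);
        fflush(log);    /* flush before a possible hang */
        (*p->probe)(0);
        count++;
    }
    return count;
}
```

Setting a breakpoint on each probe entry point in kmdb plays the role of the log line here: the last breakpoint hit before the hang identifies the probe to step into.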
Created an attachment (id=2928) First breakpoint log
Created an attachment (id=2929) Second breakpoint log
Created an attachment (id=2930) Third breakpoint log
Thank you, I appreciate your advice. I was able to take the analysis further.

Tentative conclusion: in the enumerate_bus_devs function, execution does not seem to exit the loop below, and this causes the installation to hang.

    void
    enumerate_bus_devs(uchar_t bus, int config_op)
    {
    [...]
        while (par_bus != (uchar_t)-1) {
        [...]
        }
    }

The analysis went as follows.

First, I set the breakpoints you suggested. They showed that the hang is in pci_autoconfig`pci_enumerate. Please see first_breakpoints.log.

Next, I set the breakpoints below:

    [0]> ::bp pci_autoconfig`pci_enumerate
    [0]> :c
    [0]> ::bp pci_autoconfig`pci_setup_tree
    [0]> :c
    [0]> ::bp pci_autoconfig`enumerate_bus_devs
    [0]> pci_boot_debug/W 0x1
    [0]> :c
    [0]> :$<msgbuf
    [0]> ::bp pci_autoconfig`process_devfunc
    [0]> :c

The output is below (the complete log is the attached "second_breakpoints.log"):

    [...]
    NOTICE: probing dev 0x1f, func 0x1
    NOTICE: probing dev 0x1f, func 0x2
    kmdb: stop at pci_autoconfig`process_devfunc
    kmdb: target stopped at:
    pci_autoconfig`process_devfunc: pushq %rbp
    [0]>
    [0]> ::cont
    NOTICE: probing dev 0x1f, func 0x3
    kmdb: stop at pci_autoconfig`process_devfunc
    kmdb: target stopped at:
    pci_autoconfig`process_devfunc: pushq %rbp
    [0]>
    [0]> ::cont
    NOTICE: probing dev 0x1f, func 0x4
    NOTICE: probing dev 0x1f, func 0x5
    kmdb: stop at pci_autoconfig`process_devfunc
    kmdb: target stopped at:
    pci_autoconfig`process_devfunc: pushq %rbp
    [0]>
    [0]> ::cont
    NOTICE: probing dev 0x1f, func 0x6
    NOTICE: probing dev 0x1f, func 0x7
    [hangs]

For further analysis, I stepped over the instructions one by one (the complete log is "Third_breakpoints.log"). pci_autoconfig`memlist_merge is called repeatedly; I kept stepping over for about 20 minutes, but execution never left this loop.

    pci_autoconfig`enumerate_bus_devs+0x1ef:call +0x4964 <pci_autoconfig`memlist_merge>

So I guess this endless loop causes the installation hang. Is there any workaround for this issue, or is there any further information needed to fix it?
1. When it hangs, can 'F1+a' get into kmdb? (Please boot with -kdv first.) If yes, what is the call stack?

    [0]> $C

2. Can you post the "/usr/X11/bin/scanpci -v" output from a previous working OSOL build?

3. If it hangs inside the while loop:

    void
    enumerate_bus_devs(uchar_t bus, int config_op)
    {
    [...]
        while (par_bus != (uchar_t)-1) {
        [...]
        }
    }

Please help collect more info (I assume you are still running b125). Boot with "-kdv":

    [0]> ::bp pci_autoconfig`_init
    [0]> :c
    [0]> pci_boot_debug/W 0x1
    [0]> add_bus_slot_names_prop:b
    [0]> enumerate_bus_devs+0x143:b
    [0]> enumerate_bus_devs+0x20b:b

Run ":c" repeatedly and check the value of register r13 each time one of the two enumerate_bus_devs breakpoints is hit:

    [0]> :c
    [0]> <r13=X        ==> this is the value of par_bus.
Created an attachment (id=2946) debug trace r13 (par_bus)
Please see my inline comments.

1. When it hangs, can 'F1+a' get into kmdb? (pls boot with -kdv before) If yes, what's the calling stack?

    [0]> $C

[Answer] 'F1+a' cannot interrupt the hang; it does not work.

2. can you post "/usr/X11/bin/scanpci -v" result on previous working OSOL build?

[Answer] I heard that previous builds did not support Intel Nehalem, so this is the first time I have installed on this machine. I don't know which build started to support Nehalem, so I tried to install the latest build (b125). I am sorry that I cannot provide the above output. If you know a way to get it from the kmdb environment, please let me know.

3. If it hangs inside the while loop:

    void
    enumerate_bus_devs(uchar_t bus, int config_op)
    {
    [...]
        while (par_bus != (uchar_t)-1) {
        [...]
        }
    }

Pls help to collect more info(I assume you are still running b125). boot with "-kdv":

    [0]> ::bp pci_autoconfig`_init
    [0]> :c
    [0]> pci_boot_debug/W 0x1
    [0]> add_bus_slot_names_prop:b
    [0]> enumerate_bus_devs+0x143:b
    [0]> enumerate_bus_devs+0x20b:b

continuously run ":c" and check the value of register r13 when the two enumerate_bus_devs breakpoints are hit:

    [0]> :c
    [0]> <r13=X        ==> this is the value of par_bus.

[Answer] The value of par_bus is always zero. You can check the details in the attached log file "debug trace r13 (par_bus)".
Thank you for your update. The value of pci_bus_res[0].par_bus was overwritten in pci_autoconfig`add_ppb_props.

    [0]> ::bp pci_autoconfig`create_root_bus_dip
    [0]> :c
    [0]> *pci_bus_res ::print -at struct pci_bus_resource par_bus
    ffffff03db07fd08 uchar_t par_bus = 0xff
    [0]> ffffff03db07fd08:w
    [0]> :c
    kmdb: stop on write of [0xffffff03db07fd08, 0xffffff03db07fd09)
    kmdb: target stopped at:
    pci_autoconfig`add_ppb_props+0x10c: cmpl $0x0,-0x88(%rbp)
    [0]> $c
    pci_autoconfig`add_ppb_props+0x10c(ffffff03de0d1ca0, 0, 8, 0, 1, 0)
    pci_autoconfig`process_devfunc+0x754(0, 8, 0, 1, 8086, 0)
    pci_autoconfig`enumerate_bus_devs+0xfe(0, 0)
    pci_autoconfig`pci_setup_tree+0x6f()
    pci_autoconfig`pci_enumerate+0x1d(0)
    impl_bus_initialprobe+0x60()
    impl_setup_ddi+0x12b()
    create_devinfo_tree+0xb7()
    setup_ddi+0x13()
    startup_modules+0x273()
    startup+0x50()
    main+0x27()
    _locore_start+0x92()
    [0]> *pci_bus_res ::print -at struct pci_bus_resource par_bus
    ffffff03db07fd08 uchar_t par_bus = 0
    [0]> *pci_bus_res ::print -at struct pci_bus_resource
    ffffff03db07fcc0 struct pci_bus_resource {
        ffffff03db07fcc0 struct memlist *io_avail = 0
        ffffff03db07fcc8 struct memlist *io_used = 0xffffff03dce0fb38
        ffffff03db07fcd0 struct memlist *mem_avail = 0
        ffffff03db07fcd8 struct memlist *mem_used = 0xffffff03dce0faf8
        ffffff03db07fce0 struct memlist *pmem_avail = 0
        ffffff03db07fce8 struct memlist *pmem_used = 0
        ffffff03db07fcf0 struct memlist *bus_avail = 0
        ffffff03db07fcf8 dev_info_t *dip = 0xffffff03de0d1ca0
        ffffff03db07fd00 void *privdata = 0xffffff03de0537f8
        ffffff03db07fd08 uchar_t par_bus = 0
        ffffff03db07fd09 uchar_t sub_bus = 0x5
        ffffff03db07fd0a uchar_t root_addr = 0
        ffffff03db07fd0b uchar_t num_cbb = 0
        ffffff03db07fd0c boolean_t io_reprogram = 0 (B_FALSE)
        ffffff03db07fd10 boolean_t mem_reprogram = 0 (B_FALSE)
        ffffff03db07fd14 boolean_t subtractive = 0 (B_FALSE)
        ffffff03db07fd18 uint_t mem_size = 0
        ffffff03db07fd1c uint_t io_size = 0
    }
    [0]> :c
    kmdb: stop on write of [0xffffff03db07fd08, 0xffffff03db07fd09)
    kmdb: target stopped at:
    pci_autoconfig`add_ppb_props+0x10c: cmpl $0x0,-0x88(%rbp)
    [0]> *pci_bus_res ::print -at struct pci_bus_resource
    ffffff03db07fcc0 struct pci_bus_resource {
        ffffff03db07fcc0 struct memlist *io_avail = 0
        ffffff03db07fcc8 struct memlist *io_used = 0xffffff03dce0fb38
        ffffff03db07fcd0 struct memlist *mem_avail = 0
        ffffff03db07fcd8 struct memlist *mem_used = 0xffffff03dce0faf8
        ffffff03db07fce0 struct memlist *pmem_avail = 0
        ffffff03db07fce8 struct memlist *pmem_used = 0
        ffffff03db07fcf0 struct memlist *bus_avail = 0
        ffffff03db07fcf8 dev_info_t *dip = 0xffffff03de0d1a20
        ffffff03db07fd00 void *privdata = 0xffffff03de0537f8
        ffffff03db07fd08 uchar_t par_bus = 0
        ffffff03db07fd09 uchar_t sub_bus = 0
        ffffff03db07fd0a uchar_t root_addr = 0
        ffffff03db07fd0b uchar_t num_cbb = 0
        ffffff03db07fd0c boolean_t io_reprogram = 0 (B_FALSE)
        ffffff03db07fd10 boolean_t mem_reprogram = 0 (B_FALSE)
        ffffff03db07fd14 boolean_t subtractive = 0 (B_FALSE)
        ffffff03db07fd18 uint_t mem_size = 0
        ffffff03db07fd1c uint_t io_size = 0
    }
    [0]> :c
    HANGS

In pci_autoconfig`add_ppb_props, the suspicious line is the one marked with *** below:

    uchar_t secbus = pci_getb(bus, dev, func, PCI_BCNF_SECBUS);
    uchar_t subbus = pci_getb(bus, dev, func, PCI_BCNF_SUBBUS);
    [...]
    ASSERT(pci_bus_res[secbus].dip == NULL);
    pci_bus_res[secbus].dip = dip;
    *** pci_bus_res[secbus].par_bus = bus; ***

Next, as you advised, and since this looked like a hardware issue, I checked whether a later BIOS version was available for the motherboard (Supermicro X8DTi-F). I found a newer version and tried again with it. This time it was successful: pci_getb returned appropriate values and the installation proceeded.

I highly appreciate your support!

Best Regards,
That's great. And you are welcome!
As this didn't involve an OpenSolaris bug, I've adjusted the Resolution accordingly.
BTW, I forgot to mention: it's a BIOS issue, because the BIOS configured the bridge wrongly, with secondary bus == primary bus == 0.
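For anyone hitting the same symptom, the following is a minimal user-space sketch of why that configuration hangs the boot. It assumes, based on the loop condition and the add_ppb_props snippet quoted in this bug, that enumerate_bus_devs follows par_bus links upward until it reaches the 0xff sentinel; the elided loop body is not reproduced. `struct bus_res`, `record_bridge`, `hops_to_root`, and `bridge_bus_numbers_sane` are hypothetical names, not functions from pci_autoconfig.

```c
#include <assert.h>
#include <stdint.h>

#define NO_PARENT 0xff  /* mirrors the (uchar_t)-1 sentinel in the loop */

/* Hypothetical, trimmed-down stand-in for struct pci_bus_resource. */
struct bus_res {
    uint8_t par_bus;
};

/* Models the marked line in add_ppb_props(): res[secbus].par_bus = bus. */
static void
record_bridge(struct bus_res *res, uint8_t bus, uint8_t secbus)
{
    res[secbus].par_bus = bus;
}

/*
 * Follow par_bus links from `bus` up to the root. Returns the number of
 * hops, or -1 if the chain does not terminate within nbus hops, which
 * means the links form a cycle. Without this guard the walk spins forever.
 */
static int
hops_to_root(const struct bus_res *res, uint8_t bus, int nbus)
{
    assert(nbus > 0 && nbus <= 256);

    int hops = 0;
    uint8_t par = res[bus].par_bus;

    while (par != NO_PARENT) {
        if (++hops > nbus)
            return -1;
        par = res[par].par_bus;
    }
    return hops;
}

/*
 * Sanity check on a bridge's bus number registers, assuming conventional
 * numbering: the secondary bus lies above the primary bus, and the
 * subordinate bus is not below the secondary bus.
 */
static int
bridge_bus_numbers_sane(uint8_t bus, uint8_t secbus, uint8_t subbus)
{
    return secbus > bus && subbus >= secbus;
}
```

With a bridge on bus 0 reporting secondary bus 0, `record_bridge(res, 0, 0)` replaces the root's 0xff sentinel with 0, so bus 0 becomes its own parent and the walk never reaches the sentinel. That matches the observation that par_bus is always zero.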
Thank you, I now understand the problem much better.