Bug 12313 - SXCE b125 installation hangs on Supermicro Motherboard: X8DTi-F.
Status: RESOLVED INVALID
Product: opensolaris
Component: kernel
Version: 200906
Hardware: i86pc/i386 OpenSolaris
Importance: P2 normal
Target Milestone: ---
Assigned To: Watcher account for OpenSolaris kernel bugs
: 8314
Reported: 2009-10-29 08:46 UTC by Hiroshi Takeuchi
Modified: 2009-11-05 02:22 UTC
CC List: 5 users


Attachments
First breakpoint log (3.22 KB, text/plain)
2009-11-02 07:26 UTC, Hiroshi Takeuchi
Second breakpoint log (19.00 KB, text/plain)
2009-11-02 07:27 UTC, Hiroshi Takeuchi
Third breakpoint log (280.96 KB, text/plain)
2009-11-02 07:28 UTC, Hiroshi Takeuchi
debug trace r13 (par_bus) (38.70 KB, application/octet-stream)
2009-11-04 00:50 UTC, Hiroshi Takeuchi




Description Hiroshi Takeuchi 2009-10-29 08:46:46 UTC
I'm trying to install OpenSolaris build 125 on a Supermicro Xeon E5504 server,
but the installation hangs: the CD drive stops after the GRUB menu is shown.

The last line I can see is the following:
installing busra, module id 38.

Since the CPU is a Nehalem, I first tried the workaround described in Bug ID 6874223.

Workaround:
Boot with '-kd', then do:
 [0]> ::bp pcplusmp`_init
 [0]> :c
 [0]> x2apic_enable/W 0
 [0]> :c

However, after setting the breakpoint at pcplusmp`_init and continuing, the
installation still hangs.

I booted with the -kdv options and set the module debug mask (moddebug/W 80000000).
With this approach, execution appears to stop inside the impl_bus_initialprobe
function.

impl_bus_initialprobe+0x47: after stepping past this point, the installation
gets stuck.

static void
impl_bus_initialprobe(void)
{
    struct bus_probe *probe;

    /* load modules to install bus probes */
#if defined(__xpv)
    if (DOMAIN_IS_INITDOMAIN(xen_info)) {
        if (modload("misc", "pci_autoconfig") < 0) {
            panic("failed to load misc/pci_autoconfig");
        }

        if (modload("drv", "isa") < 0)
            panic("failed to load drv/isa");
    }

    (void) modload("misc", "xpv_autoconfig");
#else
    (void) modload("misc", "acpidev");

    if (modload("misc", "pci_autoconfig") < 0) {
        panic("failed to load misc/pci_autoconfig");
    }

    if (modload("drv", "isa") < 0)
        panic("failed to load drv/isa");
#endif

    probe = bus_probes;
    while (probe) {
        /* run the probe functions */
        (*probe->probe)(0);    /* <--- it seems to be stuck here */
        probe = probe->next;
    }
}

------- System Details -----------
Motherboard: X8DTi-F
Case: CSE-846A-R1200B
CPU: Xeon-E5504 2.00G/4MB
Memory: DDRIII 1333 2GB ECC/REG 128*8 "F" x6 (Total 12GB)
HDD: HDT721010SLA360 (1TB SATA300 7200) x24 (Total 24TB)
RAID Board: Adaptec ASR-52445 KIT
Comment 1 Kerry Shu 2009-10-29 17:26:10 UTC
Please try booting with:
  -B acpidev-autoconfig=off -kdv
  [0]> prom_debug/W 1
  [0]> :c
Comment 2 Hiroshi Takeuchi 2009-10-30 06:00:13 UTC
-B acpidev-autoconfig=off -kdv
  [0]> acpidev-autoconfig/W 1
  [0]> :c

After booting the unix kernel with acpidev-autoconfig=off,
I tried to set the acpidev-autoconfig value, but could not:

[0]> moddebug/W 80000000
moddebug:       0               =       0x80000000
[0]> ::bp acpidev`acpidev_boot_probe
[0]> ::bp isa`_init
[0]> acpi-autoconfig/W 1
kmdb: failed to dereference symbol: unknown symbol name
[0]>

By stepping over, I confirmed that the installation does not hang inside the
acpidev`acpidev_boot_probe and acpidev_initialize functions, so I doubt the
ACPI device code is responsible for this problem.
isa`_init can also be stepped out of, so I guess the isa driver is not the
issue either.

# Sequence in impl_bus_initialprobe
2) acpidev`_init -> OK
3) isa`_init -> OK
4) acpidev`acpi_dev_initialize -> OK
5) acpidev`acpidev_boot_probe -> OK

When stepping out of impl_bus_initialprobe, the installation hangs.
So I guess the trigger is inside it, but I don't know how to trace it
further.
Advice on how to proceed would be highly appreciated.



/*
 * impl_bus_initialprobe
 *      Modload the prom simulator, then let it probe to verify existence
 *      and type of PCI support.
 */
static void
impl_bus_initialprobe(void)
{
    [...]
        probe = bus_probes;
        while (probe) {
                /* run the probe functions */
                (*probe->probe)(0); 
                probe = probe->next;
        }
}
Comment 3 Hiroshi Takeuchi 2009-10-30 06:09:14 UTC
I tried -B pci-reprog=off, but it did not resolve the problem.
Comment 4 Kerry Shu 2009-10-30 15:44:04 UTC
We can try the breakpoints below to see in which bus probe function it hangs:
[0]> ::bp acpidev`acpidev_boot_probe
[0]> ::bp acpica`acpica_init
[0]> ::bp pci_autoconfig`pci_enumerate
[0]> ::bp isa`isa_enumerate
[0]> e_ddi_instance_init:b

Then you can further narrow down where it hangs inside the probe function.
Comment 5 Hiroshi Takeuchi 2009-11-02 07:26:51 UTC
Created an attachment (id=2928) [details]
First breakpoint log
Comment 6 Hiroshi Takeuchi 2009-11-02 07:27:47 UTC
Created an attachment (id=2929) [details]
Second breakpoint log
Comment 7 Hiroshi Takeuchi 2009-11-02 07:28:39 UTC
Created an attachment (id=2930) [details]
Third breakpoint log
Comment 8 Hiroshi Takeuchi 2009-11-02 07:30:13 UTC
Thank you, I appreciate your advice.

I could take the analysis further.

My tentative conclusion: in the enumerate_bus_devs function, execution does
not seem to exit the loop below, and this causes the installation hang.

void
enumerate_bus_devs(uchar_t bus, int config_op)
{
    [...]
    while (par_bus != (uchar_t)-1) {
        [...]
    }

}

The analysis I did is as follows.

First, I set the breakpoints as you advised. This revealed that
the hang is in pci_autoconfig`pci_enumerate.

Please see first_breakpoints.log.

Next, I set the following breakpoints:

 [0]> ::bp pci_autoconfig`pci_enumerate
 [0]> :c
 [0]> ::bp pci_autoconfig`pci_setup_tree
 [0]> :c
 [0]> ::bp pci_autoconfig`enumerate_bus_devs
 [0]> pci_boot_debug/W 0x1   
 [0]> :c
 [0]> :$<msgbuf
 [0]>::bp pci_autoconfig`process_devfunc
 [0]> :c

The output is below (the complete log is the attached "second_breakpoints.log"):
[...]
NOTICE: probing dev 0x1f, func 0x1
NOTICE: probing dev 0x1f, func 0x2
kmdb: stop at pci_autoconfig`process_devfunc
kmdb: target stopped at:
pci_autoconfig`process_devfunc: pushq  %rbp
[0]> [0]> ::cont
NOTICE: probing dev 0x1f, func 0x3
kmdb: stop at pci_autoconfig`process_devfunc
kmdb: target stopped at:
pci_autoconfig`process_devfunc: pushq  %rbp
[0]> [0]> ::cont
NOTICE: probing dev 0x1f, func 0x4
NOTICE: probing dev 0x1f, func 0x5
kmdb: stop at pci_autoconfig`process_devfunc
kmdb: target stopped at:
pci_autoconfig`process_devfunc: pushq  %rbp
[0]> [0]> ::cont

NOTICE: probing dev 0x1f, func 0x6
NOTICE: probing dev 0x1f, func 0x7
[hangs]     

For further analysis, I stepped over the code step by step.
The complete log is "Third_breakpoints.log".

pci_autoconfig`memlist_merge is called repeatedly; I kept stepping over for
about 20 minutes, but execution never exited this loop.

pci_autoconfig`enumerate_bus_devs+0x1ef:call   +0x4964 <pci_autoconfig`memlist_merge>

So I guess this infinite loop causes the installation hang.

Is there any workaround for this issue, or any information you need to fix
it?
Comment 9 Kerry Shu 2009-11-02 18:04:13 UTC
1. When it hangs, can 'F1+a' get into kmdb? (Please boot with -kdv first.)
If yes, what's the call stack?
[0]> $C

2. Can you post the "/usr/X11/bin/scanpci -v" output from a previous working
OSOL build?

3. If it hangs inside the while loop:
void
enumerate_bus_devs(uchar_t bus, int config_op)
{
    [...]
    while (par_bus != (uchar_t)-1) {
        [...]
    }

}
Please help to collect more info (I assume you are still running b125).
boot with "-kdv":
[0]> ::bp pci_autoconfig`_init
[0]> :c
[0]> pci_boot_debug/W 0x1
[0]> add_bus_slot_names_prop:b
[0]> enumerate_bus_devs+0x143:b
[0]> enumerate_bus_devs+0x20b:b

Repeatedly run ":c" and check the value of register r13 each time one of the
two enumerate_bus_devs breakpoints is hit:
[0]> :c
[0]> <r13=X
   ==> this is the value of par_bus.
Comment 10 Hiroshi Takeuchi 2009-11-04 00:50:17 UTC
Created an attachment (id=2946) [details]
debug trace r13 (par_bus)
Comment 11 Hiroshi Takeuchi 2009-11-04 01:08:11 UTC
Please see my inline comments. 

1. When it hangs, can 'F1+a' get into kmdb? (pls boot with -kdv before)
If yes, what's the calling stack?
[0]> $C

[Answer]
'F1+a' does not interrupt the system; it does not work.

2. can you post "/usr/X11/bin/scanpci -v" result on previous working OSOL
build?

[Answer]
I heard that previous builds did not support Intel Nehalem, so this is the
first time I am installing on this machine. I don't know which build started
to support Nehalem, so I am trying to install the latest build (b125).
I am sorry that I cannot get the above output. If you know a way to get it in
the kmdb environment, please let me know.

3. If it hangs inside the while loop:
void
enumerate_bus_devs(uchar_t bus, int config_op)
{
    [...]
    while (par_bus != (uchar_t)-1) {
        [...]
    }

}
Pls help to collect more info(I assume you are still running b125).
boot with "-kdv":
[0]> ::bp pci_autoconfig`_init
[0]> :c
[0]> pci_boot_debug/W 0x1
[0]> add_bus_slot_names_prop:b
[0]> enumerate_bus_devs+0x143:b
[0]> enumerate_bus_devs+0x20b:b

continuously run ":c" and check the value of register r13 when the two
enumerate_bus_devs breakpoints are hit:
[0]> :c
[0]> <r13=X
   ==> this is the value of par_bus.

[Answer]
The value of par_bus is always zero.
You can check the details in the attached log file "debug trace r13 (par_bus)".
Comment 12 Hiroshi Takeuchi 2009-11-05 01:53:41 UTC
Thank you for your update.

The value of pci_bus_res[0].par_bus was overwritten in
pci_autoconfig`add_ppb_props:

[0]> ::bp pci_autoconfig`create_root_bus_dip
[0]> :c
[0]> *pci_bus_res ::print -at struct pci_bus_resource par_bus
ffffff03db07fd08 uchar_t par_bus = 0xff
[0]> ffffff03db07fd08:w
[0]> :c
kmdb: stop on write of [0xffffff03db07fd08, 0xffffff03db07fd09)
kmdb: target stopped at:
pci_autoconfig`add_ppb_props+0x10c:     cmpl   $0x0,-0x88(%rbp)
[0]> $c
pci_autoconfig`add_ppb_props+0x10c(ffffff03de0d1ca0, 0, 8, 0, 1, 0)
pci_autoconfig`process_devfunc+0x754(0, 8, 0, 1, 8086, 0)
pci_autoconfig`enumerate_bus_devs+0xfe(0, 0)
pci_autoconfig`pci_setup_tree+0x6f()
pci_autoconfig`pci_enumerate+0x1d(0)
impl_bus_initialprobe+0x60()
impl_setup_ddi+0x12b()
create_devinfo_tree+0xb7()
setup_ddi+0x13()
startup_modules+0x273()
startup+0x50()
main+0x27()
_locore_start+0x92()
[0]> *pci_bus_res ::print -at struct pci_bus_resource par_bus
ffffff03db07fd08 uchar_t par_bus = 0
[0]> *pci_bus_res ::print -at struct pci_bus_resource
ffffff03db07fcc0 struct pci_bus_resource {
    ffffff03db07fcc0 struct memlist *io_avail = 0
    ffffff03db07fcc8 struct memlist *io_used = 0xffffff03dce0fb38
    ffffff03db07fcd0 struct memlist *mem_avail = 0
    ffffff03db07fcd8 struct memlist *mem_used = 0xffffff03dce0faf8
    ffffff03db07fce0 struct memlist *pmem_avail = 0
    ffffff03db07fce8 struct memlist *pmem_used = 0
    ffffff03db07fcf0 struct memlist *bus_avail = 0
    ffffff03db07fcf8 dev_info_t *dip = 0xffffff03de0d1ca0
    ffffff03db07fd00 void *privdata = 0xffffff03de0537f8
    ffffff03db07fd08 uchar_t par_bus = 0
    ffffff03db07fd09 uchar_t sub_bus = 0x5
    ffffff03db07fd0a uchar_t root_addr = 0
    ffffff03db07fd0b uchar_t num_cbb = 0
    ffffff03db07fd0c boolean_t io_reprogram = 0 (B_FALSE)
    ffffff03db07fd10 boolean_t mem_reprogram = 0 (B_FALSE)
    ffffff03db07fd14 boolean_t subtractive = 0 (B_FALSE)
    ffffff03db07fd18 uint_t mem_size = 0
    ffffff03db07fd1c uint_t io_size = 0
}
[0]> :c
kmdb: stop on write of [0xffffff03db07fd08, 0xffffff03db07fd09)
kmdb: target stopped at:
pci_autoconfig`add_ppb_props+0x10c:     cmpl   $0x0,-0x88(%rbp)
[0]> *pci_bus_res ::print -at struct pci_bus_resource
ffffff03db07fcc0 struct pci_bus_resource {
    ffffff03db07fcc0 struct memlist *io_avail = 0
    ffffff03db07fcc8 struct memlist *io_used = 0xffffff03dce0fb38
    ffffff03db07fcd0 struct memlist *mem_avail = 0
    ffffff03db07fcd8 struct memlist *mem_used = 0xffffff03dce0faf8
    ffffff03db07fce0 struct memlist *pmem_avail = 0
    ffffff03db07fce8 struct memlist *pmem_used = 0
    ffffff03db07fcf0 struct memlist *bus_avail = 0
    ffffff03db07fcf8 dev_info_t *dip = 0xffffff03de0d1a20
    ffffff03db07fd00 void *privdata = 0xffffff03de0537f8
    ffffff03db07fd08 uchar_t par_bus = 0
    ffffff03db07fd09 uchar_t sub_bus = 0
    ffffff03db07fd0a uchar_t root_addr = 0
    ffffff03db07fd0b uchar_t num_cbb = 0
    ffffff03db07fd0c boolean_t io_reprogram = 0 (B_FALSE)
    ffffff03db07fd10 boolean_t mem_reprogram = 0 (B_FALSE)
    ffffff03db07fd14 boolean_t subtractive = 0 (B_FALSE)
    ffffff03db07fd18 uint_t mem_size = 0
    ffffff03db07fd1c uint_t io_size = 0
}
[0]> :c
HANGS 

In pci_autoconfig`add_ppb_props, the suspicious line is the par_bus assignment at the end of this excerpt:

uchar_t secbus = pci_getb(bus, dev, func, PCI_BCNF_SECBUS);
uchar_t subbus = pci_getb(bus, dev, func, PCI_BCNF_SUBBUS);
    [...]
ASSERT(pci_bus_res[secbus].dip == NULL);
pci_bus_res[secbus].dip = dip;
pci_bus_res[secbus].par_bus = bus;    /* <--- suspicious line */

Next, as you advised, since the issue seemed to be a hardware issue, I checked
whether there was a later BIOS version for the motherboard (Supermicro
Motherboard: X8DTi-F). I found the latest version, updated the BIOS, and tried
again.
This time it succeeded: pci_getb returned an appropriate value, and the
installation proceeded.

I highly appreciate your support!

Best Regards,
Comment 13 Kerry Shu 2009-11-05 02:03:33 UTC
That's great. And you are welcome!
Comment 14 David Comay 2009-11-05 02:06:55 UTC
As this didn't involve an OpenSolaris bug, I've adjusted the Resolution
accordingly.
Comment 15 Kerry Shu 2009-11-05 02:11:29 UTC
BTW, I forgot to mention: it's a BIOS issue, because the BIOS configures the
bridge wrongly, with
   secondary bus == primary bus == 0
Comment 16 Hiroshi Takeuchi 2009-11-05 02:22:54 UTC
Thank you, I could understand the problem much better.