[openib-general] Kernel 2.6.9
Andras.Horvath at cern.ch
Fri Oct 22 04:35:24 PDT 2004
> What are you trying to get working?
VAPI and/or IPoIB.
> Have you loaded ib_mthca? How about ib_ipoib?
Well, so far I've just been working from
'find /lib/modules/2.6.9ib/kernel/drivers/infiniband/'
for lack of any docs.
In short, the kernel crashed when trying to send what seems to be the
first packet over IPoIB. The test setup is two dual-Xeon i386 boxes
running 2.6.9 plus Roland's IB patches (SMP kernel).
Kernel messages are below; please let me know if I did something wrong
:) or how I can help with debugging.
The hardware is a Voltaire PCI-X HCA:
03:01.0 PCI bridge: Mellanox Technology MT23108 PCI Bridge (rev a1)
04:00.0 InfiniBand: Mellanox Technology MT23108 InfiniHost (rev a1)
(but I can try with "original" Mellanox Cougar boards as we have those
as well)
By the way, what is the primary development platform (architecture) for
openib.org?
Thanks in advance!
> Do you have an external subnet manager running on something?
No (at least, not yet; this is a back-to-back demo setup).
So, what I did:
# modprobe ib_mthca
# modprobe ib_ipoib
# dmesg
[...]
ib_mthca: Mellanox InfiniBand HCA driver v0.05-pre (June 13, 2004)
ib_mthca: Initializing Mellanox Technology MT23108 InfiniHost (0000:04:00.0)
ib_mthca 0000:04:00.0: Found bridge: Mellanox Technology MT23108 PCI Bridge (0000:03:01.0)
ib_mthca 0000:04:00.0: FW version 000100180000, max_cmds 1
ib_mthca 0000:04:00.0: FW size 6143 KB (start f7a00000, end f7ffffff)
ib_mthca 0000:04:00.0: HCA memory size 131071 KB (start f0000000, end f7ffffff)
ib_mthca 0000:04:00.0: Max QPs: 16777216, reserved QPs: 16, entry size: 256
ib_mthca 0000:04:00.0: Max CQs: 16777216, reserved CQs: 128, entry size: 64
ib_mthca 0000:04:00.0: Max EQs: 64, reserved EQs: 1, entry size: 64
ib_mthca 0000:04:00.0: reserved MPTs: 16, reserved MTTs: 16
ib_mthca 0000:04:00.0: Max PDs: 16777216, reserved PDs: 0, reserved UARs: 1
ib_mthca 0000:04:00.0: Max QP/MCG: 16777216, reserved MGMs: 0
ib_mthca 0000:04:00.0: Flags: 003f0337
ib_mthca 0000:04:00.0: profile[ 0]--10/20 @ 0x f0000000 (size 0x 4000000)
ib_mthca 0000:04:00.0: profile[ 1]-- 0/16 @ 0x f4000000 (size 0x 1000000)
ib_mthca 0000:04:00.0: profile[ 2]-- 7/18 @ 0x f5000000 (size 0x 800000)
ib_mthca 0000:04:00.0: profile[ 3]-- 9/17 @ 0x f5800000 (size 0x 800000)
ib_mthca 0000:04:00.0: profile[ 4]-- 3/16 @ 0x f6000000 (size 0x 400000)
ib_mthca 0000:04:00.0: profile[ 5]-- 4/16 @ 0x f6400000 (size 0x 200000)
ib_mthca 0000:04:00.0: profile[ 6]--12/15 @ 0x f6600000 (size 0x 100000)
ib_mthca 0000:04:00.0: profile[ 7]-- 8/13 @ 0x f6700000 (size 0x 80000)
ib_mthca 0000:04:00.0: profile[ 8]--11/11 @ 0x f6780000 (size 0x 10000)
ib_mthca 0000:04:00.0: profile[ 9]-- 6/ 5 @ 0x f6790000 (size 0x 800)
ib_mthca 0000:04:00.0: HCA memory: allocated 106050 KB/124928 KB (18878 KB free)
ib_mthca 0000:04:00.0: Allocated EQ 1 with 65536 entries
ib_mthca 0000:04:00.0: Allocated EQ 2 with 128 entries
ib_mthca 0000:04:00.0: Allocated EQ 3 with 128 entries
ib_mthca 0000:04:00.0: Setting mask 000000000003c3fe for eqn 2
ib_mthca 0000:04:00.0: Setting mask 0000000000000400 for eqn 3
ib_mthca 0000:04:00.0: Registering memory at 0 (iova 0) in PD 1; shift 30, npages 1.
ib_mthca 0000:04:00.0: Registering memory at 0 (iova 0) in PD 2; shift 30, npages 1.
ib_mthca 0000:04:00.0: Registering memory at 0 (iova 0) in PD 3; shift 30, npages 1.
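
A quick cross-check of the memory figures in the dmesg output above, as a rough Python sketch. All numbers are copied from the log. The guess that the driver prints (end - start) >> 10 without adding 1, which would explain 6143 KB rather than 6144 KB, is an assumption drawn from the output and not from the mthca source.

```python
# Figures copied from the ib_mthca dmesg output above.
fw_start, fw_end = 0xF7A00000, 0xF7FFFFFF
hca_start, hca_end = 0xF0000000, 0xF7FFFFFF

# Sizes of profile[0] .. profile[9] in bytes, as printed by the driver.
profile_sizes = [
    0x4000000, 0x1000000, 0x800000, 0x800000, 0x400000,
    0x200000, 0x100000, 0x80000, 0x10000, 0x800,
]

# Assumption: the driver reports (end - start) >> 10, i.e. without the +1
# needed for an inclusive range.
fw_kb_reported = (fw_end - fw_start) >> 10
hca_kb_reported = (hca_end - hca_start) >> 10

# Inclusive sizes of the same ranges, in KB.
fw_kb = (fw_end - fw_start + 1) >> 10
hca_kb = (hca_end - hca_start + 1) >> 10

# HCA memory left for the profile once the firmware area is subtracted.
usable_kb = hca_kb - fw_kb
allocated_kb = sum(profile_sizes) >> 10
free_kb = usable_kb - allocated_kb

print(f"FW size (as reported):  {fw_kb_reported} KB")
print(f"HCA size (as reported): {hca_kb_reported} KB")
print(f"allocated {allocated_kb} KB/{usable_kb} KB ({free_kb} KB free)")
```

Under that assumption the numbers agree with the "FW size", "HCA memory size" and "HCA memory: allocated" lines, so the driver's view of the card memory at least looks self-consistent.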
# ifconfig ib0
ib0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
BROADCAST MULTICAST MTU:2044 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:128
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
# ifconfig ib0 up 10.0.0.2/8
# ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=0 ttl=64 time=0.035 ms
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.015 ms
[etc]
(I did the same on the other box, configuring it as 10.0.0.4, and then
on 10.0.0.2:)
# ping 10.0.0.4
PING 10.0.0.4 (1Unable to handle kernel NULL pointer dereference at virtual address 0000000c
printing eip:
f89abc03
*pde = 36a1b001
Oops: 0000 [#1]
SMP
Modules linked in: ib_ipoib ib_sa_client ib_client_query ib_mad ib_poll ib_mthca ib_core ib_services e100 e1000 floppy sg scsi_mod microcode
CPU: 0
EIP: 0060:[<f89abc03>] Not tainted VLI
EFLAGS: 00010046 (2.6.9ib)
EIP is at mthca_post_send+0x575/0x70d [ib_mthca]
eax: 00000000 ebx: c03d5d90 ecx: 00000000 edx: f7d59280
esi: f7d59280 edi: c03d5d8c ebp: f654e010 esp: c03d5cd0
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c03d4000 task=c034fac0)
Stack: f7d59280 00000000 00000206 f659b080 f8c15153 f6d3d880 c03d5d54 00c15153
00000000 00000000 00000001 f6d3d880 00000000 00000000 00000286 00000000
f6981800 f8c15153 88088a89 00000000 00000000 f6981800 f6afa220 c03d5d8c
Call Trace:
[<f8955046>] ipoib_send+0x1de/0x3e8 [ib_ipoib]
[<f8956884>] ipoib_mcast_send+0x2a/0x2e [ib_ipoib]
[<f8953c90>] ipoib_start_xmit+0x30e/0x320 [ib_ipoib]
[<f8953d6a>] ipoib_hard_header+0x90/0xcf [ib_ipoib]
[<c02f119e>] arp_create+0x1e9/0x269
[<c02c0645>] qdisc_restart+0x14a/0x1d1
[<c02b52db>] dev_queue_xmit+0x216/0x29f
[<c02f0afc>] arp_solicit+0x104/0x1d8
[<c02ba105>] neigh_timer_handler+0x176/0x27d
[<c02b9f8f>] neigh_timer_handler+0x0/0x27d
[<c01253b4>] run_timer_softirq+0xc7/0x180
[<c01214a2>] __do_softirq+0xba/0xc9
[<c01214de>] do_softirq+0x2d/0x2f
[<c0112412>] smp_apic_timer_interrupt+0x90/0xfa
[<c0103d08>] default_idle+0x2a/0x2d
[<c0103cde>] default_idle+0x0/0x2d
[<c010679a>] apic_timer_interrupt+0x1a/0x20
[<c0103cde>] default_idle+0x0/0x2d
[<c0103d08>] default_idle+0x2a/0x2d
[<c0103d7c>] cpu_idle+0x37/0x40
[<c03d68b6>] start_kernel+0x18d/0x1cb
[<c03d6337>] unknown_bootoption+0x0/0x182
Code: 44 24 34 75 10 83 c5 10 c7 44 24 28 02 00 00 00 e9 b4 fb ff ff 8b 54 24 6c 8b 44 24 70 89 10 e9 31 fe ff ff 8b 5c 24 6c 8b 43 20 <8b> 48 0c 89 c8 89 ca 25 00 ff 00 00 c1 e2 18 c1 e0 08 09 c2 89
0.0.0.4) 56(84) <0>Kernel panic - not syncing: Fatal exception in interrupt
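
For anyone reading the oops, here is a minimal Python sketch that decodes the instruction bytes around the fault by hand. The bytes and register values are copied from the "Code:" line and the register dump above; the helper decode_mov_disp8 is hypothetical, written only for this note, and handles only the simple "mov r32, [r32 + disp8]" form (opcode 0x8b), not general x86.

```python
# Bytes copied from the "Code:" line of the oops. The byte shown in <...>
# is where EIP pointed when the fault happened.
prev_insn = bytes.fromhex("8b 43 20")   # instruction just before the fault
fault_insn = bytes.fromhex("8b 48 0c")  # faulting instruction

# 32-bit register numbering used in the ModRM byte.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_disp8(insn):
    """Decode 'mov r32, [r32 + disp8]' (opcode 0x8b, ModRM mod == 01)."""
    opcode, modrm, disp = insn
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # rm == 100b would mean a SIB byte follows, which this sketch skips.
    if opcode != 0x8B or mod != 0b01 or rm == 0b100:
        raise ValueError("not a simple mov r32, [r32 + disp8]")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REGS32[reg], REGS32[rm], disp


# Register values copied from the register dump in the oops.
regs = {"eax": 0x00000000, "ebx": 0xC03D5D90}

dst, base, disp = decode_mov_disp8(fault_insn)
fault_addr = (regs[base] + disp) & 0xFFFFFFFF
print(f"fault: mov {dst}, [{base}+{disp:#x}] -> address {fault_addr:#010x}")

p_dst, p_base, p_disp = decode_mov_disp8(prev_insn)
print(f"prev:  mov {p_dst}, [{p_base}+{p_disp:#x}]")
```

Read this way, the faulting instruction loads from eax+0xc with eax = 0, which matches the reported virtual address 0000000c, and eax itself was loaded just before from ebx+0x20. So it looks like a pointer stored at offset 0x20 of some structure was NULL when mthca_post_send tried to read a field at offset 0xc through it. Which structure that is cannot be told from the oops alone.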