[openib-general] Kernel 2.6.9

Andras.Horvath at cern.ch
Fri Oct 22 04:35:24 PDT 2004


> What are you trying to get working?

VAPI and/or IPoIB.

> Have you loaded ib_mthca?  How about ib_ipoib?

Well, for now I've started from
'find /lib/modules/2.6.9ib/kernel/drivers/infiniband/'
because I couldn't find any docs.

In short, the kernel crashed when trying to send the first packet over
IPoIB (?). The test setup is two dual-Xeon i386 boxes running 2.6.9 plus
Roland's IB patches (SMP kernel).
Kernel messages are below. Please let me know if I did something wrong
:) or how I can help with debugging.

The hardware is a Voltaire PCI-X HCA:
03:01.0 PCI bridge: Mellanox Technology MT23108 PCI Bridge (rev a1)
04:00.0 InfiniBand: Mellanox Technology MT23108 InfiniHost (rev a1)
(but I can try with "original" Mellanox Cougar boards as we have those
as well)

By the way, what is the primary development platform (architecture) for
openib.org? 

Thanks in advance!

> Do you have an external subnet manager running on something?

No (at least not yet; this is a back-to-back demo setup).

So, what I did:

# modprobe ib_mthca
# modprobe ib_ipoib
# dmesg
[...]
ib_mthca: Mellanox InfiniBand HCA driver v0.05-pre (June 13, 2004)
ib_mthca: Initializing Mellanox Technology MT23108 InfiniHost (0000:04:00.0)
ib_mthca 0000:04:00.0: Found bridge: Mellanox Technology MT23108 PCI Bridge (0000:03:01.0)
ib_mthca 0000:04:00.0: FW version 000100180000, max_cmds 1
ib_mthca 0000:04:00.0: FW size 6143 KB (start f7a00000, end f7ffffff)
ib_mthca 0000:04:00.0: HCA memory size 131071 KB (start f0000000, end f7ffffff)
ib_mthca 0000:04:00.0: Max QPs: 16777216, reserved QPs: 16, entry size: 256
ib_mthca 0000:04:00.0: Max CQs: 16777216, reserved CQs: 128, entry size: 64
ib_mthca 0000:04:00.0: Max EQs: 64, reserved EQs: 1, entry size: 64
ib_mthca 0000:04:00.0: reserved MPTs: 16, reserved MTTs: 16
ib_mthca 0000:04:00.0: Max PDs: 16777216, reserved PDs: 0, reserved UARs: 1
ib_mthca 0000:04:00.0: Max QP/MCG: 16777216, reserved MGMs: 0
ib_mthca 0000:04:00.0: Flags: 003f0337
ib_mthca 0000:04:00.0: profile[ 0]--10/20 @ 0x        f0000000 (size 0x 4000000)
ib_mthca 0000:04:00.0: profile[ 1]-- 0/16 @ 0x        f4000000 (size 0x 1000000)
ib_mthca 0000:04:00.0: profile[ 2]-- 7/18 @ 0x        f5000000 (size 0x  800000)
ib_mthca 0000:04:00.0: profile[ 3]-- 9/17 @ 0x        f5800000 (size 0x  800000)
ib_mthca 0000:04:00.0: profile[ 4]-- 3/16 @ 0x        f6000000 (size 0x  400000)
ib_mthca 0000:04:00.0: profile[ 5]-- 4/16 @ 0x        f6400000 (size 0x  200000)
ib_mthca 0000:04:00.0: profile[ 6]--12/15 @ 0x        f6600000 (size 0x  100000)
ib_mthca 0000:04:00.0: profile[ 7]-- 8/13 @ 0x        f6700000 (size 0x   80000)
ib_mthca 0000:04:00.0: profile[ 8]--11/11 @ 0x        f6780000 (size 0x   10000)
ib_mthca 0000:04:00.0: profile[ 9]-- 6/ 5 @ 0x        f6790000 (size 0x     800)
ib_mthca 0000:04:00.0: HCA memory: allocated 106050 KB/124928 KB (18878 KB free)
ib_mthca 0000:04:00.0: Allocated EQ 1 with 65536 entries
ib_mthca 0000:04:00.0: Allocated EQ 2 with 128 entries
ib_mthca 0000:04:00.0: Allocated EQ 3 with 128 entries
ib_mthca 0000:04:00.0: Setting mask 000000000003c3fe for eqn 2
ib_mthca 0000:04:00.0: Setting mask 0000000000000400 for eqn 3
ib_mthca 0000:04:00.0: Registering memory at 0 (iova 0) in PD 1; shift 30, npages 1.
ib_mthca 0000:04:00.0: Registering memory at 0 (iova 0) in PD 2; shift 30, npages 1.
ib_mthca 0000:04:00.0: Registering memory at 0 (iova 0) in PD 3; shift 30, npages 1.
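The memory figures in the dmesg output above can be cross-checked by hand. The sketch below is only an illustration, not driver code: it adds up the ten profile[] sizes and compares them with the "HCA memory: allocated ... KB" line. All constants are copied from the log; the names (PROFILE_SIZES, kib and so on) are made up for this example. It assumes the "end" addresses are inclusive, which would explain why the log prints 131071 KB and 6143 KB while the usable total comes out as 124928 KB.

```python
# Sizes in bytes, copied from the profile[0..9] lines of the dmesg output.
PROFILE_SIZES = [0x4000000, 0x1000000, 0x800000, 0x800000, 0x400000,
                 0x200000, 0x100000, 0x80000, 0x10000, 0x800]

# Address ranges copied from the "HCA memory size" and "FW size" lines.
# Assumption: both "end" values are inclusive.
HCA_START, HCA_END = 0xf0000000, 0xf7ffffff
FW_START, FW_END = 0xf7a00000, 0xf7ffffff


def kib(nbytes):
    """Convert bytes to whole KB (1 KB = 1024 bytes), rounding down."""
    return nbytes // 1024


allocated_kb = kib(sum(PROFILE_SIZES))
hca_kb = kib(HCA_END - HCA_START + 1)   # whole HCA memory window
fw_kb = kib(FW_END - FW_START + 1)      # firmware area at the top of it
usable_kb = hca_kb - fw_kb              # what is left for the driver
free_kb = usable_kb - allocated_kb

print(allocated_kb, usable_kb, free_kb)  # 106050 124928 18878
```

The three numbers match the "allocated 106050 KB/124928 KB (18878 KB free)" line, so the resource profile itself looks internally consistent.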


# ifconfig ib0
ib0       Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          BROADCAST MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
# ifconfig ib0 up 10.0.0.2/8
# ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=0 ttl=64 time=0.035 ms
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.015 ms
[etc]

(I did the same on the other box, configuring it as 10.0.0.4, then ran
this on 10.0.0.2:)

# ping 10.0.0.4
PING 10.0.0.4 (1Unable to handle kernel NULL pointer dereference at virtual address 0000000c
 printing eip:
f89abc03
*pde = 36a1b001
Oops: 0000 [#1]
SMP
Modules linked in: ib_ipoib ib_sa_client ib_client_query ib_mad ib_poll ib_mthca ib_core ib_services e100 e1000 floppy sg scsi_mod microcode
CPU:    0
EIP:    0060:[<f89abc03>]    Not tainted VLI
EFLAGS: 00010046   (2.6.9ib)
EIP is at mthca_post_send+0x575/0x70d [ib_mthca]
eax: 00000000   ebx: c03d5d90   ecx: 00000000   edx: f7d59280
esi: f7d59280   edi: c03d5d8c   ebp: f654e010   esp: c03d5cd0
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c03d4000 task=c034fac0)
Stack: f7d59280 00000000 00000206 f659b080 f8c15153 f6d3d880 c03d5d54 00c15153
       00000000 00000000 00000001 f6d3d880 00000000 00000000 00000286 00000000
       f6981800 f8c15153 88088a89 00000000 00000000 f6981800 f6afa220 c03d5d8c
Call Trace:
 [<f8955046>] ipoib_send+0x1de/0x3e8 [ib_ipoib]
 [<f8956884>] ipoib_mcast_send+0x2a/0x2e [ib_ipoib]
 [<f8953c90>] ipoib_start_xmit+0x30e/0x320 [ib_ipoib]
 [<f8953d6a>] ipoib_hard_header+0x90/0xcf [ib_ipoib]
 [<c02f119e>] arp_create+0x1e9/0x269
 [<c02c0645>] qdisc_restart+0x14a/0x1d1
 [<c02b52db>] dev_queue_xmit+0x216/0x29f
 [<c02f0afc>] arp_solicit+0x104/0x1d8
 [<c02ba105>] neigh_timer_handler+0x176/0x27d
 [<c02b9f8f>] neigh_timer_handler+0x0/0x27d
 [<c01253b4>] run_timer_softirq+0xc7/0x180
 [<c01214a2>] __do_softirq+0xba/0xc9
 [<c01214de>] do_softirq+0x2d/0x2f
 [<c0112412>] smp_apic_timer_interrupt+0x90/0xfa
 [<c0103d08>] default_idle+0x2a/0x2d
 [<c0103cde>] default_idle+0x0/0x2d
 [<c010679a>] apic_timer_interrupt+0x1a/0x20
 [<c0103cde>] default_idle+0x0/0x2d
 [<c0103d08>] default_idle+0x2a/0x2d
 [<c0103d7c>] cpu_idle+0x37/0x40
 [<c03d68b6>] start_kernel+0x18d/0x1cb
 [<c03d6337>] unknown_bootoption+0x0/0x182
Code: 44 24 34 75 10 83 c5 10 c7 44 24 28 02 00 00 00 e9 b4 fb ff ff 8b 54 24 6c 8b 44 24 70 89 10 e9 31 fe ff ff 8b 5c 24 6c 8b 43 20 <8b> 48 0c 89 c8 89 ca 25 00 ff 00 00 c1 e2 18 c1 e0 08 09 c2 89
 0.0.0.4) 56(84) <0>Kernel panic - not syncing: Fatal exception in interrupt
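
One way to read the oops above is to decode the instruction at EIP from the "Code:" bytes, where the byte in <...> is the one EIP points at. The sketch below is a deliberately tiny decoder, not a disassembler: it handles only the `8b /r` form (MOV r32, r/m32) with an 8-bit displacement and no SIB byte, and it treats the displacement as a small positive value. The function and variable names are invented for this example.

```python
# "Code:" bytes copied from the oops; <..> marks the byte at EIP.
CODE = ("44 24 34 75 10 83 c5 10 c7 44 24 28 02 00 00 00 e9 b4 fb ff ff 8b "
        "54 24 6c 8b 44 24 70 89 10 e9 31 fe ff ff 8b 5c 24 6c 8b 43 20 <8b> 48 "
        "0c 89 c8 89 ca 25 00 ff 00 00 c1 e2 18 c1 e0 08 09 c2 89")

# 32-bit register names in x86 encoding order (0..7).
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_load(b):
    """Decode only `8b /r` with mod=01 (disp8) and no SIB byte."""
    opcode, modrm, disp = b[0], b[1], b[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x8b or mod != 1 or rm == 4:
        raise ValueError("outside the tiny subset this sketch handles")
    return REGS32[reg], REGS32[rm], disp


tokens = CODE.split()
at_eip = next(i for i, t in enumerate(tokens) if t.startswith("<"))
faulting = [int(t.strip("<>"), 16) for t in tokens[at_eip:at_eip + 3]]
dst, base, disp = decode_mov_load(faulting)

eax = 0x00000000  # value of eax in the register dump of the oops
print(f"mov {dst}, [{base}+{disp:#x}]")        # mov ecx, [eax+0xc]
print(f"fault address = {eax + disp:#010x}")   # fault address = 0x0000000c
```

With eax equal to 0 in the register dump, this load touches address 0xc, which agrees with the "NULL pointer dereference at virtual address 0000000c" message. If the bytes just before EIP are aligned the way they appear, `8b 43 20` decodes the same way as a load of eax from [ebx+0x20], so the NULL value would have come from a pointer field at offset 0x20 of some structure. Which structure that is inside mthca_post_send cannot be told from the oops alone.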


