[OpenAFS] Problems with afs driver on a Solaris 8 SF15K

rogbazan rogbazan <rogbazan@gmail.com>
Thu, 10 Mar 2005 15:50:54 -0600


Hi, we had a client on a SunFire 15K that died a few days ago.
Sun Microsystems reviewed the dump generated by the host at the time of
the panic, and this is what they told us about the problem:

Lines in the dump file:

Mar 2 20:07:03 2005 panic[cpu482]/thread=3009a3306c0: getdcache <--
Panic: the stack frames below point to "afs".
Mar 2 20:07:03 2005
Mar 2 20:07:03 2005 000002a10d587230 afs:osi_Panic+54 (78a1bad8, 0, 0, 2a10d5875d8, 0, 3009a3c7900)
Mar 2 20:07:03 2005 %l0-3: 0000000078a1bad8 0000000000000000 0000000000002710 000000004226711d
Mar 2 20:07:03 2005 %l4-7: 00000000040a0000 000003003dc71aa0 0000000000000000 000002a10d587260
Mar 2 20:07:03 2005 000002a10d587300 afs:afs_GetDCache+e44 (3003dc71aa0, 40b0000, 2a10d5877c4, 2a10d5876a0, 2a10d58769c, 2)
Mar 2 20:07:03 2005 %l0-3: 000000000000012d 0000000000204421 000003003dc71c28 0000000000005f74
Mar 2 20:07:03 2005 %l4-7: 0000000000000001 000002a7c9a06000 0000000000000000 0000000000000000
Mar 2 20:07:03 2005 000002a10d5875c0 afs:afs_PrefetchChunk+d8 (3003dc71aa0, 3007272bc80, 3009a3c7900, 2a10d5877c4, 3003dc71aa0, 2a7c9a04000)
Mar 2 20:07:03 2005 %l0-3: 0000000000000003 000003007272bc80 0000000000010000 00000000040b0000
Mar 2 20:07:03 2005 %l4-7: 0000000000000003 000002a7c9a04000 00000300011e7ef8 0000000000000000
Mar 2 20:07:03 2005 000002a10d5876d0 afs:afs_nfsrdwr+1920 (3003dc71aa0, 2a10d587a00, 1, 0, 3009a3c7900, 0)
Mar 2 20:07:03 2005 %l0-3: 0000000000000000 0000000004090000 00000000040a0000 0000000000000000
Mar 2 20:07:03 2005 %l4-7: 0000000000000000 0000000010423c18 0000000000000000 0000000000000000
Mar 2 20:07:03 2005 000002a10d587860 afs:afs_vmwrite+7c (3003dc71aa0, 2a10d587a00, 0, 3009a3c7900, 300bdea00a0, 300bdea0000)
Mar 2 20:07:03 2005 %l0-3: 0000000078476e50 000003006489ba38 0000000000040000 0000000000000001
Mar 2 20:07:03 2005 %l4-7: 000002a109577d20 0000030020b83240 0000000000010bed 000002a100dbf968
Mar 2 20:07:03 2005 000002a10d587940 genunix:write+204 (4080000, 40000, 2002, 30069bcd5b0, 4, 40000)
Mar 2 20:07:03 2005 %l0-3: 00000000789f2e70 0000000000040000 000003003dc71aa0 0000000000000000
Mar 2 20:07:03 2005 %l4-7: 0000000000000004 0000000010423790 00000300654e05c0 0000030067ebcc78
Mar 2 20:07:04 2005 000002a10d587a40 genunix:write32+30 (4, ff230000, 40000, 2, 21940, 125d0)
Mar 2 20:07:04 2005 %l0-3: 0000000000000004 0000030049c5d520 0000030066b7a028 0000000000000000
Mar 2 20:07:04 2005 %l4-7: 0000000000000001 0000030049c5d520 0000000000000000 0000000000000000
Mar 2 20:07:04 2005
Mar 2 20:07:34 2005 syncing file systems... <-- It tries to close the
file systems correctly.
panic[cpu482]/thread=3009a3306c0: panic sync timeout <-- The sync does
not finish before the timeout expires, which generates a second panic.
Mar 2 20:07:34 2005 dumping to /dev/md/dsk/d70, offset 7283474432 <--
The dump file of the second panic is saved.
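
In case it helps anyone reading the trace, the call chain can be pulled out of the panic messages with a short script. This is only a minimal sketch in Python, assuming each frame appears as module:function+hexoffset as in the lines above; the names FRAME_RE, extract_frames and SAMPLE are made up for illustration, and SAMPLE holds just a few frame lines copied from the dump.

```python
import re

# Matches a frame such as "afs:afs_GetDCache+e44" (module:function+hex offset).
# Timestamps like "20:07:03" do not match because the module name must start
# with a letter or underscore.
FRAME_RE = re.compile(r"\b([A-Za-z_]\w*):([A-Za-z_]\w*)\+([0-9a-f]+)\b")


def extract_frames(text):
    """Return (module, function, offset) tuples in order of appearance,
    skipping a frame that is identical to the one right before it."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frame = (m.group(1), m.group(2), int(m.group(3), 16))
        if not frames or frames[-1] != frame:
            frames.append(frame)
    return frames


# A few frame lines copied from the panic messages (arguments trimmed).
SAMPLE = """\
Mar 2 20:07:03 2005 000002a10d587230 afs:osi_Panic+54 (78a1bad8, 0, 0,
Mar 2 20:07:03 2005 000002a10d587300 afs:afs_GetDCache+e44
Mar 2 20:07:03 2005 000002a10d5875c0 afs:afs_PrefetchChunk+d8
Mar 2 20:07:03 2005 000002a10d587940 genunix:write+204 (4080000,
"""

frames = extract_frames(SAMPLE)
for module, func, off in frames:
    # The most recent call is printed first, as in the dump.
    print(f"{module}:{func}+{off:#x}")
```

Read top to bottom, the first entry is where the panic was raised and the last is the system call that started the chain.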


After that, a Sun engineer says:


"I inform to you that the panic occurred in the "afs" driver. SUN does not
support the "afs" driver, so we do not have the source code for it and we cannot
provide the customer with a solution. This customer needs to contact the company
that supports the "afs" driver.
That was Transarc initially, but I believe Transarc was taken over by IBM.
Best Regards,
Paul McKernan, PTS-KERNEL AMER"


Does anybody know of a similar problem, or of a patch that prevents it?

The client is Transarc v.3.6 Rel.2.48.
Regards