[OpenAFS] Stability of AFS

makowskm@chemia.uj.edu.pl makowskm@chemia.uj.edu.pl
Thu, 24 Oct 2002 10:18:05 +0200 (CEST)


We have been using AFS for a few months in our organization. For the past
two weeks we have had constant problems with the stability of the file
system. Every 2 to 3 days it collapses, producing system logs like these:

Oct 23 15:18:31 porsacz kernel: Unable to handle kernel paging request at
virtual address 0f3c8b21
Oct 23 15:18:31 porsacz kernel:  printing eip:
Oct 23 15:18:31 porsacz kernel: f883bde3
Oct 23 15:18:31 porsacz kernel: *pde = 00000000
Oct 23 15:18:31 porsacz kernel: Oops: 0002
Oct 23 15:18:31 porsacz kernel: libafs-2.4.18-10-athlon.mp soundcore
eepro100 ext3 jbd 3w-xxxx sd_mod scsi_mod
Oct 23 15:18:31 porsacz kernel: CPU:    1
Oct 23 15:18:31 porsacz kernel: EIP:    0010:[<f883bde3>]    Tainted: PF
Oct 23 15:18:31 porsacz kernel: EFLAGS: 00010246
Oct 23 15:18:31 porsacz kernel:
Oct 23 15:18:31 porsacz kernel: EIP is at journal_commit_transaction [jbd]
0x7c3 (2.4.18-10smp)
Oct 23 15:18:31 porsacz kernel: eax: 0f3c8b11   ebx: f6488c90   ecx:
00000b5c   edx: f6837840
Oct 23 15:18:31 porsacz kernel: esi: 00000000   edi: f6946600   ebp:
e3787f90   esp: f69bde80
Oct 23 15:18:31 porsacz kernel: ds: 0018   es: 0018   ss: 0018
Oct 23 15:18:31 porsacz kernel: Process kjournald (pid: 149,
stackpage=f69bd000)
Oct 23 15:18:31 porsacz kernel: Stack: 00003016 00000000 00000f9c c5363064
0000000a cc065ac0 cd977bd0 00000d77
Oct 23 15:18:31 porsacz kernel:        00000001 ec274700 ec7e15c0 00000000
d7bbc3c0 cb1c1240 cb1c11c0 cb1c1140
Oct 23 15:18:31 porsacz kernel:        cb1c10c0 cb5d3f40 cb5d3ec0 cb5d3e40
cb5d3dc0 cb5d3d40 cb1c1d40 cb1c1cc0
Oct 23 15:18:31 porsacz kernel: Call Trace: [<f883e7e6>] kjournald [jbd]
0x136
Oct 23 15:18:31 porsacz kernel: [<f883e690>] commit_timeout [jbd] 0x0
Oct 23 15:18:31 porsacz kernel: [<c0107286>] kernel_thread [kernel] 0x26
Oct 23 15:18:31 porsacz kernel: [<f883e6b0>] kjournald [jbd] 0x0
Oct 23 15:18:31 porsacz kernel:
Oct 23 15:18:31 porsacz kernel:
Oct 23 15:18:31 porsacz kernel: Code: f0 ff 40 10 8b 03 f0 0f ba 68 18 0a
8b 44 24 1c 50 8d 44 24

	Checking the server status after such events doesn't show anything wrong,
but in fact none of the AFS clients can access the file system. All that
can be done is to obtain a token. The only way to restore functionality is
to restart the server machine.

	We are using OpenAFS version 1.2.6 on Red Hat 7.3 with OpenAFS modules
compiled for our kernel (2.4.18-10smp). The server runs in SMP mode with
two Athlon 1800+ processors. The file system is located on a RAID5 array
with an ext3 partition. The machine acts as both AFS server and client, and
the client cache is located on a separate ext2 partition.

Could anyone help us explain the instability of AFS in this configuration?

Yours,

Marcin Makowski
Department of the Theoretical Chemistry
Jagiellonian University
makowskm@chemia.uj.edu.pl