[OpenAFS] openafs and xfs hangs (not a cache problem)

Alexander Bergolth leo@strike.wu-wien.ac.at
Fri, 19 Jul 2002 11:12:31 +0200


Hi!

I've recently upgraded my box from the XFS-enabled version of Red Hat
7.2 to XFS-RH 7.3. Additionally, I'm using the openafs-client (1.2.3 and
1.2.5). (The AFS cache is an ext2 filesystem.)

Since the upgrade, I'm experiencing hanging processes when creating
large files (around 700 MB) in a local filesystem.
A command like
    head -c 700000000 /dev/zero > testfile
hangs after having written the last byte. Other processes that try to
access the same file block too. (testfile resides in an
XFS filesystem.)
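
The reproduction can be scripted so that a hang is reported instead of
blocking the terminal. This is a minimal sketch for a POSIX shell; the
function name write_and_watch and its three arguments are made up for
illustration. It starts the writer in the background and polls it rather
than waiting on it, because a process stuck inside the kernel may not
react to signals.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): write SIZE zero bytes
# to FILE in the background and report whether the writer has finished
# within LIMIT seconds.
write_and_watch() {
    size=$1
    file=$2
    limit=$3

    head -c "$size" /dev/zero > "$file" &
    pid=$!

    waited=0
    # kill -0 sends no signal; it only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            echo "writer $pid still running after ${limit}s"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # Collect the exit status of the finished writer.
    wait "$pid"
    echo "writer finished, wrote $(wc -c < "$file") bytes"
}
```

For the case described here, a call such as
    write_and_watch 700000000 testfile 120
would print the "still running" line if the writer hangs; the 120 second
limit is an arbitrary choice.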

I've been able to reproduce the problem on several machines running
kernel-2.4.18-4SGI_XFS_1.1 and openafs-1.2.5-rh7.3.1. With
kernel-2.4.9-21SGI_XFS_1.0.2 and openafs-1.2.3-rh7.2.2 the problem
did not occur.

I've also made the following observations:

*) The problem appears only in conjunction with the AFS daemon.
*) Loading the AFS kernel module is not enough to hang the process.
*) Only when afsd is running do processes that create large files hang.
*) Once you stop the daemon, hanging processes return.
*) The call trace of one afsd process always contains some xfs_
functions when the problem occurs. (See below.)
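
When repeating these observations it helps to record which state the
machine was in for each test run. The following is a sketch only: the
function name afs_state is made up, the module name prefix "libafs" is
taken from the call trace and may differ on other builds, and it assumes
that pgrep is installed.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print whether the AFS
# kernel module is loaded and whether an afsd process exists.
# The optional argument is the module list to read; it defaults to
# /proc/modules.
afs_state() {
    modules=${1:-/proc/modules}

    # Assumption: the module name starts with "libafs", as in the trace.
    if grep -q '^libafs' "$modules" 2>/dev/null; then
        module=loaded
    else
        module=absent
    fi

    # pgrep -x matches the process name exactly. If pgrep is missing,
    # this reports "stopped", so check that pgrep exists first.
    if pgrep -x afsd >/dev/null 2>&1; then
        daemon=running
    else
        daemon=stopped
    fi

    echo "module=$module afsd=$daemon"
}
```

Calling afs_state before each run of the write test gives one line such
as "module=loaded afsd=stopped" to keep next to the result.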

The call trace of the hanging process always looks like this:

[<c0128796>] truncate_list_pages [kernel] 0x1f6
[<c01287db>] truncate_inode_pages [kernel] 0x3b
[<c01a6ec4>] xfs_itruncate_start [kernel] 0x74
[<c01bd894>] xfs_inactive_free_eofblocks [kernel] 0x1c4
[<c01bdf17>] xfs_release [kernel] 0x97
[<c01c8b40>] linvfs_release [kernel] 0x20
[<c0139e4d>] fput [kernel] 0x4d
[<c0138d73>] filp_close [kernel] 0x53
[<c0138dc3>] sys_close [kernel] 0x43
[<c0108923>] system_call [kernel] 0x33

afsd's Call Trace:

afsd          S C0358000    72  1153      1                1154 (NOTLB)
Call Trace: [<c011f8e4>] schedule_timeout [kernel] 0x14
[<c0116338>] ll_copy_to_user [kernel] 0x38
[<c0232f86>] wait_for_packet [kernel] 0xe6
[<c0233090>] skb_recv_datagram [kernel] 0xb0
[<c0262499>] udp_recvmsg [kernel] 0x59
[<c01f19fc>] __make_request [kernel] 0x25c
[<c0268059>] inet_recvmsg [kernel] 0x39
[<c022df71>] sock_recvmsg [kernel] 0x31
[<c018a835>] xfs_bmbt_get_state [kernel] 0x25
[<c01826b8>] xfs_bmap_do_search_extents [kernel] 0x348
[<f8987b01>] osi_NetReceive [libafs-2.4.18-4SGI_XFS_1.1-i686] 0xbd
[<c0182738>] xfs_bmap_search_extents [kernel] 0x48
[<c01152a2>] __wake_up [kernel] 0x32
[<f89916fe>] afs_osi_Wakeup [libafs-2.4.18-4SGI_XFS_1.1-i686] 0xe
[<f89889f9>] rxi_AllocDataBuf [libafs-2.4.18-4SGI_XFS_1.1-i686] 0x2d
[<f898841d>] rxk_ReadPacket [libafs-2.4.18-4SGI_XFS_1.1-i686] 0x95
[<f8988546>] rxk_Listener [libafs-2.4.18-4SGI_XFS_1.1-i686] 0x6e
[<f8993a06>] afs_syscall_call [libafs-2.4.18-4SGI_XFS_1.1-i686] 0x15e
[<f89b4dcc>] afs_RX_Running [libafs-2.4.18-4SGI_XFS_1.1-i686] 0x0
[<c018a835>] xfs_bmbt_get_state [kernel] 0x25
[<c0182626>] xfs_bmap_do_search_extents [kernel] 0x2b6
[<c0182738>] xfs_bmap_search_extents [kernel] 0x48
[<c0183c8b>] xfs_bmapi [kernel] 0x34b
[<c0208a2c>] ide_dmaproc [kernel] 0x1ec
[<c01f10fe>] generic_unplug_device [kernel] 0x1e
[<c012a4dc>] filemap_nopage [kernel] 0xbc
[<c0132135>] __alloc_pages [kernel] 0x75
[<c0136aed>] page_remove_rmap [kernel] 0x5d
[<c01260c9>] do_wp_page [kernel] 0x229
[<c0126886>] handle_mm_fault [kernel] 0x106
[<c01143aa>] do_page_fault [kernel] 0x12a
[<f8994823>] afs_syscall [libafs-2.4.18-4SGI_XFS_1.1-i686] 0x10b
[<c01218c0>] sys_setpriority [kernel] 0x60
[<c0108923>] system_call [kernel] 0x33
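
To check a saved trace for XFS functions mechanically, something like
the following could be used. It is a sketch only: the function name
xfs_symbols is made up, and it assumes the trace format shown above (an
address in brackets followed by the symbol name, one entry per line).

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): read a call trace on
# stdin and print each distinct symbol that starts with "xfs_".
xfs_symbols() {
    # Keep only the symbol that follows the "[<address>] " prefix,
    # then sort and drop duplicates.
    sed -n 's/.*\] \(xfs_[A-Za-z0-9_]*\) .*/\1/p' | sort -u
}
```

With the afsd trace saved to a file, "xfs_symbols < afsd-trace.txt"
lists the XFS functions it contains; the file name is only an example.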

Additional information about the system and three test runs is available at
    http://leo.kloburg.at/xfs-afs/

Can anybody help me find out what's going on?

Any hints are greatly appreciated,
--leo

-----------------------------------------------------------------------
Alexander (Leo) Bergolth                          leo@leo.wu-wien.ac.at
WU-Wien - Zentrum fuer Informatikdienste       http://leo.wu-wien.ac.at
                    Computers are like air conditioners -
              they stop working properly when you open Windows