[OpenAFS] deadlock in OpenAFS 1.4.11 (Solaris 5.10)
Douglas E. Engert
deengert@anl.gov
Fri, 09 Apr 2010 11:33:48 -0500
What happens if you export LANG=C into your environment?
Based on your trace, the freeze comes before AFS does anything. Or
is it just that truss has not written all of its output?
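
A minimal sketch of that check, assuming a POSIX shell. The function
name run_in_c_locale and the output file /tmp/truss.out are only
illustrative and are not taken from the original report. It reruns a
command with LANG=C so the en_US.UTF-8 locale objects should not be
loaded, and it sends the truss output to a file so it can be inspected
afterwards. If truss is not installed, the command is simply run
directly.

```shell
# Run a command with LANG=C. Note that LC_ALL or LC_* variables, if set
# in the environment, take precedence over LANG and would need unsetting.
run_in_c_locale() {
    if command -v truss >/dev/null 2>&1; then
        # -f follows child processes, -o writes the trace to a file
        # (the path here is a hypothetical choice).
        LANG=C truss -f -o /tmp/truss.out "$@"
    else
        LANG=C "$@"
    fi
}

# Example on the affected machine:
#   run_in_c_locale touch /afs/not-here
```

If the command still hangs with LANG=C, the locale loading is probably
not the cause, and the last line in the trace file shows how far the
process got.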
John Tang Boyland wrote:
> We get an occasional deadlock happening on Solaris 5.10 using
> OpenAFS 1.4.11. After the problem starts, any attempt to use AFS
> on the machine freezes. For example:
>
> % truss -f touch /afs/not-here
> 15694: execve("/usr/bin/touch", 0x08047E20, 0x08047E2C) argc = 2
> 15694: resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
> 15694: resolvepath("/usr/bin/touch", "/usr/bin/touch", 1023) = 14
> 15694: sysconfig(_CONFIG_PAGESIZE) = 4096
> 15694: xstat(2, "/usr/bin/touch", 0x08047BF8) = 0
> 15694: open("/var/ld/ld.config", O_RDONLY) = 3
> 15694: fxstat(2, 3, 0x08047B38) = 0
> 15694: mmap(0x00000000, 128104, PROT_READ, MAP_SHARED, 3, 0) = 0xFEFA1000
> 15694: close(3) = 0
> 15694: mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEF90000
> 15694: xstat(2, "/lib/libc.so.1", 0x08047440) = 0
> 15694: resolvepath("/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
> 15694: open("/lib/libc.so.1", O_RDONLY) = 3
> 15694: mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFEF80000
> 15694: mmap(0x00010000, 880640, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEEA0000
> 15694: mmap(0xFEEA0000, 775469, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEEA0000
> 15694: mmap(0xFEF6E000, 26855, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 778240) = 0xFEF6E000
> 15694: mmap(0xFEF75000, 5016, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFEF75000
> 15694: munmap(0xFEF5E000, 65536) = 0
> 15694: memcntl(0xFEEA0000, 123376, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
> 15694: close(3) = 0
> 15694: munmap(0xFEF80000, 32768) = 0
> 15694: mmap(0x00010000, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEF80000
> 15694: getcontext(0x080479B0)
> 15694: getrlimit(RLIMIT_STACK, 0x080479A8) = 0
> 15694: getpid() = 15694 [15692]
> 15694: lwp_private(0, 1, 0xFEF82000) = 0x000001C3
> 15694: setustack(0xFEF82060)
> 15694: sysi86(SI86FPSTART, 0xFEF75A58, 0x0000133F, 0x00001F80) = 0x00000001
> 15694: brk(0x08062758) = 0
> 15694: brk(0x08064758) = 0
> 15694: xstat(2, "/usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3", 0x08046D08) = 0
> 15694: resolvepath("/usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3", "/usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3", 1023) = 44
> 15694: open("/usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3", O_RDONLY) = 3
> 15694: mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFEE90000
> 15694: mmap(0x00010000, 2297856, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEC00000
> 15694: mmap(0xFEC00000, 2225278, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEC00000
> 15694: mmap(0xFEE2F000, 4234, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 2224128) = 0xFEE2F000
> 15694: munmap(0xFEE20000, 61440) = 0
> 15694: memcntl(0xFEC00000, 7188, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
> 15694: close(3) = 0
> 15694: xstat(2, "/usr/lib/locale/en_US.UTF-8/methods_en_US.UTF-8.so.3", 0x08046C60) = 0
> 15694: resolvepath("/usr/lib/locale/en_US.UTF-8/methods_en_US.UTF-8.so.3", "/usr/lib/locale/common/methods_unicode.so.3", 1023) = 43
> 15694: open("/usr/lib/locale/en_US.UTF-8/methods_en_US.UTF-8.so.3", O_RDONLY) = 3
> 15694: mmap(0xFEE90000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFEE90000
> 15694: mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEE80000
> 15694: mmap(0x00010000, 122880, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEE60000
> 15694: mmap(0xFEE60000, 55437, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEE60000
> 15694: mmap(0xFEE7D000, 2524, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 53248) = 0xFEE7D000
> 15694: munmap(0xFEE6E000, 61440) = 0
> 15694: memcntl(0xFEE60000, 2532, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
> 15694: close(3) = 0
> 15694: xstat(2, "/usr/lib/locale/en_US.UTF-8/libc.so.1", 0x08046C60) Err#2 ENOENT
> 15694: munmap(0xFEE90000, 32768) = 0
> 15694: sysconfig(_CONFIG_PAGESIZE) = 4096
>
> FREEZE
>
> On a machine which has not had the problem (yet...) the output continues
> ...
> 24608: sysconfig(_CONFIG_PAGESIZE) = 4096
> 24608: stat64("/afs/not-here", 0x08047C90) Err#2 ENOENT
> 24608: creat64("/afs/not-here", 0666) Err#30 EROFS
> 24608: open("/usr/lib/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_OSCMD.mo", O_RDONLY) Err#2 ENOENT
> 24608: fstat64(2, 0x08046EA0) = 0
> 24608: write(2, " t o u c h", 5) = 5
> 24608: write(2, " : ", 2) = 2
> 24608: write(2, " / a f s / n o t - h e r".., 13) = 13
> 24608: write(2, " c a n n o t c r e a".., 15) = 15
> 24608: _exit(1)
>
> In other words, the stat64 call accesses AFS and, on the machine
> with the problem, the thread gets stuck in the AFS tarbaby.
>
> I suspected it was due to logging, so I changed the configuration to
> mount a dedicated partition for /usr/vice/cache, and rebooted. The
> machine was fine for a month or two, but the problem has recurred.
>
> The machine is used frequently (it's our main compute server for
> undergraduate classes) but "fortunately" AFS is not very popular here
> so most courses don't use it (partly because of nasty things like
> this happening now and then) and so the machine is still being used
> for non-AFS courses. Hence I hadn't tried to install a newer version
> of OpenAFS. If this is a known bug with OpenAFS, I will indeed
> ask them to take the machine offline long enough to fix this.
> (Political capital and all; I hope people understand.)
>
> I haven't tried to reproduce this bug (and wouldn't want to
> on the compute server!): it only seems to happen on these
> main compute servers -- never on my little research machines... :-(
>
> Any help would be appreciated.
>
> John
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>
>
--
Douglas E. Engert <DEEngert@anl.gov>
Argonne National Laboratory
9700 South Cass Avenue
Argonne, Illinois 60439
(630) 252-5444