[OpenAFS] Re: File creation delays

John W. Sopko Jr. sopko@cs.unc.edu
Wed, 17 Mar 2010 14:43:30 -0400


Andrew Deason wrote, On 3/17/2010 2:02 PM:
> On Wed, 17 Mar 2010 13:42:10 -0400
> "John W. Sopko Jr."<sopko@cs.unc.edu>  wrote:
>
>>> Is there anything in FileLog around this time, possibly about
>>> failing to break callbacks on a certain host?
>>
>> No clues, one of the first places I looked.
>
> If this fileserver is not under any load, turning up the debug level can
> tell you if it's the fileserver that's being slow and (probably) why.
> Sending the fileserver process a TSTP signal 4 times will get you a ton
> of debug information in FileLog. Looking around for SAFS_RemoveDir or
> SAFS_MakeDir should tell you what's going on. Send it a HUP to stop
> spewing debug info.

Here is the strace command and the output of the FileLog trace, thanks for
your help. I did a "rmdir try" and the strace command complained
"No such device", if I "fs flushvolume" on the client it shows the
directory got removed else it shows it still there.

Output from the "strace -tt rmdir try" command on the client host IP 
152.2.140.200:

14:25:39.851544 rmdir("try")            = -1 ENODEV (No such device)
14:25:57.908574 write(2, "rmdir: ", 7rmdir: )  = 7
14:25:57.908721 write(2, "try", 3try)      = 3
14:25:57.908849 open("/usr/share/locale/locale.alias", O_RDONLY) = 3
14:25:57.908955 fstat(3, {st_mode=S_IFREG|0644, st_size=2528, ...}) = 0
14:25:57.909092 mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b394ab7a000
14:25:57.909208 read(3, "# Locale name alias data base.\n#"..., 4096) = 2528
14:25:57.909369 read(3, "", 4096)       = 0
14:25:57.909479 close(3)                = 0
14:25:57.909588 munmap(0x2b394ab7a000, 4096) = 0
14:25:57.909711 open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) 
= -1 ENOENT (No such file or directory)
14:25:57.909827 open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = 
-1 ENOENT (No such file or directory)
14:25:57.909928 write(2, ": No such device", 16: No such device) = 16
14:25:57.910061 write(2, "\n", 1)       = 1

Here is the file server log "kill TSTP" 4 times during the same time,
the client IP is 152.2.140.200

Wed Mar 17 14:25:39 2010 [12] SAFS_RemoveDir    try,  Did = 536884167.59.4927, 
Host 152.2.140.200:7001, Id 3903
Wed Mar 17 14:25:39 2010 [12] BCB: BreakCallBack(all but 152.2.140.200:7001, 
(536884167,59,4927))
Wed Mar 17 14:25:40 2010 [14] SAFS_GetCapabilties
Wed Mar 17 14:25:40 2010 [14] FindClient: authenticating connection: authClass=0
Wed Mar 17 14:25:40 2010 [10] SAFS_FetchStatus,  Fid = 1769560786.8714.9111, 
Host 152.2.131.239:7001, Id 32766
Wed Mar 17 14:25:40 2010 [10] SAFS_FetchStatus returns 0
Wed Mar 17 14:25:40 2010 [8] SAFS_GetCapabilties
Wed Mar 17 14:25:40 2010 [8] FindClient: authenticating connection: authClass=0
Wed Mar 17 14:25:43 2010 [6] SAFS_GetCapabilties
Wed Mar 17 14:25:43 2010 [6] FindClient: authenticating connection: authClass=0
Wed Mar 17 14:25:46 2010 [9] SAFS_GetCapabilties
Wed Mar 17 14:25:46 2010 [9] FindClient: authenticating connection: authClass=0
Wed Mar 17 14:25:46 2010 [11] SAFS_GetCapabilties
Wed Mar 17 14:25:46 2010 [11] FindClient: authenticating connection: authClass=0
Wed Mar 17 14:25:47 2010 [7] SAFS_ExtendLock Fid = 536878631.42038.185826, 
Host 152.2.128.40:7001, Id 1269
Wed Mar 17 14:25:47 2010 [7] SAFS_ExtendLock returns 0
Wed Mar 17 14:25:47 2010 [12] Starting multibreakcall back on all addr for 
host 152.2.140.115
Wed Mar 17 14:25:48 2010 [4] SAFS_ExtendLock Fid = 536874899.6362.427923, Host 
152.2.140.129:7001, Id 1291
Wed Mar 17 14:25:48 2010 [4] SAFS_ExtendLock returns 0
Wed Mar 17 14:25:48 2010 [3] SAFS_ExtendLock Fid = 536874899.6550.427914, Host 
152.2.140.129:7001, Id 1291
Wed Mar 17 14:25:48 2010 [3] SAFS_ExtendLock returns 0
Wed Mar 17 14:25:51 2010 [13] SAFS_GetCapabilties
Wed Mar 17 14:25:51 2010 [13] FindClient: authenticating connection: authClass=0
Wed Mar 17 14:25:51 2010 [14] SAFS_GetCapabilties
Wed Mar 17 14:25:51 2010 [14] FindClient: authenticating connection: authClass=0
Wed Mar 17 14:25:51 2010 [10] SAFS_GetTime returns 1268850351, 919888
Wed Mar 17 14:25:54 2010 [12] BCB: Failed on file 536884167.59.4927, Host 
152.2.140.115:7001 is down
Wed Mar 17 14:25:54 2010 [12] SAFS_RemoveDir    returns 0
Wed Mar 17 14:25:57 2010 [8] SAFS_FetchStatus,  Fid = 1769568877.1.1, Host 
152.2.132.230:7001, Id 32766
Wed Mar 17 14:25:57 2010 [8] SAFS_FetchStatus returns 0
Wed Mar 17 14:25:57 2010 [6] SAFS_FetchStatus,  Fid = 1769568877.9.10, Host 
152.2.132.230:7001, Id 32766
Wed Mar 17 14:25:57 2010 [6] SAFS_FetchStatus returns 0
Wed Mar 17 14:25:57 2010 [9] SAFS_RemoveDir     try,  Did = 536884167.59.4927, 
Host 152.2.140.200:7001, Id 3903
Wed Mar 17 14:25:57 2010 [9] SAFS_RemoveDir     returns 2


-- 
John W. Sopko Jr.               University of North Carolina
email: sopko AT cs.unc.edu      Computer Science Dept., CB 3175
Phone: 919-962-1844             Fred Brooks Building; Room 140
Fax:   919-962-1799             Chapel Hill, NC 27599-3175