[OpenAFS] accessing /afs processes go into device wait

John Sopko sopko@cs.unc.edu
Thu, 8 Nov 2018 14:41:07 -0500


Wow! Removing -afsdb and adding our db servers to the CellServDB seems
to have fixed the problem. That does not make any sense: this machine and
others have been running for many years with -afsdb. And fs listcells
works when -afsdb is used:

% fs listcells
Cell dynroot on hosts.
Cell cs.unc.edu on hosts toucan.cs.unc.edu quail.cs.unc.edu kiwi.cs.unc.edu.

% host -t AFSDB cs.unc.edu
cs.unc.edu has AFSDB record 1 kiwi.cs.unc.edu.
cs.unc.edu has AFSDB record 1 quail.cs.unc.edu.
cs.unc.edu has AFSDB record 1 toucan.cs.unc.edu.
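
For reference, moving from -afsdb lookups to a static CellServDB means turning the AFSDB answers above into a CellServDB stanza. Below is a minimal Python sketch of that conversion, assuming the usual ">cell #comment" / "address #hostname" CellServDB layout. The function names are hypothetical, and the 192.0.2.x addresses are documentation placeholders, not the real db server addresses.

```python
import re

# Matches one line of `host -t AFSDB <cell>` output, e.g.
# "cs.unc.edu has AFSDB record 1 kiwi.cs.unc.edu."
AFSDB_LINE = re.compile(r"^(\S+) has AFSDB record (\d+) (\S+?)\.?$")


def parse_afsdb(host_output):
    """Return sorted db server hostnames found in `host -t AFSDB` output."""
    servers = []
    for line in host_output.splitlines():
        match = AFSDB_LINE.match(line.strip())
        # Subtype 1 marks an AFS volume location (db) server.
        if match and match.group(2) == "1":
            servers.append(match.group(3))
    return sorted(servers)


def cellservdb_entry(cell, servers, comment="local cell"):
    """Format a CellServDB stanza; `servers` is a list of (ip, hostname) pairs."""
    lines = [">%s\t\t#%s" % (cell, comment)]
    for ip, hostname in servers:
        lines.append("%s\t\t#%s" % (ip, hostname))
    return "\n".join(lines) + "\n"


sample = """cs.unc.edu has AFSDB record 1 kiwi.cs.unc.edu.
cs.unc.edu has AFSDB record 1 quail.cs.unc.edu.
cs.unc.edu has AFSDB record 1 toucan.cs.unc.edu.
"""
hosts = parse_afsdb(sample)
# Placeholder addresses (assumption): substitute the real db server IPs.
pairs = [("192.0.2.%d" % (i + 1), h) for i, h in enumerate(hosts)]
print(cellservdb_entry("cs.unc.edu", pairs))
```

The addresses have to be filled in by hand or from a separate A-record lookup, since the AFSDB records only carry hostnames.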

Thanks for the help. Is this a known issue?


On Thu, Nov 8, 2018 at 1:59 PM Stephan Wiesand <stephan.wiesand@desy.de> wrote:
>
> Have you tried w/o -afsdb?
>
> > On 08 Nov 2018, at 19:48, John Sopko <sopko@cs.unc.edu> wrote:
> >
> > nsswitch and DNS are the same, the AFSDB records resolve fine, and the
> > /afs/cs.unc.edu cell works fine, just not /afs.
> >
> >
> > On Thu, Nov 8, 2018 at 12:52 PM Stephan Wiesand <stephan.wiesand@desy.de> wrote:
> >>
> >>
> >>> On 8. Nov 2018, at 18:22, John Sopko <sopko@cs.unc.edu> wrote:
> >>>
> >>> I have been running two legacy Red Hat 6.x web servers for several
> >>> years. In the last few days the apache httpd processes on one of the
> >>> servers started going into device wait state; the other server is
> >>> fine, and both are configured pretty much the same. I tracked this
> >>> down to the web server trying to stat /afs/.htaccess. If I do an ls
> >>> in /afs or cat /afs/.htaccess, which does not exist, the commands go
> >>> into device wait state and take a long time to complete: several
> >>> minutes, or they may hang indefinitely. The AFS file system itself
> >>> seems to be working fine; only access directly under /afs is the
> >>> problem. On other Red Hat 6.x systems accessing /afs is fast and has
> >>> no problems.
> >>
> >> Are the nsswitch and DNS resolver configurations the same on all systems?
> >> Any differences in network restrictions?
> >> Does it help to run afsd without -afsdb?
> >>
> >> Just a wild guess,
> >>        Stephan
> >>
> >>>
> >>> I am running afsd with:
> >>>
> >>> /usr/vice/etc/afsd -dynroot -fakestat-all -afsdb
> >>>
> >>> Note I tried -fakestat-all to see if that would help; I had been
> >>> running just -fakestat. Our db servers have AFSDB records.
> >>>
> >>> I removed all cells except for our cell from CellServDB, so I only have this:
> >>>
> >>> % pwd
> >>> /afs
> >>>
> >>> % ls -l
> >>> total 4
> >>> lrwxr-xr-x 1 root root   10 Dec 31  1969 cs -> cs.unc.edu/
> >>> drwxr-xr-x 8 root root 2048 Mar  6  2015 cs.unc.edu/
> >>> lrwxr-xr-x 1 root root   10 Dec 31  1969 unc -> cs.unc.edu/
> >>>
> >>> I re-formatted the /usr/vice/cache partition and that did not help.
> >>>
> >>> I cannot find any hardware problems, and there are no clues in the
> >>> syslog or on the console. The system disk, including the cache, is on
> >>> a RAID 1 mirror. This is a Dell server and I run Dell OpenManage,
> >>> which is really good at reporting system and especially disk errors.
> >>>
> >>> I am running the same afsd version on our remaining RHEL 6.x servers:
> >>>
> >>> % fs version
> >>> openafs 1.6.22.2
> >>>
> >>> Distributor ID: RedHatEnterpriseWorkstation
> >>> Release:        6.10
> >>>
> >>> The problem is intermittent, but commands go into device wait most of
> >>> the time. For example, the first run below was fine and the second
> >>> took 14.96 seconds.
> >>>
> >>> % time ls -l
> >>> total 4
> >>> lrwxr-xr-x 1 root root   10 Dec 31  1969 cs -> cs.unc.edu
> >>> drwxr-xr-x 8 root root 2048 Mar  6  2015 cs.unc.edu
> >>> lrwxr-xr-x 1 root root   10 Dec 31  1969 unc -> cs.unc.edu
> >>> 0.000u 0.000s 0:00.00 0.0%      0+0k 0+0io 0pf+0w
> >>>
> >>> % time ls -l
> >>> total 4
> >>> lrwxr-xr-x 1 root root   10 Dec 31  1969 cs -> cs.unc.edu
> >>> drwxr-xr-x 8 root root 2048 Mar  6  2015 cs.unc.edu
> >>> lrwxr-xr-x 1 root root   10 Dec 31  1969 unc -> cs.unc.edu
> >>> 0.000u 0.000s 0:14.96 0.0%      0+0k 0+0io 0pf+0w
> >>>
> >>> Thanks for any help or ideas to try.
>
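
The intermittent delay in the quoted report can be measured without watching `time ls -l` by hand. This is a rough Python sketch that stats a path several times and reports the slow attempts. The helper names and the one-second threshold are assumptions for illustration, and the script will itself block if the stat hangs in device wait.

```python
import os
import time


def time_stat(path):
    """Return (elapsed_seconds, exists) for a single stat() of `path`."""
    start = time.monotonic()
    try:
        os.stat(path)
        exists = True
    except OSError:
        exists = False
    return time.monotonic() - start, exists


def slow_stats(path, attempts=5, threshold=1.0):
    """Stat `path` repeatedly; return elapsed times above `threshold` seconds."""
    timings = [time_stat(path)[0] for _ in range(attempts)]
    return [t for t in timings if t > threshold]


if __name__ == "__main__":
    # /afs/.htaccess is the path from the report; it does not exist there,
    # so a fast "missing" answer is the healthy case.
    print(slow_stats("/afs/.htaccess"))
```

An empty list means every attempt returned within the threshold; running it with and without -afsdb gives a direct before/after comparison.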


-- 
John W. Sopko Jr.
University of North Carolina
Computer Science Dept CB 3175
Chapel Hill, NC 27599-3175

Fred Brooks Building; Room 140
Computer Services Systems Specialist
email: sopko AT cs.unc.edu
phone: 919-590-6144