[OpenAFS-devel] reproducible afsd/libafs lockup

Neulinger, Nathan nneul@umr.edu
Fri, 22 Mar 2002 10:27:52 -0600


Here are the two fstraces for these ops:

time 600.465456, pid 4834: Access vp 0xe0ab2f68 mode 0xc0 len (0x0, 0x800)
time 600.465456, pid 4834: Symlink dir 0xe0ab2f68 link l0
time 600.465456, pid 4834: GetdCache vp 0xe0ab2f68 dcache 0xe11f7480 dcache low-version 0x2b8, vcache low-version 0x2b8
time 600.465456, pid 4834: GetdCache tlen 0x800 flags 0x1 abyte (0x0, 0x0) Position (0x0, 0x0)
time 600.465456, pid 4834: Analyze RPC op 9 conn 0xd7549d00 code 0x0 user 0x419b488a
time 600.465456, pid 4834: ProcessFS vp 0xe0b1aaf0 old len (0x0, 0x0) new len (0x0, 0x400)
time 600.465456, pid 4838: Access vp 0xe0a9d000 mode 0x40 len (0x0, 0x800)



time 600.495453, pid 4834: Access vp 0xe0ab2f68 mode 0xc0 len (0x0, 0x800)
time 600.495453, pid 4834: Symlink dir 0xe0ab2f68 link l0
time 600.495453, pid 4834: GetdCache vp 0xe0ab2f68 dcache 0xe11f7480 dcache low-version 0x2ba, vcache low-version 0x2ba
time 600.495453, pid 4834: GetdCache tlen 0x800 flags 0x1 abyte (0x0, 0x0) Position (0x0, 0x0)
time 600.495453, pid 4834: Analyze RPC op 9 conn 0xd7549d00 code 0xfffffe3e user 0x419b488a
time 601.495348, pid 4834: Analyze RPC op 9 conn 0x0 code 0xffffffff user 0x419b488a
time 601.495348, pid 4834: Returning code -1 from 31
time 601.495348, pid 4841: Access vp 0xe0a9d000 mode 0x40 len (0x0, 0x800)
time 601.495348, pid 4841: Access vp 0xe0a9d410 mode 0x40 len (0x0, 0x1000)

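The code fields in the traces are printed as unsigned 32-bit hex, which
hides the sign. A minimal sketch of the conversion to signed values (the
helper name as_signed32 is made up for illustration, it is not part of
fstrace or OpenAFS):

```python
def as_signed32(value):
    """Interpret an unsigned 32-bit value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


# The three "code" values that appear on the Analyze RPC lines.
for raw in (0x0, 0xfffffe3e, 0xffffffff):
    print(f"{raw:#010x} -> {as_signed32(raw)}")
```

This prints 0, -450 and -1. The -1 agrees with the "Returning code -1
from 31" line in the second trace. What -450 means would have to be looked
up in the Rx error definitions; the conversion alone does not say.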

The first trace is for a symlink target length of 1024, the second for
1025. The second Analyze RPC line in the 1025 trace (conn 0x0, code
0xffffffff) looks odd.
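For anyone who wants to try the two cases without fsstress, here is a
rough sketch that creates symlinks with 1024- and 1025-character targets.
The function name try_symlink and the link names are made up for
illustration, and the script knows nothing about AFS: pass it a scratch
AFS directory as the first argument to exercise the client, otherwise it
just uses a local temp dir.

```python
import os
import sys
import tempfile


def try_symlink(directory, target_len):
    """Create a symlink with a target_len-character target.

    Returns the length of the target as read back with readlink;
    raises OSError if the symlink or readlink call fails.
    """
    path = os.path.join(directory, f"l{target_len}")
    os.symlink("a" * target_len, path)  # the target need not exist
    return len(os.readlink(path))


def run(directory):
    # 1024 is the last length reported to work, 1025 the first to fail.
    for length in (1024, 1025):
        try:
            print(f"{length}: ok, read back {try_symlink(directory, length)}")
        except OSError as exc:
            print(f"{length}: failed: {exc}")


if __name__ == "__main__":
    if len(sys.argv) > 1 and os.path.isdir(sys.argv[1]):
        run(sys.argv[1])
    else:
        # Fallback: a throwaway local directory (not AFS).
        with tempfile.TemporaryDirectory() as tmp:
            run(tmp)
```

On a local filesystem both lengths would normally succeed, so any
difference between the two only shows up when the directory is in AFS.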

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216


> -----Original Message-----
> From: Neulinger, Nathan
> Sent: Friday, March 22, 2002 10:15 AM
> To: openafs-devel@openafs.org
> Subject: RE: [OpenAFS-devel] reproducible afsd/libafs lockup
>
>
> Ok. I narrowed it down. Attempting to make a symlink with a target
> length greater than 1024 will immediately cause the client to lose
> contact with the file server on Linux.
>
> This appears to be Linux-specific, and is not specific to recent client
> builds; it even happens on an old 2.2.x build I have.
>
> -- Nathan
>
> > -----Original Message-----
> > From: Neulinger, Nathan
> > Sent: Friday, March 22, 2002 9:27 AM
> > To: openafs-devel@openafs.org
> > Subject: RE: [OpenAFS-devel] reproducible afsd/libafs lockup
> >
> >
> > Appears this test causes it:
> >
> > troot-srvtst07(132)> fs checks ; ./fsstress -v -n 8 -p 1 -d /umr/u/nneul/fsstress/
> > All servers are running.
> > seed = 1016547366
> > 0/0: dwrite - no filename
> > 0/1: chown . 7536 0
> > 0/2: creat f0 x:0 0 0
> > 0/3: symlink l1 0
> > 0/4: fdatasync - no filename
> > 0/5: symlink l2 110
> > 0/6: truncate - no filename
> > 0/7: creat f3 x:0 110 0
> >
> > I'm trying to track down a more specific trigger.
> >
> > -- Nathan
> >
> > > -----Original Message-----
> > > From: Neulinger, Nathan
> > > Sent: Friday, March 22, 2002 9:15 AM
> > > To: openafs-devel@openafs.org
> > > Subject: RE: [OpenAFS-devel] reproducible afsd/libafs lockup
> > >
> > >
> > > Interesting... I grabbed fsstress from kolya's web page and started
> > > running it on this station. The instant I start running it against an
> > > AFS directory, the client loses contact with the server that the test
> > > AFS dir is located on. Running fs checks regains the connection. And
> > > that is with a -p 1 test. Haven't even tried -p # for larger #.
> > >
> > > -- Nathan
> > >
> > > > -----Original Message-----
> > > > From: Neulinger, Nathan
> > > > Sent: Friday, March 22, 2002 8:51 AM
> > > > To: openafs-devel@openafs.org
> > > > Subject: [OpenAFS-devel] reproducible afsd/libafs lockup
> > > >
> > > >
> > > > Have not dug into this much yet, but with recent builds (and maybe
> > > > old ones; I'm not sure, since I don't have a machine running old
> > > > code that I can hose at the moment), I can trigger a very quick,
> > > > complete cm lockup by doing this in a high-level directory in my
> > > > cell (i.e. a dir with a lot of stuff under it):
> > > >
> > > > find . -follow -type f -print | xargs -P 8 -n 30 wc
> > > >
> > > > This one is a different symptom and situation from the other problem
> > > > I've been talking about. In that problem, you can still talk to the
> > > > cache manager with cmdebug and fs. With this one, the cm is totally
> > > > non-responsive.
> > > >
> > > > -- Nathan
> > > >
> > > > _______________________________________________
> > > > OpenAFS-devel mailing list
> > > > OpenAFS-devel@openafs.org
> > > > https://lists.openafs.org/mailman/listinfo/openafs-devel