[OpenAFS] fileserver on etch may crash because ulimit -s 8192

Jose Calhariz jose.calhariz@tagus.ist.utl.pt
Wed, 3 Oct 2007 11:31:59 +0100


On Tue, Oct 02, 2007 at 10:58:20PM -0700, Russ Allbery wrote:
> Jose Calhariz <jose.calhariz@tagus.ist.utl.pt> writes:
>
> > Does this mailing list know that on Debian etch, where the stack is
> > limited by default to 8192, that may not be enough to run a fileserver?
>
> I'm running ten production AFS file servers for Stanford University on
> Debian etch and have never had a problem with this.  Under what
> circumstances did you run into trouble?
>

I have only 2 fileservers, foo and bar :-)

When server bar started having problems, I began moving its volumes
online to server foo.  Before the move finished, I had to stop server
bar to run an fsck on /vicepa, which failed.  While I was trying to
get fsck to succeed on bar, foo did its scheduled restart at 4 am on
Sunday.

On Sunday morning I found both fileservers down, my 3 extra DB/Mail
servers in trouble because the mail server had started too many
processes, and, last but not least, my backup server unable to start
because of the AFS client.  It was stopping at the launch of afsd.

On the foo server I didn't find any useful error message in
/var/log/openafs apart from the Salvage, which finished successfully.
Whenever I ran /etc/init.d/openafs-fileserver stop and then start, the
foo server went into Salvage, and in the end I still couldn't get a
"vos listvol foo" to work.

Restarting the 3 extra DB/Mail servers solved the problems with the
backup server.  After this I tried a hint I found on the Internet:
someone with the same problem I had with the foo server said that
ulimit -s 8192 was not enough and that they would file a bug report
with Debian.  So I ran "ulimit -s unlimited" in the shell and started
the fileserver one more time.  This time, after a successful salvage,
I had the volumes online.
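
A minimal sketch of the workaround described above, assuming a shell
such as bash or dash where "ulimit -s" reports the soft stack limit in
KB.  The MIN_STACK_KB threshold and the stack_limit_above helper are
made up for illustration; OpenAFS does not document a required value
as far as I know.

```shell
#!/bin/sh
# Sketch only: check the soft stack limit before starting the fileserver.
# MIN_STACK_KB is a hypothetical threshold, not a value documented by OpenAFS.
MIN_STACK_KB=8192

# Succeeds when limit $1 (KB or "unlimited") is strictly above threshold $2 (KB).
stack_limit_above() {
    [ "$1" = "unlimited" ] && return 0
    [ "$1" -gt "$2" ]
}

current=$(ulimit -s)
if stack_limit_above "$current" "$MIN_STACK_KB"; then
    echo "stack limit is $current"
else
    echo "stack limit $current is at or below ${MIN_STACK_KB} KB; raising it for this shell"
    # Raising the soft limit fails if the hard limit is lower; report rather than abort.
    ulimit -s unlimited 2>/dev/null || echo "could not raise the stack limit"
fi

# Then, in the same shell (as root), start the fileserver:
# /etc/init.d/openafs-fileserver start
```

The limit only applies to the current shell and its children, so the
start command has to be run from the same shell that raised it.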

I didn't find any bug report in the BTS or anything in the changelog
about this issue, so I am asking here, as more people could have had
this same issue on other Linux distributions or Unix systems.

Maybe my problem was the 3 extra DB servers having problems, as I
didn't have enough DB servers for quorum: I had maybe 1 or 2 DB servers
out of 5.
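
To make the quorum arithmetic concrete, here is a small sketch that
assumes a plain "more than half" majority rule and ignores any
tie-breaking the DB servers may apply.  The majority_needed and
has_majority helpers are hypothetical names used only for this
example.

```shell
#!/bin/sh
# Sketch only: simple-majority arithmetic, ignoring any tie-breaking rules.

# Prints the smallest number of servers that is more than half of $1.
majority_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds when $1 reachable servers out of $2 configured form a majority.
has_majority() {
    [ "$1" -ge "$(majority_needed "$2")" ]
}

total=5
for up in 1 2 3; do
    if has_majority "$up" "$total"; then
        echo "$up of $total: majority"
    else
        echo "$up of $total: no majority"
    fi
done
```

Under that assumption, 1 or 2 servers out of 5 fall short of a
majority and 3 are needed.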

     José Calhariz



-- 
P.S. [En_US] The sig below is from my random sig-generator, which strangely
often seems to pick signatures which are appropriate to the message at
hand!

P.S. [Pt_Pt] A assinatura em baixo é do gerador aleatório de
assinaturas, que estranhamente, escolhe com frequência assinaturas que
parecem apropriadas ao email!
--
Wherever you are, you will always be there!
