[OpenAFS] fileserver on etch may crash because ulimit -s 8192
Jose Calhariz
jose.calhariz@tagus.ist.utl.pt
Wed, 3 Oct 2007 11:31:59 +0100
On Tue, Oct 02, 2007 at 10:58:20PM -0700, Russ Allbery wrote:
> Jose Calhariz <jose.calhariz@tagus.ist.utl.pt> writes:
>
> > Does this mailing list know that on Debian etch the stack, which is
> > limited by default to 8192, may not be enough to run a fileserver?
>
> I'm running ten production AFS file servers for Stanford University on
> Debian etch and have never had a problem with this. Under what
> circumstances did you run into trouble?
>
I have only 2 fileservers, foo and bar :-)
When server bar started having problems, I began moving its volumes
online to server foo. Before the move finished I had to stop server
bar to run fsck on /vicepa, and the fsck failed. While I was trying to
get fsck to succeed on bar, foo did its scheduled restart at 4 am on
Sunday.
On Sunday morning I found both fileservers down, my 3 extra DB/Mail
servers in trouble because the mail server had started too many
processes, and, last but not least, my backup server unable to start
because of the AFS client. It was getting stuck at the launch of afsd.
On the foo server I didn't find any useful error message in
/var/log/openafs apart from the Salvage, which finished with success.
Every time I ran /etc/init.d/openafs-fileserver stop and start, the
foo server went into Salvage, and in the end "vos listvol foo" still
did not work.
Restarting the 3 extra DB/Mail servers solved the problems with the
backup server. After this I tried a hint I found on the Internet:
someone with the same problem I had with the foo server said that
ulimit -s 8192 was not enough and that they would report the bug to
Debian. So I ran "ulimit -s unlimited" in the shell and started the
fileserver one more time. This time, after a successful salvage, the
volumes came online.
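
As a rough sketch of the workaround described above: a limit set with
ulimit applies to the shell and to the processes started from it, so
the fileserver has to be started from the same shell. The helper name
run_with_big_stack is hypothetical, not part of OpenAFS or Debian, and
the init script path is the one used earlier in this message.

```shell
#!/bin/sh
# Sketch only: run a command with the soft stack limit raised, without
# changing the limit of the calling shell. "run_with_big_stack" is a
# hypothetical helper name used for illustration.
run_with_big_stack() {
    (
        # Raise the soft limit inside a subshell. This fails when the
        # hard limit is lower, and then the command is not run at all.
        ulimit -s unlimited || exit 1
        "$@"
    )
}

echo "stack limit of this shell: $(ulimit -s)"
run_with_big_stack sh -c 'echo "stack limit seen by child: $(ulimit -s)"' \
    || echo "could not raise the stack limit"

# On the real server (as root) the call would be, for example:
# run_with_big_stack /etc/init.d/openafs-fileserver start
```

The subshell keeps the change local, so the login shell still has its
original limit afterwards.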
I didn't find any bug report in the BTS or anything in the changelog
about this issue, so I am asking here, as more people could have had
this same issue on other Linux distributions or Unix systems.
Maybe my problem was the 3 extra DB servers having problems, as I
didn't have enough DB servers for quorum: I had maybe 1 or 2 DB
servers out of 5.
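
To make the quorum arithmetic concrete, here is a small sketch that
assumes a simple majority rule. The function has_quorum is a
hypothetical helper, and the real voting rules of the AFS database
servers have additional details that are ignored here.

```shell
#!/bin/sh
# Sketch only: simple-majority check, assuming quorum means "more than
# half of the configured DB servers are reachable". "has_quorum" is a
# hypothetical helper, not an OpenAFS command.
has_quorum() {
    total=$1
    up=$2
    needed=$(( total / 2 + 1 ))
    [ "$up" -ge "$needed" ]
}

for up in 1 2 3; do
    if has_quorum 5 "$up"; then
        echo "$up of 5 up: quorum"
    else
        echo "$up of 5 up: no quorum"
    fi
done
```

Under that assumption, 1 or 2 servers out of 5 is below the 3 needed
for a majority.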
José Calhariz
--
P.S. [En_US] The sig below is from my random sig-generator, which strangely
often seems to pick signatures which are appropriate to the message at
hand!
P.S. [Pt_Pt] A assinatura em baixo é do gerador aleatório de
assinaturas, que estranhamente, escolhe com frequência assinaturas que
parecem apropriadas ao email!
--
Wherever you are, you will always be there!