[OpenAFS] fileserver on etch may crash because of ulimit -s 8192
Jose Calhariz
jose.calhariz@tagus.ist.utl.pt
Thu, 4 Oct 2007 03:19:42 +0100
On Wed, Oct 03, 2007 at 05:43:06PM -0700, Russ Allbery wrote:
> Jose Calhariz <jose.calhariz@tagus.ist.utl.pt> writes:
>
> > I had an error message from ReiserFS on Thursday night, but the
> > corruption spread, and more and more volumes went offline.
> > So I stopped it to run an fsck. That's when the fsck failed. I didn't
> > stop the fileserver on Friday because it was during production hours.
> > Maybe that was my fatal mistake.
>
> Ah, okay.
>
> I definitely recommend against using ReiserFS for any production purposes
> (completely apart from whether you use AFS or not).
I don't know what happened. I have only two leads: an I/O error
message from ReiserFS at the beginning of everything and, after the
loss, some strange behavior I found in the hardware RAID5. I need to
investigate further.
Most importantly, I learned that I don't know enough about ReiserFS
internals, so I really don't understand the error messages from
reiserfsck. I will move to ext3, which I know very well, or to XFS,
since I have a local expert who can help in case of trouble with XFS.
I remember seeing an online presentation from an AFS workshop where XFS
was considered better than ext3 for /vicep partitions.
>
> > I may be wrong, but I need to use my root.afs. I need a link in /afs as
> > a shortcut for my cell name, so I can't use -dynroot on some clients.
> > Correct me if I am wrong.
>
> This is what the CellAlias configuration file is for. It's hard to tell
> exactly why the client didn't work; it doesn't sound like you have much
> information about what failed or what could have been happening.
Thank you. I didn't know about that file.
>
> > I am speaking from memory, as I didn't save the log files. I saw
> > exit messages with various numbers: 0, 1, and maybe 15. There was no
> > core file; how do I enable core files?
>
> Make sure that you don't have the core file size limited when you start
> the file server, and they should happen automatically if the file server
> actually fails.
OK, I have "ulimit -c 0" set by default. I haven't depended on core
files for so many years that I forgot about it. I'm a sysadmin now, not
a programmer. I only program in bash, and I install gdb for other people
to use, not for myself :-)
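A minimal sketch of what Russ suggests, assuming Python 3.9+ on Linux: raise the soft core-file limit (the value `ulimit -c` controls) up to the hard limit in the process that starts the server, since resource limits are inherited by child processes. In a typical OpenAFS setup the fileserver is started by bosserver, so the limit would have to be in effect wherever bosserver itself is launched. The helper names `raise_core_limit` and `fmt` are hypothetical.

```python
import resource


def raise_core_limit() -> tuple[int, int]:
    """Raise the soft RLIMIT_CORE to the hard limit (like `ulimit -c <hard>`).

    An unprivileged process may raise its soft limit only as far as the
    hard limit, so the hard limit is left untouched. Child processes
    started after this call inherit the new limit.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def fmt(limit: int) -> str:
    # RLIM_INFINITY is how "unlimited" is reported.
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = raise_core_limit()
    print(f"core file size: soft={fmt(soft)} hard={fmt(hard)}")
```

The shell equivalent is running `ulimit -c unlimited` (if the hard limit allows it) before starting the daemon. Whether a core file actually appears also depends on the kernel's core-dump settings and on the process's working directory being writable.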
> But if you don't have any exit status other than 0, 1,
> and 15, the file server isn't failing. Which again raises the question of
> what the problem actually is.
>
> If the file server is not exiting with any status other than those three,
> I'm 99% certain that the stack limit is not an issue for you. What I
> would expect, were it to run into a stack limit, would be a bus error or
> segfault.
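Russ's reasoning rests on the difference between a program choosing its own exit status and the program being killed by a signal such as SIGSEGV or SIGBUS. A small sketch, assuming Python's `subprocess` convention, where a negative return code -N means the child died from signal N (in bash the same case shows up in `$?` as 128+N). `describe_exit` is a hypothetical helper, not part of OpenAFS.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Describe a subprocess.Popen/run return code in plain words.

    Negative values (-N) mean the child was terminated by signal N;
    non-negative values are ordinary exit statuses set by the program.
    """
    if returncode < 0:
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:  # number with no symbolic name on this platform
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Synthetic return codes: a clean exit, an error exit, and the two
    # kinds of crash Russ would expect from hitting a stack limit.
    for code in (0, 1, -signal.SIGSEGV, -signal.SIGBUS):
        print(describe_exit(int(code)))
```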
I have restarted my fileserver, and there was no problem this time with
"ulimit -s 8192", so I think you are right. My last three VLDB servers
were in trouble that day and were causing more problems everywhere. The
salvage was taking 40 minutes, so I had time to solve the other problems
before putting all my effort into the last one: the failing file server.
Thank you for your help on this issue.
--
P.S. [En_US] The sig below is from my random sig generator, which
strangely often seems to pick signatures that are appropriate to the
message at hand!
--
The advantage of being a millionaire is being able to say whatever you
want, to whomever you want, however you want.
--Prince Johannes von Thurn und