[OpenAFS] broken callbacks

Mark Vitale mvitale@sinenomine.net
Wed, 20 Apr 2016 18:55:22 +0000


On Apr 20, 2016, at 11:54 AM, Andreas Hirczy <ahi@itp.tugraz.at> wrote:
> In that context 3 questions regarding nFEs and nCBs entered my mind:
>
> - What's the meaning of those?

nFEs is the number of FileEntry table entries currently in use by the fileserver.
One entry is used for each unique file (vnode) the fileserver has given callbacks for.

nCBs is the number of CallBack table entries currently in use by the fileserver.
One entry is used for each callback the fileserver has given out.

nblks is the total size (number of slots) in each table, as set by the fileserver -cb parm.
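
As a rough illustration of how these three values relate, the Python sketch below computes how much room is left in whichever table is fuller. It assumes the counters have already been extracted from xstat_fs_test output by some other means; the function name and the sample numbers are hypothetical and not taken from a real fileserver.

```python
def table_headroom(n_fes, n_cbs, nblks):
    """Return the free fraction of whichever table (FE or CB) is fuller."""
    if nblks <= 0:
        raise ValueError("nblks must be positive")
    # Both tables have nblks slots, so the fuller one sets the limit.
    used = max(n_fes, n_cbs)
    return 1.0 - used / nblks


# Hypothetical sample values, for illustration only.
sample = {"nFEs": 400000, "nCBs": 786432, "nblks": 1048576}
print(table_headroom(sample["nFEs"], sample["nCBs"], sample["nblks"]))
```

A value approaching zero means one of the two tables is close to nblks.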

> - I found some information about nFEs and nCBs in
>   <https://www.openafs.org/pages/newsletter/newsletter-2013-03-volume004-issue05.html#openafs_tuning__part_i__fileservers__general>:
>   "If nFEs or nCBs ever exceeds nblks, that is when the fileserver runs
>   out of callbacks." I found that those metrics have a similar
>   behaviour, but are usually not the same.

This is correct.

>   Should I consider to store just "max(nFEs, nCBs)" or can I learn
>   something from this difference?
You could store both; often a single file may have multiple callbacks.
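
One thing the difference can tell you is the average number of callbacks per file. Since nFEs counts files that have callbacks and nCBs counts the callbacks themselves, their ratio can be sketched as follows. The function name and the sample numbers are hypothetical.

```python
def callbacks_per_file(n_fes, n_cbs):
    """Return the average number of callbacks per file that has any."""
    if n_fes == 0:
        # No file entries in use, so there is nothing to average over.
        return 0.0
    return n_cbs / n_fes


# Hypothetical sample values, for illustration only.
print(callbacks_per_file(400000, 600000))
```

A ratio well above 1 would suggest that many clients hold callbacks on the same files.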

> - Sometimes I see a spike in the usage of those values, e.g.
>   <https://itp.tugraz.at/~ahi/privat/OpenAFS/graph_nCBs_2016-04-20.png>
>   I can find out about volumes if I turn the debug level of the
>   fileserver processes up 3 times, but this uses quite a bit of space
>   to leave turned on permanently. Is there some easily accessible data on
>   the (historic) distribution of callbacks across volumes?

You could use the fileserver -auditlog to track accesses by fid (which includes the volume id).

Or for grand totals, you can continue to store the information you are getting from xstat_fs_test.

>   I'd rather prefer not to increase -cb, since this seems to be not our
>   usual usage pattern.
>   ----
>   Wed Apr 20 12:59:57 2016 We have run out of callback space; forcing
>                            callback revocation. This suggests the
>                            fileserver is configured with insufficient
>                            callbacks; you probably want to increase the
>                            -cb fileserver parameter (current setting:
>                            1048576). The fileserver will continue to
>                            operate, but this may indicate a severe
>                            performance problem
The GetSomeSpace_r routine issues this log message; it is only called when
either the CB or FE freelist is exhausted (that is, the count has reached nblks).
This routine also increments the GotSomeSpaces counter, which is in the same
xstat_fs_test collection as nCBs, nFEs, and nblks.
If you are running close to the edge on callbacks, you should definitely track
that counter, and ideally you want it to always remain zero.
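
A minimal Python sketch of tracking that counter across successive polls is shown below. It assumes the GotSomeSpaces values have already been collected into a list by whatever polls xstat_fs_test, and it assumes that a decreasing value means the fileserver restarted and the counter began again from zero. The function names and the sample readings are hypothetical.

```python
def got_some_spaces_delta(previous, current):
    """Return how many GetSomeSpace_r calls happened between two samples."""
    if current < previous:
        # Assumption: a smaller value means the fileserver restarted and
        # the counter started again from zero.
        return current
    return current - previous


def exhaustion_events(samples):
    """Return (index, delta) for every sample where the counter grew."""
    events = []
    for i in range(1, len(samples)):
        delta = got_some_spaces_delta(samples[i - 1], samples[i])
        if delta > 0:
            events.append((i, delta))
    return events


# Hypothetical GotSomeSpaces readings from successive polls.
readings = [0, 0, 2, 2, 5]
print(exhaustion_events(readings))
```

Any non-empty result means the fileserver ran out of callback space at least once during that interval.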

Regards,
--
Mark Vitale
Sine Nomine Associates