[OpenAFS] broken callbacks
Mark Vitale
mvitale@sinenomine.net
Wed, 20 Apr 2016 18:55:22 +0000
On Apr 20, 2016, at 11:54 AM, Andreas Hirczy <ahi@itp.tugraz.at> wrote:
> In that context 3 questions regarding nFEs and nCBs entered my mind:
>
> - What's the meaning of those?
nFEs is the number of FileEntry table entries currently in use by the fileserver.
One entry is used for each unique file (vnode) the fileserver has given callbacks for.
nCBs is the number of CallBack table entries currently in use by the fileserver.
One entry is used for each callback the fileserver has given out.
nblks is the total size (number of slots) in each table, as set by the fileserver -cb parameter.
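
To make the relationship between the three values concrete, here is a minimal Python sketch. It assumes you have already extracted nFEs, nCBs, and nblks as integers from your monitoring data; the function name and the keys of the returned dict are made up for illustration and are not part of OpenAFS.

```python
def table_usage(n_fes: int, n_cbs: int, n_blks: int) -> dict:
    """Return how full each callback table is, as a fraction of nblks.

    n_fes, n_cbs and n_blks are the nFEs, nCBs and nblks values
    (hypothetical parameter names; obtaining them is left to the caller).
    """
    if n_blks <= 0:
        raise ValueError("nblks must be positive")
    return {
        # Both tables have nblks slots, so each is compared to nblks separately.
        "fe_used": n_fes / n_blks,
        "cb_used": n_cbs / n_blks,
        # Average number of callbacks per file that has any callback.
        "cbs_per_fe": n_cbs / n_fes if n_fes else 0.0,
    }
```

For example, table_usage(262144, 524288, 1048576) reports the FE table as a quarter full, the CB table as half full, and two callbacks per file on average.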
> - I found some information about nFEs and nCBs in
> <https://www.openafs.org/pages/newsletter/newsletter-2013-03-volume004-issue05.html#openafs_tuning__part_i__fileservers__general>:
> "If nFEs or nCBs ever exceeds nblks, that is when the fileserver runs
> out of callbacks." I found that those metrics have a similar
> behaviour, but are usually not the same.
This is correct.
> Should I consider to store just "max(nFEs, nCBs)" or can I learn
> something from this difference?
You could store both; often a single file may have multiple callbacks.
> - Sometimes I see a spike in the usage of those values, e.g.
> <https://itp.tugraz.at/~ahi/privat/OpenAFS/graph_nCBs_2016-04-20.png>
> I can find out about volumes if I turn the debug level of the
> fileserver processes up 3 times, but this uses quite a bit of space
> to leave turned on permanently. Is there some easy accessible data on
> the (historic) distribution of callback on volumes?
You could use the fileserver -auditlog to track accesses by fid (which includes the volume id).
Or for grand totals, you can continue to store the information you are getting from xstat_fs_test.
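
As a rough illustration of the per-volume idea, the Python sketch below tallies accesses by volume id. It assumes you have already pulled the fids out of the audit log as (volume, vnode, unique) integer tuples; the audit log parsing itself is deliberately left out, and the function name is hypothetical. Note that it counts accesses, which is only a proxy for how callbacks are distributed across volumes.

```python
from collections import Counter


def accesses_per_volume(fids):
    """Count accesses per volume id.

    fids is an iterable of (volume_id, vnode, unique) tuples that the
    caller has already extracted from the audit log (an assumption of
    this sketch; no log format is parsed here).
    """
    counts = Counter()
    for volume_id, _vnode, _unique in fids:
        counts[volume_id] += 1
    return counts
```

Calling accesses_per_volume(...).most_common(10) would then list the ten busiest volumes for the period covered by the extracted fids.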
> I'd rather prefer not to increase -cb, since this seems to be not our
> usual usage pattern.
> ----
> Wed Apr 20 12:59:57 2016 We have run out of callback space; forcing
> callback revocation. This suggests the
> fileserver is configured with insufficient
> callbacks; you probably want to increase the
> -cb fileserver parameter (current setting:
> 1048576). The fileserver will continue to
> operate, but this may indicate a severe
> performance problem
The GetSomeSpace_r routine issues this log message; it is only called when either the CB or FE freelist is exhausted (that is, the count has reached nblks). This routine also increments the GotSomeSpaces counter, which is in the same xstat_fs_test collection as nCBs, nFEs, and nblks. If you are running close to the edge on callbacks, you should definitely track that counter, and ideally you want it to always remain zero.
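
A monitoring check along these lines could look like the following Python sketch. It assumes the four values have already been collected as integers; the function name and the 0.9 warning threshold are arbitrary choices for illustration, not OpenAFS defaults.

```python
def callback_alerts(n_fes, n_cbs, n_blks, got_some_spaces, warn_fraction=0.9):
    """Return a list of warning strings for the callback tables.

    warn_fraction is a hypothetical threshold: warn once either table
    is at least this fraction of nblks.
    """
    alerts = []
    # Any nonzero GotSomeSpaces means a freelist was exhausted at least once.
    if got_some_spaces > 0:
        alerts.append(f"GotSomeSpaces is {got_some_spaces}; expected 0")
    # Either table running out triggers forced callback revocation.
    for name, used in (("nFEs", n_fes), ("nCBs", n_cbs)):
        if used >= warn_fraction * n_blks:
            alerts.append(f"{name} is at {used} of {n_blks} slots")
    return alerts
```

An empty list means there is headroom in both tables and no forced revocation has been counted.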
Regards,
--
Mark Vitale
Sine Nomine Associates