[OpenAFS-devel] Retry transaction creates on transient problems
Rainer Toebbicke
rtb@pclella.cern.ch
Thu, 16 Apr 2009 17:11:59 +0200
--------------040707020001050208090405
Content-Type: text/plain; charset="ISO-8859-15"; format=flowed
Content-Transfer-Encoding: 7bit
The attached patch causes a transient failure to create a volume transaction
to be retried, brutally three times in 1 sec intervals.
The problem usually only affects servers with ten-thousands of volumes, where
a simple "vos listvol" could easily disturb a simultaneous "vos backupsys", or
one out of two simultaneous "vos listvols" could print thousands of error
messages depending on how they race.
Note: The patch "undoes" another patch (and its correction) in that area -
that's not elegant but ok as it predates that patch and fixes both problems in
one go, for the first one in a slightly different manner.
bcc'ed to openafs-bugs
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985 Fax: +41 22 767 7155
--------------040707020001050208090405
Content-Type: text/plain; name="p_voltrans_retry"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="p_voltrans_retry"
Index: openafs/src/volser/voltrans.c
===================================================================
RCS file: /cvs/openafs/src/volser/voltrans.c,v
retrieving revision 1.10.2.5
diff -u -r1.10.2.5 voltrans.c
--- openafs/src/volser/voltrans.c 18 Oct 2008 19:26:17 -0000 1.10.2.5
+++ openafs/src/volser/voltrans.c 16 Apr 2009 14:21:37 -0000
@@ -75,21 +75,31 @@
NewTrans(afs_int32 avol, afs_int32 apart)
{
/* set volid, next, partition */
- struct volser_trans *tt, *newtt;
+ struct volser_trans *tt;
struct timeval tp;
struct timezone tzp;
+ int retries = 3;
- newtt = (struct volser_trans *)malloc(sizeof(struct volser_trans));
VTRANS_LOCK;
/* don't allow the same volume to be attached twice */
- for (tt = allTrans; tt; tt = tt->next) {
- if ((tt->volid == avol) && (tt->partition == apart)) {
- VTRANS_UNLOCK;
- free(newtt);
- return (struct volser_trans *)0; /* volume busy */
+ while (1) {
+ for (tt = allTrans; tt; tt = tt->next) {
+ if ((tt->volid == avol) && (tt->partition == apart)) break;
}
+ if (tt == NULL) break;
+
+ VTRANS_UNLOCK;
+ if (retries-- > 0)
+#ifdef AFS_PTHREAD_ENV
+ sleep(1); /* Allow for short lock-ups */
+#else
+ IOMGR_Sleep(1); /* Allow for short lock-ups */
+#endif /*AFS_PTHREAD_ENV*/
+ else
+ return (struct volser_trans *)0; /* volume busy */
+ VTRANS_LOCK;
}
- tt = newtt;
+ tt = (struct volser_trans *)malloc(sizeof(struct volser_trans));
memset(tt, 0, sizeof(struct volser_trans));
tt->volid = avol;
tt->partition = apart;
--------------040707020001050208090405--