[OpenAFS-devel] delays and lost contact with fileserver with 1.3.84 and higher

Tom Keiser tkeiser@gmail.com
Tue, 1 Nov 2005 14:13:50 -0400


On 10/31/05, Alexander Bergolth <leo@strike.wu-wien.ac.at> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 10/31/2005 02:17 PM, Jeffrey Altman wrote:
> > Alexander Bergolth wrote:
> >>On 10/31/2005 10:11 AM, Harald Barth wrote:
> >>>>Chas wrote to me, but I think that's more useful for you.
> >>>>
> >>>>>>I'd like to hear more about the changes to rx that were made between
> >>>>>>82 and 84, what was the intended outcome?
> >>>>>
> >>>>>there does seem to be one set of changes that is outside the scope of the
> >>>>>rx packet queue changes.  it's a long shot, but you could revert it and see
> >>>>>if that helps.
> >>>>>
> >>>>>http://www.openafs.org/cgi-bin/cvsweb.cgi/openafs/src/rx/rx.c.diff?r1=1.22.2.30&r2=1.22.2.31
> >>>>>
> >>>>>   DELTA STABLE12-rx-makecall-race-fix-20050518 AUTHOR
> >
> > This patch is most certainly not related to your problems.  The patch
> > removes a race condition that allowed rx threads to sleep forever.
>
> It is this patch that causes the stalls on my system:
>
> http://www.openafs.org/cgi-bin/cvsweb.cgi/openafs/src/rx/rx.c#rev1.58.2.19
>
> - -------------------- snipp! --------------------
> DELTA STABLE14-rx-fpq-bulk-free-20050529
> AUTHOR tkeiser@psu.edu
> FIXES 19027
>
> After profiling RX for a while, I've found a few more bottlenecks in the
> packet handling code.  This patch addresses a couple of these issues.
> The major change in this patch is a new API to allow bulk packet
> alloc/free ops on rx_queue's of packets.  Benefits include reduced lock
> contention on rx_freePktQ_lock, elimination of a lot of unnecessary cache
> line invalidates, and reduced register window thrashing on sparc.
>
> In addition, this patch dedicates one rx_packet per thread to rxi_SendAck,
> since that function is in the critical path, and represents a large
> percentage of execution time.
> - -------------------- snipp! --------------------
>
> http://www.openafs.org/cgi-bin/cvsweb.cgi/openafs/src/rx/rx.c.diff?r1=1.58.2.18&r2=1.58.2.19
>
>
> I reverted it in 1.3.84 and now the delays are gone.
>

Could you try the attached patch on a pristine >= 1.3.84 tree?  I
can't reproduce your bug, but if you're hitting this code path, then
the math error this patch fixes may help your problem.
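
To illustrate the math error, here is a minimal standalone sketch. It is
not OpenAFS code: the helper names (first_extra, extras_before_patch,
extras_after_patch, extras_by_walking, checked_count) are hypothetical,
and it assumes the starting index does not exceed niovecs. It only shows
that "j - niovecs" has the opposite sign from the number of elements the
loop visits, and that a negative count is nonzero, so it would skip the
"if (!num_pkts)" recount that the bulk free routines perform.

```c
#include <assert.h>

/* Hypothetical stand-in for MAX(2, i): index of the first extra element. */
static int first_extra(int i)
{
    return i > 2 ? i : 2;
}

/* Operand order before the patch: start minus end. */
static int extras_before_patch(int i, int niovecs)
{
    return first_extra(i) - niovecs;
}

/* Operand order after the patch: end minus start. */
static int extras_after_patch(int i, int niovecs)
{
    return niovecs - first_extra(i);
}

/* Ground truth: count the iterations of the same loop bounds. */
static int extras_by_walking(int i, int niovecs)
{
    int n = 0;
    for (int j = first_extra(i); j < niovecs; j++)
        n++;
    return n;
}

/*
 * Illustrative model of a bulk-free entry point where a count of 0
 * means "count the queue yourself". The assert mirrors the
 * osi_Assert(num_pkts >= 0) that the patch adds; without it, a
 * negative count would be passed through as if it were valid.
 */
static int checked_count(int num_pkts, int queue_len)
{
    assert(num_pkts >= 0);
    return num_pkts ? num_pkts : queue_len;
}
```

With i = 3 and niovecs = 6, the loop visits three elements; the
post-patch expression agrees with that, while the pre-patch expression
yields the same magnitude with a negative sign.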

Regards,

--
Tom Keiser
tkeiser@gmail.com

[Attachment: rx-allocator-fix.patch.txt]

diff -uNr openafs-cvs-current/src/rx/rx_packet.c openafs-cvs-current-rx-allocator-fixes/src/rx/rx_packet.c
--- openafs-cvs-current/src/rx/rx_packet.c	2005-10-31 16:38:33.000000000 -0500
+++ openafs-cvs-current-rx-allocator-fixes/src/rx/rx_packet.c	2005-10-31 16:41:01.848215088 -0500
@@ -381,6 +381,8 @@
     register struct rx_packet *c, *nc;
     SPLVAR;
 
+    osi_Assert(num_pkts >= 0);
+
     if (!num_pkts) {
 	queue_Count(q, c, nc, rx_packet, num_pkts);
 	if (!num_pkts)
@@ -413,6 +415,8 @@
     register struct rx_packet *p, *np;
     SPLVAR;
 
+    osi_Assert(num_pkts >= 0);
+
     if (!num_pkts) {
         for (queue_Scan(q, p, np, rx_packet), num_pkts++) {
             RX_FPQ_MARK_FREE(p);
@@ -2548,7 +2552,7 @@
 	queue_Init(&q);
 
 	/* Free any extra elements in the wirevec */
-	for (j = MAX(2, i), nb = j - p->niovecs; j < p->niovecs; j++) {
+	for (j = MAX(2, i), nb = p->niovecs - j; j < p->niovecs; j++) {
 	    queue_Append(&q,RX_CBUF_TO_PACKET(p->wirevec[j].iov_base, p));
 	}
 	if (nb)
