1.2.9 unstable ? (was [OpenAFS] inaccessible volume - please help)

Rudolph T Maceyko rtm@cert.org
Thu, 22 May 2003 13:34:02 -0400


--On Wednesday, May 21, 2003 23:56:08 -0400 Derrick J Brashear 
<shadow@dementia.org> wrote:

>> FTR, your "umount.diff" patch has fixed the problems we were seeing
>> with shutdown hanging.  We saw it mostly under Red Hat 7.3, but have
>> also applied the patch to Red Hat 9 boxes.  (We aren't running any 8
>> boxes.)
>
> Perchance do you know if "umount /afs" succeeded or failed?
> If it failed, afsd -shutdown would return EACCES and never trigger the
> code path which I believe is suspect.

I'm going back over the console logs now to see whether there have been 
any shutdown/umount problems without hangs.

/afs busy + stock 1.2.9 = hung system at shutdown:

  Stopping AFS services.....
  umount: /afs: device is busy
  libafs-2.4.18-27.7.x-i686: Device or resource busy
    .
    .
    .
  Shutting down interface eth0:  [  OK  ]
  Shutting down loopback interface:  [  OK  ]
  Starting killall:  [  OK  ]
  Sending all processes the TERM signal...
  Sending all processes the KILL smd: recovery thread got woken up ...
  ignal... md: recovery thread finished ...

  Syncing hardware clock to system time afs: Lost contact with file server a.b.c.14 in cell cert.org (all multi-homed ip addresses down for the server)
  afs: Lost contact with file server a.b.c.14 in cell cert.org (all multi-homed ip addresses down for the server)

  Turning off swap:
  Turning off quotas:
  Unmounting file systems:  umount2: Device or resource busy
  umount: AFS: not found
  umount: /afs: Illegal seek

  afs: Lost contact with file server a.b.c.15 in cell cert.org (all multi-homed ip addresses down for the server)
  afs: Lost contact with file server a.b.c.15 in cell cert.org (all multi-homed ip addresses down for the server)
  afs: Lost contact with volume location server a.b.c.11 in cell cert.org
  afs: Lost contact with volume location server a.b.c.11 in cell cert.org
  afs: Lost contact with volume location server a.b.c.13 in cell cert.org
  afs: Lost contact with volume location server a.b.c.13 in cell cert.org
  afs: Lost contact with volume location server a.b.c.12 in cell cert.org
  afs: Lost contact with volume location server a.b.c.12 in cell cert.org
  Unmounting file systems (retry):  WARM shutting down of: CB... afs... BkG... CTrunc... AFSDB... RxEvent... RxListener...
  (system hung at this point)

/afs busy + 1.2.9 patched with umount.diff = system not hung at shutdown:

  Stopping AFS services.....
  umount: /afs: device is busy
  libafs-2.4.20-13.7-i686: Device or resource busy
    .
    .
    .
  Shutting down interface eth0:  [  OK  ]
  Shutting down loopback interface:  [  OK  ]
  Starting killall:  [  OK  ]
  Sending all processes the TERM signal...
  Sending all processes the KILL smd: recovery thread got woken up ...
  ignal...
  Syncing hardware clock to system time
  Turning off swap:
  Turning off quotas:
  Unmounting file systems:  afs_cacheDp 1 at stop

  Please stand by while rebooting the system...
  flushing ide devices: hdc
  Restarting system.

Normal shutdown:

  Stopping AFS services.....
  WARM shutting down of: CB... afs... BkG... CTrunc... AFSDB... RxEvent... RxListener...
  afs_cacheDp 1 at stop
    .
    .
    .
  Shutting down interface eth0:  [  OK  ]
  Shutting down loopback interface:  [  OK  ]
  Starting killall:  [  OK  ]
  Sending all processes the TERM signal...
  Sending all processes the KILL smd: recovery thread got woken up ...
  ignal...
  Syncing hardware clock to system time
  Turning off swap:
  Turning off quotas:
  Unmounting file systems:
  Please stand by while rebooting the system...
  flushing ide devices: hdc
  Restarting system.
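
For anyone sorting through their own console logs, here is a minimal 
sketch of how the three cases above could be told apart automatically. 
It is only an illustration: the function name classify_shutdown and the 
three labels are made up for this example, and it assumes each log 
covers exactly one shutdown through to its last line. A log that was 
simply cut off early would be reported as hung, so treat the result as a 
hint, not a diagnosis.

```python
# Marker strings copied from the console logs above.
UMOUNT_BUSY = "umount: /afs: device is busy"
REBOOT_REACHED = "Restarting system."


def classify_shutdown(log_text):
    """Label one shutdown console log (hypothetical helper, not part of OpenAFS).

    Returns one of:
      "hung"                  - the log never reaches "Restarting system."
      "umount-failed-no-hang" - /afs was busy, but the system still rebooted
      "clean"                 - no umount failure and the system rebooted
    """
    umount_failed = UMOUNT_BUSY in log_text
    rebooted = REBOOT_REACHED in log_text

    if not rebooted:
        # Assumption: a log that ends before the reboot line means a hang.
        return "hung"
    if umount_failed:
        return "umount-failed-no-hang"
    return "clean"


if __name__ == "__main__":
    import sys

    # Usage: python classify.py console1.log console2.log ...
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            print(path, classify_shutdown(handle.read()))
```

Matching on short fixed strings is deliberate: the console interleaves 
kernel messages with init script output, so anything that depends on 
whole lines being intact would miss cases.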

Rudy