[OpenAFS] Weird volserver problem

Derrick J Brashear shadow@dementia.org
Sat, 28 Jul 2007 17:18:43 -0400 (EDT)


You probably want the volserver clone locking patch, which went in after
1.4.4 (I'm guessing it's in src/vol/clone.c).

On Sat, 28 Jul 2007, Brian Sebby wrote:

> We're having a strange problem that just started happening this afternoon
> on one of our fileservers that appears to be related to the volserver.
>
> We have a number of jobs that perform vos release commands, and today we
> started getting error messages from them indicating that they were timing
> out, etc.  Trying to run various "vos" commands takes forever, and although
> they eventually return the information, they sit there for several minutes
> before they succeed.
>
> I'm seeing a number of messages like this in the VolserLog file:
>
> Sat Jul 28 16:02:11 2007 trans 60 on volume 1818569609 has been idle for more than 570 seconds
> Sat Jul 28 16:02:11 2007 trans 55 on volume 1818569660 has been idle for more than 600 seconds
> Sat Jul 28 16:02:11 2007 trans 55 on volume 1818569660 has timed out
> Sat Jul 28 16:02:41 2007 trans 60 on volume 1818569609 has been idle for more than 600 seconds
> Sat Jul 28 16:02:41 2007 trans 60 on volume 1818569609 has timed out
>
> and
>
> Sat Jul 28 15:59:41 2007 1 Volser: DumpVolume: Rx call failed during dump, error -01
> Sat Jul 28 15:59:41 2007 1 Volser: DumpVolume: Rx call failed during dump, error -01
> Sat Jul 28 15:59:41 2007 1 Volser: DumpVolume: Rx call failed during dump, error -01
> Sat Jul 28 15:59:41 2007 1 Volser: DumpVolume: Rx call failed during dump, error -01
> Sat Jul 28 15:59:41 2007 1 Volser: DumpVolume: Rx call failed during dump, error -01
> Sat Jul 28 15:59:41 2007 1 Volser: DumpVolume: Rx call failed during dump, error -01
> Sat Jul 28 15:59:41 2007 1 Volser: DumpVolume: Rx call failed during dump, error -01
> Sat Jul 28 15:59:41 2007 1 Volser: DumpVolume: Rx call failed during dump, error -01
>
> These volumes are on SAN storage, using ZFS as the backend fileserver.
> We're running the 1.4.4 namei fileserver on Solaris with the -nofsync patch.
>
> Here are the bos parameters we're using:
>
> Instance fs, (type is fs) currently running normally.
>    Auxiliary status is: file server running.
>    Process last started at Sat Jul 28 15:50:38 2007 (3 proc starts)
>    Last exit at Sat Jul 28 15:50:38 2007
>    Command 1 is '/usr/afs/bin/fileserver -nojumbo -nofsync'
>    Command 2 is '/usr/afs/bin/volserver -nojumbo -nofsync'
>    Command 3 is '/usr/afs/bin/salvager'
>
> Any help would be greatly appreciated.
>
>
> Brian
>
> -- 
> Brian Sebby  (sebby@anl.gov)  |  Unix and Operation Services
> Phone: +1 630.252.9935        |  Computing and Information Systems
> Fax:   +1 630.252.4601        |  Argonne National Laboratory
>
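
For anyone chasing a similar problem, it can help to see which volumes the
stuck transactions belong to. The following is only a rough sketch, not part
of OpenAFS: it assumes the VolserLog line format is exactly what is quoted
above ("trans N on volume V has been idle for more than S seconds" and
"... has timed out"), and the function name summarize_idle_transactions is
made up for illustration.

```python
import re
import sys
from collections import defaultdict

# Assumed line format, taken from the VolserLog excerpt quoted in this thread:
#   "... trans 55 on volume 1818569660 has been idle for more than 600 seconds"
#   "... trans 55 on volume 1818569660 has timed out"
TRANS_RE = re.compile(
    r"trans (?P<trans>\d+) on volume (?P<volume>\d+) has "
    r"(?:been idle for more than (?P<idle>\d+) seconds|timed out)"
)


def summarize_idle_transactions(lines):
    """Hypothetical helper: map volume ID -> longest idle report and timeout flag."""
    summary = defaultdict(lambda: {"max_idle": 0, "timed_out": False})
    for line in lines:
        match = TRANS_RE.search(line)
        if match is None:
            continue  # not an idle/timeout message (e.g. a DumpVolume error)
        entry = summary[int(match.group("volume"))]
        if match.group("idle") is not None:
            entry["max_idle"] = max(entry["max_idle"], int(match.group("idle")))
        else:
            entry["timed_out"] = True
    return dict(summary)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python volser_idle.py <path to your VolserLog>
    with open(sys.argv[1]) as log:
        for volume, info in sorted(summarize_idle_transactions(log).items()):
            print(volume, info["max_idle"], "timed out" if info["timed_out"] else "idle")
```

Feeding the resulting volume IDs to "vos examine" should show whether they are
the volumes your release jobs were working on.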