[OpenAFS] file-server: salvaging

Klaas Hagemann kerberos@northsailor.de
Mon, 27 Jan 2003 12:30:56 +0100


Klaas Hagemann wrote:
> Klaas Hagemann wrote:
> 
>> Derrick J Brashear wrote:
>>
>>> On Fri, 24 Jan 2003, Klaas Hagemann wrote:
>>>
>>>
>>>> Salvaging starts very often, and the file server processes keep
>>>> running but no longer have the bosserver as their parent (ppid).
>>>
>>> either the bosserver is dying and something is restarting it (doubt it),
>>> or more likely the "main" pthread is dying but the rest stay running.
>>> strace output, core.(pid) or logs might be helpful.
>>
>>
>> This Sunday the error occurred again, on another file server.
>> All the file server processes have 1 as their parent PID, and the
>> volumes are no longer accessible. I am not sure whether the bosserver
>> was still running, because my colleague restarted it.
>>
>> The AFS logs are empty because they were deleted at the restart. I
>> will keep them next time.
>>
>> The file servers are running on SuSE Linux 7.3. Are there any kernel
>> parameters that could be set? We had OpenAFS running in our testing
>> environment without any problems, so I think this problem only occurs
>> when many clients access the file server.
>>
>> I will post any log files when I get them, but any help or suggestions
>> are very, very welcome.
> 
> 
> Now I have the logs from a file server on which the problem occurred.
> I was not able to find any entries in /var/log/messages and did not get
> a core file.

The mail with all the logs is still waiting for moderator approval,
so I am posting the main parts of the log files here:

In FileLog I found:

Mon Jan 27 10:11:15 2003 File server starting
Mon Jan 27 10:12:45 2003 Cannot initialize RX

In BosLog I found:

Mon Jan 27 07:03:00 2003: Server directory access is okay
Mon Jan 27 07:03:01 2003: fs:salv exited with code 0
Mon Jan 27 09:59:07 2003: fs:file exited on signal 11
Mon Jan 27 09:59:07 2003: fs:vol exited on signal 15
Mon Jan 27 09:59:09 2003: fs:salv exited with code 0
Mon Jan 27 10:00:39 2003: fs:file exited with code 1
Mon Jan 27 10:00:39 2003: fs:vol exited on signal 15
Mon Jan 27 10:00:40 2003: fs:salv exited with code 0
Mon Jan 27 10:02:10 2003: fs:file exited with code 1
Mon Jan 27 10:02:10 2003: fs:vol exited on signal 15
Mon Jan 27 10:02:10 2003: fs:salv exited with code 0
Mon Jan 27 10:03:40 2003: fs:file exited with code 1
Mon Jan 27 10:03:40 2003: fs:vol exited on signal 15
Mon Jan 27 10:03:41 2003: fs:salv exited with code 0
Mon Jan 27 10:05:11 2003: fs:file exited with code 1
Mon Jan 27 10:05:11 2003: fs:vol exited on signal 15
Mon Jan 27 10:05:12 2003: fs:salv exited with code 0
Mon Jan 27 10:06:42 2003: fs:file exited with code 1
Mon Jan 27 10:06:42 2003: fs:vol exited on signal 15
Mon Jan 27 10:06:43 2003: fs:salv exited with code 0
Mon Jan 27 10:08:13 2003: fs:file exited with code 1
Mon Jan 27 10:08:13 2003: fs:vol exited on signal 15
Mon Jan 27 10:08:13 2003: fs:salv exited with code 0
Mon Jan 27 10:09:43 2003: fs:file exited with code 1
Mon Jan 27 10:09:43 2003: fs:vol exited on signal 15
Mon Jan 27 10:09:44 2003: fs:salv exited with code 0
Mon Jan 27 10:11:14 2003: fs:file exited with code 1
Mon Jan 27 10:11:14 2003: fs:vol exited on signal 15
Mon Jan 27 10:11:15 2003: fs:salv exited with code 0
Mon Jan 27 10:12:45 2003: fs:file exited with code 1
Mon Jan 27 10:12:45 2003: fs:vol exited on signal 15
Mon Jan 27 10:12:46 2003: fs:salv exited with code 0
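To make the pattern in the BosLog easier to see: after the crash on signal 11 at 09:59:07, fs:file exits with code 1 roughly every 90 seconds, which matches the 90 seconds between "File server starting" and "Cannot initialize RX" in the FileLog. Below is a small Python sketch that pulls those intervals out of a BosLog. It is my own helper, not part of OpenAFS; the function names exit_events and exit_intervals are hypothetical, and it assumes the BosLog line format shown above ("<date>: <instance> exited <reason>").

```python
import re
from datetime import datetime

# Matches BosLog lines such as
# "Mon Jan 27 10:00:39 2003: fs:file exited with code 1"
LINE_RE = re.compile(
    r"^(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}): (\S+) exited (.*)$"
)


def exit_events(log_text, instance):
    """Return (timestamp, reason) pairs for every exit of `instance`."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group(2) == instance:
            ts = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
            events.append((ts, m.group(3)))
    return events


def exit_intervals(log_text, instance="fs:file"):
    """Seconds between consecutive exits of `instance`."""
    times = [ts for ts, _ in exit_events(log_text, instance)]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# A few lines copied from the BosLog above.
SAMPLE = """\
Mon Jan 27 09:59:07 2003: fs:file exited on signal 11
Mon Jan 27 09:59:07 2003: fs:vol exited on signal 15
Mon Jan 27 09:59:09 2003: fs:salv exited with code 0
Mon Jan 27 10:00:39 2003: fs:file exited with code 1
Mon Jan 27 10:00:39 2003: fs:vol exited on signal 15
Mon Jan 27 10:00:40 2003: fs:salv exited with code 0
Mon Jan 27 10:02:10 2003: fs:file exited with code 1
Mon Jan 27 10:02:10 2003: fs:vol exited on signal 15
Mon Jan 27 10:02:10 2003: fs:salv exited with code 0
Mon Jan 27 10:03:40 2003: fs:file exited with code 1
"""

if __name__ == "__main__":
    print(exit_intervals(SAMPLE))  # [92, 91, 90]
```

Run over the whole excerpt above, every interval comes out at 90 to 92 seconds.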

> Thanks in advance
> 
> Klaas
> 
> 
>>
>> Thanks
>> Klaas