[OpenAFS] find /afs/ breaking the client?
Troy Benjegerdes <hozer@hozed.org>
Wed, 21 Feb 2007 16:22:17 -0600

On Wed, Feb 07, 2007 at 09:30:07AM -0500, Derrick J Brashear wrote:
> On Wed, 7 Feb 2007, Jakub Witkowski wrote:
> 
> >>>No, no oops. The system just... blocks. You can interact with programs
> >>>already in memory, access open files, but not open new.
> >>>
> >>>I chose .14 mostly because I was having problems building the module for
> >>>Xen kernel and this version simply was first that I got compiled. I may
> >>>fall back to something more stable now, as I know how to get things
> >>>running.
> >>>
> >>>Which OpenAFS version do you recommend for installation on a client? On a
> >>>server?
> >>
> >>For Linux, we haven't recommended any 1.5.x client. 1.4.2, generally,
> >>though 1.4.3rc2 should be out in a day or so.
> >>
> >>If you can get cmdebug information when it's hung, that'd be useful to
> >>see.
> >
> >I have done some experiments and my findings are not exactly optimistic.
> >First of all, I found out that the hang was actually caused by some
> >weird interaction between OpenAFS client and libnss-ldap library; in
> >test environment I can reproduce the systemwide hang described above
> >when I set up nsswitch library to look uids up in ldap, but if it is not
> >configured to do so, only the find process hangs - and then, only for a
> >few minutes. Adding -fakestat-all switch makes the problem less
> >pronounced (i.e. find lists more files) but not go away.
> 
> Actually, when it's hung in 1.5.x getting a task list (alt-sysrq-t) would 
> be useful, if you can do it.

I believe I have this problem with 1.5.14 with AFS as the root
filesystem. I've seen the problem during a make -j8. cmdebug just
hangs, but rxdebug to port 7001 on the hung machine still works.

I ended up using 1.5.14 because recent kernels (2.6.19) changed the
makefiles enough that 'osi_flush.s' is no longer recognized on ppc64, and I
was trying to figure out why it didn't work. Renaming osi_flush.s to
osi_flush.S fixes it, so I'll try 1.4.x.
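
Since cmdebug can hang on an affected client while rxdebug still answers,
it may help to run each diagnostic under a timeout so one stuck command
does not block the rest. The following is only a minimal sketch, assuming
Python 3.7 or later and that cmdebug and rxdebug are on the PATH. The
target host "localhost", the 10-second default, and the names DIAGNOSTICS,
run_with_timeout and collect are illustrative assumptions, not anything
taken from this thread or from OpenAFS itself.

```python
import subprocess

# The two commands mentioned in the thread. "localhost" is an assumed
# target; substitute the name of the hung client when probing remotely.
DIAGNOSTICS = {
    "cmdebug": ["cmdebug", "-servers", "localhost", "-long"],
    "rxdebug": ["rxdebug", "-servers", "localhost", "-port", "7001"],
}


def run_with_timeout(argv, timeout_s=10.0):
    """Run argv and return (status, output), giving up after about timeout_s.

    status is one of "ok", "failed", "timeout" or "missing".
    """
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except FileNotFoundError:
        # The binary is not installed or not on the PATH.
        return ("missing", "")
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising. A child stuck in
        # uninterruptible kernel sleep might still linger (assumption: the
        # diagnostic tools wait on the network, so they can be killed).
        return ("timeout", "")
    status = "ok" if proc.returncode == 0 else "failed"
    return (status, proc.stdout + proc.stderr)


def collect(commands=DIAGNOSTICS, timeout_s=10.0):
    """Run every diagnostic in turn and return {name: (status, output)}."""
    return {
        name: run_with_timeout(argv, timeout_s)
        for name, argv in commands.items()
    }
```

Calling collect() and printing each status shows at a glance which tool
timed out and which one still got an answer from port 7001.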