[OpenAFS] apparent OpenAFS and Pre-emptible Linux Kernel issues
Mon, 31 Mar 2003 09:18:27 -0500
In message <20030330081212.26F8775E0D@mail.icequake.net>, "Ryan Underwood" writes:
>the time at the moment. It would be excellent if someone could look into
>this, as the pre-emptible patches are now in 2.5 mainline and eventually
>this may become an issue anyway.
+RULE #1: Per-CPU data structures need explicit protection
+Two similar problems arise. An example code snippet:
+ struct this_needs_locking tux[NR_CPUS];
+ tux[smp_processor_id()] = some_value;
+ /* task is preempted here... */
+ something = tux[smp_processor_id()];
+First, since the data is per-CPU, it may have no explicit SMP locking, yet
+it still needs protection once the kernel is preemptible. Second, when a
+preempted task is finally rescheduled, the previous value of
+smp_processor_id() may not equal the current one. You must protect these
+situations by disabling preemption around them.
+RULE #2: CPU state must be protected.
+Under preemption, the state of the CPU must be protected. This is arch-
+dependent, but includes CPU structures and state not preserved over a context
+switch. For example, on x86, entering and exiting FPU mode is now a critical
+section that must occur while preemption is disabled. Think what would happen
+if the kernel is executing a floating-point instruction and is then preempted.
+Remember, the kernel does not save FPU state except for user tasks. Therefore,
+upon preemption, the FPU registers will be sold to the lowest bidder. Thus,
+preemption must be disabled around such regions.
+Note, some FPU functions are already explicitly preempt safe. For example,
+kernel_fpu_begin and kernel_fpu_end will disable and enable preemption.
+However, math_state_restore must be called with preemption disabled.
+RULE #3: Lock acquire and release must be performed by same task
+A lock acquired in one task must be released by the same task. This
+means you can't do oddball things like acquire a lock and go off to
+play while another task releases it. If you want to do something
+like this, acquire and release the lock in the same code path and
+have the caller wait on an event by the other task.
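for anyone skimming the quoted rules: rule #1 can be illustrated with a small userspace toy model. this is only a sketch, not kernel code. NR_CPUS = 4, toy_try_migrate() and the other toy_* functions are made up for illustration; they stand in for smp_processor_id(), preempt_disable() and preempt_enable(), and "migration" models the task being preempted and rescheduled on another cpu.

```c
#include <assert.h>

#define NR_CPUS 4               /* hypothetical CPU count for the toy model */

static int tux[NR_CPUS];        /* per-CPU data, one slot per CPU */
static int current_cpu;         /* CPU the "task" currently runs on */
static int preempt_count;       /* > 0 means preemption is disabled */

/* Toy stand-ins (assumptions, not the kernel's implementations). */
static int toy_smp_processor_id(void) { return current_cpu; }
static void toy_preempt_disable(void) { preempt_count++; }
static void toy_preempt_enable(void)
{
    assert(preempt_count > 0);  /* enable must pair with a disable */
    preempt_count--;
}

/* Model "task is preempted and rescheduled on new_cpu".
 * Returns 1 if the task moved, 0 if preemption was disabled. */
static int toy_try_migrate(int new_cpu)
{
    if (preempt_count > 0)
        return 0;
    current_cpu = new_cpu;
    return 1;
}

/* Reset the toy model to a known state: CPU 0, all slots zero. */
static void toy_reset(void)
{
    int i;
    for (i = 0; i < NR_CPUS; i++)
        tux[i] = 0;
    current_cpu = 0;
    preempt_count = 0;
}

/* Unprotected access: the task may change CPUs between write and read,
 * so the read can hit a different slot than the write. */
static int unprotected_roundtrip(int value, int new_cpu)
{
    tux[toy_smp_processor_id()] = value;
    toy_try_migrate(new_cpu);           /* task is preempted here... */
    return tux[toy_smp_processor_id()];
}

/* Protected access: preemption is disabled around both accesses,
 * so the migration attempt is refused and the same slot is read back. */
static int protected_roundtrip(int value, int new_cpu)
{
    int result;

    toy_preempt_disable();
    tux[toy_smp_processor_id()] = value;
    toy_try_migrate(new_cpu);           /* refused: preempt_count > 0 */
    result = tux[toy_smp_processor_id()];
    toy_preempt_enable();
    return result;
}
```

in real kernel code the protected version would bracket the accesses with preempt_disable()/preempt_enable(), or use get_cpu()/put_cpu().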
i would hazard a guess that afs is breaking rule #3. i don't believe that
openafs has any per-cpu data structures or uses any particular cpu features
that aren't normally preserved by a context switch. i do believe afs's
locking is 'oddball' :)
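
to make the rule #3 guess a bit more concrete, here is a userspace toy model of why a cross-task release would hurt on a preemptible kernel. the assumption, which i have not checked against the afs code, is that taking a spinlock bumps the acquiring task's preempt count and dropping it decrements the releasing task's count, so a handoff leaves both counts wrong. everything named toy_* is hypothetical, and none of this is how openafs or the kernel actually implement locking.

```c
#include <assert.h>

/* Toy model (assumption): each task carries its own preempt count. */
struct toy_task  { int preempt_count; };
struct toy_lock  { int held; };
struct toy_event { int signaled; };

/* Taking the lock disables preemption for the task that takes it. */
static void toy_spin_lock(struct toy_lock *l, struct toy_task *t)
{
    t->preempt_count++;
    l->held = 1;
}

/* Dropping the lock re-enables preemption for the task that drops it. */
static void toy_spin_unlock(struct toy_lock *l, struct toy_task *t)
{
    l->held = 0;
    t->preempt_count--;
}

/* A task can be preempted only when its count is back to zero. */
static int toy_preemptible(const struct toy_task *t)
{
    return t->preempt_count == 0;
}

/* The "oddball" pattern: task a acquires, task b releases.
 * a is left non-preemptible and b's count underflows. */
static void oddball_handoff(struct toy_lock *l,
                            struct toy_task *a, struct toy_task *b)
{
    toy_spin_lock(l, a);
    toy_spin_unlock(l, b);
}

/* What the rule suggests: a acquires and releases in one code path,
 * and the other task only signals an event that a waits on. */
static int same_task_handoff(struct toy_lock *l, struct toy_task *a,
                             struct toy_event *ev)
{
    toy_spin_lock(l, a);
    /* ... critical section, done entirely by task a ... */
    toy_spin_unlock(l, a);

    ev->signaled = 1;           /* stand-in for the other task finishing */
    return ev->signaled;        /* stand-in for a blocking wait by a */
}
```

after oddball_handoff() the first task ends with a count of 1 and the second with -1; after same_task_handoff() both are back at 0.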