[OpenAFS-devel] afsd crash openafs-snap-2005-04-08

Martin MOKREJŠ mmokrejs@ribosome.natur.cuni.cz
Wed, 20 Apr 2005 18:07:28 +0200


chas williams - CONTRACTOR wrote:
> In message <426573BC.4010904@ribosome.natur.cuni.cz>, Martin MOKREJŠ writes:
> 
>>Apr 13 11:13:26 aquarius PREEMPT DEBUG_PAGEALLOC
> 
> 
> is this kernel SMP?

Around that time I had just turned off SMP and hyperthreading support.
I still keep PREEMPT enabled, even now. The reason was that this uniprocessor
P4 machine, whose processor supports HT, is significantly slower with SMP+HT,
so I disabled both in the BIOS and in the kernel.


Looking back at the logs, it was probably still an SMP kernel at that moment, but
hyperthreading was already disabled in the BIOS. At least that is my interpretation.

Apr 13 10:59:58 aquarius Linux version 2.6.11.6 (root@aquarius) (gcc version 3.4.3-20050110 (Gentoo Linux 3.4.3.20050110-r1, ssp-3.4.3.20050110-0, pie-8.7.7)) #2 Sat Apr 9 10:47:05 CEST 2005
Apr 13 10:59:58 aquarius BIOS-provided physical RAM map:
Apr 13 10:59:58 aquarius BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
Apr 13 10:59:58 aquarius BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
Apr 13 10:59:58 aquarius BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
Apr 13 10:59:58 aquarius BIOS-e820: 0000000000100000 - 00000000bff30000 (usable)
Apr 13 10:59:58 aquarius BIOS-e820: 00000000bff30000 - 00000000bff40000 (ACPI data)
Apr 13 10:59:58 aquarius BIOS-e820: 00000000bff40000 - 00000000bfff0000 (ACPI NVS)
Apr 13 10:59:58 aquarius BIOS-e820: 00000000bfff0000 - 00000000c0000000 (reserved)
Apr 13 10:59:58 aquarius BIOS-e820: 00000000ffb80000 - 0000000100000000 (reserved)
Apr 13 10:59:58 aquarius 2175MB HIGHMEM available.
Apr 13 10:59:58 aquarius 896MB LOWMEM available.
Apr 13 10:59:58 aquarius found SMP MP-table at 000ff780
Apr 13 10:59:58 aquarius On node 0 totalpages: 786224
Apr 13 10:59:58 aquarius DMA zone: 4096 pages, LIFO batch:1
Apr 13 10:59:58 aquarius Normal zone: 225280 pages, LIFO batch:16
Apr 13 10:59:58 aquarius HighMem zone: 556848 pages, LIFO batch:16
Apr 13 10:59:58 aquarius DMI 2.3 present.
Apr 13 10:59:58 aquarius ACPI: RSDP (v002 ACPIAM                                ) @ 0x000f9e30
Apr 13 10:59:58 aquarius ACPI: XSDT (v001 A M I  OEMXSDT  0x10000426 MSFT 0x00000097) @ 0xbff30100
Apr 13 10:59:58 aquarius ACPI: FADT (v003 A M I  OEMFACP  0x10000426 MSFT 0x00000097) @ 0xbff30290
Apr 13 10:59:58 aquarius ACPI: MADT (v001 A M I  OEMAPIC  0x10000426 MSFT 0x00000097) @ 0xbff30390
Apr 13 10:59:58 aquarius ACPI: OEMB (v001 A M I  OEMBIOS  0x10000426 MSFT 0x00000097) @ 0xbff40040
Apr 13 10:59:58 aquarius ACPI: DSDT (v001  P4CED P4CED106 0x00000106 INTL 0x02002026) @ 0x00000000
Apr 13 10:59:58 aquarius ACPI: Local APIC address 0xfee00000
Apr 13 10:59:58 aquarius ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Apr 13 10:59:58 aquarius Processor #0 15:2 APIC version 20
Apr 13 10:59:58 aquarius ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] disabled)
Apr 13 10:59:58 aquarius ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
Apr 13 10:59:58 aquarius IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
Apr 13 10:59:58 aquarius ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
Apr 13 10:59:58 aquarius ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Apr 13 10:59:58 aquarius ACPI: IRQ0 used by override.
Apr 13 10:59:58 aquarius ACPI: IRQ2 used by override.
Apr 13 10:59:58 aquarius ACPI: IRQ9 used by override.
Apr 13 10:59:58 aquarius Enabling APIC mode:  Flat.  Using 1 I/O APICs
Apr 13 10:59:58 aquarius Using ACPI (MADT) for SMP configuration information
Apr 13 10:59:58 aquarius Allocating PCI resources starting at c0000000 (gap: c0000000:3fb80000)
Apr 13 10:59:58 aquarius Built 1 zonelists
Apr 13 10:59:58 aquarius Kernel command line: root=/dev/sda2 single
Apr 13 10:59:58 aquarius mapped APIC to ffffd000 (fee00000)
Apr 13 10:59:58 aquarius mapped IOAPIC to ffffc000 (fec00000)
Apr 13 10:59:58 aquarius Initializing CPU#0
Apr 13 10:59:58 aquarius PID hash table entries: 4096 (order: 12, 65536 bytes)
Apr 13 10:59:58 aquarius Detected 3449.853 MHz processor.
Apr 13 10:59:58 aquarius Using tsc for high-res timesource
Apr 13 10:59:58 aquarius Console: colour VGA+ 80x25
Apr 13 10:59:58 aquarius Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Apr 13 10:59:58 aquarius Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Apr 13 10:59:58 aquarius Memory: 3112972k/3144896k available (2887k kernel code, 31108k reserved, 1355k data, 176k init, 2227392k highmem)
Apr 13 10:59:58 aquarius Checking if this processor honours the WP bit even in supervisor mode... Ok.
Apr 13 10:59:58 aquarius Calibrating delay loop... 6815.74 BogoMIPS (lpj=3407872)
Apr 13 10:59:58 aquarius Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
Apr 13 10:59:58 aquarius CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000
Apr 13 10:59:58 aquarius CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000
Apr 13 10:59:58 aquarius CPU: Trace cache: 12K uops, L1 D cache: 8K
Apr 13 10:59:58 aquarius CPU: L2 cache: 512K
Apr 13 10:59:58 aquarius CPU: After all inits, caps: bfebfbf7 00000000 00000000 00000080 00004400 00000000 00000000
Apr 13 10:59:58 aquarius Intel machine check architecture supported.
Apr 13 10:59:58 aquarius Intel machine check reporting enabled on CPU#0.
Apr 13 10:59:58 aquarius CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
Apr 13 10:59:58 aquarius CPU0: Thermal monitoring enabled
Apr 13 10:59:58 aquarius CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 09
Apr 13 10:59:58 aquarius Enabling fast FPU save and restore... done.
Apr 13 10:59:58 aquarius Enabling unmasked SIMD FPU exception support... done.
Apr 13 10:59:58 aquarius Checking 'hlt' instruction... OK.
Apr 13 10:59:58 aquarius ENABLING IO-APIC IRQs
Apr 13 10:59:58 aquarius ..TIMER: vector=0x31 pin1=2 pin2=-1
Apr 13 10:59:58 aquarius NET: Registered protocol family 16
Apr 13 10:59:58 aquarius PCI: PCI BIOS revision 2.10 entry at 0xf0031, last bus=3
Apr 13 10:59:58 aquarius PCI: Using configuration type 1
Apr 13 10:59:58 aquarius mtrr: v2.0 (20020519)
Apr 13 10:59:58 aquarius ACPI: Subsystem revision 20050211
Apr 13 10:59:58 aquarius ACPI: Interpreter enabled
Apr 13 10:59:58 aquarius ACPI: Using IOAPIC for interrupt routing
Apr 13 10:59:58 aquarius ACPI: PCI Root Bridge [PCI0] (00:00)
Apr 13 10:59:58 aquarius PCI: Probing PCI hardware (bus 00)
Apr 13 10:59:58 aquarius PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
Apr 13 10:59:58 aquarius PCI: Transparent bridge - 0000:00:1e.0
Apr 13 10:59:58 aquarius ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
Apr 13 10:59:58 aquarius ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P4._PRT]
Apr 13 10:59:58 aquarius ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P2._PRT]
Apr 13 10:59:58 aquarius ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 *10 11 12 14 15)
Apr 13 10:59:58 aquarius ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 *7 10 11 12 14 15)
Apr 13 10:59:58 aquarius ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 7 10 11 12 14 15)
Apr 13 10:59:58 aquarius ACPI: PCI Interrupt Link [LNKD] (IRQs *3 4 5 6 7 10 11 12 14 15)
Apr 13 10:59:58 aquarius ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 *5 6 7 10 11 12 14 15)
Apr 13 10:59:58 aquarius ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled.
Apr 13 10:59:58 aquarius ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled.
Apr 13 10:59:58 aquarius ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 10 *11 12 14 15)
Apr 13 10:59:58 aquarius Linux Plug and Play Support v0.97 (c) Adam Belay
Apr 13 10:59:58 aquarius pnp: PnP ACPI init
Apr 13 10:59:58 aquarius pnp: PnP ACPI: found 11 devices
Apr 13 10:59:58 aquarius SCSI subsystem initialized
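
To make the reading of the log above a bit more concrete, here is a minimal
sketch that pulls the per-CPU LAPIC state out of the two "ACPI: LAPIC" lines.
The helper name lapic_states and the regular expression are only illustrative
assumptions. The result shows that the firmware reported the second logical
CPU as disabled, which is consistent with HT being off in the BIOS; it does
not by itself prove whether the kernel was built with SMP.

```python
import re

# The two LAPIC lines, copied verbatim from the boot log above.
log = """\
Apr 13 10:59:58 aquarius ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Apr 13 10:59:58 aquarius ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] disabled)
"""

# Hypothetical pattern for this particular log format (not a kernel API).
LAPIC = re.compile(
    r"ACPI: LAPIC \(acpi_id\[(0x[0-9a-f]+)\] "
    r"lapic_id\[(0x[0-9a-f]+)\] (enabled|disabled)\)"
)

def lapic_states(text):
    """Map each lapic_id found in the text to 'enabled' or 'disabled'."""
    return {int(m.group(2), 16): m.group(3) for m in LAPIC.finditer(text)}

states = lapic_states(log)
enabled = [cpu for cpu, state in states.items() if state == "enabled"]
print(states, "enabled logical CPUs:", len(enabled))
```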


> 
> 
>>Apr 13 11:13:26 aquarius [<fa059458>] afs_CheckVolumeNames+0x303/0x479 [libafs]
> 
> 
> this seems like a pretty strange place for a crash.  it seems to
> point somewhere around:
> 
> Line 312 of "/scratch/chas/openafs/2.6.11.7/src/libafs/MODLOAD-2.6.11.7-MP/afs_volume.c"
>    starts at address 0x339e0 <afs_CheckVolumeNames+768> and ends at 0x339ed <afs_CheckVolumeNames+781>.
> 
>                     AFS_FAST_HOLD(tvc);
>                     ReleaseReadLock(&afs_xvcache);
> 
> 
>>>>>>              ObtainWriteLock(&afs_xcbhash, 485);
> 
>                     /* LOCKXXX: We aren't holding tvc write lock? */
>                     afs_DequeueCallback(tvc);
>                     tvc->states &= ~CStatd;
>                     ReleaseWriteLock(&afs_xcbhash);
> 
> could you try 1.3.81?  it's difficult to work tracebacks against
> snapshots.
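
For reference, the offset arithmetic behind the quoted lookup can be sketched
as below. It only checks that +0x303 from the oops falls inside the
+768..+781 range reported for line 312. Two assumptions are baked in: the
range is treated as half-open, and the numbers come from a 2.6.11.7-MP build
of the module, which need not match the 2.6.11.6 module that actually crashed.
The helper in_range is hypothetical.

```python
# Values quoted in this thread. The address range comes from a different
# build (2.6.11.7-MP) than the crashing kernel (2.6.11.6), so the match
# is only indicative.
oops_offset = 0x303               # afs_CheckVolumeNames+0x303 from the oops
line_start, line_end = 768, 781   # <afs_CheckVolumeNames+768> .. <+781>

def in_range(offset, start, end):
    """True if offset lies in the half-open range [start, end)."""
    return start <= offset < end

# Sanity check: the absolute addresses span the same number of bytes.
span = 0x339ed - 0x339e0

print(oops_offset, span, in_range(oops_offset, line_start, line_end))
```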

I can, but as I mentioned earlier, I was moving from devfs to udev. Even today
I found that /dev/raw1394 has not been created (according to Google, udev doesn't
do it automagically for everyone). As you can see, at that time I had booted into
single-user mode, I think in order to comment out both the cache and vicepa
partitions in /etc/fstab. I'm not sure this case is worth investigating; I just want
to say that some devices might not have existed at that very moment. Could that be
the reason for a crash in that code?


-- 
Martin Mokrejs
Email: 'bW9rcmVqc21Acmlib3NvbWUubmF0dXIuY3VuaS5jeg==\n'.decode('base64')
GPG key is at http://www.natur.cuni.cz/~mmokrejs