View Issue Details

ID: 0003382
Project: OpenFOAM
Category: Bug
View Status: public
Last Update: 2019-11-14 20:25
Reporter: sunil_jain
Assigned To: henry
Priority: normal
Severity: crash
Reproducibility: random
Status: closed
Resolution: unable to reproduce
Platform: Unix
OS: Other
OS Version: (please specify)
Summary: 0003382: OpenFOAM crashes randomly on Intel Skylake processors but runs fine on Sandy Bridge processors.
Description: OpenFOAM crash logs

2018-09-09 17:14:34 [179203.697285] simd exception: 0000 [#1] SMP
2018-09-09 17:14:34 [179203.701527] Modules linked in: squashfs loop 8021q garp mrp stp llc nvidia_uvm(POE) nvidia(POE) xfs skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi irqbypass crc32_pclmul ghash_cl_]
2018-09-09 17:14:34 [179203.784359] CPU: 2 PID: 159455 Comm: shuangTwoPhaseE Tainted: P OE ------------ T 3.10.0-862.9.1.el7.x86_64 #1
2018-09-09 17:14:34 [179203.795389] Hardware name: Dell Inc. PowerEdge R740/06G98X, BIOS 1.4.8 05/21/2018
2018-09-09 17:14:34 [179203.802958] task: ffff995c1aee8fd0 ti: ffff995c1988c000 task.ti: ffff995c1988c000
2018-09-09 17:14:34 [179203.810539] RIP: 0010:[<ffffffffbe121791>] [<ffffffffbe121791>] apic_timer_interrupt+0x141/0x170
2018-09-09 17:14:34 [179203.819515] RSP: 0000:ffff995c1da46200 EFLAGS: 00010082
2018-09-09 17:14:34 [179203.824928] RAX: ffff995c1988ff70 RBX: 0000000001a95e00 RCX: 0000000000000090
2018-09-09 17:14:34 [179203.832146] RDX: 0000000000000000 RSI: ffff995c1da46200 RDI: ffff995c1988ff70
2018-09-09 17:14:34 [179203.839364] RBP: 00007ffd8b8ba848 R08: 0000000000000c40 R09: 0000000000000031
2018-09-09 17:14:34 [179203.846591] R10: 0000000000000000 R11: 0000000000e72148 R12: 0000000001c4e770
2018-09-09 17:14:34 [179203.853827] R13: 0000000000000007 R14: 00000000011935b0 R15: 0000000000000038
2018-09-09 17:14:34 [179203.861040] FS: 00002ad83f7afa00(0000) GS:ffff995c1da40000(0000) knlGS:0000000000000000
2018-09-09 17:14:34 [179203.869213] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2018-09-09 17:14:34 [179203.875042] CR2: 0000000002a18000 CR3: 00000017963f8000 CR4: 00000000007607e0
2018-09-09 17:14:34 [179203.882274] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2018-09-09 17:14:34 [179203.889495] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2018-09-09 17:14:34 [179203.896714] PKRU: 55555554
2018-09-09 17:14:34 [179203.899530] Call Trace:
2018-09-09 17:14:34 [179203.902065] Code: 48 39 cc 77 2f 48 8d 81 00 fe ff ff 48 39 e0 77 23 57 48 29 e1 65 48 8b 3c 25 78 0e 01 00 48 83 c7 28 48 29 cf 48 89 f8 48 89 e6 <f3> a4 48 89 c4 5f 48 89 e6 65 ff 04 26
2018-09-09 17:14:34 [179203.922628] RIP [<ffffffffbe121791>] apic_timer_interrupt+0x141/0x170
2018-09-09 17:14:34 [179203.929259] RSP <ffff995c1da46200>
2018-09-09 17:14:34 [179203.933970] ---[ end trace 3912e5e8b3b86da4 ]---
2018-09-09 17:14:34 [179203.984039] Kernel panic - not syncing: Fatal exception
2018-09-09 17:14:34 [179203.989451] Kernel Offset: 0x3ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
 
And a crashdump here:
crash> bt

PID: 138341 TASK: ffff9fd7eb3c6eb0 CPU: 27 COMMAND: "shuangTwoPhaseE"
#0 [ffff9ff02ee6bc38] machine_kexec at ffffffff938629da
#1 [ffff9ff02ee6bc98] __crash_kexec at ffffffff93916692
#2 [ffff9ff02ee6bd68] crash_kexec at ffffffff93916780
#3 [ffff9ff02ee6bd80] oops_end at ffffffff93f1d738
#4 [ffff9ff02ee6bda8] die at ffffffff9382f96b
#5 [ffff9ff02ee6bdd8] math_error at ffffffff9382cca8
#6 [ffff9ff02ee6be98] do_simd_coprocessor_error at ffffffff9382cec8
#7 [ffff9ff02ee6bec0] simd_coprocessor_error at ffffffff93f28c9e
#8 [ffff9ff02ee6bf48] apic_timer_interrupt at ffffffff93f26791
[...]
Additional Information: This issue is seen on servers with Skylake processors from HP/Dell; the same software runs fine on Sandy Bridge processors.
Tags: No tags attached.

Activities

henry

2019-11-11 08:37

manager   ~0010876

We have a dual Xeon Skylake machine here and have used it for more than a year without any problems. Can you provide enough information so that we can reproduce the problem here?

tniemi

2019-11-11 12:54

reporter   ~0010884

I can also confirm that we have run OpenFOAM on several different Skylakes without issues.

henry

2019-11-14 20:25

manager   ~0010906

Too little information is provided about the system on which the problem exists or how to reproduce it on another system.
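
For anyone hitting a similar crash, the kind of system information asked for above could be gathered with a small script like the following. This is only a sketch under stated assumptions: the function names `collect_system_info` and `format_report` are hypothetical, the script assumes a Linux host exposing `/proc/cpuinfo`, and it assumes the OpenFOAM environment has been sourced so that `WM_PROJECT_VERSION` is set. It does not reproduce or diagnose the crash; it only collects facts to attach to a report.

```python
import os
import platform


def collect_system_info(cpuinfo_path="/proc/cpuinfo"):
    """Return a dict of basic host facts useful in a bug report.

    Hypothetical helper: the field selection is an assumption, not a
    list requested by the OpenFOAM maintainers.
    """
    info = {
        "os": platform.system(),
        "kernel": platform.release(),
        "machine": platform.machine(),
        "cpu_model": "unknown",
        "cpu_flags": [],
        # Set by the OpenFOAM environment scripts when they are sourced.
        "wm_project_version": os.environ.get("WM_PROJECT_VERSION", "unset"),
    }
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                key, _, value = line.partition(":")
                key = key.strip()
                # Only the first CPU entry is recorded.
                if key == "model name" and info["cpu_model"] == "unknown":
                    info["cpu_model"] = value.strip()
                elif key == "flags" and not info["cpu_flags"]:
                    info["cpu_flags"] = value.split()
    except OSError:
        # Not Linux, or the file is unreadable: keep the defaults.
        pass
    return info


def format_report(info):
    """Format the collected facts as plain text for pasting into an issue."""
    # AVX-family flags matter when comparing Skylake with Sandy Bridge.
    avx = sorted(f for f in info["cpu_flags"] if f.startswith("avx"))
    lines = [
        "OS: " + info["os"],
        "Kernel: " + info["kernel"],
        "Machine: " + info["machine"],
        "CPU model: " + info["cpu_model"],
        "AVX flags: " + (" ".join(avx) if avx else "none found"),
        "OpenFOAM version: " + info["wm_project_version"],
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(collect_system_info()))
```

The output, together with the exact case, solver, and compiler options used, would give others a starting point for trying to reproduce the problem on another Skylake system.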

Issue History

Date Modified Username Field Change
2019-11-11 05:24 sunil_jain New Issue
2019-11-11 08:37 henry Note Added: 0010876
2019-11-11 12:54 tniemi Note Added: 0010884
2019-11-14 20:25 henry Assigned To => henry
2019-11-14 20:25 henry Status new => closed
2019-11-14 20:25 henry Resolution open => unable to reproduce
2019-11-14 20:25 henry Note Added: 0010906