View Issue Details

ID: 0003898
Project: OpenFOAM
Category: Bug
View Status: public
Last Update: 2022-10-27 10:36
Reporter: wyldckat
Assigned To: wyldckat
Priority: normal
Severity: crash
Reproducibility: sometimes
Status: closed
Resolution: no change required
Product Version: 10
Summary: 0003898: Segmentation fault when using function entries due to incorrect global initialization
Description: I was intrigued by the summary conclusion of issue #3887, that it was a bug in CentOS 7.9, so I did a bit more digging into this issue.

The diagnosis was that in the file "src/OpenFOAM/db/dictionary/entry/entryIO.C", in line 159:

                 && functionName == functionEntries::inputSyntaxEntry::typeName

this is what results in the segmentation fault when '#includeFunc' or a similar function entry is used, whether running 'blockMesh' or even 'foamDictionary'.

The reason seems to be that the static variable "functionEntries::inputSyntaxEntry::typeName" is not always initialized in time when using certain GCC builds, as is the case with the GCC builds provided by SCL.

If we change that line to this:

                 && functionName == functionEntries::inputSyntaxEntry::typeName_()

It solves this issue.

The annoying part now is figuring out exactly why the initializations happen out of the expected order, when it works just fine with many of the GCC builds that people have been using. My best guess is that it's a GCC feature that is turned on by default in the builds at SCL, but not commonly used in other builds of GCC.
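
To illustrate why calling 'typeName_()' sidesteps the initialization order, here is a minimal sketch. It assumes that 'typeName_()' simply returns a string literal, while 'typeName' is a dynamically initialized static object. 'SyntaxEntry', 'isInputSyntax', 'EarlyUser' and the value "inputSyntax" are hypothetical stand-ins, not OpenFOAM code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for functionEntries::inputSyntaxEntry.
struct SyntaxEntry
{
    // Assumed to mirror typeName_(): returns a string literal, which needs
    // no dynamic initialization and is therefore valid at any time.
    static const char* typeName_() { return "inputSyntax"; }

    // Mirrors typeName: a dynamically initialized static object.
    static const std::string typeName;
};

// Comparison in the style of the patched line: only the literal is touched.
bool isInputSyntax(const std::string& functionName)
{
    return functionName == SyntaxEntry::typeName_();
}

// An object whose constructor runs during static initialization.
struct EarlyUser
{
    bool matched;
    EarlyUser() : matched(isInputSyntax("inputSyntax")) {}
};

// Defined before typeName, so within this translation unit it is
// initialized first; typeName is not yet constructed at that point.
EarlyUser earlyUser;

const std::string SyntaxEntry::typeName(SyntaxEntry::typeName_());

// Usage check, callable from main(): the early comparison succeeded and
// the static object holds the same value once it has been constructed.
void selfCheck()
{
    assert(earlyUser.matched);
    assert(!isInputSyntax("includeFunc"));
    assert(SyntaxEntry::typeName == "inputSyntax");
}
```

Had 'isInputSyntax' compared against 'SyntaxEntry::typeName' instead, 'earlyUser' would have read a string object before its constructor ran, which is undefined behaviour.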
Steps To Reproduce:
1. Built OpenFOAM 10 from a git clone of the source code, as a normal user on CentOS 7.9, using GCC 11 from SCL.
2. Copied the tutorial case 'tutorials/heatTransfer/chtMultiRegionFoam/reverseBurner' to the run folder.
3. Run:

    gdb blockMesh

4. Then:

    run
    bt

5. Not enough information, so added '-g' to 'src/OpenFOAM/Make/options' and ran 'wmake -j' within 'src/OpenFOAM'.
6. Repeated steps 3 and 4.
7. Got the indication that the crash originated from line 159.
Tags: No tags attached.

Relationships

related to 0003887 closed henry Segmentation fault when #includeFunc included

Activities

tniemi

2022-10-07 17:48

reporter   ~0012767

This reminded me of this old bug report https://bugs.openfoam.org/view.php?id=2645

Taking into account that CentOS tries to be very backwards compatible and ancient, it might be possible that its gcc still processes the files in reverse order. If you want to investigate this, you could try to swap $(functionEntries)/inputSyntaxEntry/inputSyntaxEntry.C to occur before $(entry)/entryIO.C in src/OpenFOAM/Make/files and see if it makes a difference.

Generally speaking the order of static initialization is not strictly defined across files/translation units, so problems like this can be hit occasionally.
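
A common way to avoid depending on that order is the construct-on-first-use idiom, sketched below. 'syntaxTypeName' and 'nameLengthAtStartup' are hypothetical names used only for illustration, and this has not been tested against the SCL toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical accessor: the string is a function-local static, so it is
// constructed the first time the function is called, regardless of the
// order in which translation units are initialized.
const std::string& syntaxTypeName()
{
    static const std::string name("inputSyntax");
    return name;
}

// Namespace-scope object that needs the name while statics are still
// being initialized; calling the accessor is safe here.
const std::size_t nameLengthAtStartup = syntaxTypeName().size();

// Usage check, callable from main().
void selfCheck()
{
    assert(nameLengthAtStartup == 11);
    // Every call returns the same object.
    assert(&syntaxTypeName() == &syntaxTypeName());
}
```

The price is a function call (and a guard check) on each access, in exchange for a well-defined initialization point.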

wyldckat

2022-10-07 18:12

updater   ~0012768

@tniemi: Many thanks!
I've tried it just now, but it still gives the same issue.

I've checked the wmake output to confirm that I changed the order correctly, and the order is updated to what you indicated.

tniemi

2022-10-07 18:29

reporter   ~0012769

Ok. As a matter of fact, with the default order (entryIO.C before inputSyntaxEntry.C), a compiler that follows the current convention would always process entryIO.C first and then inputSyntaxEntry.C, which would mean there should be problems. Yet these do not happen...

Your proposed change would only need data from inputSyntaxEntry.H, which gets included into entryIO.C, so it should be safer. If it works with other compilers as well, it would probably be a good change.

wyldckat

2022-10-07 18:43

updater   ~0012770

Quoting from file 'src/OpenFOAM/db/dictionary/functionEntries/inputSyntaxEntry/inputSyntaxEntry.C' - can be seen here https://github.com/OpenFOAM/OpenFOAM-10/blob/fdc522691c352db24630e49c2bf7decfbe8316b2/src/OpenFOAM/db/dictionary/functionEntries/inputSyntaxEntry/inputSyntaxEntry.C#L31

    const Foam::word Foam::functionEntries::inputSyntaxEntry::typeName
    (
        Foam::functionEntries::inputSyntaxEntry::typeName_()
    );

I'm simply unrolling manually the origin of the value defined in 'typeName', so it should still work as intended with all compilers.
The downside is the very slight drop in performance when parsing through that line in 'entryIO.C'.

henry

2022-10-07 19:04

manager   ~0012771

Try moving the Foam::functionEntries::inputSyntaxEntry::typeName definition into entryIO.C

wyldckat

2022-10-10 13:55

updater   ~0012784

@Henry: Unfortunately that did not fix the issue.
I had tried on Friday to place that definition in the 'global.Cver' file and "wmake" within "src/OpenFOAM", but the same issue occurred.

I've rebuilt the OpenFOAM library with '-g' and the current gdb backtrace gives me:

  #0 std::operator==<char> (__lhs="includeFunc", __rhs=<error reading variable: Cannot access memory at address 0xffffffffffffffe8>, __lhs="includeFunc",
      __rhs=<error reading variable: Cannot access memory at address 0xffffffffffffffe8>) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.h:3963
  #1 Foam::entry::New (parentDict=..., is=...) at db/dictionary/entry/entryIO.C:167
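
A note on the address in that backtrace: 0xffffffffffffffe8 is -24 when read as a 64-bit value. One plausible reading, stated here as an assumption rather than a confirmed diagnosis, is that the right-hand string uses the pre-C++11 copy-on-write layout, where a small header (length, capacity, reference count) sits immediately before the character data. If the string object has not been constructed yet, its data pointer is still null, and reading the length lands 24 bytes below address zero. 'RepHeader' below is a hypothetical mirror of such a header, not the library's actual type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the bookkeeping header assumed to precede the
// characters of a copy-on-write std::string (length, capacity, refcount).
struct RepHeader
{
    std::size_t length;
    std::size_t capacity;
    int refcount;
};

// Address from which the length would be read for a given data pointer.
std::uintptr_t lengthAddress(std::uintptr_t dataPointer)
{
    return dataPointer - sizeof(RepHeader);
}

// Usage check, callable from main().
void selfCheck()
{
    // On a typical 64-bit Linux target the header occupies 24 bytes, so a
    // null data pointer yields the address seen in the backtrace.
    if (sizeof(void*) == 8)
    {
        assert(sizeof(RepHeader) == 24);
        assert(lengthAddress(0) == 0xffffffffffffffe8ULL);
    }
}
```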

It seems more and more like this issue occurs due to default GCC flags or the absence of specific default flags... I'm testing now on GCC 11 SCL with the following default flags from GCC 9 on Ubuntu 20.04:

    -fasynchronous-unwind-tables -fstack-protector-strong -Wformat -Wformat-security -fstack-clash-protection -fcf-protection

I'll update here as soon as I can get results.

wyldckat

2022-10-11 11:27

updater   ~0012787

So I have some good news and some annoying news:
1- It's not a compiler bug, in itself.

2- It is somewhat of a bug in the headers provided in SCL's GCC folder "/opt/rh/devtoolset-11/root/usr/include/c++/11/".

3- Therefore there is no clear way to fix this, unless it is fixed upstream in SCL's provided GCC builds. They seem to apply certain patches that are not part of GCC's own stack, one of which breaks this.

For now, there is a workaround, as proposed in the Description of this report, but it does indeed seem to be some kind of bug in SCL's GCC patching. I'll try to report this to them.

@Henry: Please feel free to undo the change made in commit 56023e97fb7d2f40b136e0d340c35a9d36e49640 at OpenFOAM-dev: https://github.com/OpenFOAM/OpenFOAM-dev/commit/56023e97fb7d2f40b136e0d340c35a9d36e49640

tniemi

2022-10-11 11:40

reporter   ~0012789

Good that you were able to figure this out. I was also just testing this with a CentOS Docker image and the behavior of the compiler is bizarre. Your workaround works, but if I, e.g., try to define a local static function which returns functionEntries::inputSyntaxEntry::typeName_(), it crashes. To me it seems that the SCL toolset is clearly broken.

wyldckat

2022-10-11 12:01

updater   ~0012790

@tniemi: I didn't go into much detail for my previous comment, but essentially I used the GCC main 'include' folder for 11.2.1 from SCL, in a custom build of GCC 11.3.0, and reproduced the same exact error.

If I use the custom GCC 11.3.0 default 'include' folder, everything works just fine.

SCL is using some patches of their own that clearly are attempting something specific, either for security or for something else, but I have yet to find proper records for those patches. I'm reporting this issue now at https://bugzilla.redhat.com - so that they can revise said patches.

wyldckat

2022-10-11 12:23

updater   ~0012791

Reported at https://bugzilla.redhat.com/show_bug.cgi?id=2133780

tniemi

2022-10-11 13:07

reporter   ~0012792

It is possible that this problem is related to backward compatibility issues and if so, it might be unfixable in CentOS 7. Googling reveals that there are various issues related to C++11 conformance and string/list handling:
https://stackoverflow.com/questions/73842752/is-this-a-bug-in-gcc-11-2-1-related-to-glibcxx-use-cxx11-abi
https://bugzilla.redhat.com/show_bug.cgi?id=1546704

Basically what I gather is that SCL compilers (no matter the version) don't offer full C++11 support in RHEL7 in order to support RHEL6. So I guess the recommendation would be to always use custom built gcc's with CentOS 7.

tniemi

2022-10-13 12:23

reporter   ~0012806

I tried setting -D_GLIBCXX_USE_CXX11_ABI=0 with Ubuntu's gcc to replicate the behaviour of CentOS 7 and I got the segmentation fault. So it seems like this is the issue: SCL compilers force the use of the pre-C++11 ABI and this does not play well with OF.
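
For anyone wanting to check which string ABI a given compiler setup selects, here is a small sketch that relies only on the libstdc++ macro '_GLIBCXX_USE_CXX11_ABI'. The function name 'stringAbi' and the returned labels are made up for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>   // pulls in the libstdc++ configuration macros, if any

// Reports which std::string ABI this translation unit was compiled with.
const char* stringAbi()
{
#if defined(_GLIBCXX_USE_CXX11_ABI)
    return _GLIBCXX_USE_CXX11_ABI ? "cxx11" : "pre-cxx11";
#else
    return "unknown";   // not libstdc++, or a libstdc++ without the dual ABI
#endif
}

// Usage check, callable from main(): the result is one of the three labels.
void selfCheck()
{
    const char* abi = stringAbi();
    assert(!std::strcmp(abi, "cxx11")
        || !std::strcmp(abi, "pre-cxx11")
        || !std::strcmp(abi, "unknown"));
}
```

Printing 'stringAbi()' from a small test program, once with the default flags and once with '-D_GLIBCXX_USE_CXX11_ABI=0', should show the switch on a stock GCC.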

Now for this specific case the problem can be avoided by calling "functionEntries::inputSyntaxEntry::typeName_()", but it is unclear whether this is the only problematic place.

tniemi

2022-10-26 21:25

reporter   ~0012835

They have now closed the bug report in the Red Hat Bugzilla, because the C++11 ABI won't be supported on CentOS 7. It is not clear to me why using the old ABI causes problems, but somehow it does in this case. It might be related to relying on some undefined behaviour regarding static initialization order, but considering that normal gccs and clang work fine I don't think there is an issue. I also see no reason why somebody would use the pre-C++11 ABI these days unless specifically linking against ancient binaries. So this is pretty much confined to CentOS 7 / RHEL 7 and their pre-built compilers.

If the code is changed as suggested in the description, the segfault goes away and SCL compilers might work fine. However, I would personally prefer to use a self-built compiler on CentOS 7.

wyldckat

2022-10-27 10:36

updater   ~0012836

Building a custom GCC isn't always the simplest thing to do, hence wanting to use SCL's GCC.
Having the workaround documented for those that might want this capability should be good enough.

It seems very strange to me that something as simple as an initialization would fail due to an ABI issue... but everything points to GCC's own stack not properly supporting this specific use scenario, which isn't commonly triggered unless forced to go through the olden ways...

Anyway, closing this as no fix required in OpenFOAM itself.

Issue History

Date Modified Username Field Change
2022-10-07 16:15 wyldckat New Issue
2022-10-07 17:48 tniemi Note Added: 0012767
2022-10-07 18:12 wyldckat Note Added: 0012768
2022-10-07 18:29 tniemi Note Added: 0012769
2022-10-07 18:43 wyldckat Note Added: 0012770
2022-10-07 19:04 henry Note Added: 0012771
2022-10-10 13:55 wyldckat Note Added: 0012784
2022-10-11 11:19 wyldckat Relationship added related to 0003887
2022-10-11 11:27 wyldckat Note Added: 0012787
2022-10-11 11:40 tniemi Note Added: 0012789
2022-10-11 12:01 wyldckat Note Added: 0012790
2022-10-11 12:23 wyldckat Note Added: 0012791
2022-10-11 13:07 tniemi Note Added: 0012792
2022-10-13 12:23 tniemi Note Added: 0012806
2022-10-26 21:25 tniemi Note Added: 0012835
2022-10-27 10:36 wyldckat Note Added: 0012836
2022-10-27 10:36 wyldckat Assigned To => wyldckat
2022-10-27 10:36 wyldckat Status new => closed
2022-10-27 10:36 wyldckat Resolution open => no change required