View Issue Details

ID: 0003724
Project: OpenFOAM
Category: Bug
View Status: public
Last Update: 2021-09-03 19:36
Reporter: peksa
Assigned To: henry
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: GNU/Linux
OS: Ubuntu
OS Version: 15.04
Product Version: dev
Fixed in Version:
Summary: 0003724: IOobject construction may hang
Description:

Dear developers,

This issue is directly related to my earlier report on mapFieldsPar hanging (0003722),
in which I hastily proposed a solution that fixed a symptom but not the underlying cause.

After using the recent mapFieldsPar resolution in commit 458e9281e163a355f724ee82cc6a4ec89fb6a65d,
I noticed that it fails to find the right sourceTime instance in the parallel case, i.e. the
time instance is looked up only in the main case directory, not in the processor directories.
My earlier test did not show this undesired behavior.

This led me to look into the original implementation, and I quickly found that the original
hang arises when a Time object is constructed.

To investigate whether this is a general issue in object construction,
I have modified the test utility Test-IOField.C so that you can reproduce the issue.

Note that Test-IOField does not currently compile, because
it calls IOobject::objectPath(bool) without the boolean argument. After fixing this
(see script), one can compile it and build an example case that reproduces the IO hang.

When the example is run in parallel, it should hang while reading one of the IO objects.
Debugging further, I found that the hang occurs while an IOField object is constructed:
headerOk() is called, which in turn calls filePath() in regIOobject and then in IOobject.
IOobject::filePath seems to be the source of the problem. I debugged the original
mapFieldsPar problem in the same way, and the hang occurred in the same place.

I haven't been able to come up with a solution yet, but wanted to report my findings here.

Let me know if you need any further information.
Steps To Reproduce:

Run the attached shell script, which compiles Test-IOField.C and tries to run an example case with it in parallel. The run should hang and not complete.
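
Because the expected failure is a hang rather than an error, a reproduction script never returns on its own. The sketch below is one way to make that visible. It is not taken from the attached buildIOCase.sh, and the helper name `run_or_report_hang`, the time limit and the rank count in the usage comment are assumptions for illustration. It relies on GNU coreutils `timeout`, which exits with status 124 when the limit is reached.

```shell
#!/bin/sh
# Hypothetical helper (not part of the attached script): run a command under
# a time limit and report when the limit was hit, which suggests a hang.
run_or_report_hang() {
    limit="$1"      # time limit in seconds
    shift
    status=0
    # GNU coreutils `timeout` exits with 124 if the command was still
    # running when the limit expired.
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG: '$*' did not finish within ${limit}s" >&2
    fi
    return "$status"
}

# Usage sketch (assumed: an already decomposed case and 4 ranks):
#   run_or_report_hang 120 mpirun -np 4 Test-IOField -parallel
```

A status of 124 only says that the run exceeded the limit, so the limit should be chosen well above the normal run time of the case.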
Tags: No tags attached.

Activities

peksa

2021-09-01 21:34

reporter  

buildIOCase.sh (610 bytes)

henry

2021-09-01 21:46

manager   ~0012177

mapFieldsPar is basically broken in several ways which is why I reinstated mapFields. It would make sense to avoid using mapFieldsPar until we can secure funding to rewrite it.

peksa

2021-09-02 06:25

reporter   ~0012178

I agree with you regarding mapFieldsPar. I wonder whether other parallel utilities are affected by this issue as well.

henry

2021-09-02 07:57

manager   ~0012179

I have updated Test-IOField:

commit 3554f2140e4dbcd547b39867c7e4760bbb8377e2 (HEAD -> master, origin/master, origin/HEAD)
Author: Henry Weller <http://cfd.direct>
Date: Thu Sep 2 07:55:49 2021 +0100

    Test-IOField: Updated and improved to use typeIOobject

We are not aware of any other parallel utilities which do not work correctly. Do you know of any?

peksa

2021-09-02 18:39

reporter   ~0012180

Hey, thanks for pointing out the correct usage of typeIOobject in this context. The test now runs successfully.

I have tried to understand why the original mapFieldsPar createTimes.H implementation led to the hang, but I have not yet been able to reproduce the IOobject-induced issue outside of that case. In addition, I tested some other parallel utilities with similar features, and they all worked as expected.

peksa

2021-09-02 19:22

reporter   ~0012181

OK, I got the original mapFieldsPar working purely by accident, by changing the "startFrom" entry from "startTime" to "latestTime". To see this, revert to commit d002a4de500c96542d89160d7539e03ed9f1eef4 (before the latest mapFieldsPar fix) and run the attached shell script, which again does a parallel mapping between cavity cases, but now with the latestTime entry instead of startTime=0.

I have no idea why this works, but what I can quickly see is that Time object initialisation involves the setControl() and setTime() functions, which depend on this setting. The fix I proposed earlier cannot read the time values under the processor directories, while the older implementation with this entry change does things correctly. Weird, but as you said, there are other problems as well...
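
For readers who want to try the entry change described above, the following is a minimal sketch of switching `startFrom` in a controlDict from a script. It is not taken from the attached buildMapFieldsParDebugCase.sh. The helper name `set_start_from` is hypothetical, the path in the usage comment assumes the standard `system/controlDict` location, and the thread does not say which case's controlDict was edited. It uses GNU `sed -i` and expects the entry on a single line.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the value of the startFrom entry in a
# controlDict-style file, keeping the original indentation and spacing.
set_start_from() {
    dict="$1"       # path to the dictionary file
    value="$2"      # new value, e.g. latestTime or startTime
    # Match "<indent>startFrom<spaces>" and replace everything up to the
    # terminating semicolon. Other entries (e.g. startTime) are untouched.
    sed -i "s/^\([[:space:]]*startFrom[[:space:]]\{1,\}\)[^;]*;/\1${value};/" "$dict"
}

# Usage sketch (assumed path; run from the case directory):
#   set_start_from system/controlDict latestTime
```

Only the `startFrom` line is changed; a `startTime 0;` entry can stay in the file, since it is a separate entry.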

buildMapFieldsParDebugCase.sh (1,774 bytes)

henry

2021-09-02 19:47

manager   ~0012182

Try this:

commit e6fdd180e8dc73e6914073908f159e58d622bd1d (HEAD -> master, origin/master, origin/HEAD)
Author: Henry Weller <http://cfd.direct>
Date: Thu Sep 2 19:45:14 2021 +0100

    mapFieldsPar: Corrected handling of argList and reverted change to createTimes.H

peksa

2021-09-03 06:29

reporter   ~0012183

Tested the commit and everything seems to work now. Thanks!

Issue History

Date Modified Username Field Change
2021-09-01 21:34 peksa New Issue
2021-09-01 21:34 peksa File Added: buildIOCase.sh
2021-09-01 21:46 henry Note Added: 0012177
2021-09-02 06:25 peksa Note Added: 0012178
2021-09-02 07:57 henry Note Added: 0012179
2021-09-02 18:39 peksa Note Added: 0012180
2021-09-02 19:22 peksa File Added: buildMapFieldsParDebugCase.sh
2021-09-02 19:22 peksa Note Added: 0012181
2021-09-02 19:47 henry Note Added: 0012182
2021-09-03 06:29 peksa Note Added: 0012183
2021-09-03 19:36 henry Assigned To => henry
2021-09-03 19:36 henry Status new => resolved
2021-09-03 19:36 henry Resolution open => fixed