View Issue Details

ID: 0001330
Project: OpenFOAM
Category: Bug
View Status: public
Last Update: 2016-04-07 17:44
Reporter: user826
Assigned To: henry
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: resolved
Resolution: fixed
Fixed in Version: dev
Summary: 0001330: Scalability issue: Exchange of message sizes
Description: In the framework of the EU project PRACE (http://www.prace-ri.eu/) we have been working with OpenFOAM 2.2.x to carry out FSI simulations for aircraft designs.

In the course of our work we identified a scalability issue due to the exchange of buffer sizes, which is done in Pstream::exchange() (more precisely in combineReduce()). Our findings are described in detail in Section 4.4 of the following whitepaper: http://www.prace-ri.eu/IMG/pdf/wp172.pdf

Note that we implemented a workaround that uses MPI_Alltoall() directly instead of combineReduce() for the buffer-size communication, resulting in a significant scalability improvement. Attached is the unified diff for this workaround (based on OpenFOAM 2.2.x, commit 6c399a448855e18d335202434b4304203de65).
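For illustration, a minimal self-contained sketch of the idea behind the workaround follows; the function name exchangeSizes and the use of std::vector are illustrative only and are not part of the attached patch, which implements the same pattern via Pstream::exchange() and UPstream::alltoall() (see changes.diff below). Each rank contributes one int per peer, and MPI_Alltoall returns how many elements every peer will send back, so the complete nProcs x nProcs size matrix is never replicated:

// Minimal sketch of the size exchange used by the workaround.
// Names are illustrative; the actual change is in the attached changes.diff.
#include <mpi.h>
#include <vector>

void exchangeSizes(const std::vector<int>& sendCounts, std::vector<int>& recvCounts)
{
    int nProcs = 0;
    MPI_Comm_size(MPI_COMM_WORLD, &nProcs);
    recvCounts.resize(nProcs);

    // One int per peer: after the call, recvCounts[p] is the number of
    // elements rank p will send to this rank.
    MPI_Alltoall
    (
        const_cast<int*>(sendCounts.data()), 1, MPI_INT,
        recvCounts.data(), 1, MPI_INT,
        MPI_COMM_WORLD
    );
}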

 
Additional Information: Note that the workaround applies to only one of the overloads of the method PstreamBuffers::finishedSends() (the one without the sizes output parameter).

It has not been investigated in which cases the second overload, which replicates the complete matrix of message sizes on all MPI ranks, is really needed, or how far it could be optimized by a similar approach using appropriate collective MPI routines directly; a possible direction is sketched below.
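If the full size matrix does turn out to be required on every rank, one conceivable direction, sketched here purely as an assumption with illustrative names and not something we have implemented or verified, would be to gather the matrix with a single MPI_Allgather instead of combineReduce():

// Hypothetical sketch only: gather the complete size matrix on every rank
// with one collective. Whether this covers all uses of the second overload
// of PstreamBuffers::finishedSends() has not been investigated.
#include <mpi.h>
#include <vector>

void gatherSizeMatrix(const std::vector<int>& mySendCounts, std::vector<int>& sizeMatrix)
{
    int nProcs = 0;
    MPI_Comm_size(MPI_COMM_WORLD, &nProcs);
    sizeMatrix.resize(nProcs*nProcs);

    // Each rank contributes its own row of nProcs counts; after the call,
    // sizeMatrix[p*nProcs + q] is the number of elements rank p sends to rank q.
    MPI_Allgather
    (
        const_cast<int*>(mySendCounts.data()), nProcs, MPI_INT,
        sizeMatrix.data(), nProcs, MPI_INT,
        MPI_COMM_WORLD
    );
}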

Tags: No tags attached.

Activities

user826

2014-06-22 16:50

 

changes.diff (7,320 bytes)   
diff --git a/etc/bashrc b/etc/bashrc
index 0f0ff50..9ce7bc3 100644
--- a/etc/bashrc
+++ b/etc/bashrc
@@ -63,7 +63,7 @@ foamCompiler=system
 
 #- Compiler:
 #    WM_COMPILER = Gcc | Gcc45 | Gcc46 | Gcc47 | Clang | Icc (Intel icc)
-export WM_COMPILER=Gcc
+export WM_COMPILER=Icc
 unset WM_COMPILER_ARCH WM_COMPILER_LIB_ARCH
 
 #- Architecture:
@@ -76,12 +76,12 @@ export WM_PRECISION_OPTION=DP
 
 #- Optimised, debug, profiling:
 #    WM_COMPILE_OPTION = Opt | Debug | Prof
-export WM_COMPILE_OPTION=Opt
+export WM_COMPILE_OPTION=Debug
 
 #- MPI implementation:
 #    WM_MPLIB = SYSTEMOPENMPI | OPENMPI | MPICH | MPICH-GM | HPMPI
 #               | GAMMA | MPI | QSMPI | SGIMPI
-export WM_MPLIB=OPENMPI
+export WM_MPLIB=SGIMPI
 
 #- Operating System:
 #    WM_OSTYPE = POSIX | ???
diff --git a/src/OpenFOAM/db/IOstreams/Pstreams/Pstream.H b/src/OpenFOAM/db/IOstreams/Pstreams/Pstream.H
index c26e0d7..3d2f3d9 100644
--- a/src/OpenFOAM/db/IOstreams/Pstreams/Pstream.H
+++ b/src/OpenFOAM/db/IOstreams/Pstreams/Pstream.H
@@ -294,6 +294,14 @@ public:
                 const bool block = true
             );
 
+            template<class Container, class T>
+            static void exchange
+            (
+                const List<Container >&,
+                List<Container >&,
+                const int tag = UPstream::msgType(),
+                const bool block = true
+            );
 };
 
 
diff --git a/src/OpenFOAM/db/IOstreams/Pstreams/PstreamBuffers.C b/src/OpenFOAM/db/IOstreams/Pstreams/PstreamBuffers.C
index 24eec63..35b4170 100644
--- a/src/OpenFOAM/db/IOstreams/Pstreams/PstreamBuffers.C
+++ b/src/OpenFOAM/db/IOstreams/Pstreams/PstreamBuffers.C
@@ -88,7 +88,6 @@ void Foam::PstreamBuffers::finishedSends(const bool block)
         (
             sendBuf_,
             recvBuf_,
-            sizes,
             tag_,
             block
         );
diff --git a/src/OpenFOAM/db/IOstreams/Pstreams/UPstream.H b/src/OpenFOAM/db/IOstreams/Pstreams/UPstream.H
index 1ad4aa9..d8d46bc 100644
--- a/src/OpenFOAM/db/IOstreams/Pstreams/UPstream.H
+++ b/src/OpenFOAM/db/IOstreams/Pstreams/UPstream.H
@@ -376,7 +376,7 @@ public:
         //- Abort program
         static void abort();
 
-
+        static void alltoall(int* sendCounts, int* recvCounts);
 };
 
 
diff --git a/src/OpenFOAM/db/IOstreams/Pstreams/exchange.C b/src/OpenFOAM/db/IOstreams/Pstreams/exchange.C
index 8c0cb4b..62a9419 100644
--- a/src/OpenFOAM/db/IOstreams/Pstreams/exchange.C
+++ b/src/OpenFOAM/db/IOstreams/Pstreams/exchange.C
@@ -38,7 +38,6 @@ namespace Foam
 
 // * * * * * * * * * * * * * * * Member Functions  * * * * * * * * * * * * * //
 
-//template<template<class> class ListType, class T>
 template<class Container, class T>
 void Pstream::exchange
 (
@@ -149,6 +148,128 @@ void Pstream::exchange
     recvBufs[Pstream::myProcNo()] = sendBufs[Pstream::myProcNo()];
 }
 
+// My version without sizes argument and alltoall
+//template<template<class> class ListType, class T>
+template<class Container, class T>
+void Pstream::exchange
+(
+    const List<Container>& sendBufs,
+    List<Container>& recvBufs,
+    const int tag,
+    const bool block
+)
+{
+    if (!contiguous<T>())
+    {
+        FatalErrorIn
+        (
+            "Pstream::exchange(..)"
+        )   << "Contiguous data only." << Foam::abort(FatalError);
+    }
+
+    if (sendBufs.size() != UPstream::nProcs())
+    {
+        FatalErrorIn
+        (
+            "Pstream::exchange(..)"
+        )   << "Size of list:" << sendBufs.size()
+            << " does not equal the number of processors:"
+            << UPstream::nProcs()
+            << Foam::abort(FatalError);
+    }
+
+    /*sizes.setSize(UPstream::nProcs());
+    labelList& nsTransPs = sizes[UPstream::myProcNo()];
+    nsTransPs.setSize(UPstream::nProcs());*/
+    int nProcs = UPstream::nProcs();
+    int *sendCounts = new int[nProcs];
+    int *recvCounts = new int[nProcs];
+
+    forAll(sendBufs, procI)
+    {
+    	//nsTransPs[procI] = sendBufs[procI].size();
+    	sendCounts[procI] = sendBufs[procI].size();
+    }
+
+    // Send sizes across. Note: blocks.
+    //combineReduce(sizes, UPstream::listEq(), tag);
+    UPstream::alltoall(sendCounts, recvCounts);
+
+    if (Pstream::parRun())
+    {
+        label startOfRequests = Pstream::nRequests();
+
+        // Set up receives
+        // ~~~~~~~~~~~~~~~
+
+        recvBufs.setSize(sendBufs.size());
+
+        //forAll(sizes, procI)
+        for(int procI = 0; procI < nProcs; procI++)
+        {
+            //label nRecv = sizes[procI][UPstream::myProcNo()];
+            label nRecv = recvCounts[procI];
+
+            if (procI != Pstream::myProcNo() && nRecv > 0)
+            {
+                recvBufs[procI].setSize(nRecv);
+                UIPstream::read
+                (
+                    UPstream::nonBlocking,
+                    procI,
+                    reinterpret_cast<char*>(recvBufs[procI].begin()),
+                    nRecv*sizeof(T),
+                    tag
+                );
+            }
+        }
+
+
+        // Set up sends
+        // ~~~~~~~~~~~~
+
+        forAll(sendBufs, procI)
+        {
+            if (procI != Pstream::myProcNo() && sendBufs[procI].size() > 0)
+            {
+                if
+                (
+                   !UOPstream::write
+                    (
+                        UPstream::nonBlocking,
+                        procI,
+                        reinterpret_cast<const char*>(sendBufs[procI].begin()),
+                        sendBufs[procI].size()*sizeof(T),
+                        tag
+                    )
+                )
+                {
+                    FatalErrorIn("Pstream::exchange(..)")
+                        << "Cannot send outgoing message. "
+                        << "to:" << procI << " nBytes:"
+                        << label(sendBufs[procI].size()*sizeof(T))
+                        << Foam::abort(FatalError);
+                }
+            }
+        }
+
+
+        // Wait for all to finish
+        // ~~~~~~~~~~~~~~~~~~~~~~
+
+        if (block)
+        {
+            Pstream::waitRequests(startOfRequests);
+        }
+    }
+
+    delete[] sendCounts;
+    delete[] recvCounts;
+
+    // Do myself
+    recvBufs[Pstream::myProcNo()] = sendBufs[Pstream::myProcNo()];
+}
+
 
 // * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
 
diff --git a/src/Pstream/dummy/UPstream.C b/src/Pstream/dummy/UPstream.C
index 183a86c..5a95ac6 100644
--- a/src/Pstream/dummy/UPstream.C
+++ b/src/Pstream/dummy/UPstream.C
@@ -104,5 +104,7 @@ bool Foam::UPstream::finishedRequest(const label i)
     return false;
 }
 
+void Foam::UPstream::alltoall(int* sendCounts, int* recvCounts) {
+}
 
 // ************************************************************************* //
diff --git a/src/Pstream/mpi/UPstream.C b/src/Pstream/mpi/UPstream.C
index bf7af5d..b95d789 100644
--- a/src/Pstream/mpi/UPstream.C
+++ b/src/Pstream/mpi/UPstream.C
@@ -378,5 +378,9 @@ bool Foam::UPstream::finishedRequest(const label i)
     return flag != 0;
 }
 
+void Foam::UPstream::alltoall(int* sendCounts, int* recvCounts) {
+    MPI_Alltoall(sendCounts, 1, MPI_INT, recvCounts, 1, MPI_INT, MPI_COMM_WORLD);
+}
+
 
 // ************************************************************************* //

user4

2014-06-24 12:12

  ~0003137

Thanks for the detailed analysis. Have you tried any other MPI implementations? With the use of MPI_Alltoall, what is the next bottleneck?

Mattijs

user826

2014-06-24 13:01

  ~0003138

Last edited: 2014-06-24 13:34

Dear Mattijs!

No, we haven't tried other MPI implementations. We only used Bullxmpi 1.1.16 (a derivative of OpenMPI).

For our specific test case, the simple beam example with uniform motion diffusion (corresponding to the right-hand plot of Figure 5 in the above-mentioned whitepaper), when MPI_Alltoall is used within one of the two overloads of PstreamBuffers::finishedSends(), 40% of the scaling overhead from 2048 to 4096 processes (i.e. the difference between the actual runtime and the theoretical runtime under optimal scaling) is caused by the second (unmodified) overload of PstreamBuffers::finishedSends(), and more precisely again by combineReduce().

I could also send you the corresponding HPCToolkit profiles (~7 MB) via email in case you are interested.

Thomas

user4

2014-06-24 14:14

  ~0003139

Hi Thomas,

that might be interesting. I'm m.janssens at the company email.

MattijsJ

2016-04-07 16:48

reporter   ~0006098

Resolved in commit 56668b24066c32665026b0613109f8627a7c7789

Many thanks for the analysis. Please re-open if you come across other bottlenecks.

Issue History

Date Modified Username Field Change
2014-06-22 16:50 user826 New Issue
2014-06-22 16:50 user826 File Added: changes.diff
2014-06-24 12:12 user4 Note Added: 0003137
2014-06-24 13:01 user826 Note Added: 0003138
2014-06-24 13:34 user826 Note Edited: 0003138
2014-06-24 14:14 user4 Note Added: 0003139
2016-04-07 16:48 MattijsJ Note Added: 0006098
2016-04-07 17:44 henry Status new => resolved
2016-04-07 17:44 henry Fixed in Version => dev
2016-04-07 17:44 henry Resolution open => fixed
2016-04-07 17:44 henry Assigned To => henry