Each C source file can be thought of as a "C-Class", the best code structure I could think of for near-OOP in C.
Each "object" consists of a structure and the methods (functions) that operate on the items in that structure,
collected in one file with one header. The header file exposes the structure only as a void pointer, so the
implementation may be changed without affecting other parts of the application. See the very simple
TestMpiIocCApp.h below for an example; it provides a typedef for the user of the class and a few simple methods for
constructing the object, starting the application, destroying the object when the application is finished, and
checking the rank and size.
When any of the "C-Class" functions are invoked, the first thing the function does (except for the "constructor") is examine the supplied object to see if it's the correct type. This is done by having the structure contain an integer that is unique to objects of this type and checking it whenever one of these external-facing functions is invoked. All functions not listed in the header are static, and no unnecessary conversions are performed.
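The pattern described above can be sketched in a few lines. This is only an illustration, not the code from this project: the Counter "class", the CNT_ prefix, and the tag value are all hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* What a header would expose (all names here are hypothetical). */
typedef void *Counter;               /* opaque handle: callers never see the struct */

Counter CNT_new(int start);          /* "constructor" */
int     CNT_increment(Counter obj);  /* returns the new value, or -1 on a bad handle */
void    CNT_delete(Counter obj);     /* "destructor" */

/* What the .c file would keep private. */
#define CNT_TYPE_TAG 0x434E5431      /* arbitrary integer unique to this "class" */

typedef struct {
    int typeTag;                     /* must equal CNT_TYPE_TAG for a valid object */
    int value;
} CounterImpl;

/* Not listed in the header, so static: converts and validates a handle. */
static CounterImpl *cnt_check(Counter obj)
{
    CounterImpl *self = obj;
    if (self == NULL || self->typeTag != CNT_TYPE_TAG) {
        return NULL;
    }
    return self;
}

Counter CNT_new(int start)
{
    CounterImpl *self = malloc(sizeof *self);
    if (self != NULL) {
        self->typeTag = CNT_TYPE_TAG;
        self->value = start;
    }
    return self;
}

int CNT_increment(Counter obj)
{
    CounterImpl *self = cnt_check(obj);   /* first thing: is this really a Counter? */
    if (self == NULL) {
        fprintf(stderr, "CNT_increment: not a Counter object\n");
        return -1;
    }
    return ++self->value;
}

void CNT_delete(Counter obj)
{
    CounterImpl *self = cnt_check(obj);
    if (self != NULL) {
        self->typeTag = 0;           /* clear the tag before releasing the memory */
        free(self);
    }
}
```

A caller would write something like `Counter c = CNT_new(5); CNT_increment(c); CNT_delete(c);` and never touch CounterImpl. The tag check only catches handles that point at readable memory of at least the size of an int; it is a sanity check, not a guarantee.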
TestMpiIocC.c
This file holds the "main" for the TestMpiIocCApp code. It constructs the test application object, starts it
running, and destroys it when the program finishes. After constructing the application, the program could check
licensing, including making sure the number of cores on which it is running doesn't exceed a license limit. Other
than this, there are probably no changes required here except perhaps a name-change.
TestMpiIocCApp.h
TestMpiIocCApp.c
TestMpiIocCMsg.h
TestMpiIocCMsg.c
MpiIocCAppLayer.h
MpiIocCAppLayer.c
MpiIocCMpiAssist.h
MpiIocCMpiAssist.c
The first one, MICMA_DATATYPE_PREFIX, takes the structure name and number of components; the second one,
MICMA_DATATYPE_ITEM, is used for each of the items in the structure that must be transferred between nodes; and the
third one, MICMA_DATATYPE_SUFFIX, finishes the type definition. The second one calls the MICMA_setComponentInfo
routine, which is available in the header file but is not otherwise intended to be used directly. These functions
will be used by all the message types created for the application. They are used by the Terminate and Capabilities
messages (see below) as well as the Test message (see above).
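To make the prefix / item / suffix idea concrete, here is a minimal sketch of how three such pieces could fit together. It is not the MICMA code: every SKETCH_ and sketch_ name, the ComponentInfo record, and the TestMsg structure are assumptions made up for this example. The sketch only records each member's name, offset, and size, which is the kind of per-member information a call such as MPI_Type_create_struct needs; it does not call MPI.

```c
#include <assert.h>
#include <stddef.h>

/* One record per structure member that must travel between nodes. */
typedef struct {
    const char *name;
    size_t offset;
    size_t size;
} ComponentInfo;

/* Hypothetical helper, standing in for the role of a "set component info" routine. */
static void sketch_setComponentInfo(ComponentInfo *slot, const char *name,
                                    size_t offset, size_t size)
{
    slot->name = name;
    slot->offset = offset;
    slot->size = size;
}

/* Opens a function StructName_describe() that fills a ComponentInfo table. */
#define SKETCH_DATATYPE_PREFIX(StructName, count)                    \
    static int StructName##_describe(ComponentInfo *out, int cap)    \
    {                                                                \
        typedef StructName SketchStruct_;                            \
        const int expected_ = (count);                               \
        int n_ = 0;                                                  \
        if (cap < expected_) return -1;

/* Adds one member; extra members beyond the declared count are only counted. */
#define SKETCH_DATATYPE_ITEM(member)                                 \
        if (n_ < expected_) {                                        \
            sketch_setComponentInfo(&out[n_], #member,               \
                offsetof(SketchStruct_, member),                     \
                sizeof(((SketchStruct_ *)0)->member));               \
        }                                                            \
        n_++;

/* Closes the function; reports -1 if the item count does not match. */
#define SKETCH_DATATYPE_SUFFIX                                       \
        return (n_ == expected_) ? n_ : -1;                          \
    }

/* A hypothetical message structure; localOnly is not transferred. */
typedef struct {
    int id;
    int tag;
    double payload;
    void *localOnly;
} TestMsg;

SKETCH_DATATYPE_PREFIX(TestMsg, 3)
SKETCH_DATATYPE_ITEM(id)
SKETCH_DATATYPE_ITEM(tag)
SKETCH_DATATYPE_ITEM(payload)
SKETCH_DATATYPE_SUFFIX
```

Calling `TestMsg_describe(table, 3)` fills three records and returns 3. Splitting the definition into three pieces keeps each message type's description next to its structure and lets the declared count be checked against the items actually listed.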
MpiIocCBTree.h
MpiIocCBTree.c
MpiIocCMsgTerminate.h
MpiIocCMsgTerminate.c
MpiIocCMsgCapabilities.h
MpiIocCMsgCapabilities.c
The TestMpiIocC main begins by creating an object of the test application using the "constructor" function. It then
starts the application, and when the application is finished and control is returned to the main, it calls the
"destructor" function and exits. Any time after the constructor is called, the user application may access the rank
and size.
TA_startupCallback allows the user application to examine the capabilities of all the nodes, determine how to
initially distribute the work, etc., and send the first messages that begin the communications. In the test
application, a test message is broadcast to all the nodes and control reverts to the abstraction layer.
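The hand-off between the abstraction layer and the user application can be pictured with a small single-process sketch. Nothing here is taken from the real layer: the Sk names, the tag values, and the in-memory queue that stands in for MPI messages are all assumptions for illustration. The point is only the shape of the control flow: the layer owns the loop, calls the startup callback once, and then calls a message callback for each message until a terminate message arrives.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_QUEUE 8
enum { SK_TAG_TEST = 1, SK_TAG_TERMINATE = 2 };   /* hypothetical tag values */

typedef struct SkLayer SkLayer;
typedef void (*SkStartupFn)(SkLayer *layer, void *app);
typedef void (*SkMessageFn)(SkLayer *layer, void *app, int tag);

struct SkLayer {
    SkStartupFn onStartup;
    SkMessageFn onMessage;
    void *app;                 /* user application object handed back to callbacks */
    int queue[SK_MAX_QUEUE];   /* stand-in for messages arriving over MPI */
    int head, tail;
};

/* Stand-in for a send: in this sketch it just enqueues a tag locally. */
static int sk_send(SkLayer *layer, int tag)
{
    if (layer->tail >= SK_MAX_QUEUE) return -1;
    layer->queue[layer->tail++] = tag;
    return 0;
}

/* The layer owns the loop: startup callback first, then dispatch until terminate. */
static int sk_run(SkLayer *layer)
{
    int handled = 0;
    layer->onStartup(layer, layer->app);
    while (layer->head < layer->tail) {
        int tag = layer->queue[layer->head++];
        if (tag == SK_TAG_TERMINATE) break;
        layer->onMessage(layer, layer->app, tag);
        handled++;
    }
    return handled;            /* number of non-terminate messages dispatched */
}

/* User application side. */
typedef struct { int testMessagesSeen; } SkApp;

static void app_startup(SkLayer *layer, void *app)
{
    (void)app;
    sk_send(layer, SK_TAG_TEST);            /* first message that starts communication */
}

static void app_message(SkLayer *layer, void *app, int tag)
{
    SkApp *self = app;
    if (tag == SK_TAG_TEST) {
        self->testMessagesSeen++;
        sk_send(layer, SK_TAG_TERMINATE);   /* work is done: ask the loop to stop */
    }
}
```

A caller fills in an SkLayer with the two callbacks and an SkApp, then calls `sk_run(&layer)`; the application code never drives the loop itself.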
Keep in mind that because the nodes are all running asynchronously, the printed messages explaining what's happening
may not be in a recognizable order. Run the application with this command:
mpiexec -n 3 bin/TestMpiIocCApp
and it may produce this print:
(0:3) MICMC_dump (capabilities) message id:737179 tag:59 INFO:: cores:8 name:Eshnunna upTime:889736 load:(60992,46752,36704) ram:33599250432
(1:3) MICMC_dump (capabilities) message id:737179 tag:59 INFO:: cores:8 name:Eshnunna upTime:889736 load:(60992,46752,36704) ram:33599250432
(2:3) MICMC_dump (capabilities) message id:737179 tag:59 INFO:: cores:8 name:Eshnunna upTime:889736 load:(60992,46752,36704) ram:33599250432
(main) TA_startupCallback Application starting.
(1:3) TMICM_dump (test message) message id:238971 tag:23
(1:3) MICMT_dump (terminate) message id:898379
(2:3) TMICM_dump (test message) message id:238971 tag:23
(2:3) MICMT_dump (terminate) message id:898379
(0:3) TMICM_dump (test message) message id:238971 tag:23
(main) TA_testMsgCallback Application ending.
The capabilities messages may arrive in any order, even though they arrived in numerical order for the test shown here. Once all the capabilities are received, the "Application starting." message will appear, followed by the remainder of the messages in whatever order they happen to arrive. The test application could use the capabilities messages to see how many instances are running on each individual CPU / blade by looking at the name of the node ("Eshnunna", in this case). The application may then divide the available RAM based on how many instances are running on this node and sharing its memory and cores. The application may also designate one node to be the read (and buffer) or write node for the other instances running on the same CPU / blade, to reduce message overhead, for example.
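The bookkeeping described above (count the instances sharing a node name, split the RAM, pick one instance per node for I/O) could look like the following. This is a sketch under assumptions: the SkCapabilities structure and the sk_ function names are hypothetical, and choosing the lowest rank on a host as the I/O instance is just one possible policy.

```c
#include <assert.h>
#include <string.h>

#define SK_NAME_LEN 64

/* Hypothetical per-rank record built from the capabilities messages. */
typedef struct {
    char name[SK_NAME_LEN];       /* node name, e.g. "Eshnunna" */
    int cores;
    unsigned long long ram;       /* bytes of RAM reported by the node */
} SkCapabilities;

/* How many instances (including this one) run on the same host as `rank`? */
static int sk_instancesOnHost(const SkCapabilities *caps, int size, int rank)
{
    int count = 0;
    for (int i = 0; i < size; i++) {
        if (strcmp(caps[i].name, caps[rank].name) == 0) {
            count++;
        }
    }
    return count;
}

/* An even split of the host's RAM among the instances sharing it. */
static unsigned long long sk_ramShare(const SkCapabilities *caps, int size, int rank)
{
    return caps[rank].ram / (unsigned long long)sk_instancesOnHost(caps, size, rank);
}

/* One possible policy: the lowest rank on a host does the reading or writing. */
static int sk_ioRankForHost(const SkCapabilities *caps, int size, int rank)
{
    for (int i = 0; i < size; i++) {
        if (strcmp(caps[i].name, caps[rank].name) == 0) {
            return i;
        }
    }
    return rank;
}
```

With the three instances from the run shown here, all on "Eshnunna" with 33599250432 bytes of RAM, each instance's share would be 11199750144 bytes and rank 0 would be the I/O instance.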
In this example, rank=1 (see the first column information for the "(rank:size)" identifier) received the test message, sent it back to the control node, and then received the terminate message. Rank=2 also received the test message, sent that back to the control node and then received the terminate message. The control node received the test message from somewhere (not shown - could've been from either rank=1 or rank=2, but probably from rank=1) and then sent the terminate message to the other nodes.
This system was built with cmake, and the CMakeLists.txt files are here and here. You will have to adjust them for
your own environment.