WEST Work Manager
Introduction
WWMGR is the parallel task distribution framework originally included as part of the WEMD source. It was extracted to permit independent development, and (more importantly) independent testing. A number of different schemes can be selected at run-time for distributing work across multiple cores/nodes, as follows:
Name | Implementation | Multi-Core | Multi-Node | Appropriate For
---|---|---|---|---
serial | None | No | No | Testing; minimizing overhead when dynamics is inexpensive
threads | Python “threading” module | Yes | No | Dynamics propagated by external executables; large amounts of data transferred per segment
processes | Python “multiprocessing” module | Yes | No | Dynamics propagated by Python routines; modest amounts of data transferred per segment
mpi | mpi4py, compiled and linked against the system MPI | Yes | Yes | Distributing calculations across multiple nodes. Start with this on your cluster of choice.
zmq | ZeroMQ messaging library (see below) | Yes | Yes | Distributing calculations across multiple nodes. Use this if MPI does not work properly on your cluster (particularly for spawning child processes).
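In the simplest cases, the scheme is chosen by setting the WM_WORK_MANAGER environment variable (described in the next section) before starting the run. A minimal sketch, using only variables documented below; the worker count is an arbitrary example value:

```bash
# Run entirely on the local machine with the multiprocessing-based work manager
export WM_WORK_MANAGER=processes
export WM_N_WORKERS=8      # example value: number of worker processes to start
w_run &
```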
Environment variables
For controlling task distribution
While the original WEMD work managers were controlled by command-line options and entries in wemd.cfg, the new work manager is controlled using command-line options or environment variables (much like OpenMP). These variables are as follows:
Variable | Applicable to | Default | Meaning
---|---|---|---
WM_WORK_MANAGER | (none) | processes | Use the given task distribution system: “serial”, “threads”, “processes”, or “zmq”
WM_N_WORKERS | threads, processes, zmq | number of cores in the machine | Use this number of workers. In the case of zmq, use this many workers on the current machine only (can be set independently on different nodes).
WM_ZMQ_MODE | zmq | server | Start as a server (“server”) or a client (“client”). Servers coordinate a given calculation, and clients execute tasks related to that calculation.
WM_ZMQ_TASK_TIMEOUT | zmq | 60 | Time (in seconds) after which a worker will be considered hung, terminated, and restarted. This must be increased for long-running dynamics segments. Set to zero to disable hang checks entirely.
WM_ZMQ_TASK_ENDPOINT | zmq | Random port | Master distributes tasks at this address
WM_ZMQ_RESULT_ENDPOINT | zmq | Random port | Master receives task results at this address
WM_ZMQ_ANNOUNCE_ENDPOINT | zmq | Random port | Master publishes announcements (such as “shut down now”) at this address
WM_ZMQ_SERVER_INFO | zmq | | A file describing the above endpoints can be found here (to ease cluster-wide startup)
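As an illustration, a zmq master (server) on a cluster head node might be configured along the following lines. This is a sketch using only the variables above; the worker count and timeout are placeholder values, not recommendations:

```bash
# Server-side zmq configuration (placeholder values)
export WM_WORK_MANAGER=zmq
export WM_ZMQ_MODE=server            # this process coordinates the calculation
export WM_N_WORKERS=16               # workers started on this node only
export WM_ZMQ_TASK_TIMEOUT=3600      # allow hour-long segments before a worker is declared hung
export WM_ZMQ_SERVER_INFO=$WEST_SIM_ROOT/wemd_server_info.json
w_run &
```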
For passing information to workers
One environment variable is made available by multi-process work managers (processes and ZMQ) to help clients configure themselves (e.g. select an appropriate GPU on a multi-GPU node):
Variable | Applicable to | Meaning
---|---|---
WM_PROCESS_INDEX | processes, zmq | Contains a zero-based integer identifying the process among the set of processes started on a given node.
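For example, a worker-side propagation script can use WM_PROCESS_INDEX to pin each worker to its own GPU. The sketch below assumes one GPU per worker and CUDA-based dynamics; CUDA_VISIBLE_DEVICES is a CUDA convention, not something set by the work manager:

```bash
# In the worker-side script that launches dynamics:
# give each worker process exclusive use of one GPU on a multi-GPU node
export CUDA_VISIBLE_DEVICES=$WM_PROCESS_INDEX
```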
The ZeroMQ work manager for clusters
The ZeroMQ (“zmq”) work manager can be used for both single-machine and cluster-wide communication. Communication occurs over sockets using the ZeroMQ messaging protocol. Within nodes, Unix sockets are used for efficient communication, while between nodes, TCP sockets are used. This also minimizes the number of open sockets on the master node.
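If the randomly chosen ports are unsuitable (for example, blocked by a firewall), the endpoints can be pinned explicitly before starting the master. The sketch below assumes the endpoint variables accept standard ZeroMQ address syntax; the host name and port numbers are placeholders:

```bash
# Pin the master's communication endpoints (placeholder host and ports)
export WM_ZMQ_TASK_ENDPOINT=tcp://headnode.cluster:23811      # task distribution
export WM_ZMQ_RESULT_ENDPOINT=tcp://headnode.cluster:23812    # task results
export WM_ZMQ_ANNOUNCE_ENDPOINT=tcp://headnode.cluster:23813  # announcements (e.g. shutdown)
```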
The quick and dirty guide to using this on a cluster is as follows:
```bash
source env.sh
export WM_WORK_MANAGER=zmq
export WM_ZMQ_COMM_MODE=tcp
export WM_ZMQ_SERVER_INFO=$WEST_SIM_ROOT/wemd_server_info.json
w_run &

# manually run w_run on each client node, as appropriate for your batch system
# e.g. qrsh -inherit for Grid Engine, or maybe just simple SSH
for host in $(cat $TMPDIR/machines | sort | uniq); do
    qrsh -inherit -V $host $PWD/node-ltc1.sh &
done
```
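The per-node client script (node-ltc1.sh above) is site-specific, but in essence it points each client at the server and starts workers there. A hypothetical sketch:

```bash
#!/bin/bash
# Hypothetical client-node script; details will vary by site and batch system
cd /path/to/simulation                # placeholder for the simulation root directory
source env.sh
export WM_WORK_MANAGER=zmq
export WM_ZMQ_MODE=client             # execute tasks only; do not coordinate
export WM_ZMQ_SERVER_INFO=$WEST_SIM_ROOT/wemd_server_info.json   # endpoints published by the server
w_run
```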