WEST Work Manager

Introduction

WWMGR is the parallel task distribution framework originally included as part of the WEMD source. It was extracted to permit independent development and, more importantly, independent testing. A number of different schemes for distributing work across multiple cores/nodes can be selected at run time, as follows:

| Name | Implementation | Multi-Core | Multi-Node | Appropriate For |
|------|----------------|------------|------------|-----------------|
| serial | None | No | No | Testing, minimizing overhead when dynamics is inexpensive |
| threads | Python “threading” module | Yes | No | Dynamics propagated by external executables, large amounts of data transferred per segment |
| processes | Python “multiprocessing” module | Yes | No | Dynamics propagated by Python routines, modest amounts of data transferred per segment |
| mpi | mpi4py compiled and linked against system MPI | Yes | Yes | Distributing calculations across multiple nodes. Start with this on your cluster of choice. |
| zmq | ZeroMQ and PyZMQ | Yes | Yes | Distributing calculations across multiple nodes. Use this if MPI does not work properly on your cluster (particularly for spawning child processes). |
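For example, a simulation can be switched to the serial scheme for testing without editing any configuration files, using the environment variables described in the next section; a minimal sketch:

# run everything in a single process, e.g. for debugging
export WM_WORK_MANAGER=serial
w_run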

Environment variables

For controlling task distribution

While the original WEMD work managers were controlled by command-line options and entries in wemd.cfg, the new work manager is controlled using command-line options or environment variables (much like OpenMP). These variables are as follows:

| Variable | Applicable to | Default | Meaning |
|----------|---------------|---------|---------|
| WM_WORK_MANAGER | (none) | processes | Use the given task distribution system: “serial”, “threads”, “processes”, or “zmq” |
| WM_N_WORKERS | threads, processes, zmq | number of cores in machine | Use this number of workers. In the case of zmq, use this many workers on the current machine only (can be set independently on different nodes). |
| WM_ZMQ_MODE | zmq | server | Start as a server (“server”) or a client (“client”). Servers coordinate a given calculation, and clients execute tasks related to that calculation. |
| WM_ZMQ_TASK_TIMEOUT | zmq | 60 | Time (in seconds) after which a worker is considered hung, terminated, and restarted. This must be increased for long-running dynamics segments. Set to zero to disable hang checks entirely. |
| WM_ZMQ_TASK_ENDPOINT | zmq | Random port | Master distributes tasks at this address |
| WM_ZMQ_RESULT_ENDPOINT | zmq | Random port | Master receives task results at this address |
| WM_ZMQ_ANNOUNCE_ENDPOINT | zmq | Random port | Master publishes announcements (such as “shut down now”) at this address |
| WM_ZMQ_SERVER_INFO | zmq | zmq_server_info_PID_ID.json (where PID is a process ID and ID is a nearly random hex number) | A file describing the above endpoints is written here (to ease cluster-wide startup) |
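For example, the following settings (the values are placeholders) would start the zmq work manager with 16 workers on the current node and allow dynamics segments of up to an hour before a worker is declared hung:

export WM_WORK_MANAGER=zmq
export WM_N_WORKERS=16          # workers on this node only
export WM_ZMQ_TASK_TIMEOUT=3600 # seconds; raise for long segments
w_run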

For passing information to workers

One environment variable is made available by the multi-process work managers (processes and zmq) to help clients configure themselves (e.g. to select an appropriate GPU on a multi-GPU node):

| Variable | Applicable to | Meaning |
|----------|---------------|---------|
| WM_PROCESS_INDEX | processes, zmq | Contains a 0-based integer identifying the process among the set of processes started on a given node. |
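As a sketch, a per-segment script run by each worker (for example an executable propagator's runseg.sh) might use this index to bind each worker to its own GPU; CUDA_VISIBLE_DEVICES is a CUDA convention, not part of WWMGR, and this assumes one GPU per worker:

# bind this worker to a distinct GPU on the node (sketch)
export CUDA_VISIBLE_DEVICES=$WM_PROCESS_INDEX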

The ZeroMQ work manager for clusters

The ZeroMQ (“zmq”) work manager can be used for both single-machine and cluster-wide communication. Communication occurs over sockets using the ZeroMQ messaging protocol. Within a node, Unix (IPC) sockets are used for efficient communication, while TCP sockets are used between nodes; this arrangement also minimizes the number of open sockets on the master node.
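The endpoint variables listed above presumably take standard ZeroMQ endpoint URLs, so they can be pinned to known addresses instead of random ports (for example, to satisfy firewall rules); the host name, port numbers, and socket path below are placeholders:

# between nodes: TCP endpoints on the master (placeholder host/ports)
export WM_ZMQ_TASK_ENDPOINT=tcp://headnode:23811
export WM_ZMQ_RESULT_ENDPOINT=tcp://headnode:23812
export WM_ZMQ_ANNOUNCE_ENDPOINT=tcp://headnode:23813
# within a node: a Unix (IPC) socket (placeholder path)
# export WM_ZMQ_TASK_ENDPOINT=ipc:///tmp/wm_tasks.sock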

The quick and dirty guide to using this on a cluster is as follows:

source env.sh
export WM_WORK_MANAGER=zmq
export WM_ZMQ_COMM_MODE=tcp
export WM_ZMQ_SERVER_INFO=$WEST_SIM_ROOT/wemd_server_info.json

w_run &

# manually run w_run on each client node, as appropriate for your batch system
# e.g. qrsh -inherit for Grid Engine, or maybe just simple SSH

for host in $(cat $TMPDIR/machines | sort | uniq); do
   qrsh -inherit -V $host $PWD/node-ltc1.sh &
done
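The client-node script invoked above (node-ltc1.sh in this example) is site-specific; a minimal sketch of what such a script might contain, assuming the environment set on the server (including WEST_SIM_ROOT and WM_ZMQ_SERVER_INFO) is inherited via qrsh -V:

#!/bin/bash
# hypothetical client-node startup script; adapt to your batch system
cd $WEST_SIM_ROOT
source env.sh
export WM_ZMQ_MODE=client   # this node runs workers; the server is elsewhere
w_run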