Description
Null Message Parallel Scheduler
At LLNL we have been working on additions to NS-3 to improve parallel performance using a null message based scheduler. A patch is enclosed, and general overview comments follow to provide context for what was changed.
I’ve tested with the existing parallel examples and some large problem sets we are running locally. On some of our larger tests we have observed 5x speedups.
This is a patch to add a new parallel scheduling algorithm based on null messages, a common parallel DES scheduling algorithm. The null message scheduler has better scaling properties when running on large numbers of nodes since it does not require global communication. Null message replaces the global time synchronization with one based only on communication with neighbors. Note that null message will not be faster for all network topologies; for example, in a fully coupled topology (all MPI tasks have parallel P2P links with all other MPI tasks), null message will perform slower than the existing algorithm. Keeping both algorithms available is therefore desirable.
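As a rough illustration of neighbor-only synchronization, the sketch below tracks, per neighbor, the latest time that neighbor has guaranteed it will not send anything earlier than, and takes the minimum as the time up to which local events are safe to process. This is a generic conservative null message scheme, not the code in the patch; `NeighborClocks`, `NullMessageGuarantee`, and the integer `Time` type are hypothetical names chosen for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <map>

// Hypothetical types for illustration only; not the classes in the patch.
using Time = std::uint64_t;  // simulation time in arbitrary ticks
using Rank = int;            // MPI rank of a neighboring task

class NeighborClocks {
public:
  // Register a neighbor. Nothing has been guaranteed yet, so its bound is 0.
  void AddNeighbor(Rank r) { m_guarantee[r] = 0; }

  // Record a message or null message from a neighbor. The guarantee is a
  // promise that the neighbor will send nothing with an earlier timestamp.
  void OnMessage(Rank from, Time guarantee) {
    Time& g = m_guarantee.at(from);
    g = std::max(g, guarantee);  // guarantees never move backwards
  }

  // Local events with timestamps up to this value can be processed safely.
  // Only neighbors are consulted; no global reduction is needed.
  Time SafeTime() const {
    Time safe = std::numeric_limits<Time>::max();
    for (const auto& kv : m_guarantee) {
      safe = std::min(safe, kv.second);
    }
    return safe;
  }

private:
  std::map<Rank, Time> m_guarantee;
};

// Guarantee to advertise to a neighbor in a null message: the earliest time
// this task could act, plus the delay of the link to that neighbor.
inline Time NullMessageGuarantee(Time nextLocalEvent, Time safeTime, Time linkDelay) {
  return std::min(nextLocalEvent, safeTime) + linkDelay;
}
```

In a fully coupled topology every task is a neighbor of every other task, so the per-neighbor bookkeeping above touches all ranks, which is consistent with the note that null message loses its advantage there.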
The design of the changes was based on the existing set of classes, with modifications to allow more than one parallel scheduler implementation. The original design used a singleton-like object, the MpiInterface class, to provide the interface from NS3 packages to the communication layer. In order to add a second algorithm option, MpiInterface was refactored along the lines of the Strategy pattern. A new singleton, the ParallelController, was introduced to delegate calls to the selected concrete algorithm. An abstract base class, ParallelCommunicationsInterface, was added for concrete implementations to inherit from. A class diagram is shown below.
The existing parallel scheduler classes were renamed to “GrantedTimeWindowSimulationImpl” and “GrantedTimeWindowMpiInterface” from the more generic “DistributedSimulationImpl” and “MpiInterface” to reflect the choice of synchronization algorithm more clearly. NullMessageSimulationImpl and NullMessageMpiInterface are the corresponding null message additions.
The ParallelController strategy is selected in the Enable method of the ParallelController, based on the SimulatorImplementationType global value, so that the communications interface used corresponds to the Simulator being used. The parallel examples have been modified to allow this selection. If this patch is accepted, parallel user application codes will have to do some minor class renaming and ensure that the SimulatorImplementationType value is set before the Enable call.
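The ordering requirement can be seen in a minimal, self-contained sketch of the Strategy-style delegation. It does not use the NS-3 API: `Controller`, `CommunicationsInterface`, the two concrete classes, and the plain string `simulatorType` are hypothetical stand-ins for ParallelController, the abstract communications base class, its implementations, and the SimulatorImplementationType global value.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for illustration; not the patch's declarations.
class CommunicationsInterface {
public:
  virtual ~CommunicationsInterface() = default;
  virtual std::string Name() const = 0;
};

class GrantedTimeWindowComms : public CommunicationsInterface {
public:
  std::string Name() const override { return "GrantedTimeWindow"; }
};

class NullMessageComms : public CommunicationsInterface {
public:
  std::string Name() const override { return "NullMessage"; }
};

class Controller {
public:
  // Stand-in for the SimulatorImplementationType global value.
  static std::string simulatorType;

  // The concrete strategy is chosen once, here, from the current value.
  static void Enable() {
    if (simulatorType == "NullMessage") {
      impl.reset(new NullMessageComms());
    } else if (simulatorType == "GrantedTimeWindow") {
      impl.reset(new GrantedTimeWindowComms());
    } else {
      throw std::invalid_argument("unknown simulator type: " + simulatorType);
    }
  }

  // All later calls are delegated to whichever strategy Enable picked.
  static const CommunicationsInterface& Get() {
    if (!impl) {
      throw std::logic_error("Enable() has not been called");
    }
    return *impl;
  }

private:
  static std::unique_ptr<CommunicationsInterface> impl;
};

// Initial value is an assumption made for this sketch only.
std::string Controller::simulatorType = "GrantedTimeWindow";
std::unique_ptr<CommunicationsInterface> Controller::impl;
```

In this sketch, changing `simulatorType` after `Enable()` has no effect on the strategy already chosen, which is why user code needs to set the value first.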
The design should be generic enough to allow for additional communication layers and algorithms. Each new algorithm requires a new Scheduler implementation and a new ParallelCommunicationsInterface implementation.
The existing MpiReceiver class was retained as it was general enough to use for both implementations.
Patch Set 1
Total comments: 14
Patch Set 2: Null message scheduler with code changes based on review comments.
Messages
Total messages: 4