http://codereview.appspot.com/109068/diff/2034/2035 File examples/mpi/Makefile (right): http://codereview.appspot.com/109068/diff/2034/2035#newcode4 examples/mpi/Makefile:4: # Change the NS3BASE below to the path of ...
Here are the updates after Faker's comments. http://codereview.appspot.com/109068/diff/2034/2035 File examples/mpi/Makefile (right): http://codereview.appspot.com/109068/diff/2034/2035#newcode4 examples/mpi/Makefile:4: # Change ...
I hope that I answered all your questions. If you need more clarifications please don't ...
3 months ago
I hope that I answered all your questions. If you need more clarifications
please don't hesitate.
http://codereview.appspot.com/109068/diff/2034/2043
File src/common/packet.cc (right):
http://codereview.appspot.com/109068/diff/2034/2043#newcode187
src/common/packet.cc:187: Packet::Packet (uint32_t mpiRank, uint32_t uid,
uint32_t size)
Yes I know probably not you fault but actually those constructors have a lot of
common code that can be factorized into an init function which will be called by
all the constructors, it's not that the actual code is bad, it's just that there
is a lot of copy and paste between the different versions of the constructor.
http://codereview.appspot.com/109068/diff/2034/2043#newcode623
src/common/packet.cc:623: //XXX
commented code should not be included in code base. If it is commented, either
we uncomment it or we delete. What happens with commented code is that usually,
it is left there for ages and no one remembers why it is there, and it ends up
as a garbage unuseful code.
summary of main questions/comments: - it seemed to me that Packet/PacketMetadata might not need to ...
2 months, 3 weeks ago
summary of main questions/comments:
- it seemed to me that Packet/PacketMetadata might not need to be touched-- can
you please have a look at that issue in the review below?
- src/simulator module would acquire dependencies on src/common and src/devices
if mpi-interface.cc is left in src/simulator. Should mpi-interface.cc be in
src/devices or a new src/mpi directory?
- what happens if users do not interconnect nodes with MPI-capable interfaces,
due to misconfiguration or lack of understanding of the MPI applicability-- how
do programs behave (for instance, a Wifi channel with nodes on different
systems)?
- what should be the behavior when global routing and MPI are both enabled by
user? Perhaps this should be documented more than the one-line "continue" in
the code.
- needs manual chapter or better documentation, before release (not necessary to
merge, though)
I am not repeating the other comments left by Craig about build system, etc.
with which I agree.
http://codereview.appspot.com/109068/diff/6010/5043
File examples/mpi/nms-udp-nix.cc (right):
http://codereview.appspot.com/109068/diff/6010/5043#newcode53
examples/mpi/nms-udp-nix.cc:53: #endif
I wonder whether these guards are really necessary in the example programs. Is
the goal to make them build even if there is no MPI?
http://codereview.appspot.com/109068/diff/6010/5043#newcode418
examples/mpi/nms-udp-nix.cc:418: else if (rank == z%systemCount)
Is there a reason that the example instantiates all objects on all systems, and
then just turns some of the applications on selectively? Would a better example
be a program that instead only instantiated the necessary objects on each
system?
http://codereview.appspot.com/109068/diff/6010/5043#newcode541
examples/mpi/nms-udp-nix.cc:541: #if 0
Suggest to put this block of code into a conditional
if (tracing)
{
...
}
and enable via command line.
see examples/wireless/wifi-simple-adhoc-grid.cc, for instance.
http://codereview.appspot.com/109068/diff/6010/5044
File examples/mpi/test-distributed.cc (right):
http://codereview.appspot.com/109068/diff/6010/5044#newcode215
examples/mpi/test-distributed.cc:215: Address sinkLocalAddress(InetSocketAddress
(Ipv4Address::GetAny (), port));
by the way, I found some stray missing spaces throughout the whole patchset. I
wonder whether the GNU indent with -prs option can be used to catch these.
http://codereview.appspot.com/109068/diff/6010/5048
File src/common/packet-metadata.h (right):
http://codereview.appspot.com/109068/diff/6010/5048#newcode156
src/common/packet-metadata.h:156: uint32_t GetMpiRank (void) const;
This method is not used by any code. I wonder whether changes to PacketMetadata
or Packet are really needed for MPI. Does Packet really need to be associated
with a system ID at all times? It seems to me to try for some refactoring that
avoids touching class Packet or PacketMetadata. More on this below...
http://codereview.appspot.com/109068/diff/6010/5055
File src/devices/point-to-point/point-to-point-remote-channel.cc (right):
http://codereview.appspot.com/109068/diff/6010/5055#newcode78
src/devices/point-to-point/point-to-point-remote-channel.cc:78:
MPIInterface::SendPacket(p, rxTime, dst->GetNode()->GetId(), dst->GetIfIndex());
Here, you can call src->GetNode ()->GetSystemId () to get the system ID and then
pass as argument to SendPacket(). Would that avoid packet needing to carry
system ID?
http://codereview.appspot.com/109068/diff/6010/5062
File src/routing/global-routing/global-route-manager-impl.cc (right):
http://codereview.appspot.com/109068/diff/6010/5062#newcode552
src/routing/global-routing/global-route-manager-impl.cc:552: if
(node->GetSystemId() != MPIInterface::Rank()) continue;
In the example that instantiated every object everywhere and then only turned on
applications on the relevant system ID, wouldn't global routing still work in
that case?
Need to think about how global routing should interact with MPI. Should global
routing be extended to handle MPI? Should it be disabled by default when MPI is
enabled, forcing user to think about whether it will work for the coded
scenario? The above code will cause the global routing to partially complete
but not return an error.
http://codereview.appspot.com/109068/diff/6010/5066
File src/simulator/distributed-simulator-impl.h (right):
http://codereview.appspot.com/109068/diff/6010/5066#newcode61
src/simulator/distributed-simulator-impl.h:61:
doxygen missing
http://codereview.appspot.com/109068/diff/6010/5067
File src/simulator/mpi-interface.cc (right):
http://codereview.appspot.com/109068/diff/6010/5067#newcode28
src/simulator/mpi-interface.cc:28: #include "ns3/point-to-point-net-device.h"
This class shouldn't be in src/simulator if it depends on
point-to-point-net-device.h. Even making this class depend on class
Packet/Buffer introduces a new dependency on src/simulator (depends on
src/common) that wasn't there before.
I have two main general comments: 1) It's really not cool that my previous comments ...
2 months, 3 weeks ago
I have two main general comments:
1) It's really not cool that my previous comments have not been addressed before
submitting this patch for review.
2) I do not understand the need for the per-packet system id: I am not really
against it for now but I want to understand what bug it is fixing before ack-ing
it. i.e., what happens if you remove this part of the submitted patch ?
http://codereview.appspot.com/109068/diff/6010/5046
File src/common/buffer.h (right):
http://codereview.appspot.com/109068/diff/6010/5046#newcode499
src/common/buffer.h:499: Buffer CreateLimitedCopy (void) const;
rename to Serialize, add Deserialize and GetSerializedSize and use the same
signatures as the signatures I suggested for Packet.
I am almost certain I already suggested this previously too.
http://codereview.appspot.com/109068/diff/6010/5050
File src/common/packet.h (right):
http://codereview.appspot.com/109068/diff/6010/5050#newcode441
src/common/packet.h:441: Buffer SerializeForMpi (void) const;
I already suggested at least twice by email the alternate correct API for this.
Here is the relevant extract of my email:
--------------------------------------------
uint32_t Packet::GetSerializedSize (void) const;
// returns zero if input buffer was too small.
uint32_t Packet::Serialize (uint8_t *buffer, uint32_t size) const;
// returns zero if input buffer did not contain a complete message.
uint32_t Packet::Deserialize (uint8_t *buffer, uint32_t size);
You need to get hold of a tx buffer big enough but, at least, you know when and
where it is allocated and you can implement a per-mpi-request allocation policy
if you wish
to.
--------------------------------------------
I fail to see the point of code reviews if the resulting suggestions are
ignored/not acted upon. This takes a lot of time for everyone so, whenever you
re-submit a version of your code which does not address the
previously-identified issues, you make it less likely we will re-review the same
code with the same problems.
http://codereview.appspot.com/109068/diff/6010/5054
File src/devices/point-to-point/point-to-point-channel.h (right):
http://codereview.appspot.com/109068/diff/6010/5054#newcode100
src/devices/point-to-point/point-to-point-channel.h:100: protected:
That protected keyword leaves a really bad taste in my mouth. You could
trivially add a 'dst' argument to TransmitStart and leave the private in.
http://codereview.appspot.com/109068/diff/6010/5064
File src/simulator/default-simulator-impl.h (right):
http://codereview.appspot.com/109068/diff/6010/5064#newcode63
src/simulator/default-simulator-impl.h:63: protected:
Please, don't try to reuse this class for your simulator implementation.
Instead, you can copy/paste default-simulator-impl.* into
distributed-simulator-impl* and modify it to fit your needs. Yes, this brings
code duplication but it makes it easier to manage separately these separate
functionalities.
http://codereview.appspot.com/109068/diff/6010/5067
File src/simulator/mpi-interface.cc (right):
http://codereview.appspot.com/109068/diff/6010/5067#newcode26
src/simulator/mpi-interface.cc:26: #include "ns3/node.h"
the simulator module cannot depend on the node module. What we have done to deal
with this in the multithreading work is to move this node-specific code in a
separate module.
http://codereview.appspot.com/109068/diff/6010/5068
File src/simulator/mpi-interface.h (right):
http://codereview.appspot.com/109068/diff/6010/5068#newcode39
src/simulator/mpi-interface.h:39: class SentBuffer {
this class appears to be an implementation detail of this API so, it probably
would be best to defer its definition to the .cc file
http://codereview.appspot.com/109068/diff/6010/5068#newcode58
src/simulator/mpi-interface.h:58: class MPIInterface
To avoid exporting the mpi.h header publicly, I would suggest to use the
singleton and pimpl idioms. Namely:
in the .h:
class DistributedManager
{
public:
static DistributedManager *Get (void);
uint32_t GetRank (void);
...
private:
DistributedManagerImpl *m_impl;
};
in the .cc:
class DistributedManagerImpl {
public:
...
};
DistributedManager *
DistributedManager::Get (void)
{
static DistributedManager manager;
return &manager;
}
And yes, I think that you need to pick a better name than MpiInterface or
DistributedManager.
http://codereview.appspot.com/109068/diff/6010/5070
File src/simulator/simulator.h (right):
http://codereview.appspot.com/109068/diff/6010/5070#newcode618
src/simulator/simulator.h:618: static Ptr<SimulatorImpl>* PeekImpl(void);
what is wrong with SetImplementation and GetImplementation ? These additions
appear to be redundant with them.
Hi all, Sorry for the delay after the comments. We were working on switching the ...
2 months ago
Hi all,
Sorry for the delay after the comments. We were working on switching the
serialization API, according to Mathieu's comments. Since it has been a while,
I will provide a brief summary of the changes and the remaining issues.
*******
Changes
*******
-- Various small fixes according to the comments (code style, typos, etc, which
I responded to in the comments at the bottom of this reply)
-- Buffer API. Switched the serialization/deserialization methods to use a raw
buffer. Added methods to buffer, nix-vector, packet-metadata, and packet to
support this
-- Changed distributed-simulator-impl to use simulator-impl rather than
default-simulator-impl (per Mathieu's comment). In order to do this, we
introduced a pure virtual method in simulator-impl for GetSystemId. This method
was added to default-simulator-impl, distributed-simulator-impl, and
realtime-simulator-impl
*******
Issues
*******
-- Building with WAF. I want to do this, and I've tried several times to get
something to work. I know it is possible, but I'm just not that good at using
WAF. I need to re-start the discussion on ns-developers for help with this.
-- Dependency issues. Tom and Mathieu commented on dependencies in
src/simulator/distributed-simulator-impl and/or mpi-interface. We are open to
moving these classes wherever they fit best.
-- Rank in packet-metadata. We are still not sure why UID in packet-metadata
fits, but rank does not. Tom has pointed out that we could get away with taking
it out for now, since it isn't used. However, in the future, it will be
necessary. So I feel like we should figure out where it needs to go, especially
now that we aren't constrained by a tight release schedule (since we missed
ns-3.7).
-- new assert in node.cc causing distributed examples to seg. fault
(NS_ASSERT_MSG (Simulator::GetContext () == GetId (),...). I don't even know
what this does, so I'm not sure if it is a problem with our code, or if it is a
problem with the assert. All I know is if I take it out, the examples run fine.
-- stability issues. Craig sent an email about his concerns regarding stability
in MPI. I think there was a problem with an example script that he created
because he was trying to run MPI with non-point-to-point links. Currently, this
MPI code only works for point-to-point links. We will have to stress this in
the manual or tutorial when we get to that point.
http://codereview.appspot.com/109068/diff/6010/5041
File examples/mpi/Makefile (right):
http://codereview.appspot.com/109068/diff/6010/5041#newcode1
examples/mpi/Makefile:1: #
#################################################################
On 2009/11/13 20:20:02, craigdo wrote:
> I'm sure it's been mentioned, but this should be done in waf
Agreed.
http://codereview.appspot.com/109068/diff/6010/5042
File examples/mpi/README (right):
http://codereview.appspot.com/109068/diff/6010/5042#newcode9
examples/mpi/README:9: 1) Ensure that MPI is installed, as well as mpic++. In
Ubuntu
On 2009/11/13 20:20:02, craigdo wrote:
> Maybe mention Fedora conflict and workaround here.
Done.
http://codereview.appspot.com/109068/diff/6010/5042#newcode13
examples/mpi/README:13: 2) Set the CXX environment variable to the path of
mpic++.
On 2009/11/13 20:20:02, craigdo wrote:
> This should be a waf thing (and remove the makefile)
Agreed.
http://codereview.appspot.com/109068/diff/6010/5042#newcode17
examples/mpi/README:17: 3) Configure ns-3 with the --with-mpi option
On 2009/11/13 20:20:02, craigdo wrote:
> --with-xxx is typically used to provide a path, as in --with-nsc=../nsc
>
> I think this should be --enable-mpi for consistency
Done.
http://codereview.appspot.com/109068/diff/6010/5042#newcode23
examples/mpi/README:23: 4) Build ns-3 with the --with-mpi option
On 2009/11/13 20:20:02, craigdo wrote:
> Why do you have to build with an option. In every other case, the
configuration
> is sticky.
You are right...I forgot about this.
http://codereview.appspot.com/109068/diff/6010/5042#newcode40
examples/mpi/README:40: - LD_LIBRARY_PATH should be set to the debug
On 2009/11/13 20:20:02, craigdo wrote:
> If you use ./waf --run or ./waf shell this is taken care of for you.
Right again :)
http://codereview.appspot.com/109068/diff/6010/5042#newcode42
examples/mpi/README:42: Ex: export
'LD_LIBRARY_PATH=/home/user/repos/ns-3-distributed/build/debug:/usr/lib/openmpi/1.2.7-gcc/lib'
On 2009/11/13 20:20:02, craigdo wrote:
> We have avoided setting the LD_LIBRARY_PATH this way. What happens if you
have
> multiple ns-3 builds?
>
> Should really use ./waf --run or ./waf shell instead
Agreed.
http://codereview.appspot.com/109068/diff/6010/5042#newcode53
examples/mpi/README:53: 3) Edit the Makefile in the examples/mpi directory. The
On 2009/11/13 20:20:02, craigdo wrote:
> Noooooo. Please, no Makefile that I have to edit!
:)
http://codereview.appspot.com/109068/diff/6010/5042#newcode59
examples/mpi/README:59: 5) The programs are now ready to run with mpirun. Here
are
On 2009/11/13 20:20:02, craigdo wrote:
> I think it would be better to keep this in the ns-3 waf world. How about
>
> ./waf --run "mpirun -np2 ./test-dirstributed ..."
Ok, I definitely agree with this and would like it to work this way. The
problem is...I don't know how.
http://codereview.appspot.com/109068/diff/6010/5042#newcode63
examples/mpi/README:63: mpirun -np 4 -machinefile mpihosts ./nms-udp-nix --LAN=2
--CN=2 --NIX=1
On 2009/11/13 20:20:02, craigdo wrote:
> If you provide these parameters, the simulation errors out complaining about
> number of CNs.
>
> You also might want to warn people that this might take an hour or two to run
> depending on their machine.
Whoops, number of logical processors needs to be >= number of campus networks.
Fixed.
http://codereview.appspot.com/109068/diff/6010/5043
File examples/mpi/nms-udp-nix.cc (right):
http://codereview.appspot.com/109068/diff/6010/5043#newcode25
examples/mpi/nms-udp-nix.cc:25: * (c) 2009, GTech Systems, Inc. - Alfred Park
<park@gtech-systems.com>
On 2009/11/13 20:20:02, craigdo wrote:
> Shouldn't copyright go up top?
Done.
http://codereview.appspot.com/109068/diff/6010/5043#newcode53
examples/mpi/nms-udp-nix.cc:53: #endif
On 2009/11/16 05:17:31, Tom Henderson wrote:
> I wonder whether these guards are really necessary in the example programs.
Is
> the goal to make them build even if there is no MPI?
I think the issue involves mpi.h possibly not existing. By checking for
NS3_MPI, we know that the user has enabled MPI, and WAF has checked to make sure
that they have MPI installed. The goal was not to build if there is no MPI but
to not fail if MPI is not enabled.
You are right though, these examples don't need to be built at all if MPI is not
enabled. I imagine this can be handled in the wscript somehow.
http://codereview.appspot.com/109068/diff/6010/5043#newcode418
examples/mpi/nms-udp-nix.cc:418: else if (rank == z%systemCount)
On 2009/11/16 05:17:31, Tom Henderson wrote:
> Is there a reason that the example instantiates all objects on all systems,
and
> then just turns some of the applications on selectively? Would a better
example
> be a program that instead only instantiated the necessary objects on each
> system?
Yes, the most efficient design would only instantiate objects on the appropriate
system. However, for ease-of-use with existing routing protocols, the full
topology is created on each system. Only the applications are system-specific.
This way a routing protocol like global routing can work with MPI.
http://codereview.appspot.com/109068/diff/6010/5043#newcode541
examples/mpi/nms-udp-nix.cc:541: #if 0
On 2009/11/16 05:17:31, Tom Henderson wrote:
> Suggest to put this block of code into a conditional
> if (tracing)
> {
> ...
> }
>
> and enable via command line.
>
> see examples/wireless/wifi-simple-adhoc-grid.cc, for instance.
Done.
http://codereview.appspot.com/109068/diff/6010/5044
File examples/mpi/test-distributed.cc (right):
http://codereview.appspot.com/109068/diff/6010/5044#newcode72
examples/mpi/test-distributed.cc:72: // Must have 2 and only 2 LPs
On 2009/11/13 20:20:02, craigdo wrote:
> How about "2 and only 2 Logical Processors (LPs)
>
> Do you expect people to know MPI jargon?
Done.
http://codereview.appspot.com/109068/diff/6010/5044#newcode95
examples/mpi/test-distributed.cc:95: Ptr<Node> node = CreateObject<Node> (0);
On 2009/11/13 20:20:02, craigdo wrote:
> Shouldn't system ID be an attribute? Node containers could use an object
> factory and take attributes in a setter (point-to-point-helper, many others
> already do this) as in,
>
> NodeContainer nodes;
> nodes.SetAttribute ("SystemId", UintegerValue (1));
> nodes.Create (4);
>
I was using the API that already existed. Should it be changed?
http://codereview.appspot.com/109068/diff/6010/5044#newcode103
examples/mpi/test-distributed.cc:103: Ptr<Node> routerNode1 = CreateObject<Node>
(0);
On 2009/11/13 20:20:02, craigdo wrote:
> This is handy. Could also use more verbose CreateObjectWithAttributes<Node>
> ("SystemId", UintegerValue (0)) for clarity if systemId were an attribute.
I'm guessing you would like it to be :)
http://codereview.appspot.com/109068/diff/6010/5044#newcode215
examples/mpi/test-distributed.cc:215: Address sinkLocalAddress(InetSocketAddress
(Ipv4Address::GetAny (), port));
On 2009/11/16 05:17:31, Tom Henderson wrote:
> by the way, I found some stray missing spaces throughout the whole patchset.
I
> wonder whether the GNU indent with -prs option can be used to catch these.
I tried using GNU indent once but failed. I will look into it again. It
doesn't look like "prs" is the option though. That puts a space after open
parenthesis and before close ( like this ). I'll try using "-gnu --no-tabs".
http://codereview.appspot.com/109068/diff/6010/5045
File src/common/buffer.cc (right):
http://codereview.appspot.com/109068/diff/6010/5045#newcode656
src/common/buffer.cc:656: Buffer::CreateLimitedCopy (void) const
On 2009/11/13 20:20:02, craigdo wrote:
> I don't think this method name is very clear. I tend to think truncation when
I
> see this.
>
> Buffer::CreateCompressedCopy?
> Buffer::CreateCopyWithoutPadding?
> Buffer::CopyWithoutZeroPadding?
> Buffer::CopyEncodeZeroPadding?
> Buffer::CopyCompress?
We have removed this method completely, since we are using the API that Mathieu
suggested.
http://codereview.appspot.com/109068/diff/6010/5046
File src/common/buffer.h (right):
http://codereview.appspot.com/109068/diff/6010/5046#newcode499
src/common/buffer.h:499: Buffer CreateLimitedCopy (void) const;
On 2009/11/13 20:20:02, craigdo wrote:
> There is nothing really limited about this copy. The padding is just encoded
> differently, right?
Again, this method has been removed.
http://codereview.appspot.com/109068/diff/6010/5051
File src/contrib/net-anim/point-to-point-grid-helper.cc (right):
http://codereview.appspot.com/109068/diff/6010/5051#newcode76
src/contrib/net-anim/point-to-point-grid-helper.cc:76:
PointToPointGridHelper::PointToPointGridHelper (uint32_t nRows,
On 2009/11/13 20:20:02, craigdo wrote:
> Could make this a compile-time error by moving the ifdef around the entire
> method?
I just deleted this method, because I haven't had the chance to test it enough.
http://codereview.appspot.com/109068/diff/6010/5054
File src/devices/point-to-point/point-to-point-channel.h (right):
http://codereview.appspot.com/109068/diff/6010/5054#newcode100
src/devices/point-to-point/point-to-point-channel.h:100: protected:
On 2009/11/17 14:53:19, Mathieu Lacage wrote:
> That protected keyword leaves a really bad taste in my mouth. You could
> trivially add a 'dst' argument to TransmitStart and leave the private in.
Done.
http://codereview.appspot.com/109068/diff/6010/5055
File src/devices/point-to-point/point-to-point-remote-channel.cc (right):
http://codereview.appspot.com/109068/diff/6010/5055#newcode77
src/devices/point-to-point/point-to-point-remote-channel.cc:77: #ifdef NS3_MPI
On 2009/11/13 20:20:02, craigdo wrote:
> Shouldn't this fatal error if it won't work rather than just not do something
> silently?
Done.
http://codereview.appspot.com/109068/diff/6010/5062
File src/routing/global-routing/global-route-manager-impl.cc (right):
http://codereview.appspot.com/109068/diff/6010/5062#newcode552
src/routing/global-routing/global-route-manager-impl.cc:552: if
(node->GetSystemId() != MPIInterface::Rank()) continue;
On 2009/11/16 05:17:31, Tom Henderson wrote:
> In the example that instantiated every object everywhere and then only turned
on
> applications on the relevant system ID, wouldn't global routing still work in
> that case?
>
> Need to think about how global routing should interact with MPI. Should
global
> routing be extended to handle MPI? Should it be disabled by default when MPI
is
> enabled, forcing user to think about whether it will work for the coded
> scenario? The above code will cause the global routing to partially complete
> but not return an error.
Yes, global routing will work with MPI. Since the full topology is created on
every logical processor, global routing computes routes for any node assigned to
the logical processor. It does not compute routes from nodes that are not
associated with this logical processor, since packets will never originate from
these nodes on this LP.
http://codereview.appspot.com/109068/diff/6010/5063
File src/simulator/default-simulator-impl.cc (right):
http://codereview.appspot.com/109068/diff/6010/5063#newcode103
src/simulator/default-simulator-impl.cc:103: uint32_t
DefaultSimulatorImpl::GetSystemId () const
On 2009/11/13 20:20:02, craigdo wrote:
> Why not in distributed version?
This method has been changed to pure virtual in the simulator-impl base class.
DefaultSimulator, DistributedSimulator, and RealTimeSimultor have been modified
to use this method. It should return zero by default (non-distributed case).
http://codereview.appspot.com/109068/diff/6010/5064
File src/simulator/default-simulator-impl.h (right):
http://codereview.appspot.com/109068/diff/6010/5064#newcode63
src/simulator/default-simulator-impl.h:63: protected:
On 2009/11/17 14:53:19, Mathieu Lacage wrote:
> Please, don't try to reuse this class for your simulator implementation.
> Instead, you can copy/paste default-simulator-impl.* into
> distributed-simulator-impl* and modify it to fit your needs. Yes, this brings
> code duplication but it makes it easier to manage separately these separate
> functionalities.
Done.
http://codereview.appspot.com/109068/diff/6010/5065
File src/simulator/distributed-simulator-impl.cc (right):
http://codereview.appspot.com/109068/diff/6010/5065#newcode41
src/simulator/distributed-simulator-impl.cc:41: using namespace std;
On 2009/11/13 20:20:02, craigdo wrote:
> We typically don't do this.
Done.
http://codereview.appspot.com/109068/diff/6010/5067
File src/simulator/mpi-interface.cc (right):
http://codereview.appspot.com/109068/diff/6010/5067#newcode38
src/simulator/mpi-interface.cc:38: using namespace std;
On 2009/11/13 20:20:02, craigdo wrote:
> We typically don't do this.
Done.
http://codereview.appspot.com/109068/diff/6010/5070
File src/simulator/simulator.h (right):
http://codereview.appspot.com/109068/diff/6010/5070#newcode618
src/simulator/simulator.h:618: static Ptr<SimulatorImpl>* PeekImpl(void);
On 2009/11/17 14:53:19, Mathieu Lacage wrote:
> what is wrong with SetImplementation and GetImplementation ? These additions
> appear to be redundant with them.
Done.
http://codereview.appspot.com/109068/diff/6010/5072
File wscript (right):
http://codereview.appspot.com/109068/diff/6010/5072#newcode197
wscript:197: opt.add_option('--with-mpi',
On 2009/11/13 20:20:02, craigdo wrote:
> I think this should be --enable-mpi for consistency
Done.
> -- stability issues. Craig sent an email about his concerns regarding > stability in ...
2 months ago
> -- stability issues. Craig sent an email about his concerns regarding
> stability in MPI. I think there was a problem with an example script
> that he created because he was trying to run MPI with non-point-to-
> point
> links. Currently, this MPI code only works for point-to-point links.
> We will have to stress this in the manual or tutorial when we get to
> that point.
Do you mean that you cannot use CSMA or Wifi at all -- even within the same
rank!? I would have expected this to be a primary use case.
It means that the net devices that directly communicate (sharing a single WIFI or CSMA ...
2 months ago
It means that the net devices that directly communicate (sharing a
single WIFI
or CSMA channel) cannot be split across ranks. Clearly, we could do a
distributed
sim with a bunch of separate wifi domains connected via a wired
infrastructure of
some kind. The MPI ranks would split across the wired (point to
point) links.
George
--------------------------------------------------
George Riley
Associate Professor
Georgia Tech
Electrical and Computer Engineering
riley@ece.gatech.edu
404-894-4767
Klaus Advanced Computing Building
Room 3360
266 Ferst Drive
Atlanta, Georgia 30332-0765
ECE6110 Web page:
http://users.ece.gatech.edu/~riley/ece6110/
ECE3090 Web page:
http://users.ece.gatech.edu/~riley/ece3090/
On Dec 10, 2009, at 2:05 PM, <craigdo@ee.washington.edu> wrote:
>> -- stability issues. Craig sent an email about his concerns
>> regarding
>> stability in MPI. I think there was a problem with an example script
>> that he created because he was trying to run MPI with non-point-to-
>> point
>> links. Currently, this MPI code only works for point-to-point links.
>> We will have to stress this in the manual or tutorial when we get to
>> that point.
>
> Do you mean that you cannot use CSMA or Wifi at all -- even within
> the same
> rank!? I would have expected this to be a primary use case.
>
>
>
>
>
Rearranged responses a little for clarity ... > >> -- stability issues. Craig sent an ...
2 months ago
Rearranged responses a little for clarity ...
> >> -- stability issues. Craig sent an email about his concerns
> >> regarding
> >> stability in MPI. I think there was a problem with an example
> script
> >> that he created because he was trying to run MPI with non-point-to-
> >> point
> >> links. Currently, this MPI code only works for point-to-point
> links.
> > Do you mean that you cannot use CSMA or Wifi at all -- even within
> > the same
> > rank!? I would have expected this to be a primary use case.
> It means that the net devices that directly communicate (sharing a
> single WIFI
> or CSMA channel) cannot be split across ranks. Clearly, we could do a
> distributed
> sim with a bunch of separate wifi domains connected via a wired
> infrastructure of
> some kind. The MPI ranks would split across the wired (point to
> point) links.
Okay, that makes more sense. Your last two sentences describe what I tried.
I took mixed-wireless.cc and changed the wireless backbone (N backbone
routers) to a single MPI point-to-point link and tired to run the result,
which then looked like:
//
// rank : rank
// :
// +--------------------------------------------------------+
// | MPI Point to Point Link |
// +--------------------------------------------------------+
// | : |
// | : |
// | : |
// +--------+ : +--------+
// wired LAN | mobile | : wired LAN | mobile |
// -----------| router | : -----------| router |
// +--------+ : +--------+
// | : |
// +----------------+ : +----------------+
// | 802.11 | : | 802.11 |
// | infra net | : | infra net |
// | K-1 hosts | : | K-1 hosts |
// +----------------+ : +----------------+
// :
//
I got asserts way down in Buffer which concerned me greatly.
-- Craig
joshpelkey@gmail.com wrote: > Hi all, > > Sorry for the delay after the comments. We ...
2 months ago
joshpelkey@gmail.com wrote:
> Hi all,
>
> Sorry for the delay after the comments. We were working on switching
> the serialization API, according to Mathieu's comments. Since it has
> been a while, I will provide a brief summary of the changes and the
> remaining issues.
>
> *******
> Changes
> *******
>
> -- Various small fixes according to the comments (code style, typos,
> etc, which I responded to in the comments at the bottom of this reply)
>
> -- Buffer API. Switched the serialization/deserialization methods to
> use a raw buffer. Added methods to buffer, nix-vector, packet-metadata,
> and packet to support this
>
> -- Changed distributed-simulator-impl to use simulator-impl rather than
> default-simulator-impl (per Mathieu's comment). In order to do this, we
> introduced a pure virtual method in simulator-impl for GetSystemId.
> This method was added to default-simulator-impl,
> distributed-simulator-impl, and realtime-simulator-impl
>
> *******
> Issues
> *******
>
> -- Building with WAF. I want to do this, and I've tried several times
> to get something to work. I know it is possible, but I'm just not that
> good at using WAF. I need to re-start the discussion on ns-developers
> for help with this.
>
> -- Dependency issues. Tom and Mathieu commented on dependencies in
> src/simulator/distributed-simulator-impl and/or mpi-interface. We are
> open to moving these classes wherever they fit best.
Suggest src/distributed or src/mpi (maybe the former in case non-mpi
distributed code comes into the simulator at some point)
>
> -- Rank in packet-metadata. We are still not sure why UID in
> packet-metadata fits, but rank does not. Tom has pointed out that we
> could get away with taking it out for now, since it isn't used.
> However, in the future, it will be necessary.
Can you describe where it will be necessary? If so, can rank be
higher-order bits of a (enlarged) 64-bit uid?
> http://codereview.appspot.com/109068/diff/6010/5043#newcode53
> examples/mpi/nms-udp-nix.cc:53: #endif
> On 2009/11/16 05:17:31, Tom Henderson wrote:
>> I wonder whether these guards are really necessary in the example
> programs. Is
>> the goal to make them build even if there is no MPI?
>
> I think the issue involves mpi.h possibly not existing. By checking for
> NS3_MPI, we know that the user has enabled MPI, and WAF has checked to
> make sure that they have MPI installed. The goal was not to build if
> there is no MPI but to not fail if MPI is not enabled.
>
> You are right though, these examples don't need to be built at all if
> MPI is not enabled. I imagine this can be handled in the wscript
> somehow.
Yes, I think waf can handle it.
>
> http://codereview.appspot.com/109068/diff/6010/5043#newcode418
> examples/mpi/nms-udp-nix.cc:418: else if (rank == z%systemCount)
> On 2009/11/16 05:17:31, Tom Henderson wrote:
>> Is there a reason that the example instantiates all objects on all
> systems, and
>> then just turns some of the applications on selectively? Would a
> better example
>> be a program that instead only instantiated the necessary objects on
> each
>> system?
>
> Yes, the most efficient design would only instantiate objects on the
> appropriate system. However, for ease-of-use with existing routing
> protocols, the full topology is created on each system. Only the
> applications are system-specific. This way a routing protocol like
> global routing can work with MPI.
guidance for users along these lines is probably useful. If you are not
trying to run a god's-eye routing protocol, why configure it like that?
>
> http://codereview.appspot.com/109068/diff/6010/5044#newcode215
> examples/mpi/test-distributed.cc:215: Address
> sinkLocalAddress(InetSocketAddress (Ipv4Address::GetAny (), port));
> On 2009/11/16 05:17:31, Tom Henderson wrote:
>> by the way, I found some stray missing spaces throughout the whole
> patchset. I
>> wonder whether the GNU indent with -prs option can be used to catch
> these.
>
> I tried using GNU indent once but failed. I will look into it again.
> It doesn't look like "prs" is the option though. That puts a space
> after open parenthesis and before close ( like this ). I'll try using
> "-gnu --no-tabs".
indent -pcs is the closest I could find, but it messes up templates:
Ptr < Node > ...
> http://codereview.appspot.com/109068/diff/6010/5062
> File src/routing/global-routing/global-route-manager-impl.cc (right):
>
> http://codereview.appspot.com/109068/diff/6010/5062#newcode552
> src/routing/global-routing/global-route-manager-impl.cc:552: if
> (node->GetSystemId() != MPIInterface::Rank()) continue;
> On 2009/11/16 05:17:31, Tom Henderson wrote:
>> In the example that instantiated every object everywhere and then only
> turned on
>> applications on the relevant system ID, wouldn't global routing still
> work in
>> that case?
>
>> Need to think about how global routing should interact with MPI.
> Should global
>> routing be extended to handle MPI? Should it be disabled by default
> when MPI is
>> enabled, forcing user to think about whether it will work for the
> coded
>> scenario? The above code will cause the global routing to partially
> complete
>> but not return an error.
>
> Yes, global routing will work with MPI. Since the full topology is
> created on every logical processor, global routing computes routes for
> any node assigned to the logical processor. It does not compute routes
> from nodes that are not associated with this logical processor, since
> packets will never originate from these nodes on this LP.
OK
We've changed uid in packet-metadata to a uint64_t. The rank is now stored in the ...
1 month, 3 weeks ago
We've changed uid in packet-metadata to a uint64_t. The rank is now stored in
the upper 32-bits of this uid. Since we changed the uid in packet-metadata to
64-bit, we also changed the packetUid in the ExtraItem structure in
packet-metadata to 64-bit. Not 100% sure if this was needed.
http://codereview.appspot.com/109068/diff/13046/13058
File src/common/packet.h (right):
http://codereview.appspot.com/109068/diff/13046/13058#newcode36
src/common/packet.h:36: #include "ns3/mpi-interface.h"
On 2009/12/14 13:44:33, Mathieu Lacage wrote:
> un-needed
Done.
http://codereview.appspot.com/109068/diff/13046/13058#newcode241
src/common/packet.h:241: Packet (uint32_t mpiRank, uint32_t uid, uint32_t size);
On 2009/12/14 13:44:33, Mathieu Lacage wrote:
> I am not sure I understand who calls this function but I suspect that to
> maintain a per-packet rank id, you need the class who creates a packet to do
> something special.
>
> You could get away with this requirement and make it transparent to you and
> everyone else if you made the uid allocation code in packet.cc call
> Simulator::GetSystemId () and mix it with its 32bit uid to produce a 64bit uid
> which is stored in PacketMetadata.
I like this idea, but I don't think it will work for what has to be done. This
new constructor is called after a packet has been sent to another rank and
immediately before packet deserialization. So if Simulator::GetSystemId () were
called, the wrong rank would be returned, since we want the rank of the
originating simulator. This information is only contained in the serialized
buffer, which is where we grab it from (along with the original uid) and pass it
to this new constructor.
http://codereview.appspot.com/109068/diff/13046/13058#newcode396
src/common/packet.h:396: uint32_t GetMpiRank (void) const;
On 2009/12/14 13:44:33, Mathieu Lacage wrote:
> If you really need a rank identifier for each packet, I think that it would
make
> sense to follow tom's suggestion, that is:
> - make GetUid return a uint64_t
> - make GetUid return the rank in the top 32 bits of the uid
I've removed this method and removed rank from packet-metadata. It is now
stored in the upper 32-bits as suggested.
http://codereview.appspot.com/109068/diff/13046/13071
File src/simulator/mpi-interface.cc (right):
http://codereview.appspot.com/109068/diff/13046/13071#newcode250
src/simulator/mpi-interface.cc:250: Simulator::Schedule (rxTime -
Simulator::Now(),
On 2009/12/14 13:44:33, Mathieu Lacage wrote:
> Call ScheduleWithContext here to avoid the NS_ASSERT in the node.cc code.
Thank you. Fixed.
http://codereview.appspot.com/109068/diff/13046/13075
File src/simulator/simulator-impl.h (right):
http://codereview.appspot.com/109068/diff/13046/13075#newcode56
src/simulator/simulator-impl.h:56: virtual uint32_t GetSystemId () const = 0;
On 2009/12/14 13:44:33, Mathieu Lacage wrote:
> If you do this, this method should also be exported from simulator.h
Done.
On Tue, Dec 15, 2009 at 6:10 PM, <joshpelkey@gmail.com> wrote: > > I am not ...
1 month, 3 weeks ago
On Tue, Dec 15, 2009 at 6:10 PM, <joshpelkey@gmail.com> wrote:
>
> I am not sure I understand who calls this function but I suspect that
>>
> to
>
>> maintain a per-packet rank id, you need the class who creates a packet
>>
> to do
>
>> something special.
>>
>
> You could get away with this requirement and make it transparent to
>>
> you and
>
>> everyone else if you made the uid allocation code in packet.cc call
>> Simulator::GetSystemId () and mix it with its 32bit uid to produce a
>>
> 64bit uid
>
>> which is stored in PacketMetadata.
>>
>
> I like this idea, but I don't think it will work for what has to be
> done. This new constructor is called after a packet has been sent to
> another rank and immediately before packet deserialization. So if
> Simulator::GetSystemId () were called, the wrong rank would be returned,
> since we want the rank of the originating simulator. This information
> is only contained in the serialized buffer, which is where we grab it
> from (along with the original uid) and pass it to this new constructor.
Presumably, you would create the new buffer from a serialized buffer in this
case with something like this:
Ptr<Packet> p = Create<Packet> ();
p->Deserialize (buffer, size);
In which case you can make Deserialize read the uid from the buffer and
ignore the one it got from the constructor.
Of course, you could also do this:
Ptr<Packet> p = Create<Packet> (buffer, size,
someMagicArgumentToCallTheRightConstructor);
in which case Deserialize is not needed anymore as public API.
You would need to create a special constructor with an extra argument of the
right type to make sure that the above calls your new constructor.
Mathieu
--
Mathieu Lacage <mathieu.lacage@gmail.com>
> > > Presumably, you would create the new buffer from a serialized buffer > ...
1 month, 3 weeks ago
>
>
> Presumably, you would create the new buffer from a serialized buffer
> in this case with something like this:
>
> Ptr<Packet> p = Create<Packet> ();
> p->Deserialize (buffer, size);
>
> In which case you can make Deserialize read the uid from the buffer
> and ignore the one it got from the constructor.
This won't work either, as we don't want UID's "disappearing".
Ignoring the one automatically
assigned in the constructor uses up one UID. That process would
create normal packets
2, 3, 5, 6, ... etc. What happened to 4? It got used up when
receiving the MPI packet.
George
>
> Of course, you could also do this:
>
> Ptr<Packet> p = Create<Packet> (buffer, size,
> someMagicArgumentToCallTheRightConstructor);
>
> in which case Deserialize is not needed anymore as public API.
>
> You would need to create a special constructor with an extra
> argument of the right type to make sure that the above calls your
> new constructor.
>
>
> Mathieu
> --
> Mathieu Lacage <mathieu.lacage@gmail.com>