Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(662)

Issue 6344063: TypeId: add hashing and map lookup (Closed)

Can't Edit
Can't Publish+Mail
Start Review
Created:
8 years, 9 months ago by Peter Barnes
Modified:
6 years, 3 months ago
CC:
ns-3-reviews_googlegroups.com
Visibility:
Public.

Description

This is the second of three of patches to address Bug 582, "Tags are not serialized and deserialized from Packet::Serialize and Packet::Deserialize" https://www.nsnam.org/bugzilla/show_bug.cgi?id=582 <repeat of Hash discussion> One issue is to serialize the *type* of the derived Tag, when the Packet only has a PacketTagList to work from. This serialization has to produce a consistent result on every compute node in a parallel cluster, even one with heterogeneous nodes. This can be particularly tricky if the different instances of ns3 don't create TypeId's in the same order, which results in a single type (by name) being represented by different TypeId values (integers) on different nodes. One approach, partially implemented by Jeffrey Young at Georgia Tech, is to serialize the TypeId name (a string). This seems unattractive because it takes time and space, typically much more space than the tag data itself (which is limited to 20 bytes). The alternative approach proposed here is to add a consistent 32-bit hash to the TypeId, then pass that to identify the tag type. To simplify review, I've split this into three patches: 1. Add class Hash: generic hash function interface, with two implementations. 2. Add hashes to TypeId (this review). 3. Use TypeId hash to serialize Tags. </repeat> This patch does three things: * Add a hash value to the TypeId; allow lookup by hash. * Add two maps to IidManager, to speed lookups by name or hash. Microbenchmarks show this is significantly faster with our 400 registered TypeIds. * Add a TypeId test suite. Note: this is to be applied *on top of* the Hash patch, http://codereview.appspot.com/6357056/

Patch Set 1 #

Total comments: 13

Patch Set 2 : Revised patch #

Unified diffs Side-by-side diffs Delta from patch set Stats (+553 lines, -56 lines) Patch
M src/core/model/type-id.h View 1 5 chunks +35 lines, -5 lines 0 comments Download
M src/core/model/type-id.cc View 1 21 chunks +196 lines, -51 lines 0 comments Download
A src/core/test/type-id-test-suite.cc View 1 1 chunk +321 lines, -0 lines 0 comments Download
M src/core/wscript View 1 1 chunk +1 line, -0 lines 0 comments Download

Messages

Total messages: 4
Jeff Y
Hi Peter, This looks good. I had two minor comments: 1) Possibly use "TypeId" instead ...
8 years, 8 months ago (2012-07-23 19:13:11 UTC) #1
Tom Henderson
http://codereview.appspot.com/6344063/diff/1/src/core/test/type-id-test-suite.cc File src/core/test/type-id-test-suite.cc (right): http://codereview.appspot.com/6344063/diff/1/src/core/test/type-id-test-suite.cc#newcode169 src/core/test/type-id-test-suite.cc:169: // Performance test we have a performance test suite ...
8 years, 8 months ago (2012-07-25 13:25:25 UTC) #2
Mathieu Lacage
ack http://codereview.appspot.com/6344063/diff/1/src/core/model/type-id.cc File src/core/model/type-id.cc (right): http://codereview.appspot.com/6344063/diff/1/src/core/model/type-id.cc#newcode125 src/core/model/type-id.cc:125: // is only 2 x 10^-4. The probability ...
8 years, 8 months ago (2012-07-28 08:01:30 UTC) #3
Peter Barnes
8 years, 4 months ago (2012-11-29 06:57:36 UTC) #4
http://codereview.appspot.com/6344063/diff/1/src/core/model/type-id.cc
File src/core/model/type-id.cc (right):

http://codereview.appspot.com/6344063/diff/1/src/core/model/type-id.cc#newcod...
src/core/model/type-id.cc:125: // is only 2 x 10^-4.  The probability for a
three-fold collision is
On 2012/07/28 08:01:30, Mathieu Lacage wrote:
> should be 10^-21, no ?

Do you mean n/k = 2^-21?  That's the average occupancy per bucket.  We need the
expected number of collisions.  

Referring to:
http://www.math.dartmouth.edu/archive/m19w03/public_html/Section6-5.pdf
 
Number of buckets = k = 2^bits

    E(collisions) = n - k + k (1 - 1/k)^n

where n = number of entries in table.

This form is numerically unstable for low collision rates.  Expanding for large
k we get

              1    n        1    n        1    n
    E(c)  =  --- (   )  -  --- (   )  +  --- (   )  -  ...
              k    2       k^2   3       k^3   4

              1    n   [        n - 2     (n - 2)(n - 3)         ]
          =  --- (   ) |  1  -  -----  +  --------------  - ...  |
              k    2   [         3 k          12 k^2             ]

Using the first two terms, k = 2^31, and n = 2^10, I calculate E(c) = 2.4 10^-4.

For the three-fold probability, I take n/k E(c) = 1.2 10^-10

I'll update the numbers.

http://codereview.appspot.com/6344063/diff/1/src/core/model/type-id.h
File src/core/model/type-id.h (right):

http://codereview.appspot.com/6344063/diff/1/src/core/model/type-id.h#newcode75
src/core/model/type-id.h:75: typedef Hash32_t hash_value_type;
On 2012/07/28 08:01:30, Mathieu Lacage wrote:
> oh, come on....

I was waiting for that ;) 

I removed this, then realized collisions can be dealt with via a trivial fourth
method:  switch to 64-bit hashes.  To do this, you have to change all the
uint32_t type names *that refer to hashes* (there are many) and not get any
other uint32_t's (there are also many).

On balance, it seems to me that hash_value_type is a reasonable future-proofing
compromise:  it clearly indicates value semantics, and makes it trivial to
switch to 64-bit hashes if needed in the future.
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld f62528b