Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(187)

Issue 4631059: code review 4631059: runtime: replace Semacquire/Semrelease implementation (Closed)

Can't Edit
Can't Publish+Mail
Start Review
Created:
9 years, 4 months ago by dvyukov
Modified:
9 years, 3 months ago
Reviewers:
CC:
rsc, golang-dev
Visibility:
Public.

Description

runtime: replace Semacquire/Semrelease implementation 1. The implementation uses distributed hash table of waitlists instead of a centralized one. It significantly improves scalability for uncontended semaphores. 2. The implementation provides wait-free fast-path for signalers. 3. The implementation uses less locks (1 lock/unlock instead of 5 for Semacquire). 4. runtime·ready() call is moved out of critical section. 5. Semacquire() does not call semwake(). Benchmark results on HP Z600 (2 x Xeon E5620, 8 HT cores, 2.40GHz) are as follows: benchmark old ns/op new ns/op delta runtime_test.BenchmarkSemaUncontended 58.20 36.30 -37.63% runtime_test.BenchmarkSemaUncontended-2 199.00 18.30 -90.80% runtime_test.BenchmarkSemaUncontended-4 327.00 9.20 -97.19% runtime_test.BenchmarkSemaUncontended-8 491.00 5.32 -98.92% runtime_test.BenchmarkSemaUncontended-16 946.00 4.18 -99.56% runtime_test.BenchmarkSemaSyntNonblock 59.00 36.80 -37.63% runtime_test.BenchmarkSemaSyntNonblock-2 167.00 138.00 -17.37% runtime_test.BenchmarkSemaSyntNonblock-4 333.00 129.00 -61.26% runtime_test.BenchmarkSemaSyntNonblock-8 464.00 130.00 -71.98% runtime_test.BenchmarkSemaSyntNonblock-16 1015.00 136.00 -86.60% runtime_test.BenchmarkSemaSyntBlock 58.80 36.70 -37.59% runtime_test.BenchmarkSemaSyntBlock-2 294.00 149.00 -49.32% runtime_test.BenchmarkSemaSyntBlock-4 333.00 177.00 -46.85% runtime_test.BenchmarkSemaSyntBlock-8 471.00 221.00 -53.08% runtime_test.BenchmarkSemaSyntBlock-16 990.00 227.00 -77.07% runtime_test.BenchmarkSemaWorkNonblock 829.00 832.00 +0.36% runtime_test.BenchmarkSemaWorkNonblock-2 425.00 419.00 -1.41% runtime_test.BenchmarkSemaWorkNonblock-4 308.00 220.00 -28.57% runtime_test.BenchmarkSemaWorkNonblock-8 394.00 147.00 -62.69% runtime_test.BenchmarkSemaWorkNonblock-16 1510.00 149.00 -90.13% runtime_test.BenchmarkSemaWorkBlock 828.00 813.00 -1.81% runtime_test.BenchmarkSemaWorkBlock-2 428.00 436.00 +1.87% runtime_test.BenchmarkSemaWorkBlock-4 232.00 219.00 -5.60% runtime_test.BenchmarkSemaWorkBlock-8 392.00 251.00 -35.97% runtime_test.BenchmarkSemaWorkBlock-16 1524.00 298.00 -80.45% sync_test.BenchmarkMutexUncontended 24.10 24.00 -0.41% sync_test.BenchmarkMutexUncontended-2 12.00 12.00 +0.00% sync_test.BenchmarkMutexUncontended-4 6.25 6.17 -1.28% sync_test.BenchmarkMutexUncontended-8 3.43 3.34 -2.62% sync_test.BenchmarkMutexUncontended-16 2.34 2.32 -0.85% sync_test.BenchmarkMutex 24.70 24.70 +0.00% sync_test.BenchmarkMutex-2 208.00 99.50 -52.16% sync_test.BenchmarkMutex-4 2744.00 256.00 -90.67% sync_test.BenchmarkMutex-8 5137.00 556.00 -89.18% sync_test.BenchmarkMutex-16 5368.00 1284.00 -76.08% sync_test.BenchmarkMutexSlack 24.70 25.00 +1.21% sync_test.BenchmarkMutexSlack-2 1094.00 186.00 -83.00% sync_test.BenchmarkMutexSlack-4 3430.00 402.00 -88.28% sync_test.BenchmarkMutexSlack-8 5051.00 1066.00 -78.90% sync_test.BenchmarkMutexSlack-16 6806.00 1363.00 -79.97% sync_test.BenchmarkMutexWork 793.00 792.00 -0.13% sync_test.BenchmarkMutexWork-2 398.00 398.00 +0.00% sync_test.BenchmarkMutexWork-4 1441.00 308.00 -78.63% sync_test.BenchmarkMutexWork-8 8532.00 847.00 -90.07% sync_test.BenchmarkMutexWork-16 8225.00 2760.00 -66.44% sync_test.BenchmarkMutexWorkSlack 793.00 793.00 +0.00% sync_test.BenchmarkMutexWorkSlack-2 418.00 414.00 -0.96% sync_test.BenchmarkMutexWorkSlack-4 4481.00 480.00 -89.29% sync_test.BenchmarkMutexWorkSlack-8 6317.00 1598.00 -74.70% sync_test.BenchmarkMutexWorkSlack-16 9111.00 3038.00 -66.66%

Patch Set 1 #

Patch Set 2 : diff -r 6f1145ee588d https://go.googlecode.com/hg/ #

Patch Set 3 : diff -r 6f1145ee588d https://go.googlecode.com/hg/ #

Total comments: 12

Patch Set 4 : diff -r 6f1145ee588d https://go.googlecode.com/hg/ #

Patch Set 5 : diff -r 6f1145ee588d https://go.googlecode.com/hg/ #

Patch Set 6 : diff -r 9e53a1312e25 https://go.googlecode.com/hg/ #

Total comments: 18

Patch Set 7 : diff -r 607e0f74161f https://go.googlecode.com/hg/ #

Patch Set 8 : diff -r 607e0f74161f https://go.googlecode.com/hg/ #

Patch Set 9 : diff -r 607e0f74161f https://go.googlecode.com/hg/ #

Patch Set 10 : diff -r 607e0f74161f https://go.googlecode.com/hg/ #

Total comments: 4

Patch Set 11 : diff -r 607e0f74161f https://go.googlecode.com/hg/ #

Total comments: 1

Patch Set 12 : diff -r 607e0f74161f https://go.googlecode.com/hg/ #

Unified diffs Side-by-side diffs Delta from patch set Stats (+134 lines, -99 lines) Patch
A src/pkg/runtime/386/atomic.c View 1 2 3 4 1 chunk +12 lines, -0 lines 0 comments Download
M src/pkg/runtime/Makefile View 1 2 3 4 5 6 1 chunk +1 line, -0 lines 0 comments Download
A src/pkg/runtime/amd64/atomic.c View 1 2 3 4 1 chunk +12 lines, -0 lines 0 comments Download
A src/pkg/runtime/arm/atomic.c View 1 2 3 4 5 6 1 chunk +12 lines, -0 lines 0 comments Download
M src/pkg/runtime/runtime.h View 1 2 3 1 chunk +3 lines, -0 lines 0 comments Download
M src/pkg/runtime/sema.goc View 1 2 3 4 5 6 7 8 9 10 11 2 chunks +94 lines, -99 lines 0 comments Download

Messages

Total messages: 42
dvyukov
Hello golang-dev@googlegroups.com (cc: dvyukov@google.com, golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/
9 years, 4 months ago (2011-06-22 14:21:33 UTC) #1
dvyukov
Here is comparison graph for uncontended semaphore: http://chart.googleapis.com/chart?cht=lc&chs=600x500&chxt=x,y&chxr=0,1,16,1|1,0,427184&chma=10,10,10,10&chdlp=b&chdl=ideal|sem-distr|sem-distr_base&chg=6.666667,5&chco=AAAAAA,000000,000000&chls=1|3|2,2,2&chd=t%3A6.25,12.50,18.75,25.00,31.25,37.50,43.75,50.00,56.25,62.50,68.75,75.00,81.25,87.50,93.75,100.00|6.25,11.72,17.97,24.01,30.23,31.35,39.03,42.95,47.02,46.43,50.74,51.80,52.49,52.95,54.32,54.90|3.83,1.06,1.04,0.83,0.74,0.47,0.42,0.35,0.32,0.30,0.29,0.21,0.19,0.18,0.18,0.17 Here is comparison graph for contended nonblocking ...
9 years, 4 months ago (2011-06-22 14:35:27 UTC) #2
dvyukov
Here is comparison graph for uncontended semaphore: http://goo.gl/yZcE4 Here is comparison graph for contended nonblocking ...
9 years, 4 months ago (2011-06-22 14:42:13 UTC) #3
rsc
can we get the benchmarks checked in first? that way people can update to before ...
9 years, 4 months ago (2011-06-22 14:50:29 UTC) #4
dvyukov
On 2011/06/22 14:50:29, rsc wrote: > can we get the benchmarks checked in first? > ...
9 years, 4 months ago (2011-06-22 14:56:57 UTC) #5
rsc
I'm thrilled to see the performance improvements here. I'm less thrilled about the fact that ...
9 years, 4 months ago (2011-06-22 15:05:41 UTC) #6
rsc
On Wed, Jun 22, 2011 at 10:42, <dvyukov@google.com> wrote: > Here is comparison graph for ...
9 years, 4 months ago (2011-06-22 15:07:12 UTC) #7
dvyukov
On 2011/06/22 15:07:12, rsc wrote: > On Wed, Jun 22, 2011 at 10:42, <mailto:dvyukov@google.com> wrote: ...
9 years, 4 months ago (2011-06-22 15:34:19 UTC) #8
dvyukov
http://codereview.appspot.com/4631059/diff/4001/src/pkg/runtime/386/asm.s File src/pkg/runtime/386/asm.s (right): http://codereview.appspot.com/4631059/diff/4001/src/pkg/runtime/386/asm.s#newcode321 src/pkg/runtime/386/asm.s:321: TEXT runtime·load(SB), 7, $0 On 2011/06/22 15:05:41, rsc wrote: ...
9 years, 4 months ago (2011-06-22 15:51:53 UTC) #9
rsc
>> It's not clear to me why this is in assembly. >> Why not implement ...
9 years, 4 months ago (2011-06-22 16:06:56 UTC) #10
dvyukov
On 2011/06/22 15:05:41, rsc wrote: > I'm thrilled to see the performance improvements here. > ...
9 years, 4 months ago (2011-06-22 16:07:02 UTC) #11
rsc
>> Reading diffs is a central part of code review and how >> we make ...
9 years, 4 months ago (2011-06-22 16:11:09 UTC) #12
dvyukov
On 2011/06/22 16:06:56, rsc wrote: > >> It's not clear to me why this is ...
9 years, 4 months ago (2011-06-22 16:17:49 UTC) #13
dvyukov
On 2011/06/22 16:11:09, rsc wrote: > >> Reading diffs is a central part of code ...
9 years, 4 months ago (2011-06-22 16:19:01 UTC) #14
rsc
>> You can't, because the C compiler you are using has no asm builtin. > ...
9 years, 4 months ago (2011-06-22 16:24:00 UTC) #15
dvyukov
Hello rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
9 years, 4 months ago (2011-06-22 16:41:53 UTC) #16
dvyukov
On 2011/06/22 16:24:00, rsc wrote: > >> You can't, because the C compiler you are ...
9 years, 4 months ago (2011-06-22 16:41:54 UTC) #17
dvyukov
Hello rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
9 years, 4 months ago (2011-06-22 16:44:17 UTC) #18
dvyukov
Hello rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
9 years, 3 months ago (2011-06-28 13:28:40 UTC) #19
rsc
http://codereview.appspot.com/4631059/diff/4/src/pkg/runtime/Makefile File src/pkg/runtime/Makefile (right): http://codereview.appspot.com/4631059/diff/4/src/pkg/runtime/Makefile#newcode84 src/pkg/runtime/Makefile:84: atomic.$O\ move up (sort) http://codereview.appspot.com/4631059/diff/4/src/pkg/runtime/arm/atomic.c File src/pkg/runtime/arm/atomic.c (right): http://codereview.appspot.com/4631059/diff/4/src/pkg/runtime/arm/atomic.c#newcode9 ...
9 years, 3 months ago (2011-06-28 14:08:51 UTC) #20
dvyukov
http://codereview.appspot.com/4631059/diff/4/src/pkg/runtime/Makefile File src/pkg/runtime/Makefile (right): http://codereview.appspot.com/4631059/diff/4/src/pkg/runtime/Makefile#newcode84 src/pkg/runtime/Makefile:84: atomic.$O\ On 2011/06/28 14:08:51, rsc wrote: > move up ...
9 years, 3 months ago (2011-06-28 14:55:18 UTC) #21
dvyukov
Hello rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
9 years, 3 months ago (2011-06-28 14:56:18 UTC) #22
rsc
On Tue, Jun 28, 2011 at 10:55, <dvyukov@google.com> wrote: > Well, yes, it works. But ...
9 years, 3 months ago (2011-06-28 15:10:08 UTC) #23
dvyukov
Hello rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
9 years, 3 months ago (2011-06-28 15:33:37 UTC) #24
rsc
Also the racy cnt access looks the same. http://codereview.appspot.com/4631059/diff/4/src/pkg/runtime/sema.goc File src/pkg/runtime/sema.goc (right): http://codereview.appspot.com/4631059/diff/4/src/pkg/runtime/sema.goc#newcode141 src/pkg/runtime/sema.goc:141: // ...
9 years, 3 months ago (2011-06-28 15:39:46 UTC) #25
dvyukov
On 2011/06/28 15:10:08, rsc wrote: > On Tue, Jun 28, 2011 at 10:55, <mailto:dvyukov@google.com> wrote: ...
9 years, 3 months ago (2011-06-28 15:40:13 UTC) #26
dvyukov
Hello rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
9 years, 3 months ago (2011-06-28 15:43:04 UTC) #27
dvyukov
On 2011/06/28 15:39:46, rsc wrote: > Also the racy cnt access looks the same. Please ...
9 years, 3 months ago (2011-06-28 15:45:17 UTC) #28
rsc
> Since root->cnt is manipulated in a lock-free manner on fast-path, it > should be ...
9 years, 3 months ago (2011-06-28 15:51:39 UTC) #29
dvyukov
On 2011/06/28 15:51:39, rsc wrote: > > Since root->cnt is manipulated in a lock-free manner ...
9 years, 3 months ago (2011-06-28 16:40:06 UTC) #30
rsc
On Tue, Jun 28, 2011 at 12:40, <dvyukov@google.com> wrote: > On 2011/06/28 15:51:39, rsc wrote: ...
9 years, 3 months ago (2011-06-28 16:43:51 UTC) #31
dvyukov
Hello rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
9 years, 3 months ago (2011-06-28 17:22:08 UTC) #32
dvyukov
On 2011/06/28 16:43:51, rsc wrote: > semqueue/semdequeue are adding a waiter. > cnt is the ...
9 years, 3 months ago (2011-06-28 17:22:11 UTC) #33
rsc
http://codereview.appspot.com/4631059/diff/24002/src/pkg/runtime/sema.goc File src/pkg/runtime/sema.goc (right): http://codereview.appspot.com/4631059/diff/24002/src/pkg/runtime/sema.goc#newcode120 src/pkg/runtime/sema.goc:120: runtime·xadd(&root->nwait, 1); There's an important but undocumented ordering here. ...
9 years, 3 months ago (2011-06-28 17:30:46 UTC) #34
dvyukov
Hello rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
9 years, 3 months ago (2011-06-28 18:18:52 UTC) #35
dvyukov
On 2011/06/28 17:30:46, rsc wrote: > http://codereview.appspot.com/4631059/diff/24002/src/pkg/runtime/sema.goc > File src/pkg/runtime/sema.goc (right): > > http://codereview.appspot.com/4631059/diff/24002/src/pkg/runtime/sema.goc#newcode120 > ...
9 years, 3 months ago (2011-06-28 18:21:43 UTC) #36
rsc
http://codereview.appspot.com/4631059/diff/7004/src/pkg/runtime/sema.goc File src/pkg/runtime/sema.goc (right): http://codereview.appspot.com/4631059/diff/7004/src/pkg/runtime/sema.goc#newcode120 src/pkg/runtime/sema.goc:120: // root->nwait increment followed by the load of *addr ...
9 years, 3 months ago (2011-06-28 18:27:12 UTC) #37
dvyukov
Hello rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
9 years, 3 months ago (2011-06-28 18:45:03 UTC) #38
rsc
LGTM Thanks for iterating. Do the numbers in the CL description need to be updated?
9 years, 3 months ago (2011-06-28 18:49:52 UTC) #39
dvyukov
On 2011/06/28 18:49:52, rsc wrote: > LGTM > > Thanks for iterating. Do the numbers ...
9 years, 3 months ago (2011-06-28 19:03:16 UTC) #40
rsc
LGTM
9 years, 3 months ago (2011-06-28 19:06:42 UTC) #41
rsc
9 years, 3 months ago (2011-06-28 19:10:21 UTC) #42
*** Submitted as http://code.google.com/p/go/source/detail?r=0d277395434f ***

runtime: replace Semacquire/Semrelease implementation
1. The implementation uses distributed hash table of waitlists instead of a
centralized one.
  It significantly improves scalability for uncontended semaphores.
2. The implementation provides wait-free fast-path for signalers.
3. The implementation uses less locks (1 lock/unlock instead of 5 for
Semacquire).
4. runtime·ready() call is moved out of critical section.
5. Semacquire() does not call semwake().
Benchmark results on HP Z600 (2 x Xeon E5620, 8 HT cores, 2.40GHz)
are as follows:
benchmark                                        old ns/op    new ns/op    delta
runtime_test.BenchmarkSemaUncontended                58.20        36.30  -37.63%
runtime_test.BenchmarkSemaUncontended-2             199.00        18.30  -90.80%
runtime_test.BenchmarkSemaUncontended-4             327.00         9.20  -97.19%
runtime_test.BenchmarkSemaUncontended-8             491.00         5.32  -98.92%
runtime_test.BenchmarkSemaUncontended-16            946.00         4.18  -99.56%

runtime_test.BenchmarkSemaSyntNonblock               59.00        36.80  -37.63%
runtime_test.BenchmarkSemaSyntNonblock-2            167.00       138.00  -17.37%
runtime_test.BenchmarkSemaSyntNonblock-4            333.00       129.00  -61.26%
runtime_test.BenchmarkSemaSyntNonblock-8            464.00       130.00  -71.98%
runtime_test.BenchmarkSemaSyntNonblock-16          1015.00       136.00  -86.60%

runtime_test.BenchmarkSemaSyntBlock                  58.80        36.70  -37.59%
runtime_test.BenchmarkSemaSyntBlock-2               294.00       149.00  -49.32%
runtime_test.BenchmarkSemaSyntBlock-4               333.00       177.00  -46.85%
runtime_test.BenchmarkSemaSyntBlock-8               471.00       221.00  -53.08%
runtime_test.BenchmarkSemaSyntBlock-16              990.00       227.00  -77.07%

runtime_test.BenchmarkSemaWorkNonblock              829.00       832.00   +0.36%
runtime_test.BenchmarkSemaWorkNonblock-2            425.00       419.00   -1.41%
runtime_test.BenchmarkSemaWorkNonblock-4            308.00       220.00  -28.57%
runtime_test.BenchmarkSemaWorkNonblock-8            394.00       147.00  -62.69%
runtime_test.BenchmarkSemaWorkNonblock-16          1510.00       149.00  -90.13%

runtime_test.BenchmarkSemaWorkBlock                 828.00       813.00   -1.81%
runtime_test.BenchmarkSemaWorkBlock-2               428.00       436.00   +1.87%
runtime_test.BenchmarkSemaWorkBlock-4               232.00       219.00   -5.60%
runtime_test.BenchmarkSemaWorkBlock-8               392.00       251.00  -35.97%
runtime_test.BenchmarkSemaWorkBlock-16             1524.00       298.00  -80.45%

sync_test.BenchmarkMutexUncontended                  24.10        24.00   -0.41%
sync_test.BenchmarkMutexUncontended-2                12.00        12.00   +0.00%
sync_test.BenchmarkMutexUncontended-4                 6.25         6.17   -1.28%
sync_test.BenchmarkMutexUncontended-8                 3.43         3.34   -2.62%
sync_test.BenchmarkMutexUncontended-16                2.34         2.32   -0.85%

sync_test.BenchmarkMutex                             24.70        24.70   +0.00%
sync_test.BenchmarkMutex-2                          208.00        99.50  -52.16%
sync_test.BenchmarkMutex-4                         2744.00       256.00  -90.67%
sync_test.BenchmarkMutex-8                         5137.00       556.00  -89.18%
sync_test.BenchmarkMutex-16                        5368.00      1284.00  -76.08%

sync_test.BenchmarkMutexSlack                        24.70        25.00   +1.21%
sync_test.BenchmarkMutexSlack-2                    1094.00       186.00  -83.00%
sync_test.BenchmarkMutexSlack-4                    3430.00       402.00  -88.28%
sync_test.BenchmarkMutexSlack-8                    5051.00      1066.00  -78.90%
sync_test.BenchmarkMutexSlack-16                   6806.00      1363.00  -79.97%

sync_test.BenchmarkMutexWork                        793.00       792.00   -0.13%
sync_test.BenchmarkMutexWork-2                      398.00       398.00   +0.00%
sync_test.BenchmarkMutexWork-4                     1441.00       308.00  -78.63%
sync_test.BenchmarkMutexWork-8                     8532.00       847.00  -90.07%
sync_test.BenchmarkMutexWork-16                    8225.00      2760.00  -66.44%

sync_test.BenchmarkMutexWorkSlack                   793.00       793.00   +0.00%
sync_test.BenchmarkMutexWorkSlack-2                 418.00       414.00   -0.96%
sync_test.BenchmarkMutexWorkSlack-4                4481.00       480.00  -89.29%
sync_test.BenchmarkMutexWorkSlack-8                6317.00      1598.00  -74.70%
sync_test.BenchmarkMutexWorkSlack-16               9111.00      3038.00  -66.66%

R=rsc
CC=golang-dev
http://codereview.appspot.com/4631059

Committer: Russ Cox <rsc@golang.org>
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld f62528b