Intended workflow is as follows. For example, you want to make some changes to the
sync.Once implementation and estimate their impact on performance/scalability.
First, run the benchmark for sync.Once:
$ nice -20 ./scale -bench.run=once -bench.time=3 -bench.out=once.res -bench.procs=16
once 5535 136 114 120 120 132 123 132 110 127 142 129 120 118 124 125
The numbers refer to throughput per millisecond for 1, 2, 3... 16 worker threads
(GOMAXPROCS), respectively.
You can visualize the results with:
$ ./scale -draw.file=once.res
http://chart.googleapis.com/chart?cht=lc&chs=600x500&chxt=x,y&chxr=0,1,16,1%7C1,0,88560&chma=10,10,10,10&chdlp=b&chdl=ideal%7Conce&chg=6.666667,5&chco=AAAAAA,000000&chls=1%7C3&chd=t%3A6.25,12.50,18.75,25.00,31.25,37.50,43.75,50.00,56.25,62.50,68.75,75.00,81.25,87.50,93.75,100.00%7C6.25,0.15,0.13,0.14,0.14,0.15,0.14,0.15,0.12,0.14,0.16,0.15,0.14,0.13,0.14,0.14
Then, make the changes and re-run the benchmark:
$ nice -20 ./scale -bench.run=once -bench.time=3 -bench.out=once2.res -bench.procs=16
once 5083 186 103 121 107 127 106 133 124 122 98 120 112 128 100 125
Then, build the comparison graph:
$ ./scale -draw.file=once2.res -draw.base=once.res
http://chart.googleapis.com/chart?cht=lc&chs=600x500&chxt=x,y&chxr=0,1,16,1%7C1,0,88560&chma=10,10,10,10&chdlp=b&chdl=ideal%7Conce%7Conce_base&chg=6.666667,5&chco=AAAAAA,000000,000000&chls=1%7C3%7C2,2,2&chd=t%3A6.25,12.50,18.75,25.00,31.25,37.50,43.75,50.00,56.25,62.50,68.75,75.00,81.25,87.50,93.75,100.00%7C5.74,0.21,0.12,0.14,0.12,0.14,0.12,0.15,0.14,0.14,0.11,0.14,0.13,0.14,0.11,0.14%7C6.25,0.15,0.13,0.14,0.14,0.15,0.14,0.15,0.12,0.14,0.16,0.15,0.14,0.13,0.14,0.14

The list of all benchmarks can be obtained with:
$ ./scale -bench.procs=1 -bench.time=0 -bench.out=/dev/null

BENCHMARK LIST:
goroutine-distr
goroutine-centr
Both tests spawn a lot of goroutines concurrently. The goroutine-centr test uses a
single atomic variable to track when all the goroutines are created and executed
(this single variable can hinder scalability). The goroutine-distr test uses
distributed variables to track completion (this creates no scalability bottleneck,
but puts additional pressure on the GC).
Expected to scale almost linearly.

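The centralized completion tracking described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual code; the name spawnCentr is made up for this sketch.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// spawnCentr spawns n goroutines and tracks their completion through one
// shared atomic counter, as in goroutine-centr: the last goroutine to
// increment the counter signals completion. The single hot counter is the
// potential scalability bottleneck.
func spawnCentr(n int32) int32 {
	var completed int32
	done := make(chan struct{})
	for i := int32(0); i < n; i++ {
		go func() {
			if atomic.AddInt32(&completed, 1) == n {
				close(done)
			}
		}()
	}
	<-done // wait until the last goroutine has run
	return atomic.LoadInt32(&completed)
}

func main() {
	fmt.Println(spawnCentr(1000)) // prints 1000
}
```

The distributed variant would instead give each goroutine (or group of goroutines) its own completion flag, trading the shared cache line for more allocations.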
sem-nonblock
sem-nonblock-slack
sem-block
sem-block-slack
sem-nonblock-work
sem-nonblock-slack-work
sem-block-work
sem-block-slack-work
Tests scalability of a semaphore under contention. The nonblock tests stress the
fast path, that is, no goroutines are blocked on the semaphore. nonblock-slack is
the same as nonblock, but creates more goroutines than there are worker threads
(Ms). The block tests stress the slow path, that is, some goroutines get blocked
on the semaphore. block-slack is the same as block, but creates more goroutines
than there are worker threads (Ms). work means some amount of local work.
Not expected to scale.

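The semaphore these tests exercise is internal to the runtime. As a stand-in, here is a minimal counting semaphore sketched with a buffered channel, illustrating the fast path (acquire succeeds without blocking) versus the slow path (acquire blocks until a release):

```go
package main

import "fmt"

// sem is a counting semaphore built on a buffered channel (a stand-in for
// the internal runtime semaphore these benchmarks exercise).
type sem chan struct{}

func newSem(capacity int) sem { return make(sem, capacity) }

// acquire takes a unit; it blocks (slow path) only when the semaphore is full.
func (s sem) acquire() { s <- struct{}{} }

// release returns a unit, unblocking one waiter if any.
func (s sem) release() { <-s }

func main() {
	s := newSem(2)
	s.acquire() // fast path: capacity available, no blocking
	s.acquire()
	go s.release() // a release lets a blocked acquire proceed
	s.acquire()    // slow path: blocks until the release above runs
	fmt.Println("acquired three times with capacity two")
}
```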
sem-distr
Tests scalability of a semaphore without contention. Each goroutine repeatedly
releases and acquires a private semaphore.
Expected to scale almost linearly.

chan-ring-unbuf
chan-ring-buf10
chan-ring-buf100
chan-ring-buf1000
Tests channel scalability in single-producer/single-consumer mode. A set of
goroutines is arranged into a ring connected by channels, and a number of small
messages is passed through the ring. There are 4 variations,
unbuf/buf10/buf100/buf1000, which relate to unbuffered channels and buffered
channels of size 10/100/1000, respectively.
Buffered variations are expected to scale rather well.

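The ring arrangement can be sketched as follows. This is a simplified illustration in which the main goroutine closes the ring; ringLaps and its increment-per-hop payload are made up for this sketch.

```go
package main

import "fmt"

// ringLaps connects n channel stages of capacity buf into a ring: n-1
// forwarding goroutines each relay messages from their input channel to the
// next stage, and the main goroutine closes the ring by reinjecting the
// token. Each hop increments the token, so one lap adds n-1.
func ringLaps(n, buf, laps int) int {
	chans := make([]chan int, n)
	for i := range chans {
		chans[i] = make(chan int, buf)
	}
	for i := 0; i < n-1; i++ {
		in, out := chans[i], chans[i+1]
		go func() {
			for v := range in {
				out <- v + 1
			}
			close(out) // cascade shutdown when the ring is closed
		}()
	}
	token := 0
	for l := 0; l < laps; l++ {
		chans[0] <- token
		token = <-chans[n-1]
	}
	close(chans[0])
	return token
}

func main() {
	fmt.Println(ringLaps(8, 10, 100)) // 100 laps * 7 hops = 700
}
```

With buffered stages, several messages can be in flight at once, which is why the buffered variations scale better than the unbuffered one.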
chan-pc-unbuf
chan-pc-buf10
chan-pc-buf100
chan-pc-buf1000
chan-pc-unbuf-work
chan-pc-buf10-work
chan-pc-buf100-work
chan-pc-buf1000-work
Tests channel scalability in multi-producer/multi-consumer mode. N producer
goroutines and N consumer goroutines stress a single channel. work means some
amount of local work before/after item production/consumption.
Not expected to scale.

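The multi-producer/multi-consumer pattern can be sketched as below. producersConsumers is an illustrative name; the real benchmark measures throughput rather than a sum.

```go
package main

import (
	"fmt"
	"sync"
)

// producersConsumers runs n producers and n consumers against one shared
// channel of capacity buf, as in chan-pc. It returns the total number of
// items consumed, which must equal n*itemsPerProducer.
func producersConsumers(n, buf, itemsPerProducer int) int {
	ch := make(chan int, buf)
	var prod sync.WaitGroup
	for p := 0; p < n; p++ {
		prod.Add(1)
		go func() {
			defer prod.Done()
			for i := 0; i < itemsPerProducer; i++ {
				ch <- 1 // all producers contend on the single channel
			}
		}()
	}
	go func() {
		prod.Wait()
		close(ch) // no more items once every producer is done
	}()
	var mu sync.Mutex
	total := 0
	var cons sync.WaitGroup
	for c := 0; c < n; c++ {
		cons.Add(1)
		go func() {
			defer cons.Done()
			local := 0
			for v := range ch {
				local += v
			}
			mu.Lock()
			total += local
			mu.Unlock()
		}()
	}
	cons.Wait()
	return total
}

func main() {
	fmt.Println(producersConsumers(4, 100, 1000)) // 4000
}
```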
pingpong-unbuf
pingpong-unbuf-slack
pingpong-buf
pingpong-buf-slack
Tests channel scalability in single-producer/single-consumer mode.
slack means that there are more goroutines than worker threads.
78 | |
mutex-centr
mutex-centr-slack
mutex-centr-work
mutex-centr-slack-work
Tests sync.Mutex scalability under contention.
slack means that there are more goroutines than worker threads.
work means some amount of local work outside of the critical section.
Not expected to scale.

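The contended-mutex pattern with local work outside the critical section can be sketched as follows (mutexCentr is an illustrative name, not the benchmark's code):

```go
package main

import (
	"fmt"
	"sync"
)

// mutexCentr has every goroutine repeatedly take one shared sync.Mutex,
// as in mutex-centr-work: the "work" is local computation done outside the
// critical section, which shortens hold time but not the contention itself.
func mutexCentr(goroutines, iters int) int {
	var mu sync.Mutex
	shared := 0
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			local := 0
			for i := 0; i < iters; i++ {
				local += i // local work outside the critical section
				mu.Lock()
				shared++ // short critical section on the shared mutex
				mu.Unlock()
			}
			_ = local
		}()
	}
	wg.Wait()
	return shared
}

func main() {
	fmt.Println(mutexCentr(8, 10000)) // 80000
}
```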
mutex-distr
Tests scalability of sync.Mutex without contention.
Expected to scale almost linearly.

rwmutex-w1
rwmutex-w10
rwmutex-w1-slack
rwmutex-w10-slack
rwmutex-w1-work
rwmutex-w10-work
rwmutex-w1-slack-work
rwmutex-w10-slack-work
Tests sync.RWMutex scalability under contention.
w1/w10 means 1%/10% of writes, respectively.
slack means that there are more goroutines than worker threads.
work means some amount of local work in the reader critical section.
Not expected to scale.

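A sketch of the rwmutex-w10 mix; the 10%-write split via i%10 is an assumption about how such a mix might be produced, not the benchmark's actual scheme.

```go
package main

import (
	"fmt"
	"sync"
)

// rwmutexMixed issues roughly 10% writes and 90% reads against one shared
// sync.RWMutex, in the spirit of rwmutex-w10.
func rwmutexMixed(goroutines, iters int) int {
	var mu sync.RWMutex
	shared := 0
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				if i%10 == 0 {
					mu.Lock() // write lock: exclusive
					shared++
					mu.Unlock()
				} else {
					mu.RLock() // read lock: shared among readers
					_ = shared
					mu.RUnlock()
				}
			}
		}()
	}
	wg.Wait()
	return shared
}

func main() {
	fmt.Println(rwmutexMixed(4, 100)) // 4 goroutines * 10 writes = 40
}
```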
rwmutex-distr
Tests scalability of sync.RWMutex without contention.
Expected to scale almost linearly.

condvar-nonblock
condvar-block
condvar-block-slack
Tests sync.Cond scalability under contention.
slack means that there are more goroutines than worker threads.
Not expected to scale.

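For reference, a minimal sync.Cond wait/signal round trip. The nonblock variant stresses Signal with no waiter blocked; this sketch shows the blocking case as well.

```go
package main

import (
	"fmt"
	"sync"
)

// condWake blocks one goroutine on a sync.Cond and wakes it with Signal,
// as in the condvar-block case.
func condWake() string {
	var mu sync.Mutex
	cond := sync.NewCond(&mu)
	ready := false
	woken := make(chan string)
	go func() {
		mu.Lock()
		for !ready { // loop guards against spurious wakeups
			cond.Wait()
		}
		mu.Unlock()
		woken <- "woken"
	}()
	mu.Lock()
	ready = true
	cond.Signal() // with no waiter yet, this is the nonblock fast path
	mu.Unlock()
	return <-woken
}

func main() {
	fmt.Println(condWake())
}
```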
condvar-distr
Tests scalability of sync.Cond without contention.
Expected to scale almost linearly.

once
Tests sync.Once fast-path scalability under contention. A set of goroutines
repeatedly executes sync.Once.Do() on a single object.
Expected to scale almost linearly.

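The pattern this benchmark stresses can be sketched as follows: only the first Do call runs the function, and every later call takes the read-only fast path.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// onceFastPath has many goroutines hammer Do on a single sync.Once, as the
// "once" benchmark does; the init function runs exactly once, so almost all
// calls take the read-only fast path.
func onceFastPath(goroutines, iters int) int32 {
	var once sync.Once
	var inits int32
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				once.Do(func() { atomic.AddInt32(&inits, 1) })
			}
		}()
	}
	wg.Wait()
	return inits
}

func main() {
	fmt.Println(onceFastPath(8, 10000)) // 1: the function ran exactly once
}
```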
waitgroup-centr
waitgroup-centr-work
Tests scalability of sync.WaitGroup under contention.
work means some amount of local work.

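The contended-WaitGroup pattern, sketched with illustrative names:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// waitgroupCentr makes every worker call Done on one shared sync.WaitGroup,
// as in waitgroup-centr: the WaitGroup's single internal counter is touched
// by all workers.
func waitgroupCentr(goroutines int) int32 {
	var wg sync.WaitGroup
	var finished int32
	wg.Add(goroutines)
	for g := 0; g < goroutines; g++ {
		go func() {
			atomic.AddInt32(&finished, 1)
			wg.Done() // all workers hit the same WaitGroup counter
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&finished)
}

func main() {
	fmt.Println(waitgroupCentr(1000)) // 1000
}
```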
waitgroup-distr
Tests scalability of sync.WaitGroup without contention.

matmult-chan32
matmult-chan16
matmult-chan8
matmult-wg32
matmult-wg16
matmult-wg8
matmult-sync32
matmult-sync16
matmult-sync8
Tests scheduler scalability on a matrix multiplication workload with fork/join
style parallelism. There are 3 variations as to how child goroutines are joined:
by means of channels, by means of sync.WaitGroup, or by means of direct shared
memory notifications. There are also 3 grain sizes, 32, 16 and 8, which relate
to leaf tasks that do 32x32, 16x16 and 8x8 matrix multiplication, respectively.
Expected to scale almost linearly.

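The wg variant's fork/join recursion can be sketched as follows. This is a simplified version that splits only the output rows; the benchmark's actual decomposition may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// matmultWG computes c = a*b for rows [lo,hi) by recursively splitting the
// row range in half, forking one half into a child goroutine, and joining
// with a sync.WaitGroup; ranges at or below grain are multiplied directly.
func matmultWG(a, b, c [][]float64, lo, hi, grain int) {
	if hi-lo <= grain {
		for i := lo; i < hi; i++ {
			for j := range b[0] {
				s := 0.0
				for k := range b {
					s += a[i][k] * b[k][j]
				}
				c[i][j] = s
			}
		}
		return
	}
	mid := (lo + hi) / 2
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // fork: the lower half runs in a child goroutine
		defer wg.Done()
		matmultWG(a, b, c, lo, mid, grain)
	}()
	matmultWG(a, b, c, mid, hi, grain) // upper half runs inline
	wg.Wait()                          // join
}

func main() {
	const n = 8
	a := make([][]float64, n)
	b := make([][]float64, n)
	c := make([][]float64, n)
	for i := 0; i < n; i++ {
		a[i] = make([]float64, n)
		b[i] = make([]float64, n)
		c[i] = make([]float64, n)
		a[i][i] = 1 // a is the identity, so c must equal b
		for j := 0; j < n; j++ {
			b[i][j] = float64(i*n + j)
		}
	}
	matmultWG(a, b, c, 0, n, 2)
	fmt.Println(c[3][5] == b[3][5]) // true
}
```

A smaller grain exposes more parallelism but spawns more goroutines, which is the trade-off the 32/16/8 variants measure.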