|
|
Created:
11 years, 4 months ago by funny.falcon Modified:
11 years, 4 months ago Reviewers:
dvyukov CC:
golang-dev, dvyukov, bradfitz, r Visibility:
Public. |
Descriptiontime: make timers heap 4-ary
This slightly improves performance when a lot of timers are present
$ misc/benchcmp ../old_timers_m.txt ../new_timers_m.txt
benchmark old ns/op new ns/op delta
BenchmarkAfterFunc 6884 6605 -4.05%
BenchmarkAfterFunc-2 4473 4144 -7.36%
BenchmarkAfterFunc-3 8601 6185 -28.09%
BenchmarkAfterFunc-4 9378 8773 -6.45%
BenchmarkAfter 7237 7278 +0.57%
BenchmarkAfter-2 4638 3923 -15.42%
BenchmarkAfter-3 8751 6239 -28.71%
BenchmarkAfter-4 9223 8737 -5.27%
BenchmarkStop 603 496 -17.74%
BenchmarkStop-2 795 577 -27.42%
BenchmarkStop-3 982 680 -30.75%
BenchmarkStop-4 1164 739 -36.51%
BenchmarkSimultaneousAfterFunc 657 593 -9.74%
BenchmarkSimultaneousAfterFunc-2 816 757 -7.23%
BenchmarkSimultaneousAfterFunc-3 844 830 -1.66%
BenchmarkSimultaneousAfterFunc-4 785 771 -1.78%
BenchmarkStartStop 238 239 +0.42%
BenchmarkStartStop-2 249 234 -6.02%
BenchmarkStartStop-3 271 268 -1.11%
BenchmarkStartStop-4 293 295 +0.68%
Patch Set 1 #Patch Set 2 : diff -r 9b8afda16f02 https://code.google.com/p/go #Patch Set 3 : diff -r 9b8afda16f02 https://code.google.com/p/go #Patch Set 4 : diff -r 9b8afda16f02 https://code.google.com/p/go #Patch Set 5 : diff -r 9b8afda16f02 https://code.google.com/p/go #Patch Set 6 : diff -r 9b8afda16f02 https://code.google.com/p/go #Patch Set 7 : diff -r 9b8afda16f02 https://code.google.com/p/go #
Total comments: 18
Patch Set 8 : diff -r 8bf13ae02c8e https://code.google.com/p/go #
Total comments: 2
Patch Set 9 : diff -r 8bf13ae02c8e https://code.google.com/p/go #Patch Set 10 : diff -r 8bf13ae02c8e https://code.google.com/p/go #
Total comments: 5
Patch Set 11 : diff -r 8bf13ae02c8e https://code.google.com/p/go #
Total comments: 2
Patch Set 12 : diff -r 8bf13ae02c8e https://code.google.com/p/go #
Total comments: 2
MessagesTotal messages: 27
Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://code.google.com/p/go
Sign in to reply to this message.
please add benchmark on what you measured performance
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
R=dvyukov
Sign in to reply to this message.
https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go File src/pkg/time/sleep_test.go (right): https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:110: func BenchmarkAfterFuncWithSmallGarbage(b *testing.B) { you still have lots of duplication in these 4 funcs, I think it will be better to do: func BenchmarkAfterFuncWithSmallGarbage(b *testing.B) { benchmarkAfterWithGarbage(b, BenchmarkAfterFunc, 128) } // 3 similar functions func benchmarkAfterWithGarbage(b *testing.B, f func(b *testing.B), n int) { // complete test impl w/o separate garbage/clearGarbage } https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:127: garbage := garbage(8192) let's make this larger, say, 128<<10 if one has tens of thousands of tcp connections with 4 timers each (read/write, keepalive, resend, something else), he can easily hit 128K timers https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:148: func BenchmarkSymAfterFunc(b *testing.B) { what does Sym in the name means? https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:152: f = func() { just: AfterFunc(0, wait.Done) https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:176: var wait sync.WaitGroup call the var wg, that's what we use throughout the tests this applies to other tests https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:178: f = func() { you can do this as f := func() {} but you do not need even this, just pass nil to AfterFunc, we do not expect it to fire anyway this applies to other tests https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:180: nth := 8 nth must depend on GOMAXPROCS, otherwise the results will be misleading with GOMAXPROCS=32 either nth := GOMAXPROCS or nth := GOMAXPROCS*8 https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:182: worker := func() { no need to declare separate func, just for ... { go func() { ... }() } https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:183: timers := make([]*Timer, b.N) divide b.N by nth, otherwise the result is misleading
Sign in to reply to this message.
the time.goc part looks good nice speedups!
Sign in to reply to this message.
https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go File src/pkg/time/sleep_test.go (right): https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:110: func BenchmarkAfterFuncWithSmallGarbage(b *testing.B) { On 2013/08/19 22:59:40, dvyukov wrote: > you still have lots of duplication in these 4 funcs, I think it will be better > to do: > > func BenchmarkAfterFuncWithSmallGarbage(b *testing.B) { > benchmarkAfterWithGarbage(b, BenchmarkAfterFunc, 128) > } > > // 3 similar functions > > func benchmarkAfterWithGarbage(b *testing.B, f func(b *testing.B), n int) { > // complete test impl w/o separate garbage/clearGarbage > } Done. https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:127: garbage := garbage(8192) On 2013/08/19 22:59:40, dvyukov wrote: > let's make this larger, say, 128<<10 > if one has tens of thousands of tcp connections with 4 timers each (read/write, > keepalive, resend, something else), he can easily hit 128K timers Done. https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:152: f = func() { On 2013/08/19 22:59:40, dvyukov wrote: > just: > AfterFunc(0, wait.Done) Done. https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:176: var wait sync.WaitGroup On 2013/08/19 22:59:40, dvyukov wrote: > call the var wg, that's what we use throughout the tests > this applies to other tests Done. https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:178: f = func() { On 2013/08/19 22:59:40, dvyukov wrote: > you can do this as > f := func() {} > but you do not need even this, just pass nil to AfterFunc, we do not expect it > to fire anyway > this applies to other tests Done. https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:180: nth := 8 On 2013/08/19 22:59:40, dvyukov wrote: > nth must depend on GOMAXPROCS, otherwise the results will be misleading with > GOMAXPROCS=32 > either nth := GOMAXPROCS > or nth := GOMAXPROCS*8 I thought that if `go test` is runned with -cpu flag, than benchmarks are runned GOMAXPROCS times in parallel, so that, if nth will be multiply of GOMAXPROCS, then total number of goroutines will be quadratic. Am I wrong? https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:182: worker := func() { On 2013/08/19 22:59:40, dvyukov wrote: > no need to declare separate func, just > for ... { > go func() { > ... > }() > } Done. https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:183: timers := make([]*Timer, b.N) On 2013/08/19 22:59:40, dvyukov wrote: > divide b.N by nth, otherwise the result is misleading Done.
Sign in to reply to this message.
https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go File src/pkg/time/sleep_test.go (right): https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:180: nth := 8 On 2013/08/20 08:45:16, funny.falcon wrote: > On 2013/08/19 22:59:40, dvyukov wrote: > > nth must depend on GOMAXPROCS, otherwise the results will be misleading with > > GOMAXPROCS=32 > > either nth := GOMAXPROCS > > or nth := GOMAXPROCS*8 > > I thought that if `go test` is runned with -cpu flag, than benchmarks are runned > GOMAXPROCS times in parallel, so that, if nth will be multiply of GOMAXPROCS, > then total number of goroutines will be quadratic. > Am I wrong? -cpu just sets GOMAXPROCS, it always executes a single copy of benchmark
Sign in to reply to this message.
https://codereview.appspot.com/13094043/diff/30001/src/pkg/time/sleep_test.go File src/pkg/time/sleep_test.go (right): https://codereview.appspot.com/13094043/diff/30001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:162: nth := 8 nth := runtime.GOMAXPROCS(0)
Sign in to reply to this message.
On 2013/08/20 09:12:09, dvyukov wrote: > https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go > File src/pkg/time/sleep_test.go (right): > > https://codereview.appspot.com/13094043/diff/21001/src/pkg/time/sleep_test.go... > src/pkg/time/sleep_test.go:180: nth := 8 > On 2013/08/20 08:45:16, funny.falcon wrote: > > On 2013/08/19 22:59:40, dvyukov wrote: > > > nth must depend on GOMAXPROCS, otherwise the results will be misleading with > > > GOMAXPROCS=32 > > > either nth := GOMAXPROCS > > > or nth := GOMAXPROCS*8 > > > > I thought that if `go test` is runned with -cpu flag, than benchmarks are > runned > > GOMAXPROCS times in parallel, so that, if nth will be multiply of GOMAXPROCS, > > then total number of goroutines will be quadratic. > > Am I wrong? > > -cpu just sets GOMAXPROCS, it always executes a single copy of benchmark Ah, ok. So that I remade tests spawning goroutines proportional to GOMAXPROCS. Could you look at?
Sign in to reply to this message.
https://codereview.appspot.com/13094043/diff/30001/src/pkg/time/sleep_test.go File src/pkg/time/sleep_test.go (right): https://codereview.appspot.com/13094043/diff/30001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:162: nth := 8 On 2013/08/20 09:29:28, dvyukov wrote: > nth := runtime.GOMAXPROCS(0) Done.
Sign in to reply to this message.
https://codereview.appspot.com/13094043/diff/37001/src/pkg/time/sleep_test.go File src/pkg/time/sleep_test.go (right): https://codereview.appspot.com/13094043/diff/37001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:72: func benchmark(b *testing.B, bench func(n int), garbage_size int, go_koef int) { do not use underscores, use camelCase garbageSize looks too long as compared to what other code here uses nt and ng (number of timers and number of goroutines per proc) would be fine https://codereview.appspot.com/13094043/diff/37001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:81: if nth == 1 { do not need to special case general case is fine for large N https://codereview.appspot.com/13094043/diff/37001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:89: bench((b.N + i) / nth) this has potential very unpleasant problem with load balancing. you give each goroutine the same amount of work, if one goroutine is delayed for some reason (e.g. a background process evicted the thread), then it will finish later, and all other goroutines will be idle during that period. so the result is the time to run X interations with 1 goroutine, Y interations with 2 goroutines, Z interations with 3 goroutines and so on, where X, Y and Z are unknown and different each time. this produces senseless results. I usually do it along the lines of: const Batch = 1000 P := runtime.GOMAXPROCS(0) N := int32(b.N / Batch) for p := 0; p < P; p++ { go func() { for atomic.AddInt32(&N, -1) >= 0 { bench(Batch) } }() } but it will change semantics of Simultaneous benchmarks, currently they create and destroy b.N timers, with this code they will create 1000 timers b.N/1000 times. I don't whether they will measure what you want them to measure in this case or not. https://codereview.appspot.com/13094043/diff/37001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:104: i := n why do you need this copy? https://codereview.appspot.com/13094043/diff/37001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:164: func BenchmarkStopWithBigGarbage(b *testing.B) { do we actually need that many different benchmarks? can you think of a better, more orthogonal benchmark set? I just usually benchmark with -benchtime=3s -cpu=1,2,4,8,16,32, and it will take closer to 10min to benchmark it all...
Sign in to reply to this message.
please fill in CLA as described here: http://golang.org/doc/contribute.html#tmp_17
Sign in to reply to this message.
This CL adds too many benchmarks that will likely take too much time.
Sign in to reply to this message.
On 2013/08/20 10:58:49, dvyukov wrote: > please fill in CLA as described here: > http://golang.org/doc/contribute.html#tmp_17 I've signed a hour ago, but got no notification yet. It is ok? Bench's number were reduced (only tests with garbage remains). I've remade benchmark using your pattern for parallel benchmarking. Remade benchmark shows degradation with SimultaneousAfterFunc, so that I optimize functions in time.goc a bit.
Sign in to reply to this message.
I've processed the CLA. On Tue, Aug 20, 2013 at 6:12 AM, <funny.falcon@gmail.com> wrote: > On 2013/08/20 10:58:49, dvyukov wrote: > >> please fill in CLA as described here: >> http://golang.org/doc/**contribute.html#tmp_17<http://golang.org/doc/contribu... >> > > I've signed a hour ago, but got no notification yet. It is ok? > > Bench's number were reduced (only tests with garbage remains). > > I've remade benchmark using your pattern for parallel benchmarking. > Remade benchmark shows degradation with SimultaneousAfterFunc, so that > I optimize functions in time.goc a bit. > > > https://codereview.appspot.**com/13094043/<https://codereview.appspot.com/130... >
Sign in to reply to this message.
On 2013/08/20 15:22:11, bradfitz wrote: > I've processed the CLA. Thanks!
Sign in to reply to this message.
I think this is the last nitpick, and we are ready to submit https://codereview.appspot.com/13094043/diff/46001/src/pkg/time/sleep_test.go File src/pkg/time/sleep_test.go (right): https://codereview.appspot.com/13094043/diff/46001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:104: benchmark(b, 5000, func(n int) { please reduce this value to at least 1000 on my machine iteration of this benchmark takes ~10us, so 5000 iterations take ~50ms, this too large scheduling granularity https://codereview.appspot.com/13094043/diff/46001/src/pkg/time/sleep_test.go... src/pkg/time/sleep_test.go:130: benchmark(b, 40000, func(n int) { and similarly here, this is huge we already have 128K garbage timers, so benchmarking in 128K-129K range should be reasonably close to 128K-168K
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, dvyukov@google.com, bradfitz@golang.org, r@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
https://codereview.appspot.com/13094043/diff/52001/src/pkg/runtime/time.goc File src/pkg/runtime/time.goc (right): https://codereview.appspot.com/13094043/diff/52001/src/pkg/runtime/time.goc#n... src/pkg/runtime/time.goc:240: t[p] = tmp; don't you mess the heap here? t[p].i must be equal to p i think you need to swap lines 240,241 and the same in siftdown
Sign in to reply to this message.
https://codereview.appspot.com/13094043/diff/52001/src/pkg/runtime/time.goc File src/pkg/runtime/time.goc (right): https://codereview.appspot.com/13094043/diff/52001/src/pkg/runtime/time.goc#n... src/pkg/runtime/time.goc:240: t[p] = tmp; On 2013/08/21 12:09:42, dvyukov wrote: > don't you mess the heap here? > t[p].i must be equal to p > i think you need to swap lines 240,241 > and the same in siftdown I could not understand what confuse you: - in original variant, tmp were always a pointer to a currently sifted up/down timer, - in this variant it is as well, but set before loop enter - after execution of line t[p] = tmp; following two lines are equivalent t[p]->i = p; tmp->i = p; so that, I choose to use second variant to simplify compiler's work. - switching lines 240<->241 doesn't change any semantic in single threaded environment - timer's heap is protected by a mutex. So, what do you mean exactly?
Sign in to reply to this message.
On 2013/08/21 14:14:10, funny.falcon wrote: > https://codereview.appspot.com/13094043/diff/52001/src/pkg/runtime/time.goc > File src/pkg/runtime/time.goc (right): > > https://codereview.appspot.com/13094043/diff/52001/src/pkg/runtime/time.goc#n... > src/pkg/runtime/time.goc:240: t[p] = tmp; > On 2013/08/21 12:09:42, dvyukov wrote: > > don't you mess the heap here? > > t[p].i must be equal to p > > i think you need to swap lines 240,241 > > and the same in siftdown > > I could not understand what confuse you: > - in original variant, tmp were always a pointer to a currently sifted up/down > timer, - in this variant it is as well, but set before loop enter > - after execution of line > t[p] = tmp; > following two lines are equivalent > t[p]->i = p; > tmp->i = p; > so that, I choose to use second variant to simplify compiler's work. > - switching lines 240<->241 doesn't change any semantic in single threaded > environment > - timer's heap is protected by a mutex. > > So, what do you mean exactly? you're right, sorry so some reason I assumed that tmp holds Timer by value
Sign in to reply to this message.
LGTM
Sign in to reply to this message.
*** Submitted as https://code.google.com/p/go/source/detail?r=cddeaea05b11 *** time: make timers heap 4-ary This slightly improves performance when a lot of timers are present $ misc/benchcmp ../old_timers_m.txt ../new_timers_m.txt benchmark old ns/op new ns/op delta BenchmarkAfterFunc 6884 6605 -4.05% BenchmarkAfterFunc-2 4473 4144 -7.36% BenchmarkAfterFunc-3 8601 6185 -28.09% BenchmarkAfterFunc-4 9378 8773 -6.45% BenchmarkAfter 7237 7278 +0.57% BenchmarkAfter-2 4638 3923 -15.42% BenchmarkAfter-3 8751 6239 -28.71% BenchmarkAfter-4 9223 8737 -5.27% BenchmarkStop 603 496 -17.74% BenchmarkStop-2 795 577 -27.42% BenchmarkStop-3 982 680 -30.75% BenchmarkStop-4 1164 739 -36.51% BenchmarkSimultaneousAfterFunc 657 593 -9.74% BenchmarkSimultaneousAfterFunc-2 816 757 -7.23% BenchmarkSimultaneousAfterFunc-3 844 830 -1.66% BenchmarkSimultaneousAfterFunc-4 785 771 -1.78% BenchmarkStartStop 238 239 +0.42% BenchmarkStartStop-2 249 234 -6.02% BenchmarkStartStop-3 271 268 -1.11% BenchmarkStartStop-4 293 295 +0.68% R=golang-dev, dvyukov, bradfitz, r CC=golang-dev https://codereview.appspot.com/13094043 Committer: Dmitriy Vyukov <dvyukov@google.com>
Sign in to reply to this message.
|