|
|
Created:
10 years, 7 months ago by atom Modified:
10 years, 6 months ago Reviewers:
CC:
rsc, remyoudompheng, dsymonds, minux1, iant, iant2, golang-dev Visibility:
Public. |
Descriptionruntime: use reflect·call() to enter the function gc()
Garbage collection code (to be merged later) is calling functions
which have many local variables. This increases the probability that
the stack capacity won't be big enough to hold the local variables.
So, start gc() on a bigger stack to eliminate a potentially large number
of calls to runtime·morestack().
Patch Set 1 #Patch Set 2 : diff -r fc4e67d82024 https://go.googlecode.com/hg/ #Patch Set 3 : diff -r fc4e67d82024 https://go.googlecode.com/hg/ #
Total comments: 2
Patch Set 4 : diff -r fc4e67d82024 https://go.googlecode.com/hg/ #Patch Set 5 : diff -r 9544314de8e8 https://go.googlecode.com/hg/ #
Total comments: 6
Patch Set 6 : diff -r 616adac0bb59 https://go.googlecode.com/hg/ #Patch Set 7 : diff -r 9050012c9765 https://go.googlecode.com/hg/ #
MessagesTotal messages: 24
Hello rsc, remyoudompheng (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/
Sign in to reply to this message.
Do you have performance numbers?
Sign in to reply to this message.
On 2012/11/13 07:40:56, dsymonds wrote: > Do you have performance numbers? The probability of stack reallocations during GC is low, but it can occur in regular Go programs. The worst case I encountered is test/bench/garbage/peano.go - the stack reallocations were causing performance issues there. I based my decision to send this CL on the worst case scenario.
Sign in to reply to this message.
On Tue, Nov 13, 2012 at 6:56 PM, <0xE2.0x9A.0x9B@gmail.com> wrote: > On 2012/11/13 07:40:56, dsymonds wrote: >> >> Do you have performance numbers? > > > The probability of stack reallocations during GC is low, but it can > occur in regular Go programs. The worst case I encountered is > test/bench/garbage/peano.go - the stack reallocations were causing > performance issues there. I based my decision to send this CL on the > worst case scenario. So can you post some numbers for the impact on peano?
Sign in to reply to this message.
On 2012/11/13 07:58:42, dsymonds wrote: > On Tue, Nov 13, 2012 at 6:56 PM, <mailto:0xE2.0x9A.0x9B@gmail.com> wrote: > > > On 2012/11/13 07:40:56, dsymonds wrote: > >> > >> Do you have performance numbers? > > > > > > The probability of stack reallocations during GC is low, but it can > > occur in regular Go programs. The worst case I encountered is > > test/bench/garbage/peano.go - the stack reallocations were causing > > performance issues there. I based my decision to send this CL on the > > worst case scenario. > > So can you post some numbers for the impact on peano? I am unable to make exact measurements on amd64 when running peano.go because I am using a virtualized environment (rhcloud.com). On i386, the number of calls to runtime·newstack() per 1 call to gc() is low: it ranges from 0 to 1000. So, the stack allocations are not an issue on i386. When it is about 1000, the measured time spent in runtime·newstack() while in gc() is about 0.04% of total program time. However, on amd64, the number of calls to runtime·newstack() per 1 call to gc() can be high: from 0 to 100000+. The behavior of the peano benchmark is stochastic: the number cab be 0, 1026, 9926, ..., 125470. I am unable to determine how this impacts GC performance. I presume that 100000 stack reallocations lead to a measureable performance degradation. --- Note: I added a new field 'moreframesize_minalloc' to struct M just below the extant field 'moreframesize'. However, on amd64 this positioning appears to be completely incorrect and I have to move the new field lower (for example: below 'racepc').
Sign in to reply to this message.
My benchmark result on linux/amd64 (I've run both the new and old versions 20 times, and select the best/worst case _pair_): worst case: benchmark old ns/op new ns/op delta garbage.BenchmarkPeano 65746437 69322040 +5.44% garbage.BenchmarkPeanoLastPause 6062509 6115177 +0.87% garbage.BenchmarkPeanoPause 2154311 2532654 +17.56% best case: benchmark old ns/op new ns/op delta garbage.BenchmarkPeano 70536831 65711182 -6.84% garbage.BenchmarkPeanoLastPause 6347188 6116066 -3.64% garbage.BenchmarkPeanoPause 2812915 2499263 -11.15% On 2012/11/13 09:47:43, atom wrote: > Note: I added a new field 'moreframesize_minalloc' to struct M just below the > extant field 'moreframesize'. However, on amd64 this positioning appears to be > completely incorrect and I have to move the new field lower (for example: below > 'racepc'). i think you can place the new field just below moreargsize. just remember to run "cmd/dist install pkg/runtime" before "go install -v std" whenever you changed struct M or G; the assembly routines in runtime might use out-of-date offset otherwise. btw, please 'hg sync' and 'hg upload 6846044' as the runtime.h change can't be applied cleanly.
Sign in to reply to this message.
On 2012/11/13 11:57:16, minux wrote: > My benchmark result on linux/amd64 (I've run both the new and old > versions 20 times, and select the best/worst case _pair_): > worst case: > benchmark old ns/op new ns/op delta > garbage.BenchmarkPeano 65746437 69322040 +5.44% > garbage.BenchmarkPeanoLastPause 6062509 6115177 +0.87% > garbage.BenchmarkPeanoPause 2154311 2532654 +17.56% > best case: > benchmark old ns/op new ns/op delta > garbage.BenchmarkPeano 70536831 65711182 -6.84% > garbage.BenchmarkPeanoLastPause 6347188 6116066 -3.64% > garbage.BenchmarkPeanoPause 2812915 2499263 -11.15% Did you apply CL 6114046? I made some modifications to my local copy of CL 6114046, and those are the numbers I reported. I didn't test without CL 6114046. I can publish the diff files if you like. In my opinion, the average out of the 20 runs would be more useful. > btw, please 'hg sync' and 'hg upload 6846044' as the runtime.h change > can't be applied cleanly. Done.
Sign in to reply to this message.
I can't help but feel that this would be better done via some sort of compiler pragma. You are introducing runtime overhead to every call to the garbage collector. But the compiler could easily say at the start of the function "give me at least this much space on the stack" with much less overhead. Much better than that would be some sort of runtime heuristic: if we had to split the stack several times beneath some function, then the next time we enter the function ask for a bigger stack. I have no concrete suggestions for how to implement that, though. https://codereview.appspot.com/6846044/diff/4002/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/6846044/diff/4002/src/pkg/runtime/mgc0.c#newco... src/pkg/runtime/mgc0.c:1097: // a problem in the past. This test should stay where it was, since that is where it matters.
Sign in to reply to this message.
Hello rsc@golang.org, remyoudompheng@gmail.com, dsymonds@golang.org, minux.ma@gmail.com, iant@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
https://codereview.appspot.com/6846044/diff/4002/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/6846044/diff/4002/src/pkg/runtime/mgc0.c#newco... src/pkg/runtime/mgc0.c:1097: // a problem in the past. On 2012/11/13 14:28:40, iant wrote: > This test should stay where it was, since that is where it matters. Done.
Sign in to reply to this message.
On 2012/11/13 14:28:40, iant wrote: > I can't help but feel that this would be better done via some sort of compiler > pragma. You are introducing runtime overhead to every call to the garbage > collector. But the compiler could easily say at the start of the function "give > me at least this much space on the stack" with much less overhead. In my opinion it wouldn't be much faster than reflect·call(). > Much better than that would be some sort of runtime heuristic: if we had to > split the stack several times beneath some function, then the next time we enter > the function ask for a bigger stack. I have no concrete suggestions for how to > implement that, though. A problem is that any goroutine can call gc(). The stack of a random goroutine cannot be enlarged if there are many goroutines because over time it may consume too much memory. Though it would be possible when there are few goroutines.
Sign in to reply to this message.
On Tue, Nov 13, 2012 at 7:18 AM, <0xE2.0x9A.0x9B@gmail.com> wrote: > On 2012/11/13 14:28:40, iant wrote: >> >> I can't help but feel that this would be better done via some sort of > > compiler >> >> pragma. You are introducing runtime overhead to every call to the > > garbage >> >> collector. But the compiler could easily say at the start of the > > function "give >> >> me at least this much space on the stack" with much less overhead. > > > In my opinion it wouldn't be much faster than reflect·call(). Perhaps you are right. Hard to know for sure. >> Much better than that would be some sort of runtime heuristic: if we > > had to >> >> split the stack several times beneath some function, then the next > > time we enter >> >> the function ask for a bigger stack. I have no concrete suggestions > > for how to >> >> implement that, though. > > > A problem is that any goroutine can call gc(). The stack of a random > goroutine cannot be enlarged if there are many goroutines because over > time it may consume too much memory. Though it would be possible when > there are few goroutines. If it worked well, it would reduce the number of stack splits per GC to 1. More importantly, it would help in other cases as well. Ian
Sign in to reply to this message.
Hello rsc@golang.org, remyoudompheng@gmail.com, dsymonds@golang.org, minux.ma@gmail.com, iant@golang.org, iant@google.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
I would like to obtain information about the progress of merging this CL.
Sign in to reply to this message.
On Mon, Nov 26, 2012 at 4:31 AM, <0xE2.0x9A.0x9B@gmail.com> wrote: > I would like to obtain information about the progress of merging this > CL. I have 83 gmail threads in my rsc@golang.org todo label, dating back to Nov 3. This is one of them. I hope to review many of them this week.
Sign in to reply to this message.
This seems fine for now. I share Ian's reservations, but this is easy to change if we come up with a better idea. I expect that the cost of the reflect.call is tiny compared to a typical gc. https://codereview.appspot.com/6846044/diff/5/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/6846044/diff/5/src/pkg/runtime/mgc0.c#newcode968 src/pkg/runtime/mgc0.c:968: static void Can you please do static void gc(struct gc_args *args); void runtime.gc(int force) { ... } static void gc(struct gc_args *args) { ... } so that the code reads top to bottom and also to minimize the diff? Thanks. https://codereview.appspot.com/6846044/diff/5/src/pkg/runtime/mgc0.c#newcode977 src/pkg/runtime/mgc0.c:977: // The atomic operations are not atomic if the uint64s This can move into runtime.gc. https://codereview.appspot.com/6846044/diff/5/src/pkg/runtime/mgc0.c#newcode1092 src/pkg/runtime/mgc0.c:1092: runtime·gc(1); Maybe this should be done in runtime.gc instead? I'm a little weirded out by runtime.gc calling gc on a new stack calling runtime.gc calling gc on a new stack.
Sign in to reply to this message.
https://codereview.appspot.com/6846044/diff/5/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/6846044/diff/5/src/pkg/runtime/mgc0.c#newcode968 src/pkg/runtime/mgc0.c:968: static void On 2012/11/26 20:40:19, rsc wrote: > Can you please do > > static void gc(struct gc_args *args); > > void > runtime.gc(int force) > { > ... > } > > static void > gc(struct gc_args *args) > { > ... > } > > so that the code reads top to bottom and also to minimize the diff? > > Thanks. Done. https://codereview.appspot.com/6846044/diff/5/src/pkg/runtime/mgc0.c#newcode977 src/pkg/runtime/mgc0.c:977: // The atomic operations are not atomic if the uint64s On 2012/11/26 20:40:19, rsc wrote: > This can move into runtime.gc. Done. https://codereview.appspot.com/6846044/diff/5/src/pkg/runtime/mgc0.c#newcode1092 src/pkg/runtime/mgc0.c:1092: runtime·gc(1); On 2012/11/26 20:40:19, rsc wrote: > Maybe this should be done in runtime.gc instead? > I'm a little weirded out by runtime.gc calling gc on a new stack calling > runtime.gc calling gc on a new stack. Done.
Sign in to reply to this message.
Hello rsc@golang.org, remyoudompheng@gmail.com, dsymonds@golang.org, minux.ma@gmail.com, iant@golang.org, iant@google.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
LGTM
Sign in to reply to this message.
I tried to patch this in to my tree to submit it, but when I run all.bash the first Go binary (the bootstrap go command) crashes with runtime: memory allocated by OS (0x2210517000) not in usable range [0x2103a6000,0x2903a6000) runtime: out of memory: cannot allocate 585205940224-byte block (1048576 in use) throw: out of memory goroutine 1 [running]: ----- stack segment boundary ----- regexp/syntax.(*parser).factor(0x210438000, 0x210439080, 0xa, 0x10, 0x0, ...) /Users/rsc/g/go/src/pkg/regexp/syntax/parse.go:417 +0x1b74 regexp/syntax.(*parser).collapse(0x210438000, 0x210439010, 0xa, 0xe, 0x210415113, ...) /Users/rsc/g/go/src/pkg/regexp/syntax/parse.go:347 +0x25d regexp/syntax.(*parser).alternate(0x210438000, 0x2103a6701, 0x0, 0x0) /Users/rsc/g/go/src/pkg/regexp/syntax/parse.go:302 +0x1e9 regexp/syntax.(*parser).parseRightParen(0x210438000, 0x6f, 0x6f, 0x290dbe) /Users/rsc/g/go/src/pkg/regexp/syntax/parse.go:1166 +0x89 regexp/syntax.Parse(0x290d80, 0xbb, 0xd4, 0xffffffffffffffff, 0x3721a0, ...) /Users/rsc/g/go/src/pkg/regexp/syntax/parse.go:705 +0x5ca regexp.compile(0x290d80, 0xbb, 0xd4, 0x4, 0xffffffffffffffff, ...) /Users/rsc/g/go/src/pkg/regexp/regexp.go:134 +0x40 regexp.Compile(0x290d80, 0xbb, 0x53d5c, 0x174948, 0x0, ...) /Users/rsc/g/go/src/pkg/regexp/regexp.go:107 +0x3b regexp.MustCompile(0x290d80, 0xbb, 0x2000, 0x210427180, 0xf20bd, ...) /Users/rsc/g/go/src/pkg/regexp/regexp.go:195 +0x38 go/doc.init(0x0, 0x0, 0x0, 0x0, 0x0, ...) /Users/rsc/g/go/src/pkg/go/doc/synopsis.go:-1878 +0x8b main.init() /Users/rsc/g/go/src/cmd/go/vet.go:37 +0x54 goroutine 2 [runnable]: created by runtime.main /Users/rsc/g/go/src/pkg/runtime/proc.c:225 Any ideas? Russ
Sign in to reply to this message.
Hello rsc@golang.org, remyoudompheng@gmail.com, dsymonds@golang.org, minux.ma@gmail.com, iant@golang.org, iant@google.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
It appears to be caused by the position of the new field 'moreframesize_minalloc' in struct M. This affects amd64 only. If I move the field lower (below 'racepc') it works fine on amd64. I wasn't able to determine the actual cause of the problem. On 2012/11/27 17:20:56, rsc wrote: > I tried to patch this in to my tree to submit it, but when I run > all.bash the first Go binary (the bootstrap go command) crashes with > > runtime: memory allocated by OS (0x2210517000) not in usable range > [0x2103a6000,0x2903a6000) > runtime: out of memory: cannot allocate 585205940224-byte block (1048576 in > use) > throw: out of memory > > goroutine 1 [running]: > ----- stack segment boundary ----- > regexp/syntax.(*parser).factor(0x210438000, 0x210439080, 0xa, 0x10, 0x0, ...) > /Users/rsc/g/go/src/pkg/regexp/syntax/parse.go:417 +0x1b74 > regexp/syntax.(*parser).collapse(0x210438000, 0x210439010, 0xa, 0xe, > 0x210415113, ...) > /Users/rsc/g/go/src/pkg/regexp/syntax/parse.go:347 +0x25d > regexp/syntax.(*parser).alternate(0x210438000, 0x2103a6701, 0x0, 0x0) > /Users/rsc/g/go/src/pkg/regexp/syntax/parse.go:302 +0x1e9 > regexp/syntax.(*parser).parseRightParen(0x210438000, 0x6f, 0x6f, 0x290dbe) > /Users/rsc/g/go/src/pkg/regexp/syntax/parse.go:1166 +0x89 > regexp/syntax.Parse(0x290d80, 0xbb, 0xd4, 0xffffffffffffffff, 0x3721a0, ...) > /Users/rsc/g/go/src/pkg/regexp/syntax/parse.go:705 +0x5ca > regexp.compile(0x290d80, 0xbb, 0xd4, 0x4, 0xffffffffffffffff, ...) > /Users/rsc/g/go/src/pkg/regexp/regexp.go:134 +0x40 > regexp.Compile(0x290d80, 0xbb, 0x53d5c, 0x174948, 0x0, ...) > /Users/rsc/g/go/src/pkg/regexp/regexp.go:107 +0x3b > regexp.MustCompile(0x290d80, 0xbb, 0x2000, 0x210427180, 0xf20bd, ...) > /Users/rsc/g/go/src/pkg/regexp/regexp.go:195 +0x38 > go/doc.init(0x0, 0x0, 0x0, 0x0, 0x0, ...) > /Users/rsc/g/go/src/pkg/go/doc/synopsis.go:-1878 +0x8b > main.init() > /Users/rsc/g/go/src/cmd/go/vet.go:37 +0x54 > > goroutine 2 [runnable]: > created by runtime.main > /Users/rsc/g/go/src/pkg/runtime/proc.c:225 > > Any ideas? > > Russ
Sign in to reply to this message.
Worked. It's possible that the offset of the tls field is known somewhere deep. I don't mind just moving the field and leaving that mystery for another day. Thanks for the quick fix. Russ
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=7646c94159a1 *** runtime: use reflect·call() to enter the function gc() Garbage collection code (to be merged later) is calling functions which have many local variables. This increases the probability that the stack capacity won't be big enough to hold the local variables. So, start gc() on a bigger stack to eliminate a potentially large number of calls to runtime·morestack(). R=rsc, remyoudompheng, dsymonds, minux.ma, iant, iant CC=golang-dev http://codereview.appspot.com/6846044 Committer: Russ Cox <rsc@golang.org>
Sign in to reply to this message.
|