|
|
Created:
12 years ago by atom Modified:
11 years, 11 months ago Reviewers:
CC:
dvyukov, rsc, bradfitz, dave_cheney.net, golang-dev, minux1, remyoudompheng Visibility:
Public. |
Descriptionruntime: precise garbage collection of channels
This changeset adds a mostly-precise garbage collection of channels.
The garbage collection support code in the linker isn't recognizing
channel types yet.
Fixes issue http://stackoverflow.com/questions/14712586/memory-consumption-skyrocket
Patch Set 1 #
Total comments: 8
Patch Set 2 : diff -r bd83fa6162f2 https://go.googlecode.com/hg/ #
Total comments: 2
Patch Set 3 : diff -r bd83fa6162f2 https://go.googlecode.com/hg/ #
MessagesTotal messages: 38
Hello golang-dev@googlegroups.com (cc: rsc, dfc, dvyukov, minux, remyoudompheng), I'd like you to review this change to https://go.googlecode.com/hg/
Sign in to reply to this message.
https://codereview.appspot.com/7307086/diff/1/src/pkg/runtime/chan.c File src/pkg/runtime/chan.c (right): https://codereview.appspot.com/7307086/diff/1/src/pkg/runtime/chan.c#newcode117 src/pkg/runtime/chan.c:117: runtime·settype(c, (uintptr)t | TypeInfo_Chan); the Hchan itself does not contain interesting pointers, so if element type does not contain pointers, you can mark the whole block as no-pointers. And add a comment before Hchan. see: https://codereview.appspot.com/5250069/diff/4001/src/pkg/runtime/chan.c https://codereview.appspot.com/7307086/diff/1/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/7307086/diff/1/src/pkg/runtime/mgc0.c#newcode914 src/pkg/runtime/mgc0.c:914: *objbufpos++ = (Obj){(byte*)chan, Hchansize, (uintptr)defaultProg}; there are no interesting pointers
Sign in to reply to this message.
https://codereview.appspot.com/7307086/diff/1/src/pkg/runtime/chan.c File src/pkg/runtime/chan.c (right): https://codereview.appspot.com/7307086/diff/1/src/pkg/runtime/chan.c#newcode117 src/pkg/runtime/chan.c:117: runtime·settype(c, (uintptr)t | TypeInfo_Chan); On 2013/02/10 11:14:58, dvyukov wrote: > the Hchan itself does not contain interesting pointers, so if element type does > not contain pointers, you can mark the whole block as no-pointers. > And add a comment before Hchan. > see: > https://codereview.appspot.com/5250069/diff/4001/src/pkg/runtime/chan.c In my opinion it would be more transparent from code maintenance viewpoint for mgc0.c to use type hchan (defined in zruntime_defs_linux_386.go on my machine). This would make GC slower compared to marking the whole block as no-pointers, but it is faster and more precise than defaultProg. I suggest for the usage of type hchan to be a separate CL. https://codereview.appspot.com/7307086/diff/1/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/7307086/diff/1/src/pkg/runtime/mgc0.c#newcode914 src/pkg/runtime/mgc0.c:914: *objbufpos++ = (Obj){(byte*)chan, Hchansize, (uintptr)defaultProg}; On 2013/02/10 11:14:58, dvyukov wrote: > there are no interesting pointers Maybe a separate CL: utilize type hchan instead of defaultProg.
Sign in to reply to this message.
https://codereview.appspot.com/7307086/diff/1/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/7307086/diff/1/src/pkg/runtime/mgc0.c#newcode914 src/pkg/runtime/mgc0.c:914: *objbufpos++ = (Obj){(byte*)chan, Hchansize, (uintptr)defaultProg}; Is there a noPointersProg that we could use here instead? Using type hchan will still introduce more pointers to be scanned unnecessarily. https://codereview.appspot.com/7307086/diff/1/src/pkg/runtime/mgc0.c#newcode920 src/pkg/runtime/mgc0.c:920: // TODO(atom): replace n-Hchansize by the actual number of elements in the buffer. For what it's worth, since it is a circular buffer, that will in general require queueing two different entries.
Sign in to reply to this message.
https://codereview.appspot.com/7307086/diff/1/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/7307086/diff/1/src/pkg/runtime/mgc0.c#newcode914 src/pkg/runtime/mgc0.c:914: *objbufpos++ = (Obj){(byte*)chan, Hchansize, (uintptr)defaultProg}; On 2013/02/14 21:46:49, rsc wrote: > Is there a noPointersProg that we could use here instead? > Using type hchan will still introduce more pointers to be scanned unnecessarily. If it is acceptable to make the same assumptions as in Dmitry's CL (https://codereview.appspot.com/5250069) then this line can be simply deleted. https://codereview.appspot.com/7307086/diff/1/src/pkg/runtime/mgc0.c#newcode920 src/pkg/runtime/mgc0.c:920: // TODO(atom): replace n-Hchansize by the actual number of elements in the buffer. On 2013/02/14 21:46:49, rsc wrote: > For what it's worth, since it is a circular buffer, that will in general require > queueing two different entries. True.
Sign in to reply to this message.
On Fri, Feb 15, 2013 at 4:40 AM, <0xE2.0x9A.0x9B@gmail.com> wrote: > > https://codereview.appspot.**com/7307086/diff/1/src/pkg/**runtime/mgc0.c<http... > File src/pkg/runtime/mgc0.c (right): > > https://codereview.appspot.**com/7307086/diff/1/src/pkg/** > runtime/mgc0.c#newcode914<https://codereview.appspot.com/7307086/diff/1/src/pkg/runtime/mgc0.c#newcode914> > src/pkg/runtime/mgc0.c:914: *objbufpos++ = (Obj){(byte*)chan, Hchansize, > (uintptr)defaultProg}; > On 2013/02/14 21:46:49, rsc wrote: > >> Is there a noPointersProg that we could use here instead? >> Using type hchan will still introduce more pointers to be scanned >> > unnecessarily. > > If it is acceptable to make the same assumptions as in Dmitry's CL > (https://codereview.appspot.**com/5250069<https://codereview.appspot.com/5250069>) > then this line can be simply > deleted. That sounds fine to me. Dmitriy never mailed that CL so I've never seen it before. What is left for the precise garbage collection once channels are in? Is it complete at that point? I understand there might be performance optimizations left, but as far as correctness? Thanks. Russ
Sign in to reply to this message.
Hello dvyukov@google.com, rsc@golang.org (cc: dave@cheney.net, golang-dev@googlegroups.com, minux.ma@gmail.com, remyoudompheng@gmail.com), Please take another look.
Sign in to reply to this message.
On 2013/02/19 18:29:13, rsc wrote: > On Fri, Feb 15, 2013 at 4:40 AM, <mailto:0xE2.0x9A.0x9B@gmail.com> wrote: > > > > > https://codereview.appspot.**com/7307086/diff/1/src/pkg/**runtime/mgc0.c%3Cht...> > > File src/pkg/runtime/mgc0.c (right): > > > > https://codereview.appspot.**com/7307086/diff/1/src/pkg/** > > > runtime/mgc0.c#newcode914<https://codereview.appspot.com/7307086/diff/1/src/pkg/runtime/mgc0.c#newcode914> > > src/pkg/runtime/mgc0.c:914: *objbufpos++ = (Obj){(byte*)chan, Hchansize, > > (uintptr)defaultProg}; > > On 2013/02/14 21:46:49, rsc wrote: > > > >> Is there a noPointersProg that we could use here instead? > >> Using type hchan will still introduce more pointers to be scanned > >> > > unnecessarily. > > > > If it is acceptable to make the same assumptions as in Dmitry's CL > > > (https://codereview.appspot.**com/5250069%3Chttps://codereview.appspot.com/525...>) > > then this line can be simply > > deleted. > > > That sounds fine to me. Dmitriy never mailed that CL so I've never seen it > before. > > What is left for the precise garbage collection once channels are in? Is it > complete at that point? I understand there might be performance > optimizations left, but as far as correctness? > > Thanks. > Russ Handling of channels by the garbage collector completes the modifications I planned to contribute as far as precision is concerned. The next CL will fill in the GC_REGION case in mgc0.c, which is just a couple of lines of code. There is unsubmitted code left in CL 6114046, but it isn't related to precision.
Sign in to reply to this message.
LGTM https://codereview.appspot.com/7307086/diff/10001/src/pkg/runtime/runtime.h File src/pkg/runtime/runtime.h (right): https://codereview.appspot.com/7307086/diff/10001/src/pkg/runtime/runtime.h#n... src/pkg/runtime/runtime.h:593: extern uint32 Hchansize; This needs to be runtime.Hchansize, to avoid conflict with a C symbol named Hchansize when using cgo.
Sign in to reply to this message.
Hello dvyukov@google.com, rsc@golang.org (cc: dave@cheney.net, golang-dev@googlegroups.com, minux.ma@gmail.com, remyoudompheng@gmail.com), Please take another look.
Sign in to reply to this message.
https://codereview.appspot.com/7307086/diff/10001/src/pkg/runtime/runtime.h File src/pkg/runtime/runtime.h (right): https://codereview.appspot.com/7307086/diff/10001/src/pkg/runtime/runtime.h#n... src/pkg/runtime/runtime.h:593: extern uint32 Hchansize; On 2013/02/19 19:29:03, rsc wrote: > This needs to be runtime.Hchansize, to avoid conflict with a C symbol named > Hchansize when using cgo. Done.
Sign in to reply to this message.
I patched this into my client in preparation for submitting it, but all.bash failed. On further investigation it seems like running 'go test -short net' fails about 20% of the time with seg faults. Russ
Sign in to reply to this message.
Forgot to say I am using darwin/amd64 on OS X 10.8.2. Russ
Sign in to reply to this message.
On 2013/02/19 20:29:49, rsc wrote: > I patched this into my client in preparation for submitting it, but > all.bash failed. On further investigation it seems like running 'go test > -short net' fails about 20% of the time with seg faults. > > Russ I am unable to reproduce the failure on linux/386. Cannot test package "net" on linux/amd64 due to limited permissions. Tests that aren't limited by permissions complete ok.
Sign in to reply to this message.
On Tue, Feb 19, 2013 at 1:05 PM, <0xE2.0x9A.0x9B@gmail.com> wrote: > Cannot test package "net" on linux/amd64 due to limited permissions. > Tests that aren't limited by permissions complete ok. > I don't understand. Which permissions? I don't run as root.
Sign in to reply to this message.
On 2013/02/19 21:07:54, bradfitz wrote: > On Tue, Feb 19, 2013 at 1:05 PM, <mailto:0xE2.0x9A.0x9B@gmail.com> wrote: > > > Cannot test package "net" on linux/amd64 due to limited permissions. > > Tests that aren't limited by permissions complete ok. > > > > I don't understand. Which permissions? I don't run as root. The linux/amd64 machine is a virtualized host (AWS).
Sign in to reply to this message.
On Tue, Feb 19, 2013 at 1:15 PM, <0xE2.0x9A.0x9B@gmail.com> wrote: > On 2013/02/19 21:07:54, bradfitz wrote: > >> On Tue, Feb 19, 2013 at 1:05 PM, <mailto:0xE2.0x9A.0x9B@gmail.**com<0xE2.0x9A.0x9B@gmail.com> >> > >> > wrote: > > > Cannot test package "net" on linux/amd64 due to limited permissions. >> > Tests that aren't limited by permissions complete ok. >> > >> > > I don't understand. Which permissions? I don't run as root. >> > > The linux/amd64 machine is a virtualized host (AWS). > So are most of the builders at http://build.golang.org/ What fails for you on AWS that doesn't fail for us?
Sign in to reply to this message.
On 2013/02/19 21:26:15, bradfitz wrote: > On Tue, Feb 19, 2013 at 1:15 PM, <mailto:0xE2.0x9A.0x9B@gmail.com> wrote: > > > On 2013/02/19 21:07:54, bradfitz wrote: > > > >> On Tue, Feb 19, 2013 at 1:05 PM, > <mailto:0xE2.0x9A.0x9B@gmail.**com<0xE2.0x9A.0x9B@gmail.com> > >> > > >> > > wrote: > > > > > Cannot test package "net" on linux/amd64 due to limited permissions. > >> > Tests that aren't limited by permissions complete ok. > >> > > >> > > > > I don't understand. Which permissions? I don't run as root. > >> > > > > The linux/amd64 machine is a virtualized host (AWS). > > > > So are most of the builders at http://build.golang.org/ > > What fails for you on AWS that doesn't fail for us? [golang-symbol.rhcloud.com net]\> ./net.test -test.short --- FAIL: TestConnAndListener-2 (0.00 seconds) conn_test.go:42: net.Listen failed: listen tcp 127.0.0.1:0: permission denied
Sign in to reply to this message.
On Tue, Feb 19, 2013 at 1:33 PM, <0xE2.0x9A.0x9B@gmail.com> wrote: > > > Cannot test package "net" on linux/amd64 due to limited >> > permissions. > >> >> > Tests that aren't limited by permissions complete ok. >> >> > >> >> >> > >> > I don't understand. Which permissions? I don't run as root. >> >> >> > >> > The linux/amd64 machine is a virtualized host (AWS). >> > >> > > So are most of the builders at http://build.golang.org/ >> > > What fails for you on AWS that doesn't fail for us? >> > > [golang-symbol.rhcloud.com net]\> ./net.test -test.short > --- FAIL: TestConnAndListener-2 (0.00 seconds) > conn_test.go:42: net.Listen failed: listen tcp 127.0.0.1:0: > permission denied > I think that's your distro doing that, not AWS. I just built tip & ran the net tests on AWS on a linux-amd64 Ubuntu Precise instance. It passes. Are you running some "security-hardened" distro+kernel which is paranoid about things like grabbing free ports for listening without saying 'pretty please' first?
Sign in to reply to this message.
On 2013/02/19 22:10:47, bradfitz wrote: > On Tue, Feb 19, 2013 at 1:33 PM, <mailto:0xE2.0x9A.0x9B@gmail.com> wrote: > > > > > Cannot test package "net" on linux/amd64 due to limited > >> > > permissions. > > > >> >> > Tests that aren't limited by permissions complete ok. > >> >> > > >> >> > >> > > >> > I don't understand. Which permissions? I don't run as root. > >> >> > >> > > >> > The linux/amd64 machine is a virtualized host (AWS). > >> > > >> > > > > So are most of the builders at http://build.golang.org/ > >> > > > > What fails for you on AWS that doesn't fail for us? > >> > > > > [golang-symbol.rhcloud.com net]\> ./net.test -test.short > > --- FAIL: TestConnAndListener-2 (0.00 seconds) > > conn_test.go:42: net.Listen failed: listen tcp 127.0.0.1:0: > > permission denied > > > > I think that's your distro doing that, not AWS. > > I just built tip & ran the net tests on AWS on a linux-amd64 Ubuntu Precise > instance. It passes. > > Are you running some "security-hardened" distro+kernel which is paranoid > about things like grabbing free ports for listening without saying 'pretty > please' first? OpenShift appears to be based on selinux.
Sign in to reply to this message.
On 2013/02/19 21:33:48, atom wrote: > On 2013/02/19 21:26:15, bradfitz wrote: > > On Tue, Feb 19, 2013 at 1:15 PM, <mailto:0xE2.0x9A.0x9B@gmail.com> wrote: > > > > > On 2013/02/19 21:07:54, bradfitz wrote: > > > > > >> On Tue, Feb 19, 2013 at 1:05 PM, > > <mailto:0xE2.0x9A.0x9B@gmail.**com<0xE2.0x9A.0x9B@gmail.com> > > >> > > > >> > > > wrote: > > > > > > > Cannot test package "net" on linux/amd64 due to limited permissions. > > >> > Tests that aren't limited by permissions complete ok. > > >> > > > >> > > > > > > I don't understand. Which permissions? I don't run as root. > > >> > > > > > > The linux/amd64 machine is a virtualized host (AWS). > > > > > > > So are most of the builders at http://build.golang.org/ > > > > What fails for you on AWS that doesn't fail for us? > > [golang-symbol.rhcloud.com net]\> ./net.test -test.short > --- FAIL: TestConnAndListener-2 (0.00 seconds) > conn_test.go:42: net.Listen failed: listen tcp 127.0.0.1:0: permission > denied The solution is to replace 127.0.0.1 with the value of OPENSHIFT_INTERNAL_IP.
Sign in to reply to this message.
On 2013/02/19 20:29:49, rsc wrote: > I patched this into my client in preparation for submitting it, but > all.bash failed. On further investigation it seems like running 'go test > -short net' fails about 20% of the time with seg faults. > > Russ I am unable to reproduce the failure (go test -short net) on linux/amd64.
Sign in to reply to this message.
On Wed, Feb 20, 2013 at 2:46 AM, <0xE2.0x9A.0x9B@gmail.com> wrote: > On 2013/02/19 21:33:48, atom wrote: > >> On 2013/02/19 21:26:15, bradfitz wrote: >> > On Tue, Feb 19, 2013 at 1:15 PM, <mailto:0xE2.0x9A.0x9B@gmail.**com<0xE2.0x9A.0x9B@gmail.com> >> > >> > wrote: > >> > >> > > On 2013/02/19 21:07:54, bradfitz wrote: >> > > >> > >> On Tue, Feb 19, 2013 at 1:05 PM, >> > <mailto:0xE2.0x9A.0x9B@gmail.****com<0xE2.0x9A.0x9B@gmail.com> >> >> > >> > >> > >> >> > > wrote: >> > > >> > > > Cannot test package "net" on linux/amd64 due to limited >> > permissions. > >> > >> > Tests that aren't limited by permissions complete ok. >> > >> > >> > >> >> > > >> > > I don't understand. Which permissions? I don't run as root. >> > >> >> > > >> > > The linux/amd64 machine is a virtualized host (AWS). >> > > >> > >> > So are most of the builders at http://build.golang.org/ >> > >> > What fails for you on AWS that doesn't fail for us? >> > > [golang-symbol.rhcloud.com net]\> ./net.test -test.short >> --- FAIL: TestConnAndListener-2 (0.00 seconds) >> conn_test.go:42: net.Listen failed: listen tcp 127.0.0.1:0: >> > permission > >> denied >> > > The solution is to replace 127.0.0.1 with the value of > OPENSHIFT_INTERNAL_IP. > Wow: https://openshift.redhat.com/community/page/openshift-environment-variables You should probably develop Go on a general-purpose Linux distro (there are a few), not one geared towards running "apps".
Sign in to reply to this message.
On 2013/02/19 20:29:49, rsc wrote: > I patched this into my client in preparation for submitting it, but > all.bash failed. On further investigation it seems like running 'go test > -short net' fails about 20% of the time with seg faults. > > Russ Does the failure occur if GOMAXPROCS=1?
Sign in to reply to this message.
Yes. I always run with GOMAXPROCS unset, so it should be 1 already. Sorry I haven't had a chance to debug this yet. Trying to get the func stuff moved along and then I will. Russ
Sign in to reply to this message.
On 2013/02/22 16:51:56, rsc wrote: > Yes. I always run with GOMAXPROCS unset, so it should be 1 already. Sorry I > haven't had a chance to debug this yet. Trying to get the func stuff moved > along and then I will. > > Russ The net package is calling function runtime.GOMAXPROCS(). Therefore the garbage collector sometimes runs with multiple threads while testing the net package. It is necessary to explicitly set GOMAXPROCS to 1. I came across a test case indicating that it may be possible for sweepspan() to start running in some OS thread before the last scanblock() has terminated.
Sign in to reply to this message.
On Fri, Feb 22, 2013 at 9:13 PM, <0xE2.0x9A.0x9B@gmail.com> wrote: > On 2013/02/22 16:51:56, rsc wrote: > >> Yes. I always run with GOMAXPROCS unset, so it should be 1 already. >> > Sorry I > >> haven't had a chance to debug this yet. Trying to get the func stuff >> > moved > >> along and then I will. >> > > Russ >> > > The net package is calling function runtime.GOMAXPROCS(). Therefore the > garbage collector sometimes runs with multiple threads while testing the > net package. It is necessary to explicitly set GOMAXPROCS to 1. > > I came across a test case indicating that it may be possible for > sweepspan() to start running in some OS thread before the last > scanblock() has terminated. > What do you mean? scnablock() returns only when all blocks are completely scanned.
Sign in to reply to this message.
On 2013/02/22 17:17:31, dvyukov wrote: > On Fri, Feb 22, 2013 at 9:13 PM, <mailto:0xE2.0x9A.0x9B@gmail.com> wrote: > > > On 2013/02/22 16:51:56, rsc wrote: > > > >> Yes. I always run with GOMAXPROCS unset, so it should be 1 already. > >> > > Sorry I > > > >> haven't had a chance to debug this yet. Trying to get the func stuff > >> > > moved > > > >> along and then I will. > >> > > > > Russ > >> > > > > The net package is calling function runtime.GOMAXPROCS(). Therefore the > > garbage collector sometimes runs with multiple threads while testing the > > net package. It is necessary to explicitly set GOMAXPROCS to 1. > > > > I came across a test case indicating that it may be possible for > > sweepspan() to start running in some OS thread before the last > > scanblock() has terminated. > > > > > > What do you mean? > scnablock() returns only when all blocks are completely scanned. I am not sure. Maybe I made an error while figuring out how to interpret a certain bug in my code (the code isn't related to this CL).
Sign in to reply to this message.
On 2013/02/22 17:30:33, atom wrote: > On 2013/02/22 17:17:31, dvyukov wrote: > > On Fri, Feb 22, 2013 at 9:13 PM, <mailto:0xE2.0x9A.0x9B@gmail.com> wrote: > > > > > On 2013/02/22 16:51:56, rsc wrote: > > > > > >> Yes. I always run with GOMAXPROCS unset, so it should be 1 already. > > >> > > > Sorry I > > > > > >> haven't had a chance to debug this yet. Trying to get the func stuff > > >> > > > moved > > > > > >> along and then I will. > > >> > > > > > > Russ > > >> > > > > > > The net package is calling function runtime.GOMAXPROCS(). Therefore the > > > garbage collector sometimes runs with multiple threads while testing the > > > net package. It is necessary to explicitly set GOMAXPROCS to 1. > > > > > > I came across a test case indicating that it may be possible for > > > sweepspan() to start running in some OS thread before the last > > > scanblock() has terminated. > > > > > > > > > > > What do you mean? > > scnablock() returns only when all blocks are completely scanned. > > I am not sure. Maybe I made an error while figuring out how to interpret a > certain bug in my code (the code isn't related to this CL). Indeed. I made a mistake while decoding the meaning of my code. Thanks.
Sign in to reply to this message.
On Fri, Feb 22, 2013 at 12:13 PM, <0xE2.0x9A.0x9B@gmail.com> wrote: > The net package is calling function runtime.GOMAXPROCS(). Therefore the > garbage collector sometimes runs with multiple threads while testing the > net package. It is necessary to explicitly set GOMAXPROCS to 1. > Do you mean by editing the net test? I don't see how the environment variable GOMAXPROCS=1 is different from unset. Russ
Sign in to reply to this message.
On Fri, Feb 22, 2013 at 7:23 PM, Russ Cox <rsc@golang.org> wrote: > On Fri, Feb 22, 2013 at 12:13 PM, <0xE2.0x9A.0x9B@gmail.com> wrote: > >> The net package is calling function runtime.GOMAXPROCS(). Therefore the >> garbage collector sometimes runs with multiple threads while testing the >> net package. It is necessary to explicitly set GOMAXPROCS to 1. >> > > Do you mean by editing the net test? I don't see how the environment > variable GOMAXPROCS=1 is different from unset. > You are right, there is no difference. Garbage collector errors have been very subtle so far. I wonder what kind of error it is this time.
Sign in to reply to this message.
Update: I fixed the bug we hit a few weeks ago (issue 4907) but that did not fix the channel crash on my Mac. That's up next.
Sign in to reply to this message.
The tree with this CL patched in did not crash reliably for me on Linux, only on OS X. However, if you also apply https://codereview.appspot.com/7364048 and then do cd $GOROOT/src/pkg/net export GOGC=1 go test -short go test -short go test -short then it crashes for me more than half the time. Russ
Sign in to reply to this message.
With and also without CL 7364048 I am getting SIGSEGV in gdb on linux/386. $ hg identify 408b088723fd+ tip $ gdb ./net.test (gdb) set args -test.short (gdb) run ... Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0xb5effb40 (LWP 29586)] 0x08053d08 in scanblock (wbuf=void, wp=void, nobj=void, keepworking=void) at src/pkg/runtime/mgc0.c:755 755 t = iface->tab->type; (gdb) bt #0 0x08053d08 in scanblock (wbuf=void, wp=void, nobj=void, keepworking=void) at src/pkg/runtime/mgc0.c:755 #1 0x08054591 in markroot (i=void) at src/pkg/runtime/mgc0.c:1122 #2 0x08058ca8 in runtime.parfordo (desc=void) at src/pkg/runtime/parfor.c:116 #3 0x08055278 in runtime.gchelper () at src/pkg/runtime/mgc0.c:1588 #4 0x0805a82f in nextgandunlock () at src/pkg/runtime/proc.c:663 #5 0x0805b235 in schedule (gp=void) at src/pkg/runtime/proc.c:1181 #6 0x08064e15 in runtime.mcall (fn=void) at src/pkg/runtime/asm_386.s:195 #7 0x18399000 in ?? () On 2013/02/25 18:36:46, rsc wrote: > The tree with this CL patched in did not crash reliably for me on Linux, > only on OS X. > > However, if you also apply https://codereview.appspot.com/7364048 and then > do > > cd $GOROOT/src/pkg/net > export GOGC=1 > go test -short > go test -short > go test -short > > then it crashes for me more than half the time. > > Russ
Sign in to reply to this message.
Found it. The block size known to the garbage collector may be larger than the block size requested by the allocation of the channel. For an unbuffered channel on amd64, it turns out to be 8 bytes larger. The buffer is assumed to be the rest of the block, but the element size may not neatly divide this leftover. Again on amd64, the problem happens with a chan of interface values and 8 bytes left over. The "second" word of the interface is really some other piece of memory, and if it is non-zero then the collector assumes the first word can be dereferenced using iface->tab->type. I changed the GC_CHAN case to: // There are no heap pointers in struct Hchan, // so we can ignore the leading sizeof(Hchan) bytes. if(!(chantype->elem->kind & KindNoPointers)) { // Channel's buffer follows Hchan immediately in memory. // Size of buffer (cap(c)) is second int in the chan struct. n = ((uintgo*)chan)[1]; if(n > 0) { // TODO(atom): split into two chunks so that only the // in-use part of the circular buffer is scanned. // (Channel routines zero the unused part, so the current // code does not lead to leaks, it's just a little inefficient.) *objbufpos++ = (Obj){(byte*)chan+runtime·Hchansize, n*chantype->elem->size, (uintptr)chantype->elem->gc | PRECISE | LOOP}; if(objbufpos == objbuf_end) flushobjbuf(objbuf, &objbufpos, &wp, &wbuf, &nobj); } } goto next_block; (the difference is the computation of the size to use in the new (Obj) being queued), and now the net test is passing repeatedly. Russ
Sign in to reply to this message.
all.bash is happy now. Submitting, and on to the next bug.
Sign in to reply to this message.
*** Submitted as https://code.google.com/p/go/source/detail?r=1c50db40d078 *** runtime: precise garbage collection of channels This changeset adds a mostly-precise garbage collection of channels. The garbage collection support code in the linker isn't recognizing channel types yet. Fixes issue http://stackoverflow.com/questions/14712586/memory-consumption-skyrocket R=dvyukov, rsc, bradfitz CC=dave, golang-dev, minux.ma, remyoudompheng https://codereview.appspot.com/7307086 Committer: Russ Cox <rsc@golang.org>
Sign in to reply to this message.
On 2013/02/25 20:54:32, rsc wrote: > Found it. The block size known to the garbage collector may be larger than > the block size requested by the allocation of the channel. That is so obvious after giving a thought. Sorry for the inconvenience. > For an > unbuffered channel on amd64, it turns out to be 8 bytes larger. The buffer > is assumed to be the rest of the block, but the element size may not neatly > divide this leftover. Again on amd64, the problem happens with a chan of > interface values and 8 bytes left over. The "second" word of the interface > is really some other piece of memory, and if it is non-zero then the > collector assumes the first word can be dereferenced using iface->tab->type. > > I changed the GC_CHAN case to: > > // There are no heap pointers in struct Hchan, > // so we can ignore the leading sizeof(Hchan) bytes. > if(!(chantype->elem->kind & KindNoPointers)) { > // Channel's buffer follows Hchan immediately in memory. > // Size of buffer (cap(c)) is second int in the chan struct. > n = ((uintgo*)chan)[1]; > if(n > 0) { > // TODO(atom): split into two chunks so that only the > // in-use part of the circular buffer is scanned. > // (Channel routines zero the unused part, so the current > // code does not lead to leaks, it's just a little inefficient.) > *objbufpos++ = (Obj){(byte*)chan+runtime·Hchansize, n*chantype->elem->size, > (uintptr)chantype->elem->gc | PRECISE | LOOP}; > if(objbufpos == objbuf_end) > flushobjbuf(objbuf, &objbufpos, &wp, &wbuf, &nobj); > } > } > goto next_block; > > (the difference is the computation of the size to use in the new (Obj) > being queued), and now the net test is passing repeatedly. > > Russ
Sign in to reply to this message.
|