|
|
Descriptionsync: fix spurious wakeup from WaitGroup.Wait
There is a race condition that causes spurious wakeup from Wait
in the following case:
G1: decrement wg.counter, observe the counter is now 0
(should unblock goroutines queued *at this moment*)
G2: increment wg.counter
G2: call Wait() to add itself to the wait queue
G1: acquire wg.m, unblock all waiting goroutines
In the last step G2 is spuriously woken up by G1.
Fixes issue 7734.
Patch Set 1 #Patch Set 2 : diff -r 519230b4d06a https://code.google.com/p/go #Patch Set 3 : diff -r 519230b4d06a https://code.google.com/p/go #Patch Set 4 : diff -r 519230b4d06a https://code.google.com/p/go #Patch Set 5 : diff -r 519230b4d06a https://code.google.com/p/go #
MessagesTotal messages: 30
Hello golang-codereviews@googlegroups.com, I'd like you to review this change to https://code.google.com/p/go
Sign in to reply to this message.
R=dvyukov It seems to me that this code replaces a spurious wakeup with a missed wakeup. G0: call Wait to add itself to the wait queue G1: decrement wg.counter, observe the counter is now 0 (should unblock goroutines queued *at this moment*) G2: increment wg.counter G2: call Wait() to add itself to the wait queue G1: acquire wg.m, unblock all waiting goroutines Before this CL the problem was that G1 woke G2. After this CL the problem is that G1 does not wake G0.
Sign in to reply to this message.
I don't think that should be considered as a missed wakeup, because the order of step 2 and step 3 are not guaranteed and not observable by user. S1. G0: call Wait to add itself to the wait queue S2. G1: decrement wg.counter, observe the counter is now 0 (should unblock goroutines queued *at this moment*) S3. G2: increment wg.counter S4. G2: call Wait() to add itself to the wait queue S5. G1: acquire wg.m, unblock all waiting goroutines If S3 occurs before S2, the counter becomes 1 -> 2 -> 1 (instead of 1 -> 0 -> 1), so no one should be awaken by that. User cannot determine whether S2 occurs before S3 or not, so even if the decrement operation does not wake up anyone, I think it should be OK.
Sign in to reply to this message.
ouch! great job! how have you found this? preliminary the fix looks correct
Sign in to reply to this message.
On 2014/04/08 20:47:13, ruiu wrote: > I don't think that should be considered as a missed wakeup, because the > order of step 2 and step 3 are not guaranteed and not observable by user. > > S1. G0: call Wait to add itself to the wait queue > S2. G1: decrement wg.counter, observe the counter is now 0 > (should unblock goroutines queued *at this moment*) > S3. G2: increment wg.counter > S4. G2: call Wait() to add itself to the wait queue > S5. G1: acquire wg.m, unblock all waiting goroutines > > If S3 occurs before S2, the counter becomes 1 -> 2 -> 1 (instead of > 1 -> 0 -> 1), so no one should be awaken by that. User cannot determine > whether S2 occurs before S3 or not, so even if the decrement operation > does not wake up anyone, I think it should be OK. I would say that this is just incorrect usage of WaitGroup. S3 is a "root" Add, Wait must not be called before "root" Add's.
Sign in to reply to this message.
please add the test to waitgroup_test.go use the program that you provided in the issue the test must run for 1ms or so test with -cpu=1,2,4,8,16,32,64 and also with/without -race
Sign in to reply to this message.
I found it by reading the code of WaitGroup. I have also read all the other code in sync with the same degree of carefulness, but spotted no bug except this one. In order to reproduce the issue fairly reliably, the test have to be run for a few minutes. It's very unlikely that running it for 1ms would reveal anything. Do you still think adding such test makes sense?
Sign in to reply to this message.
On Wed, Apr 9, 2014 at 12:03 PM, <ruiu@google.com> wrote: > I found it by reading the code of WaitGroup. I have also read all > the other code in sync with the same degree of carefulness, but > spotted no bug except this one. > > In order to reproduce the issue fairly reliably, the test have to be > run for a few minutes. It's very unlikely that running it for 1ms > would reveal anything. Do you still think adding such test makes > sense? Yes, I think it still makes sense. First, we run the tests on a variety on platforms/machines (I suspect that on some of them it can reproduce with higher probability). Second, we run it on every change. Third, we run it with several values of GOMAXPROCS.
Sign in to reply to this message.
On Wed, Apr 9, 2014 at 3:36 AM, <dvyukov@google.com> wrote: > On 2014/04/08 20:47:13, ruiu wrote: > >> I don't think that should be considered as a missed wakeup, because >> > the > >> order of step 2 and step 3 are not guaranteed and not observable by >> > user. > > S1. G0: call Wait to add itself to the wait queue >> S2. G1: decrement wg.counter, observe the counter is now 0 >> (should unblock goroutines queued *at this moment*) >> S3. G2: increment wg.counter >> S4. G2: call Wait() to add itself to the wait queue >> S5. G1: acquire wg.m, unblock all waiting goroutines >> > > If S3 occurs before S2, the counter becomes 1 -> 2 -> 1 (instead of >> 1 -> 0 -> 1), so no one should be awaken by that. User cannot >> > determine > >> whether S2 occurs before S3 or not, so even if the decrement operation >> does not wake up anyone, I think it should be OK. >> > > I would say that this is just incorrect usage of WaitGroup. S3 is a > "root" Add, Wait must not be called before "root" Add's. > Are multiple rounds okay? Add() go Done() Wait() Add() go Done() Wait() ? Are you saying instead that Add and Wait should never be concurrent? What about Add of a negative number? None of these requirements are documented, if they exist. Russ
Sign in to reply to this message.
On Wed, Apr 9, 2014 at 3:41 PM, Russ Cox <rsc@golang.org> wrote: > Are you saying instead that Add and Wait should never be concurrent? IMO: If Add executes concurrently with Wait then Wait can wait or not. It follows that Add must HB Wait for Wait to have any defined behavior. -j
Sign in to reply to this message.
On Wed, Apr 9, 2014 at 9:50 AM, Jan Mercl <0xjnml@gmail.com> wrote: > On Wed, Apr 9, 2014 at 3:41 PM, Russ Cox <rsc@golang.org> wrote: > > Are you saying instead that Add and Wait should never be concurrent? > > IMO: If Add executes concurrently with Wait then Wait can wait or not. > It follows that Add must HB Wait for Wait to have any defined > behavior. > I think that ruiu's response to my scenario was correct: my scenario had Add and Done racing with each other, and instead of interpreting the result as a missed wakeup the correct interpretation is that the Add happened before the Done. If Add races against Wait (and if not for the Add, Wait would see a 0), then the same logic applies: it is okay for the Add to appear to happen before the Wait, and it is okay for the Add to appear to happen after the Wait. But one of those has to happen. That is different from not having a defined behavior. Russ
Sign in to reply to this message.
On Wed, Apr 9, 2014 at 6:41 AM, Russ Cox <rsc@golang.org> wrote: > On Wed, Apr 9, 2014 at 3:36 AM, <dvyukov@google.com> wrote: >> >> On 2014/04/08 20:47:13, ruiu wrote: >>> >>> I don't think that should be considered as a missed wakeup, because >> >> the >>> >>> order of step 2 and step 3 are not guaranteed and not observable by >> >> user. >> >>> S1. G0: call Wait to add itself to the wait queue >>> S2. G1: decrement wg.counter, observe the counter is now 0 >>> (should unblock goroutines queued *at this moment*) >>> S3. G2: increment wg.counter >>> S4. G2: call Wait() to add itself to the wait queue >>> S5. G1: acquire wg.m, unblock all waiting goroutines >> >> >>> If S3 occurs before S2, the counter becomes 1 -> 2 -> 1 (instead of >>> 1 -> 0 -> 1), so no one should be awaken by that. User cannot >> >> determine >>> >>> whether S2 occurs before S3 or not, so even if the decrement operation >>> does not wake up anyone, I think it should be OK. >> >> >> I would say that this is just incorrect usage of WaitGroup. S3 is a >> "root" Add, Wait must not be called before "root" Add's. > > > Are multiple rounds okay? > > Add() > go Done() > Wait() > > Add() > go Done() > Wait() > > ? Multiple rounds are okay. > Are you saying instead that Add and Wait should never be concurrent? What > about Add of a negative number? Add of a positive value that can potentially start with zero counter must not run concurrently with Wait. Add of a positive value that can not potentially start with zero counter can run concurrently with Wait. > None of these requirements are documented, if they exist. Docs state even stronger statement -- "Note that calls with positive delta must happen before the call to Wait". This actually prohibits a legal usage when you create and wait for a tree of goroutines.
Sign in to reply to this message.
On Wed, Apr 9, 2014 at 10:03 AM, Dmitry Vyukov <dvyukov@google.com> wrote: > Add of a positive value that can potentially start with zero counter > must not run concurrently with Wait. > In this case it seems like we can and do provide the guarantee that either the Add appears to happen before the Wait starts or Add appears to happen after the Wait ends. No? I am not sure the docs need to be explicit about this distinction - I think the Note is a caution rather than a requirement - but I want to make sure we understand what we think the implementation does. Thanks. Russ
Sign in to reply to this message.
On Wed, Apr 9, 2014 at 7:08 AM, Russ Cox <rsc@golang.org> wrote: > On Wed, Apr 9, 2014 at 10:03 AM, Dmitry Vyukov <dvyukov@google.com> wrote: >> >> Add of a positive value that can potentially start with zero counter >> must not run concurrently with Wait. > > > In this case it seems like we can and do provide the guarantee that either > the Add appears to happen before the Wait starts or Add appears to happen > after the Wait ends. No? Do you mean incorrect programs like: go func() { wg.Add(1) wg.Done() }() wg.Wait() ? If we provide some guarantees for such programs, it must be pure coincidence. > I am not sure the docs need to be explicit about this distinction - I think > the Note is a caution rather than a requirement - but I want to make sure we > understand what we think the implementation does. > > Thanks. > Russ
Sign in to reply to this message.
On Wed, Apr 9, 2014 at 10:16 AM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Wed, Apr 9, 2014 at 7:08 AM, Russ Cox <rsc@golang.org> wrote: > > On Wed, Apr 9, 2014 at 10:03 AM, Dmitry Vyukov <dvyukov@google.com> > wrote: > >> > >> Add of a positive value that can potentially start with zero counter > >> must not run concurrently with Wait. > > > > > > In this case it seems like we can and do provide the guarantee that > either > > the Add appears to happen before the Wait starts or Add appears to happen > > after the Wait ends. No? > > > Do you mean incorrect programs like: > > go func() { > wg.Add(1) > wg.Done() > }() > wg.Wait() > > ? > > If we provide some guarantees for such programs, it must be pure > coincidence. > Surely we guarantee that Wait does not block forever? Russ
Sign in to reply to this message.
On Wed, Apr 9, 2014 at 7:33 AM, Russ Cox <rsc@golang.org> wrote: > On Wed, Apr 9, 2014 at 10:16 AM, Dmitry Vyukov <dvyukov@google.com> wrote: >> >> On Wed, Apr 9, 2014 at 7:08 AM, Russ Cox <rsc@golang.org> wrote: >> > On Wed, Apr 9, 2014 at 10:03 AM, Dmitry Vyukov <dvyukov@google.com> >> > wrote: >> >> >> >> Add of a positive value that can potentially start with zero counter >> >> must not run concurrently with Wait. >> > >> > >> > In this case it seems like we can and do provide the guarantee that >> > either >> > the Add appears to happen before the Wait starts or Add appears to >> > happen >> > after the Wait ends. No? >> >> >> Do you mean incorrect programs like: >> >> go func() { >> wg.Add(1) >> wg.Done() >> }() >> wg.Wait() >> >> ? >> >> If we provide some guarantees for such programs, it must be pure >> coincidence. > > > Surely we guarantee that Wait does not block forever? Maybe. Do you think it's important? Race detector must flag such code, and the code must be fixed.
Sign in to reply to this message.
On Wed, Apr 9, 2014 at 10:44 AM, Dmitry Vyukov <dvyukov@google.com> wrote: > >> Do you mean incorrect programs like: > >> > >> go func() { > >> wg.Add(1) > >> wg.Done() > >> }() > >> wg.Wait() > >> > >> ? > >> > >> If we provide some guarantees for such programs, it must be pure > >> coincidence. > > > > > > Surely we guarantee that Wait does not block forever? > > Maybe. Do you think it's important? > Race detector must flag such code, and the code must be fixed. > I think it is important that Wait and Add behave as if some ordering happened: either Wait before Add, or Wait after Add, and either way Wait does not block forever. I don't think that should be controversial, and I don't think it needs to be called out in any documentation either. If the race detector wants to point out concurrent Add+Wait, that's fine with me. Russ
Sign in to reply to this message.
On Wed, Apr 9, 2014 at 9:31 AM, Russ Cox <rsc@golang.org> wrote: > On Wed, Apr 9, 2014 at 10:44 AM, Dmitry Vyukov <dvyukov@google.com> wrote: > >> >> Do you mean incorrect programs like: >> >> >> >> go func() { >> >> wg.Add(1) >> >> wg.Done() >> >> }() >> >> wg.Wait() >> >> >> >> ? >> >> >> >> If we provide some guarantees for such programs, it must be pure >> >> coincidence. >> > >> > >> > Surely we guarantee that Wait does not block forever? >> >> Maybe. Do you think it's important? >> Race detector must flag such code, and the code must be fixed. >> > > I think it is important that Wait and Add behave as if some ordering > happened: either Wait before Add, or Wait after Add, and either way Wait > does not block forever. I don't think that should be controversial, and I > don't think it needs to be called out in any documentation either. > Agreed here. I'd think Wait and Add should behave as atomic operations on a WaitGroup, which is I believe what user expects. > If the race detector wants to point out concurrent Add+Wait, that's fine > with me. > > Russ >
Sign in to reply to this message.
ruiu, please add the test so we can submit this CL. thanks.
Sign in to reply to this message.
Hello dvyukov@google.com, 0xjnml@gmail.com (cc: golang-codereviews@googlegroups.com, rsc@golang.org), Please take another look.
Sign in to reply to this message.
LGTM leaving for dvyukov
Sign in to reply to this message.
Dmitry, I don't have commit access, so please submit if it's LGTM. Thanks. On Wed, Apr 9, 2014 at 12:54 PM, <rsc@golang.org> wrote: > LGTM leaving for dvyukov > > > https://codereview.appspot.com/85580043/ >
Sign in to reply to this message.
LGTM
Sign in to reply to this message.
*** Submitted as https://code.google.com/p/go/source/detail?r=e95997ba7135 *** sync: fix spurious wakeup from WaitGroup.Wait There is a race condition that causes spurious wakeup from Wait in the following case: G1: decrement wg.counter, observe the counter is now 0 (should unblock goroutines queued *at this moment*) G2: increment wg.counter G2: call Wait() to add itself to the wait queue G1: acquire wg.m, unblock all waiting goroutines In the last step G2 is spuriously woken up by G1. Fixes issue 7734. LGTM=rsc, dvyukov R=dvyukov, 0xjnml, rsc CC=golang-codereviews https://codereview.appspot.com/85580043 Committer: Dmitriy Vyukov <dvyukov@google.com>
Sign in to reply to this message.
This CL appears to have broken the windows-amd64-race builder. See http://build.golang.org/log/2739e5425f33ec233427501024b763e7309ee182
Sign in to reply to this message.
Looks unrelated: --- FAIL: TestCtrlBreak (0.55 seconds) signal_windows_test.go:73: Failed to compile: exit status 2 # command-line-arguments cannot create C:\Users\Vasya\AppData\Local\Temp\ctlbreak.exe: Permission denied On Thu, Apr 10, 2014 at 6:49 PM, <gobot@golang.org> wrote: > This CL appears to have broken the windows-amd64-race builder. > See http://build.golang.org/log/2739e5425f33ec233427501024b763e7309ee182 > > https://codereview.appspot.com/85580043/
Sign in to reply to this message.
Message was sent while issue was closed.
On 2014/04/10 14:51:28, dvyukov wrote: > Looks unrelated: > > --- FAIL: TestCtrlBreak (0.55 seconds) > signal_windows_test.go:73: Failed to compile: exit status 2 > # command-line-arguments > cannot create C:\Users\Vasya\AppData\Local\Temp\ctlbreak.exe: Permission denied > Dmitry, please check no one has ctlbreak.exe opened. Perhaps I should have made the file name unique (here https://codereview.appspot.com/84650047 the CL, if you think we should change it). Thank you. Also, your linux and windows race builders need git http://build.golang.org/log/3a0c90203f6c60720f93226c8ff6d939022589bf: go: missing Git command. See http://golang.org/s/gogetcmd package code.google.com/p/go.talks/... imports code.google.com/p/go.talks/2013/distsys imports github.com/golang/glog: exec: "git": executable file not found in $PATH Alex
Sign in to reply to this message.
On Fri, Apr 11, 2014 at 10:42 AM, <alex.brainman@gmail.com> wrote: > On 2014/04/10 14:51:28, dvyukov wrote: >> >> Looks unrelated: > > >> --- FAIL: TestCtrlBreak (0.55 seconds) >> signal_windows_test.go:73: Failed to compile: exit status 2 >> # command-line-arguments >> cannot create C:\Users\Vasya\AppData\Local\Temp\ctlbreak.exe: > > Permission denied > > > Dmitry, please check no one has ctlbreak.exe opened. Perhaps I should > have made the file name unique (here > https://codereview.appspot.com/84650047 the CL, if you think we should > change it). Thank you. I've connected to the builder and it has the following message box: ctlbreak.exe - Application Error The application was unable to start correctly (0xc0000142). Click OK to close the application > Also, your linux and windows race builders need git > http://build.golang.org/log/3a0c90203f6c60720f93226c8ff6d939022589bf: > > go: missing Git command. See http://golang.org/s/gogetcmd > package code.google.com/p/go.talks/... > imports code.google.com/p/go.talks/2013/distsys > imports github.com/golang/glog: exec: "git": executable file not > found > in $PATH done
Sign in to reply to this message.
Message was sent while issue was closed.
On 2014/04/11 07:34:59, dvyukov wrote: > > I've connected to the builder and it has the following message box: > > ctlbreak.exe - Application Error > The application was unable to start correctly (0xc0000142). Click OK > to close the application > Perhaps it is ERROR_DEVICE_NO_RESOURCES. We don't have proper exception handler anymore (the one that will stop the program and print stack trace) - we handle exceptions inside of Go code, but not outside. Perhaps we'll see more of errors like that. Perhaps will have to do something about that. For now, I submitted 04ab9968c31b, so the builder will continue even after failure like that. Alex
Sign in to reply to this message.
On Wednesday, April 9, 2014 6:31:40 PM UTC+2, russ cox wrote: > I think it is important that Wait and Add behave as if some ordering happened: either Wait before Add, or Wait after Add, and either way Wait does not block forever. I don't think that should be controversial, and I don't think it needs to be called out in any documentation either. Thanks for clearing this up! I always considered the sequence Wait();Add() as invalid and considered the sync.WaitGroup as consumable resource, which is consumed after the call to Wait.
Sign in to reply to this message.
|