10 years, 6 months ago
(2014-07-16 11:40:52 UTC)
#2
https://codereview.appspot.com/110580043/diff/40001/src/pkg/runtime/chan.goc
File src/pkg/runtime/chan.goc (right):
https://codereview.appspot.com/110580043/diff/40001/src/pkg/runtime/chan.goc#...
src/pkg/runtime/chan.goc:234: if(!block && !c->closed && ((c->dataqsiz == 0 &&
c->sendq.first == nil) ||
The check for data in the channel is a single read of mutable state
(c->recvq.first or c->qcount, depending on the immutable c->dataqsiz). However,
this line also contains a read of c->closed. There is nothing to say that the
two reads will be self-consistent.
If you have a program that does:
c := make(chan int, 1)
c <- 1
go func() {
select {
case <-c:
println("recv from c")
default:
println("c is not ready - BUG!")
}
}()
close(c)
<-c
I believe c is always ready and the select should never end up in the default
case. Either the select happens before the close, in which case it receives the
1, or it happens after the close but before the <-c, in which case it receives
the 1, or it happens after both close(c) and <-c, in which case it receives a 0
(from being closed and empty).
In the fast path, it could happen that the select sees c->closed == false, then
the original goroutine does close(c), <-c, then the select sees c->qcount == 0,
which causes it to return false incorrectly, falling into the 'c is not ready -
BUG!' case of the program.
Maybe moving the c->closed check to the end of the condition fixes the problem,
or maybe checking c->closed twice fixes the problem. Either way the change seems
a bit scarier and more subtle (less straightforward) than advertised. If moving
or repeating the closed check fixes the problem, do we need to use the atomic
operations to make sure the ordering is respected on ARM or Power or (maybe
later) MIPS?
I am not 100% sure, but I think that for send the checks are already in the
correct order. Reversing them there would introduce a similar race.
10 years, 6 months ago
(2014-07-16 13:50:32 UTC)
#3
On 2014/07/16 11:40:52, rsc wrote:
> https://codereview.appspot.com/110580043/diff/40001/src/pkg/runtime/chan.goc
> File src/pkg/runtime/chan.goc (right):
>
>
https://codereview.appspot.com/110580043/diff/40001/src/pkg/runtime/chan.goc#...
> src/pkg/runtime/chan.goc:234: if(!block && !c->closed && ((c->dataqsiz == 0 &&
> c->sendq.first == nil) ||
> The check for data in the channel is a single read of mutable state
> (c->recvq.first or c->qcount, depending on the immutable c->dataqsiz).
However,
> this line also contains a read of c->closed. There is nothing to say that the
> two reads will be self-consistent.
>
> If you have a program that does:
>
> c := make(chan int, 1)
> c <- 1
>
> go func() {
> select {
> case <-c:
> println("recv from c")
> default:
> println("c is not ready - BUG!")
> }
> }()
>
> close(c)
> <-c
>
> I believe c is always ready and the select should never end up in the default
> case. Either the select happens before the close, in which case it receives
the
> 1, or it happens after the close but before the <-c, in which case it receives
> the 1, or it happens after both close(c) and <-c, in which case it receives a
0
> (from being closed and empty).
>
> In the fast path, it could happen that the select sees c->closed == false,
then
> the original goroutine does close(c), <-c, then the select sees c->qcount ==
0,
> which causes it to return false incorrectly, falling into the 'c is not ready
-
> BUG!' case of the program.
>
> Maybe moving the c->closed check to the end of the condition fixes the
problem,
> or maybe checking c->closed twice fixes the problem. Either way the change
seems
> a bit scarier and more subtle (less straightforward) than advertised. If
moving
> or repeating the closed check fixes the problem, do we need to use the atomic
> operations to make sure the ordering is respected on ARM or Power or (maybe
> later) MIPS?
>
> I am not 100% sure, but I think that for send the checks are already in the
> correct order. Reversing them there would introduce a similar race.
You are right!
I am able to reproduce the failure.
Yes, reordering the loads helps.
I've updated the code and added comments.
PTAL
10 years, 6 months ago
(2014-07-16 13:53:55 UTC)
#4
On 2014/07/16 11:40:52, rsc wrote:
> https://codereview.appspot.com/110580043/diff/40001/src/pkg/runtime/chan.goc
> File src/pkg/runtime/chan.goc (right):
>
>
https://codereview.appspot.com/110580043/diff/40001/src/pkg/runtime/chan.goc#...
> src/pkg/runtime/chan.goc:234: if(!block && !c->closed && ((c->dataqsiz == 0 &&
> c->sendq.first == nil) ||
> The check for data in the channel is a single read of mutable state
> (c->recvq.first or c->qcount, depending on the immutable c->dataqsiz).
However,
> this line also contains a read of c->closed. There is nothing to say that the
> two reads will be self-consistent.
>
> If you have a program that does:
>
> c := make(chan int, 1)
> c <- 1
>
> go func() {
> select {
> case <-c:
> println("recv from c")
> default:
> println("c is not ready - BUG!")
> }
> }()
>
> close(c)
> <-c
>
> I believe c is always ready and the select should never end up in the default
> case. Either the select happens before the close, in which case it receives
the
> 1, or it happens after the close but before the <-c, in which case it receives
> the 1, or it happens after both close(c) and <-c, in which case it receives a
0
> (from being closed and empty).
>
> In the fast path, it could happen that the select sees c->closed == false,
then
> the original goroutine does close(c), <-c, then the select sees c->qcount ==
0,
> which causes it to return false incorrectly, falling into the 'c is not ready
-
> BUG!' case of the program.
>
> Maybe moving the c->closed check to the end of the condition fixes the
problem,
> or maybe checking c->closed twice fixes the problem. Either way the change
seems
> a bit scarier and more subtle (less straightforward) than advertised. If
moving
> or repeating the closed check fixes the problem, do we need to use the atomic
> operations to make sure the ordering is respected on ARM or Power or (maybe
> later) MIPS?
>
> I am not 100% sure, but I think that for send the checks are already in the
> correct order. Reversing them there would introduce a similar race.
Russ, Dmitry, I have a 4 core arm machine and can hammer the crap out of these
tests to try to provoke a failure. Do you think the existing test/select.c
test's cover this condition sufficiently ?
I've added the special test that triggers the race to this change. I would run ...
10 years, 6 months ago
(2014-07-16 13:56:17 UTC)
#5
I've added the special test that triggers the race to this change.
I would run all chan/select related tests from test/ and also
-run="Chan|Select|Recv|Send" from runtime.
On Wed, Jul 16, 2014 at 5:53 PM, <dave@cheney.net> wrote:
> On 2014/07/16 11:40:52, rsc wrote:
>
> https://codereview.appspot.com/110580043/diff/40001/src/pkg/runtime/chan.goc
>>
>> File src/pkg/runtime/chan.goc (right):
>
>
>
>
https://codereview.appspot.com/110580043/diff/40001/src/pkg/runtime/chan.goc#...
>>
>> src/pkg/runtime/chan.goc:234: if(!block && !c->closed && ((c->dataqsiz
>
> == 0 &&
>>
>> c->sendq.first == nil) ||
>> The check for data in the channel is a single read of mutable state
>> (c->recvq.first or c->qcount, depending on the immutable c->dataqsiz).
>
> However,
>>
>> this line also contains a read of c->closed. There is nothing to say
>
> that the
>>
>> two reads will be self-consistent.
>
>
>> If you have a program that does:
>
>
>> c := make(chan int, 1)
>> c <- 1
>
>
>> go func() {
>> select {
>> case <-c:
>> println("recv from c")
>> default:
>> println("c is not ready - BUG!")
>> }
>> }()
>
>
>> close(c)
>> <-c
>
>
>> I believe c is always ready and the select should never end up in the
>
> default
>>
>> case. Either the select happens before the close, in which case it
>
> receives the
>>
>> 1, or it happens after the close but before the <-c, in which case it
>
> receives
>>
>> the 1, or it happens after both close(c) and <-c, in which case it
>
> receives a 0
>>
>> (from being closed and empty).
>
>
>> In the fast path, it could happen that the select sees c->closed ==
>
> false, then
>>
>> the original goroutine does close(c), <-c, then the select sees
>
> c->qcount == 0,
>>
>> which causes it to return false incorrectly, falling into the 'c is
>
> not ready -
>>
>> BUG!' case of the program.
>
>
>> Maybe moving the c->closed check to the end of the condition fixes the
>
> problem,
>>
>> or maybe checking c->closed twice fixes the problem. Either way the
>
> change seems
>>
>> a bit scarier and more subtle (less straightforward) than advertised.
>
> If moving
>>
>> or repeating the closed check fixes the problem, do we need to use the
>
> atomic
>>
>> operations to make sure the ordering is respected on ARM or Power or
>
> (maybe
>>
>> later) MIPS?
>
>
>> I am not 100% sure, but I think that for send the checks are already
>
> in the
>>
>> correct order. Reversing them there would introduce a similar race.
>
>
> Russ, Dmitry, I have a 4 core arm machine and can hammer the crap out of
> these tests to try to provoke a failure. Do you think the existing
> test/select.c test's cover this condition sufficiently ?
>
> https://codereview.appspot.com/110580043/
On 2014/07/16 13:50:32, dvyukov wrote: > On 2014/07/16 11:40:52, rsc wrote: > > https://codereview.appspot.com/110580043/diff/40001/src/pkg/runtime/chan.goc > ...
10 years, 6 months ago
(2014-07-16 14:17:00 UTC)
#6
On 2014/07/16 13:50:32, dvyukov wrote:
> On 2014/07/16 11:40:52, rsc wrote:
> > https://codereview.appspot.com/110580043/diff/40001/src/pkg/runtime/chan.goc
> > File src/pkg/runtime/chan.goc (right):
> >
> >
>
https://codereview.appspot.com/110580043/diff/40001/src/pkg/runtime/chan.goc#...
> > src/pkg/runtime/chan.goc:234: if(!block && !c->closed && ((c->dataqsiz == 0
&&
> > c->sendq.first == nil) ||
> > The check for data in the channel is a single read of mutable state
> > (c->recvq.first or c->qcount, depending on the immutable c->dataqsiz).
> However,
> > this line also contains a read of c->closed. There is nothing to say that
the
> > two reads will be self-consistent.
> >
> > If you have a program that does:
> >
> > c := make(chan int, 1)
> > c <- 1
> >
> > go func() {
> > select {
> > case <-c:
> > println("recv from c")
> > default:
> > println("c is not ready - BUG!")
> > }
> > }()
> >
> > close(c)
> > <-c
> >
> > I believe c is always ready and the select should never end up in the
default
> > case. Either the select happens before the close, in which case it receives
> the
> > 1, or it happens after the close but before the <-c, in which case it
receives
> > the 1, or it happens after both close(c) and <-c, in which case it receives
a
> 0
> > (from being closed and empty).
> >
> > In the fast path, it could happen that the select sees c->closed == false,
> then
> > the original goroutine does close(c), <-c, then the select sees c->qcount ==
> 0,
> > which causes it to return false incorrectly, falling into the 'c is not
ready
> -
> > BUG!' case of the program.
> >
> > Maybe moving the c->closed check to the end of the condition fixes the
> problem,
> > or maybe checking c->closed twice fixes the problem. Either way the change
> seems
> > a bit scarier and more subtle (less straightforward) than advertised. If
> moving
> > or repeating the closed check fixes the problem, do we need to use the
atomic
> > operations to make sure the ordering is respected on ARM or Power or (maybe
> > later) MIPS?
> >
> > I am not 100% sure, but I think that for send the checks are already in the
> > correct order. Reversing them there would introduce a similar race.
Hummm... I fail to see how this can affect send, even if loads are reversed. In
all scenarios that I can construct,
there is always an ordering of operations in which the non-blocking send returns
false.
>
>
> You are right!
> I am able to reproduce the failure.
>
> Yes, reordering the loads helps.
> I've updated the code and added comments.
> PTAL
On 2014/07/16 14:17:00, dvyukov wrote: > On 2014/07/16 13:50:32, dvyukov wrote: > > On 2014/07/16 ...
10 years, 5 months ago
(2014-08-05 18:31:15 UTC)
#7
On 2014/07/16 14:17:00, dvyukov wrote:
> On 2014/07/16 13:50:32, dvyukov wrote:
> > On 2014/07/16 11:40:52, rsc wrote:
> > >
https://codereview.appspot.com/110580043/diff/40001/src/pkg/runtime/chan.goc
> > > File src/pkg/runtime/chan.goc (right):
> > >
> > >
> >
>
https://codereview.appspot.com/110580043/diff/40001/src/pkg/runtime/chan.goc#...
> > > src/pkg/runtime/chan.goc:234: if(!block && !c->closed && ((c->dataqsiz ==
0
> &&
> > > c->sendq.first == nil) ||
> > > The check for data in the channel is a single read of mutable state
> > > (c->recvq.first or c->qcount, depending on the immutable c->dataqsiz).
> > However,
> > > this line also contains a read of c->closed. There is nothing to say that
> the
> > > two reads will be self-consistent.
> > >
> > > If you have a program that does:
> > >
> > > c := make(chan int, 1)
> > > c <- 1
> > >
> > > go func() {
> > > select {
> > > case <-c:
> > > println("recv from c")
> > > default:
> > > println("c is not ready - BUG!")
> > > }
> > > }()
> > >
> > > close(c)
> > > <-c
> > >
> > > I believe c is always ready and the select should never end up in the
> default
> > > case. Either the select happens before the close, in which case it
receives
> > the
> > > 1, or it happens after the close but before the <-c, in which case it
> receives
> > > the 1, or it happens after both close(c) and <-c, in which case it
receives
> a
> > 0
> > > (from being closed and empty).
> > >
> > > In the fast path, it could happen that the select sees c->closed == false,
> > then
> > > the original goroutine does close(c), <-c, then the select sees c->qcount
==
> > 0,
> > > which causes it to return false incorrectly, falling into the 'c is not
> ready
> > -
> > > BUG!' case of the program.
> > >
> > > Maybe moving the c->closed check to the end of the condition fixes the
> > problem,
> > > or maybe checking c->closed twice fixes the problem. Either way the change
> > seems
> > > a bit scarier and more subtle (less straightforward) than advertised. If
> > moving
> > > or repeating the closed check fixes the problem, do we need to use the
> atomic
> > > operations to make sure the ordering is respected on ARM or Power or
(maybe
> > > later) MIPS?
> > >
> > > I am not 100% sure, but I think that for send the checks are already in
the
> > > correct order. Reversing them there would introduce a similar race.
>
> Hummm... I fail to see how this can affect send, even if loads are reversed.
In
> all scenarios that I can construct,
> there is always an ordering of operations in which the non-blocking send
returns
> false.
>
>
> >
> >
> > You are right!
> > I am able to reproduce the failure.
> >
> > Yes, reordering the loads helps.
> > I've updated the code and added comments.
> > PTAL
ping
LGTM https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan.go File src/pkg/runtime/chan.go (right): https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan.go#newcode105 src/pkg/runtime/chan.go:105: // Fast path: if it's a non-blocking operation ...
10 years, 5 months ago
(2014-08-25 00:26:45 UTC)
#10
LGTM
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan.go
File src/pkg/runtime/chan.go (right):
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan.go#...
src/pkg/runtime/chan.go:105: // Fast path: if it's a non-blocking operation that
can't proceed,
replace comment
// Fast path: check for failed non-blocking operation without acquiring the
lock.
//
// After observing that the channel is not closed, we observe that the channel
is
// not ready for sending. Each of these observations is a single word-sized read
// (first c.closed and second c.recvq.first or c.qcount depending on kind of
channel).
// Because a closed channel cannot transition from 'ready for sending' to
// 'not ready for sending', even if the channel is closed between the two
observations,
// they imply a moment between the two when the channel was both not yet closed
// and not ready for sending. We behave as if we observed the channel at that
moment,
// and report that the send cannot proceed.
//
// It is okay if the reads are reordered here: if we observe that the channel is
not
// ready for sending and then observe that it is not closed, that implies that
the
// channel wasn't closed during the first observation.
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan.goc
File src/pkg/runtime/chan.goc (right):
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan.goc...
src/pkg/runtime/chan.goc:36: chansend(ChanType *t, Hchan *c, byte *ep, bool
block, void *pc)
Why is there a copy of this code in both chan.go and chan.goc?
We need to have just one when this is over.
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan.goc...
src/pkg/runtime/chan.goc:63: // Fast path: if it's a non-blocking operation that
can't proceed,
use same comment i suggested for other file.
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan.goc...
src/pkg/runtime/chan.goc:195: // Fast path: if it's a non-blocking operation
that can't proceed,
replace comment
// Fast path: check for failed non-blocking operation without acquiring the
lock.
//
// After observing that the channel is not ready for receiving, we observe that
the
// channel is not closed. Each of these observations is a single word-sized read
// (first c.sendq.first or c.qcount, and second c.closed).
// Because a channel cannot be reopened, the later observation of the channel
// being not closed implies that it was also not closed at the moment of the
// first observation. We behave as if we observed the channel at that moment
// and report that the receive cannot proceed.
//
// The order of operations is important here: reversing the operations can lead
to
// incorrect behavior when racing with a close.
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan_tes...
File src/pkg/runtime/chan_test.go (right):
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan_tes...
src/pkg/runtime/chan_test.go:202: for i := 0; i < 1e4; i++ {
please make this shorter in short mode.
n := 10000
if testing.Short() {
n = 100
}
for i := 0; i < n; i++ {
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan.go File src/pkg/runtime/chan.go (right): https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan.go#newcode105 src/pkg/runtime/chan.go:105: // Fast path: if it's a non-blocking operation that ...
10 years, 5 months ago
(2014-08-25 07:53:38 UTC)
#11
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan.go
File src/pkg/runtime/chan.go (right):
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan.go#...
src/pkg/runtime/chan.go:105: // Fast path: if it's a non-blocking operation that
can't proceed,
On 2014/08/25 00:26:45, rsc wrote:
> replace comment
>
> // Fast path: check for failed non-blocking operation without acquiring the
> lock.
> //
> // After observing that the channel is not closed, we observe that the channel
> is
> // not ready for sending. Each of these observations is a single word-sized
read
> // (first c.closed and second c.recvq.first or c.qcount depending on kind of
> channel).
> // Because a closed channel cannot transition from 'ready for sending' to
> // 'not ready for sending', even if the channel is closed between the two
> observations,
> // they imply a moment between the two when the channel was both not yet
closed
> // and not ready for sending. We behave as if we observed the channel at that
> moment,
> // and report that the send cannot proceed.
> //
> // It is okay if the reads are reordered here: if we observe that the channel
is
> not
> // ready for sending and then observe that it is not closed, that implies that
> the
> // channel wasn't closed during the first observation.
Done.
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan.goc
File src/pkg/runtime/chan.goc (right):
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan.goc...
src/pkg/runtime/chan.goc:36: chansend(ChanType *t, Hchan *c, byte *ep, bool
block, void *pc)
On 2014/08/25 00:26:45, rsc wrote:
> Why is there a copy of this code in both chan.go and chan.goc?
> We need to have just one when this is over.
Keith said that something in chan.goc still uses chansend, and that he will
remove it soon
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan.goc...
src/pkg/runtime/chan.goc:63: // Fast path: if it's a non-blocking operation that
can't proceed,
On 2014/08/25 00:26:45, rsc wrote:
> use same comment i suggested for other file.
Done.
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan.goc...
src/pkg/runtime/chan.goc:195: // Fast path: if it's a non-blocking operation
that can't proceed,
On 2014/08/25 00:26:45, rsc wrote:
> replace comment
>
> // Fast path: check for failed non-blocking operation without acquiring the
> lock.
> //
> // After observing that the channel is not ready for receiving, we observe
that
> the
> // channel is not closed. Each of these observations is a single word-sized
read
> // (first c.sendq.first or c.qcount, and second c.closed).
> // Because a channel cannot be reopened, the later observation of the channel
> // being not closed implies that it was also not closed at the moment of the
> // first observation. We behave as if we observed the channel at that moment
> // and report that the receive cannot proceed.
> //
> // The order of operations is important here: reversing the operations can
lead
> to
> // incorrect behavior when racing with a close.
>
>
Done.
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan_tes...
File src/pkg/runtime/chan_test.go (right):
https://codereview.appspot.com/110580043/diff/160001/src/pkg/runtime/chan_tes...
src/pkg/runtime/chan_test.go:202: for i := 0; i < 1e4; i++ {
On 2014/08/25 00:26:45, rsc wrote:
> please make this shorter in short mode.
>
> n := 10000
> if testing.Short() {
> n = 100
> }
> for i := 0; i < n; i++ {
Done.
*** Submitted as https://code.google.com/p/go/source/detail?r=ab81254ccaec *** runtime: add fast paths to non-blocking channel operations benchmark old ...
10 years, 5 months ago
(2014-08-25 07:55:44 UTC)
#12
*** Submitted as https://code.google.com/p/go/source/detail?r=ab81254ccaec ***
runtime: add fast paths to non-blocking channel operations
benchmark old ns/op new ns/op delta
BenchmarkChanNonblocking 27.8 7.80 -71.94%
BenchmarkChanNonblocking-2 79.1 3.94 -95.02%
BenchmarkChanNonblocking-4 71.2 2.04 -97.13%
LGTM=rsc
R=golang-codereviews, rsc, dave
CC=golang-codereviews
https://codereview.appspot.com/110580043
Issue 110580043: code review 110580043: runtime: add fast paths to non-blocking channel operations
(Closed)
Created 10 years, 6 months ago by dvyukov
Modified 10 years, 5 months ago
Reviewers: gobot
Base URL:
Comments: 11