Description:
runtime: fix use after close race in Solaris network poller
The Solaris network poller uses event ports, which are
level-triggered. As such, it has to re-arm itself after each
wakeup. The arming mechanism (which runs in its own thread) raced
with the closing of a file descriptor happening in a different
thread. When a network file descriptor is about to be closed,
the network poller is awakened to give it a chance to remove its
association with the file descriptor. Because the poller always
re-armed itself, it raced with code that closed the descriptor.
This change makes the network poller check before re-arming if
the file descriptor is about to be closed, in which case it will
ignore the re-arming request. It uses the per-PollDesc lock in
order to serialize access to the PollDesc.
This change also adds extensive documentation describing the
Solaris implementation of the network poller.
Fixes issue 7410.
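For orientation, the shape the fix eventually takes can be sketched as follows. This is a simplified sketch, not the verbatim patch: runtime·netpolllock shows up in the later patch sets, while runtime·netpollunlock and portfd are assumed names for its counterpart and the runtime's global event port.

    // Sketch: re-arm the event-port association, but skip it when the
    // descriptor is being closed. Callers hold the per-PollDesc lock so
    // this cannot race with runtime_pollUnblock setting pd->closing.
    void
    runtime·netpollupdate(PollDesc *pd, uint32 set, uint32 clear)
    {
        uint32 *ep, old, events;
        uintptr fd;

        if(runtime·netpollclosing(pd))
            return;  // fd is about to be closed; ignore the re-arm request

        fd = runtime·netpollfd(pd);
        ep = (uint32*)runtime·netpolluser(pd);  // per-fd event word
        old = *ep;
        events = (old & ~clear) | set;
        if(events == old)
            return;
        if(events && runtime·port_associate(portfd, PORT_SOURCE_FD, fd, events, (void*)pd) != 0)
            runtime·throw("netpollupdate: port_associate failed");
        *ep = events;
    }

    // Typical caller:
    //     runtime·netpolllock(pd);
    //     runtime·netpollupdate(pd, POLLIN, 0);
    //     runtime·netpollunlock(pd);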
Patch Set 1
Patch Set 2 : diff -r 3512bf47df7e https://code.google.com/p/go
Patch Set 3 : diff -r 3512bf47df7e https://code.google.com/p/go
Total comments: 2
Patch Set 4 : diff -r 3cd546ac3a99 https://code.google.com/p/go
Patch Set 5 : diff -r 3cd546ac3a99 https://code.google.com/p/go
Total comments: 4
Patch Set 6 : diff -r ceb13960d05b https://code.google.com/p/go
Patch Set 7 : diff -r ceb13960d05b https://code.google.com/p/go
Total comments: 8
Patch Set 8 : diff -r 08dcdcdb757b https://code.google.com/p/go
Patch Set 9 : diff -r 08dcdcdb757b https://code.google.com/p/go
Messages
Total messages: 51
Hello golang-codereviews@googlegroups.com, I'd like you to review this change to https://code.google.com/p/go
This should fix both the builders.
[+iant, dmitry] On Wed, Feb 26, 2014 at 10:16 AM, <aram@mgk.ro> wrote: > Reviewers: golang-codereviews, > > Message: > Hello golang-codereviews@googlegroups.com, > > I'd like you to review this change to > https://code.google.com/p/go > > > Description: > runtime: fix use after close race in Solaris network poller > > The Solaris network poller uses event ports, which are > level-triggered. As such, it has to re-arm itself after each > wakeup. The arming mechanism (which runs in its own thread) raced > with the closing of a file descriptor happening in a different > thread. When a network file descriptor is about to be closed, > the network poller is awaken to give it a chance to remove its > association with the file descriptor. Because the poller always > re-armed itself, it raced with code that closed the descriptor. > > This change makes the network poller check before re-arming if > the file descriptor is about to be closed, in which case it will > ignore the re-arming request. > > Fixes issue 7410. > > Please review this at https://codereview.appspot.com/69190044/ > > Affected files (+9, -0 lines): > M src/pkg/runtime/netpoll.goc > M src/pkg/runtime/netpoll_solaris.c > M src/pkg/runtime/runtime.h > > > Index: src/pkg/runtime/netpoll.goc > =================================================================== > --- a/src/pkg/runtime/netpoll.goc > +++ b/src/pkg/runtime/netpoll.goc > @@ -263,6 +263,12 @@ > return &pd->user; > } > > +bool > +runtime·netpollclosing(PollDesc *pd) > +{ > + return pd->closing; > +} > + > // make pd ready, newly runnable goroutines (if any) are enqueued info > gpp list > void > runtime·netpollready(G **gpp, PollDesc *pd, int32 mode) > Index: src/pkg/runtime/netpoll_solaris.c > =================================================================== > --- a/src/pkg/runtime/netpoll_solaris.c > +++ b/src/pkg/runtime/netpoll_solaris.c > @@ -90,6 +90,8 @@ > uintptr fd = runtime·netpollfd(pd); > ep = (uint32*)runtime·netpolluser(pd); > > + if(runtime·netpollclosing(pd)) > + return; > do { > old = *ep; > events = (old & ~clear) | set; > Index: src/pkg/runtime/runtime.h > =================================================================== > --- a/src/pkg/runtime/runtime.h > +++ b/src/pkg/runtime/runtime.h > @@ -942,6 +942,7 @@ > uintptr runtime·netpollfd(PollDesc*); > void runtime·netpollarm(PollDesc*, int32); > void** runtime·netpolluser(PollDesc*); > +bool runtime·netpollclosing(PollDesc*); > void runtime·crash(void); > void runtime·parsedebugvars(void); > void _rt0_go(void); > > > -- > You received this message because you are subscribed to the Google Groups > "golang-codereviews" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to golang-codereviews+unsubscribe@googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. >
I don't understand how this avoids the race. I can see that it narrows the time span when the race can occur, but I don't see how the race is avoided. Isn't it possible for closing = true to occur during the loop in runtime·netpollupdate? Would it work to check whether the port is closed in the case where runtime·port_associate fails, and ignore the failure in that case?
On Wed, Feb 26, 2014 at 8:11 PM, <iant@golang.org> wrote:
> I don't understand how this avoids the race. I can see that it narrows
> the time span when the race can occur, but I don't see how the race is
> avoided. Isn't it possible for closing = true to occur during the loop
> in runtime·netpollupdate?

There are three events of importance here: closing=true, unblocking the poller, and poller wakeup (for whatever reason). Unblocking always comes after closing=true, so what really matters here is how the poller wakeup relates to the other two.

So we can have these three situations:

1: wakeup, closing=true, unblock

This is not very interesting at all; it works fine.

2: closing=true, wakeup, unblock

Here the poller awakens for whatever reason after closing is set to true, but before being forcibly unblocked. Netpollupdate sees closing=true and doesn't do anything. That's ok, since we are going to disassociate the fd anyway, and in the time between now and when the fd is actually closed, nothing can use it.

After all this, the poller will be forcibly awakened, and that's ok. We are already disassociated.

3: closing=true, unblock, wakeup

Netpollupdate sees closing=true and will leave the fd disassociated. The poller doesn't unblock again for this fd.

Do you find anything wrong in the explanation above? I could be nuts indeed.

> Would it work to check whether the port is closed in the case where
> runtime·port_associate fails, and ignore the failure in that case?

No, this exposes a different race.

fd=8 is about to be closed, the poller unblocks, the fd is closed, some other thread opens a new socket, gets the same fd=8, netpollupdate runs, it calls port_associate *successfully* (because the fd is valid), but with the wrong PollDesc. Havoc happens when code actually uses that stale, possibly freed, PollDesc.

-- Aram Hăvărneanu
On Wed, Feb 26, 2014 at 11:26 AM, Aram Hăvărneanu <aram@mgk.ro> wrote:
> On Wed, Feb 26, 2014 at 8:11 PM, <iant@golang.org> wrote:
>> I don't understand how this avoids the race. I can see that it narrows
>> the time span when the race can occur, but I don't see how the race is
>> avoided. Isn't it possible for closing = true to occur during the loop
>> in runtime·netpollupdate?
>
> There are three events of importance here, closing=true, unblocking
> poller, and poller wakeup (for whatever reason). Unblocking is always
> after close=true, so what really is important here is how poller wakeup
> relates to the other two.
>
> So we can have this three situations:
>
> 1: wakeup, closing=true, unblock
>
> This is not very interesting at all, works fine.
>
> 2: closing=true, wakeup, unblock
>
> Here the poller awakens for whatever reason after closing is set to true,
> but before being forcibly unblocked. Netpollupdate sees closing=true and
> doesn't do anything. That's ok, since we are going to disassociate the fd
> anyway, and in the time between now and when the fd is actually closed,
> nothing can use it.
>
> After all this, the poller will be forcibly awakened, and that's ok. We
> are already disassociated.
>
> 3: closing=true, unblock, wakeup
>
> Netpollupdate sees closing=true and will leave the fd disassociated. The
> poller doesn't unblock again for this fd.
>
> Do you find anything wrong in the explanation above? I could be nuts
> indeed.

The case I wonder about is:

First runtime·netpoll is called, it calls runtime·netpollupdate, that picks up fd, and sees that closing is false. At this point the kernel preempts the thread.

Another thread calls fd.Close() which calls fd.pd.Evict which calls runtime_pollUnblock which locks the pd, sets pd->closing = true, and eventually proceeds to close the file descriptor.

Now the kernel schedules the first thread again. That thread tries to associate fd, and fails because it is closed.

I can't see any locks that prevent that from happening.

There is no similar problem in the epoll code because that code is careful to never look at the PollDesc in runtime·netpoll, except for the rg and wg fields that it accesses using atomic operations.

Can you explain why this scenario can not occur?

>> Would it work to check whether the port is closed in the case where
>> runtime·port_associate fails, and ignore the failure in that case?
>
> No, this exposes a different race.
>
> fd=8 is about to be closed, the poller unblocks, the fd is closed, some
> other thread opens a new socket, gets the same fd=8, netpollupdate runs,
> it calls port_associate *successfully* (because the fd is valid), but
> with a wrong PortDesc. Havoc happens when code actually uses that stale,
> possibly freed, PortDesc.

Good point, I also don't understand why that can't happen.

Ian
Why do you need to rearm it after polling? You arm it before each netpollwait, and netpoll does not care about notification if nobody is waiting. So it seems that arm before wait is enough. Or is it because of concurrent read/write waiters? Then I do not understand "To effect edge-triggered events" comment. Netpoll does not require notifications to be edge-triggered.
> Why do you need to rearm it after polling? > You arm it before each netpollwait, and netpoll does not care about > notification if nobody is waiting. So it seems that arm before wait is > enough. It doesn't work otherwise, e.g. (with your proposal): : oos:src; go test -v -short -timeout 20s -run 'TestShutdown|TestConnAndListener' net === RUN TestConnAndListener --- FAIL: TestConnAndListener (10.01 seconds) conn_test.go:114: Conn.Read failed: read tcp 127.0.0.1:59245: i/o timeout conn_test.go:80: Conn.Read failed: read tcp 127.0.0.1:52002: i/o timeout === RUN TestShutdown panic: test timed out after 20s goroutine 24 [running]: runtime.panic(0x5ba840, 0xc208001380) /home/aram/go/src/pkg/runtime/panic.c:250 +0xba testing.func·008() /home/aram/go/src/pkg/testing/testing.go:628 +0xf5 created by time.goFunc /home/aram/go/src/pkg/time/sleep.go:121 +0x47 goroutine 16 [chan receive]: testing.RunTests(0x705a20, 0x7dfe20, 0x79, 0x79, 0x0) /home/aram/go/src/pkg/testing/testing.go:504 +0x906 testing.Main(0x705a20, 0x7dfe20, 0x79, 0x79, 0x7d64c0, ...) /home/aram/go/src/pkg/testing/testing.go:435 +0x9e main.main() /tmp/go-build116241956/net/_test/_testmain.go:321 +0x9c goroutine 22 [IO wait]: net.runtime_pollWait(0xfffffd7fff160718, 0x72, 0x0) /home/aram/go/src/pkg/runtime/netpoll.goc:146 +0x76 net.(*pollDesc).Wait(0xc20802c300, 0x72, 0x0, 0x0) /home/aram/go/src/pkg/net/fd_poll_runtime.go:84 +0x46 net.(*pollDesc).WaitRead(0xc20802c300, 0x0, 0x0) /home/aram/go/src/pkg/net/fd_poll_runtime.go:89 +0x42 net.(*netFD).Read(0xc20802c2a0, 0xc208001360, 0xa, 0xa, 0x0, ...) /home/aram/go/src/pkg/net/fd_unix.go:214 +0x30c net.(*conn).Read(0xc208042068, 0xc208001360, 0xa, 0xa, 0x0, ...) /home/aram/go/src/pkg/net/net.go:122 +0xf2 net.TestShutdown(0xc208054000) /home/aram/go/src/pkg/net/net_test.go:53 +0x507 testing.tRunner(0xc208054000, 0x7e03d8) /home/aram/go/src/pkg/testing/testing.go:422 +0x9d created by testing.RunTests /home/aram/go/src/pkg/testing/testing.go:503 +0x8d6 goroutine 21 [chan send]: net.func·029() /home/aram/go/src/pkg/net/conn_test.go:88 +0x45 net.transponder(0xc208054120, 0xfffffd7fff15f7f8, 0xc208042018, 0xc208030480) /home/aram/go/src/pkg/net/conn_test.go:121 +0x24c created by net.TestConnAndListener /home/aram/go/src/pkg/net/conn_test.go:61 +0x4ef goroutine 23 [IO wait]: net.runtime_pollWait(0xfffffd7fff160668, 0x72, 0x0) /home/aram/go/src/pkg/runtime/netpoll.goc:146 +0x76 net.(*pollDesc).Wait(0xc20802c370, 0x72, 0x0, 0x0) /home/aram/go/src/pkg/net/fd_poll_runtime.go:84 +0x46 net.(*pollDesc).WaitRead(0xc20802c370, 0x0, 0x0) /home/aram/go/src/pkg/net/fd_poll_runtime.go:89 +0x42 net.(*netFD).Read(0xc20802c310, 0xc208001340, 0xa, 0xa, 0x0, ...) /home/aram/go/src/pkg/net/fd_unix.go:214 +0x30c net.(*conn).Read(0xc208042060, 0xc208001340, 0xa, 0xa, 0x0, ...) /home/aram/go/src/pkg/net/net.go:122 +0xf2 net.func·056() /home/aram/go/src/pkg/net/net_test.go:34 +0x1e2 created by net.TestShutdown /home/aram/go/src/pkg/net/net_test.go:40 +0x20e exit status 2 FAIL net 20.035s It needs re-arming after many types of events, not just reads and writes, e.g. concurrent accept.
On 2014/03/02 12:56:22, aram wrote: > > Why do you need to rearm it after polling? > > You arm it before each netpollwait, and netpoll does not care about > > notification if nobody is waiting. So it seems that arm before wait is > > enough. > > It doesn't work otherwise, e.g. (with your proposal): why? > : oos:src; go test -v -short -timeout 20s -run > 'TestShutdown|TestConnAndListener' net > === RUN TestConnAndListener > --- FAIL: TestConnAndListener (10.01 seconds) > conn_test.go:114: Conn.Read failed: read tcp 127.0.0.1:59245: i/o timeout > conn_test.go:80: Conn.Read failed: read tcp 127.0.0.1:52002: i/o timeout > === RUN TestShutdown > panic: test timed out after 20s > > goroutine 24 [running]: > runtime.panic(0x5ba840, 0xc208001380) > /home/aram/go/src/pkg/runtime/panic.c:250 +0xba > testing.func·008() > /home/aram/go/src/pkg/testing/testing.go:628 +0xf5 > created by time.goFunc > /home/aram/go/src/pkg/time/sleep.go:121 +0x47 > > goroutine 16 [chan receive]: > testing.RunTests(0x705a20, 0x7dfe20, 0x79, 0x79, 0x0) > /home/aram/go/src/pkg/testing/testing.go:504 +0x906 > testing.Main(0x705a20, 0x7dfe20, 0x79, 0x79, 0x7d64c0, ...) > /home/aram/go/src/pkg/testing/testing.go:435 +0x9e > main.main() > /tmp/go-build116241956/net/_test/_testmain.go:321 +0x9c > > goroutine 22 [IO wait]: > net.runtime_pollWait(0xfffffd7fff160718, 0x72, 0x0) > /home/aram/go/src/pkg/runtime/netpoll.goc:146 +0x76 > net.(*pollDesc).Wait(0xc20802c300, 0x72, 0x0, 0x0) > /home/aram/go/src/pkg/net/fd_poll_runtime.go:84 +0x46 > net.(*pollDesc).WaitRead(0xc20802c300, 0x0, 0x0) > /home/aram/go/src/pkg/net/fd_poll_runtime.go:89 +0x42 > net.(*netFD).Read(0xc20802c2a0, 0xc208001360, 0xa, 0xa, 0x0, ...) > /home/aram/go/src/pkg/net/fd_unix.go:214 +0x30c > net.(*conn).Read(0xc208042068, 0xc208001360, 0xa, 0xa, 0x0, ...) > /home/aram/go/src/pkg/net/net.go:122 +0xf2 > net.TestShutdown(0xc208054000) > /home/aram/go/src/pkg/net/net_test.go:53 +0x507 > testing.tRunner(0xc208054000, 0x7e03d8) > /home/aram/go/src/pkg/testing/testing.go:422 +0x9d > created by testing.RunTests > /home/aram/go/src/pkg/testing/testing.go:503 +0x8d6 > > goroutine 21 [chan send]: > net.func·029() > /home/aram/go/src/pkg/net/conn_test.go:88 +0x45 > net.transponder(0xc208054120, 0xfffffd7fff15f7f8, 0xc208042018, 0xc208030480) > /home/aram/go/src/pkg/net/conn_test.go:121 +0x24c > created by net.TestConnAndListener > /home/aram/go/src/pkg/net/conn_test.go:61 +0x4ef > > goroutine 23 [IO wait]: > net.runtime_pollWait(0xfffffd7fff160668, 0x72, 0x0) > /home/aram/go/src/pkg/runtime/netpoll.goc:146 +0x76 > net.(*pollDesc).Wait(0xc20802c370, 0x72, 0x0, 0x0) > /home/aram/go/src/pkg/net/fd_poll_runtime.go:84 +0x46 > net.(*pollDesc).WaitRead(0xc20802c370, 0x0, 0x0) > /home/aram/go/src/pkg/net/fd_poll_runtime.go:89 +0x42 > net.(*netFD).Read(0xc20802c310, 0xc208001340, 0xa, 0xa, 0x0, ...) > /home/aram/go/src/pkg/net/fd_unix.go:214 +0x30c > net.(*conn).Read(0xc208042060, 0xc208001340, 0xa, 0xa, 0x0, ...) > /home/aram/go/src/pkg/net/net.go:122 +0xf2 > net.func·056() > /home/aram/go/src/pkg/net/net_test.go:34 +0x1e2 > created by net.TestShutdown > /home/aram/go/src/pkg/net/net_test.go:40 +0x20e > exit status 2 > FAIL net 20.035s > > It needs re-arming after many types of events, not just reads and writes, > e.g. concurrent accept. we don't do accept and read/write on the same descriptor
I do not understand what invariant you want to preserve:

1. fd is always armed for POLLIN/POLLOUT
2. fd is armed only for events that somebody waits for

You arm fd in netpollopen, which suggests 1, but then after poll you re-arm only a subset of events, which suggests 2.

Why do you need to arm both in netpollopen and in pollWait? I can't grasp the big picture.
> Good point, I also don't understand why that can't happen.

Well, this can't happen because if closing = false, we won't be trying to associate a stale PollDesc. Sure, another thread can race us and eventually get the same file descriptor, but then we will call port_associate with a *new*, valid PollDesc.

The point is that by checking for closing = true before re-arming, we never associate a dead PollDesc. Yes, we can get into a situation where we have two PollDescs with the same fd, one live, one stale but not yet freed. In that case, however, we will always call port_associate for the live one.
On Fri, Mar 7, 2014 at 12:48 PM, <aram@mgk.ro> wrote:
>> Good point, I also don't understand why that can't happen.
>
> Well this can't happen because if closing = false, we won't be trying to
> associate a stale PortDesc. Sure, another thread can race us and
> eventually get the same file descriptor, but then we will call
> port_associate with a *new*, valid PortDesc.
>
> The point is that by requiring closing = true, we never associate a dead
> PortDesc. Yes, we can get into a situation where we have two PortDescs
> with the same fd, one live, one stale but not yet freed. In that case
> however, we will always call port_associate for the live one.

The sequence would be:

thread 1:
  fd.Close calls
  fd.decref calls
  fd.destroy calls
  fd.pd.Close calls
  runtime_pollClose calls
  runtime·netpollclose

thread 2:
  preempts thread 1
  runtime·netpoll returns from runtime·port_getn with fd's descriptor

thread 1:
  preempts thread 2
  runtime·netpollclose calls
  runtime·port_dissociate
  back to runtime_pollClose
  back to fd.pd.Close
  back to fd.Close
  back to fd.destroy which calls closesocket

Now a new socket is opened with the same descriptor number.

thread 2:
  finally starts running again in runtime·netpoll
  calls runtime·netpollupdate
  netpollupdate still has a PollDesc.
  It has been released by runtime_pollClose
  but netpollupdate still has a pointer to it.
  The closing field is set but nothing checks it.
  calls runtime·port_associate with the old PollDesc

I don't see anything that prevents that sequence from happening. I may well be missing something.

To me this code seems to be full of race conditions. The epoll code was carefully designed to work without any locks.

The epoll code can also be preempted just after epoll_wait returns but before the socket is closed and the PollDesc released. If the PollDesc is reallocated before runtime·netpoll continues, I think there can be a spurious wakeup of a goroutine. But I don't think the consequences are any more severe than that.

Ian
Ian Lance Taylor <iant@golang.org> wrote:
> ...

Thanks for the detailed explanation.

> I don't see anything that prevents that sequence from happening.
> I may well be missing something.

I don't think you are. I believe what you said is correct; it's a real possibility, however unlikely.

<dvyukov@google.com> wrote:
> Why do you need to arm both in netpollopen and in pollWait?
>
> I can't grasp the big picture

Say we open with POLLIN | POLLOUT, we get POLLIN. At this point, POLLOUT is *also* unarmed. At some point we will re-arm POLLIN again (via runtime·netpollarm), but we need to preserve the POLLOUT, otherwise we won't get POLLOUT, ever.

We could always arm POLLIN | POLLOUT in runtime·netpollarm, but then there's a window of time where one is not armed while the other is not processed yet, and that's bad.

-- Aram Hăvărneanu
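A sketch of the mask bookkeeping Aram describes, assuming as in the patch that ep points at a per-fd event word stored in the PollDesc user field and that portfd names the global event port:

    // Re-arm one direction without dropping interest in the other.
    // port_associate replaces the whole event mask for the fd, so the new
    // mask must be derived from the stored word rather than hardcoded.
    uint32 old, events;

    old = *ep;                      // e.g. POLLIN|POLLOUT before the wakeup
    events = (old & ~clear) | set;  // set=POLLIN, clear=0 preserves POLLOUT
    if(events != old)
        runtime·port_associate(portfd, PORT_SOURCE_FD, fd, events, (void*)pd);
    *ep = events;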
I'd like to reference Dmitry's #5 comment in this bug report here: https://code.google.com/p/go/issues/detail?id=6955#c5

Dmitry, care to expand on that note? What mutex business is missing from our code today?

Thanks.

-- Aram Hăvărneanu
On 2014/03/08 19:00:17, aram wrote: > I'd like to reference Dmitry's #5 comment in this bug report here: > https://code.google.com/p/go/issues/detail?id=6955#c5 > > Dmitry, care to expand on that note? What's mutex business is missing > from our code today? Nevermind. Maybe I was thinking about associating a context (PollDesc*) with fd, but this is supported by solaris port_getn. I did not have a detailed plan at that time, so let's just start from scratch.
On 2014/03/08 00:01:56, iant wrote:
> On Fri, Mar 7, 2014 at 12:48 PM, <aram@mgk.ro> wrote:
>>> Good point, I also don't understand why that can't happen.
>>
>> Well this can't happen because if closing = false, we won't be trying to
>> associate a stale PortDesc. Sure, another thread can race us and
>> eventually get the same file descriptor, but then we will call
>> port_associate with a *new*, valid PortDesc.
>>
>> The point is that by requiring closing = true, we never associate a dead
>> PortDesc. Yes, we can get into a situation where we have two PortDescs
>> with the same fd, one live, one stale but not yet freed. In that case
>> however, we will always call port_associate for the live one.
>
> The sequence would be:
>
> thread 1:
>   fd.Close calls
>   fd.decref calls
>   fd.destroy calls
>   fd.pd.Close calls
>   runtime_pollClose calls
>   runtime·netpollclose
>
> thread 2:
>   preempts thread 1
>   runtime·netpoll returns from runtime·port_getn with fd's descriptor
>
> thread 1:
>   preempts thread 2
>   runtime·netpollclose calls
>   runtime·port_dissociate
>   back to runtime_pollClose
>   back to fd.pd.Close
>   back to fd.Close
>   back to fd.destroy which calls closesocket
>
> Now a new socket is opened with the same descriptor number.
>
> thread 2:
>   finally starts running again in runtime·netpolln
>   calls runtime·netpollupdate
>   netpollupdate still has a PollDesc.
>   It has been released by runtime_pollClose
>   but netpollupdate still has a pointer to it.
>   The closing field is set but nothing checks it.
>   calls runtime·port_associate with the old PollDesc
>
> I don't see anything that prevents that sequence from happening. I
> may well be missing something.

I agree that the racy netpollclosing check does not look like fixing anything (other than reducing the probability of bad things).

> To me this code seems to be full of race conditions. The epoll code
> was carefully designed to work without any locks.
>
> The epoll code can also be preempted just after epoll_wait returns but
> before the socket is closed and the PollDesc released. If the
> PollDesc is reallocated before runtime·netpoll continues, I think
> there can be a spurious wakeup of a goroutine. But I don't think the
> consequences are any more severe than that.
>
> Ian
On 2014/03/08 12:28:14, aram wrote:
> Ian Lance Taylor <iant@golang.org> wrote:
>> ...
>
> Thanks for the detailed explanation.
>
>> I don't see anything that prevents that sequence from happening.
>> I may well be missing something.
>
> I don't think you are. I believe what you said is correct, it's a
> real posibility, however unlikely.
>
> <dvyukov@google.com> wrote:
>> Why do you need to arm both in netpollopen and in pollWait?
>>
>> I can't grasp the big picture
>
> Say we open with POLLIN | POLLOUT, we get POLLIN. At this point,
> POLLOUT is *also* unarmed. At some point we will re-arm POLLIN again
> (via runtime·netpollarm), but we need to preserve the POLLOUT
> otherwise we won't get POLLOUT, ever.
>
> We could always arm POLLIN | POLLOUT in runtime·netpollarm, but
> then there's a window of time where one is not armed while the other
> is not processed yet, and that's bad.

I still do not understand why we need to arm it in netpollopen; nobody is waiting at that time.
> I still do not understand why we need to arm it in netpollopen, nobody
> is waiting at that time.

But it is, see comment #8.

-- Aram Hăvărneanu
On Tue, Mar 11, 2014 at 11:23 AM, <dvyukov@google.com> wrote:
> Nevermind. Maybe I was thinking about associating a context (PollDesc*)
> with fd, but this is supported by solaris port_getn. I did not have a
> detailed plan at that time, so let's just start from scratch.

Unfortunately, I still think that we need to associate custom context in port_associate, not a PollDesc*. The reason is that we need access to the previous poll state, and if we put it in PollDesc and access it without locks, it's racy. I think we can get away without introducing a new type that would need per-fd malloc/free. We can just encode the previous poll state as the user context in port_associate, it's just a word.

Then, to get from fd to PollDesc we use an array. If we put locks around add/delete/associate we can safely access that array. We could get late to the party and an fd could disappear, but then it's safe to ignore the EBADFD error because the lock prevents another thread getting the same network fd (it could get the same fd through regular means, but then it's ok; at worst we get a spurious wakeup, just like with epoll).

-- Aram Hăvărneanu
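A sketch of the alternative floated here, with hypothetical names throughout; the CL as submitted kept PollDesc* as the port_associate context and used the per-PollDesc lock instead:

    // Hypothetical design: the port_associate user word carries only the
    // previous poll state, and fd -> PollDesc lookups go through a
    // lock-protected table, so a racing close cannot resurrect a dead
    // descriptor.
    static Lock     netpolltablock;         // guards add/delete/associate
    static PollDesc *netpolltab[1<<16];     // illustrative bound only

    static void
    netpollrearm(int32 fd, uint32 events)
    {
        runtime·lock(&netpolltablock);
        // A closed fd has been removed from the table, and the lock keeps
        // a concurrent re-open of the same fd number from slipping in, so
        // an EBADFD from port_associate would be safe to ignore here.
        if(netpolltab[fd] != nil)
            runtime·port_associate(portfd, PORT_SOURCE_FD, fd, events, (void*)(uintptr)events);
        runtime·unlock(&netpolltablock);
    }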
https://codereview.appspot.com/69190044/diff/40001/src/pkg/runtime/netpoll_so...
File src/pkg/runtime/netpoll_solaris.c (right):

https://codereview.appspot.com/69190044/diff/40001/src/pkg/runtime/netpoll_so...
src/pkg/runtime/netpoll_solaris.c:105: } while(runtime·cas(ep, old, events) != events);
cas returns a success/fail bool, not the old value. Even if it did return the old value, this comparison would be wrong.

https://codereview.appspot.com/69190044/diff/40001/src/pkg/runtime/netpoll_so...
src/pkg/runtime/netpoll_solaris.c:181: runtime·netpollupdate(pd, 0, ev->portev_events & (POLLIN|POLLOUT));
If we unblock a reader due to POLLERR, we do not reset the port association; this will get netpoll into a busy loop. I think you need to use the same conditions as above here.
Here is what I would do:

1. remove port_associate from netpollopen
2. mutate the events mask and call port_associate under the PollDesc lock
3. under the lock we can always check how exactly we need to mutate the OS association and whether the fd is closed or not
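Sketched out, the plan looks roughly like this (runtime·netpolllock appears in the later patch sets; runtime·netpollunlock is assumed as its counterpart):

    // (1) netpollopen no longer arms events; (2)+(3) arming happens under
    // the PollDesc lock, where the event mask and the closing flag can be
    // examined consistently.
    void
    runtime·netpollarm(PollDesc* pd, int32 mode)
    {
        runtime·netpolllock(pd);
        switch(mode) {
        case 'r':
            runtime·netpollupdate(pd, POLLIN, 0);  // checks pd->closing inside
            break;
        case 'w':
            runtime·netpollupdate(pd, POLLOUT, 0);
            break;
        }
        runtime·netpollunlock(pd);
    }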
On Tue, Mar 11, 2014 at 11:47 AM, <dvyukov@google.com> wrote:
> remove port_associate from netpollopen

Do you think associating in netpollarm is enough?

> mutate the events mask and call port_associate under the PollDesc
> Lock

But how can I safely lock on ev->portev_user in netpoll? The PollDesc might have already been freed, no?

-- Aram Hăvărneanu
On Tue, Mar 11, 2014 at 3:30 PM, Aram Hăvărneanu <aram@mgk.ro> wrote:
> On Tue, Mar 11, 2014 at 11:47 AM, <dvyukov@google.com> wrote:
>> remove port_associate from netpollopen
>
> Do you think associating in netpollarm is enough?

I do not see any need for the association while nobody is waiting for events.

>> mutate the events mask and call port_associate under the PollDesc
>> Lock
>
> But how can I safely lock ev->portev_use in netpoll, the PollDesc
> might have already been freed, no?

For exactly this reason I've designed PollDescs to be persistent and type stable. I.e. if you have a PollDesc* you can dereference it at any point in time; it can be a PollDesc for another fd, but it's still a PollDesc.
> I do not see any need for the association while nobody is waiting for events

We have to consume the socket-ready event; it doesn't work without this.
Hello golang-codereviews@googlegroups.com, bradfitz@golang.org, iant@golang.org, dvyukov@google.com (cc: golang-codereviews@googlegroups.com), Please take another look.
On Tue, Mar 11, 2014 at 5:33 PM, <aram@mgk.ro> wrote:
>> I do not see any need for the association while nobody is waiting for
>> events
>
> We have to consume the socket-ready event, it doesn't work without this.

What exactly is not working if we remove port_associate from netpollopen? And how? I don't know how solaris ports work, but for select/epoll/GetQueuedCompletionStatus it's OK to not arm the notification mechanism if you are not interested in events.
In #8 you can see the test timeout panic trace. -- Aram Hăvărneanu
On 2014/03/11 13:37:42, aram2 wrote:
> In #8 you can see the test timeout panic trace.

#8 was about re-arming after polling. I am asking about arming in netpollopen.
> #8 was about re-arming after polling > I am asking about arming in netpollopen Here it is with arming in netpollopen disabled: go test -v -timeout 20s -short net === RUN TestConnAndListener --- PASS: TestConnAndListener (0.00 seconds) === RUN TestDialTimeout --- SKIP: TestDialTimeout (0.00 seconds) dial_test.go:88: skipping test on "solaris"; untested. === RUN TestSelfConnect --- PASS: TestSelfConnect (0.62 seconds) === RUN TestDialError --- PASS: TestDialError (0.00 seconds) dial_test.go:213: test disabled; use -run_error_test to enable === RUN TestInvalidDialAndListenArgs --- PASS: TestInvalidDialAndListenArgs (0.00 seconds) === RUN TestDialTimeoutFDLeak --- SKIP: TestDialTimeoutFDLeak (0.00 seconds) dial_test.go:265: skipping test on "solaris" === RUN TestDialMultiFDLeak --- SKIP: TestDialMultiFDLeak (0.00 seconds) dial_test.go:367: skipping test; error finding or running lsof: exec: "lsof": executable file not found in $PATH === RUN TestDialFailPDLeak --- SKIP: TestDialFailPDLeak (0.00 seconds) dial_test.go:432: skipping test in short mode === RUN TestDialer panic: test timed out after 20s goroutine 34 [running]: runtime.panic(0x5c2ca0, 0xc208001fc0) /home/aram/go/src/pkg/runtime/panic.c:279 +0xf5 testing.func·008() /home/aram/go/src/pkg/testing/testing.go:628 +0xf5 created by time.goFunc /home/aram/go/src/pkg/time/sleep.go:121 +0x47 goroutine 16 [chan receive]: testing.RunTests(0x713710, 0x7f1220, 0x7a, 0x7a, 0x1) /home/aram/go/src/pkg/testing/testing.go:504 +0x906 testing.Main(0x713710, 0x7f1220, 0x7a, 0x7a, 0x7e76a0, ...) /home/aram/go/src/pkg/testing/testing.go:435 +0x9e main.main() /tmp/go-build365758353/net/_test/_testmain.go:323 +0x9c goroutine 32 [IO wait]: net.runtime_pollWait(0xfffffd7fff160790, 0x77, 0x0) /home/aram/go/src/pkg/runtime/netpoll.goc:146 +0x76 net.(*pollDesc).Wait(0xc20807b4f0, 0x77, 0x0, 0x0) /home/aram/go/src/pkg/net/fd_poll_runtime.go:84 +0x46 net.(*pollDesc).WaitWrite(0xc20807b4f0, 0x0, 0x0) /home/aram/go/src/pkg/net/fd_poll_runtime.go:93 +0x42 net.(*netFD).connect(0xc20807b490, 0xfffffd7fff15f848, 0xc208041c40, 0xfffffd7fff15f848, 0xc208041c60, ...) /home/aram/go/src/pkg/net/fd_unix.go:96 +0x1ec net.(*netFD).dial(0xc20807b490, 0xfffffd7fff15f7f8, 0xc208029560, 0xfffffd7fff15f7f8, 0xc208029590, ...) /home/aram/go/src/pkg/net/sock_posix.go:121 +0x333 net.socket(0x649d90, 0x4, 0x2, 0x2, 0x0, ...) /home/aram/go/src/pkg/net/sock_posix.go:91 +0x43a net.internetSocket(0x649d90, 0x4, 0xfffffd7fff15f7f8, 0xc208029560, 0xfffffd7fff15f7f8, ...) /home/aram/go/src/pkg/net/ipsock_posix.go:136 +0x179 net.dialTCP(0x649d90, 0x4, 0xc208029560, 0xc208029590, 0x0, ...) /home/aram/go/src/pkg/net/tcpsock_posix.go:155 +0x11c net.dialSingle(0x649d90, 0x4, 0xc208001f80, 0xf, 0xfffffd7fff15f790, ...) /home/aram/go/src/pkg/net/dial.go:238 +0x426 net.func·019(0x0, 0x0, 0x0, 0x0, 0x0, ...) /home/aram/go/src/pkg/net/dial.go:164 +0x120 net.dial(0x649d90, 0x4, 0xfffffd7fff15f790, 0xc208029590, 0xfffffd7ffeb20e28, ...) /home/aram/go/src/pkg/net/fd_unix.go:40 +0x75 net.(*Dialer).Dial(0xc208079c80, 0x649d90, 0x4, 0xc208001f80, 0xf, ...) /home/aram/go/src/pkg/net/dial.go:171 +0x42b net.TestDialer(0xc208054630) /home/aram/go/src/pkg/net/dial_test.go:504 +0x3cf testing.tRunner(0xc208054630, 0x7f12e0) /home/aram/go/src/pkg/testing/testing.go:422 +0x9d created by testing.RunTests /home/aram/go/src/pkg/testing/testing.go:503 +0x8d6 exit status 2 FAIL net 20.037s : oos:runtime; -- Aram Hăvărneanu
But why does it hang? Maybe there is some other latent issue.
R=golang-dev (assigned by bradfitz@golang.org)
On Tue, Mar 11, 2014 at 6:39 PM, <dvyukov@google.com> wrote: > But why does it hang? Because runtime·netpoll is blocked in port_getn, and calling port_associate during this time will not unblock it, it needs to be associated a priori. -- Aram Hăvărneanu
On 2014/03/11 18:02:08, aram wrote:
> On Tue, Mar 11, 2014 at 6:39 PM, <dvyukov@google.com> wrote:
>> But why does it hang?
>
> Because runtime·netpoll is blocked in port_getn, and calling
> port_associate during this time will not unblock it, it needs to be
> associated a priori.

I still do not understand. What if we call port_associate in netpollopen, but there is already a thread blocked on port_getn? port_getn still hangs, right?

If we don't forcibly unblock outstanding port_getn when we start waiting on a new file descriptor (which we don't do now), then we can only ensure that we call port_associate before parking the waiting goroutine (but not before port_getn). And this works fine for epoll/kqueue; if it does not work for solaris ports, then you need to forcibly unblock outstanding port_getn's when you start waiting on any new file descriptor.

What am I missing?
On Tue, Mar 11, 2014 at 7:12 PM, <dvyukov@google.com> wrote:
> What if we call port_associate in netpollopen, but there is already
> a thread blocked on port_getn, port_getn still hangs, right?

Right.

> If we don't forcibly unblock outstanding port_getn when we start
> waiting of a new file descriptor (which we don't do now), then
> we can only ensure that we call port_associate before parking the
> waiting goroutine (but not before port_getn). And this works fine
> for epoll/kqueue

I don't understand this. Do you mean that we can't guarantee that we will unblock the poller right away? I guess that's true (but see below).

> if it does not work for solaris ports, then you need to forcibly
> unblock outstanding port_getn's when you start waiting on any
> new file descriptor.

Maybe; it could be that the code works just by chance, but see below.

> What am I missing?

A successful connect will always unblock port_getn. We register with the poller when we dial (by calling netpollopen); when we do any kind of I/O we won't block indefinitely because we already have this one event to consume. We re-arm *before* the I/O thread has a chance to do I/O, which means that after it has done its I/O, it will always be registered.

Considering the above, do you still think we need to forcibly unblock?

-- Aram Hăvărneanu
On 2014/03/11 18:23:42, aram wrote:
> On Tue, Mar 11, 2014 at 7:12 PM, <dvyukov@google.com> wrote:
>> What if we call port_associate in netpollopen, but there is already
>> a thread blocked on port_getn, port_getn still hangs, right?
>
> Right.
>
>> If we don't forcibly unblock outstanding port_getn when we start
>> waiting of a new file descriptor (which we don't do now), then
>> we can only ensure that we call port_associate before parking the
>> waiting goroutine (but not before port_getn). And this works fine
>> for epoll/kqueue
>
> I don't understand this.

You've said: "Because runtime·netpoll is blocked in port_getn, and calling port_associate during this time will not unblock it, it needs to be associated a priori."

I do not see how calling port_associate in netpollopen helps to overcome this. netpollopen can happen while runtime·netpoll is already blocked in port_getn.

> Do you mean that we can't guarantee
> that we will unblock the poller right away? I guess that's true
> (but see below).
>
>> if it does not work for solaris ports, then you need to forcibly
>> unblock outstanding port_getn's when you start waiting on any
>> new file descriptor.
>
> Maybe, it could be that the code works just by chance, but see below.
>
>> What am I missing?
>
> A successful connect will always unblock port_getn. We register
> with the pollwe when we dial (by calling netpollopen), when we
> do any kind of I/O we won't block indefinitely because we already
> have this one event to consume. We re-arm *before* the I/O thread
> has a chance to do I/O, which means that after it will have done
> its I/O, it will always be registered.
>
> Considering the above, do you still think we need to forcibly
> unblock?

Sorry, I still do not understand. Probably I should not be reviewing this change.

In runtime·netpoll we don't rearm, we *clear* the association:

runtime·netpollupdate(pd, 0, /*clear=*/ ev->portev_events & (POLLIN|POLLOUT));
Could we review the locking bits? If there are any doubts about the design of the network poller I can create an issue, but can we please review the bug at hand and the proposed fix? Thank you very much.
> You've said: "Because runtime·netpoll is blocked in port_getn, and > calling port_associate during this time will not unblock it, it needs > to be associated a priori." > > I do not see how calling port_associate in netpollopen helps to overcome > this. netpollopen can happen while runtime·netpoll is already blocked > in port_getn. What I wanted to say is that the mere act of calling port_associate will not unblock port_getn (even if the descriptor is ready for I/O). If port_getn is blocked, and we call port_associate, and *then* the descriptor gets the I/O ready notification from asynchronous connect/accept completion, port_getn will unblock. If asynchronous connect completes *before* calling port_associate, port_getn will not unblock. > In runtime·netpoll we don't rearm, we *clear* the association: > runtime·netpollupdate(pd, 0, /*clear=*/ ev->portev_events & > (POLLIN|POLLOUT)); We clear an association only with port_dissociate, after return from port_getn a descriptor is still associated, it's just not registered for that event. Because it's still associated, when we re-register for the type of event we care about, we get what we want. -- Aram Hăvărneanu
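For readers who have not used event ports, a self-contained sketch of the two mechanisms being distinguished here, in plain Solaris C with error handling elided; fd 0 is used purely for illustration:

    #include <port.h>
    #include <poll.h>
    #include <stdio.h>

    int
    main(void)
    {
        int port, fd = 0;
        port_event_t ev;
        uint_t n = 1;

        port = port_create();

        // Register interest: the last argument is an opaque cookie that
        // comes back in ev.portev_user (the Go runtime stores a PollDesc*).
        port_associate(port, PORT_SOURCE_FD, fd, POLLIN, NULL);

        // Blocks until fd is readable. Delivery consumes the event
        // registration, which is why the poller has to re-arm after every
        // wakeup, exactly as discussed in this thread.
        port_getn(port, &ev, 1, &n, NULL);
        printf("events=%#x user=%p\n", (unsigned)ev.portev_events, ev.portev_user);

        // Re-arm to keep receiving readiness notifications.
        port_associate(port, PORT_SOURCE_FD, fd, POLLIN, NULL);
        return 0;
    }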
OK, now I see how it fixes the closing race

https://codereview.appspot.com/69190044/diff/70001/src/pkg/runtime/netpoll_so...
File src/pkg/runtime/netpoll_solaris.c (right):

https://codereview.appspot.com/69190044/diff/70001/src/pkg/runtime/netpoll_so...
src/pkg/runtime/netpoll_solaris.c:109: } while(!runtime·cas(ep, old, events));
Remove this cas-loop: now that it's under the mutex, you can just call port_associate and update the netpolluser word.

https://codereview.appspot.com/69190044/diff/70001/src/pkg/runtime/netpoll_so...
src/pkg/runtime/netpoll_solaris.c:179: if((ev->portev_events & (POLLERR)) != POLLERR)
On both linux and windows we ignore some errors on some sockets (e.g. ECONNABORTED from accept). I don't know whether it can happen on solaris. If it can, then it is safer to always re-arm regardless of errors (there must be no notifications of badly broken sockets, so a single re-arm should not cause performance issues).
On 2014/03/11 19:22:21, aram wrote: > Could we review the locking bits? If there are any doubts about the design of > the network poller I can create an issue, but can we please review the bug at > hand and the proposed fix? > > Thank you very much. Done
On 2014/03/11 19:22:35, aram wrote:
>> You've said: "Because runtime·netpoll is blocked in port_getn, and
>> calling port_associate during this time will not unblock it, it needs
>> to be associated a priori."
>>
>> I do not see how calling port_associate in netpollopen helps to overcome
>> this. netpollopen can happen while runtime·netpoll is already blocked
>> in port_getn.
>
> What I wanted to say is that the mere act of calling port_associate
> will not unblock port_getn (even if the descriptor is ready
> for I/O). If port_getn is blocked, and we call port_associate,
> and *then* the descriptor gets the I/O ready notification from
> asynchronous connect/accept completion, port_getn will unblock. If
> asynchronous connect completes *before* calling port_associate,
> port_getn will not unblock.
>
>> In runtime·netpoll we don't rearm, we *clear* the association:
>> runtime·netpollupdate(pd, 0, /*clear=*/ ev->portev_events &
>> (POLLIN|POLLOUT));
>
> We clear an association only with port_dissociate, after return
> from port_getn a descriptor is still associated, it's just not
> registered for that event. Because it's still associated, when
> we re-register for the type of event we care about, we get what
> we want.

Ah, OK, now it all makes sense. So there are 2 separate things: (1) association with the port and (2) subscription to events. And they differently affect behavior of port_getn. Right?

Please add a comment at the top of netpoll_solaris.c explaining this (in a separate CL if you prefer). This IO readiness mechanism seems to be quite different from all of select, epoll, kqueue and IOCP. It would be nice if there were an explanation of why we call port_associate in netpollopen, why we need to re-arm in netpollwait and why we need to re-arm in netpoll.

One last question: do we need to arm POLLIN|POLLOUT in *netpollopen*? From your description it seems that we need a mere port association, but we don't yet need subscription to any events. Later, when we actually wait on the fd, we will call port_associate with the interesting events and receive the pending notifications. Or will port_associate with an empty event set fail?

Thanks for bearing with me.
> Ah, OK, now it all makes sense. So there are 2 separate things: (1) > association with the port and (2) subscription to events. And they > differently affect behavior of port_getn. Right? Yes. > Please add a comment at the top of netpoll_solaris.c explaining this (in a > separate CL if you prefer). This IO readiness mechanism seems to be quite > different from all of select, epoll, kqueue and IOCP. It would be nice if > there will be an explanation of why we call port_associate in netpollopen, > why we need to re-arm in netpollwait and why we need to re-arm in netpoll. I added a huge comment and smaller comments throughout. > One last question: do we need to arm POLLIN|POLLOUT in *netpollopen*? From > your description it seems that we need a mere port association, but we > don't yet need subscription to any events. Later, when we actually wait > on the fd, we will call port_accosiate with the interesting events and > receive the pending notifications. Or port_accosiate with an empty event > set will fail? Empty port_associate is fine, I have updated code to make empty association in netpollopen, maybe it's clearer this way. > Thanks for bearing with me. Thank you. https://codereview.appspot.com/69190044/diff/70001/src/pkg/runtime/netpoll_so... File src/pkg/runtime/netpoll_solaris.c (right): https://codereview.appspot.com/69190044/diff/70001/src/pkg/runtime/netpoll_so... src/pkg/runtime/netpoll_solaris.c:109: } while(!runtime·cas(ep, old, events)); On 2014/03/12 05:58:43, dvyukov wrote: > remove this cas-loop > now that it's under the mutex, you can just call port_associate and update the > netpolluser word Done. https://codereview.appspot.com/69190044/diff/70001/src/pkg/runtime/netpoll_so... src/pkg/runtime/netpoll_solaris.c:179: if((ev->portev_events & (POLLERR)) != POLLERR) On 2014/03/12 05:58:43, dvyukov wrote: > On both linux and windows we ignore some errors on some sockets (e.g. > ECONNABORTED from accept). I don't know whether it can happen on solaris. If it > can, then it is safer to always re-arm regardless of errors (there must be no > notifications of badly broken sockets, so a single re-arm should not cause > performance issues). Done.
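The empty association described above, as a sketch; portfd stands in for the runtime's global port descriptor, and the real patch routes this through netpollupdate under the PollDesc lock:

    // netpollopen: associate the fd with the port but subscribe to no
    // events yet; netpollarm adds POLLIN/POLLOUT once a goroutine
    // actually waits. The stored event word starts out empty.
    int32
    runtime·netpollopen(uintptr fd, PollDesc *pd)
    {
        *(uint32*)runtime·netpolluser(pd) = 0;
        if(runtime·port_associate(portfd, PORT_SOURCE_FD, fd, 0, (void*)pd) != 0)
            return -1;  // the real code would surface the errno
        return 0;
    }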
Hello golang-codereviews@googlegroups.com, bradfitz@golang.org, iant@golang.org, dvyukov@google.com, aram.h@mgk.ro, gobot@golang.org (cc: golang-codereviews@googlegroups.com), Please take another look.
On 2014/03/13 12:46:29, aram wrote: > > Ah, OK, now it all makes sense. So there are 2 separate things: (1) > > association with the port and (2) subscription to events. And they > > differently affect behavior of port_getn. Right? > > Yes. > > > Please add a comment at the top of netpoll_solaris.c explaining this (in a > > separate CL if you prefer). This IO readiness mechanism seems to be quite > > different from all of select, epoll, kqueue and IOCP. It would be nice if > > there will be an explanation of why we call port_associate in netpollopen, > > why we need to re-arm in netpollwait and why we need to re-arm in netpoll. > > I added a huge comment and smaller comments throughout. Thank you very much. I now feel that I understand what happens here. > > One last question: do we need to arm POLLIN|POLLOUT in *netpollopen*? From > > your description it seems that we need a mere port association, but we > > don't yet need subscription to any events. Later, when we actually wait > > on the fd, we will call port_accosiate with the interesting events and > > receive the pending notifications. Or port_accosiate with an empty event > > set will fail? > > Empty port_associate is fine, I have updated code to make empty > association in netpollopen, maybe it's clearer this way. > > > Thanks for bearing with me. > > Thank you. > > https://codereview.appspot.com/69190044/diff/70001/src/pkg/runtime/netpoll_so... > File src/pkg/runtime/netpoll_solaris.c (right): > > https://codereview.appspot.com/69190044/diff/70001/src/pkg/runtime/netpoll_so... > src/pkg/runtime/netpoll_solaris.c:109: } while(!runtime·cas(ep, old, events)); > On 2014/03/12 05:58:43, dvyukov wrote: > > remove this cas-loop > > now that it's under the mutex, you can just call port_associate and update the > > netpolluser word > > Done. > > https://codereview.appspot.com/69190044/diff/70001/src/pkg/runtime/netpoll_so... > src/pkg/runtime/netpoll_solaris.c:179: if((ev->portev_events & (POLLERR)) != > POLLERR) > On 2014/03/12 05:58:43, dvyukov wrote: > > On both linux and windows we ignore some errors on some sockets (e.g. > > ECONNABORTED from accept). I don't know whether it can happen on solaris. If > it > > can, then it is safer to always re-arm regardless of errors (there must be no > > notifications of badly broken sockets, so a single re-arm should not cause > > performance issues). > > Done.
https://codereview.appspot.com/69190044/diff/110001/src/pkg/runtime/netpoll_s...
File src/pkg/runtime/netpoll_solaris.c (right):

https://codereview.appspot.com/69190044/diff/110001/src/pkg/runtime/netpoll_s...
src/pkg/runtime/netpoll_solaris.c:156: // Updates the the association with a new set of interested events. After
s/the the/the/

https://codereview.appspot.com/69190044/diff/110001/src/pkg/runtime/netpoll_s...
src/pkg/runtime/netpoll_solaris.c:238: runtime·netpolllock(pd);
please move this just before runtime·netpollupdate, because that's what it protects

https://codereview.appspot.com/69190044/diff/110001/src/pkg/runtime/netpoll_s...
src/pkg/runtime/netpoll_solaris.c:252: runtime·netpollupdate(pd, 0, ev->portev_events & (POLLIN|POLLOUT));
I think "ev->portev_events & (POLLIN|POLLOUT)" is still incorrect. You unblock readers when "ev->portev_events & (POLLIN|POLLHUP|POLLERR)", and that's exactly when you need to clear the POLLIN association.

I think you need something along the lines of:

mode = 0;
clear = 0;
if(ev->portev_events & (POLLIN|POLLHUP|POLLERR)) {
	mode += 'r';
	clear |= POLLIN;
}
if(ev->portev_events & (POLLOUT|POLLHUP|POLLERR)) {
	mode += 'w';
	clear |= POLLOUT;
}
if(clear != 0)
	runtime·netpollupdate(pd, 0, clear);

https://codereview.appspot.com/69190044/diff/110001/src/pkg/runtime/netpoll_s...
src/pkg/runtime/netpoll_solaris.c:253:
remove the tabs
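For context, a sketch of how that suggestion slots into the netpoll loop, combined with the locking discussed earlier; netpollunlock is an assumed name, and netpollready is the existing helper declared in the runtime.h hunk in the original patch:

    // Translate the port events into a wakeup mode, clear exactly the
    // bits that fired, then queue any goroutines blocked on the fd.
    uint32 clear;
    int32 mode;
    G *gp;

    gp = nil;
    mode = 0;
    clear = 0;
    if(ev->portev_events & (POLLIN|POLLHUP|POLLERR)) {
        mode += 'r';
        clear |= POLLIN;
    }
    if(ev->portev_events & (POLLOUT|POLLHUP|POLLERR)) {
        mode += 'w';
        clear |= POLLOUT;
    }
    if(clear != 0) {
        runtime·netpolllock(pd);
        runtime·netpollupdate(pd, 0, clear);  // drop fired bits from the stored mask
        runtime·netpollunlock(pd);
    }
    if(mode)
        runtime·netpollready(&gp, pd, mode);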
Ian, do you want to take another look? I am going to l-g-t-m it after the minor fixes.
https://codereview.appspot.com/69190044/diff/110001/src/pkg/runtime/netpoll_s... File src/pkg/runtime/netpoll_solaris.c (right): https://codereview.appspot.com/69190044/diff/110001/src/pkg/runtime/netpoll_s... src/pkg/runtime/netpoll_solaris.c:156: // Updates the the association with a new set of interested events. After On 2014/03/14 06:55:44, dvyukov wrote: > s/the the/the/ Done. https://codereview.appspot.com/69190044/diff/110001/src/pkg/runtime/netpoll_s... src/pkg/runtime/netpoll_solaris.c:238: runtime·netpolllock(pd); On 2014/03/14 06:55:44, dvyukov wrote: > please move this just before runtime·netpollupdate, because that's what it > protects Done. https://codereview.appspot.com/69190044/diff/110001/src/pkg/runtime/netpoll_s... src/pkg/runtime/netpoll_solaris.c:252: runtime·netpollupdate(pd, 0, ev->portev_events & (POLLIN|POLLOUT)); On 2014/03/14 06:55:44, dvyukov wrote: > I think "ev->portev_events & (POLLIN|POLLOUT)" is still incorrect. > You unblock readers when "ev->portev_events & (POLLIN|POLLHUP|POLLERR)", and > that's exactly when you need to clear the POLLIN association. > > I think you need something along the lines of: > > mode = 0; > clear = 0; > if(ev->portev_events & (POLLIN|POLLHUP|POLLERR)) { > mode += 'r'; > clear |= POLLIN; > } > if(ev->portev_events & (POLLOUT|POLLHUP|POLLERR)) { > mode += 'w'; > clear |= POLLOUT; > } > if(clear != 0) > runtime·netpollupdate(pd, 0, clear); Done. https://codereview.appspot.com/69190044/diff/110001/src/pkg/runtime/netpoll_s... src/pkg/runtime/netpoll_solaris.c:253: On 2014/03/14 06:55:44, dvyukov wrote: > remove the tabs Done.
Hello golang-codereviews@googlegroups.com, bradfitz@golang.org, iant@golang.org, dvyukov@google.com, aram.h@mgk.ro, gobot@golang.org (cc: golang-codereviews@googlegroups.com), Please take another look.
LGTM

let's wait for min(Ian reply, 1 day)
LGTM
*** Submitted as https://code.google.com/p/go/source/detail?r=b6b69b97b4b7 ***

runtime: fix use after close race in Solaris network poller

The Solaris network poller uses event ports, which are level-triggered. As such, it has to re-arm itself after each wakeup. The arming mechanism (which runs in its own thread) raced with the closing of a file descriptor happening in a different thread. When a network file descriptor is about to be closed, the network poller is awakened to give it a chance to remove its association with the file descriptor. Because the poller always re-armed itself, it raced with code that closed the descriptor.

This change makes the network poller check before re-arming if the file descriptor is about to be closed, in which case it will ignore the re-arming request. It uses the per-PollDesc lock in order to serialize access to the PollDesc.

This change also adds extensive documentation describing the Solaris implementation of the network poller.

Fixes issue 7410.

LGTM=dvyukov, iant
R=golang-codereviews, bradfitz, iant, dvyukov, aram.h, gobot
CC=golang-codereviews
https://codereview.appspot.com/69190044

Committer: Dmitriy Vyukov <dvyukov@google.com>