Description
net: implement netpoll for windows
Moves the network poller from net package into runtime.
benchmark old ns/op new ns/op delta
BenchmarkTCP4OneShot 316386 287061 -9.27%
BenchmarkTCP4OneShot-2 339822 313424 -7.77%
BenchmarkTCP4OneShot-3 330057 306589 -7.11%
BenchmarkTCP4OneShotTimeout 341775 287061 -16.01%
BenchmarkTCP4OneShotTimeout-2 380835 295849 -22.32%
BenchmarkTCP4OneShotTimeout-3 398412 328070 -17.66%
BenchmarkTCP4Persistent 40622 33392 -17.80%
BenchmarkTCP4Persistent-2 44528 35736 -19.74%
BenchmarkTCP4Persistent-3 44919 36907 -17.84%
BenchmarkTCP4PersistentTimeout 45309 33588 -25.87%
BenchmarkTCP4PersistentTimeout-2 50289 38079 -24.28%
BenchmarkTCP4PersistentTimeout-3 51559 37103 -28.04%
BenchmarkTCP6OneShot 361305 345645 -4.33%
BenchmarkTCP6OneShot-2 361305 331976 -8.12%
BenchmarkTCP6OneShot-3 376929 347598 -7.78%
BenchmarkTCP6OneShotTimeout 361305 322212 -10.82%
BenchmarkTCP6OneShotTimeout-2 378882 333928 -11.86%
BenchmarkTCP6OneShotTimeout-3 388647 335881 -13.58%
BenchmarkTCP6Persistent 47653 35345 -25.83%
BenchmarkTCP6Persistent-2 49215 35736 -27.39%
BenchmarkTCP6Persistent-3 38474 37493 -2.55%
BenchmarkTCP6PersistentTimeout 56637 34369 -39.32%
BenchmarkTCP6PersistentTimeout-2 42575 38079 -10.56%
BenchmarkTCP6PersistentTimeout-3 44137 37689 -14.61%
Patch Set 1
Patch Set 2: diff -r 0c9561ddf631 https://go.googlecode.com/hg/
Total comments: 17
Patch Set 3: diff -r fab6ba2a2d10 https://go.googlecode.com/hg/
Patch Set 4: diff -r 2d4825868d95 https://go.googlecode.com/hg/
Patch Set 5: diff -r f809e1e845c4 https://go.googlecode.com/hg/
Total comments: 14
Patch Set 6: diff -r 657760a60531 https://go.googlecode.com/hg/
Total comments: 2
Patch Set 7: diff -r 5b2b5f8d89b0 https://go.googlecode.com/hg/
Patch Set 8: diff -r 8de6d4c58935 https://go.googlecode.com/hg/
Total comments: 21
Patch Set 9: diff -r d3f99f091748 https://go.googlecode.com/hg/
Total comments: 4
Patch Set 10: diff -r afb7b199f074 https://go.googlecode.com/hg/
Patch Set 11: diff -r fccd815ed3bb https://go.googlecode.com/hg/
Patch Set 12: diff -r 2bbca155a87f https://go.googlecode.com/hg/
Patch Set 13: diff -r 5ee81a14cdfe https://go.googlecode.com/hg/
Messages
Total messages: 82
LGTM (but this is not for submission, right?) We can also experiment with GetQueuedCompletionStatusEx() and removing memory allocations from Read/Write later.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_poll_runtime.go
File src/pkg/net/fd_poll_runtime.go (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_poll_runtime....
src/pkg/net/fd_poll_runtime.go:107: println("unreachable: ", res)
I am not sure what the policy is about adding new println() calls.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_poll_unix.go
File src/pkg/net/fd_poll_unix.go (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_poll_unix.go#...
src/pkg/net/fd_poll_unix.go:306: func init() {
I thought there were some initialization dependency issues, and that's why sysInit() was introduced. Is sysInit() not needed now?

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_epoll.c
File src/pkg/runtime/netpoll_epoll.c (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_epol...
src/pkg/runtime/netpoll_epoll.c:34: runtime·netpollopen(uintptr fd, PollDesc *pd)
I think this should be the first patch - change int32 to uintptr for fd.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
File src/pkg/runtime/netpoll_windows.c (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
src/pkg/runtime/netpoll_windows.c:50: if(runtime·stdcall(runtime·CreateIoCompletionPort, 4, fd, cpiohandle, (uintptr)0, (uintptr)0) == 0) {
drop {}

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
src/pkg/runtime/netpoll_windows.c:83: if(runtime·stdcall(runtime·GetQueuedCompletionStatus, 5, cpiohandle, &qty, &key, &o, (uintptr)wait) == 0) {
I think you need a loop here to fetch all pending notifications (or at least a batch). The scheduler will behave better if you give it a batch of newly runnable goroutines.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
src/pkg/runtime/netpoll_windows.c:83: if(runtime·stdcall(runtime·GetQueuedCompletionStatus, 5, cpiohandle, &qty, &key, &o, (uintptr)wait) == 0) {
I remember you've said that it's difficult to call GetQueuedCompletionStatusEx() here because it's not available on all systems. But at least try to benchmark with it; it will be useful to understand how much we lose here.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
src/pkg/runtime/netpoll_windows.c:86: if(!block) {
drop {}
https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_windows.go
File src/pkg/net/fd_windows.go (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_windows.go#ne...
src/pkg/net/fd_windows.go:216: o.fd.pd.WaitCanceled(int(o.mode))
Wait - you need the result from WaitCanceled. At this point the operation can still succeed. Even if a timeout or close happened, GetQueuedCompletionStatus() can return success, and we need to communicate it to the user.
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/
Thank you for the review. Alex

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_poll_runtime.go
File src/pkg/net/fd_poll_runtime.go (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_poll_runtime....
src/pkg/net/fd_poll_runtime.go:107: println("unreachable: ", res)
I don't know either. I will wait to be corrected.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_poll_unix.go
File src/pkg/net/fd_poll_unix.go (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_poll_unix.go#...
src/pkg/net/fd_poll_unix.go:306: func init() {
sysInit is still needed for windows, but your code here should not depend on it. I also don't want to have 2 copies of sysInit (one in fd_poll_unix.go and another in fd_poll_runtime.go), so I am moving it to fd_unix.go. I don't know when this duplicate code was introduced, but it is not needed. This change should be in a separate CL, and I will send it separately.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_windows.go
File src/pkg/net/fd_windows.go (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_windows.go#ne...
src/pkg/net/fd_windows.go:216: o.fd.pd.WaitCanceled(int(o.mode))
I don't care if it succeeds or not, because our previous wait returned errClosing or errTimeout (see the code above). I have already made up my mind about what I will return to the user - errClosing or errTimeout. I am just waiting here for the IO to complete so I can release the IO buffers. I don't care if the completion is a success or a failure. Perhaps the function name is no good - I am open to suggestions.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_epoll.c
File src/pkg/runtime/netpoll_epoll.c (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_epol...
src/pkg/runtime/netpoll_epoll.c:34: runtime·netpollopen(uintptr fd, PollDesc *pd)
Sounds good: https://codereview.appspot.com/9569043/

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
File src/pkg/runtime/netpoll_windows.c (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
src/pkg/runtime/netpoll_windows.c:50: if(runtime·stdcall(runtime·CreateIoCompletionPort, 4, fd, cpiohandle, (uintptr)0, (uintptr)0) == 0) {
On 2013/05/19 15:15:48, dvyukov wrote:
> drop {}
Done.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
src/pkg/runtime/netpoll_windows.c:83: if(runtime·stdcall(runtime·GetQueuedCompletionStatus, 5, cpiohandle, &qty, &key, &o, (uintptr)wait) == 0) {
Noted.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
src/pkg/runtime/netpoll_windows.c:83: if(runtime·stdcall(runtime·GetQueuedCompletionStatus, 5, cpiohandle, &qty, &key, &o, (uintptr)wait) == 0) {
Noted. I will try to implement GetQueuedCompletionStatusEx when it is available, but at a later stage.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
src/pkg/runtime/netpoll_windows.c:86: if(!block) {
On 2013/05/19 15:15:48, dvyukov wrote:
> drop {}
Done.
https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_windows.go
File src/pkg/net/fd_windows.go (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_windows.go#ne...
src/pkg/net/fd_windows.go:216: o.fd.pd.WaitCanceled(int(o.mode))
On 2013/05/20 02:12:45, brainman wrote:
> I don't care if it succeeds or not, because our previous wait returned
> errClosing or errTimeout (see code above). I already made my mind on what I will
> return to the user - errClosing or errTimeout. I am just waiting here for IO to
> complete so I can release IO buffers. I don't care if completion is success or
> failure. Perhaps, the function name is no good - I am opened to suggestions.
When you are writing/reading to/from the network you want to know whether the data was actually sent/received. It is a bug to return errTimeout/errClosing when the data was actually sent/received. Even if a timeout has happened and you called CancelIOEx(), you still need to wait for the overlapped completion, and if it has succeeded, you need to return OK to the user.
On 2013/05/20 04:02:53, dvyukov wrote: > > ... you still need to wait for overlapped completion, and if it has > succeeded, you need to return OK to user. If you say so. I will see what I can do. Alex
On 2013/05/20 05:12:08, brainman wrote:
> On 2013/05/20 04:02:53, dvyukov wrote:
> >
> > ... you still need to wait for overlapped completion, and if it has
> > succeeded, you need to return OK to user.
>
> If you say so. I will see what I can do.
I am pretty sure it must work that way (and it works that way now). A simpler option would be to call CancelIOEx() directly in the timer routine or in the evict routine; then the waiting path is not changed at all. But it would be difficult to do w/o CancelIOEx(). So I think you need to make runtime_pollWaitCanceled() return the overlapped result (but not look at the current fd status -- closed or timedout).
On 2013/05/20 06:50:05, dvyukov wrote:
>
> I am pretty sure it must work that way (and it works that way now).

I am pretty sure you are right on both counts. But as far as I remember, I couldn't make it work - some tests were failing. I will revisit this again and report my findings.

> A simpler option would be to call CancelIOEx() directly in timer routine or in
> evict routine, then waiting path is not changed at all. ...

Let's say I decide to choose that option for systems that have CancelIOEx present (most windows systems have it these days). How do I go about implementing it in the runtime?

Alex
On 2013/05/20 07:45:08, brainman wrote: > On 2013/05/20 06:50:05, dvyukov wrote: > > > > I am pretty sure it must work that way (and it works that way now). > > I am pretty sure you are right on both counts. But as far as I remember, I > couldn't make it work - some tests were failing. I will revisit this again and > report my findings. > > > A simpler option would be to call CancelIOEx() directly in timer routine or in > > evict routine, then waiting path is not changed at all. ... > > Lets say, I decide to chose that option for systems that have CancelIOEx present > (most windows systems have it this days). How do I go about implementing it in > runtime? If you remember read and write anOp or OVERLAPPED pointer in PollDesc, you can issue CancelIOEx() directly in deadlineimpl() and in runtime_pollUnblock(). They both lock the PollDesc's lock, so they should be able to determine outstanding OVERLAPPEDs and issue CancelIOEx() for them.
On 2013/05/20 10:09:55, dvyukov wrote:
> [...]
> If you remember read and write anOp or OVERLAPPED pointer in PollDesc, you can
> issue CancelIOEx() directly in deadlineimpl() and in runtime_pollUnblock(). They
> both lock the PollDesc's lock, so they should be able to determine outstanding
> OVERLAPPEDs and issue CancelIOEx() for them.

But how do you want to handle old OSes with only CancelIO()? I think you cannot do a potentially blocking send in deadlineimpl() (in the timer goroutine).
On 2013/05/20 10:11:32, dvyukov wrote:
>
> But how do you want to handle old OSes with only CancelIO()?
> I think you can not do potentially blocking send in deadlineimpl() (in timer
> goroutine).

I do not think I can without CancelIOEx. We will have to have 2 different code paths, like we do now. It is just a question of how different.

Alex
On 2013/05/20 04:02:53, dvyukov wrote:
>
> ... Even if timeout has happened and you called
> CancelIOEx(), you still need to wait for overlapped completion, and if it has
> succeeded, you need to return OK to user.

I updated my code. But now this fails:

G:\src\pkg\net>net.test.exe -test.run=lines4
--- FAIL: TestVariousDeadlines4Proc (0.03 seconds)
	timeout_test.go:483: 1ns run 1/3
	timeout_test.go:503: for 1ns run 1/3, good client timeout after 15.6251ms, reading 696320 bytes
	timeout_test.go:513: for 1ns run 1/3: server in 15.6251ms wrote 753664, WSASend tcp 127.0.0.1:4240: An existing connection was forcibly closed by the remote host.
	timeout_test.go:483: 1ns run 2/3
	timeout_test.go:505: for 1ns run 2/3: client Copy = 3694592, <nil> (want timeout)
FAIL

As far as I can tell this test assumes that io.Copy(ioutil.Discard, c) will return err != nil. But, with this change, we get an IO completion (after the IO is cancelled) with err == 0 and n == 0. This is translated into io.EOF. I cannot decide which part is wrong. Can you?

Alex
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
On Tue, May 21, 2013 at 8:24 AM, <alex.brainman@gmail.com> wrote: > On 2013/05/20 04:02:53, dvyukov wrote: > >> ... Even if timeout has happened and you called >> CancelIOEx(), you still need to wait for overlapped completion, and if > > it has >> >> succeeded, you need to return OK to user. > > > I updated my code. But now this fails: > > G:\src\pkg\net>net.test.exe -test.run=lines4 > --- FAIL: TestVariousDeadlines4Proc (0.03 seconds) > timeout_test.go:483: 1ns run 1/3 > timeout_test.go:503: for 1ns run 1/3, good client timeout after > 15.6251ms, reading 696320 bytes > timeout_test.go:513: for 1ns run 1/3: server in 15.6251ms wrote > 753664,WSASend tcp 127.0.0.1:4240: An existing connection was forcibly > closed by the remote host. > timeout_test.go:483: 1ns run 2/3 > timeout_test.go:505: for 1ns run 2/3: client Copy = 3694592, > <nil> (want timeout) > FAIL > > As far as I can tell this test assumes that io.Copy(ioutil.Discard, c) > will return err != nil. But, with this change, we get IO completion > (after IO is canelled) with err == 0 and n == 0. This is translated into > io.EOF. I cannot decide which part is wrong. Can you? Is there any chance you do 0-length read/write? It is a perfectly valid WSA operation, and I guess it results in err=nil,n=0, and it does not mean EOF.
On 2013/05/21 06:47:30, dvyukov wrote:
>
> Is there any chance you do 0-length read/write?

I inserted debug prints. I do not have any 0-length read/writes as far as I can see.

Alex
On Tue, May 21, 2013 at 11:02 AM, <alex.brainman@gmail.com> wrote:
> On 2013/05/21 06:47:30, dvyukov wrote:
>
>> Is there any chance you do 0-length read/write?
>
> I inserted debug prints. I do not have any 0-length read/writes as far
> as I can see.

And WSAGetOverlappedResult() returns TRUE with *lpcbTransfer==0? Does it happen for reads or writes? It's weird. Maybe some upper layer is messing up the values.
I have been busy lately, and it is more complicated than I expected. But I will continue with it when I have time. Just letting you know.

Alex
Dmitriy,

The way I see things is this:

- ready to read, so net issues netpoll.Prepare; this resets everything and returns no error, good so far;
- start the read by issuing WSARecv; it returns with ERROR_IO_PENDING, still good;
- the timeout fires, so the netpoll state is set to "report timed out when requested", still good;
- net calls netpoll.Wait, which returns immediately, since the state is "timed out", still good;
- net calls CancelIO to stop the submitted IO, still good;
- net calls netpoll.WaitCanceled to wait for the IO to complete or get canceled, but it returns immediately again, since the netpoll state is still "timed out" ... wrong.

So, I think, my current netpoll.WaitCanceled implementation is no good.

I had an idea to add netpoll.Prepare's "reset state" logic to it, but that will break if the IO completes (or gets canceled) before we call netpoll.WaitCanceled.

What do you think I should do? Thank you.

Alex
On 2013/06/03 00:18:26, brainman wrote:
> [...]
> So, I think, my current netpoll.WaitCanceled implementation is no good.
>
> I had an idea to add netpoll.Prepare's "reset state" logic to it, but that will
> break if IO completes (or get canceled) before we call netpoll.WaitCanceled.

Hummm... the first idea is to try the following: when a descriptor is closed we set pd.closing, and when the descriptor times out we set pd.rd/wd. These are checked in checkerr() before any IO wait. But in both cases we also set pd.rg/wg=READY, which is supposedly unnecessary. So what if you introduce a parameter to netpollunblock() which says "unblock currently waiting goroutines if any, but do not set READY", and use it in close/timeout?
Like that:

diff -r 2d4825868d95 src/pkg/runtime/netpoll.goc
--- a/src/pkg/runtime/netpoll.goc	Mon May 20 15:23:45 2013 +1000
+++ b/src/pkg/runtime/netpoll.goc	Mon Jun 03 14:54:29 2013 +1000
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.

-// +build darwin linux
+// +build darwin linux windows

 package net

@@ -48,7 +48,7 @@
 } pollcache;

 static void netpollblock(PollDesc*, int32);
-static G* netpollunblock(PollDesc*, int32);
+static G* netpollunblock(PollDesc*, int32, bool);
 static void deadline(int64, Eface);
 static void readDeadline(int64, Eface);
 static void writeDeadline(int64, Eface);
@@ -120,6 +120,12 @@
 	runtime·unlock(pd);
 }

+func runtime_pollWaitCanceled(pd *PollDesc, mode int) {
+	runtime·lock(pd);
+	netpollblock(pd, mode);
+	runtime·unlock(pd);
+}
+
 func runtime_pollSetDeadline(pd *PollDesc, d int64, mode int) {
 	runtime·lock(pd);
 	if(pd->closing)
@@ -179,8 +185,8 @@
 		runtime·throw("runtime_pollUnblock: already closing");
 	pd->closing = true;
 	pd->seq++;
-	rg = netpollunblock(pd, 'r');
-	wg = netpollunblock(pd, 'w');
+	rg = netpollunblock(pd, 'r', false);
+	wg = netpollunblock(pd, 'w', false);
 	if(pd->rt.fv) {
 		runtime·deltimer(&pd->rt);
 		pd->rt.fv = nil;
@@ -205,9 +211,9 @@
 	rg = wg = nil;
 	runtime·lock(pd);
 	if(mode == 'r' || mode == 'r'+'w')
-		rg = netpollunblock(pd, 'r');
+		rg = netpollunblock(pd, 'r', true);
 	if(mode == 'w' || mode == 'r'+'w')
-		wg = netpollunblock(pd, 'w');
+		wg = netpollunblock(pd, 'w', true);
 	runtime·unlock(pd);
 	if(rg) {
 		rg->schedlink = *gpp;
@@ -249,7 +255,7 @@
 }

 static G*
-netpollunblock(PollDesc *pd, int32 mode)
+netpollunblock(PollDesc *pd, int32 mode, bool setready)
 {
 	G **gpp, *old;

@@ -259,7 +265,8 @@
 	if(*gpp == READY)
 		return nil;
 	if(*gpp == nil) {
-		*gpp = READY;
+		if(setready)
+			*gpp = READY;
 		return nil;
 	}
 	old = *gpp;
@@ -291,14 +298,14 @@
 			runtime·throw("deadlineimpl: inconsistent read deadline");
 		pd->rd = -1;
 		pd->rt.fv = nil;
-		rg = netpollunblock(pd, 'r');
+		rg = netpollunblock(pd, 'r', false);
 	}
 	if(write) {
 		if(pd->wd <= 0 || (pd->wt.fv == nil && !read))
 			runtime·throw("deadlineimpl: inconsistent write deadline");
 		pd->wd = -1;
 		pd->wt.fv = nil;
-		wg = netpollunblock(pd, 'w');
+		wg = netpollunblock(pd, 'w', false);
 	}
 	runtime·unlock(pd);
 	if(rg)

? But then I can see both the IO completion and the timeout get fired during the initial wait; the second wait (after cancellation) gets no events and the test times out.

Alex
On 2013/06/03 04:57:05, brainman wrote:
> Like that:
> [diff snipped]
> ? But, then I can see both IO completion and timeout gets fired during initial
> wait, then second wait (after cancelation) gets no events and test timesout.

netpollunblock() is executed under the descriptor lock. If IO completion unblocks the wait, then the goroutine will not wait a second time; it will have the final result -- success or failure. If cancel/timeout unblocks the wait, then IO completion will set rg/wg to READY, and the second wait will return instantly. There may be other caveats of course. Reasoning about asynchronous concurrent logic is tricky.
> netpollunblock() is executed under the descriptor lock.

But, I think, it is possible to have 2 events fired (IO completion and timeout) while netpollblock waits for something to happen - in between runtime.park and runtime.lock. Isn't it?

Then the reported result gets skewed.

Alex
On Mon, Jun 3, 2013 at 11:19 AM, <alex.brainman@gmail.com> wrote: > >> netpollunblock() is executed under the descriptor lock. > > > But, I think, it is possible to have 2 events fired (IO completion and > timeout) while netpollblock waits for something to happened. In between > runtime.park and runtime.lock. Isn't it? > > Then reported result gets skewed. I do not understand the exact failure scenario. Please describe it in more detail.
On 2013/06/03 08:17:15, dvyukov wrote: > > I do not understand the exact failure scenario. Please describe it in > more detail. I see new IO submitted, then the goroutine goes into netpoll.Wait and blocks there. Then I can see both IO completion and timeout gets fired and original netpoll.Wait returns with error=2 (timeout). I can also see result of IO completion (error=0 and qty=8192). Since netpoll.Wait returns error=2, the goroutine continues into netpoll.WaitCancelled and never returns from there (I can see it calls runtime.park there). The test fails with "timeout error". Alex
On 2013/06/03 09:26:06, brainman wrote:
> I see new IO submitted, then the goroutine goes into netpoll.Wait and blocks
> there. Then I can see both IO completion and timeout gets fired and original
> netpoll.Wait returns with error=2 (timeout). I can also see result of IO
> completion (error=0 and qty=8192). Since netpoll.Wait returns error=2, the
> goroutine continues into netpoll.WaitCancelled and never returns from there (I
> can see it calls runtime.park there).
>
> The test fails with "timeout error".

I see the problem. The problem is that the wait is preempted by the IO completion notification, but then rechecks state in checkerr() and returns timeout/cancel, right?

The epoll/kqueue poller imposes quite weak requirements on the notifications; it's only important not to lose a notification and deadlock. IOCP requires much stricter notifications.

I think you need to make netpollblock() return whether it was preempted by IO completion or by timeout/cancel. Then, if it's an IO completion, do not call checkerr(). I.e. add the setready flag to netpollunblock:

	netpollunblock(PollDesc *pd, int32 mode, bool setready)

add a bool return type to netpollblock:

	bool netpollblock(PollDesc *pd, int32 mode);

Then change runtime_pollWait as:

	func runtime_pollWait(pd *PollDesc, mode int) (err int) {
		runtime·lock(pd);
		err = checkerr(pd, mode);
		if(err == 0) {
			if(!netpollblock(pd, mode))
				err = checkerr(pd, mode);
		}
		runtime·unlock(pd);
	}

If netpollblock() sees READY, then it returns true (got an IO notification). If it blocks, then you need to communicate the return value from netpollunblock(). You can do it by storing the value in gp->param.
> I see the problem. > The problem is that wait is preempted by IO completion notification, but then > rechecks state in checkerr() and returns timeout/cancel, right? Correct. > ... Sounds like a plan. I will have a go. Alex
On 2013/06/03 10:37:19, dvyukov wrote:
> ...
> I think you need to make ...

Done. All tests pass now on windows/386, windows/amd64 and linux/386. Can you test it on darwin, please? Because if it breaks, I will have no idea what is going on. Thank you.

PTAL.

Alex
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
On 2013/06/04 04:30:25, brainman wrote: > On 2013/06/03 10:37:19, dvyukov wrote: > > ... > > I think you need to make ... > > Done. All test pass now on windows/386, windows/amd64 and linux/386. Can you > test it on darwin, please? Because if it breaks, I have no idea what is going > on. Thank you. $ GOARCH=386 ./all.bash && ./all.bash passes on darwin
https://codereview.appspot.com/8670044/diff/44001/src/pkg/net/fd_windows.go
File src/pkg/net/fd_windows.go (right):

https://codereview.appspot.com/8670044/diff/44001/src/pkg/net/fd_windows.go#n...
src/pkg/net/fd_windows.go:224: return int(o.qty), nil
Add a comment explaining that we've issued the cancellation request, but the operation succeeded before it took effect, so we need to treat it as succeeded (the bytes were actually sent/received over the network).

https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc
File src/pkg/runtime/netpoll.goc (right):

https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc...
src/pkg/runtime/netpoll.goc:117: err = checkerr(pd, mode);
Check that err != 0 here - it must be, right? Because we were unblocked not by an ioready event. If the check does not fire, then, great, we have another invariant that we can rely on in this complex logic. Otherwise there is something we do not understand in the code we wrote :)

https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc...
src/pkg/runtime/netpoll.goc:213: rg = netpollunblock(pd, 'r', true);
This subtly changes the behaviour of kqueue/epoll. If they get concurrent ioready and timeout/cancel (or all 3 at once :) ), they will retry the read/write syscall, and it will most likely succeed, while currently they return timeout/cancel. Difficult to say what is better. Since it's a race, both behaviors are legal, I think. For cancel you probably want to cancel as soon as possible (rather than accept and handle one more tcp conn). For timeout you probably want to give read() the last chance. I think the simplest way to revert to the original behavior is to make runtime_pollWait() call checkerr() unconditionally a second time for the epoll/kqueue pollers. I am not sure whether it's worth doing; it looks close to insignificant. Leave it as is and please ask Brad.

https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc...
src/pkg/runtime/netpoll.goc:261: netpollunblock(PollDesc *pd, int32 mode, bool setready)
Probably call it 'ioready' to be consistent with the return value of netpollblock(). netpollblock() returns whether IO is ready or not, which is determined by the ioready flag passed to netpollunblock().

https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc...
src/pkg/runtime/netpoll.goc:272: if(setready)
Explain why we do not set READY when !ioready (runtime_pollWait checks for timeout/cancel before waiting).
The high-level algorithm looks good. I actually like the way you changed netpoll.goc. I think the netpoll.goc and fd_poll_runtime.go changes should go in separately, and then the windows-specific changes.

$ GOMAXPROCS=4 GOARCH=386 ./all.bash && GOMAXPROCS=4 ./all.bash

also passed on darwin.

https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc
File src/pkg/runtime/netpoll.goc (right):

https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc...
src/pkg/runtime/netpoll.goc:213: rg = netpollunblock(pd, 'r', true);
On 2013/06/04 09:47:34, dvyukov wrote:
> This subtly changes behaviour of kqueue/epoll.
> [...]
> Leave it as is and please ask Brad.
On second thought, your code is fine. The concurrent ioready/cancel/timeout events are ordered by the descriptor mutex, and the code respects that order. E.g. if we get ioready followed by timeout, the goroutine will do IO. If we get timeout followed by ioready, the goroutine will return timeout.
https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll_win... File src/pkg/runtime/netpoll_windows.c (right): https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:85: // TODO(brainman): Call GetQueuedCompletionStatusEx() here when possible. This is unrelated to this change, but I am eager to see how GetQueuedCompletionStatusEx() affects performance. Once this is submitted, you can patch it locally to use GetQueuedCompletionStatusEx() with a batch size 64 or something and run the net tcp benchmarks.
And later you may also look at removing memory allocation from Read/Write (by embedding some structs into netFD). Reads/Writes over an established tcp connection should trigger basically no allocations. And I think that will be very cool, because it moves it closer to the raw C performance.
On 2013/06/04 09:56:42, dvyukov wrote: > This is unrelated to this change, but I am eager to see how > GetQueuedCompletionStatusEx() affects performance. Once this is submitted, you > can patch it locally to use GetQueuedCompletionStatusEx() with a batch size 64 > or something and run the net tcp benchmarks. I am snowed under at the moment. It is on my to-do list. I will try as soon as this is submitted. I don't see a significant speed-up with this change, but I noticed that the net benchmarks are more consistent with it. So I am generally happy to go ahead with this change. Alex
On 2013/06/04 10:08:10, dvyukov wrote: > And later you may also look at removing memory allocation from Read/Write (by > embedding some structs into netFD). Reads/Writes over an established tcp > connection should trigger basically no allocations. And I think that will be > very cool, because it moves it closer to the raw C performance. On my to-do list. In fact I have done it before, but then I got stuck with timers allocating memory. So I ditched the memory allocation improvements in favor of doing this. I will get it in once this is submitted. Alex
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
Thank you for review. Alex https://codereview.appspot.com/8670044/diff/44001/src/pkg/net/fd_windows.go File src/pkg/net/fd_windows.go (right): https://codereview.appspot.com/8670044/diff/44001/src/pkg/net/fd_windows.go#n... src/pkg/net/fd_windows.go:224: return int(o.qty), nil On 2013/06/04 09:47:34, dvyukov wrote: > add a comment explaining that we've issued the cancellation request, but the > operation succeeded before, so we need to treat it as succeeded (the bytes are > actually sent/recv from network) Done. https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc File src/pkg/runtime/netpoll.goc (right): https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc... src/pkg/runtime/netpoll.goc:117: err = checkerr(pd, mode); On 2013/06/04 09:47:34, dvyukov wrote: > check that err != 0 > it must be, right? because we are unblocked not by ioready event > if the check does not fire, then, great, we have another invariant that we can > rely on in this complex logic. otherwise there is something we do not understand > in the code we wrote :) Done. https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc... src/pkg/runtime/netpoll.goc:213: rg = netpollunblock(pd, 'r', true); Left as is. https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc... src/pkg/runtime/netpoll.goc:213: rg = netpollunblock(pd, 'r', true); Left as is. https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc... src/pkg/runtime/netpoll.goc:261: netpollunblock(PollDesc *pd, int32 mode, bool setready) On 2013/06/04 09:47:34, dvyukov wrote: > probably call it 'ioready' to be consistent with return value of netpollblock() > netpollblock() returns whether IO is ready or not, which is determined by > ioready flag passed to netpollunblock() Done. https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc... 
src/pkg/runtime/netpoll.goc:272: if(setready) On 2013/06/04 09:47:34, dvyukov wrote: > explain why we do not set READY when !ioready (runtime_pollWait checks for > timeout/cancel before waiting) Done. https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll_win... File src/pkg/runtime/netpoll_windows.c (right): https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:85: // TODO(brainman): Call GetQueuedCompletionStatusEx() here when possible. Will do.
ping
On 2013/06/20 23:04:55, brainman wrote: > ping I thought we were waiting for you: > I think netpoll.goc and fd_poll_runtime.go changes should go separately, and > then the windows-specific changes. Probably it can be split even more. Other people need to look at these smaller changes as well.
On 2013/06/21 15:09:42, dvyukov wrote: > On 2013/06/20 23:04:55, brainman wrote: > > ping > > I thought we are waiting for you: Oops :-) > > I think netpoll.goc and fd_poll_runtime.go changes should go separately, and > > then the windows-specific changes. > > Probably it can be split even more. Other people need to look at these smaller > changes as well. Are you sure? You want me to copy netpoll.goc and fd_poll_runtime.go into new CL and send them separately? Why? You won't be able to see the reason for netpoll.goc and fd_poll_runtime.go changes. Sure, I will do it, if you insist. Just let me know. Alex
On 2013/06/23 10:34:07, brainman wrote: > On 2013/06/21 15:09:42, dvyukov wrote: > > On 2013/06/20 23:04:55, brainman wrote: > > > ping > > > > I thought we were waiting for you: > > Oops :-) > > > > I think netpoll.goc and fd_poll_runtime.go changes should go separately, and > > > then the windows-specific changes. > > > > Probably it can be split even more. Other people need to look at these smaller > > changes as well. > > Are you sure? You want me to copy netpoll.goc and fd_poll_runtime.go into new CL > and send them separately? Why? You won't be able to see the reason for > netpoll.goc and fd_poll_runtime.go changes. > > Sure, I will do it, if you insist. Just let me know. This CL changes the way runtime netpoll works and adds lots of windows-specific code. This is inconvenient for review/testing/rollback. E.g. people interested only in bsd systems won't even look at "net: implement netpoll for windows". Please submit the platform-independent part separately. Regarding the reason for the change, you may reference this patch and say that windows requires knowing whether a notification is ioready or timeout/closed.
https://codereview.appspot.com/8670044/diff/57001/src/pkg/runtime/netpoll.goc File src/pkg/runtime/netpoll.goc (right): https://codereview.appspot.com/8670044/diff/57001/src/pkg/runtime/netpoll.goc... src/pkg/runtime/netpoll.goc:127: netpollblock(pd, mode); We expect that this returns only ioready, right? Please add such an assertion. Can't it be unblocked by runtime_pollUnblock? I suspect it can be: if we get a timeout, unblock, issue CancelIOEx, and wait for cancellation, at this point the fd is closed and we are unblocked by close; so we still don't know the result of the IO operation (succeeded/canceled)
I also sent new CL https://codereview.appspot.com/10485043/ as per your request. Alex https://codereview.appspot.com/8670044/diff/57001/src/pkg/runtime/netpoll.goc File src/pkg/runtime/netpoll.goc (right): https://codereview.appspot.com/8670044/diff/57001/src/pkg/runtime/netpoll.goc... src/pkg/runtime/netpoll.goc:127: netpollblock(pd, mode); On 2013/06/23 11:09:44, dvyukov wrote: > we expect that this returns only ioready, right? > please add such assertion > can't it be unblocked by runtime_pollUnblock? I suspect it can be, if we get > timeout, unblock, issue CancelIOEx, wait for cancellation, at this point the fd > is closed and we are unblocked by close; so we still don't know the result of > the IO operation (succeeded/canceled) Done.
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
On 2013/06/24 04:37:54, brainman wrote: > Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), > > Please take another look. The fix looks good. Thanks for splitting this CL.
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
Updated to the tip. Waiting for instructions. Alex
https://codereview.appspot.com/8670044/diff/81001/src/pkg/net/fd_windows.go File src/pkg/net/fd_windows.go (right): https://codereview.appspot.com/8670044/diff/81001/src/pkg/net/fd_windows.go#n... src/pkg/net/fd_windows.go:96: var i int guard the following lines by !canCancelIO, we really don't want to allocate these chans on modern OSes, and probably regroup it as:
o.fd = fd
o.mode = mode
o.runtimeCtx = fd.pd.runtimeCtx
if !canCancelIO {
	everything else
}
https://codereview.appspot.com/8670044/diff/81001/src/pkg/net/fd_windows.go#n... src/pkg/net/fd_windows.go:123: o.buf.Len = uint32(len(buf)) Is it a known issue that the windows impl does not support >2GB net operations? I think we need to file an issue. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll.goc File src/pkg/runtime/netpoll.goc (right): https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll.goc... src/pkg/runtime/netpoll.goc:273: if(*gpp == READY) { drop {} https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... File src/pkg/runtime/netpoll_windows.c (right): https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:17: #pragma pack on hummm... how does this work? net_anOp matches net.anOp, but the latter does not have any special packing. So pragma pack is either unnecessary here, or does the wrong thing... https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:21: struct net_anOp { { on the next line https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:23: Overlapped o; indent 'o' with tab https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win...
src/pkg/runtime/netpoll_windows.c:24: // my extras I think a better comment would be along the lines of "used by netpoll" https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:33: static uintptr cpiohandle = INVALID_HANDLE_VALUE; // completion port io handle it's usually abbreviated as iocp (IO Completion Port), cpio means something else https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:38: cpiohandle = (uintptr)runtime·stdcall(runtime·CreateIoCompletionPort, 4, INVALID_HANDLE_VALUE, (uintptr)0, (uintptr)0, (uintptr)0); I think it's better to pass DWORD_MAX as the max number of concurrent threads. Windows does not understand what happens here and we do not have dedicated IO threads, so this model won't work. A worker thread can poll IOCP and then go into cgo. It will be a disaster if Windows decides to throttle Go scheduler worker threads. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:101: pd = o->runtimeCtx; pd refers to PollDesc* in netpoll-related files; please either (1) make pd a PollDesc*, or better, just: runtime·netpollready(&gp, (void*)o->runtimeCtx, mode);
On 2013/06/25 04:43:00, brainman wrote: > Updated to the tip. Waiting for instructions. I think we are very close to landing this.
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
https://codereview.appspot.com/8670044/diff/81001/src/pkg/net/fd_windows.go File src/pkg/net/fd_windows.go (right): https://codereview.appspot.com/8670044/diff/81001/src/pkg/net/fd_windows.go#n... src/pkg/net/fd_windows.go:96: var i int On 2013/06/25 12:00:32, dvyukov wrote: > guard the following lines by !canCancelIO, we really don't want to allocate > these chans on modern OSes > and probably regroup it as: > > o.fd = fd > o.mode = mode > o.runtimeCtx = fd.pd.runtimeCtx > if !canCancelIO { > everything else > } Done. https://codereview.appspot.com/8670044/diff/81001/src/pkg/net/fd_windows.go#n... src/pkg/net/fd_windows.go:123: o.buf.Len = uint32(len(buf)) On 2013/06/25 12:00:32, dvyukov wrote: > Is it a known issue that windows impl does not support >2GB net operations? I > think we need to file an issue. No, no one complained about that. Feel free to create an issue, if you like. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll.goc File src/pkg/runtime/netpoll.goc (right): https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll.goc... src/pkg/runtime/netpoll.goc:273: if(*gpp == READY) { On 2013/06/25 12:00:32, dvyukov wrote: > drop {} Done. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... File src/pkg/runtime/netpoll_windows.c (right): https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:17: #pragma pack on On 2013/06/25 12:00:32, dvyukov wrote: > ... So pragma pack is either unnecessary here, or do the > wrong thing... Removed. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:21: struct net_anOp { On 2013/06/25 12:00:32, dvyukov wrote: > { on the next line Done. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:23: Overlapped o; On 2013/06/25 12:00:32, dvyukov wrote: > ident 'o' with tab Done. 
https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:24: // my extras On 2013/06/25 12:00:32, dvyukov wrote: > I think a better comment would be along the lines of "used by netpoll" I agree. Changed. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:33: static uintptr cpiohandle = INVALID_HANDLE_VALUE; // completion port io handle On 2013/06/25 12:00:32, dvyukov wrote: > it's usually abbreviated as iocp (IO Completion Port), cpio means something else Renamed. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:38: cpiohandle = (uintptr)runtime·stdcall(runtime·CreateIoCompletionPort, 4, INVALID_HANDLE_VALUE, (uintptr)0, (uintptr)0, (uintptr)0); On 2013/06/25 12:00:32, dvyukov wrote: > I think it's better to pass DWORD_MAX as max number of concurrent threads > Windows does not understand what happens here and we do not have dedicated IO > threads, so this model won't work. A worker thread can poll IOCP and then go > into cgo. It will be a disaster if windows will decide to throttle Go scheduler > worker threads. I have been thinking long and hard about this value. I did a lot of testing too. From my tests, I can only see that this setting affects the number of threads allowed to proceed into GetQueuedCompletionStatus. From what information I can gather, this model is designed around the concept that all threads are blocked inside of GetQueuedCompletionStatus. The designers wanted to minimize thread switching, so waiting threads are managed in FIFO order in the hope that the thread that proceeds with IO will return back into GetQueuedCompletionStatus as soon as possible. Given everything I know, I thought that 0 for that value (it is translated into the number of CPUs under the covers) is most suitable. http://msdn.microsoft.com/en-us/library/windows/desktop/aa365198%28v=vs.85%29...
and http://int64.org/2009/05/13/high-performance-io-on-windows/ are some links on the subject. Also the fact that we are effectively running in 2 different modes here (controlled by the block parameter in runtime.netpoll) makes things even more complicated. Leaving it as is. But will change to whatever you like, if you insist. Do not forget, we could always change it later, once we gather more knowledge on the matter. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:101: pd = o->runtimeCtx; On 2013/06/25 12:00:32, dvyukov wrote: > pd refers to PollDesc* in netpoll-related files > please either (1) make pd PollDesc* or better just: > runtime·netpollready(&gp, (void*)o->runtimeCtx, mode); Done.
LGTM I think you can submit now (after addressing the comments), it is on review for enough time. Hope you've done stress testing with GOARCH=386/amd64 GOMAXPROCS=1/2/4/8. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... File src/pkg/runtime/netpoll_windows.c (right): https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:38: cpiohandle = (uintptr)runtime·stdcall(runtime·CreateIoCompletionPort, 4, INVALID_HANDLE_VALUE, (uintptr)0, (uintptr)0, (uintptr)0); On 2013/06/26 07:18:05, brainman wrote: > On 2013/06/25 12:00:32, dvyukov wrote: > > I think it's better to pass DWORD_MAX as max number of concurrent threads > > Windows does not understand what happens here and we do not have dedicated IO > > threads, so this model won't work. A worker thread can poll IOCP and then go > > into cgo. It will be a disaster if windows will decide to throttle Go > scheduler > > worker threads. > > I have been thinking long and hard about this value. I did a lot of testing too. > From my tests, I can only see that setting affect number of threads allowed to > proceed into GetQueuedCompletionStatus. From what information I can gather, this > model is designed around concept that all threads are blocked inside of > GetQueuedCompletionStatus. Designers wanted to mimimize thread switching, so > waiting threads are managed in FIFO order in the hope that thread that proceeds > with IO will return back into GetQueuedCompletionStatus as soon as possible. > Giving everything I know, I thought that 0 for that value (is translated into > CPU number under covers) is most suitable. > > http://msdn.microsoft.com/en-us/library/windows/desktop/aa365198%2528v=vs.85%... > and http://int64.org/2009/05/13/high-performance-io-on-windows/ are some links > on the subject. 
Also the fact that we are effectively running in 2 different > modes here (controlled by block parameter in runtime.netpoll) makes things even > more complicated. > > Leaving it as is. But will change to whatever you like, if you insist. Do not > forget, we could always change it later, once we gather more knowledge on the > matter. Yes, it is designed around a different model -- you have a thread pool (many more threads than CPUs), all threads are blocked on GQCS, and once a thread is unblocked it processes the event and blocks again. The Go model is different -- the thread that gets events does not process them; instead it distributes the events over a thread pool with GOMAXPROCS (number of CPUs) threads. It's further complicated by syscalls and cgo. Please set the maximum number of threads to something very large. Any potential performance problems that arise from that will be very difficult to track down. Disabling thread throttling has the additional benefit of aligning it with epoll/kqueue (which do not do any throttling). So any performance problems and changes will have a more uniform effect across different OSes. https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... File src/pkg/runtime/netpoll_windows.c (right): https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:37: runtime·printf("netpoll: failed to create cpio handle (errno=%d)\n", runtime·getlasterror()); s/cpio/iocp/ https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:38: runtime·throw("netpoll: failed to create cpio handle"); s/cpio/iocp/
On 2013/06/26 10:23:22, dvyukov wrote: > LGTM > > I think you can submit now (after addressing the comments), it is on review for > enough time. > > Hope you've done stress testing with GOARCH=386/amd64 GOMAXPROCS=1/2/4/8. I think you need to extend CL description a bit (for such a significant change). Please also include fresh benchmark results.
On 2013/06/26 10:24:59, dvyukov wrote: > > I think you need to extend CL description a bit (for such a significant change)... I cannot think of what else to say. Do you have a suggestion? It is kind of all the same, but using netpoll functionality. Alex
On 2013/06/26 10:23:22, dvyukov wrote: > > Hope you've done stress testing with GOARCH=386/amd64 GOMAXPROCS=1/2/4/8. Done as much as I could. > > Please set the maximum number of threads to something very large. Set to 0xffffffff. But now I have a problem - TestVariousDeadlines1Proc fails on amd64:
...
--- FAIL: TestVariousDeadlines1Proc (5.87 seconds)
	timeout_test.go:483: 1ns run 1/3
	timeout_test.go:503: for 1ns run 1/3, good client timeout after 2.9292ms, reading 0 bytes
...
	timeout_test.go:483: 750ns run 1/3
	timeout_test.go:508: for 750ns run 1/3: timeout (2s) waiting for client to timeout (750ns) reading
FAIL
I looked high and low, and I cannot see any flaws in our changes. Also, if I change the TestVariousDeadlines1Proc timeout time from 2s up to 10s, then the problem goes away (but the test sometimes runs for too long). So, to me, that proves our netpoll code is correct; it's just that the scheduler is not good enough for this test. Perhaps I am wrong about that. Not sure what to do here :-( > > Please also include fresh benchmark results. I have collected some benchmarks https://code.google.com/p/go/issues/detail?id=5284#c4 (there is a README to explain what is what). Like I said before, they are not conclusive. If you insist I publish something, help me pick benchmarks to include. Alex https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... File src/pkg/runtime/netpoll_windows.c (right): https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:37: runtime·printf("netpoll: failed to create cpio handle (errno=%d)\n", runtime·getlasterror()); On 2013/06/26 10:23:24, dvyukov wrote: > s/cpio/iocp/ Done. https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:38: runtime·throw("netpoll: failed to create cpio handle"); On 2013/06/26 10:23:24, dvyukov wrote: > s/cpio/iocp/ Done.
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
On Thu, Jun 27, 2013 at 10:52 AM, <alex.brainman@gmail.com> wrote: > On 2013/06/26 10:23:22, dvyukov wrote: > >> Hope you've done stress testing with GOARCH=386/amd64 > > GOMAXPROCS=1/2/4/8. > > Done as much as I could. > >> Please set the maximum number of threads to something very large. > Set to 0xffffffff. But now I have a problem - TestVariousDeadlines1Proc > fails on amd64: > > ... > --- FAIL: TestVariousDeadlines1Proc (5.87 seconds) > timeout_test.go:483: 1ns run 1/3 > timeout_test.go:503: for 1ns run 1/3, good client timeout after > 2.9292ms, reading 0 bytes > ... > timeout_test.go:483: 750ns run 1/3 > timeout_test.go:508: for 750ns run 1/3: timeout (2s) waiting for > client to timeout (750ns) reading > FAIL Is it started happening when you set max threads to 0xffffffff? Or it was happening before? > I looked high and low, and I cannot see any flaws in our changes. Also, > If I change TestVariousDeadlines1Proc timeout time from 2s up to 10s, > then problem goes away (but test sometimes runs for too long). So, to > me, that proves our netpoll code is correct, it just scheduler is not > good enough for this test. Perhaps I am wrong about that. Not sure what > to do here :-( > > > >> Please also include fresh benchmark results. > > > I have collected some benchmarks > https://code.google.com/p/go/issues/detail?id=5284#c4 (there is README > to explain what is what). Like I said before, they are not conclusive. > If you insist I publish something, help me pick benchmarks to include. > > Alex > > > > https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... > File src/pkg/runtime/netpoll_windows.c (right): > > https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... > src/pkg/runtime/netpoll_windows.c:37: runtime·printf("netpoll: failed to > create cpio handle (errno=%d)\n", runtime·getlasterror()); > On 2013/06/26 10:23:24, dvyukov wrote: >> >> s/cpio/iocp/ > > > Done. 
> > > https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... > src/pkg/runtime/netpoll_windows.c:38: runtime·throw("netpoll: failed to > create cpio handle"); > On 2013/06/26 10:23:24, dvyukov wrote: >> >> s/cpio/iocp/ > > > Done. > > https://codereview.appspot.com/8670044/
On Thu, Jun 27, 2013 at 3:25 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Thu, Jun 27, 2013 at 10:52 AM, <alex.brainman@gmail.com> wrote: >> On 2013/06/26 10:23:22, dvyukov wrote: >> >>> Hope you've done stress testing with GOARCH=386/amd64 >> >> GOMAXPROCS=1/2/4/8. >> >> Done as much as I could. >> >>> Please set the maximum number of threads to something very large. >> Set to 0xffffffff. But now I have a problem - TestVariousDeadlines1Proc >> fails on amd64: >> >> ... >> --- FAIL: TestVariousDeadlines1Proc (5.87 seconds) >> timeout_test.go:483: 1ns run 1/3 >> timeout_test.go:503: for 1ns run 1/3, good client timeout after >> 2.9292ms, reading 0 bytes >> ... >> timeout_test.go:483: 750ns run 1/3 >> timeout_test.go:508: for 750ns run 1/3: timeout (2s) waiting for >> client to timeout (750ns) reading >> FAIL > > Is it started happening when you set max threads to 0xffffffff? Or it > was happening before? What if you set max threads to 10000? Does TestVariousDeadlines4Proc timeout? >> I looked high and low, and I cannot see any flaws in our changes. Also, >> If I change TestVariousDeadlines1Proc timeout time from 2s up to 10s, >> then problem goes away (but test sometimes runs for too long). So, to >> me, that proves our netpoll code is correct, it just scheduler is not >> good enough for this test. Perhaps I am wrong about that. Not sure what >> to do here :-( >> >> >> >>> Please also include fresh benchmark results. >> >> >> I have collected some benchmarks >> https://code.google.com/p/go/issues/detail?id=5284#c4 (there is README >> to explain what is what). Like I said before, they are not conclusive. >> If you insist I publish something, help me pick benchmarks to include. >> >> Alex >> >> >> >> https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... >> File src/pkg/runtime/netpoll_windows.c (right): >> >> https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... 
>> src/pkg/runtime/netpoll_windows.c:37: runtime·printf("netpoll: failed to >> create cpio handle (errno=%d)\n", runtime·getlasterror()); >> On 2013/06/26 10:23:24, dvyukov wrote: >>> >>> s/cpio/iocp/ >> >> >> Done. >> >> >> https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... >> src/pkg/runtime/netpoll_windows.c:38: runtime·throw("netpoll: failed to >> create cpio handle"); >> On 2013/06/26 10:23:24, dvyukov wrote: >>> >>> s/cpio/iocp/ >> >> >> Done. >> >> https://codereview.appspot.com/8670044/
On 2013/06/27 11:25:45, dvyukov wrote: > > Is it started happening when you set max threads to 0xffffffff? Or it > was happening before? > The former. Maybe it was broken before, but I didn't notice. Maybe given enough stress it would fail too. With max threads 0xffffffff it fails pretty much every second time. This is on amd64 only, 386 works fine. Alex
On Thu, Jun 27, 2013 at 3:26 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Thu, Jun 27, 2013 at 3:25 PM, Dmitry Vyukov <dvyukov@google.com> wrote: >> On Thu, Jun 27, 2013 at 10:52 AM, <alex.brainman@gmail.com> wrote: >>> On 2013/06/26 10:23:22, dvyukov wrote: >>> >>>> Hope you've done stress testing with GOARCH=386/amd64 >>> >>> GOMAXPROCS=1/2/4/8. >>> >>> Done as much as I could. >>> >>>> Please set the maximum number of threads to something very large. >>> Set to 0xffffffff. But now I have a problem - TestVariousDeadlines1Proc >>> fails on amd64: >>> >>> ... >>> --- FAIL: TestVariousDeadlines1Proc (5.87 seconds) >>> timeout_test.go:483: 1ns run 1/3 >>> timeout_test.go:503: for 1ns run 1/3, good client timeout after >>> 2.9292ms, reading 0 bytes >>> ... >>> timeout_test.go:483: 750ns run 1/3 >>> timeout_test.go:508: for 750ns run 1/3: timeout (2s) waiting for >>> client to timeout (750ns) reading >>> FAIL >> >> Is it started happening when you set max threads to 0xffffffff? Or it >> was happening before? > > What if you set max threads to 10000? > Does TestVariousDeadlines4Proc timeout? Does it happen deterministically or not? Please post several failure logs. I suspect it can be due to some fairness issues in the scheduler -- the test requires the timer goroutine to run, but the client and server goroutines never give up the CPU. I mean that it's similar to:
go func() { stop = 1 }()
for stop == 0 {
}
On 2013/06/27 11:27:19, dvyukov wrote: > > What if you set max threads to 10000? Will try. > Does TestVariousDeadlines4Proc timeout? I didn't notice. Will check. Alex
On Thu, Jun 27, 2013 at 3:36 PM, <alex.brainman@gmail.com> wrote: > On 2013/06/27 11:25:45, dvyukov wrote: > >> Is it started happening when you set max threads to 0xffffffff? Or it >> was happening before? > > > > The former. Maybe it was broken before, but I didn't notice. Maybe give > enough stress it will fail too. With max threads 0xffffffff it fails > pretty much every second time. This is on amd64 only, 386 works fine. I would not be surprised if the kernel uses int16 to store the value. Please try 10000. Also please try adding runtime.Gosched() into the client and server goroutines.
On 2013/06/27 11:37:15, dvyukov wrote: > > Does it happen deterministically or not? ... As I said, it happens about every second time. Same problem: the connection keeps going and ignores the timeout set. And the test times out. > ... Please post several failure logs. Like I said above, they all look similar. Sometimes it fails early, sometimes late. > I suspect it can be due to some fairness issues in the scheduler -- > the test requires timer goroutine to run, but client and server > goroutines never give up CPU. ... I thought as much. What can we do apart from disabling the test? Alex
On 2013/06/27 11:39:26, dvyukov wrote: > > I would not be surprised if kernel uses int16 to store the value. > Please try 10000. Will do. > Also please try adding runtime.Gosched() into client and server goroutines. Where exactly should I put it? Alex
On 2013/06/27 11:45:44, brainman wrote: > On 2013/06/27 11:39:26, dvyukov wrote: > > > > I would not be surprised if kernel uses int16 to store the value. > > Please try 10000. > > Will do. > > > Also please try adding runtime.Gosched() into client and server goroutines. > > Where exactly should I put it? Yeah, there is no good place to add them, because the test uses io.Copy. Please try to inline io.Copy, and add runtime.Gosched() into the loop, so that it yields every 4K or so. Also please try this patch: https://codereview.appspot.com/10683044
On 2013/06/27 12:12:23, dvyukov wrote: > > Yeah, there is no good to add them, because the test uses io.Copy. > Please try to inline io.Copy, and add runtime.Gosched() into the loop, so that > it yields every 4K or so. > > > Also please try this patch: > https://codereview.appspot.com/10683044 Sounds like a plan. Thank you. Alex
> ... TestVariousDeadlines4Proc timeout too?

Yes.

> ... set max threads to 10000, see if that helps ...

No difference. It acts just like 0xffffffff.

I was mistaken yesterday to say that TestVariousDeadlines1Proc works with max threads set to 0 - it only works if the test is run with the -short flag (do 1, not 3, runs each, and skip tests that go longer than 500*time.Microsecond). It also works with max threads set to 0 if I run ALL tests, not just TestVariousDeadlines1Proc, -short or no -short.

It works unconditionally on windows/386, regardless of the max threads value, the -short flag, or running TestVariousDeadlines1Proc on its own or together with other tests.

> ... try this patch: https://codereview.appspot.com/10683044

It does not help. TestVariousDeadlines1Proc breaks as before.

> ... inline io.Copy, and add runtime.Gosched() into the loop, so that it yields every 4K or so ...

That did the trick. Here https://codereview.appspot.com/10690045/ is my changed test.

> ... I think you need to extend CL description a bit (for such a significant change)...

I cannot think of what else to say. Do you have a suggestion? It is kind of all the same, but using netpoll functionality.

> ... Please also include fresh benchmark results.

I have collected some benchmarks at https://code.google.com/p/go/issues/detail?id=5284#c4 (there is a README to explain what is what). Like I said before, they are not conclusive. If you insist I publish something, help me pick benchmarks to include.

Thank you.

Alex
On 2013/06/28 02:43:12, brainman wrote:
> I have collected some benchmarks https://code.google.com/p/go/issues/detail?id=5284#c4 (there is README to explain what is what). Like I said before, they are not conclusive. If you insist I publish something, help me pick benchmarks to include.

I had a problem with my benchmark script, so I have updated the results at https://code.google.com/p/go/issues/detail?id=5284#c5. Not that they are much different.

Alex
On 2013/06/28 02:43:12, brainman wrote:
> > ... TestVariousDeadlines4Proc timeout too?
>
> Yes.

OK, so it's not related to the GOMAXPROCS value.

> > ... set max threads to 10000, see if that helps ...
>
> No difference. It acts just like 0xffffffff.
>
> I was mistaken yesterday to say that TestVariousDeadlines1Proc works with max threads set to 0 - it only works if test is run with -short flag (do 1, not 3 runs each, and skip tests that go longer then 500*time.Microsecond). It also works with max threads set to 0, if I run ALL tests, not just TestVariousDeadlines1Proc, -short or no -short.
>
> It works unconditionally on windows/386, regardless of max threads value or -short flag or running together or single TestVariousDeadlines1Proc test.

... and it's not related to the NumberOfConcurrentThreads value.

> > ... try this patch: https://codereview.appspot.com/10683044
>
> It does not help. TestVariousDeadlines1Proc breaks as before.

... and it's not a problem with the scheduler (at least with this particular part of the scheduler).

> > ... inline io.Copy, and add runtime.Gosched() into the loop, so that it yields every 4K or so ...
>
> That did the trick. Here https://codereview.appspot.com/10690045/ is my changed test.

OK.

> > ... I think you need to extend CL description a bit (for such a significant change)...
>
> I cannot think of what else to say. Do you have a suggestion? It is kind of all the same, but using netpoll functionality.

Say that it moves the network poller from the net package into runtime.

> > ... Please also include fresh benchmark results.
>
> I have collected some benchmarks https://code.google.com/p/go/issues/detail?id=5284#c4 (there is README to explain what is what). Like I said before, they are not conclusive. If you insist I publish something, help me pick benchmarks to include.

I would include the following benchmarks:

BenchmarkTCP4OneShot                     500    296856 ns/op
BenchmarkTCP4OneShot-2                   500    324198 ns/op
BenchmarkTCP4OneShot-3                   500    312480 ns/op
BenchmarkTCP4OneShotTimeout             1000    294903 ns/op
BenchmarkTCP4OneShotTimeout-2           1000    343728 ns/op
BenchmarkTCP4OneShotTimeout-3            500    355446 ns/op
BenchmarkTCP4Persistent                 5000     33396 ns/op
BenchmarkTCP4Persistent-2               5000     34763 ns/op
BenchmarkTCP4Persistent-3               5000     36911 ns/op
BenchmarkTCP4PersistentTimeout          5000     34177 ns/op
BenchmarkTCP4PersistentTimeout-2        5000     35739 ns/op
BenchmarkTCP4PersistentTimeout-3        5000     36130 ns/op
Regarding the test failure, I don't have any good ideas now... You've said that if you increase the timeout to 10 seconds, then it passes, but sometimes takes a long time, right? I do not understand what is the potential difference between amd64 and 386... I would add debug prints into the netpoller and timers, and try to understand what exactly goes wrong. E.g.:
(1) client/server just continue the communication, the timeout does not fire;
or (2) client/server just continue the communication, the timeout fires but takes no effect;
or (3) client/server hang, where?
or (4) something else.
On 2013/06/28 11:15:14, dvyukov wrote:
> I would include the following benchmarks:
> BenchmarkTCP4OneShot                     500    296856 ns/op
> BenchmarkTCP4OneShot-2                   500    324198 ns/op
> BenchmarkTCP4OneShot-3                   500    312480 ns/op
> BenchmarkTCP4OneShotTimeout             1000    294903 ns/op
> BenchmarkTCP4OneShotTimeout-2           1000    343728 ns/op
> BenchmarkTCP4OneShotTimeout-3            500    355446 ns/op
> BenchmarkTCP4Persistent                 5000     33396 ns/op
> BenchmarkTCP4Persistent-2               5000     34763 ns/op
> BenchmarkTCP4Persistent-3               5000     36911 ns/op
> BenchmarkTCP4PersistentTimeout          5000     34177 ns/op
> BenchmarkTCP4PersistentTimeout-2        5000     35739 ns/op
> BenchmarkTCP4PersistentTimeout-3        5000     36130 ns/op

No "old to new comparisons"?

Alex
On Fri, Jun 28, 2013 at 4:00 PM, <alex.brainman@gmail.com> wrote:
> On 2013/06/28 11:15:14, dvyukov wrote:
> > I would include the following benchmarks:
> > BenchmarkTCP4OneShot                     500    296856 ns/op
> > BenchmarkTCP4OneShot-2                   500    324198 ns/op
> > BenchmarkTCP4OneShot-3                   500    312480 ns/op
> > BenchmarkTCP4OneShotTimeout             1000    294903 ns/op
> > BenchmarkTCP4OneShotTimeout-2           1000    343728 ns/op
> > BenchmarkTCP4OneShotTimeout-3            500    355446 ns/op
> > BenchmarkTCP4Persistent                 5000     33396 ns/op
> > BenchmarkTCP4Persistent-2               5000     34763 ns/op
> > BenchmarkTCP4Persistent-3               5000     36911 ns/op
> > BenchmarkTCP4PersistentTimeout          5000     34177 ns/op
> > BenchmarkTCP4PersistentTimeout-2        5000     35739 ns/op
> > BenchmarkTCP4PersistentTimeout-3        5000     36130 ns/op
>
> No "old to new comparisons"?

Including "old to new comparisons".
On 2013/06/28 11:22:28, dvyukov wrote:
> Regarding the test failure, I don't have any good ideas now...

How about changing the test as per my new CL?

> You've said that if you increase timeout to 10 seconds, then it passes, but sometimes takes a long time, right?

Yes, but what is the limit? 2s? 10s? 1m? 1h? Who came up with 2s? Why should it end in 2s? Also, there are many short tests there (starting with 1ns or something). If each takes 10s, the total test takes too long.

> I do not understand what is the potential difference between amd64 and 386...

It is not really the arch, but the way the OS works. Remember, the old OS version (386) does not have the CancelIoEx function, so the execution path is completely different. For example, on 386 all IO is started and canceled on a single thread. So the whole thread-switching pattern is very different. I am just guessing. :-)

> I would add debug prints into netpoller and timers, and try to understand what exactly happens wrong.
> E.g. (1) client/server just continue the communication, timeout does not fire
> or (2) client/server just continue the communication, timeout fires but takes no effect
> or (3) client/server hang, where?
> or (4) something else

I put panic("CRASH") in the test where t.Fatal is, so I can see the stack trace. I can see the goroutine waiting in park() in the runtime inside runtime_WaitCancelled (or whatever it is called). So, it means the timeout is fired, and we called CancelIo, and now we are waiting for the IO to get canceled, but it has not happened yet. But, given that it works if I extend the wait to 10s, it means that the IO eventually completes. So, I would say it is (3), but it does not hang, it just takes longer than we expect it to. Probably because it is busy doing other things.

Alex
On Fri, Jun 28, 2013 at 4:14 PM, <alex.brainman@gmail.com> wrote:
> On 2013/06/28 11:22:28, dvyukov wrote:
> > Regarding the test failure, I don't have any good ideas now...
>
> How about changing the test as per my new CL?

Well, if it's a bug in the implementation, it's not a good idea to just update the test to account for the bug.

> > You've said that if you increase timeout to 10 seconds, then it passes, but sometimes takes long time, right?
>
> Yes, but what is the limit? 2s? 10s? 1m? 1h? Who came up with 2s? Why should it end in 2s? Also, there are many short tests there (start with 1ns or something). If each takes 10s, the total test takes too long.

I think it's 2s because the max timeout in the test is 1s, so 2s must be enough for the timeout to happen.

> > I do not understand what is the potential difference between amd64 and 386...
>
> It is not really the arch, but the way OS works. Remember old OS version (386) does not have CancelEx function, so the execution path is completely different. For example on 386 all IO is started and canceled on single thread. So all thread switching pattern is very different.

Ah, I see, makes sense.

> I am just guessing. :-)
>
> I put panic("CRASH") in the test where t.Fatal is, so I can see stack trace. I can see goroutine waiting in park() in runtime inside runtime_WaitCancelled (or whatever it is called). So, it means, timeout is fired, and we called CancelIO and now we waiting for IO to get canceled, but it has not happened yet. But, given that it works if I extend wait till 10s, it means that IO eventually completes. So, I would say it is (3), but it does not hang, just takes longer than we expect it to. Probably, because it is busy doing other things.

It's already something. What other things can it do? There must not be many other things...
On 2013/06/28 12:28:27, dvyukov wrote: > What other things can it do? There must be not much other things... Fair enough. I will run that test again and see if I can find out more. Alex
Maybe it correlates with some cold error paths, e.g. WSARecv returns nil instead of ERROR_IO_PENDING (or vice versa); or CancelIoEx() returns ERROR_NOT_FOUND.

On Fri, Jun 28, 2013 at 4:32 PM, <alex.brainman@gmail.com> wrote:
> On 2013/06/28 12:28:27, dvyukov wrote:
> > What other things can it do? There must be not much other things...
>
> Fair enough. I will run that test again and see if I can find out more.
>
> Alex
>
> https://codereview.appspot.com/8670044/
On 2013/06/28 12:28:27, dvyukov wrote:
> ...
> What other things can it do? There must be not much other things...

It is not that net timeouts aren't fired. They fire, but they take longer than they should. So I started looking at what they do, and they happened to be sitting in WaitForSingleObject in runtime.semasleep for much longer than they should: WaitForSingleObject is passed a *timeout* parameter, but it times out after a much longer time than *timeout* says. I was thinking about why, and the only good idea I had is that something else is hogging the CPU while we are waiting in WaitForSingleObject. So I started looking for any new busy loops that my change could introduce, and the only thing that I could come up with is the GetQueuedCompletionStatus function. I changed the GetQueuedCompletionStatus "wait" parameter from 0 to 1, and that fixes the broken test. Here

79c79,80
< 	wait = 0;
---
> 	// TODO(brainman): should use 0 here instead, but scheduler hogs CPU
> 	wait = 1;

is the netpoll_windows.c change. Perhaps when GetQueuedCompletionStatus gets called with 0 (do-not-wait) it is very quick when data is not there, but calling it with 1 (wait for 1ms) makes it go into the OS scheduler or something.

I am not convinced it is the correct solution, so I put a TODO to look at it later. Perhaps the Go scheduler needs to be adjusted to accommodate what is happening here. Why should we wait in GetQueuedCompletionStatus unless we deliberately want to do so? One thing I did notice is that Windows runtime.nanotime advances in ~15ms increments - maybe that confuses the Go scheduler.

Anyway, I am happy to submit the CL with the latest change. There is still plenty more work to do, but we have something that works for the moment. I have updated the CL description and added benchmark output.

Please let me know what to do next. Thank you.

Alex
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
On 2013/07/05 02:58:58, brainman wrote: > message: On 2013/06/28 12:28:27, dvyukov wrote: > > ... > > What other things can it do? There must be not much other things... > > I can see that net timeouts aren't fired. They fired but they take longer then > they should. So I started looking at what they do, and they happened to be > sitting in WaitForSingleObject in runtime.semasleep for much longer then they > should - WaitForSingleObject gets passed *timeout* parameters, but it times out > after much longer time then *timeout* says. I was thinking about why, and the > only good idea I had is that something else is hogging CPU while we waiting in > WaitForSingleObject. So I started looking for any new busy loops that my change > could introduce, and the only thing that I could come up with is > GetQueuedCompletionStatus function. I changed GetQueuedCompletionStatus "wait" > parameter from 0 to 1, and that fixes broken test. Here > > 79c79,80 > < wait = 0; > --- > > // TODO(brainman): should use 0 here instead, but scheduler hogs CPU > > wait = 1; > > is the netpoll_windows.c change. Perhaps when GetQueuedCompletionStatus gets > called with 0 (do-not-wait) it is very quick when data is not there, but calling > it with 1 (wait for 1ms) makes it go into OS scheduler or something. > > I am not convinced it is correct solution. So I put TODO to look at it later. > Perhaps Go scheduler needs to be adjusted to accommodate for what is happening > here. Why should we wait in GetQueuedCompletionStatus unless we deliberately > want to do so? One thing I did noticed is that windows runtime.nanotime works in > ~15ms increments - maybe that confuses Go scheduler. > > Anyway, I am happy to submit the CL with the latest change. There is still > plenty more work to do, but we have something that works for the moment. I have > updated CL description and added benchmark output. > > Please let me now what to do next. Thank you. Good idea about busy loops. 
The scheduler sysmon thread can potentially call netpoll() in an almost busy loop. However, in between it calls usleep(). Probably Windows ignores small values in usleep() and lets it run all the time. Different runnable threads should still be alternated at 15ms intervals, so probably IO priority boosting comes into the picture (however, sysmon does not do any IO, but it calls GQCS, so probably the OS thinks that it's an IO thread). And it all makes sense only if you run on a single-core machine, because otherwise different threads should run in parallel. Is the machine single core?

In any case please try the following patch, and temporarily change wait to 0 for non-blocking netpoll calls, and check if it helps.

diff -r dc24634de6c5 src/pkg/runtime/proc.c
--- a/src/pkg/runtime/proc.c	Wed Jul 17 11:04:34 2013 +0200
+++ b/src/pkg/runtime/proc.c	Wed Jul 17 14:40:15 2013 +0400
@@ -2086,6 +2086,7 @@
 	lastpoll = runtime·atomicload64(&runtime·sched.lastpoll);
 	now = runtime·nanotime();
 	if(lastpoll != 0 && lastpoll + 10*1000*1000 > now) {
+		runtime·cas64(&runtime·sched.lastpoll, lastpoll, now);
 		gp = runtime·netpoll(false);  // non-blocking
 		injectglist(gp);
 	}

If it does not help, we will land the current version.
On 2013/07/17 10:42:15, dvyukov wrote:
> ... Different runnable thread still should be alternated at 15ms intervals, so probably IO priority boosting cames into picture (however sysmon does not do any IO, but it calls GQCS, so probably OS thinks that it's IO thread).

Windows threads can be assigned different priorities. Can we use these to provide better control?

> ... And it all makes sense only if you run on a single core machine, because otherwise different threads should run in parallel. Is the machine single core?

It is. All my other PCs have multiple CPUs. It is the only PC that fails TestVariousDeadlines1Proc.

> In any case please try the following patch, and temporary change wait to 0 for non-blocking netpoll calls, and check if it helps.

It does not help. Your change does not build, so I had to change it to:

diff -r 2bbca155a87f src/pkg/runtime/proc.c
--- a/src/pkg/runtime/proc.c	Thu Jul 04 14:24:21 2013 +1000
+++ b/src/pkg/runtime/proc.c	Fri Jul 19 14:37:43 2013 +1000
@@ -2043,7 +2043,7 @@
 sysmon(void)
 {
 	uint32 idle, delay;
-	int64 now, lastpoll;
+	uint64 now, lastpoll;
 	G *gp;
 	uint32 ticks[MaxGomaxprocs];

@@ -2073,6 +2073,7 @@
 	lastpoll = runtime·atomicload64(&runtime·sched.lastpoll);
 	now = runtime·nanotime();
 	if(lastpoll != 0 && lastpoll + 10*1000*1000 > now) {
+		runtime·cas64(&runtime·sched.lastpoll, &lastpoll, now);
 		gp = runtime·netpoll(false);  // non-blocking
 		injectglist(gp);
 	}

> If it does not help, we will land current version.

Sounds good. What should I do now?

Alex
On 2013/07/19 04:50:16, brainman wrote:
> On 2013/07/17 10:42:15, dvyukov wrote:
> > ... Different runnable thread still should be alternated at 15ms intervals, so probably IO priority boosting cames into picture (however sysmon does not do any IO, but it calls GQCS, so probably OS thinks that it's IO thread).
>
> Windows threads can be assigned different priorities. Can we use these to provide better control?

I would like to avoid thread priorities.

> > ... And it all makes sense only if you run on a single core machine, because otherwise different threads should run in parallel. Is the machine single core?
>
> It is. All other pcs I have are multiple cpus. It is the only pc that fails TestVariousDeadlines1Proc.

Aha! Then it's a busy loop with high probability.

> > In any case please try the following patch, and temporary change wait to 0 for non-blocking netpoll calls, and check if it helps.
>
> It does not help. Your change does not build, so I had to change it to:
>
> diff -r 2bbca155a87f src/pkg/runtime/proc.c
> --- a/src/pkg/runtime/proc.c	Thu Jul 04 14:24:21 2013 +1000
> +++ b/src/pkg/runtime/proc.c	Fri Jul 19 14:37:43 2013 +1000
> @@ -2043,7 +2043,7 @@
>  sysmon(void)
>  {
>  	uint32 idle, delay;
> -	int64 now, lastpoll;
> +	uint64 now, lastpoll;
>  	G *gp;
>  	uint32 ticks[MaxGomaxprocs];
>
> @@ -2073,6 +2073,7 @@
>  	lastpoll = runtime·atomicload64(&runtime·sched.lastpoll);
>  	now = runtime·nanotime();
>  	if(lastpoll != 0 && lastpoll + 10*1000*1000 > now) {
> +		runtime·cas64(&runtime·sched.lastpoll, &lastpoll, now);
>  		gp = runtime·netpoll(false);  // non-blocking
>  		injectglist(gp);
>  	}
>
> > If it does not help, we will land current version.
>
> Sounds good. What should I do now?

It LGTM. It would be nice if somebody else took a look, but if not, run the tests once again and submit in a day or two.
*** Submitted as https://code.google.com/p/go/source/detail?r=e6a6dc0d9c22 ***

net: implement netpoll for windows

Moves the network poller from net package into runtime.

benchmark                          old ns/op    new ns/op    delta
BenchmarkTCP4OneShot                  316386       287061    -9.27%
BenchmarkTCP4OneShot-2                339822       313424    -7.77%
BenchmarkTCP4OneShot-3                330057       306589    -7.11%
BenchmarkTCP4OneShotTimeout           341775       287061   -16.01%
BenchmarkTCP4OneShotTimeout-2         380835       295849   -22.32%
BenchmarkTCP4OneShotTimeout-3         398412       328070   -17.66%
BenchmarkTCP4Persistent                40622        33392   -17.80%
BenchmarkTCP4Persistent-2              44528        35736   -19.74%
BenchmarkTCP4Persistent-3              44919        36907   -17.84%
BenchmarkTCP4PersistentTimeout         45309        33588   -25.87%
BenchmarkTCP4PersistentTimeout-2       50289        38079   -24.28%
BenchmarkTCP4PersistentTimeout-3       51559        37103   -28.04%
BenchmarkTCP6OneShot                  361305       345645    -4.33%
BenchmarkTCP6OneShot-2                361305       331976    -8.12%
BenchmarkTCP6OneShot-3                376929       347598    -7.78%
BenchmarkTCP6OneShotTimeout           361305       322212   -10.82%
BenchmarkTCP6OneShotTimeout-2         378882       333928   -11.86%
BenchmarkTCP6OneShotTimeout-3         388647       335881   -13.58%
BenchmarkTCP6Persistent                47653        35345   -25.83%
BenchmarkTCP6Persistent-2              49215        35736   -27.39%
BenchmarkTCP6Persistent-3              38474        37493    -2.55%
BenchmarkTCP6PersistentTimeout         56637        34369   -39.32%
BenchmarkTCP6PersistentTimeout-2       42575        38079   -10.56%
BenchmarkTCP6PersistentTimeout-3       44137        37689   -14.61%

R=dvyukov
CC=golang-dev
https://codereview.appspot.com/8670044