Description
net: implement netpoll for windows
Moves the network poller from net package into runtime.
benchmark old ns/op new ns/op delta
BenchmarkTCP4OneShot 316386 287061 -9.27%
BenchmarkTCP4OneShot-2 339822 313424 -7.77%
BenchmarkTCP4OneShot-3 330057 306589 -7.11%
BenchmarkTCP4OneShotTimeout 341775 287061 -16.01%
BenchmarkTCP4OneShotTimeout-2 380835 295849 -22.32%
BenchmarkTCP4OneShotTimeout-3 398412 328070 -17.66%
BenchmarkTCP4Persistent 40622 33392 -17.80%
BenchmarkTCP4Persistent-2 44528 35736 -19.74%
BenchmarkTCP4Persistent-3 44919 36907 -17.84%
BenchmarkTCP4PersistentTimeout 45309 33588 -25.87%
BenchmarkTCP4PersistentTimeout-2 50289 38079 -24.28%
BenchmarkTCP4PersistentTimeout-3 51559 37103 -28.04%
BenchmarkTCP6OneShot 361305 345645 -4.33%
BenchmarkTCP6OneShot-2 361305 331976 -8.12%
BenchmarkTCP6OneShot-3 376929 347598 -7.78%
BenchmarkTCP6OneShotTimeout 361305 322212 -10.82%
BenchmarkTCP6OneShotTimeout-2 378882 333928 -11.86%
BenchmarkTCP6OneShotTimeout-3 388647 335881 -13.58%
BenchmarkTCP6Persistent 47653 35345 -25.83%
BenchmarkTCP6Persistent-2 49215 35736 -27.39%
BenchmarkTCP6Persistent-3 38474 37493 -2.55%
BenchmarkTCP6PersistentTimeout 56637 34369 -39.32%
BenchmarkTCP6PersistentTimeout-2 42575 38079 -10.56%
BenchmarkTCP6PersistentTimeout-3 44137 37689 -14.61%
Patch Set 1
Patch Set 2: diff -r 0c9561ddf631 https://go.googlecode.com/hg/
Total comments: 17
Patch Set 3: diff -r fab6ba2a2d10 https://go.googlecode.com/hg/
Patch Set 4: diff -r 2d4825868d95 https://go.googlecode.com/hg/
Patch Set 5: diff -r f809e1e845c4 https://go.googlecode.com/hg/
Total comments: 14
Patch Set 6: diff -r 657760a60531 https://go.googlecode.com/hg/
Total comments: 2
Patch Set 7: diff -r 5b2b5f8d89b0 https://go.googlecode.com/hg/
Patch Set 8: diff -r 8de6d4c58935 https://go.googlecode.com/hg/
Total comments: 21
Patch Set 9: diff -r d3f99f091748 https://go.googlecode.com/hg/
Total comments: 4
Patch Set 10: diff -r afb7b199f074 https://go.googlecode.com/hg/
Patch Set 11: diff -r fccd815ed3bb https://go.googlecode.com/hg/
Patch Set 12: diff -r 2bbca155a87f https://go.googlecode.com/hg/
Patch Set 13: diff -r 5ee81a14cdfe https://go.googlecode.com/hg/
Messages
Total messages: 82
LGTM (but this is not for submission, right?) We can also experiment with GetQueuedCompletionStatusEx() and removing memory allocations from Read/Write later.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_poll_runtime.go
File src/pkg/net/fd_poll_runtime.go (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_poll_runtime....
src/pkg/net/fd_poll_runtime.go:107: println("unreachable: ", res)
I am not sure what the policy is about adding new println() calls.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_poll_unix.go
File src/pkg/net/fd_poll_unix.go (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_poll_unix.go#...
src/pkg/net/fd_poll_unix.go:306: func init() {
I thought there were some initialization dependency issues, and that's why sysInit() was introduced. Is sysInit() not needed now?

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_epoll.c
File src/pkg/runtime/netpoll_epoll.c (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_epol...
src/pkg/runtime/netpoll_epoll.c:34: runtime·netpollopen(uintptr fd, PollDesc *pd)
I think this should be the first patch - change int32 to uintptr for fd.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
File src/pkg/runtime/netpoll_windows.c (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
src/pkg/runtime/netpoll_windows.c:50: if(runtime·stdcall(runtime·CreateIoCompletionPort, 4, fd, cpiohandle, (uintptr)0, (uintptr)0) == 0) {
drop {}

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
src/pkg/runtime/netpoll_windows.c:83: if(runtime·stdcall(runtime·GetQueuedCompletionStatus, 5, cpiohandle, &qty, &key, &o, (uintptr)wait) == 0) {
I think you need a loop here to fetch all pending notifications (or at least a batch). The scheduler will behave better if you give it a batch of newly runnable goroutines.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
src/pkg/runtime/netpoll_windows.c:83: if(runtime·stdcall(runtime·GetQueuedCompletionStatus, 5, cpiohandle, &qty, &key, &o, (uintptr)wait) == 0) {
I remember you've said that it's difficult to call GetQueuedCompletionStatusEx() here because it's not available on all systems. But at least try to benchmark with it; it will be useful to understand how much we lose here.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
src/pkg/runtime/netpoll_windows.c:86: if(!block) {
drop {}
https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_windows.go
File src/pkg/net/fd_windows.go (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_windows.go#ne...
src/pkg/net/fd_windows.go:216: o.fd.pd.WaitCanceled(int(o.mode))
Wait - you need the result from WaitCanceled. At this point the operation can still succeed. Even if a timeout or close happened, GetQueuedCompletionStatus() can return success, and we need to communicate it to the user.
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/
Thank you for the review. Alex

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_poll_runtime.go
File src/pkg/net/fd_poll_runtime.go (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_poll_runtime....
src/pkg/net/fd_poll_runtime.go:107: println("unreachable: ", res)
I don't know either. I will wait to be corrected.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_poll_unix.go
File src/pkg/net/fd_poll_unix.go (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_poll_unix.go#...
src/pkg/net/fd_poll_unix.go:306: func init() {
sysInit is still needed for windows, but your code here should not depend on it. I also don't want to have 2 copies of sysInit (one in fd_poll_unix.go and another in fd_poll_runtime.go), so I am moving it to fd_unix.go. I don't know when this duplicate code was introduced, but it is not needed. This change should be in a separate CL, and I will send it separately.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_windows.go
File src/pkg/net/fd_windows.go (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_windows.go#ne...
src/pkg/net/fd_windows.go:216: o.fd.pd.WaitCanceled(int(o.mode))
I don't care if it succeeds or not, because our previous wait returned errClosing or errTimeout (see the code above). I have already made up my mind about what I will return to the user - errClosing or errTimeout. I am just waiting here for the IO to complete so I can release the IO buffers. I don't care if the completion is a success or a failure. Perhaps the function name is no good - I am open to suggestions.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_epoll.c
File src/pkg/runtime/netpoll_epoll.c (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_epol...
src/pkg/runtime/netpoll_epoll.c:34: runtime·netpollopen(uintptr fd, PollDesc *pd)
Sounds good: https://codereview.appspot.com/9569043/

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
File src/pkg/runtime/netpoll_windows.c (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
src/pkg/runtime/netpoll_windows.c:50: if(runtime·stdcall(runtime·CreateIoCompletionPort, 4, fd, cpiohandle, (uintptr)0, (uintptr)0) == 0) {
On 2013/05/19 15:15:48, dvyukov wrote:
> drop {}
Done.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
src/pkg/runtime/netpoll_windows.c:83: if(runtime·stdcall(runtime·GetQueuedCompletionStatus, 5, cpiohandle, &qty, &key, &o, (uintptr)wait) == 0) {
Noted.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
src/pkg/runtime/netpoll_windows.c:83: if(runtime·stdcall(runtime·GetQueuedCompletionStatus, 5, cpiohandle, &qty, &key, &o, (uintptr)wait) == 0) {
Noted. I will try to implement GetQueuedCompletionStatusEx when it is available, but at a later stage.

https://codereview.appspot.com/8670044/diff/2001/src/pkg/runtime/netpoll_wind...
src/pkg/runtime/netpoll_windows.c:86: if(!block) {
On 2013/05/19 15:15:48, dvyukov wrote:
> drop {}
Done.
https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_windows.go
File src/pkg/net/fd_windows.go (right):

https://codereview.appspot.com/8670044/diff/2001/src/pkg/net/fd_windows.go#ne...
src/pkg/net/fd_windows.go:216: o.fd.pd.WaitCanceled(int(o.mode))
On 2013/05/20 02:12:45, brainman wrote:
> I don't care if it succeeds or not, because our previous wait returned
> errClosing or errTimeout (see code above). I already made my mind on what I will
> return to the user - errClosing or errTimeout. I am just waiting here for IO to
> complete so I can release IO buffers. I don't care if completion is success or
> failure. Perhaps, the function name is no good - I am opened to suggestions.
When you are writing/reading to/from the network you want to know whether the data was actually sent/received. It is a bug to return errTimeout/errClosing when the data was actually sent/received. Even if a timeout has happened and you called CancelIOEx(), you still need to wait for the overlapped completion, and if it has succeeded, you need to return OK to the user.
On 2013/05/20 04:02:53, dvyukov wrote: > > ... you still need to wait for overlapped completion, and if it has > succeeded, you need to return OK to user. If you say so. I will see what I can do. Alex
On 2013/05/20 05:12:08, brainman wrote:
> On 2013/05/20 04:02:53, dvyukov wrote:
> >
> > ... you still need to wait for overlapped completion, and if it has
> > succeeded, you need to return OK to user.
>
> If you say so. I will see what I can do.
I am pretty sure it must work that way (and it works that way now). A simpler option would be to call CancelIOEx() directly in the timer routine or in the evict routine; then the waiting path is not changed at all. But it would be difficult to do w/o CancelIOEx(). So I think you need to make runtime_pollWaitCanceled() return the overlapped result (but not look at the current fd status -- closed or timedout).
On 2013/05/20 06:50:05, dvyukov wrote:
>
> I am pretty sure it must work that way (and it works that way now).

I am pretty sure you are right on both counts. But as far as I remember, I couldn't make it work - some tests were failing. I will revisit this again and report my findings.

> A simpler option would be to call CancelIOEx() directly in timer routine or in
> evict routine, then waiting path is not changed at all. ...

Let's say I decide to choose that option for systems that have CancelIOEx present (most windows systems have it these days). How do I go about implementing it in the runtime?

Alex
On 2013/05/20 07:45:08, brainman wrote: > On 2013/05/20 06:50:05, dvyukov wrote: > > > > I am pretty sure it must work that way (and it works that way now). > > I am pretty sure you are right on both counts. But as far as I remember, I > couldn't make it work - some tests were failing. I will revisit this again and > report my findings. > > > A simpler option would be to call CancelIOEx() directly in timer routine or in > > evict routine, then waiting path is not changed at all. ... > > Lets say, I decide to chose that option for systems that have CancelIOEx present > (most windows systems have it this days). How do I go about implementing it in > runtime? If you remember read and write anOp or OVERLAPPED pointer in PollDesc, you can issue CancelIOEx() directly in deadlineimpl() and in runtime_pollUnblock(). They both lock the PollDesc's lock, so they should be able to determine outstanding OVERLAPPEDs and issue CancelIOEx() for them.
On 2013/05/20 10:09:55, dvyukov wrote:
> [...]
> If you remember read and write anOp or OVERLAPPED pointer in PollDesc, you can
> issue CancelIOEx() directly in deadlineimpl() and in runtime_pollUnblock(). They
> both lock the PollDesc's lock, so they should be able to determine outstanding
> OVERLAPPEDs and issue CancelIOEx() for them.

But how do you want to handle old OSes with only CancelIO()? I think you cannot do a potentially blocking send in deadlineimpl() (in the timer goroutine).
On 2013/05/20 10:11:32, dvyukov wrote:
>
> But how do you want to handle old OSes with only CancelIO()?
> I think you can not do potentially blocking send in deadlineimpl() (in timer
> goroutine).

I do not think I can without CancelIOEx. We will have to have 2 different code paths, like we do now. It is just a question of how different.

Alex
On 2013/05/20 04:02:53, dvyukov wrote:
>
> ... Even if timeout has happened and you called
> CancelIOEx(), you still need to wait for overlapped completion, and if it has
> succeeded, you need to return OK to user.

I updated my code. But now this fails:

G:\src\pkg\net>net.test.exe -test.run=lines4
--- FAIL: TestVariousDeadlines4Proc (0.03 seconds)
	timeout_test.go:483: 1ns run 1/3
	timeout_test.go:503: for 1ns run 1/3, good client timeout after 15.6251ms, reading 696320 bytes
	timeout_test.go:513: for 1ns run 1/3: server in 15.6251ms wrote 753664, WSASend tcp 127.0.0.1:4240: An existing connection was forcibly closed by the remote host.
	timeout_test.go:483: 1ns run 2/3
	timeout_test.go:505: for 1ns run 2/3: client Copy = 3694592, <nil> (want timeout)
FAIL

As far as I can tell this test assumes that io.Copy(ioutil.Discard, c) will return err != nil. But, with this change, we get an IO completion (after the IO is cancelled) with err == 0 and n == 0. This is translated into io.EOF. I cannot decide which part is wrong. Can you?

Alex
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
On Tue, May 21, 2013 at 8:24 AM, <alex.brainman@gmail.com> wrote: > On 2013/05/20 04:02:53, dvyukov wrote: > >> ... Even if timeout has happened and you called >> CancelIOEx(), you still need to wait for overlapped completion, and if > > it has >> >> succeeded, you need to return OK to user. > > > I updated my code. But now this fails: > > G:\src\pkg\net>net.test.exe -test.run=lines4 > --- FAIL: TestVariousDeadlines4Proc (0.03 seconds) > timeout_test.go:483: 1ns run 1/3 > timeout_test.go:503: for 1ns run 1/3, good client timeout after > 15.6251ms, reading 696320 bytes > timeout_test.go:513: for 1ns run 1/3: server in 15.6251ms wrote > 753664,WSASend tcp 127.0.0.1:4240: An existing connection was forcibly > closed by the remote host. > timeout_test.go:483: 1ns run 2/3 > timeout_test.go:505: for 1ns run 2/3: client Copy = 3694592, > <nil> (want timeout) > FAIL > > As far as I can tell this test assumes that io.Copy(ioutil.Discard, c) > will return err != nil. But, with this change, we get IO completion > (after IO is canelled) with err == 0 and n == 0. This is translated into > io.EOF. I cannot decide which part is wrong. Can you? Is there any chance you do 0-length read/write? It is a perfectly valid WSA operation, and I guess it results in err=nil,n=0, and it does not mean EOF.
On 2013/05/21 06:47:30, dvyukov wrote:
>
> Is there any chance you do 0-length read/write?

I inserted debug prints. I do not have any 0-length read/writes as far as I can see.

Alex
On Tue, May 21, 2013 at 11:02 AM, <alex.brainman@gmail.com> wrote:
> On 2013/05/21 06:47:30, dvyukov wrote:
>
>> Is there any chance you do 0-length read/write?
>
> I inserted debug prints. I do not have any 0-length read/writes as far
> as I can see.

And WSAGetOverlappedResult() returns TRUE with *lpcbTransfer==0? Does it happen for reads or writes? It's weird. Maybe some upper layer is messing up the values.
I have been busy lately, and it is more complicated than I expected. But I will continue with it when I have time. Just letting you know.

Alex
Dmitriy,

The way I see things is this:

- ready to read, so net issues netpoll.Prepare; this resets everything and returns no error, good so far;
- start the read by issuing WSARecv; it returns with ERROR_IO_PENDING, still good;
- the timeout fires, so the netpoll state is set to "report timed out when requested", still good;
- net calls netpoll.Wait, which returns immediately, since the state is "timed out", still good;
- net calls CancelIO to stop the submitted IO, still good;
- net calls netpoll.WaitCanceled to wait for the IO to complete or get canceled, but it returns immediately again, since the netpoll state is still "timed out" ... wrong.

So, I think, my current netpoll.WaitCanceled implementation is no good.

I had an idea to add netpoll.Prepare's "reset state" logic to it, but that will break if the IO completes (or gets canceled) before we call netpoll.WaitCanceled.

What do you think I should do? Thank you.

Alex
On 2013/06/03 00:18:26, brainman wrote:
> [...]
> So, I think, my current netpoll.WaitCanceled implementation is no good.
>
> I had an idea to add netpoll.Prepare's "reset state" logic to it, but that will
> break if IO completes (or get canceled) before we call netpoll.WaitCanceled.

Hummm... the first idea is to try the following: when a descriptor is closed we set pd.closing, and when the descriptor times out we set pd.rd/wd. These are checked in checkerr() before any IO wait. But in both cases we also set pd.rg/wg=READY, which is supposedly unnecessary. So what if you introduce a parameter to netpollunblock() which says "unblock currently waiting goroutines if any, but do not set READY", and use it in close/timeout?
Like that:

diff -r 2d4825868d95 src/pkg/runtime/netpoll.goc
--- a/src/pkg/runtime/netpoll.goc	Mon May 20 15:23:45 2013 +1000
+++ b/src/pkg/runtime/netpoll.goc	Mon Jun 03 14:54:29 2013 +1000
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.

-// +build darwin linux
+// +build darwin linux windows

 package net

@@ -48,7 +48,7 @@
 } pollcache;

 static void netpollblock(PollDesc*, int32);
-static G* netpollunblock(PollDesc*, int32);
+static G* netpollunblock(PollDesc*, int32, bool);
 static void deadline(int64, Eface);
 static void readDeadline(int64, Eface);
 static void writeDeadline(int64, Eface);
@@ -120,6 +120,12 @@
 	runtime·unlock(pd);
 }

+func runtime_pollWaitCanceled(pd *PollDesc, mode int) {
+	runtime·lock(pd);
+	netpollblock(pd, mode);
+	runtime·unlock(pd);
+}
+
 func runtime_pollSetDeadline(pd *PollDesc, d int64, mode int) {
 	runtime·lock(pd);
 	if(pd->closing)
@@ -179,8 +185,8 @@
 		runtime·throw("runtime_pollUnblock: already closing");
 	pd->closing = true;
 	pd->seq++;
-	rg = netpollunblock(pd, 'r');
-	wg = netpollunblock(pd, 'w');
+	rg = netpollunblock(pd, 'r', false);
+	wg = netpollunblock(pd, 'w', false);
 	if(pd->rt.fv) {
 		runtime·deltimer(&pd->rt);
 		pd->rt.fv = nil;
@@ -205,9 +211,9 @@
 	rg = wg = nil;
 	runtime·lock(pd);
 	if(mode == 'r' || mode == 'r'+'w')
-		rg = netpollunblock(pd, 'r');
+		rg = netpollunblock(pd, 'r', true);
 	if(mode == 'w' || mode == 'r'+'w')
-		wg = netpollunblock(pd, 'w');
+		wg = netpollunblock(pd, 'w', true);
 	runtime·unlock(pd);
 	if(rg) {
 		rg->schedlink = *gpp;
@@ -249,7 +255,7 @@
 }

 static G*
-netpollunblock(PollDesc *pd, int32 mode)
+netpollunblock(PollDesc *pd, int32 mode, bool setready)
 {
 	G **gpp, *old;

@@ -259,7 +265,8 @@
 	if(*gpp == READY)
 		return nil;
 	if(*gpp == nil) {
-		*gpp = READY;
+		if(setready)
+			*gpp = READY;
 		return nil;
 	}
 	old = *gpp;
@@ -291,14 +298,14 @@
 			runtime·throw("deadlineimpl: inconsistent read deadline");
 		pd->rd = -1;
 		pd->rt.fv = nil;
-		rg = netpollunblock(pd, 'r');
+		rg = netpollunblock(pd, 'r', false);
 	}
 	if(write) {
 		if(pd->wd <= 0 || (pd->wt.fv == nil && !read))
 			runtime·throw("deadlineimpl: inconsistent write deadline");
 		pd->wd = -1;
 		pd->wt.fv = nil;
-		wg = netpollunblock(pd, 'w');
+		wg = netpollunblock(pd, 'w', false);
 	}
 	runtime·unlock(pd);
 	if(rg)

? But then I can see both the IO completion and the timeout get fired during the initial wait; the second wait (after cancellation) gets no events and the test times out.

Alex
On 2013/06/03 04:57:05, brainman wrote:
> Like that:
> [diff snipped]
> ? But, then I can see both IO completion and timeout gets fired during initial
> wait, then second wait (after cancelation) gets no events and test timesout.

netpollunblock() is executed under the descriptor lock. If IO completion unblocks the wait, then the goroutine will not wait a second time; it will have the final result -- success or failure. If cancel/timeout unblocks the wait, then IO completion will set rg/wg to READY, and the second wait will return instantly. There may be other caveats of course. Reasoning about asynchronous concurrent logic is tricky.
> netpollunblock() is executed under the descriptor lock.

But, I think, it is possible to have 2 events fired (IO completion and timeout) while netpollblock waits for something to happen - in between runtime.park and runtime.lock. Isn't it?

Then the reported result gets skewed.

Alex
On Mon, Jun 3, 2013 at 11:19 AM, <alex.brainman@gmail.com> wrote: > >> netpollunblock() is executed under the descriptor lock. > > > But, I think, it is possible to have 2 events fired (IO completion and > timeout) while netpollblock waits for something to happened. In between > runtime.park and runtime.lock. Isn't it? > > Then reported result gets skewed. I do not understand the exact failure scenario. Please describe it in more detail.
On 2013/06/03 08:17:15, dvyukov wrote: > > I do not understand the exact failure scenario. Please describe it in > more detail. I see new IO submitted, then the goroutine goes into netpoll.Wait and blocks there. Then I can see both IO completion and timeout gets fired and original netpoll.Wait returns with error=2 (timeout). I can also see result of IO completion (error=0 and qty=8192). Since netpoll.Wait returns error=2, the goroutine continues into netpoll.WaitCancelled and never returns from there (I can see it calls runtime.park there). The test fails with "timeout error". Alex
On 2013/06/03 09:26:06, brainman wrote:
> I see new IO submitted, then the goroutine goes into netpoll.Wait and blocks
> there. Then I can see both IO completion and timeout gets fired and original
> netpoll.Wait returns with error=2 (timeout). I can also see result of IO
> completion (error=0 and qty=8192). Since netpoll.Wait returns error=2, the
> goroutine continues into netpoll.WaitCancelled and never returns from there (I
> can see it calls runtime.park there).
>
> The test fails with "timeout error".

I see the problem. The problem is that the wait is preempted by the IO completion notification, but then rechecks state in checkerr() and returns timeout/cancel, right?

The epoll/kqueue poller imposes quite weak requirements on the notifications; it's only important not to lose a notification and deadlock. IOCP requires much stricter notifications.

I think you need to make netpollblock() return whether it was preempted by IO completion or by timeout/cancel. Then, if it's an IO completion, do not call checkerr(). I.e. add the setready flag to netpollunblock:

	netpollunblock(PollDesc *pd, int32 mode, bool setready)

add a bool return type to netpollblock:

	bool netpollblock(PollDesc *pd, int32 mode);

Then change runtime_pollWait as:

	func runtime_pollWait(pd *PollDesc, mode int) (err int) {
		runtime·lock(pd);
		err = checkerr(pd, mode);
		if(err == 0) {
			if(!netpollblock(pd, mode))
				err = checkerr(pd, mode);
		}
		runtime·unlock(pd);
	}

If netpollblock() sees READY, then it returns true (got an IO notification). If it blocks, then you need to communicate the return value from netpollunblock(). You can do it by storing the value in gp->param.
> I see the problem. > The problem is that wait is preempted by IO completion notification, but then > rechecks state in checkerr() and returns timeout/cancel, right? Correct. > ... Sounds like a plan. I will have a go. Alex
On 2013/06/03 10:37:19, dvyukov wrote:
> ...
> I think you need to make ...

Done. All tests pass now on windows/386, windows/amd64 and linux/386. Can you test it on darwin, please? Because if it breaks, I will have no idea what is going on. Thank you.

PTAL.

Alex
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
On 2013/06/04 04:30:25, brainman wrote: > On 2013/06/03 10:37:19, dvyukov wrote: > > ... > > I think you need to make ... > > Done. All test pass now on windows/386, windows/amd64 and linux/386. Can you > test it on darwin, please? Because if it breaks, I have no idea what is going > on. Thank you. $ GOARCH=386 ./all.bash && ./all.bash passes on darwin
https://codereview.appspot.com/8670044/diff/44001/src/pkg/net/fd_windows.go
File src/pkg/net/fd_windows.go (right):

https://codereview.appspot.com/8670044/diff/44001/src/pkg/net/fd_windows.go#n...
src/pkg/net/fd_windows.go:224: return int(o.qty), nil
Add a comment explaining that we've issued the cancellation request, but the operation succeeded before it took effect, so we need to treat it as succeeded (the bytes were actually sent/received over the network).

https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc
File src/pkg/runtime/netpoll.goc (right):

https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc...
src/pkg/runtime/netpoll.goc:117: err = checkerr(pd, mode);
Check that err != 0 here - it must be, right? Because we were unblocked not by an ioready event. If the check does not fire, then, great, we have another invariant that we can rely on in this complex logic. Otherwise there is something we do not understand in the code we wrote :)

https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc...
src/pkg/runtime/netpoll.goc:213: rg = netpollunblock(pd, 'r', true);
This subtly changes the behaviour of kqueue/epoll. If they get concurrent ioready and timeout/cancel (or all 3 at once :) ), they will retry the read/write syscall, and it will most likely succeed, while currently they return timeout/cancel. Difficult to say what is better. Since it's a race, both behaviors are legal, I think. For cancel you probably want to cancel as soon as possible (rather than accept and handle one more tcp conn). For timeout you probably want to give read() the last chance. I think the simplest way to revert to the original behavior is to make runtime_pollWait() call checkerr() unconditionally a second time for the epoll/kqueue pollers. I am not sure whether it's worth doing; it looks close to insignificant. Leave it as is and please ask Brad.

https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc...
src/pkg/runtime/netpoll.goc:261: netpollunblock(PollDesc *pd, int32 mode, bool setready)
Probably call it 'ioready' to be consistent with the return value of netpollblock(). netpollblock() returns whether IO is ready or not, which is determined by the ioready flag passed to netpollunblock().

https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc...
src/pkg/runtime/netpoll.goc:272: if(setready)
Explain why we do not set READY when !ioready (runtime_pollWait checks for timeout/cancel before waiting).
The high-level algorithm looks good. I actually like the way you changed netpoll.goc. I think the netpoll.goc and fd_poll_runtime.go changes should go in separately, and then the windows-specific changes.

$ GOMAXPROCS=4 GOARCH=386 ./all.bash && GOMAXPROCS=4 ./all.bash

also passed on darwin.

https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc
File src/pkg/runtime/netpoll.goc (right):

https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc...
src/pkg/runtime/netpoll.goc:213: rg = netpollunblock(pd, 'r', true);
On 2013/06/04 09:47:34, dvyukov wrote:
> This subtly changes behaviour of kqueue/epoll.
> [...]
> Leave it as is and please ask Brad.
On second thought, your code is fine. The concurrent ioready/cancel/timeout events are ordered by the descriptor mutex, and the code respects that order. E.g. if we get ioready followed by timeout, the goroutine will do IO. If we get timeout followed by ioready, the goroutine will return timeout.
https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll_win... File src/pkg/runtime/netpoll_windows.c (right): https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:85: // TODO(brainman): Call GetQueuedCompletionStatusEx() here when possible. This is unrelated to this change, but I am eager to see how GetQueuedCompletionStatusEx() affects performance. Once this is submitted, you can patch it locally to use GetQueuedCompletionStatusEx() with a batch size 64 or something and run the net tcp benchmarks.
And later you may also look at removing memory allocation from Read/Write (by embedding some structs into netFD). Reads/Writes over an established tcp connection should trigger basically no allocations. And I think that will be very cool, because it moves it closer to the raw C performance.
On 2013/06/04 09:56:42, dvyukov wrote: > This is unrelated to this change, but I am eager to see how > GetQueuedCompletionStatusEx() affects performance. Once this is submitted, you > can patch it locally to use GetQueuedCompletionStatusEx() with a batch size 64 > or something and run the net tcp benchmarks. I am snowed under at the moment. It is on my to-do list. I will try as soon as this is submitted. I don't see a significant speed-up with this change, but I noticed that the net benchmarks are more consistent with it. So I am generally happy to go ahead with this change. Alex
On 2013/06/04 10:08:10, dvyukov wrote: > And later you may also look at removing memory allocation from Read/Write (by > embedding some structs into netFD). Reads/Writes over an established tcp > connection should trigger basically no allocations. And I think that will be > very cool, because it moves it closer to the raw C performance. On my to-do list. In fact I have done it before, but then I got stuck with timers allocating memory. So I ditched the memory allocation improvements in favor of doing this. I will get it in once this is submitted. Alex
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
Thank you for review. Alex https://codereview.appspot.com/8670044/diff/44001/src/pkg/net/fd_windows.go File src/pkg/net/fd_windows.go (right): https://codereview.appspot.com/8670044/diff/44001/src/pkg/net/fd_windows.go#n... src/pkg/net/fd_windows.go:224: return int(o.qty), nil On 2013/06/04 09:47:34, dvyukov wrote: > add a comment explaining that we've issued the cancellation request, but the > operation succeeded before, so we need to treat it as succeeded (the bytes are > actually sent/recv from network) Done. https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc File src/pkg/runtime/netpoll.goc (right): https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc... src/pkg/runtime/netpoll.goc:117: err = checkerr(pd, mode); On 2013/06/04 09:47:34, dvyukov wrote: > check that err != 0 > it must be, right? because we are unblocked not by ioready event > if the check does not fire, then, great, we have another invariant that we can > rely on in this complex logic. otherwise there is something we do not understand > in the code we wrote :) Done. https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc... src/pkg/runtime/netpoll.goc:213: rg = netpollunblock(pd, 'r', true); Left as is. https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc... src/pkg/runtime/netpoll.goc:213: rg = netpollunblock(pd, 'r', true); Left as is. https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc... src/pkg/runtime/netpoll.goc:261: netpollunblock(PollDesc *pd, int32 mode, bool setready) On 2013/06/04 09:47:34, dvyukov wrote: > probably call it 'ioready' to be consistent with return value of netpollblock() > netpollblock() returns whether IO is ready or not, which is determined by > ioready flag passed to netpollunblock() Done. https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll.goc... 
src/pkg/runtime/netpoll.goc:272: if(setready) On 2013/06/04 09:47:34, dvyukov wrote: > explain why we do not set READY when !ioready (runtime_pollWait checks for > timeout/cancel before waiting) Done. https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll_win... File src/pkg/runtime/netpoll_windows.c (right): https://codereview.appspot.com/8670044/diff/44001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:85: // TODO(brainman): Call GetQueuedCompletionStatusEx() here when possible. Will do.
ping
On 2013/06/20 23:04:55, brainman wrote: > ping I thought we were waiting for you: > I think netpoll.goc and fd_poll_runtime.go changes should go separately, and > then the windows-specific changes. Probably it can be split even more. Other people need to look at these smaller changes as well.
On 2013/06/21 15:09:42, dvyukov wrote: > On 2013/06/20 23:04:55, brainman wrote: > > ping > > I thought we are waiting for you: Oops :-) > > I think netpoll.goc and fd_poll_runtime.go changes should go separately, and > > then the windows-specific changes. > > Probably it can be split even more. Other people need to look at these smaller > changes as well. Are you sure? You want me to copy netpoll.goc and fd_poll_runtime.go into new CL and send them separately? Why? You won't be able to see the reason for netpoll.goc and fd_poll_runtime.go changes. Sure, I will do it, if you insist. Just let me know. Alex
On 2013/06/23 10:34:07, brainman wrote: > On 2013/06/21 15:09:42, dvyukov wrote: > > On 2013/06/20 23:04:55, brainman wrote: > > > ping > > > > I thought we were waiting for you: > > Oops :-) > > > > I think netpoll.goc and fd_poll_runtime.go changes should go separately, and > > > then the windows-specific changes. > > > > Probably it can be split even more. Other people need to look at these smaller > > changes as well. > > Are you sure? You want me to copy netpoll.goc and fd_poll_runtime.go into new CL > and send them separately? Why? You won't be able to see the reason for > netpoll.goc and fd_poll_runtime.go changes. > > Sure, I will do it, if you insist. Just let me know. This CL changes the way runtime netpoll works and adds lots of windows-specific code. This is inconvenient for review/testing/rollback. E.g. people interested only in bsd systems won't even look at "net: implement netpoll for windows". Please submit the platform-independent part separately. Regarding the reason for the change, you may reference this patch and say that windows requires knowing whether a notification is ioready or timeout/closed.
https://codereview.appspot.com/8670044/diff/57001/src/pkg/runtime/netpoll.goc File src/pkg/runtime/netpoll.goc (right): https://codereview.appspot.com/8670044/diff/57001/src/pkg/runtime/netpoll.goc... src/pkg/runtime/netpoll.goc:127: netpollblock(pd, mode); We expect that this returns only ioready, right? Please add such an assertion. Can't it be unblocked by runtime_pollUnblock? I suspect it can be: if we get a timeout, unblock, issue CancelIOEx, and wait for cancellation, at this point the fd is closed and we are unblocked by close; so we still don't know the result of the IO operation (succeeded/canceled)
I also sent new CL https://codereview.appspot.com/10485043/ as per your request. Alex https://codereview.appspot.com/8670044/diff/57001/src/pkg/runtime/netpoll.goc File src/pkg/runtime/netpoll.goc (right): https://codereview.appspot.com/8670044/diff/57001/src/pkg/runtime/netpoll.goc... src/pkg/runtime/netpoll.goc:127: netpollblock(pd, mode); On 2013/06/23 11:09:44, dvyukov wrote: > we expect that this returns only ioready, right? > please add such assertion > can't it be unblocked by runtime_pollUnblock? I suspect it can be, if we get > timeout, unblock, issue CancelIOEx, wait for cancellation, at this point the fd > is closed and we are unblocked by close; so we still don't know the result of > the IO operation (succeeded/canceled) Done.
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
On 2013/06/24 04:37:54, brainman wrote: > Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), > > Please take another look. The fix looks good. Thanks for splitting this CL.
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
Updated to the tip. Waiting for instructions. Alex
https://codereview.appspot.com/8670044/diff/81001/src/pkg/net/fd_windows.go File src/pkg/net/fd_windows.go (right): https://codereview.appspot.com/8670044/diff/81001/src/pkg/net/fd_windows.go#n... src/pkg/net/fd_windows.go:96: var i int guard the following lines by !canCancelIO, we really don't want to allocate these chans on modern OSes, and probably regroup it as:
o.fd = fd
o.mode = mode
o.runtimeCtx = fd.pd.runtimeCtx
if !canCancelIO {
	everything else
}
https://codereview.appspot.com/8670044/diff/81001/src/pkg/net/fd_windows.go#n... src/pkg/net/fd_windows.go:123: o.buf.Len = uint32(len(buf)) Is it a known issue that the windows impl does not support >2GB net operations? I think we need to file an issue. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll.goc File src/pkg/runtime/netpoll.goc (right): https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll.goc... src/pkg/runtime/netpoll.goc:273: if(*gpp == READY) { drop {} https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... File src/pkg/runtime/netpoll_windows.c (right): https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:17: #pragma pack on hummm... how does this work? net_anOp matches net.anOp, but the latter does not have any special packing. So pragma pack is either unnecessary here, or does the wrong thing... https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:21: struct net_anOp { { on the next line https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:23: Overlapped o; indent 'o' with tab https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win...
src/pkg/runtime/netpoll_windows.c:24: // my extras I think a better comment would be along the lines of "used by netpoll" https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:33: static uintptr cpiohandle = INVALID_HANDLE_VALUE; // completion port io handle it's usually abbreviated as iocp (IO Completion Port), cpio means something else https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:38: cpiohandle = (uintptr)runtime·stdcall(runtime·CreateIoCompletionPort, 4, INVALID_HANDLE_VALUE, (uintptr)0, (uintptr)0, (uintptr)0); I think it's better to pass DWORD_MAX as the max number of concurrent threads. Windows does not understand what happens here and we do not have dedicated IO threads, so this model won't work. A worker thread can poll IOCP and then go into cgo. It will be a disaster if Windows decides to throttle Go scheduler worker threads. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:101: pd = o->runtimeCtx; pd refers to PollDesc* in netpoll-related files; please either (1) make pd a PollDesc*, or better, just: runtime·netpollready(&gp, (void*)o->runtimeCtx, mode);
On 2013/06/25 04:43:00, brainman wrote: > Updated to the tip. Waiting for instructions. I think we are very close to landing this.
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
https://codereview.appspot.com/8670044/diff/81001/src/pkg/net/fd_windows.go File src/pkg/net/fd_windows.go (right): https://codereview.appspot.com/8670044/diff/81001/src/pkg/net/fd_windows.go#n... src/pkg/net/fd_windows.go:96: var i int On 2013/06/25 12:00:32, dvyukov wrote: > guard the following lines by !canCancelIO, we really don't want to allocate > these chans on modern OSes > and probably regroup it as: > > o.fd = fd > o.mode = mode > o.runtimeCtx = fd.pd.runtimeCtx > if !canCancelIO { > everything else > } Done. https://codereview.appspot.com/8670044/diff/81001/src/pkg/net/fd_windows.go#n... src/pkg/net/fd_windows.go:123: o.buf.Len = uint32(len(buf)) On 2013/06/25 12:00:32, dvyukov wrote: > Is it a known issue that windows impl does not support >2GB net operations? I > think we need to file an issue. No, no one complained about that. Feel free to create an issue, if you like. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll.goc File src/pkg/runtime/netpoll.goc (right): https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll.goc... src/pkg/runtime/netpoll.goc:273: if(*gpp == READY) { On 2013/06/25 12:00:32, dvyukov wrote: > drop {} Done. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... File src/pkg/runtime/netpoll_windows.c (right): https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:17: #pragma pack on On 2013/06/25 12:00:32, dvyukov wrote: > ... So pragma pack is either unnecessary here, or do the > wrong thing... Removed. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:21: struct net_anOp { On 2013/06/25 12:00:32, dvyukov wrote: > { on the next line Done. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:23: Overlapped o; On 2013/06/25 12:00:32, dvyukov wrote: > ident 'o' with tab Done. 
https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:24: // my extras On 2013/06/25 12:00:32, dvyukov wrote: > I think a better comment would be along the lines of "used by netpoll" I agree. Changed. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:33: static uintptr cpiohandle = INVALID_HANDLE_VALUE; // completion port io handle On 2013/06/25 12:00:32, dvyukov wrote: > it's usually abbreviated as iocp (IO Completion Port), cpio means something else Renamed. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:38: cpiohandle = (uintptr)runtime·stdcall(runtime·CreateIoCompletionPort, 4, INVALID_HANDLE_VALUE, (uintptr)0, (uintptr)0, (uintptr)0); On 2013/06/25 12:00:32, dvyukov wrote: > I think it's better to pass DWORD_MAX as max number of concurrent threads > Windows does not understand what happens here and we do not have dedicated IO > threads, so this model won't work. A worker thread can poll IOCP and then go > into cgo. It will be a disaster if windows will decide to throttle Go scheduler > worker threads. I have been thinking long and hard about this value. I did a lot of testing too. From my tests, I can only see that this setting affects the number of threads allowed to proceed into GetQueuedCompletionStatus. From what information I can gather, this model is designed around the concept that all threads are blocked inside of GetQueuedCompletionStatus. The designers wanted to minimize thread switching, so waiting threads are managed in FIFO order in the hope that the thread that proceeds with IO will return back into GetQueuedCompletionStatus as soon as possible. Given everything I know, I thought that 0 for that value (it is translated into the number of CPUs under the covers) is most suitable. http://msdn.microsoft.com/en-us/library/windows/desktop/aa365198%28v=vs.85%29...
and http://int64.org/2009/05/13/high-performance-io-on-windows/ are some links on the subject. Also the fact that we are effectively running in 2 different modes here (controlled by the block parameter in runtime.netpoll) makes things even more complicated. Leaving it as is. But will change to whatever you like, if you insist. Do not forget, we could always change it later, once we gather more knowledge on the matter. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:101: pd = o->runtimeCtx; On 2013/06/25 12:00:32, dvyukov wrote: > pd refers to PollDesc* in netpoll-related files > please either (1) make pd PollDesc* or better just: > runtime·netpollready(&gp, (void*)o->runtimeCtx, mode); Done.
LGTM I think you can submit now (after addressing the comments), it is on review for enough time. Hope you've done stress testing with GOARCH=386/amd64 GOMAXPROCS=1/2/4/8. https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... File src/pkg/runtime/netpoll_windows.c (right): https://codereview.appspot.com/8670044/diff/81001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:38: cpiohandle = (uintptr)runtime·stdcall(runtime·CreateIoCompletionPort, 4, INVALID_HANDLE_VALUE, (uintptr)0, (uintptr)0, (uintptr)0); On 2013/06/26 07:18:05, brainman wrote: > On 2013/06/25 12:00:32, dvyukov wrote: > > I think it's better to pass DWORD_MAX as max number of concurrent threads > > Windows does not understand what happens here and we do not have dedicated IO > > threads, so this model won't work. A worker thread can poll IOCP and then go > > into cgo. It will be a disaster if windows will decide to throttle Go > scheduler > > worker threads. > > I have been thinking long and hard about this value. I did a lot of testing too. > From my tests, I can only see that setting affect number of threads allowed to > proceed into GetQueuedCompletionStatus. From what information I can gather, this > model is designed around concept that all threads are blocked inside of > GetQueuedCompletionStatus. Designers wanted to mimimize thread switching, so > waiting threads are managed in FIFO order in the hope that thread that proceeds > with IO will return back into GetQueuedCompletionStatus as soon as possible. > Giving everything I know, I thought that 0 for that value (is translated into > CPU number under covers) is most suitable. > > http://msdn.microsoft.com/en-us/library/windows/desktop/aa365198%2528v=vs.85%... > and http://int64.org/2009/05/13/high-performance-io-on-windows/ are some links > on the subject. 
Also the fact that we are effectively running in 2 different > modes here (controlled by block parameter in runtime.netpoll) makes things even > more complicated. > > Leaving it as is. But will change to whatever you like, if you insist. Do not > forget, we could always change it later, once we gather more knowledge on the > matter. Yes, it is designed around a different model -- you have a thread pool (many more threads than CPUs), all threads are blocked on GQCS, and once a thread is unblocked it processes the event and blocks again. The Go model is different -- the thread that gets events does not process them; instead it distributes the events over a thread pool with GOMAXPROCS (number of CPUs) threads. It's further complicated by syscalls and cgo. Please set the maximum number of threads to something very large. Any potential performance problems that arise from that will be very difficult to track down. Disabling thread throttling has the additional benefit of aligning it with epoll/kqueue (which do not do any throttling). So any performance problems and changes will have a more uniform effect across different OSes. https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... File src/pkg/runtime/netpoll_windows.c (right): https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:37: runtime·printf("netpoll: failed to create cpio handle (errno=%d)\n", runtime·getlasterror()); s/cpio/iocp/ https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:38: runtime·throw("netpoll: failed to create cpio handle"); s/cpio/iocp/
On 2013/06/26 10:23:22, dvyukov wrote: > LGTM > > I think you can submit now (after addressing the comments), it is on review for > enough time. > > Hope you've done stress testing with GOARCH=386/amd64 GOMAXPROCS=1/2/4/8. I think you need to extend CL description a bit (for such a significant change). Please also include fresh benchmark results.
On 2013/06/26 10:24:59, dvyukov wrote: > > I think you need to extend CL description a bit (for such a significant change)... I cannot think of what else to say. Do you have a suggestion? It is kind of all the same, but using netpoll functionality. Alex
On 2013/06/26 10:23:22, dvyukov wrote: > > Hope you've done stress testing with GOARCH=386/amd64 GOMAXPROCS=1/2/4/8. Done as much as I could. > > Please set the maximum number of threads to something very large. Set to 0xffffffff. But now I have a problem - TestVariousDeadlines1Proc fails on amd64:
...
--- FAIL: TestVariousDeadlines1Proc (5.87 seconds)
	timeout_test.go:483: 1ns run 1/3
	timeout_test.go:503: for 1ns run 1/3, good client timeout after 2.9292ms, reading 0 bytes
...
	timeout_test.go:483: 750ns run 1/3
	timeout_test.go:508: for 750ns run 1/3: timeout (2s) waiting for client to timeout (750ns) reading
FAIL
I looked high and low, and I cannot see any flaws in our changes. Also, if I change the TestVariousDeadlines1Proc timeout time from 2s up to 10s, then the problem goes away (but the test sometimes runs for too long). So, to me, that proves our netpoll code is correct; it's just that the scheduler is not good enough for this test. Perhaps I am wrong about that. Not sure what to do here :-( > > Please also include fresh benchmark results. I have collected some benchmarks https://code.google.com/p/go/issues/detail?id=5284#c4 (there is a README to explain what is what). Like I said before, they are not conclusive. If you insist I publish something, help me pick benchmarks to include. Alex https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... File src/pkg/runtime/netpoll_windows.c (right): https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:37: runtime·printf("netpoll: failed to create cpio handle (errno=%d)\n", runtime·getlasterror()); On 2013/06/26 10:23:24, dvyukov wrote: > s/cpio/iocp/ Done. https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... src/pkg/runtime/netpoll_windows.c:38: runtime·throw("netpoll: failed to create cpio handle"); On 2013/06/26 10:23:24, dvyukov wrote: > s/cpio/iocp/ Done.
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
On Thu, Jun 27, 2013 at 10:52 AM, <alex.brainman@gmail.com> wrote: > On 2013/06/26 10:23:22, dvyukov wrote: > >> Hope you've done stress testing with GOARCH=386/amd64 > > GOMAXPROCS=1/2/4/8. > > Done as much as I could. > >> Please set the maximum number of threads to something very large. > Set to 0xffffffff. But now I have a problem - TestVariousDeadlines1Proc > fails on amd64: > > ... > --- FAIL: TestVariousDeadlines1Proc (5.87 seconds) > timeout_test.go:483: 1ns run 1/3 > timeout_test.go:503: for 1ns run 1/3, good client timeout after > 2.9292ms, reading 0 bytes > ... > timeout_test.go:483: 750ns run 1/3 > timeout_test.go:508: for 750ns run 1/3: timeout (2s) waiting for > client to timeout (750ns) reading > FAIL Is it started happening when you set max threads to 0xffffffff? Or it was happening before? > I looked high and low, and I cannot see any flaws in our changes. Also, > If I change TestVariousDeadlines1Proc timeout time from 2s up to 10s, > then problem goes away (but test sometimes runs for too long). So, to > me, that proves our netpoll code is correct, it just scheduler is not > good enough for this test. Perhaps I am wrong about that. Not sure what > to do here :-( > > > >> Please also include fresh benchmark results. > > > I have collected some benchmarks > https://code.google.com/p/go/issues/detail?id=5284#c4 (there is README > to explain what is what). Like I said before, they are not conclusive. > If you insist I publish something, help me pick benchmarks to include. > > Alex > > > > https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... > File src/pkg/runtime/netpoll_windows.c (right): > > https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... > src/pkg/runtime/netpoll_windows.c:37: runtime·printf("netpoll: failed to > create cpio handle (errno=%d)\n", runtime·getlasterror()); > On 2013/06/26 10:23:24, dvyukov wrote: >> >> s/cpio/iocp/ > > > Done. 
> > > https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... > src/pkg/runtime/netpoll_windows.c:38: runtime·throw("netpoll: failed to > create cpio handle"); > On 2013/06/26 10:23:24, dvyukov wrote: >> >> s/cpio/iocp/ > > > Done. > > https://codereview.appspot.com/8670044/
On Thu, Jun 27, 2013 at 3:25 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Thu, Jun 27, 2013 at 10:52 AM, <alex.brainman@gmail.com> wrote: >> On 2013/06/26 10:23:22, dvyukov wrote: >> >>> Hope you've done stress testing with GOARCH=386/amd64 >> >> GOMAXPROCS=1/2/4/8. >> >> Done as much as I could. >> >>> Please set the maximum number of threads to something very large. >> Set to 0xffffffff. But now I have a problem - TestVariousDeadlines1Proc >> fails on amd64: >> >> ... >> --- FAIL: TestVariousDeadlines1Proc (5.87 seconds) >> timeout_test.go:483: 1ns run 1/3 >> timeout_test.go:503: for 1ns run 1/3, good client timeout after >> 2.9292ms, reading 0 bytes >> ... >> timeout_test.go:483: 750ns run 1/3 >> timeout_test.go:508: for 750ns run 1/3: timeout (2s) waiting for >> client to timeout (750ns) reading >> FAIL > > Is it started happening when you set max threads to 0xffffffff? Or it > was happening before? What if you set max threads to 10000? Does TestVariousDeadlines4Proc timeout? >> I looked high and low, and I cannot see any flaws in our changes. Also, >> If I change TestVariousDeadlines1Proc timeout time from 2s up to 10s, >> then problem goes away (but test sometimes runs for too long). So, to >> me, that proves our netpoll code is correct, it just scheduler is not >> good enough for this test. Perhaps I am wrong about that. Not sure what >> to do here :-( >> >> >> >>> Please also include fresh benchmark results. >> >> >> I have collected some benchmarks >> https://code.google.com/p/go/issues/detail?id=5284#c4 (there is README >> to explain what is what). Like I said before, they are not conclusive. >> If you insist I publish something, help me pick benchmarks to include. >> >> Alex >> >> >> >> https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... >> File src/pkg/runtime/netpoll_windows.c (right): >> >> https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... 
>> src/pkg/runtime/netpoll_windows.c:37: runtime·printf("netpoll: failed to >> create cpio handle (errno=%d)\n", runtime·getlasterror()); >> On 2013/06/26 10:23:24, dvyukov wrote: >>> >>> s/cpio/iocp/ >> >> >> Done. >> >> >> https://codereview.appspot.com/8670044/diff/90001/src/pkg/runtime/netpoll_win... >> src/pkg/runtime/netpoll_windows.c:38: runtime·throw("netpoll: failed to >> create cpio handle"); >> On 2013/06/26 10:23:24, dvyukov wrote: >>> >>> s/cpio/iocp/ >> >> >> Done. >> >> https://codereview.appspot.com/8670044/
On 2013/06/27 11:25:45, dvyukov wrote: > > Is it started happening when you set max threads to 0xffffffff? Or it > was happening before? > The former. Maybe it was broken before, but I didn't notice. Maybe given enough stress it would fail too. With max threads 0xffffffff it fails pretty much every second time. This is on amd64 only, 386 works fine. Alex
On Thu, Jun 27, 2013 at 3:26 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Thu, Jun 27, 2013 at 3:25 PM, Dmitry Vyukov <dvyukov@google.com> wrote: >> On Thu, Jun 27, 2013 at 10:52 AM, <alex.brainman@gmail.com> wrote: >>> On 2013/06/26 10:23:22, dvyukov wrote: >>> >>>> Hope you've done stress testing with GOARCH=386/amd64 >>> >>> GOMAXPROCS=1/2/4/8. >>> >>> Done as much as I could. >>> >>>> Please set the maximum number of threads to something very large. >>> Set to 0xffffffff. But now I have a problem - TestVariousDeadlines1Proc >>> fails on amd64: >>> >>> ... >>> --- FAIL: TestVariousDeadlines1Proc (5.87 seconds) >>> timeout_test.go:483: 1ns run 1/3 >>> timeout_test.go:503: for 1ns run 1/3, good client timeout after >>> 2.9292ms, reading 0 bytes >>> ... >>> timeout_test.go:483: 750ns run 1/3 >>> timeout_test.go:508: for 750ns run 1/3: timeout (2s) waiting for >>> client to timeout (750ns) reading >>> FAIL >> >> Is it started happening when you set max threads to 0xffffffff? Or it >> was happening before? > > What if you set max threads to 10000? > Does TestVariousDeadlines4Proc timeout? Does it happen deterministically or not? Please post several failure logs. I suspect it can be due to some fairness issues in the scheduler -- the test requires the timer goroutine to run, but the client and server goroutines never give up the CPU. I mean that it's similar to:
go func() { stop = 1 }()
for stop == 0 {
}
On 2013/06/27 11:27:19, dvyukov wrote: > > What if you set max threads to 10000? Will try. > Does TestVariousDeadlines4Proc timeout? I didn't notice. Will check. Alex
On Thu, Jun 27, 2013 at 3:36 PM, <alex.brainman@gmail.com> wrote: > On 2013/06/27 11:25:45, dvyukov wrote: > >> Is it started happening when you set max threads to 0xffffffff? Or it >> was happening before? > > > > The former. Maybe it was broken before, but I didn't notice. Maybe give > enough stress it will fail too. With max threads 0xffffffff it fails > pretty much every second time. This is on amd64 only, 386 works fine. I would not be surprised if the kernel uses int16 to store the value. Please try 10000. Also please try adding runtime.Gosched() into the client and server goroutines.
On 2013/06/27 11:37:15, dvyukov wrote: > > Does it happen deterministically or not? ... As I said, it happens about every second time. Same problem: the connection keeps going and ignores the timeout set. And the test times out. > ... Please post several failure logs. Like I said above, they all look similar. Sometimes it fails early, sometimes late. > I suspect it can be due to some fairness issues in the scheduler -- > the test requires timer goroutine to run, but client and server > goroutines never give up CPU. ... I thought as much. What can we do apart from disabling the test? Alex
On 2013/06/27 11:39:26, dvyukov wrote: > > I would not be surprised if kernel uses int16 to store the value. > Please try 10000. Will do. > Also please try adding runtime.Gosched() into client and server goroutines. Where exactly should I put it? Alex
On 2013/06/27 11:45:44, brainman wrote: > On 2013/06/27 11:39:26, dvyukov wrote: > > > > I would not be surprised if kernel uses int16 to store the value. > > Please try 10000. > > Will do. > > > Also please try adding runtime.Gosched() into client and server goroutines. > > Where exactly should I put it? Yeah, there is no good place to add them, because the test uses io.Copy. Please try to inline io.Copy, and add runtime.Gosched() into the loop, so that it yields every 4K or so. Also please try this patch: https://codereview.appspot.com/10683044
On 2013/06/27 12:12:23, dvyukov wrote: > > Yeah, there is no good to add them, because the test uses io.Copy. > Please try to inline io.Copy, and add runtime.Gosched() into the loop, so that > it yields every 4K or so. > > > Also please try this patch: > https://codereview.appspot.com/10683044 Sounds like a plan. Thank you. Alex
> ... TestVariousDeadlines4Proc timeout too?

Yes.

> ... set max threads to 10000, see if that helps ...

No difference. It acts just like 0xffffffff.

I was mistaken yesterday to say that TestVariousDeadlines1Proc works with max threads set to 0 - it only works if the test is run with the -short flag (do 1, not 3, runs each, and skip tests that go longer than 500*time.Microsecond). It also works with max threads set to 0 if I run ALL tests, not just TestVariousDeadlines1Proc, -short or no -short.

It works unconditionally on windows/386, regardless of the max threads value, the -short flag, or running TestVariousDeadlines1Proc on its own or together with other tests.

> ... try this patch: https://codereview.appspot.com/10683044

It does not help. TestVariousDeadlines1Proc breaks as before.

> ... inline io.Copy, and add runtime.Gosched() into the loop, so that it yields every 4K or so ...

That did the trick. Here https://codereview.appspot.com/10690045/ is my changed test.

> ... I think you need to extend CL description a bit (for such a significant change)...

I cannot think of what else to say. Do you have a suggestion? It is kind of all the same, but using netpoll functionality.

> ... Please also include fresh benchmark results.

I have collected some benchmarks at https://code.google.com/p/go/issues/detail?id=5284#c4 (there is a README to explain what is what). Like I said before, they are not conclusive. If you insist I publish something, help me pick benchmarks to include.

Thank you.

Alex
On 2013/06/28 02:43:12, brainman wrote:
> I have collected some benchmarks https://code.google.com/p/go/issues/detail?id=5284#c4 (there is README to explain what is what). Like I said before, they are not conclusive. If you insist I publish something, help me pick benchmarks to include.

I had a problem with my benchmark script, so I have updated the results at https://code.google.com/p/go/issues/detail?id=5284#c5. Not that they are much different.

Alex
On 2013/06/28 02:43:12, brainman wrote:
> > ... TestVariousDeadlines4Proc timeout too?
>
> Yes.

OK, so it's not related to the GOMAXPROCS value.

> > ... set max threads to 10000, see if that helps ...
>
> No difference. It acts just like 0xffffffff.
>
> I was mistaken yesterday to say that TestVariousDeadlines1Proc works with max threads set to 0 - it only works if test is run with -short flag (do 1, not 3 runs each, and skip tests that go longer then 500*time.Microsecond). It also works with max threads set to 0, if I run ALL tests, not just TestVariousDeadlines1Proc, -short or no -short.
>
> It works unconditionally on windows/386, regardless of max threads value or -short flag or running together or single TestVariousDeadlines1Proc test.

... and it's not related to the NumberOfConcurrentThreads value.

> > ... try this patch: https://codereview.appspot.com/10683044
>
> It does not help. TestVariousDeadlines1Proc breaks as before.

... and it's not a problem with the scheduler (at least with this particular part of the scheduler).

> > ... inline io.Copy, and add runtime.Gosched() into the loop, so that it yields every 4K or so ...
>
> That did the trick. Here https://codereview.appspot.com/10690045/ is my changed test.

OK.

> > ... I think you need to extend CL description a bit (for such a significant change)...
>
> I cannot think of what else to say. Do you have a suggestion? It is kind of all the same, but using netpoll functionality.

Say that it moves the network poller from the net package into runtime.

> > ... Please also include fresh benchmark results.
>
> I have collected some benchmarks https://code.google.com/p/go/issues/detail?id=5284#c4 (there is README to explain what is what). Like I said before, they are not conclusive. If you insist I publish something, help me pick benchmarks to include.

I would include the following benchmarks:

BenchmarkTCP4OneShot                     500    296856 ns/op
BenchmarkTCP4OneShot-2                   500    324198 ns/op
BenchmarkTCP4OneShot-3                   500    312480 ns/op
BenchmarkTCP4OneShotTimeout             1000    294903 ns/op
BenchmarkTCP4OneShotTimeout-2           1000    343728 ns/op
BenchmarkTCP4OneShotTimeout-3            500    355446 ns/op
BenchmarkTCP4Persistent                 5000     33396 ns/op
BenchmarkTCP4Persistent-2               5000     34763 ns/op
BenchmarkTCP4Persistent-3               5000     36911 ns/op
BenchmarkTCP4PersistentTimeout          5000     34177 ns/op
BenchmarkTCP4PersistentTimeout-2        5000     35739 ns/op
BenchmarkTCP4PersistentTimeout-3        5000     36130 ns/op
Regarding the test failure, I don't have any good ideas now... You've said that if you increase the timeout to 10 seconds, then it passes, but sometimes takes a long time, right? I do not understand what is the potential difference between amd64 and 386... I would add debug prints into the netpoller and timers, and try to understand what exactly goes wrong. E.g.:
(1) client/server just continue the communication, the timeout does not fire;
or (2) client/server just continue the communication, the timeout fires but takes no effect;
or (3) client/server hang, where?
or (4) something else.
On 2013/06/28 11:15:14, dvyukov wrote:
> I would include the following benchmarks:
> BenchmarkTCP4OneShot                     500    296856 ns/op
> BenchmarkTCP4OneShot-2                   500    324198 ns/op
> BenchmarkTCP4OneShot-3                   500    312480 ns/op
> BenchmarkTCP4OneShotTimeout             1000    294903 ns/op
> BenchmarkTCP4OneShotTimeout-2           1000    343728 ns/op
> BenchmarkTCP4OneShotTimeout-3            500    355446 ns/op
> BenchmarkTCP4Persistent                 5000     33396 ns/op
> BenchmarkTCP4Persistent-2               5000     34763 ns/op
> BenchmarkTCP4Persistent-3               5000     36911 ns/op
> BenchmarkTCP4PersistentTimeout          5000     34177 ns/op
> BenchmarkTCP4PersistentTimeout-2        5000     35739 ns/op
> BenchmarkTCP4PersistentTimeout-3        5000     36130 ns/op

No "old to new comparisons"?

Alex
On Fri, Jun 28, 2013 at 4:00 PM, <alex.brainman@gmail.com> wrote:
> On 2013/06/28 11:15:14, dvyukov wrote:
> > I would include the following benchmarks:
> > BenchmarkTCP4OneShot                     500    296856 ns/op
> > BenchmarkTCP4OneShot-2                   500    324198 ns/op
> > BenchmarkTCP4OneShot-3                   500    312480 ns/op
> > BenchmarkTCP4OneShotTimeout             1000    294903 ns/op
> > BenchmarkTCP4OneShotTimeout-2           1000    343728 ns/op
> > BenchmarkTCP4OneShotTimeout-3            500    355446 ns/op
> > BenchmarkTCP4Persistent                 5000     33396 ns/op
> > BenchmarkTCP4Persistent-2               5000     34763 ns/op
> > BenchmarkTCP4Persistent-3               5000     36911 ns/op
> > BenchmarkTCP4PersistentTimeout          5000     34177 ns/op
> > BenchmarkTCP4PersistentTimeout-2        5000     35739 ns/op
> > BenchmarkTCP4PersistentTimeout-3        5000     36130 ns/op
>
> No "old to new comparisons"?

Including "old to new comparisons".
On 2013/06/28 11:22:28, dvyukov wrote:
> Regarding the test failure, I don't have any good ideas now...

How about changing the test as per my new CL?

> You've said that if you increase timeout to 10 seconds, then it passes, but sometimes takes a long time, right?

Yes, but what is the limit? 2s? 10s? 1m? 1h? Who came up with 2s? Why should it end in 2s? Also, there are many short tests there (starting with 1ns or something). If each takes 10s, the total test takes too long.

> I do not understand what is the potential difference between amd64 and 386...

It is not really the arch, but the way the OS works. Remember, the old OS version (386) does not have the CancelIoEx function, so the execution path is completely different. For example, on 386 all IO is started and canceled on a single thread. So the whole thread-switching pattern is very different. I am just guessing. :-)

> I would add debug prints into netpoller and timers, and try to understand what exactly happens wrong.
> E.g. (1) client/server just continue the communication, timeout does not fire
> or (2) client/server just continue the communication, timeout fires but takes no effect
> or (3) client/server hang, where?
> or (4) something else

I put panic("CRASH") in the test where t.Fatal is, so I can see the stack trace. I can see the goroutine waiting in park() in the runtime inside runtime_WaitCancelled (or whatever it is called). So, it means the timeout is fired, and we called CancelIo, and now we are waiting for the IO to get canceled, but it has not happened yet. But, given that it works if I extend the wait to 10s, it means that the IO eventually completes. So, I would say it is (3), but it does not hang, it just takes longer than we expect it to. Probably because it is busy doing other things.

Alex
On Fri, Jun 28, 2013 at 4:14 PM, <alex.brainman@gmail.com> wrote:
> On 2013/06/28 11:22:28, dvyukov wrote:
> > Regarding the test failure, I don't have any good ideas now...
>
> How about changing the test as per my new CL?

Well, if it's a bug in the implementation, it's not a good idea to just update the test to account for the bug.

> > You've said that if you increase timeout to 10 seconds, then it passes, but sometimes takes long time, right?
>
> Yes, but what is the limit? 2s? 10s? 1m? 1h? Who came up with 2s? Why should it end in 2s? Also, there are many short tests there (start with 1ns or something). If each takes 10s, the total test takes too long.

I think it's 2s because the max timeout in the test is 1s, so 2s must be enough for the timeout to happen.

> > I do not understand what is the potential difference between amd64 and 386...
>
> It is not really the arch, but the way OS works. Remember old OS version (386) does not have CancelEx function, so the execution path is completely different. For example on 386 all IO is started and canceled on single thread. So all thread switching pattern is very different.

Ah, I see, makes sense.

> I am just guessing. :-)
>
> I put panic("CRASH") in the test where t.Fatal is, so I can see stack trace. I can see goroutine waiting in park() in runtime inside runtime_WaitCancelled (or whatever it is called). So, it means, timeout is fired, and we called CancelIO and now we waiting for IO to get canceled, but it has not happened yet. But, given that it works if I extend wait till 10s, it means that IO eventually completes. So, I would say it is (3), but it does not hang, just takes longer than we expect it to. Probably, because it is busy doing other things.

It's already something. What other things can it do? There must not be many other things...
On 2013/06/28 12:28:27, dvyukov wrote: > What other things can it do? There must be not much other things... Fair enough. I will run that test again and see if I can find out more. Alex
Maybe it correlates with some cold error paths, e.g. WSARecv returns nil instead of ERROR_IO_PENDING (or vice versa); or CancelIoEx() returns ERROR_NOT_FOUND.

On Fri, Jun 28, 2013 at 4:32 PM, <alex.brainman@gmail.com> wrote:
> On 2013/06/28 12:28:27, dvyukov wrote:
> > What other things can it do? There must be not much other things...
>
> Fair enough. I will run that test again and see if I can find out more.
>
> Alex
>
> https://codereview.appspot.com/8670044/
On 2013/06/28 12:28:27, dvyukov wrote:
> ...
> What other things can it do? There must be not much other things...

It is not that net timeouts aren't fired. They fire, but they take longer than they should. So I started looking at what they do, and they happened to be sitting in WaitForSingleObject in runtime.semasleep for much longer than they should: WaitForSingleObject is passed a *timeout* parameter, but it times out after a much longer time than *timeout* says. I was thinking about why, and the only good idea I had is that something else is hogging the CPU while we are waiting in WaitForSingleObject. So I started looking for any new busy loops that my change could introduce, and the only thing that I could come up with is the GetQueuedCompletionStatus function. I changed the GetQueuedCompletionStatus "wait" parameter from 0 to 1, and that fixes the broken test. Here

79c79,80
< 	wait = 0;
---
> 	// TODO(brainman): should use 0 here instead, but scheduler hogs CPU
> 	wait = 1;

is the netpoll_windows.c change. Perhaps when GetQueuedCompletionStatus gets called with 0 (do-not-wait) it is very quick when data is not there, but calling it with 1 (wait for 1ms) makes it go into the OS scheduler or something.

I am not convinced it is the correct solution, so I put a TODO to look at it later. Perhaps the Go scheduler needs to be adjusted to accommodate what is happening here. Why should we wait in GetQueuedCompletionStatus unless we deliberately want to do so? One thing I did notice is that Windows runtime.nanotime advances in ~15ms increments - maybe that confuses the Go scheduler.

Anyway, I am happy to submit the CL with the latest change. There is still plenty more work to do, but we have something that works for the moment. I have updated the CL description and added benchmark output.

Please let me know what to do next. Thank you.

Alex
Hello dvyukov@google.com (cc: golang-dev@googlegroups.com), Please take another look.
On 2013/07/05 02:58:58, brainman wrote: > message: On 2013/06/28 12:28:27, dvyukov wrote: > > ... > > What other things can it do? There must be not much other things... > > I can see that net timeouts aren't fired. They fired but they take longer then > they should. So I started looking at what they do, and they happened to be > sitting in WaitForSingleObject in runtime.semasleep for much longer then they > should - WaitForSingleObject gets passed *timeout* parameters, but it times out > after much longer time then *timeout* says. I was thinking about why, and the > only good idea I had is that something else is hogging CPU while we waiting in > WaitForSingleObject. So I started looking for any new busy loops that my change > could introduce, and the only thing that I could come up with is > GetQueuedCompletionStatus function. I changed GetQueuedCompletionStatus "wait" > parameter from 0 to 1, and that fixes broken test. Here > > 79c79,80 > < wait = 0; > --- > > // TODO(brainman): should use 0 here instead, but scheduler hogs CPU > > wait = 1; > > is the netpoll_windows.c change. Perhaps when GetQueuedCompletionStatus gets > called with 0 (do-not-wait) it is very quick when data is not there, but calling > it with 1 (wait for 1ms) makes it go into OS scheduler or something. > > I am not convinced it is correct solution. So I put TODO to look at it later. > Perhaps Go scheduler needs to be adjusted to accommodate for what is happening > here. Why should we wait in GetQueuedCompletionStatus unless we deliberately > want to do so? One thing I did noticed is that windows runtime.nanotime works in > ~15ms increments - maybe that confuses Go scheduler. > > Anyway, I am happy to submit the CL with the latest change. There is still > plenty more work to do, but we have something that works for the moment. I have > updated CL description and added benchmark output. > > Please let me now what to do next. Thank you. Good idea about busy loops. 
The scheduler sysmon thread can potentially call netpoll() in an almost busy loop. However, in between it calls usleep(). Probably Windows ignores small values in usleep() and lets it run all the time. Different runnable threads should still be alternated at 15ms intervals, so probably IO priority boosting comes into the picture (however, sysmon does not do any IO, but it calls GQCS, so probably the OS thinks that it's an IO thread). And it all makes sense only if you run on a single-core machine, because otherwise different threads should run in parallel. Is the machine single core?

In any case please try the following patch, and temporarily change wait to 0 for non-blocking netpoll calls, and check if it helps.

diff -r dc24634de6c5 src/pkg/runtime/proc.c
--- a/src/pkg/runtime/proc.c	Wed Jul 17 11:04:34 2013 +0200
+++ b/src/pkg/runtime/proc.c	Wed Jul 17 14:40:15 2013 +0400
@@ -2086,6 +2086,7 @@
 	lastpoll = runtime·atomicload64(&runtime·sched.lastpoll);
 	now = runtime·nanotime();
 	if(lastpoll != 0 && lastpoll + 10*1000*1000 > now) {
+		runtime·cas64(&runtime·sched.lastpoll, lastpoll, now);
 		gp = runtime·netpoll(false);  // non-blocking
 		injectglist(gp);
 	}

If it does not help, we will land the current version.
On 2013/07/17 10:42:15, dvyukov wrote:
> ... Different runnable thread still should be alternated at 15ms intervals, so probably IO priority boosting cames into picture (however sysmon does not do any IO, but it calls GQCS, so probably OS thinks that it's IO thread).

Windows threads can be assigned different priorities. Can we use these to provide better control?

> ... And it all makes sense only if you run on a single core machine, because otherwise different threads should run in parallel. Is the machine single core?

It is. All my other PCs have multiple CPUs. It is the only PC that fails TestVariousDeadlines1Proc.

> In any case please try the following patch, and temporary change wait to 0 for non-blocking netpoll calls, and check if it helps.

It does not help. Your change does not build, so I had to change it to:

diff -r 2bbca155a87f src/pkg/runtime/proc.c
--- a/src/pkg/runtime/proc.c	Thu Jul 04 14:24:21 2013 +1000
+++ b/src/pkg/runtime/proc.c	Fri Jul 19 14:37:43 2013 +1000
@@ -2043,7 +2043,7 @@
 sysmon(void)
 {
 	uint32 idle, delay;
-	int64 now, lastpoll;
+	uint64 now, lastpoll;
 	G *gp;
 	uint32 ticks[MaxGomaxprocs];

@@ -2073,6 +2073,7 @@
 	lastpoll = runtime·atomicload64(&runtime·sched.lastpoll);
 	now = runtime·nanotime();
 	if(lastpoll != 0 && lastpoll + 10*1000*1000 > now) {
+		runtime·cas64(&runtime·sched.lastpoll, &lastpoll, now);
 		gp = runtime·netpoll(false);  // non-blocking
 		injectglist(gp);
 	}

> If it does not help, we will land current version.

Sounds good. What should I do now?

Alex
On 2013/07/19 04:50:16, brainman wrote:
> On 2013/07/17 10:42:15, dvyukov wrote:
> > ... Different runnable thread still should be alternated at 15ms intervals, so probably IO priority boosting cames into picture (however sysmon does not do any IO, but it calls GQCS, so probably OS thinks that it's IO thread).
>
> Windows threads can be assigned different priorities. Can we use these to provide better control?

I would like to avoid thread priorities.

> > ... And it all makes sense only if you run on a single core machine, because otherwise different threads should run in parallel. Is the machine single core?
>
> It is. All other pcs I have are multiple cpus. It is the only pc that fails TestVariousDeadlines1Proc.

Aha! Then it's a busy loop with high probability.

> > In any case please try the following patch, and temporary change wait to 0 for non-blocking netpoll calls, and check if it helps.
>
> It does not help. Your change does not build, so I had to change it to:
>
> diff -r 2bbca155a87f src/pkg/runtime/proc.c
> --- a/src/pkg/runtime/proc.c	Thu Jul 04 14:24:21 2013 +1000
> +++ b/src/pkg/runtime/proc.c	Fri Jul 19 14:37:43 2013 +1000
> @@ -2043,7 +2043,7 @@
>  sysmon(void)
>  {
>  	uint32 idle, delay;
> -	int64 now, lastpoll;
> +	uint64 now, lastpoll;
>  	G *gp;
>  	uint32 ticks[MaxGomaxprocs];
>
> @@ -2073,6 +2073,7 @@
>  	lastpoll = runtime·atomicload64(&runtime·sched.lastpoll);
>  	now = runtime·nanotime();
>  	if(lastpoll != 0 && lastpoll + 10*1000*1000 > now) {
> +		runtime·cas64(&runtime·sched.lastpoll, &lastpoll, now);
>  		gp = runtime·netpoll(false);  // non-blocking
>  		injectglist(gp);
>  	}
>
> > If it does not help, we will land current version.
>
> Sounds good. What should I do now?

It LGTM. It would be nice if somebody else took a look, but if not, run the tests once again and submit in a day or two.
*** Submitted as https://code.google.com/p/go/source/detail?r=e6a6dc0d9c22 ***

net: implement netpoll for windows

Moves the network poller from net package into runtime.

benchmark                          old ns/op    new ns/op    delta
BenchmarkTCP4OneShot                  316386       287061    -9.27%
BenchmarkTCP4OneShot-2                339822       313424    -7.77%
BenchmarkTCP4OneShot-3                330057       306589    -7.11%
BenchmarkTCP4OneShotTimeout           341775       287061   -16.01%
BenchmarkTCP4OneShotTimeout-2         380835       295849   -22.32%
BenchmarkTCP4OneShotTimeout-3         398412       328070   -17.66%
BenchmarkTCP4Persistent                40622        33392   -17.80%
BenchmarkTCP4Persistent-2              44528        35736   -19.74%
BenchmarkTCP4Persistent-3              44919        36907   -17.84%
BenchmarkTCP4PersistentTimeout         45309        33588   -25.87%
BenchmarkTCP4PersistentTimeout-2       50289        38079   -24.28%
BenchmarkTCP4PersistentTimeout-3       51559        37103   -28.04%
BenchmarkTCP6OneShot                  361305       345645    -4.33%
BenchmarkTCP6OneShot-2                361305       331976    -8.12%
BenchmarkTCP6OneShot-3                376929       347598    -7.78%
BenchmarkTCP6OneShotTimeout           361305       322212   -10.82%
BenchmarkTCP6OneShotTimeout-2         378882       333928   -11.86%
BenchmarkTCP6OneShotTimeout-3         388647       335881   -13.58%
BenchmarkTCP6Persistent                47653        35345   -25.83%
BenchmarkTCP6Persistent-2              49215        35736   -27.39%
BenchmarkTCP6Persistent-3              38474        37493    -2.55%
BenchmarkTCP6PersistentTimeout         56637        34369   -39.32%
BenchmarkTCP6PersistentTimeout-2       42575        38079   -10.56%
BenchmarkTCP6PersistentTimeout-3       44137        37689   -14.61%

R=dvyukov
CC=golang-dev
https://codereview.appspot.com/8670044