Created: 13 years, 10 months ago by hector
Modified: 13 years, 10 months ago
Reviewers:
CC: rsc, dvyukov, brainman, jp, golang-dev
Visibility: Public.
Description: runtime: eliminate handle churn when churning channels on Windows
The Windows implementation of the net package churns through a couple of channels for every read/write operation. This translates into a lot of time spent in the kernel creating and deleting event objects.
The Windows implementation of the net package churns through a couple of channels for every read/write operation. This translates into a lot of time spent in the kernel creating and deleting event objects.
Patch Set 1 #
Patch Set 2 : diff -r 546f21eebee8 https://go.googlecode.com/hg/ #
Total comments: 4
Patch Set 3 : diff -r 546f21eebee8 https://go.googlecode.com/hg/ #
Total comments: 10
Patch Set 4 : diff -r 546f21eebee8 https://go.googlecode.com/hg/ #
Total comments: 6
Patch Set 5 : diff -r 2302c9faa3ff https://go.googlecode.com/hg/ #
Total comments: 2
Messages
Total messages: 18
Hello rsc@golang.org, dvyukov@google.com, alex.brainman@gmail.com (cc: golang-dev@googlegroups.com),

I'd like you to review this change to https://go.googlecode.com/hg/
Nice.

http://codereview.appspot.com/4997044/diff/9001/src/pkg/runtime/windows/thread.c
File src/pkg/runtime/windows/thread.c (right):

http://codereview.appspot.com/4997044/diff/9001/src/pkg/runtime/windows/threa...
src/pkg/runtime/windows/thread.c:133: M *m;
rename to avoid confusion with global m

http://codereview.appspot.com/4997044/diff/9001/src/pkg/runtime/windows/threa...
src/pkg/runtime/windows/thread.c:196: thandle = runtime·stdcall(runtime·CreateThread, 6, (uintptr)0, (uintptr)0, runtime·tstart_stdcall, m, (uintptr)4, (uintptr)0);
comment CREATE_SUSPENDED or add it to defs.h
http://codereview.appspot.com/4997044/diff/9001/src/pkg/runtime/windows/thread.c
File src/pkg/runtime/windows/thread.c (right):

http://codereview.appspot.com/4997044/diff/9001/src/pkg/runtime/windows/threa...
src/pkg/runtime/windows/thread.c:133: M *m;
On 2011/09/11 07:23:27, jp wrote:
> rename to avoid confusion with global m

should be ok, look at proc.c for precedent.

http://codereview.appspot.com/4997044/diff/9001/src/pkg/runtime/windows/threa...
src/pkg/runtime/windows/thread.c:196: thandle = runtime·stdcall(runtime·CreateThread, 6, (uintptr)0, (uintptr)0, runtime·tstart_stdcall, m, (uintptr)4, (uintptr)0);
On 2011/09/11 07:23:27, jp wrote:
> comment CREATE_SUSPENDED or add it to defs.h

Done.
This program doesn't terminate with this change.

package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

var (
	lock sync.Mutex
	m    = make(map[int]int)
)

func f(i int) {
	for {
		lock.Lock()
		m[i]++
		lock.Unlock()
	}
}

func main() {
	runtime.GOMAXPROCS(2)
	for i := 0; i < 10; i++ {
		go f(i)
	}
	time.Sleep(1e10)
	lock.Lock()
	sum := 0
	for i := 0; i < 10; i++ {
		sum += m[i]
	}
	for i := 0; i < 10; i++ {
		fmt.Println(i, float64(m[i])*100/float64(sum))
	}
}

I'll investigate using a doubly linked list to make the lock more fair.
http://codereview.appspot.com/4997044/diff/12001/src/pkg/runtime/windows/thre...
File src/pkg/runtime/windows/thread.c (right):

http://codereview.appspot.com/4997044/diff/12001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:126: while(m->waitm != m)
this has to be load-acquire, because it acquires the lock

http://codereview.appspot.com/4997044/diff/12001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:140: if(m = runtime·atomicloadp(&l->waitm))
Unbounded active spinning is bad for performance, especially on single-core machines. Here you basically say: if l->waitm is not yet filled, eat my whole quantum in this loop, and only then switch to that other thread to finally fill it. Add runtime·procyield() into the loop + episodic runtime·osyield().

http://codereview.appspot.com/4997044/diff/12001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:144: m->waitm = m;
this has to be store-release, because it releases the lock

http://codereview.appspot.com/4997044/diff/12001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:145: runtime·stdcall(runtime·NtAlertThread, 1, m->hthread);
If NtAlertThread(T) is called before NtDelayExecution(T), NtDelayExecution(T) will return immediately, right? It's not clear from the docs.

http://codereview.appspot.com/4997044/diff/12001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:198: m, (uintptr)4/*CREATE_SUSPENDED*/, nil);
What is the reason to create a suspended thread and then resume it?
On 2011/09/11 12:03:52, hector wrote:
> This program doesn't terminate with this change.
> I'll investigate using a doubly linked list to make the lock more fair.

I think the event in the current version uses FIFO wakeup. You don't need a doubly linked list (plus it's problematic to implement in a lock-free manner); you need a singly linked FIFO queue. However, I am not sure whether it's important to make this program work; it seems to be working by accident.
PTAL. I looked into using a FIFO queue, but the implementation I looked at (Michael and Scott) requires a dummy node, which implies that it depends on malloc.

http://codereview.appspot.com/4997044/diff/12001/src/pkg/runtime/windows/thre...
File src/pkg/runtime/windows/thread.c (right):

http://codereview.appspot.com/4997044/diff/12001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:126: while(m->waitm != m)
On 2011/09/11 13:00:05, dvyukov wrote:
> this has to be load-acquire, because it acquires the lock

Done.

http://codereview.appspot.com/4997044/diff/12001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:140: if(m = runtime·atomicloadp(&l->waitm))
On 2011/09/11 13:00:05, dvyukov wrote:
> Unbounded active spinning is bad for performance. Especially on single core
> machines. Here you basically say, if l->waitm is not yet filled, eat whole my
> quantum in this loop, and only then switch to that other thread to finally fill it.
> Add runtime.procyield() into the loop + episodic runtime.osyield().

Done.

http://codereview.appspot.com/4997044/diff/12001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:144: m->waitm = m;
On 2011/09/11 13:00:05, dvyukov wrote:
> this has to be store-release, because it releases the lock

Done.

http://codereview.appspot.com/4997044/diff/12001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:145: runtime·stdcall(runtime·NtAlertThread, 1, m->hthread);
On 2011/09/11 13:00:05, dvyukov wrote:
> If NtAlertThread(T) is called before NtDelayExecution(T), NtDelayExecution(T)
> will return immediately, right? It's not clear from docs.

I just tested this and it turns out it doesn't. I will go back to using event objects, but only one per M.

http://codereview.appspot.com/4997044/diff/12001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:198: m, (uintptr)4/*CREATE_SUSPENDED*/, nil);
On 2011/09/11 13:00:05, dvyukov wrote:
> What is reason to create a suspended thread and then to resume it?

So there isn't a race to set m->hthread = thandle.
I think the spinning in unlock() when l->waitm == nil is unnecessary - it should be enough to cas it to -1, and let the thread in lock() pick this up and run with it. Am I off base here?
I agree with Hector's idea about changing l.waitm from nil to LOCK_HELD as a quick exit. That should eliminate the spinning nicely.

http://codereview.appspot.com/4997044/diff/15001/src/pkg/runtime/windows/thre...
File src/pkg/runtime/windows/thread.c (right):

http://codereview.appspot.com/4997044/diff/15001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:158: // someone else has it; wait
This code deserves a bit more explanation. I suggest:

Insert above this function:

#define LOCK_HELD ((M*)-1)

Rename M.waitm to M.nextwaitm. This becomes:

// Someone else has it.
// l->waitm points to a linked list of M's waiting for this lock,
// chained through m->nextwaitm.
// To pass the lock to this m, another M will set m->waitm = LOCK_HELD
// and signal m->event.

// Queue.
for(;;) {
	...
}

// Wait.
while(runtime...)

http://codereview.appspot.com/4997044/diff/15001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:172: M *m;
Please call this M *mp; There is a global m, and it is easier to read this code if you understand that it is not referring to the global m.

http://codereview.appspot.com/4997044/diff/15001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:173: uint32 i = 0, spin = 0;
initializations separate from declaration please. move spin = 0 down before if(proccount > 1) and i = 0 down before for(;;).

http://codereview.appspot.com/4997044/diff/15001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:175: if(runtime·xadd(&l->key, -1) == 0)
Same comments about explanation. The comments I'd want to see as a reader are:

// Other M's are waiting for the lock. Wake one.
if(proccount == 0)
	...

// Wait for an M to appear on the waiting list and dequeue it.
for(;;) {
	...
}

// Wake that M.
runtime.atomicstorep(&m->nextwaitm, LOCK_HELD);
runtime.stdcall(runtime.SetEvent, 1, m->event);

http://codereview.appspot.com/4997044/diff/15001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:185: while(m = runtime·atomicloadp(&l->waitm))
Please write while((m = ...) != nil) just so that it's clear that's not an ==.

http://codereview.appspot.com/4997044/diff/15001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:188: if(++i % (spin + 1) > 0)
Easier to read:

if(i++ < spin)
	runtime.procyield(ACTIVE_SPIN_CNT);
else {
	i = 0;
	runtime.osyield();
}
Hello rsc@golang.org, dvyukov@google.com, alex.brainman@gmail.com, jp@webmaster.ms (cc: golang-dev@googlegroups.com),

Please take another look.
http://codereview.appspot.com/4997044/diff/27001/src/pkg/runtime/windows/thre...
File src/pkg/runtime/windows/thread.c (right):

http://codereview.appspot.com/4997044/diff/27001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:132: m->nextwaitm = runtime·atomicloadp(&l->waitm);
Does this need to be atomicstorep(&m->nextwaitm, l->waitm)?
On 2011/09/13 11:06:12, hector wrote:
> http://codereview.appspot.com/4997044/diff/27001/src/pkg/runtime/windows/thre...
> File src/pkg/runtime/windows/thread.c (right):

Sorry Hector, I don't comment on your code, because I always miss something with those fiddly locks.

On the other hand, I will try to change the network code to reuse channels like the unix code does. I didn't do it at the start to keep the code simple. But it is probably worth it, even if only to avoid allocating memory on every I/O.

Alex
No problem, and perfectly understood, Alex. I agree it is a tricky topic, these lock-free concurrent algorithms. I think Dmitry knows a lot about the issues, so I hope he can confirm the final version of the code.

As for changing the net code to reuse channels, sounds great. I'm happy for the time being; with this change the CPU load is slightly lower under high throughput.
http://codereview.appspot.com/4997044/diff/27001/src/pkg/runtime/windows/thre...
File src/pkg/runtime/windows/thread.c (right):

http://codereview.appspot.com/4997044/diff/27001/src/pkg/runtime/windows/thre...
src/pkg/runtime/windows/thread.c:132: m->nextwaitm = runtime·atomicloadp(&l->waitm);
On 2011/09/13 11:06:12, hector wrote:
> Does this need to be atomicstorep(&m->nextwaitm, l->waitm)?

To answer my own question, it doesn't, because the cas acts as a memory barrier.
On 2011/09/13 13:22:24, hector wrote:
> No problem and perfectly understood Alex. I agree it is a tricky topic,
> these lock-free concurrent algorithms. I think Dmitry knows a lot about the
> issues, so I hope he can confirm the final version of the code.

Hi Hector,

I see no correctness issues at first glance. However, you know, you're never sure with such things. I do not have time for a really close examination right now. Ideally, one verifies such things with the Relacy Race Detector; I can't do better than that. It requires time, though.
Correct me if I'm wrong, but as far as I'm aware there is no more work to be done on this. Can someone please approve the change? Thanks.
LGTM
*** Submitted as http://code.google.com/p/go/source/detail?r=af0ac80bbb92 ***

runtime: eliminate handle churn when churning channels on Windows

The Windows implementation of the net package churns through a couple of channels for every read/write operation. This translates into a lot of time spent in the kernel creating and deleting event objects.

R=rsc, dvyukov, alex.brainman, jp
CC=golang-dev
http://codereview.appspot.com/4997044

Committer: Russ Cox <rsc@golang.org>