Issue 9868044: code review 9868044: runtime: refactor gogo

Issue 9868044: code review 9868044: runtime: refactor gogo (Closed)

Can't Edit
Can't Publish+Mail
Start Review

Created:
12 years, 1 month ago by dvyukov

Modified:
12 years ago

Reviewers:
golang-dev, capnm, rsc, ality, iant

Visibility:
Public.

Description

runtime: refactor gogo Instead of 3 overlapping functions (gogo, gogocall, gogocallfn) introduce 2 non-overlapping functions (gogo, returnto). Required for preemptive scheduler, otherwise it would need to introduce yet another function -- gogo2 (same as gogo, but restores DX instead of AX).

Patch Set 1 #

Patch Set 2 : diff -r 63b01ec901a6 https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 3 : diff -r 63b01ec901a6 https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 4 : diff -r 63b01ec901a6 https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 5 : diff -r 63b01ec901a6 https://dvyukov%40google.com@code.google.com/p/go/ #

Total comments: 4

Patch Set 6 : diff -r e3f2b59f2404 https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 7 : diff -r e3f2b59f2404 https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 8 : diff -r baa90b763ecd https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 9 : diff -r baa90b763ecd https://dvyukov%40google.com@code.google.com/p/go/ #

Total comments: 6

Patch Set 10 : diff -r baa90b763ecd https://dvyukov%40google.com@code.google.com/p/go/ #

Total comments: 2

Created: 12 years, 1 month ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+106 lines, -138 lines)			Patch
M	src/pkg/runtime/asm_386.s	View	1 2 3 4 5 6 7	5 chunks	+20 lines, -42 lines	2 comments	Download
M	src/pkg/runtime/asm_amd64.s	View	1 2 3 4 5 6 7	4 chunks	+19 lines, -41 lines	0 comments	Download
M	src/pkg/runtime/asm_arm.s	View	1 2 3 4 5 6 7	5 chunks	+16 lines, -37 lines	0 comments	Download
M	src/pkg/runtime/mgc0.c	View	1 2 3 4 5 6	1 chunk	+1 line, -1 line	0 comments	Download
M	src/pkg/runtime/panic.c	View	1	1 chunk	+1 line, -1 line	0 comments	Download
M	src/pkg/runtime/proc.c	View	1 2 3 4 5 6 7 8 9	5 chunks	+35 lines, -4 lines	0 comments	Download
M	src/pkg/runtime/runtime.h	View	1 2 3 4 5 6 7	2 chunks	+3 lines, -4 lines	0 comments	Download
M	src/pkg/runtime/stack.c	View	1 2 3 4 5 6 7	3 chunks	+11 lines, -8 lines	0 comments	Download

Messages

Total messages: 23

Expand All Messages | Collapse All Messages

dvyukov

Hello golang-dev@googlegroups.com, I'd like you to review this change to https://dvyukov%40google.com@code.google.com/p/go/

12 years, 1 month ago (2013-05-30 13:17:23 UTC) #1

dvyukov

I absolutely have no idea what I am doing wrt arm. Can somebody please test ...

12 years, 1 month ago (2013-05-30 13:18:11 UTC) #2

iant

I think we need some good docs for gogo and returnto somewhere. https://codereview.appspot.com/9868044/diff/11001/src/pkg/runtime/asm_arm.s File src/pkg/runtime/asm_arm.s ...

12 years, 1 month ago (2013-05-31 22:35:52 UTC) #3

dvyukov

PTAL >I think we need some good docs for gogo and returnto somewhere. I am ...

12 years, 1 month ago (2013-06-01 19:59:18 UTC) #4

iant

On 2013/06/01 19:59:18, dvyukov wrote: > PTAL > > >I think we need some good ...

12 years, 1 month ago (2013-06-03 04:30:20 UTC) #5

dvyukov

On 2013/06/03 04:30:20, iant wrote: > On 2013/06/01 19:59:18, dvyukov wrote: > > PTAL > ...

12 years, 1 month ago (2013-06-03 13:59:33 UTC) #6

iant

https://codereview.appspot.com/9868044/diff/28001/src/pkg/runtime/proc.c File src/pkg/runtime/proc.c (right): https://codereview.appspot.com/9868044/diff/28001/src/pkg/runtime/proc.c#newcode30 src/pkg/runtime/proc.c:30: // AX holds return value from C functions and ...

12 years, 1 month ago (2013-06-03 14:58:56 UTC) #7

dvyukov

On 2013/05/30 13:18:11, dvyukov wrote: > I absolutely have no idea what I am doing ...

12 years, 1 month ago (2013-06-03 15:06:15 UTC) #8

dvyukov

12 years, 1 month ago (2013-06-03 15:09:27 UTC) #9

capnm

On 2013/06/03 15:06:15, dvyukov wrote: > Can somebody with ARM access please test this? Linux ...

12 years, 1 month ago (2013-06-03 15:46:13 UTC) #10

dvyukov

On 2013/06/03 15:46:13, capnm wrote: > On 2013/06/03 15:06:15, dvyukov wrote: > > Can somebody ...

12 years, 1 month ago (2013-06-03 15:54:32 UTC) #11

rsc

NOT LGTM I don't understand what is going on here. Why is the old API ...

12 years, 1 month ago (2013-06-03 20:33:48 UTC) #12

dvyukov

https://codereview.appspot.com/9868044/diff/29009/src/pkg/runtime/asm_386.s File src/pkg/runtime/asm_386.s (right): https://codereview.appspot.com/9868044/diff/29009/src/pkg/runtime/asm_386.s#newcode156 src/pkg/runtime/asm_386.s:156: // void returnto(Gobuf*, byte*) On 2013/06/03 20:33:49, rsc wrote: ...

12 years, 1 month ago (2013-06-04 10:31:00 UTC) #13

dvyukov

On 2013/06/03 20:33:48, rsc wrote: > NOT LGTM > > I don't understand what is ...

12 years, 1 month ago (2013-06-04 10:59:51 UTC) #14

On 2013/06/03 20:33:48, rsc wrote:
> NOT LGTM
> 
> I don't understand what is going on here.
> Why is the old API so bad it needs to be replaced wholesale?
> 
> The old API was carefully crafted to create an abstract interface that could
be
> implemented on both x86 and ARM, despite the different calling conventions.
> 
> It is fine to add a context argument to runtime.gogo.
> It sounds like maybe that's all you really need.

The early prototype of preemptive scheduler contained the following code in
schedule():

 	if(gp->sched.pc == (byte*)runtime·goexit)  // kickoff
 		runtime·gogocallfn(&gp->sched, gp->fnstart);
+	if(gp->preempt) {
+		if(gp->preempt == 1) {
+			gp->preempt = 0;
+			// preempted w/o new stack segment creation, just need to restore DX
+			runtime·gogo2(&gp->sched, gp->preemptcret);
+		} else {
+			gp->preempt = 0;
+			// preempted with new stack segment creation, need to restore DX and return
to lessstack
+			runtime·gogocall(&gp->sched, gp->preemptpc, gp->preemptcret);
+		}
+	}
 	runtime·gogo(&gp->sched, 0);

That's 4 (four!) different ways to switch to a goroutine.
Much more permutations of "and return to"/"just switch", "restore AX"/"do not
restore AX", "restore DX"/"do not restore DX", "use separate PC and
context"/"use Func" are possible in future.
On top of that, non-started goroutines looks different from started, so we have
special cases for that in gentraceback() and addstackroots().
On top of that, 2 cases of preempted goroutines (with new stack segment and w/o)
will require yet more special handling in gentraceback() and addstackroots().

All this stems form the non-orthogonal API. What we essentially need is: (1)
arrange to return to a PC when stack segment underflows w/o switching right now
(this is returnto()) and switch to a full goroutine context (PC, SP, AX and DX,
this is gogo()). This allows to:
- arrange to return to goexit() directly in newproc()
- arrange to return to lessstack() directly in newstack()
This is turn allows to:
- switch goroutines in schedule() with a single gogo() call
- remove special handling on non-started goroutines in gentraceback() and
addstackroots()
- do not add any special handling for preempted goroutines to gentraceback() and
addstackroots()
And I think it's just a better orthogonal API, which provides better foundation
for potential future changes.

I do not get it right for ARM at first try. But the idea is to push PC onto
stack in returnto(), then gogo() retrieves the PC from stack and restores it in
LR. I think it should work.

rsc

Okay, I see what you are getting at. But the naming is confusing, and the ...

12 years, 1 month ago (2013-06-05 14:39:08 UTC) #15

dvyukov

On Wed, Jun 5, 2013 at 6:39 PM, Russ Cox <rsc@golang.org> wrote: > Okay, I ...

12 years, 1 month ago (2013-06-05 14:56:36 UTC) #16

dvyukov

On Wed, Jun 5, 2013 at 6:56 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Wed, ...

12 years, 1 month ago (2013-06-05 15:00:13 UTC) #17

rsc

On Wed, Jun 5, 2013 at 11:00 AM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Wed, ...

12 years, 1 month ago (2013-06-05 15:09:02 UTC) #18

On Wed, Jun 5, 2013 at 11:00 AM, Dmitry Vyukov <dvyukov@google.com> wrote:

> On Wed, Jun 5, 2013 at 6:56 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> > On Wed, Jun 5, 2013 at 6:39 PM, Russ Cox <rsc@golang.org> wrote:
> >> Okay, I see what you are getting at. But the naming is confusing, and
> the
> >> new API isn't great either. Instead of tacking more arguments onto gogo,
> >> that stuff really belongs in the Gobuf
> >
> > Agree.
> >
> >
> >> , which represents the paused
> >> goroutine's state. What you're calling returnto is really the beginning
> of a
> >> call. I understand how you got to "return" but that's not really what it
> >> does: it's gogocall not gogoreturn. A better name would be gostartcall,
> so
> >> that the current gogocall = gostartcall + gogo. And if we do this, the
> whole
> >> concept of an unstarted goroutine should be removed completely.
> >
> > Agree.
> >
> >> Can you try something like CL 10036044 instead? It mostly works but some
> >> tests fail due to garbage collection problems that I don't understand.
> >
> > I will look into making your CL work.
>
> Shouldn't it be the other way around? Start call to goexit, then switch to
> fn?
>

Maybe it's the other way around in your CL (I didn't look closely), but the
way I would like to approach this is to think of


> 1547 newg->sched.pc = (byte*)runtime·goexit;
> 1548 newg->sched.g = newg;
>

as creating a goroutine that is about to run goexit.


> 1549 » runtime·gostartcallfn(&newg->sched, fn);
>

And then this pushes a call to fn ahead of whatever the goroutine was going
to do. So the net result is that the goroutine calls fn, and then when fn
returns, the goroutine calls goexit.


> This can have problems with GC, because it does not understand
> non-started functions yet. For the non-started goroutine fn has not
> yet decremented SP to form stack frame, but the GC assumes it does.
>

The stack walker that we had before all the new GC work understood that if
pc == f->entry then there are no locals, only arguments. I attempted to add
that to addframeroots (see the CL), but it is possible other fixes are
needed elsewhere.

Russ

dvyukov

On Wed, Jun 5, 2013 at 7:09 PM, Russ Cox <rsc@golang.org> wrote: > On Wed, ...

12 years, 1 month ago (2013-06-05 15:14:20 UTC) #19

On Wed, Jun 5, 2013 at 7:09 PM, Russ Cox <rsc@golang.org> wrote:
> On Wed, Jun 5, 2013 at 11:00 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>
>> On Wed, Jun 5, 2013 at 6:56 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> > On Wed, Jun 5, 2013 at 6:39 PM, Russ Cox <rsc@golang.org> wrote:
>> >> Okay, I see what you are getting at. But the naming is confusing, and
>> >> the
>> >> new API isn't great either. Instead of tacking more arguments onto
>> >> gogo,
>> >> that stuff really belongs in the Gobuf
>> >
>> > Agree.
>> >
>> >
>> >> , which represents the paused
>> >> goroutine's state. What you're calling returnto is really the beginning
>> >> of a
>> >> call. I understand how you got to "return" but that's not really what
>> >> it
>> >> does: it's gogocall not gogoreturn. A better name would be gostartcall,
>> >> so
>> >> that the current gogocall = gostartcall + gogo. And if we do this, the
>> >> whole
>> >> concept of an unstarted goroutine should be removed completely.
>> >
>> > Agree.
>> >
>> >> Can you try something like CL 10036044 instead? It mostly works but
>> >> some
>> >> tests fail due to garbage collection problems that I don't understand.
>> >
>> > I will look into making your CL work.
>>
>> Shouldn't it be the other way around? Start call to goexit, then switch to
>> fn?
>
>
> Maybe it's the other way around in your CL (I didn't look closely), but the
> way I would like to approach this is to think of
>
>>
>> 1547 newg->sched.pc = (byte*)runtime·goexit;
>> 1548 newg->sched.g = newg;
>
>
> as creating a goroutine that is about to run goexit.
>
>>
>> 1549 » runtime·gostartcallfn(&newg->sched, fn);
>
>
> And then this pushes a call to fn ahead of whatever the goroutine was going
> to do. So the net result is that the goroutine calls fn, and then when fn
> returns, the goroutine calls goexit.

OK


>> This can have problems with GC, because it does not understand
>> non-started functions yet. For the non-started goroutine fn has not
>> yet decremented SP to form stack frame, but the GC assumes it does.
>
>
> The stack walker that we had before all the new GC work understood that if
> pc == f->entry then there are no locals, only arguments. I attempted to add
> that to addframeroots (see the CL), but it is possible other fixes are
> needed elsewhere.

I will try to debug it.

rsc

I looked at this CL again after your question about setup being the other way ...

12 years, 1 month ago (2013-06-05 18:05:24 UTC) #20

I looked at this CL again after your question about setup being the other
way around, and now I understand the code and why it's called returnto.
With that understanding, I wanted to explain a little more why I prefer my
proposal.

In general, context switching APIs are not easy to design, but I've learned
a few lessons from others' mistakes and my own.

In early versions of Unix, the context switch switched the stack pointer
but not the program counter! Flipping just the stack pointer was completely
manageable to program against in hand-written assembly, where that kind of
thing is not uncommon. When C came along, however, things did not go a
smoothly. After a context restore inside a C function you had the program
counter of the function that invoked the context restore running in the
stack frame of the function that invoked the context save. If the old pc
was expecting to find any local variables on the stack after the call, it
wouldn't. Worse, each C function epilogue adjusts the stack pointer upward
to remove the frame allocated in the prologue. For the one function's
epilogue to execute a correct return from the other function's stack frame,
the two functions must have stack frames of equal size. Add a local
variable to one or the other and the kernel stops working, because the
context switch restore created a state expected by assembly programs but
unexpected in C programs. In essence, the context switch API had latent
assembly assumptions and broke when it had to be "ported" to work with C.
This mess is what prompted the famous "you are not expected to understand
this" comment (see swtch.com/unix for a scan). The lesson I learned from
taking the time to understand this is that context switching functions
should work with a description of an actual snapshot of the CPU. Save and
restore both the stack pointer and the program counter and any other live
registers, all together. In the Go runtime, gosave copies the current CPU's
execution state into the Gobuf, and gogo does the reverse.

When we started doing Go on ARM, we ran into places where the code had
latent x86 assumpsions, and we had to design more portable APIs to remove
those assumptions. After trying a handful of things that didn't work, we
realized that the APIs that did work were the ones that could be explained
in terms of hardware instructions. In retrospect, this is obvious: if the
contexts are valid CPU snapshots, the operations on them have to use valid
CPU snapshots as input and output, and that's what hardware instructions
do. So gogocall is a gogo (context restore) followed by the simulation of
one call instruction before letting the CPU run with the context.

If these routines need to be refactored, then the suggestion in my CL is to
have an operation gostartcall that follows these two rules. First, it takes
as input a valid CPU snapshot (a Gobuf describing, in practice, the
beginning of an execution of lessstack or goexit) and adjusts it to return
another valid CPU snapshot. Second, it can be explained in terms of
hardware instructions: it executes a single call instruction in that Gobuf
context and saves the resulting context back into the Gobuf.

In contrast, the returnto API does not follow either of these rules. The
Gobuf passed into returnto is not a working CPU snapshot: on the x86, the
input stack pointer and program counter do not correspond. And because the
input is not a well-formed execution context, returnto cannot be described
in terms of CPU execution. It is certainly possible to build a working
system using this primitive, as I'm sure you have, but because it breaks
the rules, I worry that it will turn out to be fragile for some unexpected
reason later. That's why I prefer gostartcall.

Thanks.
Russ

dvyukov

On Wed, Jun 5, 2013 at 10:05 PM, Russ Cox <rsc@golang.org> wrote: > I looked ...

12 years, 1 month ago (2013-06-06 16:03:19 UTC) #21

On Wed, Jun 5, 2013 at 10:05 PM, Russ Cox <rsc@golang.org> wrote:
> I looked at this CL again after your question about setup being the other
> way around, and now I understand the code and why it's called returnto. With
> that understanding, I wanted to explain a little more why I prefer my
> proposal.
>
> In general, context switching APIs are not easy to design, but I've learned
> a few lessons from others' mistakes and my own.
>
> In early versions of Unix, the context switch switched the stack pointer but
> not the program counter! Flipping just the stack pointer was completely
> manageable to program against in hand-written assembly, where that kind of
> thing is not uncommon. When C came along, however, things did not go a
> smoothly. After a context restore inside a C function you had the program
> counter of the function that invoked the context restore running in the
> stack frame of the function that invoked the context save. If the old pc was
> expecting to find any local variables on the stack after the call, it
> wouldn't. Worse, each C function epilogue adjusts the stack pointer upward
> to remove the frame allocated in the prologue. For the one function's
> epilogue to execute a correct return from the other function's stack frame,
> the two functions must have stack frames of equal size. Add a local variable
> to one or the other and the kernel stops working, because the context switch
> restore created a state expected by assembly programs but unexpected in C
> programs. In essence, the context switch API had latent assembly assumptions
> and broke when it had to be "ported" to work with C. This mess is what
> prompted the famous "you are not expected to understand this" comment (see
> swtch.com/unix for a scan). The lesson I learned from taking the time to
> understand this is that context switching functions should work with a
> description of an actual snapshot of the CPU. Save and restore both the
> stack pointer and the program counter and any other live registers, all
> together. In the Go runtime, gosave copies the current CPU's execution state
> into the Gobuf, and gogo does the reverse.
>
> When we started doing Go on ARM, we ran into places where the code had
> latent x86 assumpsions, and we had to design more portable APIs to remove
> those assumptions. After trying a handful of things that didn't work, we
> realized that the APIs that did work were the ones that could be explained
> in terms of hardware instructions. In retrospect, this is obvious: if the
> contexts are valid CPU snapshots, the operations on them have to use valid
> CPU snapshots as input and output, and that's what hardware instructions do.
> So gogocall is a gogo (context restore) followed by the simulation of one
> call instruction before letting the CPU run with the context.
>
> If these routines need to be refactored, then the suggestion in my CL is to
> have an operation gostartcall that follows these two rules. First, it takes
> as input a valid CPU snapshot (a Gobuf describing, in practice, the
> beginning of an execution of lessstack or goexit) and adjusts it to return
> another valid CPU snapshot. Second, it can be explained in terms of hardware
> instructions: it executes a single call instruction in that Gobuf context
> and saves the resulting context back into the Gobuf.
>
> In contrast, the returnto API does not follow either of these rules. The
> Gobuf passed into returnto is not a working CPU snapshot: on the x86, the
> input stack pointer and program counter do not correspond. And because the
> input is not a well-formed execution context, returnto cannot be described
> in terms of CPU execution. It is certainly possible to build a working
> system using this primitive, as I'm sure you have, but because it breaks the
> rules, I worry that it will turn out to be fragile for some unexpected
> reason later. That's why I prefer gostartcall.


Thanks for this insight! Now your API makes perfect sense.

My version does not have a deep idea behind, I just wanted to split
the startcall/returnto part from the actual switch (gogo).

With the small fix that I described in the CL, your version passes
all.bash. So please send it for review.

ality

Thanks for taking the time to explain the motivation. I enjoyed reading it. Anthony

12 years, 1 month ago (2013-06-06 21:33:03 UTC) #22

*** Abandoned ***

Expand All Messages | Collapse All Messages