|
|
Description9g: fix under-zeroing in clearfat
All three cases of clearfat were wrong on power64x.
The cases that handle 1032 bytes and up and 32 bytes and up
both use MOVDU (one directly generated in a loop and the other
via duffzero), which leaves the pointer register pointing at
the *last written* address. The generated code was not
accounting for this, so the byte fill loop was re-zeroing the
last zeroed dword, rather than the bytes following the last
zeroed dword. Fix this by simply adding an additional 8 byte
offset to the byte zeroing loop.
The case that handled under 32 bytes was also wrong. It
didn't update the pointer register at all, so the byte zeroing
loop was simply re-zeroing the beginning of region. Again,
the fix is to add an offset to the byte zeroing loop to
account for this.
Patch Set 1 #Patch Set 2 : diff -r 38a2a61c4a2ee2e8bd17296e415d6013011991b4 https://code.google.com/p/go #Patch Set 3 : diff -r 38a2a61c4a2ee2e8bd17296e415d6013011991b4 https://code.google.com/p/go #Patch Set 4 : diff -r 38a2a61c4a2ee2e8bd17296e415d6013011991b4 https://code.google.com/p/go #Patch Set 5 : diff -r 38a2a61c4a2ee2e8bd17296e415d6013011991b4 https://code.google.com/p/go #
MessagesTotal messages: 11
Hello rsc@golang.org, dave@cheney.net (cc: golang-codereviews@googlegroups.com), I'd like you to review this change to the dev.power64 branch of https://code.google.com/p/go
Sign in to reply to this message.
Some minor commentary nits. Your fix looks good, but I'm a bit wary of all the subtracting and adding of word widths, it looks fragile. Dave (phone) On 31 Oct 2014 08:36, <austin@google.com> wrote: > > Reviewers: rsc, dfc, > > Message: > Hello rsc@golang.org, dave@cheney.net (cc: > golang-codereviews@googlegroups.com), > > I'd like you to review this change to the dev.power64 branch of > https://code.google.com/p/go > > > Description: > 9g: fix under-zeroing in clearfat > > All three cases of clearfat were wrong on power64x. > > The cases that handle 1032 bytes and up and 32 bytes and up > both use MOVDU (one directly generated in a loop and the other > via duffzero), which leaves the pointer register pointing at > the *last written* address. The generated code was not > accounting for this, so the byte fill loop was re-zeroing the > last zeroed dword, rather than the bytes following the last > zeroed dword. Fix this by simply adding an additional 8 byte > offset to the byte zeroing loop. > > The case that handled under 32 bytes was also wrong. It > didn't update the pointer register at all, so the byte zeroing > loop was simply re-zeroing the beginning of region. Again, > the fix is to add an offset to the byte zeroing loop to > account for this. > > Please review this at https://codereview.appspot.com/168870043/ > > Affected files (+14, -7 lines): > M src/cmd/9g/ggen.c > M src/runtime/asm_power64x.s > > > Index: src/cmd/9g/ggen.c > =================================================================== > --- a/src/cmd/9g/ggen.c > +++ b/src/cmd/9g/ggen.c > @@ -900,7 +900,7 @@ ret: > void > clearfat(Node *nl) > { > - uint64 w, c, q, t; > + uint64 w, c, q, t, boff; > Node dst, end, r0, *f; > Prog *p, *pl; > > @@ -944,6 +944,8 @@ clearfat(Node *nl) > patch(gbranch(ABNE, T, 0), pl); > > regfree(&end); > + // The above loop leaves R3 on the last zeroed dword S/above// > + boff = 8; > } else if(q >= 4) { > p = gins(ASUB, N, &dst); > p->from.type = D_CONST; > @@ -953,17 +955,21 @@ clearfat(Node *nl) > afunclit(&p->to, f); > // 4 and 128 = magic constants: see ../../runtime/asm_power64x.s > p->to.offset = 4*(128-q); > - } else > - for(t = 0; t < q; t++) { > - p = gins(AMOVD, &r0, &dst); > - p->to.type = D_OREG; > - p->to.offset = 8*t; > + // duffzero leaves R3 on the last zeroed dword > + boff = 8; > + } else { > + for(t = 0; t < q; t++) { > + p = gins(AMOVD, &r0, &dst); > + p->to.type = D_OREG; > + p->to.offset = 8*t; > + } > + boff = 8*q; > } > > for(t = 0; t < c; t++) { > p = gins(AMOVB, &r0, &dst); > p->to.type = D_OREG; > - p->to.offset = t; > + p->to.offset = t+boff; > } > reg[REGRT1]--; > } > Index: src/runtime/asm_power64x.s > =================================================================== > --- a/src/runtime/asm_power64x.s > +++ b/src/runtime/asm_power64x.s > @@ -830,6 +830,7 @@ notfound: > // R0: always zero > // R3 (aka REGRT1): ptr to memory to be zeroed - 8 > // R3 is updated as a side effect. Drop the line, the following line is enough > +// On return, R3 points to the last zeroed dword. > TEXT runtime·duffzero(SB), NOSPLIT, $-8-0 > MOVDU R0, 8(R3) > MOVDU R0, 8(R3) > >
Sign in to reply to this message.
And is this something observable from Go code? Possible to add a test? On Oct 30, 2014 7:16 PM, "Dave Cheney" <dave@cheney.net> wrote: > Some minor commentary nits. > > Your fix looks good, but I'm a bit wary of all the subtracting and adding > of word widths, it looks fragile. > > Dave (phone) > > On 31 Oct 2014 08:36, <austin@google.com> wrote: > > > > Reviewers: rsc, dfc, > > > > Message: > > Hello rsc@golang.org, dave@cheney.net (cc: > > golang-codereviews@googlegroups.com), > > > > I'd like you to review this change to the dev.power64 branch of > > https://code.google.com/p/go > > > > > > Description: > > 9g: fix under-zeroing in clearfat > > > > All three cases of clearfat were wrong on power64x. > > > > The cases that handle 1032 bytes and up and 32 bytes and up > > both use MOVDU (one directly generated in a loop and the other > > via duffzero), which leaves the pointer register pointing at > > the *last written* address. The generated code was not > > accounting for this, so the byte fill loop was re-zeroing the > > last zeroed dword, rather than the bytes following the last > > zeroed dword. Fix this by simply adding an additional 8 byte > > offset to the byte zeroing loop. > > > > The case that handled under 32 bytes was also wrong. It > > didn't update the pointer register at all, so the byte zeroing > > loop was simply re-zeroing the beginning of region. Again, > > the fix is to add an offset to the byte zeroing loop to > > account for this. > > > > Please review this at https://codereview.appspot.com/168870043/ > > > > Affected files (+14, -7 lines): > > M src/cmd/9g/ggen.c > > M src/runtime/asm_power64x.s > > > > > > Index: src/cmd/9g/ggen.c > > =================================================================== > > --- a/src/cmd/9g/ggen.c > > +++ b/src/cmd/9g/ggen.c > > @@ -900,7 +900,7 @@ ret: > > void > > clearfat(Node *nl) > > { > > - uint64 w, c, q, t; > > + uint64 w, c, q, t, boff; > > Node dst, end, r0, *f; > > Prog *p, *pl; > > > > @@ -944,6 +944,8 @@ clearfat(Node *nl) > > patch(gbranch(ABNE, T, 0), pl); > > > > regfree(&end); > > + // The above loop leaves R3 on the last zeroed dword > > S/above// > > > + boff = 8; > > } else if(q >= 4) { > > p = gins(ASUB, N, &dst); > > p->from.type = D_CONST; > > @@ -953,17 +955,21 @@ clearfat(Node *nl) > > afunclit(&p->to, f); > > // 4 and 128 = magic constants: see > ../../runtime/asm_power64x.s > > p->to.offset = 4*(128-q); > > - } else > > - for(t = 0; t < q; t++) { > > - p = gins(AMOVD, &r0, &dst); > > - p->to.type = D_OREG; > > - p->to.offset = 8*t; > > + // duffzero leaves R3 on the last zeroed dword > > + boff = 8; > > + } else { > > + for(t = 0; t < q; t++) { > > + p = gins(AMOVD, &r0, &dst); > > + p->to.type = D_OREG; > > + p->to.offset = 8*t; > > + } > > + boff = 8*q; > > } > > > > for(t = 0; t < c; t++) { > > p = gins(AMOVB, &r0, &dst); > > p->to.type = D_OREG; > > - p->to.offset = t; > > + p->to.offset = t+boff; > > } > > reg[REGRT1]--; > > } > > Index: src/runtime/asm_power64x.s > > =================================================================== > > --- a/src/runtime/asm_power64x.s > > +++ b/src/runtime/asm_power64x.s > > @@ -830,6 +830,7 @@ notfound: > > // R0: always zero > > // R3 (aka REGRT1): ptr to memory to be zeroed - 8 > > // R3 is updated as a side effect. > > Drop the line, the following line is enough > > +// On return, R3 points to the last zeroed dword. > > TEXT runtime·duffzero(SB), NOSPLIT, $-8-0 > > MOVDU R0, 8(R3) > > MOVDU R0, 8(R3) > > > > > > -- > You received this message because you are subscribed to the Google Groups > "golang-codereviews" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to golang-codereviews+unsubscribe@googlegroups.com. > For more options, visit https://groups.google.com/d/optout. >
Sign in to reply to this message.
I've been able to come up with one failing test case http://paste.ubuntu.com/8753330/ If this sounds sensible I can expand this to be a real test in the runtime package. On Fri, Oct 31, 2014 at 9:24 AM, Brad Fitzpatrick <bradfitz@golang.org> wrote: > And is this something observable from Go code? Possible to add a test? > > On Oct 30, 2014 7:16 PM, "Dave Cheney" <dave@cheney.net> wrote: >> >> Some minor commentary nits. >> >> Your fix looks good, but I'm a bit wary of all the subtracting and adding >> of word widths, it looks fragile. >> >> Dave (phone) >> >> On 31 Oct 2014 08:36, <austin@google.com> wrote: >> > >> > Reviewers: rsc, dfc, >> > >> > Message: >> > Hello rsc@golang.org, dave@cheney.net (cc: >> > golang-codereviews@googlegroups.com), >> > >> > I'd like you to review this change to the dev.power64 branch of >> > https://code.google.com/p/go >> > >> > >> > Description: >> > 9g: fix under-zeroing in clearfat >> > >> > All three cases of clearfat were wrong on power64x. >> > >> > The cases that handle 1032 bytes and up and 32 bytes and up >> > both use MOVDU (one directly generated in a loop and the other >> > via duffzero), which leaves the pointer register pointing at >> > the *last written* address. The generated code was not >> > accounting for this, so the byte fill loop was re-zeroing the >> > last zeroed dword, rather than the bytes following the last >> > zeroed dword. Fix this by simply adding an additional 8 byte >> > offset to the byte zeroing loop. >> > >> > The case that handled under 32 bytes was also wrong. It >> > didn't update the pointer register at all, so the byte zeroing >> > loop was simply re-zeroing the beginning of region. Again, >> > the fix is to add an offset to the byte zeroing loop to >> > account for this. >> > >> > Please review this at https://codereview.appspot.com/168870043/ >> > >> > Affected files (+14, -7 lines): >> > M src/cmd/9g/ggen.c >> > M src/runtime/asm_power64x.s >> > >> > >> > Index: src/cmd/9g/ggen.c >> > =================================================================== >> > --- a/src/cmd/9g/ggen.c >> > +++ b/src/cmd/9g/ggen.c >> > @@ -900,7 +900,7 @@ ret: >> > void >> > clearfat(Node *nl) >> > { >> > - uint64 w, c, q, t; >> > + uint64 w, c, q, t, boff; >> > Node dst, end, r0, *f; >> > Prog *p, *pl; >> > >> > @@ -944,6 +944,8 @@ clearfat(Node *nl) >> > patch(gbranch(ABNE, T, 0), pl); >> > >> > regfree(&end); >> > + // The above loop leaves R3 on the last zeroed dword >> >> S/above// >> >> > + boff = 8; >> > } else if(q >= 4) { >> > p = gins(ASUB, N, &dst); >> > p->from.type = D_CONST; >> > @@ -953,17 +955,21 @@ clearfat(Node *nl) >> > afunclit(&p->to, f); >> > // 4 and 128 = magic constants: see >> > ../../runtime/asm_power64x.s >> > p->to.offset = 4*(128-q); >> > - } else >> > - for(t = 0; t < q; t++) { >> > - p = gins(AMOVD, &r0, &dst); >> > - p->to.type = D_OREG; >> > - p->to.offset = 8*t; >> > + // duffzero leaves R3 on the last zeroed dword >> > + boff = 8; >> > + } else { >> > + for(t = 0; t < q; t++) { >> > + p = gins(AMOVD, &r0, &dst); >> > + p->to.type = D_OREG; >> > + p->to.offset = 8*t; >> > + } >> > + boff = 8*q; >> > } >> > >> > for(t = 0; t < c; t++) { >> > p = gins(AMOVB, &r0, &dst); >> > p->to.type = D_OREG; >> > - p->to.offset = t; >> > + p->to.offset = t+boff; >> > } >> > reg[REGRT1]--; >> > } >> > Index: src/runtime/asm_power64x.s >> > =================================================================== >> > --- a/src/runtime/asm_power64x.s >> > +++ b/src/runtime/asm_power64x.s >> > @@ -830,6 +830,7 @@ notfound: >> > // R0: always zero >> > // R3 (aka REGRT1): ptr to memory to be zeroed - 8 >> > // R3 is updated as a side effect. >> >> Drop the line, the following line is enough >> > +// On return, R3 points to the last zeroed dword. >> > TEXT runtime·duffzero(SB), NOSPLIT, $-8-0 >> > MOVDU R0, 8(R3) >> > MOVDU R0, 8(R3) >> > >> > >> >> -- >> You received this message because you are subscribed to the Google Groups >> "golang-codereviews" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to golang-codereviews+unsubscribe@googlegroups.com. >> For more options, visit https://groups.google.com/d/optout.
Sign in to reply to this message.
LGTM. https://codereview.appspot.com/162570043 contains the beginnings of a test which failed prior to this CL and now passes.
Sign in to reply to this message.
On Thu, Oct 30, 2014 at 6:16 PM, Dave Cheney <dave@cheney.net> wrote: > Some minor commentary nits. > > Your fix looks good, but I'm a bit wary of all the subtracting and adding > of word widths, it looks fragile. > This code is already keenly aware of low level details like this, so I think it would merely obscure it if I tried to be more abstract for these changes, but I'm open to suggestions. > Dave (phone) > > On 31 Oct 2014 08:36, <austin@google.com> wrote: > > > > Reviewers: rsc, dfc, > > > > Message: > > Hello rsc@golang.org, dave@cheney.net (cc: > > golang-codereviews@googlegroups.com), > > > > I'd like you to review this change to the dev.power64 branch of > > https://code.google.com/p/go > > > > > > Description: > > 9g: fix under-zeroing in clearfat > > > > All three cases of clearfat were wrong on power64x. > > > > The cases that handle 1032 bytes and up and 32 bytes and up > > both use MOVDU (one directly generated in a loop and the other > > via duffzero), which leaves the pointer register pointing at > > the *last written* address. The generated code was not > > accounting for this, so the byte fill loop was re-zeroing the > > last zeroed dword, rather than the bytes following the last > > zeroed dword. Fix this by simply adding an additional 8 byte > > offset to the byte zeroing loop. > > > > The case that handled under 32 bytes was also wrong. It > > didn't update the pointer register at all, so the byte zeroing > > loop was simply re-zeroing the beginning of region. Again, > > the fix is to add an offset to the byte zeroing loop to > > account for this. > > > > Please review this at https://codereview.appspot.com/168870043/ > > > > Affected files (+14, -7 lines): > > M src/cmd/9g/ggen.c > > M src/runtime/asm_power64x.s > > > > > > Index: src/cmd/9g/ggen.c > > =================================================================== > > --- a/src/cmd/9g/ggen.c > > +++ b/src/cmd/9g/ggen.c > > @@ -900,7 +900,7 @@ ret: > > void > > clearfat(Node *nl) > > { > > - uint64 w, c, q, t; > > + uint64 w, c, q, t, boff; > > Node dst, end, r0, *f; > > Prog *p, *pl; > > > > @@ -944,6 +944,8 @@ clearfat(Node *nl) > > patch(gbranch(ABNE, T, 0), pl); > > > > regfree(&end); > > + // The above loop leaves R3 on the last zeroed dword > > S/above// > Done. > > + boff = 8; > > } else if(q >= 4) { > > p = gins(ASUB, N, &dst); > > p->from.type = D_CONST; > > @@ -953,17 +955,21 @@ clearfat(Node *nl) > > afunclit(&p->to, f); > > // 4 and 128 = magic constants: see > ../../runtime/asm_power64x.s > > p->to.offset = 4*(128-q); > > - } else > > - for(t = 0; t < q; t++) { > > - p = gins(AMOVD, &r0, &dst); > > - p->to.type = D_OREG; > > - p->to.offset = 8*t; > > + // duffzero leaves R3 on the last zeroed dword > > + boff = 8; > > + } else { > > + for(t = 0; t < q; t++) { > > + p = gins(AMOVD, &r0, &dst); > > + p->to.type = D_OREG; > > + p->to.offset = 8*t; > > + } > > + boff = 8*q; > > } > > > > for(t = 0; t < c; t++) { > > p = gins(AMOVB, &r0, &dst); > > p->to.type = D_OREG; > > - p->to.offset = t; > > + p->to.offset = t+boff; > > } > > reg[REGRT1]--; > > } > > Index: src/runtime/asm_power64x.s > > =================================================================== > > --- a/src/runtime/asm_power64x.s > > +++ b/src/runtime/asm_power64x.s > > @@ -830,6 +830,7 @@ notfound: > > // R0: always zero > > // R3 (aka REGRT1): ptr to memory to be zeroed - 8 > > // R3 is updated as a side effect. > > Drop the line, the following line is enough > Done. > > +// On return, R3 points to the last zeroed dword. > > TEXT runtime·duffzero(SB), NOSPLIT, $-8-0 > > MOVDU R0, 8(R3) > > MOVDU R0, 8(R3) > > > > >
Sign in to reply to this message.
My initial reaction was an overreaction. I agree with Brad that adding a test is probably sufficient coverage for this bug. On Sat, Nov 1, 2014 at 12:39 AM, Austin Clements <austin@google.com> wrote: > On Thu, Oct 30, 2014 at 6:16 PM, Dave Cheney <dave@cheney.net> wrote: >> >> Some minor commentary nits. >> >> Your fix looks good, but I'm a bit wary of all the subtracting and adding >> of word widths, it looks fragile. > > This code is already keenly aware of low level details like this, so I think > it would merely obscure it if I tried to be more abstract for these changes, > but I'm open to suggestions. >> >> Dave (phone) >> >> On 31 Oct 2014 08:36, <austin@google.com> wrote: >> > >> > Reviewers: rsc, dfc, >> > >> > Message: >> > Hello rsc@golang.org, dave@cheney.net (cc: >> > golang-codereviews@googlegroups.com), >> > >> > I'd like you to review this change to the dev.power64 branch of >> > https://code.google.com/p/go >> > >> > >> > Description: >> > 9g: fix under-zeroing in clearfat >> > >> > All three cases of clearfat were wrong on power64x. >> > >> > The cases that handle 1032 bytes and up and 32 bytes and up >> > both use MOVDU (one directly generated in a loop and the other >> > via duffzero), which leaves the pointer register pointing at >> > the *last written* address. The generated code was not >> > accounting for this, so the byte fill loop was re-zeroing the >> > last zeroed dword, rather than the bytes following the last >> > zeroed dword. Fix this by simply adding an additional 8 byte >> > offset to the byte zeroing loop. >> > >> > The case that handled under 32 bytes was also wrong. It >> > didn't update the pointer register at all, so the byte zeroing >> > loop was simply re-zeroing the beginning of region. Again, >> > the fix is to add an offset to the byte zeroing loop to >> > account for this. >> > >> > Please review this at https://codereview.appspot.com/168870043/ >> > >> > Affected files (+14, -7 lines): >> > M src/cmd/9g/ggen.c >> > M src/runtime/asm_power64x.s >> > >> > >> > Index: src/cmd/9g/ggen.c >> > =================================================================== >> > --- a/src/cmd/9g/ggen.c >> > +++ b/src/cmd/9g/ggen.c >> > @@ -900,7 +900,7 @@ ret: >> > void >> > clearfat(Node *nl) >> > { >> > - uint64 w, c, q, t; >> > + uint64 w, c, q, t, boff; >> > Node dst, end, r0, *f; >> > Prog *p, *pl; >> > >> > @@ -944,6 +944,8 @@ clearfat(Node *nl) >> > patch(gbranch(ABNE, T, 0), pl); >> > >> > regfree(&end); >> > + // The above loop leaves R3 on the last zeroed dword >> >> S/above// > > Done. >> >> > + boff = 8; >> > } else if(q >= 4) { >> > p = gins(ASUB, N, &dst); >> > p->from.type = D_CONST; >> > @@ -953,17 +955,21 @@ clearfat(Node *nl) >> > afunclit(&p->to, f); >> > // 4 and 128 = magic constants: see >> > ../../runtime/asm_power64x.s >> > p->to.offset = 4*(128-q); >> > - } else >> > - for(t = 0; t < q; t++) { >> > - p = gins(AMOVD, &r0, &dst); >> > - p->to.type = D_OREG; >> > - p->to.offset = 8*t; >> > + // duffzero leaves R3 on the last zeroed dword >> > + boff = 8; >> > + } else { >> > + for(t = 0; t < q; t++) { >> > + p = gins(AMOVD, &r0, &dst); >> > + p->to.type = D_OREG; >> > + p->to.offset = 8*t; >> > + } >> > + boff = 8*q; >> > } >> > >> > for(t = 0; t < c; t++) { >> > p = gins(AMOVB, &r0, &dst); >> > p->to.type = D_OREG; >> > - p->to.offset = t; >> > + p->to.offset = t+boff; >> > } >> > reg[REGRT1]--; >> > } >> > Index: src/runtime/asm_power64x.s >> > =================================================================== >> > --- a/src/runtime/asm_power64x.s >> > +++ b/src/runtime/asm_power64x.s >> > @@ -830,6 +830,7 @@ notfound: >> > // R0: always zero >> > // R3 (aka REGRT1): ptr to memory to be zeroed - 8 >> > // R3 is updated as a side effect. >> >> Drop the line, the following line is enough > > Done. >> >> > +// On return, R3 points to the last zeroed dword. >> > TEXT runtime·duffzero(SB), NOSPLIT, $-8-0 >> > MOVDU R0, 8(R3) >> > MOVDU R0, 8(R3) >> > >> >
Sign in to reply to this message.
Hello rsc@golang.org, dave@cheney.net, bradfitz@golang.org (cc: golang-codereviews@googlegroups.com), Please take another look.
Sign in to reply to this message.
I took this test and sort of ran with it and combined it with the ideas from test/fixedbugs/bug449.go (which is what led me to the problem in the first place). The test in the updated CL uses stack poisoning instead of depending on garbage on the heap, which makes it fairly precise, and the only thing it hard-codes about the interesting cases of clearfat is an upper bound on interesting sizes. On Thu, Oct 30, 2014 at 8:42 PM, <dave@cheney.net> wrote: > LGTM. > > https://codereview.appspot.com/162570043 > > contains the beginnings of a test which failed prior to this CL and now > passes. > > https://codereview.appspot.com/168870043/ >
Sign in to reply to this message.
LGTM for the test
Sign in to reply to this message.
*** Submitted as https://code.google.com/p/go/source/detail?r=1e9dfa88c1e2 *** [dev.power64] 9g: fix under-zeroing in clearfat All three cases of clearfat were wrong on power64x. The cases that handle 1032 bytes and up and 32 bytes and up both use MOVDU (one directly generated in a loop and the other via duffzero), which leaves the pointer register pointing at the *last written* address. The generated code was not accounting for this, so the byte fill loop was re-zeroing the last zeroed dword, rather than the bytes following the last zeroed dword. Fix this by simply adding an additional 8 byte offset to the byte zeroing loop. The case that handled under 32 bytes was also wrong. It didn't update the pointer register at all, so the byte zeroing loop was simply re-zeroing the beginning of region. Again, the fix is to add an offset to the byte zeroing loop to account for this. LGTM=dave, bradfitz R=rsc, dave, bradfitz CC=golang-codereviews https://codereview.appspot.com/168870043
Sign in to reply to this message.
|