Code review - Issue 5987063: code review 5987063: cmd/5l: use GOARM=7 to gate the VFPv3-only "vmov (imm)"...https://codereview.appspot.com/2012-08-06T02:29:44+00:00rietveld
Message from unknown
2012-04-08T12:07:55+00:00minux1urn:md5:7e9e79a0edbd8e8f10e19c8a5f8f1965
Message from unknown
2012-04-08T12:08:11+00:00minux1urn:md5:342d779db29df759a43dee6adbccdd9d
Message from unknown
2012-04-08T12:20:01+00:00minux1urn:md5:79fd7474d4f990988ca2b1e65fa839df
Message from unknown
2012-04-08T12:23:50+00:00minux1urn:md5:52d5d4bbcdeac7969c8db43397122faa
Message from unknown
2012-04-17T07:50:49+00:00minux1urn:md5:c64ea7805711be976ae5cc716bd9a36f
Message from unknown
2012-06-02T16:29:45+00:00minux1urn:md5:04215dc5be6e04b086d41cea0063fca2
Message from unknown
2012-06-13T15:19:25+00:00minux1urn:md5:3c5bffa23a8591d3471a13021ad0451a
Message from minux.ma@gmail.com
2012-06-13T15:20:02+00:00minux1urn:md5:2fb5cd69952a0e4669bef8f3b4a09d91
Hello dave@cheney.net (cc: golang-dev@googlegroups.com),
I'd like you to review this change to
https://code.google.com/p/go/
Message from rsc@golang.org
2012-06-13T15:33:56+00:00rscurn:md5:903d3f38916fc0be9f096fc06dfb8ffc
I would rather stop using the instruction altogether than introduce a
new GOARM setting.
Russ
Message from unknown
2012-07-03T18:06:07+00:00minux1urn:md5:d481f953f1868022c7a9f601e950920b
Message from unknown
2012-07-05T19:37:10+00:00minux1urn:md5:4355dc5a57e2986f8b6b20e4d6274cfa
Message from dave@cheney.net
2012-07-10T21:54:49+00:00dfcurn:md5:28ea2a52558265c9c0a3c085d3e84a8e
On a pandaboard (ARMv7) the performance hit for applying this patch is quite high for some math benchmarks.
benchmark old ns/op new ns/op delta
BenchmarkAcos 377 378 +0.27%
BenchmarkAcosh 487 505 +3.70%
BenchmarkAsin 359 357 -0.56%
BenchmarkAsinh 557 559 +0.36%
BenchmarkAtan 253 253 +0.00%
BenchmarkAtanh 469 496 +5.76%
BenchmarkAtan2 406 403 -0.74%
BenchmarkCbrt 837 873 +4.30%
BenchmarkCeil 182 177 -2.75%
BenchmarkCopysign 70 77 +10.90%
BenchmarkCos 355 354 -0.28%
BenchmarkCosh 1031 1043 +1.16%
BenchmarkErf 234 234 +0.00%
BenchmarkErfc 246 246 +0.00%
BenchmarkExp 514 506 -1.56%
BenchmarkExpGo 516 510 -1.16%
BenchmarkExpm1 291 293 +0.69%
BenchmarkExp2 469 471 +0.43%
BenchmarkExp2Go 474 475 +0.21%
BenchmarkAbs 16 22 +31.71%
BenchmarkDim 124 128 +3.23%
BenchmarkFloor 140 128 -8.57%
BenchmarkMax 103 103 +0.00%
BenchmarkMin 102 103 +0.98%
BenchmarkMod 1044 1072 +2.68%
BenchmarkFrexp 180 179 -0.56%
BenchmarkGamma 518 512 -1.16%
BenchmarkHypot 236 233 -1.27%
BenchmarkHypotGo 243 238 -2.06%
BenchmarkIlogb 187 188 +0.53%
BenchmarkJ0 1881 1828 -2.82%
BenchmarkJ1 1773 1817 +2.48%
BenchmarkJn 3908 3875 -0.84%
BenchmarkLdexp 181 191 +5.52%
BenchmarkLgamma 367 373 +1.63%
BenchmarkLog 429 420 -2.10%
BenchmarkLogb 183 184 +0.55%
BenchmarkLog1p 344 355 +3.20%
BenchmarkLog10 461 452 -1.95%
BenchmarkLog2 461 458 -0.65%
BenchmarkModf 110 116 +5.45%
BenchmarkNextafter 115 114 -0.87%
BenchmarkPowInt 758 773 +1.98%
BenchmarkPowFrac 1698 1708 +0.59%
BenchmarkPow10Pos 1266 1437 +13.51%
BenchmarkPow10Neg 1316 1504 +14.29%
BenchmarkRemainder 868 868 +0.00%
BenchmarkSignbit 35 38 +7.34%
BenchmarkSin 356 352 -1.12%
BenchmarkSincos 445 453 +1.80%
BenchmarkSinh 1090 1085 -0.46%
BenchmarkSqrt 44 44 +0.00%
BenchmarkSqrtGo 1188 1197 +0.76%
BenchmarkTan 387 401 +3.62%
BenchmarkTanh 2169 2169 +0.00%
BenchmarkTrunc 129 125 -3.10%
BenchmarkY0 1872 1817 -2.94%
BenchmarkY1 1810 1862 +2.87%
BenchmarkYn 3844 3975 +3.41%
The overall impact to the go1 benchmarks was less conclusive
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 37239166000 37314759000 +0.20%
BenchmarkFannkuch11 36177002000 34104889000 -5.73%
BenchmarkGobDecode 122143600 117872600 -3.50%
BenchmarkGobEncode 63118280 61765140 -2.14%
BenchmarkGzip 5508117000 5607391000 +1.80%
BenchmarkGunzip 1210297000 1193390000 -1.40%
BenchmarkJSONEncode 763110400 774969600 +1.55%
BenchmarkJSONDecode 2389526000 2373870000 -0.66%
BenchmarkMandelbrot200 45587760 45641480 +0.12%
BenchmarkParse 58891600 58216560 -1.15%
BenchmarkRevcomp 139398200 146197600 +4.88%
BenchmarkTemplate 5396057000 5423462000 +0.51%
benchmark old MB/s new MB/s speedup
BenchmarkGobDecode 6.28 6.51 1.04x
BenchmarkGobEncode 12.16 12.43 1.02x
BenchmarkGzip 3.52 3.46 0.98x
BenchmarkGunzip 16.03 16.26 1.01x
BenchmarkJSONEncode 2.54 2.50 0.98x
BenchmarkJSONDecode 0.81 0.82 1.01x
BenchmarkParse 0.98 0.99 1.01x
BenchmarkRevcomp 18.23 17.39 0.95x
BenchmarkTemplate 0.36 0.36 1.00x
I'm going to spend some time looking at the regressions in math to see if this patch can be applied in a way that doesn't impact ARMv7 performance. If this is not possible, and I do realise this is against rsc's suggestion, I would like to propose that GOARM default to 7 (not 6), and 6 be used to inhibit the use of vfp3 instructions.
Message from unknown
2012-07-27T01:44:31+00:00minux1urn:md5:dcf22b7aebc0dd3ce059a22c74b953de
Message from rsc@golang.org
2012-08-05T23:00:45+00:00rscurn:md5:24a14c284bb33063ccbedc3f4f0dad88
I'd be a bit less uneasy about this if we could make the binaries detect when they are being used on systems that are too old.
Message from minux.ma@gmail.com
2012-08-06T02:29:44+00:00minux1urn:md5:2b5195bb6d4851c00f1a2b38a11a3d2d
On Mon, Aug 6, 2012 at 7:00 AM, <rsc@golang.org> wrote:
> I'd be a bit less uneasy about this if we could make the binaries detect
> when they are being used on systems that are too old.
>
this detection would be easy given that we already parse runtime·hwcap from
auxv.
or, better yet, we detect this from sigill handler, and this would be more
reliable
and portable (by portable, i mean portable to my Darwin/ARM port, although
if
you think judging from runtime·hwcap is ok, i will be fine, because
Darwin/ARM
that is capable of running Go would be always Cortex-A8 or higher so this
is a
non issue). the downside of this is that maybe we need to parse the
instruction
to see if it is vmov imm, or some other floating point instructions; but
the main
benefit would be, if the program doesn't use the offending instructions, we
don't
need to tell the user to change anything (this is not the case for our
current runtime,
because it will always use floating point instruction in runtime.check(),
but at least
we can run successfully on VFPv2 systems). of course, from another
perspective,
you might think this kind of uncertainty is the worst feature.
which solution do you think is better?