Descriptioncmd/6g: change sbop swap logic
I added the nl->op == OLITERAL case during the recent
performance round, and while it helps for small integer constants,
it hurts for floating point constants. In the Mandelbrot benchmark
it causes 2*Zr*Zi to compile like Zr*2*Zi:
0x000000000042663d <+249>: movsd %xmm6,%xmm0
0x0000000000426641 <+253>: movsd $2,%xmm1
0x000000000042664a <+262>: mulsd %xmm1,%xmm0
0x000000000042664e <+266>: mulsd %xmm5,%xmm0
instead of:
0x0000000000426835 <+276>: movsd $2,%xmm0
0x000000000042683e <+285>: mulsd %xmm6,%xmm0
0x0000000000426842 <+289>: mulsd %xmm5,%xmm0
It is unclear why that has such a dramatic performance effect
in a tight loop, but it's obviously slightly better code, so go with it.
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 5957470000 5973924000 +0.28%
BenchmarkFannkuch11 3811295000 3869128000 +1.52%
BenchmarkGobDecode 26001900 25670500 -1.27%
BenchmarkGobEncode 12051430 11948590 -0.85%
BenchmarkGzip 177432 174821 -1.47%
BenchmarkGunzip 10967 10756 -1.92%
BenchmarkJSONEncode 78924750 79746900 +1.04%
BenchmarkJSONDecode 313606400 307081600 -2.08%
BenchmarkMandelbrot200 13670860 8200725 -40.01% !!!
BenchmarkRevcomp25M 1179194000 1206539000 +2.32%
BenchmarkTemplate 447931200 443948200 -0.89%
BenchmarkMD5Hash1K 2856 2873 +0.60%
BenchmarkMD5Hash8K 22083 22029 -0.24%
benchmark old MB/s new MB/s speedup
BenchmarkGobDecode 29.52 29.90 1.01x
BenchmarkGobEncode 63.69 64.24 1.01x
BenchmarkJSONEncode 24.59 24.33 0.99x
BenchmarkJSONDecode 6.19 6.32 1.02x
BenchmarkRevcomp25M 215.54 210.66 0.98x
BenchmarkTemplate 4.33 4.37 1.01x
BenchmarkMD5Hash1K 358.54 356.31 0.99x
BenchmarkMD5Hash8K 370.95 371.86 1.00x
Patch Set 1 #Patch Set 2 : diff -r 1195a9780464 https://go.googlecode.com/hg/ #Patch Set 3 : diff -r 1195a9780464 https://go.googlecode.com/hg/ #MessagesTotal messages: 3
|