Delta Between Two Patch Sets: src/pkg/runtime/asm_amd64.s

Issue 8056043: code review 8056043: runtime: Implement faster equals for strings and bytes. (Closed)
Left Patch Set: diff -r d040d5f08d5d https://khr%40golang.org@code.google.com/p/go/ Created 11 years ago
Right Patch Set: diff -r 52e3407d249f https://khr%40golang.org@code.google.com/p/go/ Created 10 years, 12 months ago
LEFT | RIGHT
1 // Copyright 2009 The Go Authors. All rights reserved. 1 // Copyright 2009 The Go Authors. All rights reserved.
2 // Use of this source code is governed by a BSD-style 2 // Use of this source code is governed by a BSD-style
3 // license that can be found in the LICENSE file. 3 // license that can be found in the LICENSE file.
4 4
5 #include "zasm_GOOS_GOARCH.h" 5 #include "zasm_GOOS_GOARCH.h"
6 6
7 TEXT _rt0_amd64(SB),7,$-8 7 TEXT _rt0_amd64(SB),7,$-8
8 // copy arguments forward on an even stack 8 // copy arguments forward on an even stack
9 MOVQ DI, AX // argc 9 MOVQ DI, AX // argc
10 MOVQ SI, BX // argv 10 MOVQ SI, BX // argv
(...skipping 891 matching lines...)
902 QUAD $0x0b0a090807060504 902 QUAD $0x0b0a090807060504
903 QUAD $0xffffffff0f0e0d0c 903 QUAD $0xffffffff0f0e0d0c
904 QUAD $0x0a09080706050403 904 QUAD $0x0a09080706050403
905 QUAD $0xffffff0f0e0d0c0b 905 QUAD $0xffffff0f0e0d0c0b
906 QUAD $0x0908070605040302 906 QUAD $0x0908070605040302
907 QUAD $0xffff0f0e0d0c0b0a 907 QUAD $0xffff0f0e0d0c0b0a
908 QUAD $0x0807060504030201 908 QUAD $0x0807060504030201
909 QUAD $0xff0f0e0d0c0b0a09 909 QUAD $0xff0f0e0d0c0b0a09
910 910
911 TEXT runtime·memeq(SB),7,$0 911 TEXT runtime·memeq(SB),7,$0
912 » MOVQ» 8(SP), SI // a 912 » MOVQ» a+0(FP), SI
913 » MOVQ» 16(SP), DI // b 913 » MOVQ» b+8(FP), DI
914 » MOVQ» 24(SP), CX // count 914 » MOVQ» count+16(FP), BX
915 » MOVQ» CX, BX 915 » JMP» runtime·memeqbody(SB)
916 » SHRQ» $3, CX 916
917 » REP 917
918 » CMPSQ 918 TEXT bytes·Equal(SB),7,$0
919 » JNE» notequal 919 » MOVQ» a_len+8(FP), BX
920 » MOVQ» BX, CX 920 » MOVQ» b_len+32(FP), CX
921 » ANDQ» $7, CX
922 » REP
923 » CMPSB
924 » JNE» notequal
925 » MOVQ» $1, AX
926 » RET
927 notequal:
928 XORQ AX, AX 921 XORQ AX, AX
929 » RET 922 » CMPQ» BX, CX
923 » JNE» eqret
924 » MOVQ» a+0(FP), SI
925 » MOVQ» b+24(FP), DI
926 » CALL» runtime·memeqbody(SB)
927 eqret:
928 » MOVB» AX, ret+48(FP)
929 » RET
930
931 // a in SI
932 // b in DI
933 // count in BX
934 TEXT runtime·memeqbody(SB),7,$0
935 » XORQ» AX, AX
936
937 » CMPQ» BX, $8
938 » JB» small
939 »·······
940 » // 64 bytes at a time using xmm registers
941 hugeloop:
942 » CMPQ» BX, $64
943 » JB» bigloop
944 » MOVOU» (SI), X0
945 » MOVOU» (DI), X1
946 » MOVOU» 16(SI), X2
947 » MOVOU» 16(DI), X3
948 » MOVOU» 32(SI), X4
949 » MOVOU» 32(DI), X5
950 » MOVOU» 48(SI), X6
951 » MOVOU» 48(DI), X7
952 » PCMPEQB»X1, X0
953 » PCMPEQB»X3, X2
954 » PCMPEQB»X5, X4
955 » PCMPEQB»X7, X6
956 » PAND» X2, X0
957 » PAND» X6, X4
958 » PAND» X4, X0
959 » PMOVMSKB X0, DX
960 » ADDQ» $64, SI
961 » ADDQ» $64, DI
962 » SUBQ» $64, BX
963 » CMPL» DX, $0xffff
964 » JEQ» hugeloop
965 » RET
966
967 » // 8 bytes at a time using 64-bit register
968 bigloop:
969 » CMPQ» BX, $8
970 » JBE» leftover
971 » MOVQ» (SI), CX
972 » MOVQ» (DI), DX
973 » ADDQ» $8, SI
974 » ADDQ» $8, DI
975 » SUBQ» $8, BX
976 » CMPQ» CX, DX
977 » JEQ» bigloop
978 » RET
979
980 » // remaining 0-8 bytes
981 leftover:
982 » MOVQ» -8(SI)(BX*1), CX
983 » MOVQ» -8(DI)(BX*1), DX
984 » CMPQ» CX, DX
985 » SETEQ» AX
986 » RET
987
988 small:
989 » CMPQ» BX, $0
990 » JEQ» equal
991
992 » LEAQ» 0(BX*8), CX
993 » NEGQ» CX
994
995 » CMPB» SI, $0xf8
996 » JA» si_high
997
998 » // load at SI won't cross a page boundary.
999 » MOVQ» (SI), SI
1000 » JMP» si_finish
1001 si_high:
1002 » // address ends in 11111xxx. Load up to bytes we want, move to correct position.
1003 » MOVQ» -8(SI)(BX*1), SI
1004 » SHRQ» CX, SI
1005 si_finish:
1006
1007 » // same for DI.
1008 » CMPB» DI, $0xf8
1009 » JA» di_high
1010 » MOVQ» (DI), DI
1011 » JMP» di_finish
1012 di_high:
1013 » MOVQ» -8(DI)(BX*1), DI
1014 » SHRQ» CX, DI
1015 di_finish:
1016
1017 » SUBQ» SI, DI
1018 » SHLQ» CX, DI
1019 equal:
1020 » SETEQ» AX
1021 » RET