strings: optimize Replace.
benchmark                  old ns/op    new ns/op    delta
BenchmarkGenericNoMatch        76550        18709  -75.56%
BenchmarkGenericMatch          88093        29899  -66.06%
Four specific optimizations were made.
Baseline:
BenchmarkGenericNoMatch    20000    76550 ns/op
    replace_test.go:129: 20000 iterations, 403 mallocs per iteration
BenchmarkGenericMatch      20000    88093 ns/op
    replace_test.go:149: 20000 iterations, 343 mallocs per iteration
Step 1:
Calling Write once per byte is expensive. Instead, batch the writes of
non-matching bytes, introducing two indexes (i, j) instead of one (i).
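
A minimal sketch of the two-index loop, assuming a single non-empty
old/new pair and a *bytes.Buffer destination (the real replacer handles
multiple pairs and any io.Writer; replaceBatched is an illustrative name):

    package main

    import (
        "bytes"
        "fmt"
    )

    // replaceBatched sketches the two-index technique: i scans the input
    // while j marks the start of the pending run of non-matching bytes,
    // which is flushed with a single WriteString instead of one Write per
    // byte. Assumes old is non-empty.
    func replaceBatched(w *bytes.Buffer, s, old, new string) {
        j := 0 // start of the unwritten non-matching run
        for i := 0; i+len(old) <= len(s); {
            if s[i:i+len(old)] == old {
                w.WriteString(s[j:i]) // one call flushes the whole run
                w.WriteString(new)
                i += len(old)
                j = i
            } else {
                i++
            }
        }
        w.WriteString(s[j:]) // flush the tail
    }

    func main() {
        var buf bytes.Buffer
        replaceBatched(&buf, "a man a plan", "a", "o")
        fmt.Println(buf.String()) // o mon o plon
    }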
BenchmarkGenericNoMatch    50000    39913 ns/op
    replace_test.go:129: 50000 iterations, 5 mallocs per iteration
BenchmarkGenericMatch      20000    80236 ns/op
    replace_test.go:149: 20000 iterations, 205 mallocs per iteration
Step 2:
Make Replace call WriteString only once, not twice. Previously, the first
call wrote to the equivalent of /dev/null, just to compute a destination
buffer of exactly the right size for the second call. Instead, just
allocate the original length + 50% up front. It's no big deal if the slice
isn't sized exactly to fit.
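
A sketch of the sizing strategy using the exported strings.Replacer API
(the commit applies it inside the package; replaceOnce is an illustrative
name):

    package main

    import (
        "bytes"
        "fmt"
        "strings"
    )

    // replaceOnce pre-grows the buffer to len(s) + 50% and writes once,
    // rather than doing a dry-run pass to compute the exact output size.
    // If replacements expand the output past the estimate, bytes.Buffer
    // simply grows on its own.
    func replaceOnce(r *strings.Replacer, s string) string {
        var buf bytes.Buffer
        buf.Grow(len(s) + len(s)/2) // original length + 50%; exact fit not required
        r.WriteString(&buf, s)
        return buf.String()
    }

    func main() {
        r := strings.NewReplacer("<", "&lt;", ">", "&gt;")
        fmt.Println(replaceOnce(r, "<b>bold</b>")) // &lt;b&gt;bold&lt;/b&gt;
    }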
BenchmarkGenericNoMatch   100000    20249 ns/op
    replace_test.go:129: 100000 iterations, 4 mallocs per iteration
BenchmarkGenericMatch      50000    41951 ns/op
    replace_test.go:149: 50000 iterations, 104 mallocs per iteration
Step 3:
Sniff for a writer.WriteString method. This avoids all the
string-to-[]byte conversions for the common cases of calling Replace or
of calling WriteString with a bytes.Buffer.
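
A sketch of the method sniff; the local stringWriter interface mirrors the
shape being checked for (the same shape io.StringWriter later
standardized), and writeString/onlyWriter are illustrative names:

    package main

    import (
        "bytes"
        "fmt"
        "io"
    )

    // stringWriter is the interface sniffed for: writers that accept a
    // string directly, with no string -> []byte copy per call.
    type stringWriter interface {
        WriteString(s string) (n int, err error)
    }

    // writeString uses the writer's own WriteString when it has one
    // (e.g. *bytes.Buffer), and falls back to a []byte conversion otherwise.
    func writeString(w io.Writer, s string) (int, error) {
        if sw, ok := w.(stringWriter); ok {
            return sw.WriteString(s) // no []byte allocation
        }
        return w.Write([]byte(s)) // fallback: one allocation + copy
    }

    // onlyWriter hides any WriteString method, forcing the fallback.
    type onlyWriter struct{ w io.Writer }

    func (o onlyWriter) Write(p []byte) (int, error) { return o.w.Write(p) }

    func main() {
        var buf bytes.Buffer
        writeString(&buf, "fast path: bytes.Buffer has WriteString\n")
        writeString(onlyWriter{&buf}, "slow path: one []byte copy\n")
        fmt.Print(buf.String())
    }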
BenchmarkGenericNoMatch   100000    19510 ns/op
    replace_test.go:129: 100000 iterations, 3 mallocs per iteration
BenchmarkGenericMatch      50000    32389 ns/op
    replace_test.go:149: 50000 iterations, 3 mallocs per iteration
Step 4:
Hoist the s[j:] out of the range r.p loop.
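
A sketch of the hoist, with pair and lookup as illustrative stand-ins for
the replacer's old/new pattern list (r.p) and its matching loop:

    package main

    import (
        "fmt"
        "strings"
    )

    type pair struct{ old, new string }

    // lookup finds the first pattern matching at position j. The suffix
    // s[j:] is computed once, before ranging over the patterns, instead of
    // re-slicing s on every iteration of the inner loop.
    func lookup(pairs []pair, s string, j int) (pair, bool) {
        suffix := s[j:] // hoisted out of the range loop
        for _, p := range pairs {
            if strings.HasPrefix(suffix, p.old) {
                return p, true
            }
        }
        return pair{}, false
    }

    func main() {
        pairs := []pair{{"<", "&lt;"}, {">", "&gt;"}}
        if p, ok := lookup(pairs, "a<b", 1); ok {
            fmt.Printf("matched %q -> %q\n", p.old, p.new)
        }
    }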
BenchmarkGenericNoMatch   100000    18709 ns/op
    replace_test.go:129: 100000 iterations, 3 mallocs per iteration
BenchmarkGenericMatch      50000    29899 ns/op
    replace_test.go:149: 50000 iterations, 3 mallocs per iteration