|
|
Created:
6 years, 5 months ago by markus.icu Modified:
6 years, 4 months ago Reviewers:
sffc CC:
mark_macchiato.com, andy.heninger, srl295 Base URL:
http://source.icu-project.org/repos/icu/trunk/ Visibility:
Public. |
DescriptionICU ticket #13467: make U8_NEXT() never call a function
Patch Set 1 #
Total comments: 4
Patch Set 2 : smaller U8_NEXT() with no function call #Patch Set 3 : expand validity marcros: do not leave it up to the compiler to detect that masking the lead bytes i… #MessagesTotal messages: 13
I looked at this for a while and I believe it is correct. I want to check one more time before you submit it to make sure that i is incremented the right number of times. Also, see the ticket for another possible implementation of U8_NEXT. https://codereview.appspot.com/331290043/diff/1/icu4c/source/common/unicode/u... File icu4c/source/common/unicode/utf8.h (right): https://codereview.appspot.com/331290043/diff/1/icu4c/source/common/unicode/u... icu4c/source/common/unicode/utf8.h:351: #define U8_NEXT(s, i, length, c) U8_INTERNAL_NEXT_OR_SUB(s, i, length, c, U_SENTINEL) The function utf8_nextCharSafeBody has a call to U_IS_UNICODE_NONCHAR that I don't see here. Is that case covered somewhere else? https://codereview.appspot.com/331290043/diff/1/icu4c/source/common/unicode/u... icu4c/source/common/unicode/utf8.h:428: uint32_t __uc=(c); \ Presumably the compiler will optimize out the assignment operation? Any risk of name collision with another variable?
Sign in to reply to this message.
https://codereview.appspot.com/331290043/diff/1/icu4c/source/common/unicode/u... File icu4c/source/common/unicode/utf8.h (right): https://codereview.appspot.com/331290043/diff/1/icu4c/source/common/unicode/u... icu4c/source/common/unicode/utf8.h:351: #define U8_NEXT(s, i, length, c) U8_INTERNAL_NEXT_OR_SUB(s, i, length, c, U_SENTINEL) On 2017/11/17 08:55:22, sffc wrote: > The function utf8_nextCharSafeBody has a call to U_IS_UNICODE_NONCHAR that I > don't see here. Is that case covered somewhere else? That function supports obsolete behaviors (for utf_old.h compatibility) as well as current behaviors. Look at what it does for strict=-1 and strict=-3. https://codereview.appspot.com/331290043/diff/1/icu4c/source/common/unicode/u... icu4c/source/common/unicode/utf8.h:428: uint32_t __uc=(c); \ On 2017/11/17 08:55:22, sffc wrote: > Presumably the compiler will optimize out the assignment operation? It should. Same value, just different interpretation of what less-than means. > Any risk of name collision with another variable? Shouldn't, because it's inside a { block }. We have not had that sort of trouble. The double underscore prefix is probably overkill, too, just extra careful. I guess some compiler could warn about hiding a member field or something. See other macros in these files.
Sign in to reply to this message.
I made some benchmarks. I compared four implementations: 1. U8_NEXT on trunk 2. U8_NEXT_MARKUS from this code review (as in the old code, 3-byte is consumed before 2-byte) 3. U8_NEXT_SHANE for my first implementation, which consumes in byte order (2-byte is consumed before 3-byte) 4. QU8_NEXT_SHANE for a second implementation I was testing, which reduces the number of bytes of machine code on x86 to the lowest of these options except for trunk U8_NEXT Results are below. Although it's basically a toss-up, there are a few interesting results: - The new implementations, the ones that don't depend on the library function utf8_nextCharSafeBody, are significantly faster on 4-byte characters. My chart only shows 12% faster, but this is on a corpus of mostly English with a few Emoji; on a string of all 4-byte characters, trunk U8_NEXT is more than twice as slow than any of the other implementations. - U8_NEXT and U8_NEXT_MARKUS are slightly faster on Chinese (3-byte characters) and slightly slower on Russian (2-byte characters) than QU8_NEXT_SHANE, consistent with the logic of the functions. - U8_NEXT_MARKUS is a bit slower on Russian than any of the other implementations. It's not immediately obvious to me why that would be the case. Maybe something hard to measure like code locality? My code: http://go/bit/sffc/5784002938011648 Shane On Fri, Nov 17, 2017 at 8:58 AM, <markus.icu@gmail.com> wrote: > > https://codereview.appspot.com/331290043/diff/1/icu4c/source > /common/unicode/utf8.h > File icu4c/source/common/unicode/utf8.h (right): > > https://codereview.appspot.com/331290043/diff/1/icu4c/source > /common/unicode/utf8.h#newcode351 > icu4c/source/common/unicode/utf8.h:351: #define U8_NEXT(s, i, length, c) > U8_INTERNAL_NEXT_OR_SUB(s, i, length, c, U_SENTINEL) > On 2017/11/17 08:55:22, sffc wrote: > >> The function utf8_nextCharSafeBody has a call to U_IS_UNICODE_NONCHAR >> > that I > >> don't see here. Is that case covered somewhere else? >> > > That function supports obsolete behaviors (for utf_old.h compatibility) > as well as current behaviors. Look at what it does for strict=-1 and > strict=-3. > > https://codereview.appspot.com/331290043/diff/1/icu4c/source > /common/unicode/utf8.h#newcode428 > icu4c/source/common/unicode/utf8.h:428: uint32_t __uc=(c); \ > On 2017/11/17 08:55:22, sffc wrote: > >> Presumably the compiler will optimize out the assignment operation? >> > > It should. Same value, just different interpretation of what less-than > means. > > Any risk of name collision with another variable? >> > > Shouldn't, because it's inside a { block }. We have not had that sort of > trouble. The double underscore prefix is probably overkill, too, just > extra careful. I guess some compiler could warn about hiding a member > field or something. See other macros in these files. > > https://codereview.appspot.com/331290043/ >
Sign in to reply to this message.
On Thu, Nov 30, 2017 at 12:39 AM, Shane Carr <sffc@google.com> wrote: > I made some benchmarks. > Thanks! Note that Steven and readers of the Rietveld comment cannot access your Google-internal link. (But your summary says most of it.) AFAICT none of your code is functionally equivalent with the existing API: - major: Your code increments the index too far for the new, standard error handling. - minor: I test deliberately for i!=length not i<length in order to support a negative length for NUL-terminated text You have an interesting idea moving the action code (code point assembly, index increment) into the condition, which makes that very convoluted but allows a single else branch for error handling. It appears that this is part of what makes the code smaller. markus
Sign in to reply to this message.
I looked at the size of my proposed U8_NEXT() code with Linux clang 3.4 with -O2 optimization: extern "C" int read(const char *s, int32_t i, int32_t length) { int c; U8_NEXT_OR_FFFD(s, i, length, c); return c; } With $ clang -O2 -c -o u8_next.o u8_next.cpp && objdump -d u8_next.o | tail -n 1 I get 12b: c3 retq That is, the function is 0x12c=300 bytes long. markus
Sign in to reply to this message.
I realized that since the test function did not return the final index, the last ++i got optimized away. I changed it to extern "C" int read(const char *s, int32_t i, int32_t length) { int c; U8_NEXT_OR_FFFD(s, i, length, c); return c^i; // Return both c and i: Make the compiler not drop the last ++i. } With that, the size of the -O2 optimized function using my current patch version of U8_NEXT is: clang 5.0 366 bytes clang 3.4 369 bytes g++ 4.8.4 407 bytes A new version of U8_NEXT, not yet functionally tested: clang 5.0 310 bytes clang 3.4 297 bytes g++ 4.8.4 265 bytes markus
Sign in to reply to this message.
smaller U8_NEXT() with no function call
Sign in to reply to this message.
expand validity marcros: do not leave it up to the compiler to detect that masking the lead bytes is redundant; I do not want to just change the macros because they are used in many places, and I do not want to update all of the places
Sign in to reply to this message.
This passes all ICU tests. Shane, maybe you can plug this into your little benchmark. Otherwise I am all out of ideas for making this shorter/faster/simpler.
Sign in to reply to this message.
LGTM When I run this with -O3, the newest implementation (patch set 3) is faster than every other implementation. I'm especially impressed by the speedup relative to the trunk U8_NEXT. The time spent reducing bytecode size I think was a good investment. U8_NEXT = old code U8_NEXT_MARKUS = patch set 1 MU8_NEXT_MARKUS = patch set 3 QU8_NEXT_SHANE = the better of my two implementations English: 004, U8_NEXT: 0.864 ns/CP 004, U8_NEXT_MARKUS: 0.833 ns/CP 004, MU8_NEXT_MARKUS: 0.652 ns/CP 004, QU8_NEXT_SHANE: 0.652 ns/CP French: 008, U8_NEXT: 1.152 ns/CP 008, U8_NEXT_MARKUS: 1.017 ns/CP 008, MU8_NEXT_MARKUS: 0.874 ns/CP 008, QU8_NEXT_SHANE: 0.936 ns/CP Russian: 068, U8_NEXT: 2.329 ns/CP 068, U8_NEXT_MARKUS: 2.927 ns/CP 068, MU8_NEXT_MARKUS: 1.893 ns/CP 068, QU8_NEXT_SHANE: 2.416 ns/CP Chinese: 511, U8_NEXT: 2.398 ns/CP 511, U8_NEXT_MARKUS: 2.621 ns/CP 511, MU8_NEXT_MARKUS: 2.299 ns/CP 511, QU8_NEXT_SHANE: 2.477 ns/CP English+Emoji: 183, U8_NEXT: 1.632 ns/CP 183, U8_NEXT_MARKUS: 1.148 ns/CP 183, MU8_NEXT_MARKUS: 0.995 ns/CP 183, QU8_NEXT_SHANE: 1.091 ns/CP This is with Clang 3.4. Results with GCC 4.8 are similar.
Sign in to reply to this message.
Nice work! Mark <https://twitter.com/mark_e_davis> On Sat, Dec 2, 2017 at 3:24 AM, <sffc@google.com> wrote: > LGTM > > When I run this with -O3, the newest implementation (patch set 3) is > faster than every other implementation. I'm especially impressed by the > speedup relative to the trunk U8_NEXT. The time spent reducing bytecode > size I think was a good investment. > > U8_NEXT = old code > U8_NEXT_MARKUS = patch set 1 > MU8_NEXT_MARKUS = patch set 3 > QU8_NEXT_SHANE = the better of my two implementations > > English: > 004, U8_NEXT: 0.864 ns/CP 004, U8_NEXT_MARKUS: 0.833 ns/CP 004, > MU8_NEXT_MARKUS: 0.652 ns/CP 004, QU8_NEXT_SHANE: 0.652 ns/CP > > French: > 008, U8_NEXT: 1.152 ns/CP 008, U8_NEXT_MARKUS: 1.017 ns/CP 008, > MU8_NEXT_MARKUS: 0.874 ns/CP 008, QU8_NEXT_SHANE: 0.936 ns/CP > > Russian: > 068, U8_NEXT: 2.329 ns/CP 068, U8_NEXT_MARKUS: 2.927 ns/CP 068, > MU8_NEXT_MARKUS: 1.893 ns/CP 068, QU8_NEXT_SHANE: 2.416 ns/CP > > Chinese: > 511, U8_NEXT: 2.398 ns/CP 511, U8_NEXT_MARKUS: 2.621 ns/CP 511, > MU8_NEXT_MARKUS: 2.299 ns/CP 511, QU8_NEXT_SHANE: 2.477 ns/CP > > English+Emoji: > 183, U8_NEXT: 1.632 ns/CP 183, U8_NEXT_MARKUS: 1.148 ns/CP 183, > MU8_NEXT_MARKUS: 0.995 ns/CP 183, QU8_NEXT_SHANE: 1.091 ns/CP > > This is with Clang 3.4. Results with GCC 4.8 are similar. > > https://codereview.appspot.com/331290043/ >
Sign in to reply to this message.
|