|
|
Created:
7 years, 9 months ago by sffc Modified:
7 years, 5 months ago Reviewers:
andy.heninger CC:
markdavis Base URL:
svn+icussh://source.icu-project.org/repos/icu/icu4j/trunk/ Visibility:
Public. |
Descriptionicu4j version of SpoofChecker
Patch Set 1 #Patch Set 2 : Second version of WSC algorithm. #Patch Set 3 : Version of my code with public API that generates whole-script confusables. #Patch Set 4 : Version of code with IdentifierInfo2 #Patch Set 5 : Version of code with MA tables removed #Patch Set 6 : More small API changes #Patch Set 7 : Code cleanup #Patch Set 8 : Removing the big TR39_CHARS UnicodeSet #
Total comments: 69
Patch Set 9 : Updating with Andy's first round of feedback. #
Total comments: 16
Patch Set 10 : Re-writing documentation. #Patch Set 11 : Small documentation updates. #Patch Set 12 : Merging trunk #Patch Set 13 : Resolving merge conflicts #Patch Set 14 : Adding hashCode function to fix ant check error. Committed to svn. #Patch Set 15 : Adding additional test coverage. #Patch Set 16 : Adding coverage for copy constructor test. #Patch Set 17 : Replying to some Andy comments that I missed. #
Total comments: 8
Patch Set 18 : Minor revisions to SpoofChecker comments and documentation. #MessagesTotal messages: 29
Second version of WSC algorithm.
Sign in to reply to this message.
Version of my code with public API that generates whole-script confusables.
Sign in to reply to this message.
Version of code with IdentifierInfo2
Sign in to reply to this message.
Version of code with MA tables removed
Sign in to reply to this message.
More small API changes
Sign in to reply to this message.
Code cleanup
Sign in to reply to this message.
Removing the big TR39_CHARS UnicodeSet
Sign in to reply to this message.
Hi Andy, Here are my SpoofChecker changes in Java. I will send the C++ code review shortly. There are four main types of changes: 1) Removing Whole Script Confusables plumbing. 2) Replacing IdentifierInfo with the updated algorithms in UTS 39. 3) Removing ML/MA/SL/SA and string lengths plumbing. 4) Other assorted code health / refactoring changes. I left comments throughout the modified files explaining some of my reasoning. Thanks! https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... File main/classes/core/src/com/ibm/icu/text/SpoofChecker.java (right): https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:330: public static final int CONFUSABLE = SINGLE_SCRIPT_CONFUSABLE | MIXED_SCRIPT_CONFUSABLE | WHOLE_SCRIPT_CONFUSABLE; This is a new API, which was included in the proposal. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:458: if (fSpoofData == null && (fChecks & CONFUSABLE) != 0) { This is where the default SpoofData won't be read if it won't be needed for the tests. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:650: UnicodeSet tmpSet = new UnicodeSet(); Don't know why this method got changed. I'll change it back. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:691: fChecks |= RESTRICTION_LEVEL | MIXED_NUMBERS; Enabling MIXED_NUMBERS here at Mark's request. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:728: private Hashtable<Integer, SPUString> fTable; This is the start of the section of the code that was changed in order to remove the SL/SA/ML/MA tables. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:864: s.fCharOrStrTableIndex = s.fStr.charAt(0); I renamed the variable; it serves the same purpose as before https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:892: fValueVec.add(value); This section contains the relevant lines from "addKeyEntry". There was a lot of code in "addKeyEntry" to handle characters that were in multiple tables, and also string lengths. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:973: final static SPUStringComparator INSTANCE = new SPUStringComparator(); I think a final static is slightly more efficient for this purpose https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1118: * @stable ICU 58 More API changes. Looks like I put in "@stable" instead of "@draft". Let me know if you catch this anywhere else. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1206: RestrictionLevel textRestrictionLevel = getRestrictionLevel(text); Removing IdentifierInfo and replacing it with individual functions, like "getRestrictionLevel" and "getNumerics" (a few lines down). There should be very little change in performance, if not slightly better performance since the SOSS no longer needs to be computed. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1261: continue; Rietveld is a little bit confused here; this section of the code is identical to the "INVISIBLE" test from before, down through line 1277. The WSC and MSC tests were removed. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1331: // Compute the skeletons and check for confusability. I re-wrote this method mostly from scratch. It should have the same result as the old version. Probably would be good for you to double-check. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1474: The next several methods are my new additions for UTS 39, which cover logic formerly in IdentifierInfo.java. The method "confusableLookup" on the left has been overhauled and moved into the "SpoofData" class, which can be found further down, around line 1791. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1640: private static final class ConfusableDataUtils { I made this singleton class to contain all of the logic involving reading and writing from the bitmask described in the comment above. Previously, there were several places around the code that all had hard-coded binary operations. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1641: public static final int FORMAT_VERSION = 58; // version for ICU 58 I double-checked, and this should probably be a 2 instead of a 58, as discussed. I'll change it. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1702: EXCEPTION = e; Saving the exception so that it can be thrown later. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1791: public void confusableLookup(int inChar, StringBuilder dest) { This is the new home of confusableLookup. I re-wrote it from scratch, removing the logic involving the ML/MA/SL/SA tables. I also added methods "length()", "codePointAt()", and "appendValueTo()", to further abstract away the underlying data structure. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1861: // Extends Java BitSet with input/output support and a few helper methods. I made the ScriptSet actually extend BitSet. I was able to eliminate a lot of the code in the original version of ScriptSet. This is a Java-only change. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1912: public void appendStringTo(StringBuilder sb) { I made a nicer toString() method. https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... File main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java (right): https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java:93: rsc = new SpoofChecker.Builder().setData(confusables).build(); Can use this simpler line since there is no more confusablesWholeScript. https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java:176: SpoofChecker sc = new SpoofChecker.Builder().setChecks(SpoofChecker.CHAR_LIMIT).build(); Here and elsewhere, I explicitly list the checks that are relevant to the test method. Previously, these would have defaulted to ALL_CHECKS. This test file probably needs a little bit more clean-up in wake of my changes to the implementation file. Let me know what sorts of changes you would like to see. https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java:542: assertEquals("", expected, checkResult.numerics); Numerics is accessible directly from CheckResult. Switched from IdentifierInfo to the public API. https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java:551: //// contains(BitSet, BitSet) No more IdentifierInfo. These methods were removed and no longer have an implementation within ICU. https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java:581: {"a1١", "MODERATELY_RESTRICTIVE", "[0٠]", "Latn", "Arab Thaa", "Arab Thaa"}, These are functionality changes involving the decoupling of the MIXED_NUMBERS and RESTRICTION_LEVEL tests. https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java:606: // BitSet scripts = IdentifierInfo.parseScripts(test[3]); Another function that was removed since IdentifierInfo isn't here any more. https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java:631: assertEquals("", RestrictionLevel.SINGLE_SCRIPT_RESTRICTIVE, checkResult.restrictionLevel); Switching this test from IdentifierInfo to public API. https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java:634: // @Test Another IdentifierInfo function. https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java:773: actual = sc.getSkeleton(from); No more tables.
Sign in to reply to this message.
First batch of comments. I haven't looked at the tests yet. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... File main/classes/core/src/com/ibm/icu/text/SpoofChecker.java (left): https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:2: // License & terms of use: http://www.unicode.org/copyright.html#License This should be in the new also, it shouldn't be a diff. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... File main/classes/core/src/com/ibm/icu/text/SpoofChecker.java (right): https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:328: * CONFUSABLE flags. * @draft ICU 58 * @provisional This API might change or be removed in a future release. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:343: * {@link RestrictionLevel#MODERATELY_RESTRICTIVE}. keep as HIGHLY_RESTRICTIVE, as requested in the review. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:345: * @stable ICU 58 make it be * @draft ICU 58 * @provisional This API might change or be removed in a future release. Even though it should have been API all along, the generic policy is not to move things directly from @internal to @stable https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:380: * @stable ICU 58 * @draft ICU 58 * @provisional This API might change or be removed in a future release. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:430: fRestrictionLevel = RestrictionLevel.MODERATELY_RESTRICTIVE; Revert this change. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:492: * @stable ICU 58 * @draft ICU 58 * @provisional This API might change or be removed in a future release. (things aren't allowed to be born stable) https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:527: * To whitelist specific checks and disable all others, the whitelisted checks should be ORed together. For Use of whitelist and blacklist isn't particularly clear to me. Maybe stick with enable and disable. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:687: * @stable ICU 58 * @draft ICU 58 * @provisional This API might change or be removed in a future release. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:713: // 3. Build a list of keys (UChar32s) from the four mapping tables. Sort the no longer four mapping tables. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:873: // For each key code point, check which mapping tables it applies to, redo this and the following comments to remove discussion of multiple mapping tables. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:892: fValueVec.add(value); On 2016/08/19 03:51:41, sffc wrote: > This section contains the relevant lines from "addKeyEntry". There was a lot of > code in "addKeyEntry" to handle characters that were in multiple tables, and > also string lengths. Acknowledged. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1118: * @stable ICU 58 On 2016/08/19 03:51:41, sffc wrote: > More API changes. Looks like I put in "@stable" instead of "@draft". Let me > know if you catch this anywhere else. Yes. Make it * @draft ICU 58 * @provisional This API might change or be removed in a future release. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1125: * @stable ICU 58 * @draft ICU 58 * @provisional This API might change or be removed in a future release. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1357: // Turn off flags that the user doesn't want could this be result &= fChecks; https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1384: * @stable ICU 58 * @draft ICU 58 * @provisional This API might change or be removed in a future release. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1388: throw new IllegalArgumentException("In order to use getSkeleton(), you must enable the CONFUSABLE check."); This is actually a change in behavior. It should probably be raised in the API change proposal, if it wasn't already. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1416: * @stable ICU 4.6 Delete the @stable line @deprecated ICU 58 https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1640: private static final class ConfusableDataUtils { On 2016/08/19 03:51:41, sffc wrote: > I made this singleton class to contain all of the logic involving reading and > writing from the bitmask described in the comment above. Previously, there were > several places around the code that all had hard-coded binary operations. Acknowledged. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1904: this.clear(); No need to clear first.
Sign in to reply to this message.
https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... File main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java (right): https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java:93: rsc = new SpoofChecker.Builder().setData(confusables).build(); On 2016/08/19 03:51:42, sffc wrote: > Can use this simpler line since there is no more confusablesWholeScript. We should include at least one use of the old, now deprecated, function. For code coverage, to verify that it is still there and and functioning. https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java:551: //// contains(BitSet, BitSet) On 2016/08/19 03:51:42, sffc wrote: > No more IdentifierInfo. These methods were removed and no longer have an > implementation within ICU. Go ahead and delete them, rather than just commenting them out. https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java:634: // @Test On 2016/08/19 03:51:42, sffc wrote: > Another IdentifierInfo function. It's fine to delete it entirely.
Sign in to reply to this message.
Updating with Andy's first round of feedback.
Sign in to reply to this message.
Updating with Andy's first round of feedback.
Sign in to reply to this message.
Updating with Andy's first round of feedback.
Sign in to reply to this message.
https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... File main/classes/core/src/com/ibm/icu/text/SpoofChecker.java (left): https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:2: // License & terms of use: http://www.unicode.org/copyright.html#License On 2016/08/26 01:06:39, andy.heninger wrote: > This should be in the new also, it shouldn't be a diff. Done. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... File main/classes/core/src/com/ibm/icu/text/SpoofChecker.java (right): https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:328: * CONFUSABLE flags. On 2016/08/26 01:06:37, andy.heninger wrote: > * @draft ICU 58 > * @provisional This API might change or be removed in a future release. Done. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:343: * {@link RestrictionLevel#MODERATELY_RESTRICTIVE}. On 2016/08/26 01:06:40, andy.heninger wrote: > keep as HIGHLY_RESTRICTIVE, as requested in the review. Done. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:345: * @stable ICU 58 On 2016/08/26 01:06:43, andy.heninger wrote: > make it be > * @draft ICU 58 > * @provisional This API might change or be removed in a future release. > Even though it should have been API all along, the generic policy is not to move > things directly from @internal to @stable Done. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:380: * @stable ICU 58 On 2016/08/26 01:06:40, andy.heninger wrote: > * @draft ICU 58 > * @provisional This API might change or be removed in a future release. Done. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:430: fRestrictionLevel = RestrictionLevel.MODERATELY_RESTRICTIVE; On 2016/08/26 01:06:44, andy.heninger wrote: > Revert this change. Done. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:492: * @stable ICU 58 On 2016/08/26 01:06:43, andy.heninger wrote: > * @draft ICU 58 > * @provisional This API might change or be removed in a future release. > > (things aren't allowed to be born stable) Done. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:527: * To whitelist specific checks and disable all others, the whitelisted checks should be ORed together. For On 2016/08/26 01:06:37, andy.heninger wrote: > Use of whitelist and blacklist isn't particularly clear to me. Maybe stick with > enable and disable. Does this look a little better? https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1357: // Turn off flags that the user doesn't want On 2016/08/26 01:06:42, andy.heninger wrote: > could this be > result &= fChecks; > Indeed! Thanks! :) https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1384: * @stable ICU 58 On 2016/08/26 01:06:41, andy.heninger wrote: > * @draft ICU 58 > * @provisional This API might change or be removed in a future release. Done. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1388: throw new IllegalArgumentException("In order to use getSkeleton(), you must enable the CONFUSABLE check."); On 2016/08/26 01:06:41, andy.heninger wrote: > This is actually a change in behavior. It should probably be raised in the API > change proposal, if it wasn't already. Moved discussion about this to the email thread with Mark. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1416: * @stable ICU 4.6 On 2016/08/26 01:06:38, andy.heninger wrote: > Delete the @stable line > @deprecated ICU 58 Done. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1904: this.clear(); On 2016/08/26 01:06:36, andy.heninger wrote: > No need to clear first. Done. https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... File main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java (right): https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java:93: rsc = new SpoofChecker.Builder().setData(confusables).build(); On 2016/08/26 18:19:02, andy.heninger wrote: > On 2016/08/19 03:51:42, sffc wrote: > > Can use this simpler line since there is no more confusablesWholeScript. > > We should include at least one use of the old, now deprecated, function. For > code coverage, to verify that it is still there and and functioning. Added a deprecation code coverage test further down in the file. https://codereview.appspot.com/303100043/diff/140001/main/tests/core/src/com/... main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java:551: //// contains(BitSet, BitSet) On 2016/08/26 18:19:01, andy.heninger wrote: > On 2016/08/19 03:51:42, sffc wrote: > > No more IdentifierInfo. These methods were removed and no longer have an > > implementation within ICU. > > Go ahead and delete them, rather than just commenting them out. Done. https://codereview.appspot.com/303100043/diff/200001/main/tests/core/src/com/... File main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java (right): https://codereview.appspot.com/303100043/diff/200001/main/tests/core/src/com/... main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java:483: SpoofChecker.WHOLE_SCRIPT_CONFUSABLE }, This here is the case that passes in my version but fails on ICU 57. https://codereview.appspot.com/303100043/diff/200001/main/tests/core/src/com/... main/tests/core/src/com/ibm/icu/dev/test/text/SpoofCheckerTest.java:561: {"aアー\u303c1१١۱", RestrictionLevel.MINIMALLY_RESTRICTIVE}, I moved the relevant content from the old "IdentifierInfo" test into this test and the mixed numbers test.
Sign in to reply to this message.
https://codereview.appspot.com/303100043/diff/200001/main/classes/core/src/co... File main/classes/core/src/com/ibm/icu/text/SpoofChecker.java (right): https://codereview.appspot.com/303100043/diff/200001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:187: * This is the default restriction level as of ICU 58. I thought we switched the default back again. https://codereview.appspot.com/303100043/diff/200001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:328: * <p>Note: if you wish to use {@link SpoofChecker#getSkeleton}, it is required that you enable at least one of the I thought we decided against this. https://codereview.appspot.com/303100043/diff/200001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:339: * @deprecated ICU 58 This API was deprecated in UTS 39 revision 11 and is no longer used. Clarify the wording. Any case confusable mappings were removed from UTS 39; the corresponding ICU API was deprecated. https://codereview.appspot.com/303100043/diff/200001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1126: * The numerics found in the string, if MIXED_NUMBERS was set; otherwise null; Expand the description. The set will contain the zero digit from each decimal number system found in the input string. https://codereview.appspot.com/303100043/diff/200001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1639: // The array is arranged by length of the strings - all strings of the same length Ordering the string table by length is no longer needed, although it doesn't hurt anything. https://codereview.appspot.com/303100043/diff/200001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1863: // ScriptSet - Script code bit sets. Used with the whole script confusable data. comment about whole script confusable data is no longer relevant. https://codereview.appspot.com/303100043/diff/200001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1873: private static final long serialVersionUID = 1L; Hmm. With ScriptSet being internal, I guess its OK to ignore serialization. But lets check with Markus. Damn BitSet for being serializable.
Sign in to reply to this message.
Re-writing documentation.
Sign in to reply to this message.
Small documentation updates.
Sign in to reply to this message.
Updated documentation in the Java version. Anything else I need to do?
Sign in to reply to this message.
Merging trunk
Sign in to reply to this message.
Resolving merge conflicts
Sign in to reply to this message.
Resolving merge conflicts
Sign in to reply to this message.
Adding hashCode function to fix ant check error. Committed to svn.
Sign in to reply to this message.
Adding additional test coverage.
Sign in to reply to this message.
Adding coverage for copy constructor test.
Sign in to reply to this message.
Replying to some Andy comments that I missed.
Sign in to reply to this message.
I noticed some comments you made in the files that went undetected. I fixed them and committed the changes as r39327. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... File main/classes/core/src/com/ibm/icu/text/SpoofChecker.java (right): https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:713: // 3. Build a list of keys (UChar32s) from the four mapping tables. Sort the On 2016/08/26 01:06:42, andy.heninger wrote: > no longer four mapping tables. Done from before. https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:873: // For each key code point, check which mapping tables it applies to, On 2016/08/26 01:06:42, andy.heninger wrote: > redo this and the following comments to remove discussion of multiple mapping > tables. Done from before. https://codereview.appspot.com/303100043/diff/200001/main/classes/core/src/co... File main/classes/core/src/com/ibm/icu/text/SpoofChecker.java (right): https://codereview.appspot.com/303100043/diff/200001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:187: * This is the default restriction level as of ICU 58. On 2016/09/09 23:52:49, andy.heninger wrote: > I thought we switched the default back again. Done. https://codereview.appspot.com/303100043/diff/200001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:328: * <p>Note: if you wish to use {@link SpoofChecker#getSkeleton}, it is required that you enable at least one of the On 2016/09/09 23:52:49, andy.heninger wrote: > I thought we decided against this. Done. https://codereview.appspot.com/303100043/diff/200001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:339: * @deprecated ICU 58 This API was deprecated in UTS 39 revision 11 and is no longer used. On 2016/09/09 23:52:49, andy.heninger wrote: > Clarify the wording. Any case confusable mappings were removed from UTS 39; the > corresponding ICU API was deprecated. Done. https://codereview.appspot.com/303100043/diff/200001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1126: * The numerics found in the string, if MIXED_NUMBERS was set; otherwise null; On 2016/09/09 23:52:49, andy.heninger wrote: > Expand the description. The set will contain the zero digit from each decimal > number system found in the input string. Done. https://codereview.appspot.com/303100043/diff/200001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1639: // The array is arranged by length of the strings - all strings of the same length On 2016/09/09 23:52:49, andy.heninger wrote: > Ordering the string table by length is no longer needed, although it doesn't > hurt anything. Done. Also fixed in C++. https://codereview.appspot.com/303100043/diff/200001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1863: // ScriptSet - Script code bit sets. Used with the whole script confusable data. On 2016/09/09 23:52:49, andy.heninger wrote: > comment about whole script confusable data is no longer relevant. Done. https://codereview.appspot.com/303100043/diff/200001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1873: private static final long serialVersionUID = 1L; On 2016/09/09 23:52:49, andy.heninger wrote: > Hmm. With ScriptSet being internal, I guess its OK to ignore serialization. But > lets check with Markus. Damn BitSet for being serializable. Do we still need to check on this?
Sign in to reply to this message.
https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... File main/classes/core/src/com/ibm/icu/text/SpoofChecker.java (right): https://codereview.appspot.com/303100043/diff/140001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:527: * To whitelist specific checks and disable all others, the whitelisted checks should be ORed together. For On 2016/08/27 01:19:24, sffc wrote: > On 2016/08/26 01:06:37, andy.heninger wrote: > > Use of whitelist and blacklist isn't particularly clear to me. Maybe stick > with > > enable and disable. > > Does this look a little better? Yes, thanks. https://codereview.appspot.com/303100043/diff/370001/main/classes/core/src/co... File main/classes/core/src/com/ibm/icu/text/SpoofChecker.java (right): https://codereview.appspot.com/303100043/diff/370001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:84: * be thought of as a "hash code". {@link SpoofChecker#getSkeleton} computes the skeleton for a particular string, so It's not quite like a hash because two non-confusable strings must never map to the same skeleton. No Collisions. https://codereview.appspot.com/303100043/diff/370001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:769: * ***************************************************************************** Internal classes for Formatting went wrong https://codereview.appspot.com/303100043/diff/370001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1005: * ***************************************************************************** Internal classes for Formatting trouble again. https://codereview.appspot.com/303100043/diff/370001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1030: private static class SPUStringComparator implements Comparator<SPUString> { With the change to the way lengths are kept, there's no longer a need to sort the strings by length. Probably too big of a change to do at this point in the release, though. Would need to do for C++ also.
Sign in to reply to this message.
Minor revisions to SpoofChecker comments and documentation.
Sign in to reply to this message.
Updated and committed as r39344 in Java and r39345 in C++. https://codereview.appspot.com/303100043/diff/370001/main/classes/core/src/co... File main/classes/core/src/com/ibm/icu/text/SpoofChecker.java (right): https://codereview.appspot.com/303100043/diff/370001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:84: * be thought of as a "hash code". {@link SpoofChecker#getSkeleton} computes the skeleton for a particular string, so On 2016/09/23 18:09:43, andy.heninger wrote: > It's not quite like a hash because two non-confusable strings must never map to > the same skeleton. No Collisions. Done. https://codereview.appspot.com/303100043/diff/370001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:769: * ***************************************************************************** Internal classes for On 2016/09/23 18:09:43, andy.heninger wrote: > Formatting went wrong Done. https://codereview.appspot.com/303100043/diff/370001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1005: * ***************************************************************************** Internal classes for On 2016/09/23 18:09:43, andy.heninger wrote: > Formatting trouble again. Done. https://codereview.appspot.com/303100043/diff/370001/main/classes/core/src/co... main/classes/core/src/com/ibm/icu/text/SpoofChecker.java:1030: private static class SPUStringComparator implements Comparator<SPUString> { On 2016/09/23 18:09:43, andy.heninger wrote: > With the change to the way lengths are kept, there's no longer a need to sort > the strings by length. Probably too big of a change to do at this point in the > release, though. Would need to do for C++ also. Good point. I'll add a TODO to the code. Should there be a ticket, too?
Sign in to reply to this message.
|