Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(1558)

Issue 331690043: ticket:13569 RBBI state table optmizations

Can't Edit
Can't Publish+Mail
Start Review
Created:
9 months, 1 week ago by andy.heninger
Modified:
7 months, 1 week ago
Reviewers:
sffc
Base URL:
svn+ssh://source.icu-project.org/repos/icu/trunk/
Visibility:
Public.

Description

ticket:13569 RBBI state table optmizations

Patch Set 1 #

Total comments: 29
Unified diffs Side-by-side diffs Delta from patch set Stats (+680 lines, -83 lines) Patch
M . View 0 chunks +0 lines, -0 lines 0 comments Download
M icu4c View 0 chunks +0 lines, -0 lines 0 comments Download
M icu4c/source View 0 chunks +0 lines, -0 lines 0 comments Download
M icu4c/source/common/rbbi.cpp View 1 chunk +4 lines, -0 lines 0 comments Download
M icu4c/source/common/rbbidata.h View 1 chunk +0 lines, -5 lines 0 comments Download
M icu4c/source/common/rbbidata.cpp View 2 chunks +2 lines, -2 lines 0 comments Download
M icu4c/source/common/rbbirb.h View 1 chunk +8 lines, -0 lines 0 comments Download
M icu4c/source/common/rbbirb.cpp View 3 chunks +30 lines, -2 lines 3 comments Download
M icu4c/source/common/rbbisetb.h View 2 chunks +9 lines, -1 line 0 comments Download
M icu4c/source/common/rbbisetb.cpp View 5 chunks +38 lines, -17 lines 0 comments Download
M icu4c/source/common/rbbitblb.h View 4 chunks +41 lines, -4 lines 2 comments Download
M icu4c/source/common/rbbitblb.cpp View 6 chunks +125 lines, -4 lines 7 comments Download
M icu4c/source/common/unicode/rbbi.h View 2 chunks +13 lines, -1 line 0 comments Download
M icu4c/source/test View 0 chunks +0 lines, -0 lines 0 comments Download
M icu4c/source/test/intltest View 0 chunks +0 lines, -0 lines 0 comments Download
M icu4c/source/test/intltest/rbbitst.h View 1 chunk +1 line, -0 lines 0 comments Download
M icu4c/source/test/intltest/rbbitst.cpp View 4 chunks +65 lines, -0 lines 1 comment Download
M icu4c/source/test/testdata/rbbitst.txt View 1 chunk +10 lines, -12 lines 0 comments Download
M icu4j View 0 chunks +0 lines, -0 lines 0 comments Download
M icu4j/main/classes/core/src/com/ibm/icu/text/RBBIDataWrapper.java View 7 chunks +31 lines, -22 lines 8 comments Download
M icu4j/main/classes/core/src/com/ibm/icu/text/RBBIRuleBuilder.java View 3 chunks +30 lines, -2 lines 1 comment Download
M icu4j/main/classes/core/src/com/ibm/icu/text/RBBISetBuilder.java View 6 chunks +32 lines, -3 lines 0 comments Download
M icu4j/main/classes/core/src/com/ibm/icu/text/RBBITableBuilder.java View 6 chunks +155 lines, -6 lines 4 comments Download
M icu4j/main/classes/core/src/com/ibm/icu/text/RuleBasedBreakIterator.java View 1 chunk +4 lines, -2 lines 1 comment Download
M icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITest.java View 3 chunks +82 lines, -0 lines 2 comments Download

Messages

Total messages: 2
andy.heninger
9 months, 1 week ago (2018-03-05 22:49:00 UTC) #1
sffc
9 months, 1 week ago (2018-03-06 00:33:29 UTC) #2
Another nice thing to add would be a test that checks that the behavior of the
RBBI instances is the same before and after the optimization; for example, run
them on some corpus, and the breaks should be the same.  I think you already
have a test like this (RBBI Monkey Test); instrumenting that with a
non-optimized version might be nice.

https://codereview.appspot.com/331690043/diff/1/icu4c/source/common/rbbirb.cpp
File icu4c/source/common/rbbirb.cpp (right):

https://codereview.appspot.com/331690043/diff/1/icu4c/source/common/rbbirb.cp...
icu4c/source/common/rbbirb.cpp:360: leftClass = 3;
Why does leftClass = 3 and rightClass = 0?  (Add a comment)

https://codereview.appspot.com/331690043/diff/1/icu4c/source/common/rbbirb.cp...
icu4c/source/common/rbbirb.cpp:361: rightClass = 0;
It looks like the value set here doesn't matter; consider leaving it
uninitialized?  (Or add a comment)

https://codereview.appspot.com/331690043/diff/1/icu4c/source/common/rbbirb.cp...
icu4c/source/common/rbbirb.cpp:362: while
(fForwardTables->findDuplCharClassFrom(leftClass, rightClass)) {
Is it safe to check only the forward-table for duplicates but then remove it
from the other three?

https://codereview.appspot.com/331690043/diff/1/icu4c/source/common/rbbitblb.cpp
File icu4c/source/common/rbbitblb.cpp (right):

https://codereview.appspot.com/331690043/diff/1/icu4c/source/common/rbbitblb....
icu4c/source/common/rbbitblb.cpp:1088: uint16_t table_base;
Optional: make a typedef for this

https://codereview.appspot.com/331690043/diff/1/icu4c/source/common/rbbitblb....
icu4c/source/common/rbbitblb.cpp:1092: for (int32_t state=0; state<numStates;
state++) {
Is numStates always at least 1?  If not, line 11000 could read uninitialized
values.  Maybe add a U_ASSERT.

https://codereview.appspot.com/331690043/diff/1/icu4c/source/common/rbbitblb....
icu4c/source/common/rbbitblb.cpp:1094: table_base =
(uint16_t)sd->fDtran->elementAti(baseCategory);
Optional: make this static_cast<uint16_t>(...)

https://codereview.appspot.com/331690043/diff/1/icu4c/source/common/rbbitblb....
icu4c/source/common/rbbitblb.cpp:1132: if (firstSD->fAccepting !=
duplSD->fAccepting ||
This looks like it should be in an RBBIStateDescriptor::operator==().

https://codereview.appspot.com/331690043/diff/1/icu4c/source/common/rbbitblb....
icu4c/source/common/rbbitblb.cpp:1133: firstSD->fLookAhead != duplSD->fLookAhead
||
Do you need a "crossover" check for either of these fields, fAccepting and
fLookAhead, like you have for the cell values on line 1142-1143?

https://codereview.appspot.com/331690043/diff/1/icu4c/source/common/rbbitblb....
icu4c/source/common/rbbitblb.cpp:1141: if (!((firstVal == duplVal) ||
Seems to work, but thinking about if there is a way to simplify the boolean
expression.  At least add a comment.

https://codereview.appspot.com/331690043/diff/1/icu4c/source/common/rbbitblb....
icu4c/source/common/rbbitblb.cpp:1197: int32_t duplicateState = 0;
Again, please document why we start at 3 and 0; that's what you did on columns. 
Does the same thing apply to rows?

https://codereview.appspot.com/331690043/diff/1/icu4c/source/common/rbbitblb.h
File icu4c/source/common/rbbitblb.h (right):

https://codereview.appspot.com/331690043/diff/1/icu4c/source/common/rbbitblb....
icu4c/source/common/rbbitblb.h:54: */
Please document how to initialize baseClass and duplClass

https://codereview.appspot.com/331690043/diff/1/icu4c/source/common/rbbitblb....
icu4c/source/common/rbbitblb.h:55: bool     findDuplCharClassFrom(int
&baseClass, int &duplClass);
Optional: take the ints by pointer instead of reference

https://codereview.appspot.com/331690043/diff/1/icu4c/source/test/intltest/rb...
File icu4c/source/test/intltest/rbbitst.cpp (right):

https://codereview.appspot.com/331690043/diff/1/icu4c/source/test/intltest/rb...
icu4c/source/test/intltest/rbbitst.cpp:4507: s.append(row->fNextState[column]);
Optional: check crossover.  Probably more difficult to implement than it's
worth.

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
File icu4j/main/classes/core/src/com/ibm/icu/text/RBBIDataWrapper.java (right):

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
icu4j/main/classes/core/src/com/ibm/icu/text/RBBIDataWrapper.java:32: public
RBBIDataHeader fHeader;
Pretty sure all of this needs @internal+@deprecated.  Nicer would be to move
this class to the impl package.

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
icu4j/main/classes/core/src/com/ibm/icu/text/RBBIDataWrapper.java:83: /**
@internal */
Again (needs @deprecated also)

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
icu4j/main/classes/core/src/com/ibm/icu/text/RBBIDataWrapper.java:114: public
final static class RBBIDataHeader {
Again

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
icu4j/main/classes/core/src/com/ibm/icu/text/RBBIDataWrapper.java:119: public
int  fCatCount;      //  Number of character categories.
Again

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
icu4j/main/classes/core/src/com/ibm/icu/text/RBBIDataWrapper.java:141: public
RBBIDataHeader() {
Again

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
icu4j/main/classes/core/src/com/ibm/icu/text/RBBIDataWrapper.java:153: public
int getRowIndex(int state){
Again

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
icu4j/main/classes/core/src/com/ibm/icu/text/RBBIDataWrapper.java:326: */
Again

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
icu4j/main/classes/core/src/com/ibm/icu/text/RBBIDataWrapper.java:342: public
void dump(java.io.PrintStream out) {
Again

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
File icu4j/main/classes/core/src/com/ibm/icu/text/RBBIRuleBuilder.java (right):

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
icu4j/main/classes/core/src/com/ibm/icu/text/RBBIRuleBuilder.java:386: IntPair
duplPair = new IntPair(3, 0);
Like in C++, please comment on the reasoning behind this initial condition.

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
File icu4j/main/classes/core/src/com/ibm/icu/text/RBBITableBuilder.java (right):

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
icu4j/main/classes/core/src/com/ibm/icu/text/RBBITableBuilder.java:1: // © 2016
and later: Unicode, Inc. and others.
The same comments from C++ should apply to this file.

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
icu4j/main/classes/core/src/com/ibm/icu/text/RBBITableBuilder.java:879: int[]
newArray = Arrays.copyOf(sd.fDtran, sd.fDtran.length - 1);
Arrays.copyOf() was added in Java 1.6, so we're okay using it.  (A lot of
methods like this were not added until Java 1.7, which is why I checked)

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
icu4j/main/classes/core/src/com/ibm/icu/text/RBBITableBuilder.java:1049: int
rowLen = 4 + fRB.fSetBuilder.getNumCharCategories();   // Row Length in shorts.
Is this why we start at a nonzero offset?

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
icu4j/main/classes/core/src/com/ibm/icu/text/RBBITableBuilder.java:1067: int
rowLenInBytes = rowLen * 2;
Why did this change?

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
File icu4j/main/classes/core/src/com/ibm/icu/text/RuleBasedBreakIterator.java
(right):

https://codereview.appspot.com/331690043/diff/1/icu4j/main/classes/core/src/c...
icu4j/main/classes/core/src/com/ibm/icu/text/RuleBasedBreakIterator.java:228: */
Probably wants an @deprecated?

https://codereview.appspot.com/331690043/diff/1/icu4j/main/tests/core/src/com...
File icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITest.java (right):

https://codereview.appspot.com/331690043/diff/1/icu4j/main/tests/core/src/com...
icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITest.java:596: //   
System.out.printf("Duplicate columns (%d, %d)\n", c1, c2);
Why do you have this commented out?

https://codereview.appspot.com/331690043/diff/1/icu4j/main/tests/core/src/com...
icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITest.java:620: // if
(rows.get(r1).equals(rows.get(r2))) {
Same down here
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld f62528b