|
|
Created:
7 years, 6 months ago by imysak Modified:
7 years, 1 month ago CC:
connector-cr_google.com, sfruhwald Visibility:
Public. |
Description<#11781292 Add support for non-incremental groups push
Patch Set 1 #
Total comments: 1
Patch Set 2 : #11781292 Add support for non-incremental groups push #Patch Set 3 : #11781292 Add support for non-incremental groups push #
Total comments: 14
Patch Set 4 : #11781292 Add support for non-incremental groups push #Patch Set 5 : #11781292 Add support for non-incremental groups push #
Total comments: 4
Patch Set 6 : #11781292 Add support for non-incremental groups push #
Total comments: 1
Patch Set 7 : #11781292 Add support for non-incremental groups push #
Total comments: 8
MessagesTotal messages: 21
https://codereview.appspot.com/311210043/diff/1/src/com/google/enterprise/ada... File src/com/google/enterprise/adaptor/DocIdPusher.java (right): https://codereview.appspot.com/311210043/diff/1/src/com/google/enterprise/ada... src/com/google/enterprise/adaptor/DocIdPusher.java:172: boolean caseSensitive, boolean incremental) throws InterruptedException; You can't change a public interface like this. It breaks every adaptor, both Google's and third-party adaptors. You need to add a separate method, either a new overload with the incremental option, or just a new method that always does full pushes. I also don't like changing this part of the code without fixing or at least thinking about fixing the broken caseSensitive flag, which should be part of the Principal rather than a global parameter.
Sign in to reply to this message.
+1 to being backwards compatible. There are dozens of V4 connectors in the world. Updating library can require recompile. Updating library should not require changing adaptor code. Thank you - - technology's compounding interest On Tue, Oct 25, 2016 at 6:26 PM, <jlacey@google.com> wrote: > > https://codereview.appspot.com/311210043/diff/1/src/com/goog > le/enterprise/adaptor/DocIdPusher.java > File src/com/google/enterprise/adaptor/DocIdPusher.java (right): > > https://codereview.appspot.com/311210043/diff/1/src/com/goog > le/enterprise/adaptor/DocIdPusher.java#newcode172 > src/com/google/enterprise/adaptor/DocIdPusher.java:172: boolean > caseSensitive, boolean incremental) throws InterruptedException; > You can't change a public interface like this. It breaks every adaptor, > both Google's and third-party adaptors. You need to add a separate > method, either a new overload with the incremental option, or just a new > method that always does full pushes. > > I also don't like changing this part of the code without fixing or at > least thinking about fixing the broken caseSensitive flag, which should > be part of the Principal rather than a global parameter. > > https://codereview.appspot.com/311210043/ >
Sign in to reply to this message.
+1 Also please check if each of my previous comments for this CL have been addressed / reacted: https://codereview.appspot.com/317770043/ Thanks, Szabi On Oct 25, 2016 21:19, "PJ" <pjo@google.com> wrote: > +1 to being backwards compatible. There are dozens of V4 connectors in the > world. Updating library can require recompile. Updating library should not > require changing adaptor code. > > Thank you > > - > - technology's compounding interest > > On Tue, Oct 25, 2016 at 6:26 PM, <jlacey@google.com> wrote: > >> >> https://codereview.appspot.com/311210043/diff/1/src/com/goog >> le/enterprise/adaptor/DocIdPusher.java >> File src/com/google/enterprise/adaptor/DocIdPusher.java (right): >> >> https://codereview.appspot.com/311210043/diff/1/src/com/goog >> le/enterprise/adaptor/DocIdPusher.java#newcode172 >> src/com/google/enterprise/adaptor/DocIdPusher.java:172: boolean >> caseSensitive, boolean incremental) throws InterruptedException; >> You can't change a public interface like this. It breaks every adaptor, >> both Google's and third-party adaptors. You need to add a separate >> method, either a new overload with the incremental option, or just a new >> method that always does full pushes. >> >> I also don't like changing this part of the code without fixing or at >> least thinking about fixing the broken caseSensitive flag, which should >> be part of the Principal rather than a global parameter. >> >> https://codereview.appspot.com/311210043/ >> > >
Sign in to reply to this message.
this CL doesn't change public interfaces, fix full push group deffinition, we don't need to change adapttor code
Sign in to reply to this message.
refactoring affected tests
Sign in to reply to this message.
Thank you! https://codereview.appspot.com/311210043/diff/40001/src/com/google/enterprise... File src/com/google/enterprise/adaptor/DocIdSender.java (right): https://codereview.appspot.com/311210043/diff/40001/src/com/google/enterprise... src/com/google/enterprise/adaptor/DocIdSender.java:229: return pushGroupDefinitionsInternal(defs, caseSensitive, true, handler); Why not simply the following? return pushGroupDefinitionsInternal(defs, caseSensitive, !journal.isFullPushStarted(), handler); https://codereview.appspot.com/311210043/diff/40001/src/com/google/enterprise... src/com/google/enterprise/adaptor/DocIdSender.java:291: log.log(Level.WARNING, please put all params into newline (looks better than now with mixed indents) https://codereview.appspot.com/311210043/diff/40001/test/com/google/enterpris... File test/com/google/enterprise/adaptor/DocIdSenderTest.java (right): https://codereview.appspot.com/311210043/diff/40001/test/com/google/enterpris... test/com/google/enterprise/adaptor/DocIdSenderTest.java:53: private static final Map<GroupPrincipal, Collection<Principal>> SAMPLE_DATE did you mean SAMPLE_DATA here ? https://codereview.appspot.com/311210043/diff/40001/test/com/google/enterpris... test/com/google/enterprise/adaptor/DocIdSenderTest.java:56: NON_SPLITTED_EXPECTED_RESULT = nonsplittedExpectedResult(); camelCase https://codereview.appspot.com/311210043/diff/40001/test/com/google/enterpris... test/com/google/enterprise/adaptor/DocIdSenderTest.java:391: Map<GroupPrincipal, Collection<Principal>> groups = SAMPLE_DATE; why do we need local references here (and below + other test cases)? https://codereview.appspot.com/311210043/diff/40001/test/com/google/enterpris... test/com/google/enterprise/adaptor/DocIdSenderTest.java:403: assertEquals(Arrays.asList(new String[] { this should fit on 2 lines only (+ below + other test cases)
Sign in to reply to this message.
Thanks Szabi, applying changes following comments
Sign in to reply to this message.
https://codereview.appspot.com/311210043/diff/40001/src/com/google/enterprise... File src/com/google/enterprise/adaptor/DocIdSender.java (right): https://codereview.appspot.com/311210043/diff/40001/src/com/google/enterprise... src/com/google/enterprise/adaptor/DocIdSender.java:229: return pushGroupDefinitionsInternal(defs, caseSensitive, true, handler); On 2016/11/11 01:43:28, sfruhwald wrote: > Why not simply the following? > > return pushGroupDefinitionsInternal(defs, caseSensitive, > !journal.isFullPushStarted(), handler); Done. https://codereview.appspot.com/311210043/diff/40001/src/com/google/enterprise... src/com/google/enterprise/adaptor/DocIdSender.java:291: log.log(Level.WARNING, On 2016/11/11 01:43:28, sfruhwald wrote: > please put all params into newline (looks better than now with mixed indents) Done. https://codereview.appspot.com/311210043/diff/40001/test/com/google/enterpris... File test/com/google/enterprise/adaptor/DocIdSenderTest.java (right): https://codereview.appspot.com/311210043/diff/40001/test/com/google/enterpris... test/com/google/enterprise/adaptor/DocIdSenderTest.java:53: private static final Map<GroupPrincipal, Collection<Principal>> SAMPLE_DATE On 2016/11/11 01:43:29, sfruhwald wrote: > did you mean SAMPLE_DATA here ? Done. https://codereview.appspot.com/311210043/diff/40001/test/com/google/enterpris... test/com/google/enterprise/adaptor/DocIdSenderTest.java:56: NON_SPLITTED_EXPECTED_RESULT = nonsplittedExpectedResult(); On 2016/11/11 01:43:28, sfruhwald wrote: > camelCase Done. https://codereview.appspot.com/311210043/diff/40001/test/com/google/enterpris... test/com/google/enterprise/adaptor/DocIdSenderTest.java:391: Map<GroupPrincipal, Collection<Principal>> groups = SAMPLE_DATE; On 2016/11/11 01:43:29, sfruhwald wrote: > why do we need local references here (and below + other test cases)? Done. https://codereview.appspot.com/311210043/diff/40001/test/com/google/enterpris... test/com/google/enterprise/adaptor/DocIdSenderTest.java:403: assertEquals(Arrays.asList(new String[] { On 2016/11/11 01:43:28, sfruhwald wrote: > this should fit on 2 lines only (+ below + other test cases) Done.
Sign in to reply to this message.
https://codereview.appspot.com/311210043/diff/40001/test/com/google/enterpris... File test/com/google/enterprise/adaptor/DocIdSenderTest.java (right): https://codereview.appspot.com/311210043/diff/40001/test/com/google/enterpris... test/com/google/enterprise/adaptor/DocIdSenderTest.java:53: private static final Map<GroupPrincipal, Collection<Principal>> SAMPLE_DATE On 2016/11/11 18:55:33, imysak wrote: > On 2016/11/11 01:43:29, sfruhwald wrote: > > did you mean SAMPLE_DATA here ? > > Done. not done https://codereview.appspot.com/311210043/diff/40001/test/com/google/enterpris... test/com/google/enterprise/adaptor/DocIdSenderTest.java:56: NON_SPLITTED_EXPECTED_RESULT = nonsplittedExpectedResult(); On 2016/11/11 18:55:32, imysak wrote: > On 2016/11/11 01:43:28, sfruhwald wrote: > > camelCase > > Done. not done
Sign in to reply to this message.
Applying fixes
Sign in to reply to this message.
https://codereview.appspot.com/311210043/diff/80001/src/com/google/enterprise... File src/com/google/enterprise/adaptor/Journal.java (right): https://codereview.appspot.com/311210043/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/Journal.java:342: * @return <code>true</code> means that full push was started and still in process. line > 80 (as this project is still on 80) https://codereview.appspot.com/311210043/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/Journal.java:399: * @return <code>true</code> means that incremental push was started and still in process. line > 80 progress? https://codereview.appspot.com/311210043/diff/80001/test/com/google/enterpris... File test/com/google/enterprise/adaptor/DocIdSenderTest.java (right): https://codereview.appspot.com/311210043/diff/80001/test/com/google/enterpris... test/com/google/enterprise/adaptor/DocIdSenderTest.java:55: private static final List<List<Map.Entry<GroupPrincipal, Collection<Principal>>>> line > 80 (as this project is still on 80) https://codereview.appspot.com/311210043/diff/80001/test/com/google/enterpris... test/com/google/enterprise/adaptor/DocIdSenderTest.java:57: private static final List<List<Map.Entry<GroupPrincipal, Collection<Principal>>>> line > 80
Sign in to reply to this message.
Applying fixes
Sign in to reply to this message.
LGTM (fix that small typo pls) https://codereview.appspot.com/311210043/diff/100001/src/com/google/enterpris... File src/com/google/enterprise/adaptor/Journal.java (right): https://codereview.appspot.com/311210043/diff/100001/src/com/google/enterpris... src/com/google/enterprise/adaptor/Journal.java:399: * @return <code>true</code> if incremental push are still in progress. is :)
Sign in to reply to this message.
Applying fixes
Sign in to reply to this message.
Just few words to describe my CL. In Adapters we have two pusher strategies: Full(non-incremental) pusher and incremental pusher. In fact we have problem in both. This CL fixes only Full pusher strategy. Before CL: client needs clean(reset) indexes and then recrawle AD adapter to handle Groups deletions. After CL: client don't need to reset index, all deleted groups will be updated(deleted) in GSA after next FullPusher process occurred. Handling deletions for incremental group push require extra work and can be done in another CL if needed.
Sign in to reply to this message.
Thank you https://codereview.appspot.com/311210043/diff/120001/src/com/google/enterpris... File src/com/google/enterprise/adaptor/DocIdSender.java (right): https://codereview.appspot.com/311210043/diff/120001/src/com/google/enterpris... src/com/google/enterprise/adaptor/DocIdSender.java:226: defs, caseSensitive, !journal.isFullPushStarted(), handler); Can both full push and incremental push proceeed at the same time? https://codereview.appspot.com/311210043/diff/120001/src/com/google/enterpris... src/com/google/enterprise/adaptor/DocIdSender.java:226: defs, caseSensitive, !journal.isFullPushStarted(), handler); Althought this behaviour would work for AD adaptor, other adaptors might depend on always having incremental pushing. So I don't think we can force this behaviour. Maybe we can flag control it, something like adaptor.fullDocIdPushImpliesFullGroupsPush, with default of false. Still thinking about this proposed solution.
Sign in to reply to this message.
https://codereview.appspot.com/311210043/diff/120001/src/com/google/enterpris... File src/com/google/enterprise/adaptor/DocIdSender.java (right): https://codereview.appspot.com/311210043/diff/120001/src/com/google/enterpris... src/com/google/enterprise/adaptor/DocIdSender.java:226: defs, caseSensitive, !journal.isFullPushStarted(), handler); On 2016/12/05 23:34:27, pjo wrote: > Can both full push and incremental push proceeed at the same time? For AD adaptor - no, but looks like for others type adaptors - it could happens. I see what you mean..
Sign in to reply to this message.
I just make small debug to verify my bug or ensure in safety of my solution. We have synchronization point in DonIdSender.pushGroupDefinitionsInternal() > journal.recordGroupPushStarted(); So although both full push and incremental push can proceed at the same time, the part of it - groupPushDefinitions can't proceed in same time. I agree with your doubts about my solution, just I wanted to ensure that it is worked and safe. https://codereview.appspot.com/311210043/diff/120001/src/com/google/enterpris... File src/com/google/enterprise/adaptor/DocIdSender.java (left): https://codereview.appspot.com/311210043/diff/120001/src/com/google/enterpris... src/com/google/enterprise/adaptor/DocIdSender.java:274: journal.recordGroupPushStarted(); synchronization point
Sign in to reply to this message.
Just stopping by for a peek. Overall I like this, and basically the OpenText and Documentum connectors were more or less wishing it already worked this way. However, it completely breaks the SharePoint connector, which may call pushGroupDefinitions multiple times from getDocIds. If I'm understanding this change correctly, the successive calls would now overwrite each other. It looks like the connector is only doing that to handle maxFeedUrls, so maybe that support could just be removed from the SharePoint connector (it may predate the same in the library, or maybe was intended to save memory, which I don't think we care as much about as we once did). The SP connector also calls pushGroupDefinitions from getDocContent, which could cause problems, based on the concurrency issues. As a pedantic aside, it looks like multiple group pushes are not synchronized or serialized, simply that concurrent pushes are disallowed and will throw an IllegalStateException. The offending traversal will be retried from DocIdSender. This is not quite the case for crawl requests, which will fail to the GSA, which has a less robust retry mechanism. We could possibly truly serialize the group pushes, but that wouldn't work without further changes for calls to pushGroupDefinitions from getDocContent (due to timeout limits in the library and the GSA crawler). AsyncDocIdPusher does not support group definitions. John L https://codereview.appspot.com/311210043/diff/120001/src/com/google/enterpris... File src/com/google/enterprise/adaptor/DocIdSender.java (right): https://codereview.appspot.com/311210043/diff/120001/src/com/google/enterpris... src/com/google/enterprise/adaptor/DocIdSender.java:307: final int max = incremental ? config.getFeedMaxUrls() : defs.size(); Do we have any idea how scalable this really is? The use of getFeedMaxUrls() might have been overly precautious, but there is a real limit (just the 1 GB max feed file size, I think). There are cases where we would exceed that limit, in which case we would probably rather have used incremental mode (or maybe used a hack of sending an empty full feed followed by incremental feeds; this will interrupt serving for a while but produce the correct results eventually). https://codereview.appspot.com/311210043/diff/120001/src/com/google/enterpris... src/com/google/enterprise/adaptor/DocIdSender.java:330: incremental || !firstBatch); How would you trigger this with max set the way it is? In fact, why no leave max alone, feed as many as you can in full mode, and finish the rest in incremental, as this code is doing? https://codereview.appspot.com/311210043/diff/120001/src/com/google/enterpris... File src/com/google/enterprise/adaptor/GsaFeedFileSender.java (right): https://codereview.appspot.com/311210043/diff/120001/src/com/google/enterpris... src/com/google/enterprise/adaptor/GsaFeedFileSender.java:111: buildPostParameter(sb, "feedtype", "text/plain", Do pre-7.4 GSAs support a feedtype on group feeds? https://codereview.appspot.com/311210043/diff/120001/test/com/google/enterpri... File test/com/google/enterprise/adaptor/DocIdSenderTest.java (right): https://codereview.appspot.com/311210043/diff/120001/test/com/google/enterpri... test/com/google/enterprise/adaptor/DocIdSenderTest.java:57: NON_SPLITTED_EXPECTED_RESULT = nonSplittedExpectedResult(); Splitted is not a word. Maybe unsplit and split, respectively.
Sign in to reply to this message.
|