|
|
Created:
7 years, 10 months ago by lchandramouli Modified:
7 years, 8 months ago CC:
connector-cr_google.com Visibility:
Public. |
DescriptionSharePoint Adaptor to support multiple site collections
New Property in adaptor-config:
==============================
sharepoint.siteCollectionsToInclude : Provide list of comma separated site collections to be indexed.
and sharepoint.server : Provide virtual server URL.
For eg:
sharepoint.siteCollectionsToInclude = "http://spserver2016.gsa-connectors.com/sites/sitecoll1, http://spserver2016.gsa-connectors.com/sites/sitecoll2"
Patch Set 1 #
Total comments: 6
Patch Set 2 : Removed extra tab space #
Total comments: 13
Patch Set 3 : Updated review comments. #
Total comments: 22
Patch Set 4 : Added tests to retrieve doc content from excluded site collection and excluded root site collection… #
Total comments: 6
Patch Set 5 : When both sharepoint.siteCollectionsToInclude and sharepoint.siteCollectiononly properties are spec… #
Total comments: 8
Patch Set 6 : Added tests to verify siteCollectionsToInclude property in SharePointUrl object. #
Total comments: 12
Patch Set 7 : Adaptor will throw exception when sp.siteCollectionToInclude is specified and sp.siteCollectionOnly… #
Total comments: 5
Patch Set 8 : Added test when both sp.siteCollectionsToInclude and sp.siteCollectionOnly flag is specified. #
MessagesTotal messages: 29
On 2016/06/24 21:15:05, lchandramouli wrote: Can you please update code review description to explain how to use "sharepoint.siteCollections" property.
Sign in to reply to this message.
Nearly there - just a few last tweaks needed. https://codereview.appspot.com/303890044/diff/1/src/com/google/enterprise/ada... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/1/src/com/google/enterprise/ada... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:392: Please remove trailing spaces (here, line 514, 515, and throughout). It's considered "not worth it" for existing lines of code (because it changes the last-person-who-edited-this-line-of-code attribution), but it's definitely not too late (since this has not yet been submitted). https://codereview.appspot.com/303890044/diff/1/src/com/google/enterprise/ada... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:645: if (configuredSharePointUrl.isSiteCollectionUrl() suggestion: move lines 645-651 to just before lines 639-643. It doesn't make much sense to do the log (lines 640-642) if you're going to throw the InvalidConfigurationException. https://codereview.appspot.com/303890044/diff/1/src/com/google/enterprise/ada... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:2123: if (!isSiteCollectionIncluded(siteUrl)) { log.finer() message about skipping site (siteUrl) ? (Or, if you think there would be too many such log messages, just keep a counter of how many were skipped, and log it as a single message after the loop?)
Sign in to reply to this message.
https://codereview.appspot.com/303890044/diff/1/src/com/google/enterprise/ada... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/1/src/com/google/enterprise/ada... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:392: On 2016/06/24 22:55:17, myk wrote: > Please remove trailing spaces (here, line 514, 515, and throughout). > It's considered "not worth it" for existing lines of code (because it > changes the last-person-who-edited-this-line-of-code attribution), but > it's definitely not too late (since this has not yet been submitted). Done. https://codereview.appspot.com/303890044/diff/1/src/com/google/enterprise/ada... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:645: if (configuredSharePointUrl.isSiteCollectionUrl() On 2016/06/24 22:55:17, myk wrote: > suggestion: move lines 645-651 to just before lines 639-643. It doesn't make > much sense to do the log (lines 640-642) if you're going to throw the > InvalidConfigurationException. Done. https://codereview.appspot.com/303890044/diff/1/src/com/google/enterprise/ada... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:2123: if (!isSiteCollectionIncluded(siteUrl)) { On 2016/06/24 22:55:17, myk wrote: > log.finer() message about skipping site (siteUrl) ? > (Or, if you think there would be too many such log messages, just keep a counter > of how many were skipped, and log it as a single message after the loop?) Done.
Sign in to reply to this message.
https://codereview.appspot.com/303890044/diff/20001/src/com/google/enterprise... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/20001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:513: config.addKey("sharepoint.siteCollections", ""); This seems like an unfortunate conflict between siteCollectionOnly mode and siteCollections. So now I can set sharepoint.server to a site collection URL and optionally set siteCollectionOnly, or set sharepoint.server to the server URL and add the site collection URL to siteCollections? Should we deprecate the former, or expand it to simply allow multiple site collections in sharepoint.server? https://codereview.appspot.com/303890044/diff/20001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:591: log.log(Level.CONFIG, "List of SiteCollections To Index: {0}", s/SiteCollections To Index/Site Collections to index/ (As an aside, even if this were title case and "Index" were capitalized, "to" is a preposition and should not be.) https://codereview.appspot.com/303890044/diff/20001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1075: } Is this missing a continue? https://codereview.appspot.com/303890044/diff/20001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1656: + "belongs to included site collection.", docId.getUniqueId()); s/belongs/belong/ https://codereview.appspot.com/303890044/diff/20001/test/com/google/enterpris... File test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java (right): https://codereview.appspot.com/303890044/diff/20001/test/com/google/enterpris... test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java:1112: There should minimally be a test of an excluded site collection with getDocContent. I don't know how hard the setup is to test an excluded site collection with getModifiedDocIds, but if we don't do it here we just add to the manual regression test burden.
Sign in to reply to this message.
Followup to new patch set. John L https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1071: log.log(Level.FINE, "Excluding site collection URL {0}", siteString); This logging seems a little inconsistent. Here, you're logging each item individually, in getModifiedDocIdsSite you're not logging excluded sites at all, and farther down you're collecting a list to log once at the end. Should these be more similar or is this on purpose? https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1445: "Site Collection Url may not be null."); s/Url/URL/ https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:2109: Set<String> excluded = new HashSet<String>(); Would a more consistent order in the logs be useful, using a TreeSet or LinkedHashSet? https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:2132: if(excluded.size() > 0){ Add space after "if" and before "{". https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:2133: log.log(Level.FINE, "List of site collections excluded to index {0}", s/to index/from index:/
Sign in to reply to this message.
A few comments on John's comments. I also do agree with the spelling/grammar corrections, but I feel much more strongly about the ones acknowledged below. https://codereview.appspot.com/303890044/diff/20001/src/com/google/enterprise... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/20001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:513: config.addKey("sharepoint.siteCollections", ""); On 2016/06/27 23:55:11, JohnL wrote: > This seems like an unfortunate conflict between siteCollectionOnly mode and > siteCollections. So now I can set sharepoint.server to a site collection URL and > optionally set siteCollectionOnly, or set sharepoint.server to the server URL > and add the site collection URL to siteCollections? Should we deprecate the > former, or expand it to simply allow multiple site collections in > sharepoint.server? I'm ok with deprecrating the former (except that we may need the user to specify the server as well as the single collection) - that is, have it still work, but indicate that they should specify the single collection in the "sharepoint.siteCollections" field, specify the server address in "sharepoint.server", and eliminate the use of the "sharepoint.siteCollectionOnly" boolean. https://codereview.appspot.com/303890044/diff/20001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:591: log.log(Level.CONFIG, "List of SiteCollections To Index: {0}", On 2016/06/27 23:55:11, JohnL wrote: > s/SiteCollections To Index/Site Collections to index/ > > (As an aside, even if this were title case and "Index" were capitalized, "to" is > a preposition and should not be.) Acknowledged. https://codereview.appspot.com/303890044/diff/20001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1075: } On 2016/06/27 23:55:11, JohnL wrote: > Is this missing a continue? I think it is. Lavanya? https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1071: log.log(Level.FINE, "Excluding site collection URL {0}", siteString); On 2016/06/28 00:49:47, JohnL wrote: > This logging seems a little inconsistent. Here, you're logging each item > individually, in getModifiedDocIdsSite you're not logging excluded sites at all, > and farther down you're collecting a list to log once at the end. Should these > be more similar or is this on purpose? I definitely agree with the consistency request. The only reason I see for collecting a list / outputting at end is that you could decide on a threshhold (say 10 sites) - if you are only excluding up to that threshhold, output each. Otherwise, just output up to that threshhold, and indicate how many additional URLs are not being logged. https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:2109: Set<String> excluded = new HashSet<String>(); On 2016/06/28 00:49:47, JohnL wrote: > Would a more consistent order in the logs be useful, using a TreeSet or > LinkedHashSet? TreeSet would be my choice - the URLs get sorted into alphabetical order. https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:2132: if(excluded.size() > 0){ On 2016/06/28 00:49:47, JohnL wrote: > Add space after "if" and before "{". Acknowledged.
Sign in to reply to this message.
Thank you. https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:510: // Set this to true when you want to index only single site collection add "and ignore web application's access controls" https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:513: config.addKey("sharepoint.siteCollections", ""); Maybe we should add a note along the lines of: """ If siteCollections is provided then the sharepoint.server variable has to be set to a webapplication (not site collection) which we have access to. If siteColletionOnly is set to true then sharepoint.server variable has to be set to a sitecollection (not a webapplication). siteCollectionOnly being true means we skip webapplication access. Only one of siteCollections and siteCollectionOnly can be used at same time. """ https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:640: + "sharepoint.siteCollections is specified."); Maybe this config should provide more text explanation?
Sign in to reply to this message.
Thank you. https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:513: config.addKey("sharepoint.siteCollections", ""); Would it make more sense to make this variable be called "siteCollectionNameFilter"? or "siteCollectionsToIncludeRegexp"? It could even then be -- an unnecessary -- check on the name in siteCollectionOnly mode?
Sign in to reply to this message.
https://codereview.appspot.com/303890044/diff/20001/src/com/google/enterprise... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/20001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:513: config.addKey("sharepoint.siteCollections", ""); On 2016/06/27 23:55:11, JohnL wrote: > This seems like an unfortunate conflict between siteCollectionOnly mode and > siteCollections. So now I can set sharepoint.server to a site collection URL and > optionally set siteCollectionOnly, or set sharepoint.server to the server URL > and add the site collection URL to siteCollections? Should we deprecate the > former, or expand it to simply allow multiple site collections in > sharepoint.server? With sharepoint.siteCollectionsToInclude property we can configure adaptor to index single site collection similar to what sharepoint.siteCollectionOnly provides. One difference is anonymous setting and web app policy is applicable with "sharepoint.siteCollectionsToInclude" But sharepoint.siteCollectionOnly is a dual purpose feature. It allows adaptor to index site collections without requiring permissions at web application policy level. Note that "sharepoint.siteCollectionsToInclude" rely on access to virtual server object (requires web app policy permission). For now we need both. Note : sharepoint.server points to virtual server URL (web application path) when "sharepoint.siteCollectionsToInclude" is used. https://codereview.appspot.com/303890044/diff/20001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:591: log.log(Level.CONFIG, "List of SiteCollections To Index: {0}", On 2016/06/27 23:55:11, JohnL wrote: > s/SiteCollections To Index/Site Collections to index/ > > (As an aside, even if this were title case and "Index" were capitalized, "to" is > a preposition and should not be.) Done. https://codereview.appspot.com/303890044/diff/20001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1075: } On 2016/06/27 23:55:11, JohnL wrote: > Is this missing a continue? Done. Thank you. https://codereview.appspot.com/303890044/diff/20001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1656: + "belongs to included site collection.", docId.getUniqueId()); On 2016/06/27 23:55:11, JohnL wrote: > s/belongs/belong/ Done. https://codereview.appspot.com/303890044/diff/20001/test/com/google/enterpris... File test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java (right): https://codereview.appspot.com/303890044/diff/20001/test/com/google/enterpris... test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java:1112: On 2016/06/27 23:55:11, JohnL wrote: > There should minimally be a test of an excluded site collection with > getDocContent. I don't know how hard the setup is to test an excluded site > collection with getModifiedDocIds, but if we don't do it here we just add to the > manual regression test burden. Done. https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:513: config.addKey("sharepoint.siteCollections", ""); On 2016/06/29 00:55:07, pjo wrote: > Maybe we should add a note along the lines of: > > > """ > If siteCollections is provided then the sharepoint.server variable has to be set > to a webapplication (not site collection) which we have access to. > > If siteColletionOnly is set to true then sharepoint.server variable has to be > set to a sitecollection (not a webapplication). siteCollectionOnly being true > means we skip webapplication access. > > Only one of siteCollections and siteCollectionOnly can be used at same time. > """ Acknowledged. https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:513: config.addKey("sharepoint.siteCollections", ""); On 2016/06/29 00:59:36, pjo wrote: > Would it make more sense to make this variable be called > "siteCollectionNameFilter"? or "siteCollectionsToIncludeRegexp"? > > It could even then be -- an unnecessary -- check on the name in > siteCollectionOnly mode? Done. https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:640: + "sharepoint.siteCollections is specified."); On 2016/06/29 00:55:07, pjo wrote: > Maybe this config should provide more text explanation? Done. https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1071: log.log(Level.FINE, "Excluding site collection URL {0}", siteString); On 2016/06/28 00:49:47, JohnL wrote: > This logging seems a little inconsistent. Here, you're logging each item > individually, in getModifiedDocIdsSite you're not logging excluded sites at all, > and farther down you're collecting a list to log once at the end. Should these > be more similar or is this on purpose? Done. https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1071: log.log(Level.FINE, "Excluding site collection URL {0}", siteString); On 2016/06/28 20:14:42, myk wrote: > On 2016/06/28 00:49:47, JohnL wrote: > > This logging seems a little inconsistent. Here, you're logging each item > > individually, in getModifiedDocIdsSite you're not logging excluded sites at > all, > > and farther down you're collecting a list to log once at the end. Should these > > be more similar or is this on purpose? > > I definitely agree with the consistency request. The only reason I see for > collecting a list / outputting at end is that you could decide on a threshhold > (say 10 sites) - if you are only excluding up to that threshhold, output each. > Otherwise, just output up to that threshhold, and indicate how many additional > URLs are not being logged. Done. https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1445: "Site Collection Url may not be null."); On 2016/06/28 00:49:47, JohnL wrote: > s/Url/URL/ Done. https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:2109: Set<String> excluded = new HashSet<String>(); On 2016/06/28 00:49:47, JohnL wrote: > Would a more consistent order in the logs be useful, using a TreeSet or > LinkedHashSet? Done. https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:2109: Set<String> excluded = new HashSet<String>(); On 2016/06/28 20:14:42, myk wrote: > On 2016/06/28 00:49:47, JohnL wrote: > > Would a more consistent order in the logs be useful, using a TreeSet or > > LinkedHashSet? > > TreeSet would be my choice - the URLs get sorted into alphabetical order. Done. https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:2132: if(excluded.size() > 0){ On 2016/06/28 00:49:47, JohnL wrote: > Add space after "if" and before "{". Done. https://codereview.appspot.com/303890044/diff/80001/src/com/google/enterprise... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:2133: log.log(Level.FINE, "List of site collections excluded to index {0}", On 2016/06/28 00:49:47, JohnL wrote: > s/to index/from index:/ Done.
Sign in to reply to this message.
Looking better - still hoping to avoid the different configuration values necessarily being incompatible. See ideas at first comment below. https://codereview.appspot.com/303890044/diff/140001/src/com/google/enterpris... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/140001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:649: + "sharepoint.siteCollectionOnly is set to false."); If site collection only mode is set, then (rather than also specifying siteCollectionsToIndex being an automatic InvalidConfigurationException), how about: (1) If siteCollectionsToIndex consists of only one entry (matching the sharepoint.server), then quietly index that one site collection. (2) If siteCollectionsToIndex contains multiple items, one of which matches the sharepoint.server entry, then indicate the others are excluded (because they don't match the sharepoint.server value). (3) If siteCollectionsToIndex contains multiple items, none of which matches the sharepoint.server entry, you could either log a message and ignore the siteCollectionsToIndex value (and just index the one sharepoint.server siteCollection), or (better, in my opinion) only *then* throw the InvalidConfigurationException, pointing out that the one siteCollection specified (in sharepoint.server) is not included in the list of siteCollectionsToIndex. https://codereview.appspot.com/303890044/diff/140001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:2150: } this "}" should line up with the "if" on line 2147 / the "writer" on line 2151. https://codereview.appspot.com/303890044/diff/140001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:3374: log.exiting("SiteAdaptor", "getAdaptorForUrl", null); Make sure to have at least one different value here (so that it's not a duplicate of line 3369). Since the line number isn't included in the "log.existing()" output, there is no way to figure out which of the two lines was called.
Sign in to reply to this message.
Thank you. Please review. https://codereview.appspot.com/303890044/diff/140001/src/com/google/enterpris... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/140001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:649: + "sharepoint.siteCollectionOnly is set to false."); On 2016/06/29 23:25:33, myk wrote: > If site collection only mode is set, then (rather than also specifying > siteCollectionsToIndex being an automatic InvalidConfigurationException), how > about: > > (1) If siteCollectionsToIndex consists of only one entry (matching the > sharepoint.server), then quietly index that one site collection. > (2) If siteCollectionsToIndex contains multiple items, one of which matches the > sharepoint.server entry, then indicate the others are excluded (because they > don't match the sharepoint.server value). > (3) If siteCollectionsToIndex contains multiple items, none of which matches the > sharepoint.server entry, you could either log a message and ignore the > siteCollectionsToIndex value (and just index the one sharepoint.server > siteCollection), or (better, in my opinion) only *then* throw the > InvalidConfigurationException, pointing out that the one siteCollection > specified (in sharepoint.server) is not included in the list of > siteCollectionsToIndex. As discussed offline, when both sharepoint.siteCollectionsToInclude and sharepoint.siteCollectionOnly properties are specified, adaptor will ignore siteCollectionOnly flag (logs a warning message) and continue to index site collections specified in siteCollectionsToInclude property. https://codereview.appspot.com/303890044/diff/140001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:2150: } On 2016/06/29 23:25:33, myk wrote: > this "}" should line up with the "if" on line 2147 / the "writer" on line 2151. Acknowledged. https://codereview.appspot.com/303890044/diff/140001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:3374: log.exiting("SiteAdaptor", "getAdaptorForUrl", null); On 2016/06/29 23:25:33, myk wrote: > Make sure to have at least one different value here (so that it's not a > duplicate of line 3369). Since the line number isn't included in the > "log.existing()" output, there is no way to figure out which of the two lines > was called. Done.
Sign in to reply to this message.
https://codereview.appspot.com/303890044/diff/180001/test/com/google/enterpri... File test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java (right): https://codereview.appspot.com/303890044/diff/180001/test/com/google/enterpri... test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java:3767: public void testSharePointUrlIsSiteCollectionUrl() { please add tests to verify changes in SharePointUrl object. e.g. specify siteCollectionstoinclude with siteCollectionOnly flag as true. Also verify new InvalidConfigurationException
Sign in to reply to this message.
Only two more nits from me: the log.exiting() lines that are still identical (we spoke about this offile), and the ImmutableSet (which I describe in the comment on SharePointAdaptor.java:1777). Tanmay is asking for one additional unit test (which I agree with). I've pointed out where in the code is the exception you should test for. Thank you! -- Marc https://codereview.appspot.com/303890044/diff/180001/src/com/google/enterpris... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/180001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1777: siteCollectionsToInclude = new TreeSet<String>(); Shouldn't it be siteCollectionsToInclude that's an ImmutableSet, than siteCollections (which you declare in line 1774, and iterate through in line 1778, and then never use again)? https://codereview.appspot.com/303890044/diff/180001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1789: throw new InvalidConfigurationException("sharepoint.server value " This is the new Exception that Tanmay is asking for a unit test for. https://codereview.appspot.com/303890044/diff/180001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1799: log.log(Level.WARNING, "Overriding sharepoint.siteCollectionOnly " Much better than throwing an exception here - thank you!
Sign in to reply to this message.
https://codereview.appspot.com/303890044/diff/180001/src/com/google/enterpris... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/180001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1777: siteCollectionsToInclude = new TreeSet<String>(); On 2016/07/01 21:26:53, myk wrote: > Shouldn't it be siteCollectionsToInclude that's an ImmutableSet, than > siteCollections (which you declare in line 1774, and iterate through in line > 1778, and then never use again)? Done. https://codereview.appspot.com/303890044/diff/180001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1789: throw new InvalidConfigurationException("sharepoint.server value " On 2016/07/01 21:26:53, myk wrote: > This is the new Exception that Tanmay is asking for a unit test for. Done. https://codereview.appspot.com/303890044/diff/180001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1799: log.log(Level.WARNING, "Overriding sharepoint.siteCollectionOnly " On 2016/07/01 21:26:53, myk wrote: > Much better than throwing an exception here - thank you! Acknowledged. https://codereview.appspot.com/303890044/diff/180001/test/com/google/enterpri... File test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java (right): https://codereview.appspot.com/303890044/diff/180001/test/com/google/enterpri... test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java:3767: public void testSharePointUrlIsSiteCollectionUrl() { On 2016/07/01 21:13:31, Tanmay Vartak wrote: > please add tests to verify changes in SharePointUrl object. > e.g. specify siteCollectionstoinclude with siteCollectionOnly flag as true. > Also verify new InvalidConfigurationException Done.
Sign in to reply to this message.
LGTM, except for one line with newly-added trailing whitespace. Tanmay also LGTM'd it (out loud), and PJ said he's fine with it if John Lacey is. https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1776: .split(siteCollectionsToIncludeConfig.toLowerCase(Locale.ENGLISH)); please remove trailing whitespace (which also makes the line >80 characters)
Sign in to reply to this message.
Partial review, I haven't looked at the tests yet. Mostly nits, but I'd like to change the outcome when the flags conflict. John L https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1778: for(String url : siteCollections) { Add space after "for" https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1783: log.log(Level.CONFIG, "List of SiteCollections to index: {0}", s/SiteCollections/Site Collection/ (or lowercased, too, as elsewhere). https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1788: if (siteCollectionsToInclude.size() > 0 Use !isEmpty() or size() > 0 consistently (just above, here, and below). https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1800: log.log(Level.WARNING, "Overriding sharepoint.siteCollectionOnly " I thought that PJ wanted this to be OK if size() == 1, and I strongly agree with that. siteCollectionOnly is not deprecated or replaced by this new feature; it is required in some cloud environments. I think we should change this to > 1 and say something like "siteCollectionOnly mode is not supported with multiple siteCollectionsToInclude URLs". I think it would be a reasonable FR to add support for multiple site collections URLs in the future (or cloud deployments could be a real pain for larger customers).
Sign in to reply to this message.
(John, please clarify your comment about line 1800.) Lavanya, please do implement these last suggestions from John. Then this be submitted. https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1800: log.log(Level.WARNING, "Overriding sharepoint.siteCollectionOnly " On 2016/07/02 00:46:00, JohnL wrote: > I thought that PJ wanted this to be OK if size() == 1, and I strongly agree with > that. siteCollectionOnly is not deprecated or replaced by this new feature; it > is required in some cloud environments. I think we should change this to > 1 and > say something like "siteCollectionOnly mode is not supported with multiple > siteCollectionsToInclude URLs". I think it would be a reasonable FR to add > support for multiple site collections URLs in the future (or cloud deployments > could be a real pain for larger customers). The first part of this (up to, and including, changing this to > 1 (but if it's equal to 1, we need to make sure that the single siteCollectionsToInclude matches sharepoint.server)). As far as your last sentence - do you mean, in the future, a reasonable FR would be to add support for multiple site collection URLs *with* sharepoint.siteCollectionOnly set to true? (Because otherwise, this is the change that's implementing your suggested future FR.)
Sign in to reply to this message.
On 2016/07/02 01:14:38, myk wrote: > (John, please clarify your comment about line 1800.) Yes, I meant a future FR to make siteCollectionOnly = true work with multiple siteCollectionsToInclude URLs. > Lavanya, please do implement these last suggestions from John. Then this be > submitted. I thought about it a bit, and it's more work than just size() > 1 is required to make sharepoint.server = siteCollectionURL sharepoint.siteCollectionOnly = true the same as sharepoint.server = serverURL sharepoint.siteCollectionOnly = true sharepoint.siteCollectionsToInclude = siteCollectionURL We could do that in a followup CL if we're all agreed they should be the same. John L
Sign in to reply to this message.
On 2016/07/02 01:57:41, JohnL wrote: > On 2016/07/02 01:14:38, myk wrote: > > (John, please clarify your comment about line 1800.) > > Yes, I meant a future FR to make siteCollectionOnly = true work with multiple > siteCollectionsToInclude URLs. > > > Lavanya, please do implement these last suggestions from John. Then this be > > submitted. > > I thought about it a bit, and it's more work than just size() > 1 is required to > make > > sharepoint.server = siteCollectionURL > sharepoint.siteCollectionOnly = true > > the same as > > sharepoint.server = serverURL > sharepoint.siteCollectionOnly = true > sharepoint.siteCollectionsToInclude = siteCollectionURL > > We could do that in a followup CL if we're all agreed they should be the same. > > John L My understanding is that: sharepoint.server = serverURL sharepoint.siteCollectionOnly = true sharepoint.siteCollectionsToInclude = siteCollectionURL and sharepoint.server = siteCollectionURL sharepoint.siteCollectionOnly = true both will call siteCollectionURL (as a site collection), without needing to use any "full suite"-crawling power. So -- yes, I think they should be treated the same.
Sign in to reply to this message.
https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1800: log.log(Level.WARNING, "Overriding sharepoint.siteCollectionOnly " On 2016/07/02 00:46:00, JohnL wrote: > I thought that PJ wanted this to be OK if size() == 1, and I strongly agree with > that. siteCollectionOnly is not deprecated or replaced by this new feature; it > is required in some cloud environments. I think we should change this to > 1 and > say something like "siteCollectionOnly mode is not supported with multiple > siteCollectionsToInclude URLs". I think it would be a reasonable FR to add > support for multiple site collections URLs in the future (or cloud deployments > could be a real pain for larger customers). with current implementation, we still have support for indexing single site collection with less than web app policy permissions. Just don't specify siteCollectionsToInclude when you don't have enough permissions. we can document this explicitly. In future when we want to support multiple site collections (with less than web app policy permissions e.g hosted sharepoint env), we can turn off overriding of siteCollectionOnly flag. Note this requires significant code change as each site collection needs to maintain its own change token for incremental updates.
Sign in to reply to this message.
On 2016/07/04 23:25:15, myk wrote: > On 2016/07/02 01:57:41, JohnL wrote: > > On 2016/07/02 01:14:38, myk wrote: > > > (John, please clarify your comment about line 1800.) > > > > Yes, I meant a future FR to make siteCollectionOnly = true work with multiple > > siteCollectionsToInclude URLs. > > > > > Lavanya, please do implement these last suggestions from John. Then this be > > > submitted. > > > > I thought about it a bit, and it's more work than just size() > 1 is required > to > > make > > > > sharepoint.server = siteCollectionURL > > sharepoint.siteCollectionOnly = true > > > > the same as > > > > sharepoint.server = serverURL > > sharepoint.siteCollectionOnly = true > > sharepoint.siteCollectionsToInclude = siteCollectionURL > > > > We could do that in a followup CL if we're all agreed they should be the same. > > > > John L > > My understanding is that: > > sharepoint.server = serverURL > sharepoint.siteCollectionOnly = true > sharepoint.siteCollectionsToInclude = siteCollectionURL > > and > > sharepoint.server = siteCollectionURL > sharepoint.siteCollectionOnly = true > > both will call siteCollectionURL (as a site collection), without needing to use > any "full suite"-crawling power. > So -- yes, I think they should be treated the same. Right now, the first config is treated as sharepoint.server = serverURL sharepoint.siteCollectionOnly = false sharepoint.siteCollectionsToInclude = siteCollectionURL with a warning, and it requires full access. > with current implementation, we still have support for indexing single site > collection with less than web app policy permissions. Just don't specify > siteCollectionsToInclude when you don't have enough permissions. we can document > this explicitly. I know we're not losing anything by having them conflict like this, and that generalizing the new config for siteCollectionsToInclude with multiple entries is non-trivial, and deferred as a possible FR. It's just that it doesn't make any sense to me for them to conflict unnecessarily. If we had added siteCollectionsToInclude first, I don't think we would have allowed the second config at all, and it even seems reasonable to me to deprecate it. It's just an issue of config semantics, not the implementation. John L
Sign in to reply to this message.
https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... File src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java (right): https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1776: .split(siteCollectionsToIncludeConfig.toLowerCase(Locale.ENGLISH)); On 2016/07/01 23:56:52, myk wrote: > please remove trailing whitespace (which also makes the line >80 characters) Done. https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1778: for(String url : siteCollections) { On 2016/07/02 00:46:00, JohnL wrote: > Add space after "for" Done. https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1783: log.log(Level.CONFIG, "List of SiteCollections to index: {0}", On 2016/07/02 00:46:00, JohnL wrote: > s/SiteCollections/Site Collection/ (or lowercased, too, as elsewhere). Done. https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1788: if (siteCollectionsToInclude.size() > 0 On 2016/07/02 00:46:00, JohnL wrote: > Use !isEmpty() or size() > 0 consistently (just above, here, and below). Done. https://codereview.appspot.com/303890044/diff/240001/src/com/google/enterpris... src/com/google/enterprise/adaptor/sharepoint/SharePointAdaptor.java:1800: log.log(Level.WARNING, "Overriding sharepoint.siteCollectionOnly " On 2016/07/02 01:14:38, myk wrote: > On 2016/07/02 00:46:00, JohnL wrote: > > I thought that PJ wanted this to be OK if size() == 1, and I strongly agree > with > > that. siteCollectionOnly is not deprecated or replaced by this new feature; it > > is required in some cloud environments. I think we should change this to > 1 > and > > say something like "siteCollectionOnly mode is not supported with multiple > > siteCollectionsToInclude URLs". I think it would be a reasonable FR to add > > support for multiple site collections URLs in the future (or cloud deployments > > could be a real pain for larger customers). > > The first part of this (up to, and including, changing this to > 1 (but if it's > equal to 1, we need to make sure that the single siteCollectionsToInclude > matches sharepoint.server)). As far as your last sentence - do you mean, in the > future, a reasonable FR would be to add support for multiple site collection > URLs *with* sharepoint.siteCollectionOnly set to true? (Because otherwise, > this is the change that's implementing your suggested future FR.) As discussed with John L and Marc, added TODO: To support site collection only mode for multiple site collections with sharepoint.siteCollectionstoInclude filter.
Sign in to reply to this message.
LGTM (but please wait for John to say the same).
Sign in to reply to this message.
"Missed it by THAT MUCH!" John L https://codereview.appspot.com/303890044/diff/260001/test/com/google/enterpri... File test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java (right): https://codereview.appspot.com/303890044/diff/260001/test/com/google/enterpri... test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java:3790: thrown.expect(InvalidConfigurationException.class); This doesn't work. The line above has thrown an exception, and this block will never be executed. You need to do at least one of these in a separate test method. I would move each of these new failures into separate test methods.
Sign in to reply to this message.
https://codereview.appspot.com/303890044/diff/260001/test/com/google/enterpri... File test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java (right): https://codereview.appspot.com/303890044/diff/260001/test/com/google/enterpri... test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java:3790: thrown.expect(InvalidConfigurationException.class); On 2016/07/07 00:33:42, JohnL wrote: > This doesn't work. The line above has thrown an exception, and this block will > never be executed. You need to do at least one of these in a separate test > method. I would move each of these new failures into separate test methods. Thank You. For Invalid web server URL, we already have a test: testGetDocContentMultipleSCInvalidServerURL. Added new test for MultipleSC with SCOnly flag.
Sign in to reply to this message.
new test looks good (but again please wait for John's LGTM before submitting). https://codereview.appspot.com/303890044/diff/260001/test/com/google/enterpri... File test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java (right): https://codereview.appspot.com/303890044/diff/260001/test/com/google/enterpri... test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java:3790: thrown.expect(InvalidConfigurationException.class); On 2016/07/07 00:33:42, JohnL wrote: > This doesn't work. The line above has thrown an exception, and this block will > never be executed. You need to do at least one of these in a separate test > method. I would move each of these new failures into separate test methods. oops - I missed this in the test. https://codereview.appspot.com/303890044/diff/260001/test/com/google/enterpri... test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java:3790: thrown.expect(InvalidConfigurationException.class); On 2016/07/07 17:30:01, lchandramouli wrote: > On 2016/07/07 00:33:42, JohnL wrote: > > This doesn't work. The line above has thrown an exception, and this block will > > never be executed. You need to do at least one of these in a separate test > > method. I would move each of these new failures into separate test methods. > > Thank You. > > For Invalid web server URL, we already have a test: > testGetDocContentMultipleSCInvalidServerURL. > > Added new test for MultipleSC with SCOnly flag. new test LGTM.
Sign in to reply to this message.
LGTM John L https://codereview.appspot.com/303890044/diff/260001/test/com/google/enterpri... File test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java (right): https://codereview.appspot.com/303890044/diff/260001/test/com/google/enterpri... test/com/google/enterprise/adaptor/sharepoint/SharePointAdaptorTest.java:3790: thrown.expect(InvalidConfigurationException.class); On 2016/07/07 17:44:55, myk wrote: > On 2016/07/07 00:33:42, JohnL wrote: > > This doesn't work. The line above has thrown an exception, and this block will > > never be executed. You need to do at least one of these in a separate test > > method. I would move each of these new failures into separate test methods. > > oops - I missed this in the test. As an aside, this is kind of cute, in a mildly dangerous way. If either of these two blocks is broken and fails to throw an exception, then the test will still pass. ;-)
Sign in to reply to this message.
|