|
|
Created:
10 years, 3 months ago by Brett Modified:
10 years, 3 months ago CC:
connector-cr_google.com Visibility:
Public. |
DescriptionThis change avoids crawling hidden files or directories.
When crawling a directory, hidden files and subdirectories
are not included in the returned content listing. This
change also verifies that the requested file/folder does
not reside within a hidded ancester directory.
One exception is made. Windows Administrative Shares are
hidden by default, so we allow the root start point to be
hidden, but not its descendents. This is consistent with
the behaviour of the filesystem connector.
This change includes another minor enhancement. It verifies
that the traversal user has read access to a file or folder
before returning it.
Patch Set 1 #
Total comments: 8
Patch Set 2 : Drop check for isReadable. #
Total comments: 2
Patch Set 3 : Drop check for administrative shares. Better logging for hidden items. #
Total comments: 3
Patch Set 4 : OK. Feed hidden directories and files, but return not found on crawl request. #
MessagesTotal messages: 21
I'm not sure I like the isReadable check. It seems like it actually makes it harder to diagnose lack of access. https://codereview.appspot.com/39390046/diff/1/src/com/google/enterprise/adap... File src/com/google/enterprise/adaptor/fs/FsAdaptor.java (right): https://codereview.appspot.com/39390046/diff/1/src/com/google/enterprise/adap... src/com/google/enterprise/adaptor/fs/FsAdaptor.java:296: log.log(Level.WARNING, "The path {0} is not a supported file type.", This warning is misleading if you are checking for read access. https://codereview.appspot.com/39390046/diff/1/src/com/google/enterprise/adap... src/com/google/enterprise/adaptor/fs/FsAdaptor.java:437: return !rootPath.equals(p) && delegate.isHidden(p); isHidden being different for the root was very unexpected. It seems isSupportedPath is the only place that relies on that behavior. Could the rootPath check be done there?
Sign in to reply to this message.
I'm not sure I like the isReadable check. It seems like it actually makes it harder to diagnose lack of access.
Sign in to reply to this message.
https://codereview.appspot.com/39390046/diff/1/src/com/google/enterprise/adap... File src/com/google/enterprise/adaptor/fs/FsAdaptor.java (right): https://codereview.appspot.com/39390046/diff/1/src/com/google/enterprise/adap... src/com/google/enterprise/adaptor/fs/FsAdaptor.java:296: log.log(Level.WARNING, "The path {0} is not a supported file type.", On 2013/12/12 17:20:42, ejona wrote: > This warning is misleading if you are checking for read access. I think the best thing to do is push log comments down to the individual checks, like the File System Connector's isQualifiedFile() does. https://codereview.appspot.com/39390046/diff/1/src/com/google/enterprise/adap... src/com/google/enterprise/adaptor/fs/FsAdaptor.java:428: && delegate.isReadable(p) && !isHidden(p); On 2013/12/12 17:20:42, ejona wrote: > I'm not sure I like the isReadable check. It seems like it actually makes it > harder to diagnose lack of access. b/11579314 shows it wasn't any easier to diagnose lack of access when exceptions were thrown. I will rework these checks to have better logging as to the specific failure. https://codereview.appspot.com/39390046/diff/1/src/com/google/enterprise/adap... src/com/google/enterprise/adaptor/fs/FsAdaptor.java:437: return !rootPath.equals(p) && delegate.isHidden(p); On 2013/12/12 17:20:42, ejona wrote: > isHidden being different for the root was very unexpected. It seems > isSupportedPath is the only place that relies on that behavior. Could the > rootPath check be done there? Yes, I could move it there. I will duplicate the administrative share comment there and below tho.
Sign in to reply to this message.
https://codereview.appspot.com/39390046/diff/1/src/com/google/enterprise/adap... File src/com/google/enterprise/adaptor/fs/FsAdaptor.java (right): https://codereview.appspot.com/39390046/diff/1/src/com/google/enterprise/adap... src/com/google/enterprise/adaptor/fs/FsAdaptor.java:428: && delegate.isReadable(p) && !isHidden(p); On 2013/12/12 17:37:12, Brett wrote: > On 2013/12/12 17:20:42, ejona wrote: > > I'm not sure I like the isReadable check. It seems like it actually makes it > > harder to diagnose lack of access. > > b/11579314 shows it wasn't any easier to diagnose lack of access when exceptions > were thrown. I will rework these checks to have better logging as to the > specific failure. I don't buy that. It has reasonably clearly "Caused by: java.nio.file.AccessDeniedException: ...\access\file4.ppt" After this change you won't even notice the file is missing. Even if we add logging that we skipped it, that doesn't mean you see the warning before the log rolls over. As it stands now, you can see something is wrong with the document in Index Diagnostics.
Sign in to reply to this message.
The File System Connector customers did not like exceptions being thrown when the traversal user did not have access. They knew there were files and folders that the connector was not allowed to index. They just wanted that lack of access logged and for the connector to proceed. They considered logged exceptions as problems that needed addressing - typically problems with the connector itself of with the computing/network infrastructure. -- Brett M. Johnson On Dec 12, 2013, at 9:47 AM, ejona@google.com wrote: > > https://codereview.appspot.com/39390046/diff/1/src/com/google/enterprise/adap... > File src/com/google/enterprise/adaptor/fs/FsAdaptor.java (right): > > https://codereview.appspot.com/39390046/diff/1/src/com/google/enterprise/adap... > src/com/google/enterprise/adaptor/fs/FsAdaptor.java:428: && > delegate.isReadable(p) && !isHidden(p); > On 2013/12/12 17:37:12, Brett wrote: >> On 2013/12/12 17:20:42, ejona wrote: >> > I'm not sure I like the isReadable check. It seems like it actually > makes it >> > harder to diagnose lack of access. > >> b/11579314 shows it wasn't any easier to diagnose lack of access when > exceptions >> were thrown. I will rework these checks to have better logging as to > the >> specific failure. > > I don't buy that. It has reasonably clearly "Caused by: > java.nio.file.AccessDeniedException: ...\access\file4.ppt" > > After this change you won't even notice the file is missing. Even if we > add logging that we skipped it, that doesn't mean you see the warning > before the log rolls over. As it stands now, you can see something is > wrong with the document in Index Diagnostics. > > https://codereview.appspot.com/39390046/
Sign in to reply to this message.
I agree with Eric the isReadable check is not needed. I would prefer an exception in the log and an entry in the Index Diagnostic that shows a read failure. I would also expect that some customers are going to care about hidden files and others will not so we should consider adding a config parameter for this. https://codereview.appspot.com/39390046/diff/1/src/com/google/enterprise/adap... File src/com/google/enterprise/adaptor/fs/FileDelegate.java (right): https://codereview.appspot.com/39390046/diff/1/src/com/google/enterprise/adap... src/com/google/enterprise/adaptor/fs/FileDelegate.java:60: boolean isReadable(Path doc) throws IOException; isReadable is not really needed. We should just throw an access denied exception on a read error. I'm not sure we could possible return an appropriate error code the GSA on a read failure during getDocContent. https://codereview.appspot.com/39390046/diff/1/src/com/google/enterprise/adap... File src/com/google/enterprise/adaptor/fs/FsAdaptor.java (right): https://codereview.appspot.com/39390046/diff/1/src/com/google/enterprise/adap... src/com/google/enterprise/adaptor/fs/FsAdaptor.java:426: private boolean isSupportedPath(Path p) throws IOException { isSupportedPath is just suppose to check if the file type is of a supported type like a regular file or a folder. Remove the isReadable and isHidden from here.
Sign in to reply to this message.
Drop check for isReadable.
Sign in to reply to this message.
LGTM
Sign in to reply to this message.
https://codereview.appspot.com/39390046/diff/20001/src/com/google/enterprise/... File src/com/google/enterprise/adaptor/fs/FsAdaptor.java (right): https://codereview.appspot.com/39390046/diff/20001/src/com/google/enterprise/... src/com/google/enterprise/adaptor/fs/FsAdaptor.java:286: if (!isVisibleDescendantOfRoot(doc)) { I think we should make a distinction here on where the doc is not a descendant or is hidden. Also, I would suggest changing the wording of the log message "not visible" does not have a specific meaning the Windows world. While "hidden" does and points admins to the hidden attribute of a file/folder. https://codereview.appspot.com/39390046/diff/20001/src/com/google/enterprise/... src/com/google/enterprise/adaptor/fs/FsAdaptor.java:431: && (rootPath.equals(p) || !delegate.isHidden(p)); This is not correct: SMB Administrative Shares are marked as "hidden". In the FS connector the admin share was hidden because of the a library was marking as hidden. Calling Files.isHidden on an admin share will return false.
Sign in to reply to this message.
-- Brett M. Johnson On Dec 12, 2013, at 5:56 PM, mifern@google.com wrote: > > https://codereview.appspot.com/39390046/diff/20001/src/com/google/enterprise/... > File src/com/google/enterprise/adaptor/fs/FsAdaptor.java (right): > > https://codereview.appspot.com/39390046/diff/20001/src/com/google/enterprise/... > src/com/google/enterprise/adaptor/fs/FsAdaptor.java:286: if > (!isVisibleDescendantOfRoot(doc)) { > I think we should make a distinction here on where the doc is not a > descendant or is hidden. > > Also, I would suggest changing the wording of the log message "not > visible" does not have a specific meaning the Windows world. While > "hidden" does and points admins to the hidden attribute of a > file/folder. > > https://codereview.appspot.com/39390046/diff/20001/src/com/google/enterprise/... > src/com/google/enterprise/adaptor/fs/FsAdaptor.java:431: && > (rootPath.equals(p) || !delegate.isHidden(p)); > This is not correct: SMB Administrative Shares are marked as "hidden”. I’m not sure how to interpret that comment. Is the conditional not correct, or is the statement about administrative shares being hidden not correct? > > In the FS connector the admin share was hidden because > of the a library was marking as hidden. I interpret this mangled sentence as stating that JCIFS SmbFile.isHidden() will return true for administrative shares, whereas ... > Calling Files.isHidden on an admin share will > return false. NIO Files.isHidden() will return false. > > https://codereview.appspot.com/39390046/
Sign in to reply to this message.
The statement about admin shares being hidden is not true. I can't comment on SmbFile.isHidden, but NIO Files.isHidden() will return false for admin shares. Also, Windows does not set the hidden attribute for admin shares. On 2013/12/13 18:43:00, brett.michael.johnson_gmail.com wrote: > -- > Brett M. Johnson > > > > > On Dec 12, 2013, at 5:56 PM, mailto:mifern@google.com wrote: > > > > > > https://codereview.appspot.com/39390046/diff/20001/src/com/google/enterprise/... > > File src/com/google/enterprise/adaptor/fs/FsAdaptor.java (right): > > > > > https://codereview.appspot.com/39390046/diff/20001/src/com/google/enterprise/... > > src/com/google/enterprise/adaptor/fs/FsAdaptor.java:286: if > > (!isVisibleDescendantOfRoot(doc)) { > > I think we should make a distinction here on where the doc is not a > > descendant or is hidden. > > > > Also, I would suggest changing the wording of the log message "not > > visible" does not have a specific meaning the Windows world. While > > "hidden" does and points admins to the hidden attribute of a > > file/folder. > > > > > https://codereview.appspot.com/39390046/diff/20001/src/com/google/enterprise/... > > src/com/google/enterprise/adaptor/fs/FsAdaptor.java:431: && > > (rootPath.equals(p) || !delegate.isHidden(p)); > > This is not correct: SMB Administrative Shares are marked as "hidden”. > > I’m not sure how to interpret that comment. > Is the conditional not correct, or is the > statement about administrative shares being > hidden not correct? > > > > > In the FS connector the admin share was hidden because > > of the a library was marking as hidden. > > I interpret this mangled sentence as stating > that JCIFS SmbFile.isHidden() will return true > for administrative shares, whereas ... > > > > Calling Files.isHidden on an admin share will > > return false. > > NIO Files.isHidden() will return false. > > > > > https://codereview.appspot.com/39390046/ >
Sign in to reply to this message.
Drop check for administrative shares. Better logging for hidden items.
Sign in to reply to this message.
https://codereview.appspot.com/39390046/diff/40001/src/com/google/enterprise/... File src/com/google/enterprise/adaptor/fs/FsAdaptor.java (right): https://codereview.appspot.com/39390046/diff/40001/src/com/google/enterprise/... src/com/google/enterprise/adaptor/fs/FsAdaptor.java:425: && !delegate.isHidden(p); Is calling delegate.isHidden really needed here? Hidden files are handled in isVisibleDescendantOfRoot. Now we will send a file that is hidden in the listing for a folder but when the GSA send a getDocContent request we will return not found. This is the same behavior we have for files that the adaptor does not have read access. This helps in debugging missing and read file errors.
Sign in to reply to this message.
https://codereview.appspot.com/39390046/diff/40001/src/com/google/enterprise/... File src/com/google/enterprise/adaptor/fs/FsAdaptor.java (right): https://codereview.appspot.com/39390046/diff/40001/src/com/google/enterprise/... src/com/google/enterprise/adaptor/fs/FsAdaptor.java:425: && !delegate.isHidden(p); On 2013/12/13 22:16:05, mifern wrote: > Is calling delegate.isHidden really needed here? Hidden files are handled in > isVisibleDescendantOfRoot. > > Now we will send a file that is hidden in the listing for a folder but when the > GSA send a getDocContent request we will return not found. This is the same > behavior we have for files that the adaptor does not have read access. This > helps in debugging missing and read file errors. I would prefer that behavior as well, but the other behavior isn't "wrong".
Sign in to reply to this message.
https://codereview.appspot.com/39390046/diff/40001/src/com/google/enterprise/... File src/com/google/enterprise/adaptor/fs/FsAdaptor.java (right): https://codereview.appspot.com/39390046/diff/40001/src/com/google/enterprise/... src/com/google/enterprise/adaptor/fs/FsAdaptor.java:425: && !delegate.isHidden(p); On 2013/12/13 22:16:05, mifern wrote: > Is calling delegate.isHidden really needed here? Hidden files are handled in > isVisibleDescendantOfRoot. > > Now we will send a file that is hidden in the listing for a folder but when the > GSA send a getDocContent request we will return not found. This is the same > behavior we have for files that the adaptor does not have read access. This > helps in debugging missing and read file errors. I would really rather not feed files that are hidden. The check here avoids feeding hidden files in directory listings and from the monitor. The check in isVisibleDescendantOfRoot was specifically designed to catch files or ancestors that become hidden after being initially fed. Do you know the real reason the customers requested we not index hidden files in the connector? It not that we were serving up top-secret stuff or personal porn collections. It was because they were nearly always system files and directories. They didn't want us indexing System32 and .Trashes.
Sign in to reply to this message.
OK. Feed hidden directories and files, but return not found on crawl request.
Sign in to reply to this message.
On 2013/12/14 00:40:27, Brett wrote: > OK. Feed hidden directories and files, but return not found on crawl request. I am okay with it either way. This feels like a you and Miguel decision. It would seem to be true that system files are possibly better left unseen at all by the GSA. But on the other hand seeing the file in the listing makes it obvious the adaptor is aware of the file.
Sign in to reply to this message.
If it was up to me, I wouldn’t feed items that were hidden or unreadable. However, the philosophy of the adaptors seems to be feed everything, so there is some sort of tracking history, then let the GSA deal with the failures to crawl. It didn’t really make sense to have access denied vs hidden files handled differently in this case. Unless you want the mere existence of hidden files to remain a secret, but then we shouldn’t log it either. In this case, I decided to go for consistency, rather than convenience. If, in the future, the customers complain that the access-denied and file-not-found errors are too much noise, we can revisit the issue. -- Brett M. Johnson On Dec 13, 2013, at 4:50 PM, ejona@google.com wrote: > On 2013/12/14 00:40:27, Brett wrote: >> OK. Feed hidden directories and files, but return not found on crawl > request. > > I am okay with it either way. This feels like a you and Miguel decision. > It would seem to be true that system files are possibly better left > unseen at all by the GSA. But on the other hand seeing the file in the > listing makes it obvious the adaptor is aware of the file. > > https://codereview.appspot.com/39390046/
Sign in to reply to this message.
LGTM
Sign in to reply to this message.
|