Hey Guido, > Hi tav (do you have a real name?), http://tav.espians.com/about-tav.html#what-s-your-name > I will ...
16 years, 8 months ago
(2009-02-21 02:10:25 UTC)
#3
Hey Guido,
> Hi tav (do you have a real name?),
http://tav.espians.com/about-tav.html#what-s-your-name
> I will review your patch using Rietveld.
Thanks! Please forgive me as I get accustomed to Rietveld. I checked out the
code when you released it last year -- but yet to actually use it as an app =(
--
Cheers, tav
Hey Guido, I agree with most of your changes, but I'd like to push back ...
16 years, 8 months ago
(2009-02-21 03:14:31 UTC)
#4
Hey Guido,
I agree with most of your changes, but I'd like to push back on nixing the get_*
functions. I'll try and give a rationale below, but please forgive me if it
stops making sense.
The idea with capabilities is simply that object references form the basis of
security. My other favourite language dude, Mark S. Miller, is also working at
Google now and he could explain this far better than I can, but I'll try with a
simple example...
Let's assume that there was an App Engine app called 'Plex'. This app allows
users to run arbitrary Python services on their data.
Now, because we only want to allow services to access the *current* user's data,
we give it indirect access to db.get via a safe_get function in its
__builtins__:
def create_safe_get(user_id, db_get):
def _safe_get(key_name):
return db_get(user_id + key_name)
return _safe_get
safe_builtins = {'safe_get': create_safe_get('tav')}
exec(service, {'__builtins__': safe_builtins})
Now, thanks to the restricted mode, the service cannot access
``safe_get.func_closure`` and get at the original db_get and access other user's
data.
All's good so far.
Now, let's assume that a service wants to make use of ``inspect.getargspec``.
This is a fairly benign function, so we decide to add it to the __builtins__.
Problem.
The service is executing in restricted mode when it calls getargspec! So, when
getargspec accesses func.func_code, an error is raised =(
This is the reason for needing the get_* functions -- for use by *trusted*
functions whilst in restricted execution.
Now, workarounds can be made. But they all add a lot of complications and
interrupt the linear flow of code and will make it less easy to audit any given
function.
With the proposed get_* functions, it is easy to look at the code for getargspec
and see that whilst it uses get_func_code, it doesn't expose it to the caller in
any way, and thus can be passed to untrusted code.
Thus, given that there will be trusted code being called by untrusted code
whilst in restricted mode, any restricted capability (e.g. get/set func_dict)
will have to be exposed via getter/setter functions.
The more restrictions we apply, the more of these functions there will have to
be. I am happy to code functions for the various other restrictions on function
objects that you want back in.
I decided to put these in the sys module as trusted code is likely to have
access to the sys module. But they could just as well be placed in a separate
module.
So I hope this explains why those functions are absolutely vital to the
simplicity of the approach.
Forgive me if I was confusing. Please tell me which bits were confusing and I'll
try to explain it better =(
--
Many thanks again, tav
http://codereview.appspot.com/21043/diff/1/5
File Objects/funcobject.c (left):
http://codereview.appspot.com/21043/diff/1/5#oldcode186
Line 186: return NULL;
In writing various bits of secure code, I've never depended on these attributes
to be unalterable -- but I think you have a point with regards to consistency.
I'd be happy to put these restrictions back in. The flip side though is the need
for more getter/setter functions for use by trusted functions in restricted
mode...
http://codereview.appspot.com/21043/diff/1/3
File Objects/typeobject.c (left):
http://codereview.appspot.com/21043/diff/1/3#oldcode2249
Line 2249: type_subclasses(PyTypeObject *type, PyObject *args_ignored)
Great idea. I'll put it back in with that check.
http://codereview.appspot.com/21043/diff/1/2
File Python/sysmodule.c (right):
http://codereview.appspot.com/21043/diff/1/2#newcode678
Line 678: static PyObject *
Yes, they are needed for use by "trusted" code which is executing in restricted
mode. I'll try to explain above (?) in the "draft message"...
http://codereview.appspot.com/21043/diff/1/2#newcode683
Line 683: "get_func_code() requires a function as its argument.");
Ah bugger, will fix that. Sowwy!
On Fri, Feb 20, 2009 at 9:58 PM, <codesite-noreply@google.com> wrote: > Comment #13 on issue ...
16 years, 8 months ago
(2009-02-21 15:51:42 UTC)
#5
On Fri, Feb 20, 2009 at 9:58 PM, <codesite-noreply@google.com> wrote:
> Comment #13 on issue 671 by asktav: Secure the Python Interpreter
> http://code.google.com/p/googleappengine/issues/detail?id=671
>
> Dear Guido,
>
> I couldn't figure out how to *update* the code on
> http://codereview.appspot.com/21043/show -- so am
> uploading the patch here. Does one just create a new issue on Rietveld?
Just create a new issue. I explained how in my prevous mail.
> This updated py2.5.4.secure.v2.patch does the following:
>
> * type.__subclasses__() is back and denied in restricted mode.
Great.
> * sys.get_subclasses() exists to allow trusted code access in restricted
> mode.
Not needed, see my previous message.
> * Access to GeneratorType's gi_frame denied in restricted mode.
> * sys.get_gi_frame allows trusted code access in restricted mode.
Good.
> * Likewise with FunctionType's func_closure, func_code, func_defaults,
> func_dict and func_globals. With
> corresponding sys.get_func_* functions.
You don't need the latter.
> * All restrictions relating to class __dict__/methods/attributes have been
> removed.
I have hesitations about this, though I see your point. Making classes read-only
in restricted mode breaks certain types of Pythonic idioms. However I don't want
restricted code to be able to modify classes defined in trusted mode, so I think
that getting an object's class should be restricted -- that way hopefully the
trusted code can limit the access to its classes from restricted code. (More
below.)
> As ``rexec`` proved,
> class-based security is pretty difficult to get right. Closures-based
> security is the simplest and "guaranteed"
> to work method -- e.g. Google Caja.
I think of rexec as based on closures too -- however it was designed in a time
when only classic classes existed, and that provided certain extra problems. I
believe the problem was that we had to use classes in some cases because the
standard APIs are defined in terms of objects, so we had to define a classic
class in trusted mode that implemented a safe version of the API (e.g. the file
API) and then pass an instance to the restricted code (e.g. sys.stdout) that was
properly sealed off. IIRC that's where we needed the Bastion class. With
new-style classes this would be easier I believe, though some of their new
special attributes were also the downfall of the system. Hopefully your patch
will evolve into something that will address the latter.
> The removal of class restrictions in
> restricted mode means that trusted
> code wouldn't be needlessly limited.
Agreed for classic classes -- the trusted code should just not use classic
classes to communicate with restricted code, ever.
> * Restrictions related to file/open() have been removed in restricted mode.
> As shown in an example in a
> previous message here, it's possible to securely call open() by trusted code
> whilst untrusted code won't have
> access since they won't be given a direct reference to open().
One of us is missing something here. Any file object has a type, which is the
*same thing* as the open function. (Try this: "assert sys.stdin.__class__ is
open".) So unless you restrict going from an object to its type (which is hard
to do for other reasons, given the built-in __class__ and its great utility) I
don't see how you can be safe this way.
> * Restrictions relaxed on accessing func.__doc__ and func.__name__ in
> restricted mode. It's a common design
> pattern of decorators to copy over a function's __doc__ and __name__.
I know, though I find it horrible. It also causes confusion in tracebacks since
it will print the wrong function name when the traceback references the
decorator.
> There's no reason to deny this pattern
> since no-one relies on those attributes to be read-only for security. It's
> just metadata.
That is true.
> * Patches the dist/inspect/doctest/types modules so that they can be
> potentially used by untrusted code.
Not needed, see above.
> Sorry for all the mixed tenses in the above description.
>
> I've tried to keep all lines in the patch to less than 80 chars.
Check again...
> Thanks again and please let me know what else I could do to meet your
> standards.
So far we're having a lively discussion about how to write rexec mode for a
modern Python. Keep it going! I expect that soon we'll have to open a bug in the
Python issue tracker and get others involved. Please do create an issue on
Rietveld for your next version!
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
http://codereview.appspot.com/21043/diff/1011/17
File Objects/fileobject.c (left):
http://codereview.appspot.com/21043/diff/1011/17#oldcode219
Line 219: }
I disagree with the deletion here. I like to be able to wrap open() with
something that checks permissions, and if allowed, opens the file and returns a
file instance. If we remove this restriction, we'd have to return wrapped file
instances, which hampers performance a lot more.
http://codereview.appspot.com/21043/diff/1011/14
File Objects/methodobject.c (left):
http://codereview.appspot.com/21043/diff/1011/14#oldcode165
Line 165: }
I'm skeptical about making access to any method's 'self' unrestricted. Suppose
we have a trusted object with secret instance data. I'd like to be able to pass
a bound method of that object (which could do security checks) to restricted
code without having to worry about restricted code accessing (or modifying!) the
rest of the object.
Also, this file is only about built-in functions and methods (defined by C
code). Anything defined by Python code (even for new-style classes) uses the
bound method API from classobject.c.
http://codereview.appspot.com/21043/diff/1011/14#oldcode183
Line 183: {"__module__", T_OBJECT, OFF(m_module), WRITE_RESTRICTED},
I think I'd like to keep this; how often does one set __module__ on a C
function? (That is, a built-in function or method!)
Dear Guido, Thank you for such a detailed reply and for not using my full ...
16 years, 8 months ago
(2009-02-22 13:00:33 UTC)
#6
Dear Guido,
Thank you for such a detailed reply and for not using my full name =)
I had studied rexec and was aware of the use of __builtins__. However, it hadn't
twigged that a function would be operating in the context of the __builtins__
defined in its func_globals, and thus there is no need for the various sys.get_*
functions or the Lib/ patches. Doh!! Sorry for wasting your time and mine with
all that fluff.
And, you are right, it really does solve the problem quite elegantly. It's hard
arguing with someone who has a time machine ;p
If it's any consolation, one of my main intended use of all this is to recreate
somewhat of a bastard child of Grail applets. Python services over the cloud --
running directly on App Engine instances for backend services and occasionally
compiled to "secure javascript" (like Caja does) for use on the frontend.
Hopefully the world is ready for this kind of stuff now.
So, given the new found understanding, the two definitive changes I think we can
both agree on is:
* type.__subclasses__ being restricted
* GeneratorType.gi_frame being restricted
In Python 2.6 and above:
* GeneratorType.gi_code also needs to be restricted
We can also now definitely agree on not needing the:
* sys.get_* functions
* Lib/ patches
Again, doh!!
I think the only question left is: whether to lift the other restrictions or
not? This question splits into two halves -- some "immaterial" RESTRICTIONs and
the restrictions related to class structures.
Some of the restrictions, e.g. __module__ on built-in functions/methods, are
immaterial. And my argument for lifting those restrictions is purely in the
context of simplifying the design. Seeing as they don't really impact security
considerations, why make attributes like func_name, __doc__, __module__, etc.
*RESTRICTED? It just seems to add needless complexity to the design and gives
users something else to watch out for...
As for the class restrictions, I'm not so hot on them. I know that the following
will always work:
def FileReader(filename, mode='r', buffering=0):
if mode not in ['r', 'rb', 'rU']:
mode = 'r'
file = open(filename, mode, buffering)
def read(bufsize=-1):
return file.read(bufsize)
return Namespace(read=read)
I can pass FileReader to untrusted code without worry because it'll never be
able to get at the real ``file`` type in restricted mode as it doesn't have
access to func_globals (in restricted mode).
You are right in that it's inefficient. There are 2 function calls for the price
of one. But it's a price that, at least personally, I'm willing to pay for
knowing that there isn't some kind of open exploit. Unfortunately, my knowledge
of the Python internals isn't good enough to know that I could have that same
level of security with class-based structures.
One of my first Python projects (back in 2000) was an IRC bot which provided a
.py feature executing arbitrary Python code. It was a very useful feature for
both learning and teaching Python and was built on top of the rexec framework.
It was also an interesting exercise in learning the limitations of rexec -- as
you can imagine it became a sport for everyone to try and overcome the various
security limitations...
And after trying to patch various holes, I eventually gave up hope and retired
the feature from the bot. I simply didn't have enough knowledge of the various
Python internals with regards to classes. And, sadly, I still don't =(
And, things like descriptors -- which are really elegant btw -- sadly just add
more complexity to the issue. There are (at least seemingly) a *lot* of moving
parts that need to be "locked down".
In contrast, the exposure of function internals is limited to the various func_*
attributes and by denying access to them in restricted mode, I find myself much
more confident... in fact, very confident!
And whilst it is inefficient, it is a price worth paying imo. I also came up
with a cute Namespace() constructor that takes functions which are defined
inside functions and returns a pseudo-class instance like structure so that
normal Pythonic idioms can be used.
It's implemented in
http://github.com/tav/plexnet/blob/9dabc570a2499689e773d1af3599a29102071f80/s...
It takes advantage of tuple's immutability and uses dynamic metaclasses to
create the pseudo-class-instance wrapper. And whilst it's only about 1.2x slower
for "method" calls, it's sadly about 46x slower in comparison to the equivalent
class-based approach for instance initialisation.
My previous dabblings with C were actually an attempt to create Namespace as an
extension type:
http://github.com/tav/plexnet/blob/9dabc570a2499689e773d1af3599a29102071f80/s...
This managed to bring down the instantiation overhead to just 3-5x, but also
lost some functionality along the way, e.g. builtin functions like len() don't
treat a __len__ function which getattro returns in the same way as it normally
treats a __len__ class method. So whilst the feature works in the slow pure
Python capbase.py implementation, it didn't in the C extension. And, as you can
see from .capbase.c, my knowledge of C isn't good enough =(
Also, there are interesting developments like:
http://google-caja.googlecode.com/svn/trunk/doc/html/cajitaOptimization/index...
which I think we could possibly adopt for use on Python/AppEngine.
Sorry for rambling =(
The point that I am trying to make is that despite the inefficiencies, I feel
much more confident with security through a function based approach in
restricted mode than I do with class-based restrictions. And, as such, would
petition for the removal of the various class related restrictions.
To me, this would just give others a false sense of security which we cannot
really guarantee without doing a full-on audit of some kind...
However, I can also see that one day it could be possible to be confident enough
of a class-based approach too. And, seeing as the removal of gi_frame and
__subclasses__ enables me to do what I want, perhaps I shouldn't be too greedy
;p
Sorry again for rambling. Please let me know your thoughts and what I should do
for the next version of the patch.
--
Many thanks again, tav
Issue 21043: Patch to secure Python interpreter
(Closed)
Created 16 years, 8 months ago by GvR
Modified 16 years, 3 months ago
Reviewers: asktav
Base URL: http://svn.python.org/view/*checkout*/python/branches/release25-maint/
Comments: 15