|
|
Created:
13 years, 5 months ago by Kaelyn Modified:
13 years, 3 months ago CC:
codereview-discuss_googlegroups.com Visibility:
Public. |
Patch Set 1 #
Total comments: 22
MessagesTotal messages: 14
http://codereview.appspot.com/4931044/diff/1/.hgignore File .hgignore (right): http://codereview.appspot.com/4931044/diff/1/.hgignore#newcode7 .hgignore:7: \..*sw.?$ What's this? http://codereview.appspot.com/4931044/diff/1/codereview/models.py File codereview/models.py (right): http://codereview.appspot.com/4931044/diff/1/codereview/models.py#newcode199 codereview/models.py:199: message_id = db.StringProperty() # the value of the Message-ID email header Why not to use Message model key directly? http://code.google.com/appengine/docs/python/datastore/keyclass.html
Sign in to reply to this message.
http://codereview.appspot.com/4931044/diff/1/.hgignore File .hgignore (right): http://codereview.appspot.com/4931044/diff/1/.hgignore#newcode7 .hgignore:7: \..*sw.?$ On 2011/08/29 16:07:11, techtonik wrote: > What's this? To ignore the .swp files created by Vim when editing a file--if you edit foo.c, Vim will create a .foo.c.swp for doing things like checkpointing unsaved changes or maintaining long/large undo histories (and of .foo.c.swp exists, it'll try .foo.c.swo, etc). Having mercurial report the files aren't checked in and as a result upload.py asking if I wish to continue just because I haven't closed out all of the editor sessions is quite annoying... http://codereview.appspot.com/4931044/diff/1/codereview/models.py File codereview/models.py (right): http://codereview.appspot.com/4931044/diff/1/codereview/models.py#newcode199 codereview/models.py:199: message_id = db.StringProperty() # the value of the Message-ID email header On 2011/08/29 16:07:11, techtonik wrote: > Why not to use Message model key directly? > http://code.google.com/appengine/docs/python/datastore/keyclass.html Because the References and In-Reply-To email headers need to be set to the value of a Message-ID header (assigned by the SMTP server) from the email that is being referenced or replied to, not some internal AppEngine value. As the comment in the code notes, it is storing the value from that specific email header.
Sign in to reply to this message.
On 2011/08/29 16:31:06, Kaelyn wrote: > Message-ID header (assigned by the SMTP server) I see. App Engine doesn't allow to set Message-ID field, so it should be retrieved and saved after the message has been sent. This review may take a while without unit tests. http://www.google.com/codesearch#Qx8E-7HUBTk/trunk/python/google/appengine/ap...
Sign in to reply to this message.
Is this still active? I'd love to see mails threaded correctly :) -Andi On Mon, Aug 29, 2011 at 6:46 PM, <techtonik@gmail.com> wrote: > On 2011/08/29 16:31:06, Kaelyn wrote: >> >> Message-ID header (assigned by the SMTP server) > > I see. App Engine doesn't allow to set Message-ID field, so it should be > retrieved and saved after the message has been sent. This review may > take a while without unit tests. > > http://www.google.com/codesearch#Qx8E-7HUBTk/trunk/python/google/appengine/ap... > > http://codereview.appspot.com/4931044/ >
Sign in to reply to this message.
So would I along with many folks working on Clang/LLVM (especially those of us at Google). Right now I'm still waiting on review feedback. Cheers, Kaelyn On Tue, Sep 13, 2011 at 7:37 AM, Andi Albrecht <albrecht.andi@googlemail.com > wrote: > Is this still active? I'd love to see mails threaded correctly :) > > -Andi > > On Mon, Aug 29, 2011 at 6:46 PM, <techtonik@gmail.com> wrote: > > On 2011/08/29 16:31:06, Kaelyn wrote: > >> > >> Message-ID header (assigned by the SMTP server) > > > > I see. App Engine doesn't allow to set Message-ID field, so it should be > > retrieved and saved after the message has been sent. This review may > > take a while without unit tests. > > > > > http://www.google.com/codesearch#Qx8E-7HUBTk/trunk/python/google/appengine/ap... > > > > http://codereview.appspot.com/4931044/ > > >
Sign in to reply to this message.
In Gmail I see threads linked correctly already, so I am not sure how to test if this works. Any references to public mail archive with web interface that we can add to CC?
Sign in to reply to this message.
I'm seeing the incorrect threading on my iPhone. Should be easy to test on one of our testing instances. -Andi Am 14.09.2011 um 17:20 schrieb "techtonik@gmail.com" <techtonik@gmail.com>: > In Gmail I see threads linked correctly already, so I am not sure how to > test if this works. Any references to public mail archive with web > interface that we can add to CC? > > http://codereview.appspot.com/4931044/
Sign in to reply to this message.
It took me almost 2 hours to get myself acquainted with threading specifics. Wheew. =) http://codereview.appspot.com/4931044/diff/1/codereview/models.py File codereview/models.py (right): http://codereview.appspot.com/4931044/diff/1/codereview/models.py#newcode199 codereview/models.py:199: message_id = db.StringProperty() # the value of the Message-ID email header All right. But it is better to rename it to email_id to avoid confusion in JS code. http://codereview.appspot.com/4931044/diff/1/codereview/views.py File codereview/views.py (right): http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3110 codereview/views.py:3110: message_id='<unknown>') I would avoid any magic strings (even if it is transformed to token or constant in this case). None is enough. http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3163 codereview/views.py:3163: break Looks like messages without 'send email' will interrupt the thread, because they don't receive email message id from SMTP server, and all subsequent replies with 'send email' will be linked directly to the first message. This seems broken to me. There is an ideal way to fix that: 1. add own field to Message that stores parent key, so that every message is linked to its parent regardless of if it's sent or not 2. References: header in email accepts several message-id's, so if we reply to a message that wasn't sent, so it is ok if we generate ids for such message and include it after the message with true email message ID This will also allow us to show threads in web interface in future without extra MapReduce migrations. http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3657 codereview/views.py:3657: if 'X-Google-Appengine-App-Id' in incoming_msg.original: I believe the purpose of this check is to prevent indefinite loops when somebody tricks AppEngine to send email to its own address. As for the code below - this stuff is better to be done with MapReduce job, but I think we can safely remove that. It won't join messages that were already sent in threads. http://codereview.appspot.com/4931044/diff/1/static/script.js File static/script.js (right): http://codereview.appspot.com/4931044/diff/1/static/script.js#newcode695 static/script.js:695: form.insertBefore(msgIdInput, form.firstChild); JFYI, the input box can be just copied or even constructed from scratch. Textarea is moved back and forth to preserve text contents when user accidentally clicks a link and then hits back button. To make browser remember content, Textarea element should be present on a page when the page is loaded. http://codereview.appspot.com/4931044/diff/1/templates/issue.html File templates/issue.html (right): http://codereview.appspot.com/4931044/diff/1/templates/issue.html#newcode156 templates/issue.html:156: <input type="hidden" This input field can be optional. This will get rid us from magic values.
Sign in to reply to this message.
http://codereview.appspot.com/4931044/diff/1/codereview/views.py File codereview/views.py (right): http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3668 codereview/views.py:3668: logging.debig('Set message_id = "%s" where issue = "%s" and sender = "%s"', Here's a typo: debig -> debug
Sign in to reply to this message.
Thanks for the review feedback so far! :) I haven't uploaded a new patch with the trivial changes (e.g. renaming Message.message_id to Message.email_id) yet because there are other more extensive changes to be made that will require me retesting the new functionality to make sure it works before uploading to this issue. And I'd like to have some sort of agreement on the direction of those changes before spending a lot of time testing them. http://codereview.appspot.com/4931044/diff/1/codereview/models.py File codereview/models.py (right): http://codereview.appspot.com/4931044/diff/1/codereview/models.py#newcode199 codereview/models.py:199: message_id = db.StringProperty() # the value of the Message-ID email header On 2011/09/15 08:48:34, techtonik wrote: > All right. But it is better to rename it to email_id to avoid confusion in JS > code. Done. http://codereview.appspot.com/4931044/diff/1/codereview/views.py File codereview/views.py (right): http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3110 codereview/views.py:3110: message_id='<unknown>') On 2011/09/15 08:48:34, techtonik wrote: > I would avoid any magic strings (even if it is transformed to token or constant > in this case). None is enough. The magic string is to distinguish between messages that predate this patch and messages for which we're expecting to get an email_id but haven't yet (lest things get very messy for existing issues when this patch goes live). http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3163 codereview/views.py:3163: break On 2011/09/15 08:48:34, techtonik wrote: > Looks like messages without 'send email' will interrupt the thread, because they > don't receive email message id from SMTP server, and all subsequent replies with > 'send email' will be linked directly to the first message. This seems broken to > me. Although this is the one case I did not test locally (due to the pain of manually feeding sent emails into a local dev AppEngine server), it shouldn't completely break the thread as only replies to the message without an email_id would be linked to the first message instead. I could also follow RFC 2822 and leave out the In-Reply-To and References email headers entirely if the reply is to a message without an email_id, but figured that would be even less useful since it would mean spawning new email threads. > > There is an ideal way to fix that: > 1. add own field to Message that stores parent key, so that every message is > linked to its parent regardless of if it's sent or not The only issue is identifying the parent regardless of whether it was ever emailed. The simplest way is probably to change the the 'message-email-id' hidden input I added from being the message's email_id to being the message's key(). > 2. References: header in email accepts several message-id's, so if we reply to a > message that wasn't sent, so it is ok if we generate ids for such message and > include it after the message with true email message ID Based on RFC 2822 "Internet Message Format" (http://www.ietf.org/rfc/rfc2822.txt) doing that would be very bad. First, the RFC *require* message IDs such as those in References and In-Reply-To headers to be globally unique. Second, it explicitly states the message IDs used in the References and In-Reply-To fields to be the value of the parent messages' Message-ID field (and possibly including the IDs from parents' References field), or for the headers to not be included if there are no Message-IDs. Finally, the RFC directly discourages including multiple message IDs in the References field: 'Note: Some implementations parse the "References:" field to display the "thread of the discussion". These implementations assume that each new message is a reply to a single parent and hence that they can walk backwards through the "References:" field to find the parent of each message listed there. Therefore, trying to form a "References:" field for a reply that has multiple parents is discouraged and how to do so is not defined in this document.' What would be REALLY nice would be to add a "X-Rietveld-Message-ID" header to the email containing a unique identifier that rietveld generates and stores as a field in Message (or just using the str value of Message.key() as long as it is sane), so that it can accurately fetch and update the email_id of the correct Message when it receives the email. But alas AppEngine is overly restrictive about what email headers can be set. Maybe a similar effect could be achieved by e.g. using GMail-like email aliases, so when sending the email for message 1234 "reply+1234@" is CC'ed instead of simply "reply@"? Having a clear reference to the Message in the datastore embedded somewhere in the email would also eliminate the current need for / purpose of the special '<unknown>' email_id. > > This will also allow us to show threads in web interface in future without extra > MapReduce migrations. Good point. I was so focused on getting the emails threaded I hadn't thought about its utility in the web interface. http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3657 codereview/views.py:3657: if 'X-Google-Appengine-App-Id' in incoming_msg.original: On 2011/09/15 08:48:34, techtonik wrote: > I believe the purpose of this check is to prevent indefinite loops when somebody > tricks AppEngine to send email to its own address. Indeed it was. But I'm also relying on Rietveld/AppEngine sending an email to itself (since it already CC's itself in the messages it sends out) so I can get the SMTP-server-assigned Message-ID from the email and update the email_id of the original Message with it. > > As for the code below - this stuff is better to be done with MapReduce job, but > I think we can safely remove that. It won't join messages that were already sent > in threads. Um, I don't understand what you are talking about, with MapReduce jobs and joining messages. As far as I could tell through the AppEngine docs and local testing of my changes, the code below only updates an existing message with an email_id; it won't create duplicate messages, and defaults to the pre-patch behavior of raising an error on the incoming message if it does not find a Message to update in the datastore. http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3668 codereview/views.py:3668: logging.debig('Set message_id = "%s" where issue = "%s" and sender = "%s"', On 2011/09/15 17:41:42, Andi Albrecht wrote: > Here's a typo: debig -> debug Thanks. Fixed. http://codereview.appspot.com/4931044/diff/1/static/script.js File static/script.js (right): http://codereview.appspot.com/4931044/diff/1/static/script.js#newcode695 static/script.js:695: form.insertBefore(msgIdInput, form.firstChild); On 2011/09/15 08:48:34, techtonik wrote: > JFYI, the input box can be just copied or even constructed from scratch. > Textarea is moved back and forth to preserve text contents when user > accidentally clicks a link and then hits back button. To make browser remember > content, Textarea element should be present on a page when the page is loaded. From what I could discern of the JS code and the associated HTML page, it is also needed--at least for the hidden input widget--to associate a piece of data (the email_id) of a particular comment with the form that is actually submitted. There can be multiple comments with different values for email_id, but only one form that is submitted and prior to this patch the form does not include any information about *which* comment was being replied to. Hence the moving of an appropriate hidden input field containing the email_id to and from the to-be-submitted form in the same manner as the textarea element(s). http://codereview.appspot.com/4931044/diff/1/templates/issue.html File templates/issue.html (right): http://codereview.appspot.com/4931044/diff/1/templates/issue.html#newcode156 templates/issue.html:156: <input type="hidden" On 2011/09/15 08:48:34, techtonik wrote: > This input field can be optional. This will get rid us from magic values. True, but see my main comment about why I added the magic values. ;)
Sign in to reply to this message.
Ping? On Mon, Sep 26, 2011 at 4:20 PM, <rikka@google.com> wrote: > Thanks for the review feedback so far! :) > > I haven't uploaded a new patch with the trivial changes (e.g. renaming > Message.message_id to Message.email_id) yet because there are other more > extensive changes to be made that will require me retesting the new > functionality to make sure it works before uploading to this issue. And > I'd like to have some sort of agreement on the direction of those > changes before spending a lot of time testing them. > > > http://codereview.appspot.com/**4931044/diff/1/codereview/**models.py<http://... > File codereview/models.py (right): > > http://codereview.appspot.com/**4931044/diff/1/codereview/** > models.py#newcode199<http://codereview.appspot.com/4931044/diff/1/codereview/models.py#newcode199> > codereview/models.py:199: message_id = db.StringProperty() # the value > of the Message-ID email header > On 2011/09/15 08:48:34, techtonik wrote: > >> All right. But it is better to rename it to email_id to avoid >> > confusion in JS > >> code. >> > > Done. > > http://codereview.appspot.com/**4931044/diff/1/codereview/**views.py<http://c... > File codereview/views.py (right): > > http://codereview.appspot.com/**4931044/diff/1/codereview/** > views.py#newcode3110<http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3110> > codereview/views.py:3110: message_id='<unknown>') > On 2011/09/15 08:48:34, techtonik wrote: > >> I would avoid any magic strings (even if it is transformed to token or >> > constant > >> in this case). None is enough. >> > > The magic string is to distinguish between messages that predate this > patch and messages for which we're expecting to get an email_id but > haven't yet (lest things get very messy for existing issues when this > patch goes live). > > http://codereview.appspot.com/**4931044/diff/1/codereview/** > views.py#newcode3163<http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3163> > codereview/views.py:3163: break > On 2011/09/15 08:48:34, techtonik wrote: > >> Looks like messages without 'send email' will interrupt the thread, >> > because they > >> don't receive email message id from SMTP server, and all subsequent >> > replies with > >> 'send email' will be linked directly to the first message. This seems >> > broken to > >> me. >> > > Although this is the one case I did not test locally (due to the pain of > manually feeding sent emails into a local dev AppEngine server), it > shouldn't completely break the thread as only replies to the message > without an email_id would be linked to the first message instead. I > could also follow RFC 2822 and leave out the In-Reply-To and References > email headers entirely if the reply is to a message without an email_id, > but figured that would be even less useful since it would mean spawning > new email threads. > > > There is an ideal way to fix that: >> 1. add own field to Message that stores parent key, so that every >> > message is > >> linked to its parent regardless of if it's sent or not >> > > The only issue is identifying the parent regardless of whether it was > ever emailed. The simplest way is probably to change the the > 'message-email-id' hidden input I added from being the message's > email_id to being the message's key(). > > 2. References: header in email accepts several message-id's, so if we >> > reply to a > >> message that wasn't sent, so it is ok if we generate ids for such >> > message and > >> include it after the message with true email message ID >> > > Based on RFC 2822 "Internet Message Format" > (http://www.ietf.org/rfc/**rfc2822.txt<http://www.ietf.org/rfc/rfc2822.txt>) > doing that would be very bad. > First, the RFC *require* message IDs such as those in References and > In-Reply-To headers to be globally unique. Second, it explicitly states > the message IDs used in the References and In-Reply-To fields to be the > value of the parent messages' Message-ID field (and possibly including > the IDs from parents' References field), or for the headers to not be > included if there are no Message-IDs. Finally, the RFC directly > discourages including multiple message IDs in the References field: > 'Note: Some implementations parse the "References:" field to display > the "thread of the discussion". These implementations assume that each > new message is a reply to a single parent and hence that they can walk > backwards through the "References:" field to find the parent of each > message listed there. Therefore, trying to form a "References:" field > for a reply that has multiple parents is discouraged and how to do so is > not defined in this document.' > > What would be REALLY nice would be to add a "X-Rietveld-Message-ID" > header to the email containing a unique identifier that rietveld > generates and stores as a field in Message (or just using the str value > of Message.key() as long as it is sane), so that it can accurately fetch > and update the email_id of the correct Message when it receives the > email. But alas AppEngine is overly restrictive about what email headers > can be set. Maybe a similar effect could be achieved by e.g. using > GMail-like email aliases, so when sending the email for message 1234 > "reply+1234@" is CC'ed instead of simply "reply@"? Having a clear > reference to the Message in the datastore embedded somewhere in the > email would also eliminate the current need for / purpose of the special > '<unknown>' email_id. > > > This will also allow us to show threads in web interface in future >> > without extra > >> MapReduce migrations. >> > > Good point. I was so focused on getting the emails threaded I hadn't > thought about its utility in the web interface. > > http://codereview.appspot.com/**4931044/diff/1/codereview/** > views.py#newcode3657<http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3657> > codereview/views.py:3657: if 'X-Google-Appengine-App-Id' in > incoming_msg.original: > On 2011/09/15 08:48:34, techtonik wrote: > >> I believe the purpose of this check is to prevent indefinite loops >> > when somebody > >> tricks AppEngine to send email to its own address. >> > > Indeed it was. But I'm also relying on Rietveld/AppEngine sending an > email to itself (since it already CC's itself in the messages it sends > out) so I can get the SMTP-server-assigned Message-ID from the email and > update the email_id of the original Message with it. > > As for the code below - this stuff is better to be done with MapReduce >> > job, but > >> I think we can safely remove that. It won't join messages that were >> > already sent > >> in threads. >> > > Um, I don't understand what you are talking about, with MapReduce jobs > and joining messages. As far as I could tell through the AppEngine docs > and local testing of my changes, the code below only updates an existing > message with an email_id; it won't create duplicate messages, and > defaults to the pre-patch behavior of raising an error on the incoming > message if it does not find a Message to update in the datastore. > > http://codereview.appspot.com/**4931044/diff/1/codereview/** > views.py#newcode3668<http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3668> > codereview/views.py:3668: logging.debig('Set message_id = "%s" where > issue = "%s" and sender = "%s"', > On 2011/09/15 17:41:42, Andi Albrecht wrote: > >> Here's a typo: debig -> debug >> > > Thanks. Fixed. > > http://codereview.appspot.com/**4931044/diff/1/static/script.**js<http://code... > File static/script.js (right): > > http://codereview.appspot.com/**4931044/diff/1/static/script.** > js#newcode695<http://codereview.appspot.com/4931044/diff/1/static/script.js#newcode695> > static/script.js:695: form.insertBefore(msgIdInput, form.firstChild); > On 2011/09/15 08:48:34, techtonik wrote: > >> JFYI, the input box can be just copied or even constructed from >> > scratch. > >> Textarea is moved back and forth to preserve text contents when user >> accidentally clicks a link and then hits back button. To make browser >> > remember > >> content, Textarea element should be present on a page when the page is >> > loaded. > > From what I could discern of the JS code and the associated HTML page, > it is also needed--at least for the hidden input widget--to associate a > piece of data (the email_id) of a particular comment with the form that > is actually submitted. There can be multiple comments with different > values for email_id, but only one form that is submitted and prior to > this patch the form does not include any information about *which* > comment was being replied to. Hence the moving of an appropriate hidden > input field containing the email_id to and from the to-be-submitted form > in the same manner as the textarea element(s). > > http://codereview.appspot.com/**4931044/diff/1/templates/**issue.html<http://... > File templates/issue.html (right): > > http://codereview.appspot.com/**4931044/diff/1/templates/** > issue.html#newcode156<http://codereview.appspot.com/4931044/diff/1/templates/issue.html#newcode156> > templates/issue.html:156: <input type="hidden" > On 2011/09/15 08:48:34, techtonik wrote: > >> This input field can be optional. This will get rid us from magic >> > values. > > True, but see my main comment about why I added the magic values. ;) > > http://codereview.appspot.com/**4931044/<http://codereview.appspot.com/4931044/> >
Sign in to reply to this message.
Hi Kaelyn, As you may see it took more time as amount of text to read through is increasing in geometrical progression. =) To make everything happen faster, I propose to split the issue into two, as we certainly dealing with two different problems: 1. Linking messages into threads (add parent id for forum-like interface) 2. Linking mails into threads (collecting email_ids, storing them and linking with header fields) The latter stuff is more complicated, because AppEngine doesn't allow us to retrieve email Message-ID field after it has been sent. There should be an issue opened for AppEngine and temporary hack on our side. This review is too long to be a sane description of the feature, so it will require some separate PEP. In this case you may expect faster Pong! responses. ,) http://codereview.appspot.com/4931044/diff/1/codereview/views.py File codereview/views.py (right): http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3110 codereview/views.py:3110: message_id='<unknown>') On 2011/09/26 23:20:08, Kaelyn wrote: > On 2011/09/15 08:48:34, techtonik wrote: > > I would avoid any magic strings (even if it is transformed to token or > constant > > in this case). None is enough. > > The magic string is to distinguish between messages that predate this patch and > messages for which we're expecting to get an email_id but haven't yet (lest > things get very messy for existing issues when this patch goes live). Where is this used, i.e. where messages without email_id and with unknown email_id should be treated differently? http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3163 codereview/views.py:3163: break On 2011/09/26 23:20:08, Kaelyn wrote: > On 2011/09/15 08:48:34, techtonik wrote: > > Looks like messages without 'send email' will interrupt the thread, because > they > > don't receive email message id from SMTP server, and all subsequent replies > with > > 'send email' will be linked directly to the first message. This seems broken > to > > me. > > Although this is the one case I did not test locally (due to the pain of > manually feeding sent emails into a local dev AppEngine server), it shouldn't > completely break the thread as only replies to the message without an email_id > would be linked to the first message instead. I could also follow RFC 2822 and > leave out the In-Reply-To and References email headers entirely if the reply is > to a message without an email_id, but figured that would be even less useful > since it would mean spawning new email threads. We already have unit tests that may already contain necessary code to help you automatically feed emails to a local server. It would be nice to have such tests committed anyway. > > There is an ideal way to fix that: > > 1. add own field to Message that stores parent key, so that every message is > > linked to its parent regardless of if it's sent or not > > The only issue is identifying the parent regardless of whether it was ever > emailed. The simplest way is probably to change the the 'message-email-id' > hidden input I added from being the message's email_id to being the message's > key(). I doesn't seem good to me to handle ancestry through JS/template/view. It is the role of data model to take care of this stuff. BTW, doesn't form already contain message id? Why do you need an additional hidden field for that? > > 2. References: header in email accepts several message-id's, so if we reply to > a > > message that wasn't sent, so it is ok if we generate ids for such message and > > include it after the message with true email message ID > > Based on RFC 2822 "Internet Message Format" > (http://www.ietf.org/rfc/rfc2822.txt) doing that would be very bad. First, the > RFC *require* message IDs such as those in References and In-Reply-To headers to > be globally unique. That's ok. I believe message key is rather unique. > Second, it explicitly states the message IDs used in the > References and In-Reply-To fields to be the value of the parent messages' > Message-ID field (and possibly including the IDs from parents' References > field), or for the headers to not be included if there are no Message-IDs. > Finally, the RFC directly discourages including multiple message IDs in the > References field: > 'Note: Some implementations parse the "References:" field to display the > "thread of the discussion". These implementations assume that each new message > is a reply to a single parent and hence that they can walk backwards through the > "References:" field to find the parent of each message listed there. Therefore, > trying to form a "References:" field for a reply that has multiple parents is > discouraged and how to do so is not defined in this document.' We don't have multiple parent for each message, so it's ok. > What would be REALLY nice would be to add a "X-Rietveld-Message-ID" header to > the email containing a unique identifier that rietveld generates and stores as a > field in Message (or just using the str value of Message.key() as long as it is > sane), so that it can accurately fetch and update the email_id of the correct > Message when it receives the email. Stop. I don't understand why do you need Rietveld to update existing Message entity when it _receives_ letter. Correct me if I wrong, but when we send message letter - we receive email_id and update message entity. When we receive message - it already has email_id, but we don't have the entity, so we create entity with email_id already filled in. > But alas AppEngine is overly restrictive > about what email headers can be set. Maybe a similar effect could be achieved by > e.g. using GMail-like email aliases, so when sending the email for message 1234 > "reply+1234@" is CC'ed instead of simply "reply@"? Having a clear reference to > the Message in the datastore embedded somewhere in the email would also > eliminate the current need for / purpose of the special '<unknown>' email_id. I believe the list of allowable headers is pretty limited (I posted a link to AppEngine SDK sources somewhere), but if the problem is properly described, then I guess it is possible to add some X-Appname-... header support to AppEngine. > > > > This will also allow us to show threads in web interface in future without > extra > > MapReduce migrations. > > Good point. I was so focused on getting the emails threaded I hadn't thought > about its utility in the web interface. http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3657 codereview/views.py:3657: if 'X-Google-Appengine-App-Id' in incoming_msg.original: On 2011/09/26 23:20:08, Kaelyn wrote: > On 2011/09/15 08:48:34, techtonik wrote: > > I believe the purpose of this check is to prevent indefinite loops when > somebody > > tricks AppEngine to send email to its own address. > > Indeed it was. But I'm also relying on Rietveld/AppEngine sending an email to > itself (since it already CC's itself in the messages it sends out) so I can get > the SMTP-server-assigned Message-ID from the email and update the email_id of > the original Message with it. Oh. I see. There is no way we can get Message-ID. That a big difference. I think this feature is worthy an enhancement at AppEngine tracker. > > > > As for the code below - this stuff is better to be done with MapReduce job, > but > > I think we can safely remove that. It won't join messages that were already > sent > > in threads. > > Um, I don't understand what you are talking about, with MapReduce jobs and > joining messages. As far as I could tell through the AppEngine docs and local > testing of my changes, the code below only updates an existing message with an > email_id; it won't create duplicate messages, and defaults to the pre-patch > behavior of raising an error on the incoming message if it does not find a > Message to update in the datastore. I meant 'link' instead of 'join'. If message is already mailed - this code won't help to join them in threads, so it seemed useless. But I see it is the way to get Message-ID and seems to be the only way. http://codereview.appspot.com/4931044/diff/1/static/script.js File static/script.js (right): http://codereview.appspot.com/4931044/diff/1/static/script.js#newcode695 static/script.js:695: form.insertBefore(msgIdInput, form.firstChild); On 2011/09/26 23:20:08, Kaelyn wrote: > On 2011/09/15 08:48:34, techtonik wrote: > > JFYI, the input box can be just copied or even constructed from scratch. > > Textarea is moved back and forth to preserve text contents when user > > accidentally clicks a link and then hits back button. To make browser remember > > content, Textarea element should be present on a page when the page is loaded. > > From what I could discern of the JS code and the associated HTML page, it is > also needed--at least for the hidden input widget--to associate a piece of data > (the email_id) of a particular comment with the form that is actually submitted. > There can be multiple comments with different values for email_id, but only one > form that is submitted and prior to this patch the form does not include any > information about *which* comment was being replied to. Hence the moving of an > appropriate hidden input field containing the email_id to and from the > to-be-submitted form in the same manner as the textarea element(s). All right. If there is no information about parent of the message that is being submitted, then it should be fixed, of course. But I would prefer to review this in a separate commit. And it should be not email_id, but some parent message id.
Sign in to reply to this message.
On Thu, Oct 6, 2011 at 7:53 AM, <techtonik@gmail.com> wrote: > Hi Kaelyn, > > As you may see it took more time as amount of text to read through is > increasing in geometrical progression. =) To make everything happen > faster, I propose to split the issue into two, as we certainly dealing > with two different problems: > > 1. Linking messages into threads (add parent id for forum-like > interface) > 2. Linking mails into threads (collecting email_ids, storing them and > linking with header fields) > The split sounds reasonable to me, now that there will be a linkage between messages other than through email Message-IDs. I'll keep this rietveld issue for the second half as there has already been a lot of discussion about the email side of the changes. > > The latter stuff is more complicated, because AppEngine doesn't allow us > to retrieve email Message-ID field after it has been sent. There should > be an issue opened for AppEngine and temporary hack on our side. This > review is too long to be a sane description of the feature, so it will > require some separate PEP. In this case you may expect faster Pong! > responses. ,) I'm not quite sure I understand what you are wanting from AppEngine... as far as I am aware, AppEngine is not an SMTP server itself but hands off the emails to a separate SMTP server which then assigns the Message-IDs. Admittedly I am not familiar with AppEngine or how it is set up in production, so its dependence on a (local) SMTP server to relay the mail might just only be for local development instances. But I also can't think of a reason why it would depend on the presence of sendmail for local dev instances but include the functionality of an SMTP when deployed in the cloud. Anyway, assuming AppEngine does not include its own internal SMTP server and instead passes off the messages to an external SMTP server to be queued for delivery, the only way AppEngine would be able to retrieve the email Message-ID after it is sent (after an undetermined amount of time in the SMTP server's outgoing mail queue) would be to include itself as the recipient of the email--which is exactly what my patch does. > http://codereview.appspot.com/**4931044/diff/1/codereview/**views.py<http://c... > File codereview/views.py (right): > > http://codereview.appspot.com/**4931044/diff/1/codereview/** > views.py#newcode3110<http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3110> > codereview/views.py:3110: message_id='<unknown>') > On 2011/09/26 23:20:08, Kaelyn wrote: > >> On 2011/09/15 08:48:34, techtonik wrote: >> > I would avoid any magic strings (even if it is transformed to token >> > or > >> constant >> > in this case). None is enough. >> > > The magic string is to distinguish between messages that predate this >> > patch and > >> messages for which we're expecting to get an email_id but haven't yet >> > (lest > >> things get very messy for existing issues when this patch goes live). >> > > Where is this used, i.e. where messages without email_id and with > unknown email_id should be treated differently? Simple: any message without an email_id set is ignored completely by the code that extracts the Message-ID from an incoming email and tries to figure out which message it belongs to. The code only considers messages with the email_id "<unknown>" as possible matches for the incoming email that had originated from AppEngine. > > http://codereview.appspot.com/**4931044/diff/1/codereview/** > views.py#newcode3163<http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3163> > codereview/views.py:3163: break > On 2011/09/26 23:20:08, Kaelyn wrote: > >> On 2011/09/15 08:48:34, techtonik wrote: >> > Looks like messages without 'send email' will interrupt the thread, >> > because > >> they >> > don't receive email message id from SMTP server, and all subsequent >> > replies > >> with >> > 'send email' will be linked directly to the first message. This >> > seems broken > >> to >> > me. >> > > Although this is the one case I did not test locally (due to the pain >> > of > >> manually feeding sent emails into a local dev AppEngine server), it >> > shouldn't > >> completely break the thread as only replies to the message without an >> > email_id > >> would be linked to the first message instead. I could also follow RFC >> > 2822 and > >> leave out the In-Reply-To and References email headers entirely if the >> > reply is > >> to a message without an email_id, but figured that would be even less >> > useful > >> since it would mean spawning new email threads. >> > > We already have unit tests that may already contain necessary code to > help you automatically feed emails to a local server. It would be nice > tests/test_incomingmail.py contains unit tests?! As far as I could tell when I looked at the file, the tests are functional tests to be executed against a running rietveld instance. I guess I missed something... ::shrugs:: to have such tests committed anyway. I first read this as saying the existing unit tests you were talking about have not been committed for some reason. :p > > > There is an ideal way to fix that: >> > 1. add own field to Message that stores parent key, so that every >> > message is > >> > linked to its parent regardless of if it's sent or not >> > > The only issue is identifying the parent regardless of whether it was >> > ever > >> emailed. The simplest way is probably to change the the >> > 'message-email-id' > >> hidden input I added from being the message's email_id to being the >> > message's > >> key(). >> > > I doesn't seem good to me to handle ancestry through JS/template/view. > It is the role of data model to take care of this stuff. BTW, doesn't > form already contain message id? Why do you need an additional hidden > field for that? No the form does not contain any kind of message id. Each comment block in the page only contains an index value that is dynamically generated by the template, and that index number is not part of the form that is submitted. > > 2. References: header in email accepts several message-id's, so if >> > we reply to > >> a >> > message that wasn't sent, so it is ok if we generate ids for such >> > message and > >> > include it after the message with true email message ID >> > > Based on RFC 2822 "Internet Message Format" >> (http://www.ietf.org/rfc/**rfc2822.txt<http://www.ietf.org/rfc/rfc2822.txt>) >> doing that would be very bad. >> > First, the > >> RFC *require* message IDs such as those in References and In-Reply-To >> > headers to > >> be globally unique. >> > > That's ok. I believe message key is rather unique. While rendered moot because of the second requirement per the RFC, "rather unique" is not sufficient for a value that must be globally unique. > Second, it explicitly states the message IDs used in the >> References and In-Reply-To fields to be the value of the parent >> > messages' > >> Message-ID field (and possibly including the IDs from parents' >> > References > >> field), or for the headers to not be included if there are no >> > Message-IDs. > >> Finally, the RFC directly discourages including multiple message IDs >> > in the > >> References field: >> 'Note: Some implementations parse the "References:" field to >> > display the > >> "thread of the discussion". These implementations assume that each >> > new message > >> is a reply to a single parent and hence that they can walk backwards >> > through the > >> "References:" field to find the parent of each message listed there. >> > Therefore, > >> trying to form a "References:" field for a reply that has multiple >> > parents is > >> discouraged and how to do so is not defined in this document.' >> > > We don't have multiple parent for each message, so it's ok. Except that the values of the References field are defined as the Message-ID of any parents, so including more than one id in the References field means the message has/references multiple parents. To put it another way, the RFC explicitly discourages having more than one value in the References field. Plus including a value in the References field that is not a Message-ID violates the RFC and as such would lead to undefined behavior in any software that tries to parse the email (anything from ignoring it to raising an error about a malformed header to blowing up your monitor and eating your dog). > What would be REALLY nice would be to add a "X-Rietveld-Message-ID" >> > header to > >> the email containing a unique identifier that rietveld generates and >> > stores as a > >> field in Message (or just using the str value of Message.key() as long >> > as it is > >> sane), so that it can accurately fetch and update the email_id of the >> > correct > >> Message when it receives the email. >> > > Stop. I don't understand why do you need Rietveld to update existing > Message entity when it _receives_ letter. Correct me if I wrong, but > when we send message letter - we receive email_id and update message > entity. When we receive message - it already has email_id, but we don't > have the entity, so we create entity with email_id already filled in. We do not receive anything when we send a message, and after the email has been sent the email_id is still "<unknown>". The email_id cannot be and is not filled in for a message until the associated email has passed through the SMTP server, been assigned a unique Message-ID, and then is received again by Rietveld--at which point Rietveld processes the email, tries to figure out which Message entity is associated with it (provided the email was sent by itself), extracts the value of the Message-ID header from the email, and stores the value as the email_id of the Message entity. > But alas AppEngine is overly restrictive >> about what email headers can be set. Maybe a similar effect could be >> > achieved by > >> e.g. using GMail-like email aliases, so when sending the email for >> > message 1234 > >> "reply+1234@" is CC'ed instead of simply "reply@"? Having a clear >> > reference to > >> the Message in the datastore embedded somewhere in the email would >> > also > >> eliminate the current need for / purpose of the special '<unknown>' >> > email_id. > > I believe the list of allowable headers is pretty limited (I posted a > link to AppEngine SDK sources somewhere), but if the problem is properly > described, then I guess it is possible to add some X-Appname-... header > support to AppEngine. According to http://code.google.com/appengine/docs/python/mail/overview.html#Sending_Mail_..., the only extra email headers that can be set are In-Reply-To and References. It would certainly be nice to add support for X-Appname-... headers to AppEngine, but given my experience trying to fix the support of In-Reply-To and References in the local Python development server (consisted of adding three lines of code to mail_message_to_mime_message() in appengine/api/mail.py for it to actually pull out the header_list from the protocol_message and add them to the email) I am not inclined to try to shave that yak. > > >> > This will also allow us to show threads in web interface in future >> > without > >> extra >> > MapReduce migrations. >> > > Good point. I was so focused on getting the emails threaded I hadn't >> > thought > >> about its utility in the web interface. >> > > http://codereview.appspot.com/**4931044/diff/1/codereview/** > views.py#newcode3657<http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3657> > codereview/views.py:3657: if 'X-Google-Appengine-App-Id' in > incoming_msg.original: > On 2011/09/26 23:20:08, Kaelyn wrote: > >> On 2011/09/15 08:48:34, techtonik wrote: >> > I believe the purpose of this check is to prevent indefinite loops >> > when > >> somebody >> > tricks AppEngine to send email to its own address. >> > > Indeed it was. But I'm also relying on Rietveld/AppEngine sending an >> > email to > >> itself (since it already CC's itself in the messages it sends out) so >> > I can get > >> the SMTP-server-assigned Message-ID from the email and update the >> > email_id of > >> the original Message with it. >> > > Oh. I see. There is no way we can get Message-ID. That a big difference. > I think this feature is worthy an enhancement at AppEngine tracker. > > > > >> > As for the code below - this stuff is better to be done with >> > MapReduce job, > >> but >> > I think we can safely remove that. It won't join messages that were >> > already > >> sent >> > in threads. >> > > Um, I don't understand what you are talking about, with MapReduce jobs >> > and > >> joining messages. As far as I could tell through the AppEngine docs >> > and local > >> testing of my changes, the code below only updates an existing message >> > with an > >> email_id; it won't create duplicate messages, and defaults to the >> > pre-patch > >> behavior of raising an error on the incoming message if it does not >> > find a > >> Message to update in the datastore. >> > > I meant 'link' instead of 'join'. If message is already mailed - this > code won't help to join them in threads, so it seemed useless. But I see > it is the way to get Message-ID and seems to be the only way. > > > http://codereview.appspot.com/**4931044/diff/1/static/script.**js<http://code... > File static/script.js (right): > > http://codereview.appspot.com/**4931044/diff/1/static/script.** > js#newcode695<http://codereview.appspot.com/4931044/diff/1/static/script.js#newcode695> > static/script.js:695: form.insertBefore(msgIdInput, form.firstChild); > On 2011/09/26 23:20:08, Kaelyn wrote: > >> On 2011/09/15 08:48:34, techtonik wrote: >> > JFYI, the input box can be just copied or even constructed from >> > scratch. > >> > Textarea is moved back and forth to preserve text contents when user >> > accidentally clicks a link and then hits back button. To make >> > browser remember > >> > content, Textarea element should be present on a page when the page >> > is loaded. > > From what I could discern of the JS code and the associated HTML page, >> > it is > >> also needed--at least for the hidden input widget--to associate a >> > piece of data > >> (the email_id) of a particular comment with the form that is actually >> > submitted. > >> There can be multiple comments with different values for email_id, but >> > only one > >> form that is submitted and prior to this patch the form does not >> > include any > >> information about *which* comment was being replied to. Hence the >> > moving of an > >> appropriate hidden input field containing the email_id to and from the >> to-be-submitted form in the same manner as the textarea element(s). >> > > All right. If there is no information about parent of the message that > is being submitted, then it should be fixed, of course. But I would > prefer to review this in a separate commit. And it should be not > email_id, but some parent message id. > > http://codereview.appspot.com/**4931044/<http://codereview.appspot.com/4931044/> >
Sign in to reply to this message.
|