Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(1768)

Issue 4931044: Store each email's Message-ID for use in In-Reply-To and References email headers

Can't Edit
Can't Publish+Mail
Start Review
Created:
12 years, 8 months ago by Kaelyn
Modified:
12 years, 7 months ago
CC:
codereview-discuss_googlegroups.com
Visibility:
Public.

Patch Set 1 #

Total comments: 22
Unified diffs Side-by-side diffs Delta from patch set Stats (+47 lines, -7 lines) Patch
M .hgignore View 1 chunk +1 line, -0 lines 2 comments Download
M codereview/models.py View 1 chunk +1 line, -0 lines 4 comments Download
M codereview/views.py View 8 chunks +39 lines, -7 lines 11 comments Download
M static/script.js View 1 chunk +3 lines, -0 lines 3 comments Download
M templates/issue.html View 1 chunk +3 lines, -0 lines 2 comments Download

Messages

Total messages: 14
Kaelyn
12 years, 8 months ago (2011-08-23 22:35:52 UTC) #1
techtonik
http://codereview.appspot.com/4931044/diff/1/.hgignore File .hgignore (right): http://codereview.appspot.com/4931044/diff/1/.hgignore#newcode7 .hgignore:7: \..*sw.?$ What's this? http://codereview.appspot.com/4931044/diff/1/codereview/models.py File codereview/models.py (right): http://codereview.appspot.com/4931044/diff/1/codereview/models.py#newcode199 codereview/models.py:199: ...
12 years, 8 months ago (2011-08-29 16:07:11 UTC) #2
Kaelyn
http://codereview.appspot.com/4931044/diff/1/.hgignore File .hgignore (right): http://codereview.appspot.com/4931044/diff/1/.hgignore#newcode7 .hgignore:7: \..*sw.?$ On 2011/08/29 16:07:11, techtonik wrote: > What's this? ...
12 years, 8 months ago (2011-08-29 16:31:06 UTC) #3
techtonik
On 2011/08/29 16:31:06, Kaelyn wrote: > Message-ID header (assigned by the SMTP server) I see. ...
12 years, 8 months ago (2011-08-29 16:46:34 UTC) #4
Andi Albrecht
Is this still active? I'd love to see mails threaded correctly :) -Andi On Mon, ...
12 years, 7 months ago (2011-09-13 14:37:52 UTC) #5
Kaelyn
So would I along with many folks working on Clang/LLVM (especially those of us at ...
12 years, 7 months ago (2011-09-13 16:50:46 UTC) #6
techtonik
In Gmail I see threads linked correctly already, so I am not sure how to ...
12 years, 7 months ago (2011-09-14 15:20:49 UTC) #7
Andi Albrecht
I'm seeing the incorrect threading on my iPhone. Should be easy to test on one ...
12 years, 7 months ago (2011-09-14 15:44:06 UTC) #8
techtonik
It took me almost 2 hours to get myself acquainted with threading specifics. Wheew. =) ...
12 years, 7 months ago (2011-09-15 08:48:34 UTC) #9
Andi Albrecht
http://codereview.appspot.com/4931044/diff/1/codereview/views.py File codereview/views.py (right): http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3668 codereview/views.py:3668: logging.debig('Set message_id = "%s" where issue = "%s" and ...
12 years, 7 months ago (2011-09-15 17:41:42 UTC) #10
Kaelyn
Thanks for the review feedback so far! :) I haven't uploaded a new patch with ...
12 years, 7 months ago (2011-09-26 23:20:07 UTC) #11
Kaelyn
Ping? On Mon, Sep 26, 2011 at 4:20 PM, <rikka@google.com> wrote: > Thanks for the ...
12 years, 7 months ago (2011-10-03 19:41:48 UTC) #12
techtonik
Hi Kaelyn, As you may see it took more time as amount of text to ...
12 years, 7 months ago (2011-10-06 14:53:19 UTC) #13
Kaelyn
12 years, 7 months ago (2011-10-11 20:35:03 UTC) #14
On Thu, Oct 6, 2011 at 7:53 AM, <techtonik@gmail.com> wrote:

> Hi Kaelyn,
>
> As you may see it took more time as amount of text to read through is
> increasing in geometrical progression. =) To make everything happen
> faster, I propose to split the issue into two, as we certainly dealing
> with two different problems:
>
> 1. Linking messages into threads (add parent id for forum-like
> interface)
> 2. Linking mails into threads (collecting email_ids, storing them and
> linking with header fields)
>

The split sounds reasonable to me, now that there will be a linkage between
messages other than through email Message-IDs. I'll keep this rietveld issue
for the second half as there has already been a lot of discussion about the
email side of the changes.


>
> The latter stuff is more complicated, because AppEngine doesn't allow us
> to retrieve email Message-ID field after it has been sent. There should
> be an issue opened for AppEngine and temporary hack on our side. This
> review is too long to be a sane description of the feature, so it will
> require some separate PEP. In this case you may expect faster Pong!
> responses. ,)


I'm not quite sure I understand what you are wanting from AppEngine... as
far as I am aware, AppEngine is not an SMTP server itself but hands off the
emails to a separate SMTP server which then assigns the Message-IDs.
Admittedly I am not familiar with AppEngine or how it is set up in
production, so its dependence on a (local) SMTP server to relay the mail
might just only be for local development instances. But I also can't think
of a reason why it would depend on the presence of sendmail for local dev
instances but include the functionality of an SMTP when deployed in the
cloud. Anyway, assuming AppEngine does not include its own internal SMTP
server and instead passes off the messages to an external SMTP server to be
queued for delivery, the only way AppEngine would be able to retrieve the
email Message-ID after it is sent (after an undetermined amount of time in
the SMTP server's outgoing mail queue) would be to include itself as the
recipient of the email--which is exactly what my patch does.


>
http://codereview.appspot.com/**4931044/diff/1/codereview/**views.py<http://c...
> File codereview/views.py (right):
>
> http://codereview.appspot.com/**4931044/diff/1/codereview/**
>
views.py#newcode3110<http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3110>
> codereview/views.py:3110: message_id='<unknown>')
> On 2011/09/26 23:20:08, Kaelyn wrote:
>
>> On 2011/09/15 08:48:34, techtonik wrote:
>> > I would avoid any magic strings (even if it is transformed to token
>>
> or
>
>> constant
>> > in this case). None is enough.
>>
>
>  The magic string is to distinguish between messages that predate this
>>
> patch and
>
>> messages for which we're expecting to get an email_id but haven't yet
>>
> (lest
>
>> things get very messy for existing issues when this patch goes live).
>>
>
> Where is this used, i.e. where messages without email_id and with
> unknown email_id should be treated differently?


Simple: any message without an email_id set is ignored completely by the
code that extracts the Message-ID from an incoming email and tries to figure
out which message it belongs to. The code only considers messages with the
email_id "<unknown>" as possible matches for the incoming email that had
originated from AppEngine.


>
> http://codereview.appspot.com/**4931044/diff/1/codereview/**
>
views.py#newcode3163<http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3163>
> codereview/views.py:3163: break
> On 2011/09/26 23:20:08, Kaelyn wrote:
>
>> On 2011/09/15 08:48:34, techtonik wrote:
>> > Looks like messages without 'send email' will interrupt the thread,
>>
> because
>
>> they
>> > don't receive email message id from SMTP server, and all subsequent
>>
> replies
>
>> with
>> > 'send email' will be linked directly to the first message. This
>>
> seems broken
>
>> to
>> > me.
>>
>
>  Although this is the one case I did not test locally (due to the pain
>>
> of
>
>> manually feeding sent emails into a local dev AppEngine server), it
>>
> shouldn't
>
>> completely break the thread as only replies to the message without an
>>
> email_id
>
>> would be linked to the first message instead. I could also follow RFC
>>
> 2822 and
>
>> leave out the In-Reply-To and References email headers entirely if the
>>
> reply is
>
>> to a message without an email_id, but figured that would be even less
>>
> useful
>
>> since it would mean spawning new email threads.
>>
>
> We already have unit tests that may already contain necessary code to
> help you automatically feed emails to a local server. It would be nice
>

tests/test_incomingmail.py contains unit tests?! As far as I could tell when
I looked at the file, the tests are functional tests to be executed against
a running rietveld instance. I guess I missed something... ::shrugs::

to have such tests committed anyway.


I first read this as saying the existing unit tests you were talking about
have not been committed for some reason. :p


>
>  > There is an ideal way to fix that:
>> > 1. add own field to Message that stores parent key, so that every
>>
> message is
>
>> > linked to its parent regardless of if it's sent or not
>>
>
>  The only issue is identifying the parent regardless of whether it was
>>
> ever
>
>> emailed. The simplest way is probably to change the the
>>
> 'message-email-id'
>
>> hidden input I added from being the message's email_id to being the
>>
> message's
>
>> key().
>>
>
> I doesn't seem good to me to handle ancestry through JS/template/view.
> It is the role of data model to take care of this stuff. BTW, doesn't
> form already contain message id? Why do you need an additional hidden
> field for that?


No the form does not contain any kind of message id. Each comment block in
the page only contains an index value that is dynamically generated by the
template, and that index number is not part of the form that is submitted.


>  > 2. References: header in email accepts several message-id's, so if
>>
> we reply to
>
>> a
>> > message that wasn't sent, so it is ok if we generate ids for such
>>
> message and
>
>> > include it after the message with true email message ID
>>
>
>  Based on RFC 2822 "Internet Message Format"
>> (http://www.ietf.org/rfc/**rfc2822.txt<http://www.ietf.org/rfc/rfc2822.txt>)
>> doing that would be very bad.
>>
> First, the
>
>> RFC *require* message IDs such as those in References and In-Reply-To
>>
> headers to
>
>> be globally unique.
>>
>
> That's ok. I believe message key is rather unique.


While rendered moot because of the second requirement per the RFC, "rather
unique" is not sufficient for a value that must be globally unique.


>  Second, it explicitly states the message IDs used in the
>> References and In-Reply-To fields to be the value of the parent
>>
> messages'
>
>> Message-ID field (and possibly including the IDs from parents'
>>
> References
>
>> field), or for the headers to not be included if there are no
>>
> Message-IDs.
>
>> Finally, the RFC directly discourages including multiple message IDs
>>
> in the
>
>> References field:
>>     'Note: Some implementations parse the "References:" field to
>>
> display the
>
>> "thread of the discussion".  These implementations assume that each
>>
> new message
>
>> is a reply to a single parent and hence that they can walk backwards
>>
> through the
>
>> "References:" field to find the parent of each message listed there.
>>
> Therefore,
>
>> trying to form a "References:" field for a reply that has multiple
>>
> parents is
>
>> discouraged and how to do so is not defined in this document.'
>>
>
> We don't have multiple parent for each message, so it's ok.


Except that the values of the References field are defined as the Message-ID
of any parents, so including more than one id in the References field means
the message has/references multiple parents. To put it another way, the RFC
explicitly discourages having more than one value in the References field.
Plus including a value in the References field that is not a Message-ID
violates the RFC and as such would lead to undefined behavior in any
software that tries to parse the email (anything from ignoring it to raising
an error about a malformed header to blowing up your monitor and eating your
dog).


>  What would be REALLY nice would be to add a "X-Rietveld-Message-ID"
>>
> header to
>
>> the email containing a unique identifier that rietveld generates and
>>
> stores as a
>
>> field in Message (or just using the str value of Message.key() as long
>>
> as it is
>
>> sane), so that it can accurately fetch and update the email_id of the
>>
> correct
>
>> Message when it receives the email.
>>
>
> Stop. I don't understand why do you need Rietveld to update existing
> Message entity when it _receives_ letter. Correct me if I wrong, but
> when we send message letter - we receive email_id and update message
> entity. When we receive message - it already has email_id, but we don't
> have the entity, so we create entity with email_id already filled in.


We do not receive anything when we send a message, and after the email has
been sent the email_id is still "<unknown>". The email_id cannot be and is
not filled in for a message until the associated email has passed through
the SMTP server, been assigned a unique Message-ID, and then is received
again by Rietveld--at which point Rietveld processes the email, tries to
figure out which Message entity is associated with it (provided the email
was sent by itself), extracts the value of the Message-ID header from the
email, and stores the value as the email_id of the Message entity.


>  But alas AppEngine is overly restrictive
>> about what email headers can be set. Maybe a similar effect could be
>>
> achieved by
>
>> e.g. using GMail-like email aliases, so when sending the email for
>>
> message 1234
>
>> "reply+1234@" is CC'ed instead of simply "reply@"? Having a clear
>>
> reference to
>
>> the Message in the datastore embedded somewhere in the email would
>>
> also
>
>> eliminate the current need for / purpose of the special '<unknown>'
>>
> email_id.
>
> I believe the list of allowable headers is pretty limited (I posted a
> link to AppEngine SDK sources somewhere), but if the problem is properly
> described, then I guess it is possible to add some X-Appname-... header
> support to AppEngine.


According to
http://code.google.com/appengine/docs/python/mail/overview.html#Sending_Mail_...,
the only extra email headers that can be set are In-Reply-To and References.
It would certainly be nice to add support for X-Appname-... headers to
AppEngine, but given my experience trying to fix the support of In-Reply-To
and References in the local Python development server (consisted of adding
three lines of code to mail_message_to_mime_message() in
appengine/api/mail.py for it to actually pull out the header_list from the
protocol_message and add them to the email) I am not inclined to try to
shave that yak.


>  >
>> > This will also allow us to show threads in web interface in future
>>
> without
>
>> extra
>> > MapReduce migrations.
>>
>
>  Good point. I was so focused on getting the emails threaded I hadn't
>>
> thought
>
>> about its utility in the web interface.
>>
>
> http://codereview.appspot.com/**4931044/diff/1/codereview/**
>
views.py#newcode3657<http://codereview.appspot.com/4931044/diff/1/codereview/views.py#newcode3657>
> codereview/views.py:3657: if 'X-Google-Appengine-App-Id' in
> incoming_msg.original:
> On 2011/09/26 23:20:08, Kaelyn wrote:
>
>> On 2011/09/15 08:48:34, techtonik wrote:
>> > I believe the purpose of this check is to prevent indefinite loops
>>
> when
>
>> somebody
>> > tricks AppEngine to send email to its own address.
>>
>
>  Indeed it was. But I'm also relying on Rietveld/AppEngine sending an
>>
> email to
>
>> itself (since it already CC's itself in the messages it sends out) so
>>
> I can get
>
>> the SMTP-server-assigned Message-ID from the email and update the
>>
> email_id of
>
>> the original Message with it.
>>
>
> Oh. I see. There is no way we can get Message-ID. That a big difference.
> I think this feature is worthy an enhancement at AppEngine tracker.
>
>
>  >
>> > As for the code below - this stuff is better to be done with
>>
> MapReduce job,
>
>> but
>> > I think we can safely remove that. It won't join messages that were
>>
> already
>
>> sent
>> > in threads.
>>
>
>  Um, I don't understand what you are talking about, with MapReduce jobs
>>
> and
>
>> joining messages. As far as I could tell through the AppEngine docs
>>
> and local
>
>> testing of my changes, the code below only updates an existing message
>>
> with an
>
>> email_id; it won't create duplicate messages, and defaults to the
>>
> pre-patch
>
>> behavior of raising an error on the incoming message if it does not
>>
> find a
>
>> Message to update in the datastore.
>>
>
> I meant 'link' instead of 'join'. If message is already mailed - this
> code won't help to join them in threads, so it seemed useless. But I see
> it is the way to get Message-ID and seems to be the only way.
>
>
>
http://codereview.appspot.com/**4931044/diff/1/static/script.**js<http://code...
> File static/script.js (right):
>
> http://codereview.appspot.com/**4931044/diff/1/static/script.**
>
js#newcode695<http://codereview.appspot.com/4931044/diff/1/static/script.js#newcode695>
> static/script.js:695: form.insertBefore(msgIdInput, form.firstChild);
> On 2011/09/26 23:20:08, Kaelyn wrote:
>
>> On 2011/09/15 08:48:34, techtonik wrote:
>> > JFYI, the input box can be just copied or even constructed from
>>
> scratch.
>
>> > Textarea is moved back and forth to preserve text contents when user
>> > accidentally clicks a link and then hits back button. To make
>>
> browser remember
>
>> > content, Textarea element should be present on a page when the page
>>
> is loaded.
>
>   From what I could discern of the JS code and the associated HTML page,
>>
> it is
>
>> also needed--at least for the hidden input widget--to associate a
>>
> piece of data
>
>> (the email_id) of a particular comment with the form that is actually
>>
> submitted.
>
>> There can be multiple comments with different values for email_id, but
>>
> only one
>
>> form that is submitted and prior to this patch the form does not
>>
> include any
>
>> information about *which* comment was being replied to. Hence the
>>
> moving of an
>
>> appropriate hidden input field containing the email_id to and from the
>> to-be-submitted form in the same manner as the textarea element(s).
>>
>
> All right. If there is no information about parent of the message that
> is being submitted, then it should be fixed, of course. But I would
> prefer to review this in a separate commit. And it should be not
> email_id, but some parent message id.
>
>
http://codereview.appspot.com/**4931044/<http://codereview.appspot.com/4931044/>
>
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld f62528b