|
|
Descriptionencoding: added the quotedprintable package.
Fixes issue 4943.
Fixes issue 4687.
Fixes issue 6611.
Fixes issue 7079.
Fixes issue 7140.
Patch Set 1 #Patch Set 2 : diff -r ea9b86f1ad2d https://code.google.com/p/go/ #Patch Set 3 : diff -r ea9b86f1ad2d https://code.google.com/p/go/ #Patch Set 4 : diff -r ea9b86f1ad2d https://code.google.com/p/go/ #
Total comments: 8
Patch Set 5 : diff -r ea9b86f1ad2d https://code.google.com/p/go/ #
Total comments: 14
Patch Set 6 : diff -r ea9b86f1ad2d https://code.google.com/p/go/ #Patch Set 7 : diff -r ea9b86f1ad2d https://code.google.com/p/go/ #
Total comments: 21
Patch Set 8 : diff -r ea9b86f1ad2d https://code.google.com/p/go/ #Patch Set 9 : diff -r 42bcf6ae4aff0511890d4fe35aa15f14b7b751c8 https://code.google.com/p/go/ #
Total comments: 15
Patch Set 10 : diff -r 601ad8b625532388ff9dee86409bf32c1f2f624a https://code.google.com/p/go/ #Patch Set 11 : diff -r 601ad8b625532388ff9dee86409bf32c1f2f624a https://code.google.com/p/go/ #
Total comments: 16
Patch Set 12 : diff -r 4085766e83907da960acf906f483372e1cce7130 https://code.google.com/p/go/ #
MessagesTotal messages: 40
Hello golang-codereviews@googlegroups.com, I'd like you to review this change to https://code.google.com/p/go/
Sign in to reply to this message.
I think quotedprintable package is not the right place to put functions for =?charset?encoding?encoded-text?=. It has nothing to do with the string encoded in that way unless <encoding> part is "Q". If we add these functions to this package, we probably need add a decoder for "=?UTF-8?B?...?=" to base64 package. I feel that's wrong. They should be added to mime package IMO.
Sign in to reply to this message.
That is right but quoted-printable and Q encoding are very similar so if we do not put all these functions in the same package we will have to duplicate code. Since quoted-printable is hardly used outside emails, I think all these functions should go to mime/quotedprintable or even mime. These encodings are defined in MIME specs and are not general encodings like base64 or hex.
Sign in to reply to this message.
Quoted printable is an encoding scheme, so I think encoding directory is the right place to put it. I don't understand why we'd have to have duplicate code in encoding and mime if we remove encoded-word (=?*charset*... stuff) from this package. You can still write a filtering Reader that recognizes "=?*Q|B*?" in an input, searches for the ending "?=", and passes the substring to quoted-printable or base64 decoder function. At least I'd like to say that the function only handles UTF-8 and Q encoding. That's too specific to be included to the standard library. On Thu, Jun 19, 2014 at 3:56 PM, <alexandre.cesaro@gmail.com> wrote: > That is right but quoted-printable and Q encoding are very similar so if > we do not put all these functions in the same package we will have to > duplicate code. > > Since quoted-printable is hardly used outside emails, I think all these > functions should go to mime/quotedprintable or even mime. These > encodings are defined in MIME specs and are not general encodings like > base64 or hex. > > https://codereview.appspot.com/101330049/ >
Sign in to reply to this message.
After looking more closely, the only duplicated code would be the encodeByte function (since Q encoding must encode more characters than quoted-printable like "?", "=" and "_") and some one-line functions like isVchar. Decoding headers would also be slightly slower but I think it would not be significant. So I think there are 3 choices: 1. Keep all functions in a package like encoding/quotedprintable or mime. 2. Keep quoted-printable functions in encoding/quotedprintable and move headers functions to a package in mime and duplicate encodeByte 3. Same as above but export encodeByte in encoding/quotedprintable On Fri, Jun 20, 2014 at 2:29 AM, Rui Ueyama <ruiu@google.com> wrote: > Quoted printable is an encoding scheme, so I think encoding directory is > the right place to put it. > > I don't understand why we'd have to have duplicate code in encoding and > mime if we remove encoded-word (=?*charset*... stuff) from this package. > You can still write a filtering Reader that recognizes "=?*Q|B*?" in an > input, searches for the ending "?=", and passes the substring to > quoted-printable or base64 decoder function. > > At least I'd like to say that the function only handles UTF-8 and Q > encoding. That's too specific to be included to the standard library. > > On Thu, Jun 19, 2014 at 3:56 PM, <alexandre.cesaro@gmail.com> wrote: > >> That is right but quoted-printable and Q encoding are very similar so if >> we do not put all these functions in the same package we will have to >> duplicate code. >> >> Since quoted-printable is hardly used outside emails, I think all these >> functions should go to mime/quotedprintable or even mime. These >> encodings are defined in MIME specs and are not general encodings like >> base64 or hex. >> >> https://codereview.appspot.com/101330049/ >> > >
Sign in to reply to this message.
On an unrelated matter, the Decode function would be much cleaner and simpler if the white spaces at the end of lines were not trimmed. I did it because it was done in the previous code but it is not defined in the spec. Brad it looks like you are the author of the previous code. Is there a specific reason white spaces at the end of lines are trimmed in quoted-printable decoding or can I remove that part of the code? On Fri, Jun 20, 2014 at 5:01 PM, Alexandre Cesaro < alexandre.cesaro@gmail.com> wrote: > After looking more closely, the only duplicated code would be the > encodeByte function (since Q encoding must encode more characters than > quoted-printable like "?", "=" and "_") and some one-line functions like > isVchar. Decoding headers would also be slightly slower but I think it > would not be significant. > > So I think there are 3 choices: > > 1. Keep all functions in a package like encoding/quotedprintable or > mime. > 2. Keep quoted-printable functions in encoding/quotedprintable and > move headers functions to a package in mime and duplicate encodeByte > 3. Same as above but export encodeByte in encoding/quotedprintable > > > > On Fri, Jun 20, 2014 at 2:29 AM, Rui Ueyama <ruiu@google.com> wrote: > >> Quoted printable is an encoding scheme, so I think encoding directory is >> the right place to put it. >> >> I don't understand why we'd have to have duplicate code in encoding and >> mime if we remove encoded-word (=?*charset*... stuff) from this package. >> You can still write a filtering Reader that recognizes "=?*Q|B*?" in an >> input, searches for the ending "?=", and passes the substring to >> quoted-printable or base64 decoder function. >> >> At least I'd like to say that the function only handles UTF-8 and Q >> encoding. That's too specific to be included to the standard library. >> >> On Thu, Jun 19, 2014 at 3:56 PM, <alexandre.cesaro@gmail.com> wrote: >> >>> That is right but quoted-printable and Q encoding are very similar so if >>> we do not put all these functions in the same package we will have to >>> duplicate code. >>> >>> Since quoted-printable is hardly used outside emails, I think all these >>> functions should go to mime/quotedprintable or even mime. These >>> encodings are defined in MIME specs and are not general encodings like >>> base64 or hex. >>> >>> https://codereview.appspot.com/101330049/ >>> >> >> >
Sign in to reply to this message.
On Fri, Jun 20, 2014 at 8:01 AM, Alexandre Cesaro < alexandre.cesaro@gmail.com> wrote: > After looking more closely, the only duplicated code would be the > encodeByte function (since Q encoding must encode more characters than > quoted-printable like "?", "=" and "_") and some one-line functions like > isVchar. Decoding headers would also be slightly slower but I think it > would not be significant. > Base64 supports two encodings, web-safe and the standard one. I'd do the same thing thing for Q encoding and the standard quoted-printable. So I think there are 3 choices: > > 1. Keep all functions in a package like encoding/quotedprintable or > mime. > 2. Keep quoted-printable functions in encoding/quotedprintable and > move headers functions to a package in mime and duplicate encodeByte > 3. Same as above but export encodeByte in encoding/quotedprintable > > > > On Fri, Jun 20, 2014 at 2:29 AM, Rui Ueyama <ruiu@google.com> wrote: > >> Quoted printable is an encoding scheme, so I think encoding directory is >> the right place to put it. >> >> I don't understand why we'd have to have duplicate code in encoding and >> mime if we remove encoded-word (=?*charset*... stuff) from this package. >> You can still write a filtering Reader that recognizes "=?*Q|B*?" in an >> input, searches for the ending "?=", and passes the substring to >> quoted-printable or base64 decoder function. >> >> At least I'd like to say that the function only handles UTF-8 and Q >> encoding. That's too specific to be included to the standard library. >> >> On Thu, Jun 19, 2014 at 3:56 PM, <alexandre.cesaro@gmail.com> wrote: >> >>> That is right but quoted-printable and Q encoding are very similar so if >>> we do not put all these functions in the same package we will have to >>> duplicate code. >>> >>> Since quoted-printable is hardly used outside emails, I think all these >>> functions should go to mime/quotedprintable or even mime. These >>> encodings are defined in MIME specs and are not general encodings like >>> base64 or hex. >>> >>> https://codereview.appspot.com/101330049/ >>> >> >> >
Sign in to reply to this message.
I am not sure it is a good idea. It will make the API more complicated (NewEncoder, etc) for a use case that is not needed. If we add the function to encode MIME headers directly, no one will need a function that only generate the encoded text of an encoded word. The base64 package allows you to define the alphabet used to encode/decode because that is how it is defined in the RFC. On Fri, Jun 20, 2014 at 9:43 PM, Rui Ueyama <ruiu@google.com> wrote: > On Fri, Jun 20, 2014 at 8:01 AM, Alexandre Cesaro < > alexandre.cesaro@gmail.com> wrote: > >> After looking more closely, the only duplicated code would be the >> encodeByte function (since Q encoding must encode more characters than >> quoted-printable like "?", "=" and "_") and some one-line functions like >> isVchar. Decoding headers would also be slightly slower but I think it >> would not be significant. >> > > Base64 supports two encodings, web-safe and the standard one. I'd do the > same thing thing for Q encoding and the standard quoted-printable. > > So I think there are 3 choices: >> >> 1. Keep all functions in a package like encoding/quotedprintable or >> mime. >> 2. Keep quoted-printable functions in encoding/quotedprintable and >> move headers functions to a package in mime and duplicate encodeByte >> 3. Same as above but export encodeByte in encoding/quotedprintable >> >> >> >> On Fri, Jun 20, 2014 at 2:29 AM, Rui Ueyama <ruiu@google.com> wrote: >> >>> Quoted printable is an encoding scheme, so I think encoding directory is >>> the right place to put it. >>> >>> I don't understand why we'd have to have duplicate code in encoding and >>> mime if we remove encoded-word (=?*charset*... stuff) from this >>> package. You can still write a filtering Reader that recognizes "=?*Q|B*?" >>> in an input, searches for the ending "?=", and passes the substring to >>> quoted-printable or base64 decoder function. >>> >>> At least I'd like to say that the function only handles UTF-8 and Q >>> encoding. That's too specific to be included to the standard library. >>> >>> On Thu, Jun 19, 2014 at 3:56 PM, <alexandre.cesaro@gmail.com> wrote: >>> >>>> That is right but quoted-printable and Q encoding are very similar so if >>>> we do not put all these functions in the same package we will have to >>>> duplicate code. >>>> >>>> Since quoted-printable is hardly used outside emails, I think all these >>>> functions should go to mime/quotedprintable or even mime. These >>>> encodings are defined in MIME specs and are not general encodings like >>>> base64 or hex. >>>> >>>> https://codereview.appspot.com/101330049/ >>>> >>> >>> >> >
Sign in to reply to this message.
What do you think? On Sun, Jun 22, 2014 at 1:59 PM, Alexandre Cesaro < alexandre.cesaro@gmail.com> wrote: > I am not sure it is a good idea. It will make the API more complicated > (NewEncoder, etc) for a use case that is not needed. If we add the function > to encode MIME headers directly, no one will need a function that only > generate the encoded text of an encoded word. > > The base64 package allows you to define the alphabet used to encode/decode > because that is how it is defined in the RFC. > > > On Fri, Jun 20, 2014 at 9:43 PM, Rui Ueyama <ruiu@google.com> wrote: > >> On Fri, Jun 20, 2014 at 8:01 AM, Alexandre Cesaro < >> alexandre.cesaro@gmail.com> wrote: >> >>> After looking more closely, the only duplicated code would be the >>> encodeByte function (since Q encoding must encode more characters than >>> quoted-printable like "?", "=" and "_") and some one-line functions like >>> isVchar. Decoding headers would also be slightly slower but I think it >>> would not be significant. >>> >> >> Base64 supports two encodings, web-safe and the standard one. I'd do the >> same thing thing for Q encoding and the standard quoted-printable. >> >> So I think there are 3 choices: >>> >>> 1. Keep all functions in a package like encoding/quotedprintable or >>> mime. >>> 2. Keep quoted-printable functions in encoding/quotedprintable and >>> move headers functions to a package in mime and duplicate encodeByte >>> 3. Same as above but export encodeByte in encoding/quotedprintable >>> >>> >>> >>> On Fri, Jun 20, 2014 at 2:29 AM, Rui Ueyama <ruiu@google.com> wrote: >>> >>>> Quoted printable is an encoding scheme, so I think encoding directory >>>> is the right place to put it. >>>> >>>> I don't understand why we'd have to have duplicate code in encoding and >>>> mime if we remove encoded-word (=?*charset*... stuff) from this >>>> package. You can still write a filtering Reader that recognizes "=? >>>> *Q|B*?" in an input, searches for the ending "?=", and passes the >>>> substring to quoted-printable or base64 decoder function. >>>> >>>> At least I'd like to say that the function only handles UTF-8 and Q >>>> encoding. That's too specific to be included to the standard library. >>>> >>>> On Thu, Jun 19, 2014 at 3:56 PM, <alexandre.cesaro@gmail.com> wrote: >>>> >>>>> That is right but quoted-printable and Q encoding are very similar so >>>>> if >>>>> we do not put all these functions in the same package we will have to >>>>> duplicate code. >>>>> >>>>> Since quoted-printable is hardly used outside emails, I think all these >>>>> functions should go to mime/quotedprintable or even mime. These >>>>> encodings are defined in MIME specs and are not general encodings like >>>>> base64 or hex. >>>>> >>>>> https://codereview.appspot.com/101330049/ >>>>> >>>> >>>> >>> >> >
Sign in to reply to this message.
Sorry, this isn't a thorough review yet. Just looking at the API. I'm back to leaning that this should be in mime/quotedprintable instead of encoding, just because I think it'll grow MIME warts over time regardless. https://codereview.appspot.com/101330049/diff/60001/src/pkg/encoding/quotedpr... File src/pkg/encoding/quotedprintable/quotedprintable.go (right): https://codereview.appspot.com/101330049/diff/60001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/quotedprintable.go:25: func Encode(dst, src []byte) (n int) { If you're going to name n in the result parameter, I'd expect to see it used in docs, otherwise it doesn't add much and I might remove its name. https://codereview.appspot.com/101330049/diff/60001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/quotedprintable.go:66: const hextable = "0123456789ABCDEF" let's call this upperhex https://codereview.appspot.com/101330049/diff/60001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/quotedprintable.go:193: // NewDecoder returns a new quoted-printable stream decoder. do we use the word "stream" anywhere else for talking about io readers and writers? https://codereview.appspot.com/101330049/diff/60001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/quotedprintable.go:268: // isNewline returns true if c is a newline character. s/returns true if/reports whether/ to use Go standard wording.
Sign in to reply to this message.
Yes I also think this should be in the mime/quotedprintable package. Quoted-printable is not a general encoding scheme like base64 or hex. It is defined in MIME specs and it is unlikely to be used elsewhere. Also if we decide to put headers encoding functions in another package than mime/quotedprintable (like mime or mime/headers) and if internal packages are validated, we can put the shared code in mime/internal. https://codereview.appspot.com/101330049/diff/60001/src/pkg/encoding/quotedpr... File src/pkg/encoding/quotedprintable/quotedprintable.go (right): https://codereview.appspot.com/101330049/diff/60001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/quotedprintable.go:193: // NewDecoder returns a new quoted-printable stream decoder. Yes in the base64 package: http://golang.org/pkg/encoding/base64/#NewDecoder On 2014/07/01 16:22:07, bradfitz wrote: > do we use the word "stream" anywhere else for talking about io readers and > writers?
Sign in to reply to this message.
https://codereview.appspot.com/101330049/diff/60001/src/pkg/encoding/quotedpr... File src/pkg/encoding/quotedprintable/quotedprintable.go (right): https://codereview.appspot.com/101330049/diff/60001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/quotedprintable.go:25: func Encode(dst, src []byte) (n int) { On 2014/07/01 16:22:07, bradfitz wrote: > If you're going to name n in the result parameter, I'd expect to see it used in > docs, otherwise it doesn't add much and I might remove its name. Done. https://codereview.appspot.com/101330049/diff/60001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/quotedprintable.go:66: const hextable = "0123456789ABCDEF" On 2014/07/01 16:22:07, bradfitz wrote: > let's call this upperhex Done. https://codereview.appspot.com/101330049/diff/60001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/quotedprintable.go:268: // isNewline returns true if c is a newline character. On 2014/07/01 16:22:07, bradfitz wrote: > s/returns true if/reports whether/ to use Go standard wording. Done.
Sign in to reply to this message.
https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... File src/pkg/encoding/quotedprintable/header.go (right): https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/header.go:22: Q = "Q" let's make these proper types. type Encoding string Q Encoding = "Q" etc Then below it can take an Encoding instead of a string, which is better documentation-wise. https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/header.go:41: func NewHeaderEncoder(charset string, enc string) (*HeaderEncoder, error) { enc Encoding? The docs didn't say what this meant anyway, so having a type here would help clarify. https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/header.go:52: return &HeaderEncoder{charset, enc, splitWords}, nil don't use a tagless struct literal like this when multiple fields are the same type. name the fields. https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/header.go:57: func (e *HeaderEncoder) EncodeHeader(s string) string { if the type is already called HeaderEncoder, does this need to have Header in the method name too? Could it be just Encode? https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/header.go:77: buf := new(bytes.Buffer) this whole package seems very allocation-heavy and biased towards strings. It'd be nice if we could have a more append-to-[]byte pattern, and let others only convert to strings if they want to. I'm afraid some of this API as-is forces callers to allocate a lot. https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/header.go:181: // is encoded in the returned charset. So text is not necessarily encoded in Drop the word "So " https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/header.go:182: // UTF-8. As such, this function does not support decoding headers with multiple And drop "As such, "
Sign in to reply to this message.
https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... File src/pkg/encoding/quotedprintable/header.go (right): https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/header.go:22: Q = "Q" On 2014/07/09 20:15:06, bradfitz wrote: > let's make these proper types. > > type Encoding string > > Q Encoding = "Q" > etc > > Then below it can take an Encoding instead of a string, which is better > documentation-wise. Done. https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/header.go:41: func NewHeaderEncoder(charset string, enc string) (*HeaderEncoder, error) { On 2014/07/09 20:15:06, bradfitz wrote: > enc Encoding? > > The docs didn't say what this meant anyway, so having a type here would help > clarify. Done. https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/header.go:52: return &HeaderEncoder{charset, enc, splitWords}, nil On 2014/07/09 20:15:06, bradfitz wrote: > don't use a tagless struct literal like this when multiple fields are the same > type. name the fields. Done. https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/header.go:57: func (e *HeaderEncoder) EncodeHeader(s string) string { On 2014/07/09 20:15:06, bradfitz wrote: > if the type is already called HeaderEncoder, does this need to have Header in > the method name too? Could it be just Encode? Done. https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/header.go:77: buf := new(bytes.Buffer) On 2014/07/09 20:15:06, bradfitz wrote: > this whole package seems very allocation-heavy and biased towards strings. It'd > be nice if we could have a more append-to-[]byte pattern, and let others only > convert to strings if they want to. > > I'm afraid some of this API as-is forces callers to allocate a lot. I think the most similar functions in the standard library are found in the html package (http://golang.org/pkg/html/) and they also use strings as parameters and return values. As a reminder, the functions of this package convert the value of a header like "Subject: Café" to "Subject: =?UTF-8?Q?Caf=C3=A9?=". So the string lengths involved are usually quite short which means returning a string does not allocate more than returning a byte slice. In practice, both in the current standard library (https://code.google.com/p/go/source/browse/src/pkg/net/mail/message.go?name=d...) and in my email library (https://github.com/alexcesaro/mail/blob/40cb059905d30e16c1a94cc381b38897aeb0f...) using strings is the most optimal choice and doing otherwise would add an overhead. It is also easier to use with net/mail.Header which is a map of []string. What do you think? https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/header.go:181: // is encoded in the returned charset. So text is not necessarily encoded in On 2014/07/09 20:15:06, bradfitz wrote: > Drop the word "So " Done. https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... src/pkg/encoding/quotedprintable/header.go:182: // UTF-8. As such, this function does not support decoding headers with multiple On 2014/07/09 20:15:06, bradfitz wrote: > And drop "As such, " Done.
Sign in to reply to this message.
On Wed, Jul 16, 2014 at 7:24 PM, <alexandre.cesaro@gmail.com> wrote: > > https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... > src/pkg/encoding/quotedprintable/header.go:77: buf := new(bytes.Buffer) > On 2014/07/09 20:15:06, bradfitz wrote: > >> this whole package seems very allocation-heavy and biased towards >> > strings. It'd > >> be nice if we could have a more append-to-[]byte pattern, and let >> > others only > >> convert to strings if they want to. >> > > I'm afraid some of this API as-is forces callers to allocate a lot. >> > > I think the most similar functions in the standard library are found in > the html package (http://golang.org/pkg/html/) and they also use strings > as parameters and return values. > > As a reminder, the functions of this package convert the value of a > header like "Subject: Café" to "Subject: =?UTF-8?Q?Caf=C3=A9?=". So the > string lengths involved are usually quite short which means returning a > string does not allocate more than returning a byte slice. > > In practice, both in the current standard library > (https://code.google.com/p/go/source/browse/src/pkg/net/ > mail/message.go?name=default&r=0d029f2ef37708a258d2cdd9092edb > d9d34c1978#442) > and in my email library > (https://github.com/alexcesaro/mail/blob/40cb059905d30e16c1a94cc381b388 > 97aeb0fab5/gomail/gomail.go#L104) > using strings is the most optimal choice and doing otherwise would add > an overhead. > It is also easier to use with net/mail.Header which is a map of > []string. > > What do you think? > I just found that my assumption that returning a small string was allocating only once was wrong ( https://groups.google.com/d/msg/golang-dev/ymhjBdQemUY/COk59BJcaUYJ). So I now agree with you. I am changing that right away.
Sign in to reply to this message.
There is a caveat to return []byte. Encode currently returns the input if the string does not need to be encoded. Now it needs to convert it to []byte. And it is the most frequent use-case when encoding email headers. Making the change will allocates twice instead of nothing. We could export the function needsEncoding but the API will then be more complex for the user. So I finally think it should be left as it is now. What do you think? On Thu, Jul 17, 2014 at 5:19 PM, Alexandre Cesaro < alexandre.cesaro@gmail.com> wrote: > > On Wed, Jul 16, 2014 at 7:24 PM, <alexandre.cesaro@gmail.com> wrote: > >> >> https://codereview.appspot.com/101330049/diff/80001/src/pkg/encoding/quotedpr... >> src/pkg/encoding/quotedprintable/header.go:77: buf := new(bytes.Buffer) >> On 2014/07/09 20:15:06, bradfitz wrote: >> >>> this whole package seems very allocation-heavy and biased towards >>> >> strings. It'd >> >>> be nice if we could have a more append-to-[]byte pattern, and let >>> >> others only >> >>> convert to strings if they want to. >>> >> >> I'm afraid some of this API as-is forces callers to allocate a lot. >>> >> >> I think the most similar functions in the standard library are found in >> the html package (http://golang.org/pkg/html/) and they also use strings >> as parameters and return values. >> >> As a reminder, the functions of this package convert the value of a >> header like "Subject: Café" to "Subject: =?UTF-8?Q?Caf=C3=A9?=". So the >> string lengths involved are usually quite short which means returning a >> string does not allocate more than returning a byte slice. >> >> In practice, both in the current standard library >> (https://code.google.com/p/go/source/browse/src/pkg/net/ >> mail/message.go?name=default&r=0d029f2ef37708a258d2cdd9092edb >> d9d34c1978#442) >> and in my email library >> (https://github.com/alexcesaro/mail/blob/40cb059905d30e16c1a94cc381b388 >> 97aeb0fab5/gomail/gomail.go#L104) >> using strings is the most optimal choice and doing otherwise would add >> an overhead. >> It is also easier to use with net/mail.Header which is a map of >> []string. >> >> What do you think? >> > > I just found that my assumption that returning a small string was > allocating only once was wrong ( > https://groups.google.com/d/msg/golang-dev/ymhjBdQemUY/COk59BJcaUYJ). > > So I now agree with you. I am changing that right away. > >
Sign in to reply to this message.
Did you get a chance to have a look? I've got some time on my hands to work on this issue if needed.
Sign in to reply to this message.
Here are some drive-by comments and nits. https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... File src/pkg/encoding/quotedprintable/header.go (right): https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:31: func (enc *Encoding) equals(e Encoding) bool { I'd cut this function, and use 'if enc != Q && enc != B {' and 'if e.encoding == B {' below. https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:91: if e.encoding.equals(B) { There's a lot of deep nesting here. Any chance this can be refactored to make the control flow more obvious? For example, switch e.encoding { case B: e.encodeWordB(buf, s) case Q: e.encodeWordQ(buf, s) } and then encodeWordB can look something more like: if !e.splitWords || base64.StdEncoding.EncodedLen(len(s)) <= maxLen { buf.WriteString(base64.StdEncoding.EncodeToString([]byte(s))) return } var n, last, runeSize int // ... the rest See also: https://code.google.com/p/go-wiki/wiki/CodeReviewComments#Indent_Error_Flow https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:94: buf.WriteString(base64.StdEncoding.EncodeToString([]byte(s))) I think that using a base64.NewEncoder might alloc less than going through EncodeToString and WriteString, but I'm not sure. A couple of benchmarks would be nice; that'd make this easy to test. https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:147: buf.Write([]byte(e.encoding)) I think buf.WriteString(string(e.encoding)) should remove an alloc. Again: Benchmarks. :) https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:150: return 4 + len(e.charset) + len(e.encoding) Easier to follow, and cheap: n := buf.Len() // write stuff return buf.Len() - n https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:186: encodeByte(enc[0:3], b) s/0// https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:263: switch Encoding(strings.ToUpper(enc)) { Is ToUpper here needed for RFC compatibility? https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:301: return dec[:n], fmt.Errorf("quotedprintable: invalid unescaped byte 0x%02x in Q encoded string", c) s/0x%02x/%#02x/ Also, might be useful to include i. https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... File src/pkg/encoding/quotedprintable/header_test.go (right): https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header_test.go:14: fmt.Println(StdHeaderEncoder.Encode("Cofee")) Coffee has two f's. https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header_test.go:25: return Here and below, log.Fatal or panic lets this be a single line. https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header_test.go:68: t.Errorf("NewHeaderEncoder(%q, %q) = error %v, want %v", test.charset, test.encoding, err, error(nil)) No need for error(nil). Just write nil directly in the format string. (Here and elsewhere.) Or just something like "NewHeaderEncoder(%q, %q): %v". The fact that there's the test fails and we're seeing an error message is probably enough context. https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... File src/pkg/encoding/quotedprintable/quotedprintable.go (right): https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/quotedprintable.go:10: // with other broken QP encoders & decoders. Spell out "and". https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/quotedprintable.go:27: for i := 0; i < len(src); i++ { for i, c := range src? https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/quotedprintable.go:51: if i == len(src)-1 || Instead of: if cond { return true } return false Use: return cond https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/quotedprintable.go:114: for i := 0; i < len(src); i++ { Reduce nesting for readability, if possible. https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/quotedprintable.go:243: return 0, fmt.Errorf("quotedprintable: invalid quoted-printable hex byte 0x%02x", b) s/0x%02x/%#02x/
Sign in to reply to this message.
Thank you. I just did the changes. Except those: https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... File src/pkg/encoding/quotedprintable/header.go (right): https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:31: func (enc *Encoding) equals(e Encoding) bool { On 2014/08/20 21:16:57, josharian wrote: > I'd cut this function, and use 'if enc != Q && enc != B {' and 'if e.encoding == > B {' below. Encoding names are case-insensitive: http://tools.ietf.org/html/rfc2047#section-2 https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:94: buf.WriteString(base64.StdEncoding.EncodeToString([]byte(s))) On 2014/08/20 21:16:57, josharian wrote: > I think that using a base64.NewEncoder might alloc less than going through > EncodeToString and WriteString, but I'm not sure. A couple of benchmarks would > be nice; that'd make this easy to test. I benchmarked encoding "¡Hola, señor!" for the current code: BenchmarkBEncode 1000000 1037 ns/op 248 B/op 6 allocs/op I replaced this line by: enc := base64.NewEncoder(base64.StdEncoding, buf) enc.Write([]byte(s[last:i])) enc.Close() And it does not allocate less and is consistently slower: BenchmarkBEncode 1000000 1077 ns/op 248 B/op 6 allocs/op https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:147: buf.Write([]byte(e.encoding)) On 2014/08/20 21:16:57, josharian wrote: > I think buf.WriteString(string(e.encoding)) should remove an alloc. Again: > Benchmarks. :) Nice one! Before: BenchmarkQEncode 1000000 1148 ns/op 168 B/op 3 allocs/op BenchmarkBEncode 1000000 1086 ns/op 248 B/op 6 allocs/op After: BenchmarkQEncode 1000000 1088 ns/op 160 B/op 2 allocs/op BenchmarkBEncode 2000000 1001 ns/op 240 B/op 5 allocs/op https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:263: switch Encoding(strings.ToUpper(enc)) { On 2014/08/20 21:16:57, josharian wrote: > Is ToUpper here needed for RFC compatibility? Yes: http://tools.ietf.org/html/rfc2047#section-2 https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... File src/pkg/encoding/quotedprintable/quotedprintable.go (right): https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/quotedprintable.go:114: for i := 0; i < len(src); i++ { On 2014/08/20 21:16:58, josharian wrote: > Reduce nesting for readability, if possible. It is hard to do without using a struct and then it is much slower in this case. However the code could be much simpler here. As I said in a previous message: "On an unrelated matter, the Decode function would be much cleaner and simpler if the white spaces at the end of lines were not trimmed. I did it because it was done in the previous code but it is not defined in the spec. Brad it looks like you are the author of the previous code. Is there a specific reason white spaces at the end of lines are trimmed in quoted-printable decoding or can I remove that part of the code?" This white-space trimming also led to inaccurate error messages in the previous code: https://github.com/alexcesaro/qpbench/blob/18da7043520f4aef58cb96502f3e2268d3...
Sign in to reply to this message.
> https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... > src/pkg/encoding/quotedprintable/header.go:31: func (enc *Encoding) equals(e > Encoding) bool { > On 2014/08/20 21:16:57, josharian wrote: > > I'd cut this function, and use 'if enc != Q && enc != B {' and 'if e.encoding > == > > B {' below. > > Encoding names are case-insensitive: > http://tools.ietf.org/html/rfc2047#section-2 When you receive encodings as data from the outside world, sure. But I think it's reasonable to insist that folks using the API use quotedprintable.B and quotedprintable.Q and not "b" or "q". I won't push it further, though. > https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... > src/pkg/encoding/quotedprintable/header.go:94: > buf.WriteString(base64.StdEncoding.EncodeToString([]byte(s))) > On 2014/08/20 21:16:57, josharian wrote: > > I think that using a base64.NewEncoder might alloc less than going through > > EncodeToString and WriteString, but I'm not sure. A couple of benchmarks would > > be nice; that'd make this easy to test. > > I benchmarked encoding "¡Hola, señor!" for the current code: > BenchmarkBEncode 1000000 1037 ns/op 248 B/op 6 allocs/op > > I replaced this line by: > enc := base64.NewEncoder(base64.StdEncoding, buf) > enc.Write([]byte(s[last:i])) > enc.Close() > > And it does not allocate less and is consistently slower: > BenchmarkBEncode 1000000 1077 ns/op 248 B/op 6 allocs/op Ah, because you get stuck doing a string/[]byte conversion either way. Sigh. However, a few lines later, you convert in a loop: '[]byte(s[last:i])'. You might want to convert s to a []byte up before the loop and take slices of the []byte each time instead. It doesn't look at a glance like your current benchmarks exercise this code path. > https://codereview.appspot.com/101330049/diff/120001/src/pkg/encoding/quotedp... > src/pkg/encoding/quotedprintable/header.go:147: buf.Write([]byte(e.encoding)) > On 2014/08/20 21:16:57, josharian wrote: > > I think buf.WriteString(string(e.encoding)) should remove an alloc. Again: > > Benchmarks. :) > > Nice one! > > Before: > BenchmarkQEncode 1000000 1148 ns/op 168 B/op 3 allocs/op > BenchmarkBEncode 1000000 1086 ns/op 248 B/op 6 allocs/op > > After: > BenchmarkQEncode 1000000 1088 ns/op 160 B/op 2 allocs/op > BenchmarkBEncode 2000000 1001 ns/op 240 B/op 5 allocs/op Tip: http://godoc.org/code.google.com/p/go.tools/cmd/benchcmp :)
Sign in to reply to this message.
On Thu, Aug 21, 2014 at 8:13 PM, <josharian@gmail.com> wrote: > > https://codereview.appspot.com/101330049/diff/120001/src/ > pkg/encoding/quotedprintable/header.go#newcode31 > >> src/pkg/encoding/quotedprintable/header.go:31: func (enc *Encoding) >> > equals(e > >> Encoding) bool { >> On 2014/08/20 21:16:57, josharian wrote: >> > I'd cut this function, and use 'if enc != Q && enc != B {' and 'if >> > e.encoding > >> == >> > B {' below. >> > > Encoding names are case-insensitive: >> http://tools.ietf.org/html/rfc2047#section-2 >> > > When you receive encodings as data from the outside world, sure. But I > think it's reasonable to insist that folks using the API use > quotedprintable.B and quotedprintable.Q and not "b" or "q". I won't push > it further, though. This function is needed afterwards so enforcing upper-case here won't make the code simpler. That's why I think it should be left as it is. > > > https://codereview.appspot.com/101330049/diff/120001/src/ > pkg/encoding/quotedprintable/header.go#newcode94 > >> src/pkg/encoding/quotedprintable/header.go:94: >> buf.WriteString(base64.StdEncoding.EncodeToString([]byte(s))) >> On 2014/08/20 21:16:57, josharian wrote: >> > I think that using a base64.NewEncoder might alloc less than going >> > through > >> > EncodeToString and WriteString, but I'm not sure. A couple of >> > benchmarks would > >> > be nice; that'd make this easy to test. >> > > I benchmarked encoding "¡Hola, señor!" for the current code: >> BenchmarkBEncode 1000000 1037 ns/op 248 >> B/op 6 >> > allocs/op > > I replaced this line by: >> enc := base64.NewEncoder(base64.StdEncoding, buf) >> enc.Write([]byte(s[last:i])) >> enc.Close() >> > > And it does not allocate less and is consistently slower: >> BenchmarkBEncode 1000000 1077 ns/op 248 >> B/op 6 >> > allocs/op > > Ah, because you get stuck doing a string/[]byte conversion either way. > Sigh. > > However, a few lines later, you convert in a loop: '[]byte(s[last:i])'. > You might want to convert s to a []byte up before the loop and take > slices of the []byte each time instead. It doesn't look at a glance like > your current benchmarks exercise this code path. You are right my previous benchmark was a mess. I did a new one (BenchmarkBEncode encodes ¡Hola, señor!, and BenchmarkBEncode2 encodes 4 concatenations of that string so the word is split): benchmark old ns/op new ns/op delta BenchmarkBEncode 1010 1698 +68.12% BenchmarkBEncode2 3353 3737 +11.45% benchmark old allocs new allocs delta BenchmarkBEncode 5 4 -20.00% BenchmarkBEncode2 9 6 -33.33% benchmark old bytes new bytes delta BenchmarkBEncode 240 1339 +457.92% BenchmarkBEncode2 736 1707 +131.93% There is less allocs using a new encoder but it is slower and allocates more memory. It does not looks like a good trade-off. However converting s to []byte before the loop is a good idea, I will make the change. > > https://codereview.appspot.com/101330049/diff/120001/src/ > pkg/encoding/quotedprintable/header.go#newcode147 > >> src/pkg/encoding/quotedprintable/header.go:147: >> > buf.Write([]byte(e.encoding)) > >> On 2014/08/20 21:16:57, josharian wrote: >> > I think buf.WriteString(string(e.encoding)) should remove an alloc. >> > Again: > >> > Benchmarks. :) >> > > Nice one! >> > > Before: >> BenchmarkQEncode 1000000 1148 ns/op 168 >> B/op 3 >> > allocs/op > >> BenchmarkBEncode 1000000 1086 ns/op 248 >> B/op 6 >> > allocs/op > > After: >> BenchmarkQEncode 1000000 1088 ns/op 160 >> B/op 2 >> > allocs/op > >> BenchmarkBEncode 2000000 1001 ns/op 240 >> B/op 5 >> > allocs/op > > Tip: http://godoc.org/code.google.com/p/go.tools/cmd/benchcmp :) > > > > > https://codereview.appspot.com/101330049/ > Thanks!
Sign in to reply to this message.
https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... File src/pkg/encoding/quotedprintable/header.go (right): https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:26: Q Encoding = "Q" It's odd if B encoding is in "quotedprintable" package. B encoding is of course not a part of Q encoding and neither of them superior to the other. They are created equal. I personally see B encoding more often than Q encoding since for CJK messages they use B encoding. Q encoding is used when your message consists of mostly ASCII with a few non-ASCII characters. If I want to decode a string that I know in B encoding, and I had to import quotedprintable package to do that, I'd probably feel something's wrong.
Sign in to reply to this message.
On Sat, Aug 23, 2014 at 2:31 AM, <ruiu@google.com> wrote: > > https://codereview.appspot.com/101330049/diff/160001/src/ > pkg/encoding/quotedprintable/header.go > File src/pkg/encoding/quotedprintable/header.go (right): > > https://codereview.appspot.com/101330049/diff/160001/src/ > pkg/encoding/quotedprintable/header.go#newcode26 > src/pkg/encoding/quotedprintable/header.go:26: Q Encoding = "Q" > It's odd if B encoding is in "quotedprintable" package. > B encoding is of course not a part of Q encoding and > neither of them superior to the other. > They are created equal. I personally see B encoding more > often than Q encoding since for CJK messages they use B > encoding. Q encoding is used when your message consists > of mostly ASCII with a few non-ASCII characters. > > If I want to decode a string that I know in B encoding, > and I had to import quotedprintable package to do that, > I'd probably feel something's wrong. > > https://codereview.appspot.com/101330049/ > You are right. We discussed it earlier but nothing was settled. First a recap: - quoted-printable encoding is defined in MIME RFC 2045 - MIME header encoding is defined in RFC 2047, it includes 2 encoding schemes: base64 and Q (which is very similar to quoted-printable) - header and quoted-printable functions share some code (around 20 LOC), because Q encoding is similar to quoted-printable So I propose to organize the code like that: - quoted-printable code in package mime/quotedprintable - RFC 2047 code in package mime or mime/headers - shared code in mime/internal What do you think?
Sign in to reply to this message.
https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... File src/pkg/encoding/quotedprintable/header.go (right): https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:31: func (enc *Encoding) equals(e Encoding) bool { I don't think this is necessary. Also, you wouldn't put the method on the pointer anyway. I'd just delete it. https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:39: // HeaderEncoder in an encoder for encoded words. // A HeaderEncoder is an RFC 2047 header word encoder. Or something like that. https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:53: func NewHeaderEncoder(charset string, enc Encoding) (*HeaderEncoder, error) { if you make this a method on on the encoding, then it doesn't need to return an error, and StdHeaderEncoder can be initialized by calling it. And you could remove some docs. https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:54: if !enc.equals(Q) && !enc.equals(B) { enc != "B" && enc != "Q" https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:64: return &HeaderEncoder{charset, enc, splitWords}, nil should this charset be strings.ToUpper too? all the examples in RFC 2047 seem to show uppercase, even though it's defined to be case-insensitive. https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:207: // is encoded in the returned charset. Text is not necessarily encoded in what case is the returned charset? I'd normalize it to something. https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:210: func DecodeHeader(header string) (text string, charset string, err error) { (text, charset string, err error) https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:211: var buf bytes.Buffer you could keep a sync.Pool of these *bytes.Buffers to avoid all the allocs https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:266: var rfc2047 = regexp.MustCompile(`^=\?[\w\-]+\?[bBqQ]\?[^?]+\?=`) bad variable name. pick a more descriptive one and add a comment saying where in RFC 2047 this is defined. https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:295: dec := make([]byte, MaxDecodedLen(len(s))) I'd do a first pass to see how many bytes you actually need, and then allocate the correct amount. https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... File src/pkg/encoding/quotedprintable/quotedprintable.go (right): https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/quotedprintable.go:12: // Package quotedprintable implements quoted-printable and message header encoding as I'd like to see the mime/multipart quotedprintable code be deleted in this CL and use this code. But I don't want to see any of its tests deleted. We can make internal packages if necessary. https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/quotedprintable.go:14: package quotedprintable move this into mime/quotedprintable https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/quotedprintable.go:67: dbuf := make([]byte, MaxEncodedLen(len(src))) I'd use a sync.Pool of *bytes.Buffer instead and then return buf.String()
Sign in to reply to this message.
I think mime/headers is probably too small of a package. We can always put shared code in mime/internal/... if it comes to that, though. On Mon, Aug 25, 2014 at 5:09 AM, Alexandre Cesaro < alexandre.cesaro@gmail.com> wrote: > > On Sat, Aug 23, 2014 at 2:31 AM, <ruiu@google.com> wrote: > >> >> https://codereview.appspot.com/101330049/diff/160001/src/ >> pkg/encoding/quotedprintable/header.go >> File src/pkg/encoding/quotedprintable/header.go (right): >> >> https://codereview.appspot.com/101330049/diff/160001/src/ >> pkg/encoding/quotedprintable/header.go#newcode26 >> src/pkg/encoding/quotedprintable/header.go:26: Q Encoding = "Q" >> It's odd if B encoding is in "quotedprintable" package. >> B encoding is of course not a part of Q encoding and >> neither of them superior to the other. >> They are created equal. I personally see B encoding more >> often than Q encoding since for CJK messages they use B >> encoding. Q encoding is used when your message consists >> of mostly ASCII with a few non-ASCII characters. >> >> If I want to decode a string that I know in B encoding, >> and I had to import quotedprintable package to do that, >> I'd probably feel something's wrong. >> >> https://codereview.appspot.com/101330049/ >> > > > You are right. We discussed it earlier but nothing was settled. First a > recap: > > - quoted-printable encoding is defined in MIME RFC 2045 > - MIME header encoding is defined in RFC 2047, it includes 2 encoding > schemes: base64 and Q (which is very similar to quoted-printable) > - header and quoted-printable functions share some code (around 20 > LOC), because Q encoding is similar to quoted-printable > > > So I propose to organize the code like that: > > - quoted-printable code in package mime/quotedprintable > - RFC 2047 code in package mime or mime/headers > - shared code in mime/internal > > What do you think? >
Sign in to reply to this message.
How about moving header.go under mime as a part of package mime? On Mon, Aug 25, 2014 at 10:10 AM, Brad Fitzpatrick <bradfitz@golang.org> wrote: > I think mime/headers is probably too small of a package. > > We can always put shared code in mime/internal/... if it comes to that, > though. > > > > On Mon, Aug 25, 2014 at 5:09 AM, Alexandre Cesaro < > alexandre.cesaro@gmail.com> wrote: > >> >> On Sat, Aug 23, 2014 at 2:31 AM, <ruiu@google.com> wrote: >> >>> >>> https://codereview.appspot.com/101330049/diff/160001/src/ >>> pkg/encoding/quotedprintable/header.go >>> File src/pkg/encoding/quotedprintable/header.go (right): >>> >>> https://codereview.appspot.com/101330049/diff/160001/src/ >>> pkg/encoding/quotedprintable/header.go#newcode26 >>> src/pkg/encoding/quotedprintable/header.go:26: Q Encoding = "Q" >>> It's odd if B encoding is in "quotedprintable" package. >>> B encoding is of course not a part of Q encoding and >>> neither of them superior to the other. >>> They are created equal. I personally see B encoding more >>> often than Q encoding since for CJK messages they use B >>> encoding. Q encoding is used when your message consists >>> of mostly ASCII with a few non-ASCII characters. >>> >>> If I want to decode a string that I know in B encoding, >>> and I had to import quotedprintable package to do that, >>> I'd probably feel something's wrong. >>> >>> https://codereview.appspot.com/101330049/ >>> >> >> >> You are right. We discussed it earlier but nothing was settled. First a >> recap: >> >> - quoted-printable encoding is defined in MIME RFC 2045 >> - MIME header encoding is defined in RFC 2047, it includes 2 encoding >> schemes: base64 and Q (which is very similar to quoted-printable) >> - header and quoted-printable functions share some code (around 20 >> LOC), because Q encoding is similar to quoted-printable >> >> >> So I propose to organize the code like that: >> >> - quoted-printable code in package mime/quotedprintable >> - RFC 2047 code in package mime or mime/headers >> - shared code in mime/internal >> >> What do you think? >> > >
Sign in to reply to this message.
That sounds fine. On Mon, Aug 25, 2014 at 10:35 AM, Rui Ueyama <ruiu@google.com> wrote: > How about moving header.go under mime as a part of package mime? > > > On Mon, Aug 25, 2014 at 10:10 AM, Brad Fitzpatrick <bradfitz@golang.org> > wrote: > >> I think mime/headers is probably too small of a package. >> >> We can always put shared code in mime/internal/... if it comes to that, >> though. >> >> >> >> On Mon, Aug 25, 2014 at 5:09 AM, Alexandre Cesaro < >> alexandre.cesaro@gmail.com> wrote: >> >>> >>> On Sat, Aug 23, 2014 at 2:31 AM, <ruiu@google.com> wrote: >>> >>>> >>>> https://codereview.appspot.com/101330049/diff/160001/src/ >>>> pkg/encoding/quotedprintable/header.go >>>> File src/pkg/encoding/quotedprintable/header.go (right): >>>> >>>> https://codereview.appspot.com/101330049/diff/160001/src/ >>>> pkg/encoding/quotedprintable/header.go#newcode26 >>>> src/pkg/encoding/quotedprintable/header.go:26: Q Encoding = "Q" >>>> It's odd if B encoding is in "quotedprintable" package. >>>> B encoding is of course not a part of Q encoding and >>>> neither of them superior to the other. >>>> They are created equal. I personally see B encoding more >>>> often than Q encoding since for CJK messages they use B >>>> encoding. Q encoding is used when your message consists >>>> of mostly ASCII with a few non-ASCII characters. >>>> >>>> If I want to decode a string that I know in B encoding, >>>> and I had to import quotedprintable package to do that, >>>> I'd probably feel something's wrong. >>>> >>>> https://codereview.appspot.com/101330049/ >>>> >>> >>> >>> You are right. We discussed it earlier but nothing was settled. First a >>> recap: >>> >>> - quoted-printable encoding is defined in MIME RFC 2045 >>> - MIME header encoding is defined in RFC 2047, it includes 2 >>> encoding schemes: base64 and Q (which is very similar to quoted-printable) >>> - header and quoted-printable functions share some code (around 20 >>> LOC), because Q encoding is similar to quoted-printable >>> >>> >>> So I propose to organize the code like that: >>> >>> - quoted-printable code in package mime/quotedprintable >>> - RFC 2047 code in package mime or mime/headers >>> - shared code in mime/internal >>> >>> What do you think? >>> >> >> >
Sign in to reply to this message.
Ok, so: - quoted-printable code in package mime/quotedprintable - RFC 2047 code in header.go in package mime - shared code in mime/internal I'll do it next week tor the following week. Brad, the Decode function in quotedprintable.go would be much cleaner and simpler if the white spaces at the end of lines were not trimmed. I did it because it was done in the your code but it is not defined in the spec. For example we currently have this test: {in: "foo \n", want: "foo\n"}, Whereas qprint outputs "foo =20\n". Is there a reason white spaces at the end of lines are trimmed or can I remove that part of the code?
Sign in to reply to this message.
The tree closes for non-bugfix changes September 1st. If it's not in by then, it'll have to wait 3 months. On Tue, Aug 26, 2014 at 10:31 AM, Alexandre Cesaro < alexandre.cesaro@gmail.com> wrote: > Ok, so: > > - quoted-printable code in package mime/quotedprintable > - RFC 2047 code in header.go in package mime > - shared code in mime/internal > > I'll do it next week tor the following week. > > Brad, the Decode function in quotedprintable.go would be much cleaner and > simpler if the white spaces at the end of lines were not trimmed. > I did it because it was done in the your code but it is not defined in the > spec. > > For example we currently have this test: > {in: "foo \n", want: "foo\n"}, > > Whereas qprint outputs "foo =20\n". > > Is there a reason white spaces at the end of lines are trimmed or can I > remove that part of the code? >
Sign in to reply to this message.
Ok I will try to do it tomorrow. On Tue, Aug 26, 2014 at 7:35 PM, Brad Fitzpatrick <bradfitz@golang.org> wrote: > The tree closes for non-bugfix changes September 1st. If it's not in by > then, it'll have to wait 3 months. > > > > On Tue, Aug 26, 2014 at 10:31 AM, Alexandre Cesaro < > alexandre.cesaro@gmail.com> wrote: > >> Ok, so: >> >> - quoted-printable code in package mime/quotedprintable >> - RFC 2047 code in header.go in package mime >> - shared code in mime/internal >> >> I'll do it next week tor the following week. >> >> Brad, the Decode function in quotedprintable.go would be much cleaner and >> simpler if the white spaces at the end of lines were not trimmed. >> I did it because it was done in the your code but it is not defined in >> the spec. >> >> For example we currently have this test: >> {in: "foo \n", want: "foo\n"}, >> >> Whereas qprint outputs "foo =20\n". >> >> Is there a reason white spaces at the end of lines are trimmed or can I >> remove that part of the code? >> > >
Sign in to reply to this message.
On Tue, Aug 26, 2014 at 10:31 AM, Alexandre Cesaro < alexandre.cesaro@gmail.com> wrote: > Ok, so: > > - quoted-printable code in package mime/quotedprintable > - RFC 2047 code in header.go in package mime > - shared code in mime/internal > > Sounds good to me, although I'm not sure we will have so much shared code. I'd assume that mime would import quotedprintable, but is there anything in reverse? > > - I'll do it next week tor the following week. > > > Brad, the Decode function in quotedprintable.go would be much cleaner and > simpler if the white spaces at the end of lines were not trimmed. > http://tools.ietf.org/html/rfc2045 says that decoder needs to trim the trailing whitespace (p. 19). Therefore, when decoding a Quoted-Printable body, any trailing white space on a line must be deleted, as it will necessarily have been added by intermediate transport agents. I did it because it was done in the your code but it is not defined in the > spec. > > For example we currently have this test: > {in: "foo \n", want: "foo\n"}, > > Whereas qprint outputs "foo =20\n". > > Is there a reason white spaces at the end of lines are trimmed or can I > remove that part of the code? >
Sign in to reply to this message.
https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... File src/pkg/encoding/quotedprintable/header.go (right): https://codereview.appspot.com/101330049/diff/160001/src/pkg/encoding/quotedp... src/pkg/encoding/quotedprintable/header.go:64: return &HeaderEncoder{charset, enc, splitWords}, nil On 2014/08/25 17:08:34, bradfitz wrote: > should this charset be strings.ToUpper too? all the examples in RFC 2047 seem > to show uppercase, even though it's defined to be case-insensitive. There are some examples of lower-case charset too: http://tools.ietf.org/html/rfc2047#page-5 I do not see the benefits of enforcing upper case here.
Sign in to reply to this message.
I did the changes. I had to update build/deps_test.go. That is why I removed the regexp dependency in header.go Since I'm not familiar with that file please tell me if I did something wrong. I updated the code in packages mime/multipart and net/mail to use the new packages. In package net/mail I could have used the mime.needsEncoding function in Address.String(). I am wondering if we should export needsEncoding (mime.IsEncodingNeeded?) and remove needsEncoding() in mime.HeaderEncoder.Encode. But then it would be a bit more complicated for the user to encode a field like Subject. Maybe we can export needsEncoding without removing it from mime.HeaderEncoder.Encode? I am not sure what to do. What do you think? I am leaving on holidays and will be back on Sunday. On Wed, Aug 27, 2014 at 11:55 AM, <alexandre.cesaro@gmail.com> wrote: > > https://codereview.appspot.com/101330049/diff/160001/src/ > pkg/encoding/quotedprintable/header.go > File src/pkg/encoding/quotedprintable/header.go (right): > > https://codereview.appspot.com/101330049/diff/160001/src/ > pkg/encoding/quotedprintable/header.go#newcode64 > src/pkg/encoding/quotedprintable/header.go:64: return > &HeaderEncoder{charset, enc, splitWords}, nil > On 2014/08/25 17:08:34, bradfitz wrote: > >> should this charset be strings.ToUpper too? all the examples in RFC >> > 2047 seem > >> to show uppercase, even though it's defined to be case-insensitive. >> > > There are some examples of lower-case charset too: > http://tools.ietf.org/html/rfc2047#page-5 > > I do not see the benefits of enforcing upper case here. > > https://codereview.appspot.com/101330049/ >
Sign in to reply to this message.
https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/header.go File src/pkg/mime/header.go (right): https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/header.go#n... src/pkg/mime/header.go:33: type HeaderEncoder struct { Is "header word" a correct term to represent this encoding? I found "encoded-word" but cannot find "header word" in the RFC. https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/header.go#n... src/pkg/mime/header.go:42: var StdHeaderEncoder = Q.NewHeaderEncoder("UTF-8") Is there any reason to call it the standard? As far as I know MIME does not give priority to UTF-8 than the other encodings, so Std sounds misleading. I'd probably call it UTF8QEncoder or something (sorry I don't have a good suggestion on naming here), or just remove it. https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/header.go#n... src/pkg/mime/header.go:129: runeSize = getRuneSize(s, i) I'm not comfortable with getRuneSize function. I'd instead write a function that takes at most maxLen bytes from a given string and returns two strings, prefix and rest. And then call base64.StdEncoding.EncodeToString repeatedly to prefixes. https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/header.go#n... src/pkg/mime/header.go:195: for i+runeSize < len(s) && !utf8.RuneStart(s[i+runeSize]) { I doubt this would correctly handle an invalid UTF-8 sequence that needs to be encoded as U+FFFD. https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/header.go#n... src/pkg/mime/header.go:216: internal.EncodeByte(enc[:3], b) Can you remove "[:3]"? https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/header.go#n... src/pkg/mime/header.go:222: // header. This function does not do any charset conversion, the returned text It's probably nice to avoid word "charset" here, because MIME standard misused it. When they say charset, it actually means encoding. If it really means encoding (it is), I'd prefer encoding over charset. https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/header.go#n... src/pkg/mime/header.go:225: // decoding headers with multiple encoded-words using different charsets. What's the reason to avoid converting encoding in this function? If you don't do that in this function, the user has to do that every time he/she calls DecodeHeader, that does not look nice to me. https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/header.go#n... src/pkg/mime/header.go:257: return "", "", fmt.Errorf("quotedprintable: multiple charsets in header are not supported: %q and %q used", charset, wordCharset) If you convert encoding in this function, you can handle this case too. https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/quotedprint... File src/pkg/mime/quotedprintable/quotedprintable.go (right): https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/quotedprint... src/pkg/mime/quotedprintable/quotedprintable.go:1: // Copyright 2014 The Go Authors. All rights reserved. Now this file is not related to mime, you can move it to src/pkg/encoding/quotedprintable. https://codereview.appspot.com/101330049/diff/240001/src/pkg/net/mail/message.go File src/pkg/net/mail/message.go (right): https://codereview.appspot.com/101330049/diff/240001/src/pkg/net/mail/message... src/pkg/net/mail/message.go:171: name = `"` + a.Name + `"` nit: in order to avoid allocating two strings here and line 176, I'd return `"`+a.Name+`" `+address here. https://codereview.appspot.com/101330049/diff/240001/src/pkg/net/mail/message... src/pkg/net/mail/message.go:173: name = mime.StdHeaderEncoder.Encode(a.Name) and return mime.StdHeaderEncoder.Encode(a.Name) + address here.
Sign in to reply to this message.
Here are some comments. I am on my phone, sorry if my messages are short. https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/header.go File src/pkg/mime/header.go (right): https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/header.go#n... src/pkg/mime/header.go:42: var StdHeaderEncoder = Q.NewHeaderEncoder("UTF-8") On 2014/08/28 04:02:15, ruiu wrote: > Is there any reason to call it the standard? As far as I know MIME does not give > priority to UTF-8 than the other encodings, so Std sounds misleading. I'd > probably call it UTF8QEncoder or something (sorry I don't have a good suggestion > on naming here), or just remove it. Ok to remove it. https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/header.go#n... src/pkg/mime/header.go:129: runeSize = getRuneSize(s, i) On 2014/08/28 04:02:15, ruiu wrote: > I'm not comfortable with getRuneSize function. I'd instead write a function that > takes at most maxLen bytes from a given string and returns two strings, prefix > and rest. And then call base64.StdEncoding.EncodeToString repeatedly to > prefixes. base64.StdEncoding.EncodeToString is not adapted to repeated blocks on small blocks. https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/header.go#n... src/pkg/mime/header.go:195: for i+runeSize < len(s) && !utf8.RuneStart(s[i+runeSize]) { On 2014/08/28 04:02:15, ruiu wrote: > I doubt this would correctly handle an invalid UTF-8 sequence that needs to be > encoded as U+FFFD. GetRuneSize is just used to split words. If it is an invalid UTF-8 sequence the word will not be split. It is the expected behavior. https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/header.go#n... src/pkg/mime/header.go:225: // decoding headers with multiple encoded-words using different charsets. On 2014/08/28 04:02:16, ruiu wrote: > What's the reason to avoid converting encoding in this function? If you don't do > that in this function, the user has to do that every time he/she calls > DecodeHeader, that does not look nice to me. If we add the code to convert every single charset, it will bloat binaries. Users don't always need the charset converting code. I guess that most of the time this package will be used to send emails (where charset converting is not needed) rather than to decode emails. Users that need decoding will use a package like go-charset. That's why I think it is more simple not to convert charsets. It will also prevent constantly having requests to support more charsets.
Sign in to reply to this message.
I did the changes. https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/quotedprint... File src/pkg/mime/quotedprintable/quotedprintable.go (right): https://codereview.appspot.com/101330049/diff/240001/src/pkg/mime/quotedprint... src/pkg/mime/quotedprintable/quotedprintable.go:1: // Copyright 2014 The Go Authors. All rights reserved. On 2014/08/28 04:02:16, ruiu wrote: > Now this file is not related to mime, you can move it to > src/pkg/encoding/quotedprintable. I thought we had settled on the following organization: - quoted-printable code in package mime/quotedprintable - RFC 2047 code in header.go in package mime - shared code in mime/internal Quoted-printable encoding is defined in the MIME specification. It also shares some code with mime/header.go: mime/internal/quotedprintable.go That is why I think it should stay in the mime package.
Sign in to reply to this message.
Message was sent while issue was closed.
I closed this CL an opened a new one because of a bug due to the move of all packages from src to src/pkg: https://codereview.appspot.com/132680044/
Sign in to reply to this message.
Unfortunately this was too late and combined with my vacation immediately after dotGo travels, pushed it past the deadline. Let's pick this up in early December so it has time to bake during the three-month Go 1.5 dev cycle. On Mon, Sep 1, 2014 at 9:31 AM, <alexandre.cesaro@gmail.com> wrote: > I did the changes. > > > https://codereview.appspot.com/101330049/diff/240001/src/ > pkg/mime/quotedprintable/quotedprintable.go > File src/pkg/mime/quotedprintable/quotedprintable.go (right): > > https://codereview.appspot.com/101330049/diff/240001/src/ > pkg/mime/quotedprintable/quotedprintable.go#newcode1 > src/pkg/mime/quotedprintable/quotedprintable.go:1: // Copyright 2014 The > Go Authors. All rights reserved. > On 2014/08/28 04:02:16, ruiu wrote: > >> Now this file is not related to mime, you can move it to >> src/pkg/encoding/quotedprintable. >> > > I thought we had settled on the following organization: > - quoted-printable code in package mime/quotedprintable > - RFC 2047 code in header.go in package mime > - shared code in mime/internal > > Quoted-printable encoding is defined in the MIME specification. It also > shares some code with mime/header.go: mime/internal/quotedprintable.go > That is why I think it should stay in the mime package. > > https://codereview.appspot.com/101330049/ >
Sign in to reply to this message.
We now are early december and the Go 1.5 dev cycle is beginning so it is time to get back at this CL. As I said earlier I closed this CL due to the big package move and opened a new one: https://codereview.appspot.com/132680044/ On Wed, Nov 12, 2014 at 7:13 PM, Brad Fitzpatrick <bradfitz@golang.org> wrote: > Unfortunately this was too late and combined with my vacation immediately > after dotGo travels, pushed it past the deadline. > > Let's pick this up in early December so it has time to bake during the > three-month Go 1.5 dev cycle. > > > On Mon, Sep 1, 2014 at 9:31 AM, <alexandre.cesaro@gmail.com> wrote: > >> I did the changes. >> >> >> https://codereview.appspot.com/101330049/diff/240001/src/ >> pkg/mime/quotedprintable/quotedprintable.go >> File src/pkg/mime/quotedprintable/quotedprintable.go (right): >> >> https://codereview.appspot.com/101330049/diff/240001/src/ >> pkg/mime/quotedprintable/quotedprintable.go#newcode1 >> src/pkg/mime/quotedprintable/quotedprintable.go:1: // Copyright 2014 The >> Go Authors. All rights reserved. >> On 2014/08/28 04:02:16, ruiu wrote: >> >>> Now this file is not related to mime, you can move it to >>> src/pkg/encoding/quotedprintable. >>> >> >> I thought we had settled on the following organization: >> - quoted-printable code in package mime/quotedprintable >> - RFC 2047 code in header.go in package mime >> - shared code in mime/internal >> >> Quoted-printable encoding is defined in the MIME specification. It also >> shares some code with mime/header.go: mime/internal/quotedprintable.go >> That is why I think it should stay in the mime package. >> >> https://codereview.appspot.com/101330049/ >> > >
Sign in to reply to this message.
The tree is now open. See https://groups.google.com/forum/#!topic/golang-dev/otCULnOjs7I for details on how to re-send this using our new git-based process. On Wed, Dec 3, 2014 at 9:47 PM, Alexandre Cesaro <alexandre.cesaro@gmail.com > wrote: > > We now are early december and the Go 1.5 dev cycle is beginning so it is > time to get back at this CL. > > As I said earlier I closed this CL due to the big package move and opened > a new one: https://codereview.appspot.com/132680044/ > > > On Wed, Nov 12, 2014 at 7:13 PM, Brad Fitzpatrick <bradfitz@golang.org> > wrote: > >> Unfortunately this was too late and combined with my vacation immediately >> after dotGo travels, pushed it past the deadline. >> >> Let's pick this up in early December so it has time to bake during the >> three-month Go 1.5 dev cycle. >> >> >> On Mon, Sep 1, 2014 at 9:31 AM, <alexandre.cesaro@gmail.com> wrote: >> >>> I did the changes. >>> >>> >>> https://codereview.appspot.com/101330049/diff/240001/src/ >>> pkg/mime/quotedprintable/quotedprintable.go >>> File src/pkg/mime/quotedprintable/quotedprintable.go (right): >>> >>> https://codereview.appspot.com/101330049/diff/240001/src/ >>> pkg/mime/quotedprintable/quotedprintable.go#newcode1 >>> src/pkg/mime/quotedprintable/quotedprintable.go:1: // Copyright 2014 The >>> Go Authors. All rights reserved. >>> On 2014/08/28 04:02:16, ruiu wrote: >>> >>>> Now this file is not related to mime, you can move it to >>>> src/pkg/encoding/quotedprintable. >>>> >>> >>> I thought we had settled on the following organization: >>> - quoted-printable code in package mime/quotedprintable >>> - RFC 2047 code in header.go in package mime >>> - shared code in mime/internal >>> >>> Quoted-printable encoding is defined in the MIME specification. It also >>> shares some code with mime/header.go: mime/internal/quotedprintable.go >>> That is why I think it should stay in the mime package. >>> >>> https://codereview.appspot.com/101330049/ >>> >> >> >
Sign in to reply to this message.
|