|
|
Created:
4 years, 1 month ago by hanwenn Modified:
4 years ago CC:
lilypond-devel_gnu.org Visibility:
Public. |
Descriptionoutput-distance: write HTML as UTF-8
This should avoid encoding errors on writing data
Patch Set 1 #
Total comments: 4
Patch Set 2 : comments #MessagesTotal messages: 16
Is this likely related to the problems in `make check` that James currently experiences?
Sign in to reply to this message.
On 2020/04/03 22:00:02, dak wrote: > Is this likely related to the problems in `make check` that James currently > experiences? Yes. Unfortunately, the default encoding depends on the environment " In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the " this means that -depending on locale settings- you may get ascii or utf-8 encoding. I didn't get a problem at first, but if I set encoding='ascii' in the open_write_file definition, I also get encoding errors.
Sign in to reply to this message.
https://codereview.appspot.com/563810043/diff/545810044/scripts/build/output-... File scripts/build/output-distance.py (right): https://codereview.appspot.com/563810043/diff/545810044/scripts/build/output-... scripts/build/output-distance.py:1328: system (u'echo HEAD is 人人的乐谱软件 > dir1/tree.gittxt') This seems sort of weird. Forgotten test code?
Sign in to reply to this message.
On 2020/04/03 22:15:06, hanwenn wrote: > On 2020/04/03 22:00:02, dak wrote: > > Is this likely related to the problems in `make check` that James currently > > experiences? > > Yes. > > Unfortunately, the default encoding depends on the environment > > " > In text mode, if encoding is not specified the encoding used is platform > dependent: locale.getpreferredencoding(False) is called to get the > > " > > this means that -depending on locale settings- you may get ascii or utf-8 > encoding. > > I didn't get a problem at first, but if I set encoding='ascii' in the > open_write_file definition, I also get encoding errors. It's even more weird than that, Python changed its default in version 3.7. See also one of my commit messages from January: commit e0c78a4c710c51e1ea87d2b144c0ae713923a2af Author: Jonas Hahnfeld <hahnjo@hahnjo.de> Date: Wed Jan 15 16:39:56 2020 +0100 Issue 5663/1: Use codecs.open() to decode as utf-8 This is in preparation for Python 3.5 where the default encoding depends on the value of the LANG environment variable. As far as I can tell, this was changed later on and at least Python 3.7 and version 3.8 always default to 'utf-8' on Linux. As I'm proposing to make Python 3.5 the required minimum, we can't rely on this and need to force 'utf-8' when reading files that could contain Unicode. So likely James is using Python 3.5 or 3.6, that's why some of us (with other versions of Python) are not seeing the issue. As such: LGTM! Please note that codecs.open() is not needed anymore in Python 3, it was only needed for compatibility with Python 2.4. We should likely replace all occurrences with plain open() as this patch does.
Sign in to reply to this message.
To add another data point: I'm able to reproduce the failure with a self-compiled Python 3.5.9 that I have laying around for testing. The reason is that the latest commit in master has a non-ASCII character in the first line of its commit message [1]. Applying the change to open_write_file solves the issue for me. I'm for pushing this immediately to unblock testing. 1: As a side note, this means there's a very funny workaround: Just push *any* commit to master with a pure ASCII description and you'll not see the failure anymore :D https://codereview.appspot.com/563810043/diff/545810044/scripts/build/output-... File scripts/build/output-distance.py (right): https://codereview.appspot.com/563810043/diff/545810044/scripts/build/output-... scripts/build/output-distance.py:1305: return open (x, 'w', encoding="utf-8") Please change to single quotes to at least conform with the style of this function. We're doing a very bad job at this; we should likely decide on one style and change it everywhere...
Sign in to reply to this message.
On 4/4/20, jonas.hahnfeld@gmail.com <jonas.hahnfeld@gmail.com> wrote: > The reason is that the latest commit in master has a non-ASCII > character in the first line of its commit message [1]. So IIUC it *was* f5f907599ce88d3d610483fa42fa78be12f53d2e in the end… but just because of a char in the commit message!? Man, how fucked up is that? Shall we make it a policy that git messages ought not to include non-ASCII chars? (Or conversely, since we may not have caught the bug otherwise, that git messages preferably *should* include special chars once in a while in case it allows to verify that it doesn’t trigger any weird behavior?) Cheers, V.
Sign in to reply to this message.
On 2020/04/04 07:16:40, hahnjo wrote: > On 2020/04/03 22:15:06, hanwenn wrote: > > On 2020/04/03 22:00:02, dak wrote: > > > Is this likely related to the problems in `make check` that James currently > > > experiences? > > > > Yes. > > > > Unfortunately, the default encoding depends on the environment > > > > " > > In text mode, if encoding is not specified the encoding used is platform > > dependent: locale.getpreferredencoding(False) is called to get the > > > > " > > > > this means that -depending on locale settings- you may get ascii or utf-8 > > encoding. > > > > I didn't get a problem at first, but if I set encoding='ascii' in the > > open_write_file definition, I also get encoding errors. > > It's even more weird than that, Python changed its default in version 3.7. See > also one of my commit messages from January: > > commit e0c78a4c710c51e1ea87d2b144c0ae713923a2af > Author: Jonas Hahnfeld <mailto:hahnjo@hahnjo.de> > Date: Wed Jan 15 16:39:56 2020 +0100 > > Issue 5663/1: Use codecs.open() to decode as utf-8 > > This is in preparation for Python 3.5 where the default encoding > depends on the value of the LANG environment variable. As far as > I can tell, this was changed later on and at least Python 3.7 and > version 3.8 always default to 'utf-8' on Linux. As I'm proposing to > make Python 3.5 the required minimum, we can't rely on this and need > to force 'utf-8' when reading files that could contain Unicode. > > So likely James is using Python 3.5 or 3.6, that's why some of us (with other > versions of Python) are not seeing the issue. > > As such: LGTM! Please note that codecs.open() is not needed anymore in Python 3, > it was only needed for compatibility with Python 2.4. We should likely replace > all occurrences with plain open() as this patch does. Actually I am using Python 2.7. I don't as far as I can tell have any python 3.x on my system. James
Sign in to reply to this message.
On 2020/04/04 09:04:50, pkx166h-lilypond wrote: > Actually I am using Python 2.7. I don't as far as I can tell have any python 3.x > on my system. > > James That's impossible ;-) it's required in master and you tested the patch switching to Python 3, so I'm fairly certain you have it. The executable may be called 'python3' (with 'python' being 2.7), but configure will find it.
Sign in to reply to this message.
On 2020/04/04 09:02:01, valentin_villenave.net wrote: > On 4/4/20, mailto:jonas.hahnfeld@gmail.com <mailto:jonas.hahnfeld@gmail.com> wrote: > > The reason is that the latest commit in master has a non-ASCII > > character in the first line of its commit message [1]. > > So IIUC it *was* f5f907599ce88d3d610483fa42fa78be12f53d2e in the end… > but just because of a char in the commit message!? Man, how fucked up > is that? > > Shall we make it a policy that git messages ought not to include > non-ASCII chars? (Or conversely, since we may not have caught the bug > otherwise, that git messages preferably *should* include special chars > once in a while in case it allows to verify that it doesn’t trigger > any weird behavior?) I don't think there's anything we can cover with a particular policy. IMHO it's just a very weird coincidence of issues from switching to Python 3.
Sign in to reply to this message.
On 2020/04/04 09:07:08, hahnjo wrote: > On 2020/04/04 09:04:50, pkx166h-lilypond wrote: > > Actually I am using Python 2.7. I don't as far as I can tell have any python > 3.x > > on my system. > > > > James > > That's impossible ;-) it's required in master and you tested the patch switching > to Python 3, so I'm fairly certain you have it. The executable may be called > 'python3' (with 'python' being 2.7), but configure will find it. the command 'python --version' gives me 2.7.17 but as you say, configure gives this ... checking for python... python3 checking python3 version... 3.6.9 checking for python3... /usr/bin/python3 ... Thanks James
Sign in to reply to this message.
Valentin Villenave <valentin@villenave.net> writes: > On 4/4/20, jonas.hahnfeld@gmail.com <jonas.hahnfeld@gmail.com> wrote: >> The reason is that the latest commit in master has a non-ASCII >> character in the first line of its commit message [1]. > > So IIUC it *was* f5f907599ce88d3d610483fa42fa78be12f53d2e in the end… > but just because of a char in the commit message!? Man, how fucked up > is that? > > Shall we make it a policy that git messages ought not to include > non-ASCII chars? (Or conversely, since we may not have caught the bug > otherwise, that git messages preferably *should* include special chars > once in a while in case it allows to verify that it doesn’t trigger > any weird behavior?) I think the names of committers will sometimes make sure of that. -- David Kastrup
Sign in to reply to this message.
comments
Sign in to reply to this message.
I will run this through "make check" and push to staging now. https://codereview.appspot.com/563810043/diff/545810044/scripts/build/output-... File scripts/build/output-distance.py (right): https://codereview.appspot.com/563810043/diff/545810044/scripts/build/output-... scripts/build/output-distance.py:1305: return open (x, 'w', encoding="utf-8") On 2020/04/04 07:53:24, hahnjo wrote: > Please change to single quotes to at least conform with the style of this > function. We're doing a very bad job at this; we should likely decide on one > style and change it everywhere... Done. https://codereview.appspot.com/563810043/diff/545810044/scripts/build/output-... scripts/build/output-distance.py:1328: system (u'echo HEAD is 人人的乐谱软件 > dir1/tree.gittxt') On 2020/04/03 23:13:57, dak wrote: > This seems sort of weird. Forgotten test code? on purpose. Added comment.
Sign in to reply to this message.
On 2020/04/04 07:53:24, hahnjo wrote: > 1: As a side note, this means there's a very funny workaround: Just push *any* > commit to master with a pure ASCII description and you'll not see the failure > anymore :D In the distance, a brass band plays "The Star-Spangled Banner."
Sign in to reply to this message.
On Sat, Apr 4, 2020 at 1:43 PM <hanwenn@gmail.com> wrote: > > I will run this through "make check" and push to staging now. > Done. -- Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
Sign in to reply to this message.
commit 146b73cdd061c264b294f58c6bf7f86667505da5 Author: Han-Wen Nienhuys <hanwen@lilypond.org> Date: Fri Apr 3 23:32:38 2020 +0200
Sign in to reply to this message.
|