|
|
DescriptionAllow digits in identifiers
Set lexer state to INITIAL for top-level expressions,
switching to 'notes' mode inside music-expressions
Patch Set 1 : version allowing numbers at ends #
Total comments: 21
Patch Set 2 : version with digits, but not at the end #
Total comments: 1
Patch Set 3 : Use a plus character to introduce digits #Patch Set 4 : Use a dot character to introduce digits #
MessagesTotal messages: 16
All in all, a large step backwards for making the lexer behave predictably across modes regarding word and command syntax, connected with several timing problems when parsing. It looks like some _severe_ doctoring around with regard to notenames was done to make regtests pass without an actual understanding of the failure modes, introducing some half-baked in-between modes that don't have a purpose apart from papering over the fact that this patch causes the lexer to be in the wrong mode due to parser lookahead at several points of time. Regtests that showed themselves to be impervious to this papering over got some gratuitous braces added until passing. This is more a demonstration how to game our automated patch submission and regtest evaluation system than a serious proposal. http://codereview.appspot.com/6493072/diff/14/input/regression/page-spacing-n... File input/regression/page-spacing-nonstaff-lines-independent.ly (left): http://codereview.appspot.com/6493072/diff/14/input/regression/page-spacing-n... input/regression/page-spacing-nonstaff-lines-independent.ly:11: \addlyrics { high \skip2 } Clear case of "existing use". In previous syntax discussions, there was some agreement on formatting durations to follow directly their note, like c4. \skip2 would be consistent with that. It is also not clear why c5 should not be a valid identifier when ce5 is. http://codereview.appspot.com/6493072/diff/14/input/regression/phrasing-slur-... File input/regression/phrasing-slur-multiple.ly (left): http://codereview.appspot.com/6493072/diff/14/input/regression/phrasing-slur-... input/regression/phrasing-slur-multiple.ly:21: c4\(\(\sp1\( d4\)\(\sp1\( e4\) f\) | This particular way of calling things is probably not important enough to preserve, but \sp1 is basically a modifier on the following \( and the spacier formatting proposed in the change does not have quite the same expressive power. http://codereview.appspot.com/6493072/diff/14/input/regression/ragged-bottom-... File input/regression/ragged-bottom-one-page.ly (right): http://codereview.appspot.com/6493072/diff/14/input/regression/ragged-bottom-... input/regression/ragged-bottom-one-page.ly:13: \repeat unfold 16 { c'4 } Why are the braces needed here? http://codereview.appspot.com/6493072/diff/14/input/regression/skiptypesettin... File input/regression/skiptypesetting-show-first-and-last.ly (right): http://codereview.appspot.com/6493072/diff/14/input/regression/skiptypesettin... input/regression/skiptypesetting-show-first-and-last.ly:11: showLastLength = { R1*2 } And here? http://codereview.appspot.com/6493072/diff/14/lily/include/lily-lexer.hh File lily/include/lily-lexer.hh (right): http://codereview.appspot.com/6493072/diff/14/lily/include/lily-lexer.hh#newc... lily/include/lily-lexer.hh:101: void push_maybe_note_state (); Uh oh... "maybe_note_state" sounds like a real can of worms. http://codereview.appspot.com/6493072/diff/14/lily/lexer.ll File lily/lexer.ll (left): http://codereview.appspot.com/6493072/diff/14/lily/lexer.ll#oldcode397 lily/lexer.ll:397: <chords,notes,figures>{RESTNAME}/[-_] | // pseudo backup rule Did you check that r-. does still work as intended when removing this rule? http://codereview.appspot.com/6493072/diff/14/lily/lexer.ll File lily/lexer.ll (right): http://codereview.appspot.com/6493072/diff/14/lily/lexer.ll#newcode34 lily/lexer.ll:34: flex -b <this lexer file> Did you do this? http://codereview.appspot.com/6493072/diff/14/lily/lexer.ll#newcode476 lily/lexer.ll:476: {A}+ { So words no longer correspond to commands regarding their syntax in note mode? Doesn't that mean that things like \layout { \tempo "Allegro" some-variable = 17 } stop working again? http://codereview.appspot.com/6493072/diff/14/lily/lexer.ll#newcode859 lily/lexer.ll:859: push_note_state (nn); What is this supposed to do? INITIAL state is not supposed to have pitchnames defined. http://codereview.appspot.com/6493072/diff/14/lily/parser.yy File lily/parser.yy (right): http://codereview.appspot.com/6493072/diff/14/lily/parser.yy#newcode1190 lily/parser.yy:1190: '{' { parser->lexer_->push_maybe_note_state (); } music_list '}' Pushing a separate "maybe_note_ state for _every_ braced music list? Seriously? http://codereview.appspot.com/6493072/diff/14/lily/parser.yy#newcode1834 lily/parser.yy:1834: } function_arglist { This spells the death knell for having functions decide on their syntactic role based on their _return_ value when every music function needs to get parsed in a special state. http://codereview.appspot.com/6493072/diff/14/lily/parser.yy#newcode1837 lily/parser.yy:1837: parser->lexer_->pop_state (); You are aware that in general a music function call boundary will be determined by looking at a lookahead token? That means that the lookahead token will, in general, have been parsed in the wrong lexer state. Since your changes introduce gratuitious changes between word syntax based on lexer state, this will generally cause trouble. Probably the reason you had to introduce braces in some regtests. http://codereview.appspot.com/6493072/diff/14/lily/parser.yy#newcode2218 lily/parser.yy:2218: parser->lexer_->pop_state (); Again, likely too late. You should pop state before ANGLE_CLOSE to be on the safe side, though in this particular case the parser might well switch state without the need of a lookahead token.
Sign in to reply to this message.
http://codereview.appspot.com/6493072/diff/14/input/regression/page-spacing-n... File input/regression/page-spacing-nonstaff-lines-independent.ly (left): http://codereview.appspot.com/6493072/diff/14/input/regression/page-spacing-n... input/regression/page-spacing-nonstaff-lines-independent.ly:11: \addlyrics { high \skip2 } On 2012/09/02 11:52:46, dak wrote: > It is also not clear why c5 should not be > a valid identifier when ce5 is. Both /can/ be valid identifiers, if we want. c4 = {...} at top-level defines a variable, then inside any braces for music-entry: c4 is a quarter note on do. \c4 references the variable. More simple than today, when I may use 'recs' as an identifier name, unless I speak Spanish. http://codereview.appspot.com/6493072/diff/14/input/regression/ragged-bottom-... File input/regression/ragged-bottom-one-page.ly (right): http://codereview.appspot.com/6493072/diff/14/input/regression/ragged-bottom-... input/regression/ragged-bottom-one-page.ly:13: \repeat unfold 16 { c'4 } On 2012/09/02 11:52:46, dak wrote: > Why are the braces needed here? Maybe not strictly needed here, but showing the concept of always putting music-with-durations inside at least one level of bracing. http://codereview.appspot.com/6493072/diff/14/lily/lexer.ll File lily/lexer.ll (left): http://codereview.appspot.com/6493072/diff/14/lily/lexer.ll#oldcode397 lily/lexer.ll:397: <chords,notes,figures>{RESTNAME}/[-_] | // pseudo backup rule On 2012/09/02 11:52:46, dak wrote: > Did you check that r-. does still work as intended when removing this rule? The syntax never allowed new identifier definitions in sequential/simultaneous music. \relative c' { new-variable-name = { c d e f } } % never valid So there lexer need not look ahead for un-escaped WORDs while scanning note and rest entries. http://codereview.appspot.com/6493072/diff/14/lily/lexer.ll File lily/lexer.ll (right): http://codereview.appspot.com/6493072/diff/14/lily/lexer.ll#newcode34 lily/lexer.ll:34: flex -b <this lexer file> On 2012/09/02 11:52:46, dak wrote: > Did you do this? A long while ago; I haven't checked it lately, http://codereview.appspot.com/6493072/diff/14/lily/lexer.ll#newcode476 lily/lexer.ll:476: {A}+ { On 2012/09/02 11:52:46, dak wrote: > So words no longer correspond to commands regarding their syntax in note mode? COMMAND is still \WORD so they strictly correspond. Only COMMAND is a token in note-mode, not WORD, because we are not defining new identifiers here. Previously, WORD was matching note-names, but that was wrong because in { c d e_sharp f } 'e_sharp' is a valid WORD, but not a valid note-name. http://codereview.appspot.com/6493072/diff/14/lily/lexer.ll#newcode859 lily/lexer.ll:859: push_note_state (nn); On 2012/09/02 11:52:46, dak wrote: > What is this supposed to do? INITIAL state is not supposed to have pitchnames > defined. To fully-bake this implementation, I would want INITIAL to have pitchnames defined, so I can use INITIAL at top level to recognize the pitch in \relative do' {}. Then entry into note-mode would not need to push onto pitchname_tab_stack_ http://codereview.appspot.com/6493072/diff/14/lily/parser.yy File lily/parser.yy (right): http://codereview.appspot.com/6493072/diff/14/lily/parser.yy#newcode1190 lily/parser.yy:1190: '{' { parser->lexer_->push_maybe_note_state (); } music_list '}' On 2012/09/02 11:52:46, dak wrote: > Pushing a separate "maybe_note_ state for _every_ braced music list? Seriously? For every nested level of braces.
Sign in to reply to this message.
On 2012/09/02 11:52:46, dak wrote: > It looks like some _severe_ doctoring around with regard to notenames was done > to make regtests pass without an actual understanding of the failure modes, > introducing some half-baked in-between modes that don't have a purpose apart > from papering over the fact that this patch causes the lexer to be in the wrong > mode due to parser lookahead at several points of time. My desire is for just one in-between mode, which I want for scanning top-level. That desired mode is to have pitch-names loaded, but without the scanner accepting music with durations. I want to talk about it while it is un-baked. The difficulty in putting the lexer in the correct mode, after music functions with variable argument lists, is related to the mailing list discussion on "GLISS". This is the version that allows digits at the *end* of identifiers, which might not be wise. See the tracker <http://code.google.com/p/lilypond/issues/detail?id=1670> The regression test changes demonstrate its conflict with possibly-useful syntax; I was surprised how few changes were needed. I have been waiting with this for over a year, so of course I will not try to push anything forward until this "GLISS" dicussion is done.
Sign in to reply to this message.
All in all, this creates so many loose ends and problems of the kind I have been working to get tied up that I can basically start from scratch with lexer/parser work. I am very strongly opposed to this, and it is easy to come up with examples that fail for inexplicable reasons because of the interaction of parser lookahead and lexer mode switching. This is a large step towards making LilyPond behave unpredictable in a number of corner cases as well as existing usage as witnessed by the large number of regression tests needing modifications without a documented or user-discernible reason. Numbers in identifiers _will_ have drawbacks (clearly \skip4 is not fitting well with various naming schemes allowing numbers in names). But I think that the drawbacks of this patch are quite more than what would be unavoidable. It is a bad idea for more reasons than the reasons making the general idea of numbers in words/commands a bad idea. http://codereview.appspot.com/6493072/diff/14/lily/lexer.ll File lily/lexer.ll (left): http://codereview.appspot.com/6493072/diff/14/lily/lexer.ll#oldcode397 lily/lexer.ll:397: <chords,notes,figures>{RESTNAME}/[-_] | // pseudo backup rule On 2012/09/02 17:59:53, Keith wrote: > On 2012/09/02 11:52:46, dak wrote: > > Did you check that r-. does still work as intended when removing this rule? > > The syntax never allowed new identifier definitions in sequential/simultaneous > music. That is not an answer to my question. > \relative c' { new-variable-name = { c d e f } } % never valid But \relative c' c new-variable-name = { c d e f } has been valid and will no longer be because new is seen by the lexer while still in music mode.
Sign in to reply to this message.
The version "digits, but not at the end" lets us read vn2_meas345ff = \relative c' cis1 by depending on the ability of the scanner to back up. The input ceses128_3ap is a single word, but upon finding the \ in ceses128_3\p the scanner has to back up to split as ceses 128 _ 3 \p
Sign in to reply to this message.
On 2012/09/03 18:08:23, Keith wrote: > The version "digits, but not at the end" lets us read > vn2_meas345ff = \relative c' cis1 > by depending on the ability of the scanner to back up. > > The input > ceses128_3ap > is a single word, but upon finding the \ in > ceses128_3\p > the scanner has to back up to split as > ceses 128 _ 3 \p flex documentation is pretty clear about backing up being very expensive. I don't remember whether it was only expensive when it happens, or whether the expense was more or less a fixed cost. However, things like a4 are not actually rare in music input.
Sign in to reply to this message.
On Mon, Sep 03, 2012 at 08:07:07PM +0000, dak@gnu.org wrote: > > flex documentation is pretty clear about backing up being very > expensive. I don't remember whether it was only expensive when it > happens, or whether the expense was more or less a fixed cost. However, > things like a4 are not actually rare in music input. > Would there be any future in allowing digits in identifiers provided the sequence started with a 0. So a04 would be an identifier but a4 not. This seems to avoid the backing up problem. Bernard.
Sign in to reply to this message.
On 2012/09/03 20:07:07, dak wrote: > flex documentation is pretty clear about backing up being > very expensive. My copy is less emphatic about this. Let's set the percussion parts for the "New World" symphony! Here LilyPond has to read all the notes for every part in the orchestra, to generate cues for a few well-timed cymbal crashes, so that scanning time has its maximum effect on execution time. time ../lilypond/build/out/bin/lilypond percussion.ly real 0m26.186 0m26.118s user 0m24.819 0m24.778s sys 0m0.569s 0m0.547s On 2012/09/03 20:35:35, bernard_marcade.biz wrote: > Would there be any future in allowing digits in identifiers provided > the sequence started with a 0. So a04 would be an identifier but > a4 not. This seems to avoid the backing up problem. I find I also need to disallow numbers after underscores, so no "violin_01", but then it does avoid the need for backup tables. I guess the rule-of-0s would be too inconvenient to be appreciated, though.
Sign in to reply to this message.
While we are thinking about this, I suggest we remove (later) the rule forbidding backing-up states in the scanner. It made only a 0.2% in the worst-case scenario I could think of. The rule had been violated, giving us the slightly slower scanner, for about ten years (I estimate) prior to the warning-cleanup patch da949cdc one year ago. It costs a lot of programmer time to make the extra rules to save that 0.2%, and I know of no efficient cure for patterns that can require arbitrary amounts of backing up. \violin1_2_3_4_5 http://codereview.appspot.com/6493072/diff/5015/lily/lexer.ll File lily/lexer.ll (left): http://codereview.appspot.com/6493072/diff/5015/lily/lexer.ll#oldcode34 lily/lexer.ll:34: flex -b <this lexer file> `-b, --backup, `%option backup'' Generate backing-up information to `lex.backup'. This is a list of scanner states which require backing up and the input characters on which they do so. By adding rules one can remove backing-up states. If _all_ backing-up states are eliminated and `-Cf' or `-CF' is used, the generated scanner will run faster (see the `--perf-report' flag). Only users who wish to squeeze every last cycle out of their scanners need worry about this option. (*note Performance::).
Sign in to reply to this message.
On 2012/09/05 06:59:16, Keith wrote: > While we are thinking about this, I suggest we remove (later) the rule > forbidding backing-up states in the scanner. It made only a 0.2% in the > worst-case scenario I could think of. > > The rule had been violated, giving us the slightly slower scanner, for about ten > years (I estimate) prior to the warning-cleanup patch da949cdc one year ago. > > It costs a lot of programmer time to make the extra rules to save that 0.2%, Not really. > and > I know of no efficient cure for patterns that can require arbitrary amounts of > backing up. \violin1_2_3_4_5 I am not convinced that patterns requiring arbitrary amounts of backing up make for a good choice of lexical units. Whatever we allow as part of the language is something that convert-ly needs to deal with, and every other system reading LilyPond code. Including humans. "Arbitrary amount of backing up" also applies to them. While I agree that requiring -b seems a bit arbitrary and is not an absolute barrier and the performance impact will quite likely depend heavily on the kind of backup patterns involved here, the use cases for dropping them have not convinced me. It also turns out that in the presence of large backup tables, it becomes hard to guess which of multiple prospectively active rules will get selected (have you tried looking at the respective debug output when rules with backup states come into play non-trivially?). So I am against removing this advice independently from patches that might actually require this removal.
Sign in to reply to this message.
On Wed, 05 Sep 2012 00:50:27 -0700, <dak@gnu.org> wrote: > On 2012/09/05 06:59:16, Keith wrote: >> It costs a lot of programmer time to make the extra rules to save that >> 0.2%, > > Not really. > But, but... flex documentation is pretty clear about [getting rid of] backing up being very expensive : "Getting rid of backing up is messy and often may be an enormous amount of work for a complicated scanner." Okay. Maybe it didn't take you long to /maintain/ backup-free rules, but it took me quite a while to remove backing up on the existing rules, when I made the patch to clean up warnings. > I am not convinced that patterns requiring arbitrary amounts of backing > up make for a good choice of lexical units. > Right. They probably don't. > So I am against removing this advice independently from patches that > might actually require this removal. > Agreed, but I'll still pout a couple more times that you get your Schemy-dashes and underscores but I still have to refer to the motif from measure tousend_sechshundert_siebzig > http://codereview.appspot.com/6493072/
Sign in to reply to this message.
On 2012/09/05 09:26:12, Keith wrote: > Agreed, > but I'll still pout a couple more times that you get your Schemy-dashes and > underscores but I still have to refer to the motif from measure > tousend_sechshundert_siebzig _I_ get my Schemy-dashes? I was _not_, I repeat, _not_ the person who invented variables named line-width and similar and considered writing Oh_my_god_engraver without quotes a fabulous idea. Someone also decided to allow digits in names in INITIAL mode (and possibly some TeX escape sequences as well) but since, like everything in LilyPond, this was not documented and _thank_ _God_ nobody noticed since LilyPond does not start out in INITIAL mode, I have been able to remove this before the damage had been established outside of music mode. If it had been up to me, "my" Schemy-dashes and underscores would have gone where the sun don't shine. But while trying to create some more-or-less consistent syntax according to more-or-less simple rules, I try not to break all too many existing use cases. With the current rule set, we can at least _pretend_ that somebody had a more or less consistent design in mind before starting to write LilyPond. I am making progress on being able to use pseudo-variables like \motif 1670 and hope that I'll get people out of my hair when I get that into shape. No, it is not the same, but it has its own advantages.
Sign in to reply to this message.
On Wed, Sep 5, 2012 at 11:51 AM, <dak@gnu.org> wrote: > If it had been up to me, "my" Schemy-dashes and underscores would have > gone where the sun don't shine. But while trying to create some > more-or-less consistent syntax according to more-or-less simple rules, I > try not to break all too many existing use cases. Do i understand correctly - you say it would be cleaner to have everything in camelCase? Like, AccidentalEngraver, BeamThickness etc? I like it! Janek
Sign in to reply to this message.
LGTM. Nice idea. I'm not sure whether this fits into the large picture w.r.t. syntax normalization as envisioned by David, but at least for me it looks reasonable.
Sign in to reply to this message.
On 2012/10/29 06:20:17, lemzwerg wrote: > LGTM. Nice idea. I'm not sure whether this fits into the large picture w.r.t. > syntax normalization as envisioned by David, but at least for me it looks > reasonable. Well, I replied on the Google code review as well. In a manner, this is the kind of issue that would make it convenient (or at least time- and worry-saving) for me to have the sort of Linus-like dictatorship that the Linux kernel has. With most syntax proposals, "I won't do this myself" is likely enough to let the efforts sizzle out eventually. This is basically the "traditional" approach applied by Han-Wen and Jan, but I don't consider this really an upright way of dealing with things. I have to concede that this approach made them cooperate well (or rather not cooperate, with nobody being particularly unhappy or worried about that) with Graham, something which I failed doing satisfactorily. Now Keith is not as easy to shrug off as that. He delivers code along with his proposals. With regard to syntax changes and/or additions, of course I am the most guilty party for opening cans of worms. Unifying word syntax across modes was perhaps the most disruptive in this area. The result was that "a word is a sequence of letters (including any non-ASCII character by definition), possibly interspersed with single - and _ characters". This is a unification of the word concept that was different in INITIAL and music mode. Now Keith's proposal boils down to "a word is a sequence of letters (including any non-ASCII character by definition) possibly interspersed with single -, . and _ characters, where behind a . there may also come a digit sequence possibly followed by further word constituents optionally starting with -, . and _ again". So far, so bad. Now hooking onto the recent Context.Grob changes probably inspiring this proposal, we have the situation that Context.Grob is equivalent to Context . Grob or to Context . $(string-append "Gr" "ob"), that is, . is acting as an operator here joining several expressions possibly created in different ways. Keith's proposal would not imply that violin . $(+ 1 1) would be the same as violin.2 and not even violin . 2 would work here. The superficial similarity with the dotted syntax breaks down as soon as you try putting it to the test. So in the category "does this change lead to a greater consistency of LilyPond syntax in itself", which is more or less the metric I use for justifying invasive changes to myself, this change introduces complex rules and analogies to existing constructs that work out only at a particular superficial level. It makes LilyPond harder to understand, not simpler. I can wave around my long-term plans which would allow for just writing \violin1 by allowing arrays of violins (I have something in a branch right now, but without further syntax changes I am working on it is not really fitting seamlessly into LilyPond). But long-term plans are not really a suitable excuse for blocking other developers indefinitely. Personally, I consider digits in identifiers not worth screwing LilyPond over. My \"violin1" patch is something I don't consider really a fabulous idea: its main incentive is quieting the demands for more invasive changes that will be quite harder to sort out in its consequences regarding documentation and future changes. It seems that my goal of quieting calls for more invasive changes has actually failed. I expect this to cause a lot of trouble. That is not an absolute counterargument: I expect some of my changes to cause quite a bit of trouble as well. What is, in my book, making a difference in the evaluation is that this trouble, which undoubtedly will tend to end up primarily at my own doorstep, buys us a purely cosmetic change without any functional difference. violin.1 will not be able to have the 1 calculated rather than spelled out. It really just is part of the identifier without numerical meaning. a.b is equivalent to a . b, but a.1 would not be equivalent to a . 1. So while I can't class this is "impossible", I do consider it as "too expensive" in terms of explaining it to the user, and in terms of dealing with followup consequences that will be mostly my responsibility. Now I am perfectly well aware that if "David feels this is a bad idea" is supposed to hold any reasonable amount of power, we are essentially back at a dictatorial situation where I maintain power by virtue of being able to make the most ominous threats (basically, the model of modern representative democracy). Which is, in a way, again cheating our self-chosen systems in a manner I don't appreciate. So I hope that Keith does not view my repeated objections to proposals in this issue as a disregard of his work on them. I hope, pompous as this may sound, that he can view this as a sort of learning situation where his detailed proposals result in more detailed feedback about how the choices in my sort of revolutionary conservatism I tend to exhibit in program design tend to come about. This proposal is clever, and the basic opportunity is well-spotted. I just think it is a bit too clever when we are taking into account the restrictions it still has and the consequences to related areas and the documentation. \"xxx1" has pretty much the same restrictions and underlying ugliness (probably a bit more) and unnecessity. It is just much cheaper and well-confined. It does not have the "this will be a lot of trouble later on" ring to it.
Sign in to reply to this message.
On 2012/10/29 10:05:30, dak wrote: > Keith's proposal would not imply that violin . $(+ 1 1) would > be the same as violin.2 and not even violin . 2 would work here. > I didn't think we wanted such things. Nor do we want \paper { short-indent = 3\cm } to subtract 'indent' from 'short'. We are simply accommodating other language conventions, Scheme or human, in acceptable names. We can choose a different character to introduce digits \violin+1 \violin01 \violin,1 \violin;1 \violin:1 as in patch set 3, for example.
Sign in to reply to this message.
|