Description

On every GC cycle, increase the heap so we have at least 2000 bytes
per non-simple Smob. This increases the size of the heap relative to
the live set of data.

The number 2000 is based on benchmarking Robert Carver's "Missa
Dum sacrum mysterium", edited by Vaughan McAlley (2013), a 104-page
choir score.
By setting GC_INITIAL_HEAP_SIZE and the maximum heap size, we can see how
GC time and heap size relate. On a Lenovo T460p (i5-6440HQ, 2.6 GHz),
this yields the following timings:
* 2G heap: 49.4s (1.8s GC)
* 1G heap: 51.7s (3.3s GC)
* 800M heap: 54.6s (5.8s GC)
* 500M heap: 68.2s (17.9s GC)
We see that 800M to 1G is a sweet spot, where around 10% of the time is
spent on GC. Running the same score with 2000 bytes per smob, we get a
final heap size of 1.2G and a GC time of 3.4s.
This is necessary because during the interpretation stage we build up a
large set of Grobs. Without tuning, GC is ineffective: the Grobs are
live data and cannot be collected.
We can observe this problem by running with GC_PRINT_STATS=1, which
prints
In-use heap: 85% (370824 KiB pointers + 60065 KiB other)
In-use heap: 85% (370725 KiB pointers + 59737 KiB other)
...
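The mechanism the description outlines can be sketched as follows. This is a hypothetical illustration, not the actual patch: the function and constant names are invented here, though the 2000-bytes-per-smob figure and the `reclaimed < 0.2` threshold appear in the review comments below.

```cpp
#include <cstddef>

// Invented names for illustration; see the patch itself for the real code.
const std::size_t BYTES_PER_SMOB = 2000; // from the benchmark above
const double RECLAIM_THRESHOLD = 0.2;    // reclaiming <20% counts as ineffective

// Decide how many bytes to grow the heap by after a GC cycle, so that the
// heap ends up at least BYTES_PER_SMOB per live (non-simple) smob.
std::size_t
heap_expansion (std::size_t heap_size, std::size_t free_bytes_after_gc,
                std::size_t live_smob_count)
{
  double reclaimed = static_cast<double> (free_bytes_after_gc)
                     / static_cast<double> (heap_size);
  if (reclaimed >= RECLAIM_THRESHOLD)
    return 0; // GC is still effective; leave the heap alone.

  std::size_t target = live_smob_count * BYTES_PER_SMOB;
  return target > heap_size ? target - heap_size : 0;
}
```

The computed amount would then be handed to libgc (e.g. via GC_expand_hp, as discussed later in this thread); the point is that growing the heap relative to the live set keeps the fraction of time spent in GC near the 10% sweet spot measured above.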
Patch Set 1 #
Total comments: 1
Patch Set 2 : nit #
Total comments: 8
Patch Set 3 : move to lily-guile.cc #
Patch Set 4 : autoconf #
Total comments: 4
Patch Set 5 : jonas' comments #
Patch Set 6 : rebase #
Patch Set 7 : hook into GC event #
Patch Set 8 : Use smob count as memory proxy #
Total comments: 8
Patch Set 9 : Jonas' comments #
Patch Set 10 : comment #
Messages
Total messages: 65
LGTM

https://codereview.appspot.com/561390043/diff/563450046/lily/score-engraver.cc
File lily/score-engraver.cc (right):

https://codereview.appspot.com/561390043/diff/563450046/lily/score-engraver.c...
lily/score-engraver.cc:200: // This double the heap. TODO: don't do this if we get close to
s/double/doubles/
nit
https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en...
File lily/include/score-engraver.hh (right):

https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en...
lily/include/score-engraver.hh:33: GC_word last_gc_count_;
FYI: Because we're using C++11 now, you have the option of providing the
default value of this member right here.

    GC_word last_gc_count_ = -1;

If you did that, you wouldn't have to mention this member in any constructors.
IMO it would be even better than usual in this case because it would get rid of
a preprocessor conditional in the cc file.

https://codereview.appspot.com/561390043/diff/557260051/lily/score-engraver.cc
File lily/score-engraver.cc (right):

https://codereview.appspot.com/561390043/diff/557260051/lily/score-engraver.c...
lily/score-engraver.cc:174: #include <gc.h>
It would be better to move this to the top, or if there is a reason it can't be
moved to the top, to comment. Includes in the middle of a file can frustrate
maintainers on occasion when someone uses a namespace or defines file-scope
things before them that conflict with what's in them.

https://codereview.appspot.com/561390043/diff/557260051/lily/score-engraver.c...
lily/score-engraver.cc:199: if (reclaimed < 0.2) {
TODO (?) make this threshold configurable
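The suggestion above can be illustrated with a minimal sketch. The class here is a hypothetical stand-in for the real engraver, and `unsigned long` stands in for libgc's `GC_word`:

```cpp
// C++11 non-static data member initializer: the member gets its default
// value at the point of declaration, so no constructor (and no preprocessor
// conditional) needs to mention it.
struct Score_engraver_sketch
{
  unsigned long last_gc_count_ = static_cast<unsigned long> (-1);
};
```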
https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en...
File lily/include/score-engraver.hh (right):

https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en...
lily/include/score-engraver.hh:26: #include <gc.h>
This effectively adds a dependency on libgc which you should search for at
configure - it is not necessarily installed in a default location.
https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en...
File lily/include/score-engraver.hh (right):

https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en...
lily/include/score-engraver.hh:26: #include <gc.h>
On 2020/02/01 11:00:40, hahnjo wrote:
> This effectively adds a dependency on libgc which you should search for at
> configure - it is not necessarily installed in a default location.

I'm confused - does this mean `gc/gc.h` or `libguile/gc.h`? I think the latter
only includes `gc/gc.h` transitively through `libguile/bdw-gc.h` if
BUILDING_LIBGUILE is defined, so this code needs the first.
On 2020/02/01 11:00:41, hahnjo wrote:
> https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en...
> File lily/include/score-engraver.hh (right):
>
> https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en...
> lily/include/score-engraver.hh:26: #include <gc.h>
> This effectively adds a dependency on libgc which you should search for at
> configure - it is not necessarily installed in a default location.

I've added a TODO for now. One complication is that libgc doesn't come with
pkg-config.
move to lily-guile.cc
https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en...
File lily/include/score-engraver.hh (right):

https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en...
lily/include/score-engraver.hh:33: GC_word last_gc_count_;
On 2020/02/01 10:55:50, Dan Eble wrote:
> FYI: Because we're using C++11 now, you have the option of providing the
> default value of this member right here.
>
> GC_word last_gc_count_ = -1;
>
> If you did that, you wouldn't have to mention this member in any constructors.
> IMO it would be even better than usual in this case because it would get rid
> of a preprocessor conditional in the cc file.

Done.

https://codereview.appspot.com/561390043/diff/557260051/lily/score-engraver.cc
File lily/score-engraver.cc (right):

https://codereview.appspot.com/561390043/diff/557260051/lily/score-engraver.c...
lily/score-engraver.cc:174: #include <gc.h>
On 2020/02/01 10:55:50, Dan Eble wrote:
> It would be better to move this to the top, or if there is a reason it can't
> be moved to the top, to comment. Includes in the middle of a file can
> frustrate maintainers on occasion when someone uses a namespace or defines
> file-scope things before them that conflict with what's in them.

moved to lily-guile.{cc,hh}

https://codereview.appspot.com/561390043/diff/557260051/lily/score-engraver.c...
lily/score-engraver.cc:199: if (reclaimed < 0.2) {
On 2020/02/01 10:55:50, Dan Eble wrote:
> TODO (?) make this threshold configurable

1) Before we make this fancy, I'd rather see GUILE 2.x working fully. I've
added a note on how to disable this as a comment for now.
2) if people want to configure this locally, they should just set
GC_MAX_HEAPSIZE to their amount of RAM available.
On 2020/02/01 20:49:09, hanwenn wrote: > On 2020/02/01 11:00:41, hahnjo wrote: > > > https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en... > > File lily/include/score-engraver.hh (right): > > > > > https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en... > > lily/include/score-engraver.hh:26: #include <gc.h> > > This effectively adds a dependency on libgc which you should search for at > > configure - it is not necessarily installed in a default location. > > I've added a TODO for now. One complication is that libgc doesn't come with > pkg-config. It seems like a bad idea to change the bgc parameters behind Guile's back. Does Guile not have any GC hooks of its own that one could use for increasing the heap size or its behavior with regard to requesting memory?
On 2020/02/01 20:59:06, dak wrote: > On 2020/02/01 20:49:09, hanwenn wrote: > > On 2020/02/01 11:00:41, hahnjo wrote: > > > > > > https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en... > > > File lily/include/score-engraver.hh (right): > > > > > > > > > https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en... > > > lily/include/score-engraver.hh:26: #include <gc.h> > > > This effectively adds a dependency on libgc which you should search for at > > > configure - it is not necessarily installed in a default location. > > > > I've added a TODO for now. One complication is that libgc doesn't come with > > pkg-config. > > It seems like a bad idea to change the bgc parameters behind Guile's back. Can you give a specific reason why? > Does > Guile not have any GC hooks of its own that one could use for increasing the > heap size or its behavior with regard to requesting memory? I guess you could allocate a large block of memory, and then deallocate it again. The whole idea of GUILE moving to libgc is that it gets out of the business of trying to do GC. Why would they insert themselves into that game again?
On 2020/02/01 21:23:51, hanwenn wrote: > On 2020/02/01 20:59:06, dak wrote: > > On 2020/02/01 20:49:09, hanwenn wrote: > > > On 2020/02/01 11:00:41, hahnjo wrote: > > > > > > > > > > https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en... > > > > File lily/include/score-engraver.hh (right): > > > > > > > > > > > > > > https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en... > > > > lily/include/score-engraver.hh:26: #include <gc.h> > > > > This effectively adds a dependency on libgc which you should search for at > > > > configure - it is not necessarily installed in a default location. > > > > > > I've added a TODO for now. One complication is that libgc doesn't come with > > > pkg-config. > > > > It seems like a bad idea to change the bgc parameters behind Guile's back. > > Can you give a specific reason why? > > > Does > > Guile not have any GC hooks of its own that one could use for increasing the > > heap size or its behavior with regard to requesting memory? > > I guess you could allocate a large block of memory, and then deallocate it > again. > > The whole idea of GUILE moving to libgc is that it gets out of the business of > trying > to do GC. Why would they insert themselves into that game again? From the Guile-2.2 manual: -- C Function: void * scm_malloc (size_t SIZE) -- C Function: void * scm_calloc (size_t SIZE) [...] These functions will (indirectly) call ‘scm_gc_register_allocation’. -- C Function: void scm_gc_register_allocation (size_t SIZE) Informs the garbage collector that SIZE bytes have been allocated, which the collector would otherwise not have known about. In general, Scheme will decide to collect garbage only after some amount of memory has been allocated. Calling this function will make the Scheme garbage collector know about more allocation, and thus run more often (as appropriate). 
It is especially important to call this function when large unmanaged allocations, like images, may be freed by small Scheme allocations, like foreign objects. So it would appear that Guile does have an interface for the notion of getting in memory pressure, and we incidentally also call those hooks when we are allocating smob structures.
On 2020/02/01 20:49:09, hanwenn wrote:
> On 2020/02/01 11:00:41, hahnjo wrote:
> > https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en...
> > File lily/include/score-engraver.hh (right):
> >
> > https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en...
> > lily/include/score-engraver.hh:26: #include <gc.h>
> > This effectively adds a dependency on libgc which you should search for at
> > configure - it is not necessarily installed in a default location.
>
> I've added a TODO for now. One complication is that libgc doesn't come with
> pkg-config.

This is just not true:

    $ pkg-config --libs bdw-gc
    -lgc
On 2020/02/02 09:49:50, hahnjo wrote: > On 2020/02/01 20:49:09, hanwenn wrote: > > On 2020/02/01 11:00:41, hahnjo wrote: > > > > > > https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en... > > > File lily/include/score-engraver.hh (right): > > > > > > > > > https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en... > > > lily/include/score-engraver.hh:26: #include <gc.h> > > > This effectively adds a dependency on libgc which you should search for at > > > configure - it is not necessarily installed in a default location. > > > > I've added a TODO for now. One complication is that libgc doesn't come with > > pkg-config. > > This is just not true: > $ pkg-config --libs bdw-gc > -lgc Oops! my bad. I didn't pay close enough attention to the file list.
On 2020/02/01 21:59:14, dak wrote: > On 2020/02/01 21:23:51, hanwenn wrote: > > On 2020/02/01 20:59:06, dak wrote: > > > On 2020/02/01 20:49:09, hanwenn wrote: > > > > On 2020/02/01 11:00:41, hahnjo wrote: > > > > > > > > > > > > > > > https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en... > > > > > File lily/include/score-engraver.hh (right): > > > > > > > > > > > > > > > > > > > > https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en... > > > > > lily/include/score-engraver.hh:26: #include <gc.h> > > > > > This effectively adds a dependency on libgc which you should search for > at > > > > > configure - it is not necessarily installed in a default location. > > > > > > > > I've added a TODO for now. One complication is that libgc doesn't come > with > > > > pkg-config. > > > > > > It seems like a bad idea to change the bgc parameters behind Guile's back. > > > > Can you give a specific reason why? > > > > > Does > > > Guile not have any GC hooks of its own that one could use for increasing the > > > heap size or its behavior with regard to requesting memory? > > > > I guess you could allocate a large block of memory, and then deallocate it > > again. > > > > The whole idea of GUILE moving to libgc is that it gets out of the business of > > trying > > to do GC. Why would they insert themselves into that game again? > > From the Guile-2.2 manual: > > -- C Function: void * scm_malloc (size_t SIZE) > -- C Function: void * scm_calloc (size_t SIZE) > [...] > > These functions will (indirectly) call > ‘scm_gc_register_allocation’. > > -- C Function: void scm_gc_register_allocation (size_t SIZE) > Informs the garbage collector that SIZE bytes have been allocated, > which the collector would otherwise not have known about. > > In general, Scheme will decide to collect garbage only after some > amount of memory has been allocated. 
Calling this function will > make the Scheme garbage collector know about more allocation, and > thus run more often (as appropriate). > > It is especially important to call this function when large > unmanaged allocations, like images, may be freed by small Scheme > allocations, like foreign objects. > > So it would appear that Guile does have an interface for the notion of getting > in memory pressure, and we incidentally also call those hooks when we are > allocating smob structures. that is a hook, but it's not the one we need. What we want is a way to make BDW/Guile expand the heap even if it thinks it's not necessary. scm_register_allocation is meant to increase the GC frequency, but we want the opposite. We could do scm_gc_malloc + scm_gc_free, but 1) it's doing more work than we need 2) there is a risk that the GC library will treat our large allocations especially (e.g returning to the O/S on GC_free) I think asking libgc to expand the heap is the best way to ask for a larger heap.
On 2020/02/01 11:00:41, hahnjo wrote:
> https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en...
> File lily/include/score-engraver.hh (right):
>
> https://codereview.appspot.com/561390043/diff/557260051/lily/include/score-en...
> lily/include/score-engraver.hh:26: #include <gc.h>
> This effectively adds a dependency on libgc which you should search for at
> configure - it is not necessarily installed in a default location.

added an autoconf stanza.
autoconf
https://codereview.appspot.com/561390043/diff/565600046/configure.ac
File configure.ac (right):

https://codereview.appspot.com/561390043/diff/565600046/configure.ac#newcode270
configure.ac:270: PKG_CHECK_MODULES(BDWGC, bdw-gc)
This should fail if not found, you're using it unconditionally

https://codereview.appspot.com/561390043/diff/565600046/lily/lily-guile.cc
File lily/lily-guile.cc (right):

https://codereview.appspot.com/561390043/diff/565600046/lily/lily-guile.cc#ne...
lily/lily-guile.cc:30: // pkg-config or other type of autodiscovery mechanism on Fedora.
Remove comment
jonas' comments
https://codereview.appspot.com/561390043/diff/565600046/configure.ac
File configure.ac (right):

https://codereview.appspot.com/561390043/diff/565600046/configure.ac#newcode270
configure.ac:270: PKG_CHECK_MODULES(BDWGC, bdw-gc)
On 2020/02/02 13:21:09, hahnjo wrote:
> This should fail if not found, you're using it unconditionally

Done.

https://codereview.appspot.com/561390043/diff/565600046/lily/lily-guile.cc
File lily/lily-guile.cc (right):

https://codereview.appspot.com/561390043/diff/565600046/lily/lily-guile.cc#ne...
lily/lily-guile.cc:30: // pkg-config or other type of autodiscovery mechanism on Fedora.
On 2020/02/02 13:21:09, hahnjo wrote:
> Remove comment

Done.
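For reference, a pkg-config check along the lines discussed here might look like this in configure.ac (a sketch, not necessarily the exact stanza the patch adds; the BDWGC variable prefix comes from the review comment above). Without an ACTION-IF-NOT-FOUND argument, PKG_CHECK_MODULES aborts configure when the module is missing, which addresses the "should fail if not found" point:

```
# Locate the Boehm GC via pkg-config; configure fails hard when the
# bdw-gc module is absent, since the result is used unconditionally.
PKG_CHECK_MODULES([BDWGC], [bdw-gc])
CPPFLAGS="$CPPFLAGS $BDWGC_CFLAGS"
LIBS="$LIBS $BDWGC_LIBS"
```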
On 2020/02/02 13:43:25, hanwenn wrote:
> jonas' comments

The uploaded diff has the wrong base, it's reverting quite some changes from
master.
rebase
On 2020/02/02 13:45:34, hahnjo wrote:
> On 2020/02/02 13:43:25, hanwenn wrote:
> > jonas' comments
>
> The uploaded diff has the wrong base, it's reverting quite some changes from
> master.

I hate Rietveld. Fixed.
I just tried to reproduce the timings for commits already in master and this
patch. To be honest I don't see a clear picture yet.

Yes, this change seems to improve the time spent for garbage collection, but
the real time reported by "time" only decreases by a fraction (less than 50%
of the saved time for gc). Also I consistently measure increased total and gc
time when toggling the setting of the initial heap size, i.e. the change in
master actually makes it slower for me.

My conclusion would be that we need to measure larger scores, not executions
of less than 10s. This may be the use case that most users care about, but
AFAICS it's actually pretty hard to get reliable data for now. I've tried to
use the MSDM example from
https://lists.gnu.org/archive/html/lilypond-user/2016-11/msg00700.html
which runs for around ~40s on my system, but it crashes with Guile 2.2:

    GUILE signaled an error for the expression beginning here
    #
    (define-music-function (parser location )()
    Unbound variable: ol

Should we maybe prioritize correctness over performance for now? Comparing
performance with a fully working LilyPond should be much easier.
jonas.hahnfeld@gmail.com writes: > I just tried to reproduce the timings for commits already in master and > this patch. To be honest I don't see a clear picture yet. > > Yes, this change seems to improve the time spent for garbage collection, > but the real time reported by "time" only decreases by a fraction (less > than 50% of the saved time for gc). Also I consistently measure > increased total and gc time when toggling the setting of the initial > heap size, ie the change in master actually makes it slower for me. > > My conclusion would be that we need to measure larger scores, not > executions less than 10s. This may be the use case that most users care > about, but AFAICS it's actually pretty hard to get reliable data for > now. > I've tried to use the MSDM example from > https://lists.gnu.org/archive/html/lilypond-user/2016-11/msg00700.html > which runs for around ~40s on my system, but it crashes with Guile 2.2: > GUILE signaled an error for the expression beginning here > # > > (define-music-function (parser location )() > > Unbound variable: ol The preceding line is col = so this is likely a matter of passing the wrong part of the file into Guile when encountering # . The file contains two 3-byte UTF-8 sequences above which could be thought to throw off the interpretation by 4 bytes. But it actually is off by 6 bytes if it is running into the preceding "ol", as if the special characters/bytes are not seen at all. -- David Kastrup
For testing, use https://github.com/hanwen/lilypond/tree/guile22-experiment in particular, you need https://github.com/hanwen/lilypond/commit/b696550379831ecec1519be6d59cd8a3003... for the UTF-8 parsing. I fixed this a week ago, but due to the delays in getting the preceding fix in ("cleanup embedded SCM parsing"), I couldn't send that for review yet. For me, juggling 15 different outstanding code reviews at the same time is the bane of the current development process. On Sun, Feb 2, 2020 at 9:51 PM David Kastrup <dak@gnu.org> wrote: > > jonas.hahnfeld@gmail.com writes: > > > I just tried to reproduce the timings for commits already in master and > > this patch. To be honest I don't see a clear picture yet. > > > > Yes, this change seems to improve the time spent for garbage collection, > > but the real time reported by "time" only decreases by a fraction (less > > than 50% of the saved time for gc). Also I consistently measure > > increased total and gc time when toggling the setting of the initial > > heap size, ie the change in master actually makes it slower for me. > > > > My conclusion would be that we need to measure larger scores, not > > executions less than 10s. This may be the use case that most users care > > about, but AFAICS it's actually pretty hard to get reliable data for > > now. > > I've tried to use the MSDM example from > > https://lists.gnu.org/archive/html/lilypond-user/2016-11/msg00700.html > > which runs for around ~40s on my system, but it crashes with Guile 2.2: > > GUILE signaled an error for the expression beginning here > > # > > > > (define-music-function (parser location )() > > > > Unbound variable: ol > > The preceding line is > > col = > > so this is likely a matter of passing the wrong part of the file into > Guile when encountering # . The file contains two 3-byte UTF-8 > sequences above which could be thought to throw off the interpretation > by 4 bytes. 
But it actually is off by 6 bytes if it is running into the > preceding "ol", as if the special characters/bytes are not seen at all. > > -- > David Kastrup -- Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
On 02/02/2020 22:33, Han-Wen Nienhuys wrote:
> For me, juggling 15 different outstanding code reviews at the same
> time is the bane of the current development process.

what do you suggest?

James
On Sun, Feb 2, 2020 at 11:47 PM <pkx166h@posteo.net> wrote: > > On 02/02/2020 22:33, Han-Wen Nienhuys wrote: > > For me, juggling 15 different outstanding code reviews at the same > > time is the bane of the current development process. > > what do you suggest? I think we should move to a git-based code review tool; Github, Gitlab or Gerrit. Gerrit is probably the closest to the current workflow. Then, you could push an entire branch with several commits for review. The state of the change could be administered within the code review itself, so we don't need to keep tools in sync with each other. -- Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
On Mon, Feb 3, 2020 at 9:35 AM Han-Wen Nienhuys <hanwenn@gmail.com> wrote: > > On Sun, Feb 2, 2020 at 11:47 PM <pkx166h@posteo.net> wrote: > > > > On 02/02/2020 22:33, Han-Wen Nienhuys wrote: > > > For me, juggling 15 different outstanding code reviews at the same > > > time is the bane of the current development process. > > > > what do you suggest? > > I think we should move to a git-based code review tool; Github, Gitlab > or Gerrit. Gerrit is probably the closest to the current workflow. > > Then, you could push an entire branch with several commits for review. > > The state of the change could be administered within the code review > itself, so we don't need to keep tools in sync with each other. In Gerrit, you can define your own labels (with permission who can vote on them). See for example, https://review.gerrithub.io/c/hanwen/lilypond/+/483638 This has a label for "passes make doc" and "regtest visual inspection". One can drive CI systems by querying for the result, eg. https://review.gerrithub.io/q/project:hanwen/lilypond++label:Passes-Make-Doc (with the label) or https://review.gerrithub.io/q/project:hanwen/lilypond++-label:Passes-Make-Doc (without the label). -- Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
On Mon, Feb 3, 2020 at 9:35 AM Han-Wen Nienhuys <hanwenn@gmail.com> wrote: > > > For me, juggling 15 different outstanding code reviews at the same > > > time is the bane of the current development process. > > > > what do you suggest? > > I think we should move to a git-based code review tool; Github, Gitlab > or Gerrit. Gerrit is probably the closest to the current workflow. I think it would also be good if we can find a mode in which we get rid of the "countdown". Instead of waiting for complaints, a change is pushed once it passes tests and someone LGTM'd it. If we discover a problem with the change afterwards, we could simply revert it and discuss further. -- Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
On Mon, Feb 3, 2020 at 9:35 AM Han-Wen Nienhuys <hanwenn@gmail.com> wrote: > > On 02/02/2020 22:33, Han-Wen Nienhuys wrote: > > > For me, juggling 15 different outstanding code reviews at the same > > > time is the bane of the current development process. > > > > what do you suggest? > > I think we should move to a git-based code review tool; Github, Gitlab > or Gerrit. Gerrit is probably the closest to the current workflow. For Github, you can define labels and checks to apply to pull-requests, https://help.github.com/en/github/managing-your-work-on-github/applying-label... and you can define rules that require checks to pass before merging the PR. https://developer.github.com/v3/repos/statuses/ Gitlab has support for CI within the gitlab product itself, https://docs.gitlab.com/ee/ci/introduction/index.html#basic-cicd-workflow Gitlab/Github have the advantage over Gerrit that they include bug tracker integration (ie. you can easily track which commits fix which bugs and vice versa). Gerrit offers a more refined reviewing experience. Gitlab and Gerrit are open source, with the latter probably being easier to deploy (as it does less things.) -- Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
Han-Wen Nienhuys <hanwenn@gmail.com> writes: > On Sun, Feb 2, 2020 at 11:47 PM <pkx166h@posteo.net> wrote: >> >> On 02/02/2020 22:33, Han-Wen Nienhuys wrote: >> > For me, juggling 15 different outstanding code reviews at the same >> > time is the bane of the current development process. >> >> what do you suggest? > > I think we should move to a git-based code review tool; Github, Gitlab > or Gerrit. Gerrit is probably the closest to the current workflow. > > Then, you could push an entire branch with several commits for review. > > The state of the change could be administered within the code review > itself, so we don't need to keep tools in sync with each other. I don't see how this changes the problem of a controversial patch blocking the progress of dependent patches while the controversy is not yet resolved. -- David Kastrup
Han-Wen Nienhuys <hanwenn@gmail.com> writes: > On Mon, Feb 3, 2020 at 9:35 AM Han-Wen Nienhuys <hanwenn@gmail.com> wrote: >> > > For me, juggling 15 different outstanding code reviews at the same >> > > time is the bane of the current development process. >> > >> > what do you suggest? >> >> I think we should move to a git-based code review tool; Github, Gitlab >> or Gerrit. Gerrit is probably the closest to the current workflow. > > I think it would also be good if we can find a mode in which we get > rid of the "countdown". Instead of waiting for complaints, a change is > pushed once it passes tests and someone LGTM'd it. If I may remind you, that was the effective state when I started on LilyPond. I submitted several patches that improved upon internals, so nobody currently active felt up to giving a comment or LGTM. Eventually I raised such a shitstorm over this complete blockage of new work that in the aftermath of dealing with it (that eventually led to the adoption of rules and procedures more amenable to new developers) Valentin left. I am not proud of that outcome. I have a Guile patch submitted in this state of waiting for "someone LGTM it" for 6 years already <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=17474> because it involves internals of Guile, so people will rely on Andy Wingo for the LGTM and Andy Wingo cannot be arsed to even comment. That is a very dissatisfactory state particularly for newcomers since it draws a sharp and terminal dividing line between newcomers to the project and existing members who can just go ahead pushing their own patches or rely on stock LGTMs by others. We did not adopt the current system in a vacuum, we adopted it to address a situation that was very frustrating for new developers. > If we discover a problem with the change afterwards, we could simply > revert it and discuss further. 
Let me make this clear: you are currently in the process of submitting patches that change code conventions from current C++ practice used by current contributors to standards you decided on decades ago for your own work on LilyPond. If you follow the current discussion, how many LGTMs on those changes do you count? Our rules give you a chance at outsitting objections until your patch makes it in by default. If you instead had to wait for an LGTM, you'd be held up for an indefinite time. People could just ignore you and your work would never get anywhere, unless you use your commit privileges to ignore the process. That's the way things were handled when I started working on LilyPond, and it is not, in my book, a state worth returning to. In short: your proposed solution would not address your problem if you took reviews seriously. Instead you would have to rely on ignoring what people say and go ahead anyway. That is an approach that does not scale to projects of arbitrary size. -- David Kastrup
On Feb 3, 2020, at 04:30, Han-Wen Nienhuys <hanwenn@gmail.com> wrote: > > Instead of waiting for complaints, a change is > pushed once it passes tests and someone LGTM'd it. I've worked in places where commits were handled with self-discipline and mutual accountability, and I've worked in places where a UI enforced upper-management's policy that every change had to be approved by two other developers before it could be merged. I prefer self-discipline and mutual accountability to having to nag people with a superficial understanding of a change to put themselves on record as approving it. Therefore, regarding the countdown, I think it's a bad idea to require approval before pushing, unless we grant that the patch meister can approve pushing with the reason "countdown complete." Regarding accelerating the process, I wouldn't have a problem with a developer pushing a change after tests have passed, after receiving the clear approval of a developer competent in the subject, and when others aren't likely to disagree. It would take a little more trust and clarity of feedback than always waiting for the countdown. I don't expect that it would be the norm. — Dan
Sign in to reply to this message.
Dan Eble <dan@faithful.be> writes:

> On Feb 3, 2020, at 04:30, Han-Wen Nienhuys <hanwenn@gmail.com> wrote:
>> Instead of waiting for complaints, a change is
>> pushed once it passes tests and someone LGTM'd it.
>
> I've worked in places where commits were handled with self-discipline
> and mutual accountability, and I've worked in places where a UI
> enforced upper-management's policy that every change had to be
> approved by two other developers before it could be merged. I prefer
> self-discipline and mutual accountability to having to nag people with
> a superficial understanding of a change to put themselves on record as
> approving it.
>
> Therefore, regarding the countdown, I think it's a bad idea to require
> approval before pushing, unless we grant that the patch meister can
> approve pushing with the reason "countdown complete."
>
> Regarding accelerating the process, I wouldn't have a problem with a
> developer pushing a change after tests have passed, after receiving
> the clear approval of a developer competent in the subject, and when
> others aren't likely to disagree. It would take a little more trust
> and clarity of feedback than always waiting for the countdown. I
> don't expect that it would be the norm.

It does happen, and different developers are differently comfortable taking the responsibility for bypassing the opportunity of feedback depending on the situation. The staging procedure reduces the potential size of consequences of such a step, even in case of completely bypassing any kind of review for trivial changes (trivial typically also implying that no computer-interpreted syntax is changed). It mostly makes sense where it is unexpected others will comment, or when in the course of larger interdependent changes that cannot be defused by rebases, the queue is expected to continue stalling for prolonged amounts of time.

-- David Kastrup
Sign in to reply to this message.
On Monday, 2020-02-03 at 09:58 -0500, Dan Eble wrote:
> On Feb 3, 2020, at 04:30, Han-Wen Nienhuys <hanwenn@gmail.com> wrote:
> > Instead of waiting for complaints, a change is
> > pushed once it passes tests and someone LGTM'd it.
>
> I've worked in places where commits were handled with self-discipline and mutual accountability, and I've worked in places where a UI enforced upper-management's policy that every change had to be approved by two other developers before it could be merged. I prefer self-discipline and mutual accountability to having to nag people with a superficial understanding of a change to put themselves on record as approving it.
>
> Therefore, regarding the countdown, I think it's a bad idea to require approval before pushing, unless we grant that the patch meister can approve pushing with the reason "countdown complete."
>
> Regarding accelerating the process, I wouldn't have a problem with a developer pushing a change after tests have passed, after receiving the clear approval of a developer competent in the subject, and when others aren't likely to disagree.

Right now, the Countdown is also the time when other developers get the possibility to disagree. How can you know if others disagree before they tell you? (I worked in another project where it could happen that a patch was accepted within 10 minutes and pushed 30 minutes after upload. No way I can be online 24/7, and there were serious problems with the changes.)

> It would take a little more trust and clarity of feedback than always waiting for the countdown. I don't expect that it would be the norm.

We already have this opportunity (see David's concurrent reply). No, this should not be the norm and certainly is not, which is good in my opinion.

To get forward, I'd propose a fresh thread on lilypond-devel with a concrete proposal on what to change, its pros and cons. Discussing this on a random review thread is just a waste of time.

Jonas
Sign in to reply to this message.
On 2020/02/02 22:33:50, hanwenn wrote:
> For testing, use
> https://github.com/hanwen/lilypond/tree/guile22-experiment

So I ran this with the large example and I see

GC Warning: Failed to expand heap by 30635458560 bytes

a few times (that's 30 GB, my laptop only has 8 GB!!) and finally

warning: g_spawn_sync failed (0): gs: Failed to fork (Cannot allocate memory)

Maybe exponential growth is not the best choice here? At least my system can't handle this score anymore.

Also most of the 'speedup' by this patch seems to come from segments _after_ music interpretation (only 10% improvement in music interpretation, but around 2x for the total time). I think the later parts just benefit from the enormous heap because GC doesn't need to run at all? But that's not really the idea of this patch, is it?
Sign in to reply to this message.
On Tue, Feb 4, 2020 at 7:50 PM <jonas.hahnfeld@gmail.com> wrote:
> On 2020/02/02 22:33:50, hanwenn wrote:
> > For testing, use
> > https://github.com/hanwen/lilypond/tree/guile22-experiment
>
> So I ran this with the large example and I see
> GC Warning: Failed to expand heap by 30635458560 bytes
> a few times (that's 30 GB, my laptop only has 8 GB!!) and finally
> warning: g_spawn_sync failed (0): gs: Failed to fork (Cannot allocate memory)
> Maybe exponential growth is not the best choice here? At least my system
> can't handle this score anymore.
>
> Also most of the 'speedup' by this patch seems to come from segments
> _after_ music interpretation (only 10% improvement in music
> interpretation, but around 2x for the total time). I think the later
> parts just benefit from the enormous heap because GC doesn't need to run
> at all? But that's not really the idea of this patch, is it?

In part, it actually is. If you have a lot of RAM, you should use it instead of wasting CPU cycles recycling it.

Do you have a pointer to the score for testing? Can you do some timings with different values for GC_MAXIMUM_HEAP_SIZE, e.g. GC_MAXIMUM_HEAP_SIZE=100M?

Also, it would be interesting to run with GC_PRINT_STATS=1 and see how much time is spent in GC, and how much a typical GC run reclaims across the process.

-- Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
Sign in to reply to this message.
On 2020/02/04 22:23:45, hanwenn wrote:
> On Tue, Feb 4, 2020 at 7:50 PM <jonas.hahnfeld@gmail.com> wrote:
> > [...]
>
> In part, it actually is. If you have a lot of RAM, you should use it
> instead of trying to waste CPU cycles recycling it.
>
> do you have a pointer to the score for testing?

I posted this above: https://lists.gnu.org/archive/html/lilypond-user/2016-11/msg00700.html

> Can you do some timings with different values for GC_MAXIMUM_HEAP_SIZE, eg.
> GC_MAXIMUM_HEAP_SIZE=100M

Not sure if you actually mean GC_MAXIMUM_HEAP_SIZE or rather GC_INITIAL_HEAP_SIZE?

guile22-experiment w/o the growing added in this patch:
* default: ~2m33s (gc time taken: 153.316484488)
* GC_MAXIMUM_HEAP_SIZE=100M -> "GC Warning: Out of Memory! Trying to continue..." (kind of makes sense because it needs ~900MB of RAM)
* GC_INITIAL_HEAP_SIZE=100M: 2m30s (gc time taken: 2.189005726; can't believe this)
* GC_INITIAL_HEAP_SIZE=900M: 0m59s (gc time taken: 8.895736892)

> Also, it would be interesting to GC_PRINT_STATS=1 and see how much time is
> spent in GC, and how much a typical GC runs reclaim across the process.

Um, that outputs a lot. Anything in particular?
Sign in to reply to this message.
Some more numbers: I took guile22-experiment and removed the following:
* GC_set_free_space_divisor,
* GC_INITIAL_HEAP_SIZE=40M
* heap growing in Score_engraver

This gives me:
* default: ~2m30s (although I saw one run in 1m55s?!?)
* GC_INITIAL_HEAP_SIZE=40M: ~2m10s (one run in 1m40s)
* GC_FREE_SPACE_DIVISOR=1: very diverse - from ~1m10s (2 executions) to ~2m40s (3 executions)
* GC_NPROCS=1: from ~1m45s to ~2m10s

This variation makes any statement about performance impractical. I tried taskset -c 2 + GC_NPROCS=1: ~2m10s and it seems stable, but I have yet to build a theory why OS thread migration actually improves performance... Han-Wen, did you see similar variations in your experiments?
Sign in to reply to this message.
On Feb 5, 2020, at 07:59, jonas.hahnfeld@gmail.com wrote:
> and it seems stable, but I have yet to build a theory why OS thread
> migration actually improves performance…

Educated guess: The core gets too hot. The OS is not allowed to move the thread to a cooler core, so it throttles the clock rate instead.

— Dan
Sign in to reply to this message.
Dan Eble <dan@faithful.be> writes:

> On Feb 5, 2020, at 07:59, jonas.hahnfeld@gmail.com wrote:
> > and it seems stable, but I have yet to build a theory why OS thread
> > migration actually improves performance…
>
> Educated guess: The core gets too hot. The OS is not allowed to move
> the thread to a cooler core, so it throttles the clock rate instead.

Heat dependent throttling would explain the rather drastic variation in performance since it acts with large latencies.

-- David Kastrup
Sign in to reply to this message.
On Wed, Feb 5, 2020 at 12:33 PM <jonas.hahnfeld@gmail.com> wrote:
> > Can you do some timings with different values for GC_MAXIMUM_HEAP_SIZE, eg.
> > GC_MAXIMUM_HEAP_SIZE=100M
>
> Not sure if you actually mean GC_MAXIMUM_HEAP_SIZE or rather
> GC_INITIAL_HEAP_SIZE?

I mean maximum, it sets the total amount of memory. If you set the initial heap size, it will still grow the heap, but from a larger starting point.

I am wondering
* what amount of RAM this score needs as a minimum,
* how much over the minimum you need to be to get decent runtime.

This could give us hints on how aggressively we should scale up the heap.

> guile22-experiment w/o the growing added in this patch:
> default: ~2m33s (gc time taken: 153.316484488)
> GC_MAXIMUM_HEAP_SIZE=100M -> "GC Warning: Out of Memory! Trying to
> continue..." (kind of makes sense because it needs ~900MB of RAM)
> GC_INITIAL_HEAP_SIZE=100M: 2m30s (gc time taken: 2.189005726; can't
> believe this)
> GC_INITIAL_HEAP_SIZE=900M: 0m59s (gc time taken: 8.895736892)
>
> > Also, it would be interesting to GC_PRINT_STATS=1 and see how much
> > time is spent in GC, and how much a typical GC runs reclaim across
> > the process.
>
> Um, that outputs a lot. Anything in particular?

When you see things like:

World-stopped marking took 187 msecs (77 in average)
In-use heap: 51% (40071 KiB pointers + 6357 KiB other)

it means it took 187ms for GC, and then reclaimed 49% of the memory. If you often see higher percentages, we'll spend more CPU in marking memory without getting fresh memory.

I tried to run the carver score, but it needs updating and,

[hanwen@localhost lilypond]$ ./out/bin/convert-ly carver/*ly
convert-ly (GNU LilyPond) 2.21.0

Traceback (most recent call last):
  File "./out/bin/convert-ly", line 413, in <module>
    main ()
  File "./out/bin/convert-ly", line 387, in main
    f = f.decode (sys.stdin.encoding or "utf-8")
AttributeError: 'str' object has no attribute 'decode'
[hanwen@localhost lilypond]$ python2 ./out/bin/convert-ly carver/*ly
Traceback (most recent call last):
  File "./out/bin/convert-ly", line 59, in <module>
    import lilylib as ly
  File "/home/hanwen/vc/lilypond/out/lib/lilypond/current/python/lilylib.py", line 216
    print('command failed:', cmd, file=sys.stderr)
        ^
SyntaxError: invalid syntax

Is there still work left for the python3 conversion?

> https://codereview.appspot.com/561390043/

-- Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
Sign in to reply to this message.
On Thu, Feb 6, 2020 at 12:19 AM Han-Wen Nienhuys <hanwenn@gmail.com> wrote:
> > Um, that outputs a lot. Anything in particular?
>
> When you see things like:
>
> World-stopped marking took 187 msecs (77 in average)
> In-use heap: 51% (40071 KiB pointers + 6357 KiB other)
>
> it means it took 187ms for GC, and then reclaimed 49% of the memory.
>
> If you often see higher percentages, we'll spend more CPU in marking
> memory without getting fresh memory.

So the question is: are there phases in the process where the percentage printed is high, and if so, what are they?

-- Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
Sign in to reply to this message.
Han-Wen Nienhuys <hanwenn@gmail.com> writes:

> On Wed, Feb 5, 2020 at 12:33 PM <jonas.hahnfeld@gmail.com> wrote:
>
> when you see things like:
>
> World-stopped marking took 187 msecs (77 in average)
> In-use heap: 51% (40071 KiB pointers + 6357 KiB other)
>
> it means it took 187ms for GC, and then reclaimed 49% of the memory.
>
> If you often see higher percentages, we'll spend more CPU in marking memory
> without getting fresh memory.
>
> I tried to run the carver score, but it needs updating and,
>
> [hanwen@localhost lilypond]$ ./out/bin/convert-ly carver/*ly
> convert-ly (GNU LilyPond) 2.21.0
>
> Traceback (most recent call last):
>   File "./out/bin/convert-ly", line 413, in <module>
>     main ()
>   File "./out/bin/convert-ly", line 387, in main
>     f = f.decode (sys.stdin.encoding or "utf-8")
> AttributeError: 'str' object has no attribute 'decode'
> [hanwen@localhost lilypond]$ python2 ./out/bin/convert-ly carver/*ly
> Traceback (most recent call last):
>   File "./out/bin/convert-ly", line 59, in <module>
>     import lilylib as ly
>   File "/home/hanwen/vc/lilypond/out/lib/lilypond/current/python/lilylib.py", line 216
>     print('command failed:', cmd, file=sys.stderr)
>     ^
> SyntaxError: invalid syntax
>
> Is there still work left for the python3 conversion?
>
>> https://codereview.appspot.com/561390043/

Yes, separate issue. Jonas, I think you should push the fix for that one to staging: from "doesn't work" there cannot be much more of a regression to be afraid of.

-- David Kastrup
Sign in to reply to this message.
On Thursday, 2020-02-06 at 00:27 +0100, David Kastrup wrote:
> [...]
> Yes, separate issue. Jonas, I think you should push the fix for that
> one to staging: from "doesn't work" there cannot be much more of a
> regression to be afraid of.

Right, in staging now.
Sign in to reply to this message.
Thanks, I ran the carver score successfully now. I took some pauses between runs so thermal throttling wasn't an issue.

My standard guile2.2 branch (with 40M initial heap) takes about 2m wall time, 1m of GC time. The stats printing shows that we have a very full heap:

In-use heap: 85% (370824 KiB pointers + 60065 KiB other)
In-use heap: 85% (370725 KiB pointers + 59737 KiB other)
In-use heap: 85% (370143 KiB pointers + 59687 KiB other)
..

so we have to scan a lot of live data to reclaim just 15% of the heap.

Tinkering with INITIAL_HEAP lets us tune things a little bit:
* initial heap 2G: 50s (2.69s GC)
* initial heap 900M: 0:54 (6s GC)
* initial heap 500M: 1:09 (33s GC)

The GC timings seem wonky, but since BDW is multithreaded, it's possible that the computation times don't completely add up.

GUILE 1.8 in default configuration: wall time 40s, 100% CPU.

On Thu, Feb 6, 2020 at 1:45 PM Jonas Hahnfeld <hahnjo@hahnjo.de> wrote:
> [...]
> > Yes, separate issue. Jonas, I think you should push the fix for that
> > one to staging: from "doesn't work" there cannot be much more of a
> > regression to be afraid of.
>
> Right, in staging now.

-- Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
Sign in to reply to this message.
On Fri, Feb 7, 2020 at 5:53 PM Han-Wen Nienhuys <hanwenn@gmail.com> wrote:
> Thanks, I ran the carver score successfully now.
>
> I took some pauses between runs so thermal throttling wasn't an issue.
>
> My standard guile2.2 branch (with 40M initial heap), takes about 2m wall
> time, 1m of GC time.
>
> The stats printing shows that we have a very full heap:
>
> In-use heap: 85% (370824 KiB pointers + 60065 KiB other)
> In-use heap: 85% (370725 KiB pointers + 59737 KiB other)
> In-use heap: 85% (370143 KiB pointers + 59687 KiB other)
> ..
>
> so we have to scan a lot of live data to reclaim just 15% of the heap.
>
> Tinkering with INITIAL_HEAP lets us tune things a little bit:
>
> initial heap 2G: 50s (2.69s GC)
> initial heap 900M: 0:54 (6s GC)
> initial heap 500M: 1:09 (33s GC)
>
> The GC timings seem wonky, but since BDW is multithreaded, it's possible
> that the computation times don't completely add up.

Single threaded, the numbers make more sense:

MAX=INIT=2G
gc time taken: 1.843968805
User time (seconds): 49.36

MAX=INIT=1G
gc time taken: 3.291264925
User time (seconds): 51.74

MAX=INIT=800M
gc time taken: 5.760042906
User time (seconds): 54.62

500M
gc time taken: 17.921457247
User time (seconds): 68.24

It's interesting to note that the multithreaded collector doesn't really help. With a 500M heap, wall clock is 1:08 for the single threaded case, and 1:09 for the multithreaded case. The 800M case seems like a good configuration: it spends about 10% of the time doing GC.

$ grep In-use log
In-use heap: 0% (0 KiB pointers + 0 KiB other)
In-use heap: 2% (20959 KiB pointers + 2430 KiB other)
In-use heap: 2% (14698 KiB pointers + 2415 KiB other)
In-use heap: 39% (292008 KiB pointers + 29486 KiB other)
In-use heap: 47% (334968 KiB pointers + 53702 KiB other)
In-use heap: 45% (317237 KiB pointers + 53424 KiB other)
In-use heap: 57% (407528 KiB pointers + 67897 KiB other)
In-use heap: 53% (378243 KiB pointers + 60449 KiB other)
In-use heap: 53% (378249 KiB pointers + 60519 KiB other)
In-use heap: 53% (377488 KiB pointers + 60322 KiB other)

so we should scale the heap so that approximately 50% is collected on GC.

-- Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
Sign in to reply to this message.
Han-Wen Nienhuys <hanwenn@gmail.com> writes:

> Single threaded, the numbers make more sense:
>
> MAX=INIT=2G
> gc time taken: 1.843968805
> User time (seconds): 49.36
>
> MAX=INIT=1G
> gc time taken: 3.291264925
> User time (seconds): 51.74
>
> MAX=INIT=800M
> gc time taken: 5.760042906
> User time (seconds): 54.62
>
> 500M
> gc time taken: 17.921457247
> User time (seconds): 68.24
>
> It's interesting to note that the multithreaded collector doesn't really
> help.

I think that this may be due to both/either our use of mark hooks and of finalisers for calling destructors. Either may cause serialisation. Another serialisation is because Guile itself switches BGC to Java mode where finalised objects can no longer be marked (or something like that: the exact semantics I do not remember). And of course the C++ free store still has to do its full job.

That's one reason why I think it may be a good idea to "punt" a bit on the encoding stuff by keeping it basically in 8-bit domain: it may give us an operative LilyPond for experimenting with other Guilev2 aspects, like making a custom allocator for Scheme-containing data structures instead of being in a state where only parts are operative.

-- David Kastrup

My replies have a tendency to cause friction. To help mitigating damage, feel free to forward problematic posts to me adding a subject like "timeout 1d" (for a suggested timeout of 1 day) or "offensive".
Sign in to reply to this message.
hook into GC event
Sign in to reply to this message.
On Fri, Feb 7, 2020 at 7:27 PM David Kastrup <dak@gnu.org> wrote:
> Han-Wen Nienhuys <hanwenn@gmail.com> writes:
> > [...]
> > It's interesting to note that the multithreaded collector doesn't really
> > help.
>
> I think that this may be due to both/either our use of mark hooks and of
> finalisers for calling destructors. Either may cause serialisation.
> Another serialisation is because Guile itself switches BGC to Java mode
> where finalised objects can no longer be marked (or something like that:
> the exact semantics I do not remember). And of course the C++ free
> store still has to do its full job.

Interesting. I'll try to put this somewhere as a comment.

> That's one reason why I think it may be a good idea to "punt" a bit on
> the encoding stuff by keeping it basically in 8-bit domain: it may give
> us an operative LilyPond for experimenting with other Guilev2 aspects,
> like making a custom allocator for Scheme-containing data structures
> instead of being in a state where only parts are operative.

It looks like we'll rather have to punt on the GC scaling. Jonas' run of the carver score exploded so spectacularly because the GC stats from libgc don't mean what I thought they meant, which means that the scaling was overly aggressive.

It seems impossible to scale up the heap automatically for now. We should instead print a message directing people to set GC_INITIAL_HEAP_SIZE=xxx in the environment.

For more info, see also https://github.com/ivmai/bdwgc/issues/304. Ivan is open to extending the API, but until that percolates through all the distributions ...

-- Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
Sign in to reply to this message.
Use smob count as memory proxy
Sign in to reply to this message.
On 2020/02/08 20:05:53, hanwenn wrote:
> Use smob count as memory proxy

This looks good to me now; softcoding the 2000 is left as an exercise to the reader in another commit.
Sign in to reply to this message.
https://codereview.appspot.com/561390043/diff/567180043/lily/include/smobs.hh File lily/include/smobs.hh (right): https://codereview.appspot.com/561390043/diff/567180043/lily/include/smobs.hh... lily/include/smobs.hh:312: static size_t count; It seems that this is initialized to zero because it is static, but if it simply had an "= 0", I wouldn't have had to go refresh my memory with a web search. Is it correct that there are no static Smobs anywhere in the program? If there were, it would be responsible to spend some time checking whether this could be affected by the "static initialization order fiasco." (Maybe you already have.)
Sign in to reply to this message.
The approach sounds good to me. I can confirm that this works on my system and the runtime is pretty good. Please re-add the autoconf logic to detect bdw-gc.

FWIW I tried to research the internals of GC. I found the following statement that decides whether to collect or just expand the heap: https://github.com/ivmai/bdwgc/blob/v8.0.4/alloc.c#L1435 We are hit by the case

(GC_fo_entries > (last_fo_entries + 500) && (last_bytes_finalized | GC_bytes_finalized) != 0)

Removing this case gets me to a state very close to this patch, both in terms of performance and memory usage.

If I understand the approach correctly, GC_fo_entries counts the number of outstanding finalizations. It seems like Guile is making good use of this functionality, so GC tries to reclaim that memory first instead of just expanding the heap. Please note that the number 500 is hard-coded, so we have no means to influence it via environment variables or APIs. I guess something like that could be added, but it would again take some time to bubble through all distributions that we care about.

TL;DR: For now it's probably easiest to scale the heap as this patch does.

https://codereview.appspot.com/561390043/diff/567180043/lily/include/smobs.hh
File lily/include/smobs.hh (right):

https://codereview.appspot.com/561390043/diff/567180043/lily/include/smobs.hh...
lily/include/smobs.hh:312: static size_t count;
On 2020/02/09 03:29:26, Dan Eble wrote:
> It seems that this is initialized to zero because it is static, but if it simply
> had an "= 0", I wouldn't have had to go refresh my memory with a web search.
>
> Is it correct that there are no static Smobs anywhere in the program? If there
> were, it would be responsible to spend some time checking whether this could be
> affected by the "static initialization order fiasco." (Maybe you already have.)

IIRC static objects without initialization are put into .bss, which is initialized to 0 when loading the binary. Adding "= 0" puts the variable into the data segment, but it'll still be initialized at load time. The "static initialization order fiasco" is only a problem when the constructor is doing work, which is not the case for an integer. Anyway, I think adding a comment here and in smobs.cc on how this is initialized (referring to static lifetime in C/C++) would likely clarify the intent.

https://codereview.appspot.com/561390043/diff/567180043/lily/smobs.cc
File lily/smobs.cc (right):

https://codereview.appspot.com/561390043/diff/567180043/lily/smobs.cc#newcode23
lily/smobs.cc:23: #include <gc/gc.h>
You probably want this to be guarded by GUILEV2?

https://codereview.appspot.com/561390043/diff/567180043/lily/smobs.cc#newcode46
lily/smobs.cc:46: printf ("bytes per obj %d\n", size / count);
Please remove before committing.
Sign in to reply to this message.
On Sun, Feb 9, 2020 at 10:25 AM <jonas.hahnfeld@gmail.com> wrote:
> The approach sounds good to me. I can confirm that this works on my
> system and the runtime is pretty good. Please readd the autoconf logic
> to detect bdw-gc.

Will do.

> FWIW I tried to research on the internals of GC. I found the following
> statement that decides whether to collect or just expand the heap:
> https://github.com/ivmai/bdwgc/blob/v8.0.4/alloc.c#L1435 We are hit by
> the case
> (GC_fo_entries > (last_fo_entries + 500) && (last_bytes_finalized |
> GC_bytes_finalized) != 0)
> Removing this case gets me to a state very close to this patch, both in
> terms of performance and memory usage.
>
> If I understand the approach correctly, GC_fo_entries counts the number
> of outstanding finalizations. It seems like Guile is making good use of
> this functionality, so GC tries to reclaim that memory first instead of
> just expanding the heap. Please note that the number 500 is hard-coded,
> so we have no means to influence it via environment variables or APIs. I
> guess something like that could be added, but it would again take some
> time to bubble through all distributions that we care about.

Aha! That explains why we see such poor performance, because we rely on finalizers a lot. If we were to move away from finalizers and let bdwgc handle all of the memory management, then it would also become a non-issue?

> TL;DR: For now it's probably easiest to scale the heap as this patch
> does.
>
> https://codereview.appspot.com/561390043/diff/567180043/lily/smobs.cc#newcode23
> lily/smobs.cc:23: #include <gc/gc.h>
> You probably want this to be guarded by GUILEV2?

Yep.

-- Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
Sign in to reply to this message.
On 2020/02/09 09:36:40, hanwenn wrote:
> On Sun, Feb 9, 2020 at 10:25 AM <jonas.hahnfeld@gmail.com> wrote:
> > FWIW I tried to research on the internals of GC. I found the following
> > statement that decides whether to collect or just expand the heap:
> > https://github.com/ivmai/bdwgc/blob/v8.0.4/alloc.c#L1435 We are hit by
> > the case
> > (GC_fo_entries > (last_fo_entries + 500) && (last_bytes_finalized |
> > GC_bytes_finalized) != 0)
> > Removing this case gets me to a state very close to this patch, both in
> > terms of performance and memory usage.
> >
> > If I understand the approach correctly, GC_fo_entries counts the number
> > of outstanding finalizations. It seems like Guile is making good use of
> > this functionality, so GC tries to reclaim that memory first instead of
> > just expanding the heap. Please note that the number 500 is hard-coded,
> > so we have no means to influence it via environment variables or APIs. I
> > guess something like that could be added, but it would again take some
> > time to bubble through all distributions that we care about.
>
> Aha! That explains why we see such poor performance, because we rely on
> finalizers a lot. If we were to move away from finalizers and let bdwgc
> handle all of the memory management, then it would also become a
> non-issue?

Disclaimer: I'm not yet very familiar with LilyPond's Scheme code. If you refer to scm_set_smob_free / Smob::free_smob, then I think you are right. Is there an easy way to get rid of this, maybe just for testing?
On 2020/02/09 09:55:12, hahnjo wrote:
> On 2020/02/09 09:36:40, hanwenn wrote:
> > Aha! That explains why we see such poor performance, because we rely on
> > finalizers a lot. If we were to move away from finalizers and let bdwgc
> > handle all of the memory management, then it would also become a
> > non-issue?
>
> Disclaimer: I'm not yet very familiar with LilyPond's Scheme code. If you
> refer to scm_set_smob_free / Smob::free_smob, then I think you are right.
> Is there an easy way to get rid of this, maybe just for testing?

You could define operator new/delete in the smob class (using
scm_gc_malloc), and then let BDWGC handle memory for the smobs. The
problem is that this will not reclaim any STL containers (i.e. vectors)
contained in the smobs, and especially engravers are full of them.

A way around that is to change all instances of vectors holding SCM
values to a new gc_vector type that has a custom allocator. That will be
significant work, but probably desirable in the long term.
Jonas' comments
https://codereview.appspot.com/561390043/diff/567180043/lily/include/smobs.hh
File lily/include/smobs.hh (right):

https://codereview.appspot.com/561390043/diff/567180043/lily/include/smobs.hh...
lily/include/smobs.hh:312: static size_t count;
On 2020/02/09 03:29:26, Dan Eble wrote:
> It seems that this is initialized to zero because it is static, but if it
> simply had an "= 0", I wouldn't have had to go refresh my memory with a
> web search.
>
> Is it correct that there are no static Smobs anywhere in the program? If
> there were, it would be responsible to spend some time checking whether
> this could be affected by the "static initialization order fiasco."
> (Maybe you already have.)

I've added an = 0 to the code in the .cc file. FWIW, this behavior is
ANSI C (C89 even), which is a tiny standard compared to the C++11
standard, so I think it is reasonable to expect that people know this.

Smobs are objects exposed as SCM values, so it is impossible to create
them before GUILE is booted up; the protected status of the Smob_core
ctor enforces this.

https://codereview.appspot.com/561390043/diff/567180043/lily/smobs.cc
File lily/smobs.cc (right):

https://codereview.appspot.com/561390043/diff/567180043/lily/smobs.cc#newcode23
lily/smobs.cc:23: #include <gc/gc.h>
On 2020/02/09 09:25:20, hahnjo wrote:
> You probably want this to be guarded by GUILEV2?

Done.

https://codereview.appspot.com/561390043/diff/567180043/lily/smobs.cc#newcode46
lily/smobs.cc:46: printf ("bytes per obj %d\n", size / count);
On 2020/02/09 09:25:20, hahnjo wrote:
> Please remove before committing.

Done.
comment
LGTM, thanks for this clearly documented change!
https://codereview.appspot.com/561390043/diff/567180043/lily/include/smobs.hh
File lily/include/smobs.hh (right):

https://codereview.appspot.com/561390043/diff/567180043/lily/include/smobs.hh...
lily/include/smobs.hh:312: static size_t count;
On 2020/02/09 03:29:26, Dan Eble wrote:
> It seems that this is initialized to zero because it is static, but if it
> simply had an "= 0", I wouldn't have had to go refresh my memory with a
> web search.
>
> Is it correct that there are no static Smobs anywhere in the program?

Depends on what you call "Smob". Protected_scm is a data type
_explicitly_ intended for static variables containing SCM values. But it
must not contain anything but "immediate" values at program start time,
since the memory protection machinery does not start until well into
main.

> If there were, it would be responsible to spend some time checking
> whether this could be affected by the "static initialization order
> fiasco." (Maybe you already have.)

There was something where code was getting run by constructors inserting
into a static STL vector that had its own constructors not yet run. I
don't remember the details. I think it was part of the motivation to
create Protected_scm.
On Feb 9, 2020, at 04:25, jonas.hahnfeld@gmail.com wrote:
>
> Anyway I think adding a comment here and in smobs.cc how this is
> initialized (refer to static lifetime in C/C++) would likely clarify the
> intent.

Initialization is an area where the legacy of the language is complicated
(more connotative adjectives come to mind), mutable static data is not in
daily use, and failure to initialize is a well-known class of error, so a
"what it does" comment (the kind people usually don't like) would help.
—
Dan
On 2020/02/09 10:08:20, hanwenn wrote:
> A way around that is to change all instances of vectors holding SCM
> values to a new gc_vector type that has a custom allocator. That will
> be significant work, but probably desirable in the long term.

I'd recommend a two-pronged approach for now: lobby for changes in bdwgc
that make it suitable for dealing with a larger ratio of finalisers,
and, while angling for a move to vectors with their own allocators,
stick with code that can do either. That way bdwgc will eventually
become better for our current workload, and if it turns out faster or
more resource-efficient to rely on finalisers for marking rather than
treating everything as conservative pointers, we still have the option
to return to what we are doing, or to pick a mixed strategy. (There
would likely be no point in ever marking SCM arrays via mark hooks if we
can just allocate them in mark-everything memory areas; at least that
part should never be contentious, and STL structures of SCM are the most
irksome with regard to not causing trouble before creation or after
destruction.)
commit a19aed147bf1605b21cbe7b1909ff6cbf519fb64
Author: Han-Wen Nienhuys <hanwen@lilypond.org>
Date:   Sat Feb 8 21:02:12 2020 +0100

    GUILE2: Scale GC heap with the number of smobs