Created: 5 years, 2 months ago by hanwenn
Modified: 5 years, 1 month ago
CC: lilypond-devel@gnu.org
Visibility: Public.
Description

Add a cooperative FS lock to lilypond-book.

This simplifies the build infrastructure, because it obviates the Makefile hacks that force a single lilypond-book process during the build.
Patch Set 1
Total comments: 6

Patch Set 2: fcntl
Patch Set 3: timing test
Patch Set 4: spaces
Patch Set 5: eager checksums
Patch Set 6: harden
Patch Set 7: lockfilename
Patch Set 8: rebase

Messages

Total messages: 36
The current change leaves a few questions unanswered: What should lilypond-book do if there happens to be an old .lock file around? Right now, it just sits there and does nothing, which is not obvious to the user. Also, what's the benefit of doing this? Is it worth doing in terms of runtime?

https://codereview.appspot.com/555360043/diff/551490043/scripts/lilypond-book.py
File scripts/lilypond-book.py (right):

https://codereview.appspot.com/555360043/diff/551490043/scripts/lilypond-book...
scripts/lilypond-book.py:458: lock_file = os.path.join(options.lily_output_dir + ".lock")
At first glance I thought this was wrong and should have two arguments to join (otherwise the call is useless). After seeing that you only mkdir lily_output_dir below, maybe you want a file $(basename lily_output_dir).lock in the parent directory? That's a bold assumption that there is no / at the end...

https://codereview.appspot.com/555360043/diff/551490043/scripts/lilypond-book...
scripts/lilypond-book.py:460: while 1:
while True

https://codereview.appspot.com/555360043/diff/551490043/scripts/lilypond-book...
scripts/lilypond-book.py:477: os.close(lockfd)
You should close the file before removing it.
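For context on the os.path.join remark: with a single argument, os.path.join simply returns its argument, so the line as written just appends ".lock" to the directory name. A minimal sketch, using a hypothetical directory name for illustration (POSIX path separators assumed):

  import os.path

  lily_output_dir = "out/lybook-db"  # hypothetical value, for illustration only

  # A single-argument join simply returns its argument, so this is plain string
  # concatenation: the lock file becomes a sibling of the output directory.
  lock_file = os.path.join(lily_output_dir + ".lock")
  assert lock_file == "out/lybook-db.lock"

  # A two-argument join would instead place the lock file inside the directory.
  assert os.path.join(lily_output_dir, ".lock") == "out/lybook-db/.lock"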
I think this is worth it because it simplifies the build system, and puts the locking in the place where we actually access the resource.

I take your point about dropped files; the best would be fcntl locks, but I am worried that they might not be supported on Windows. Would you know?

Maybe we can just use fcntl locks on Unix, and Windows users should just not try to run parallel lp-book invocations.

https://codereview.appspot.com/555360043/diff/551490043/scripts/lilypond-book.py
File scripts/lilypond-book.py (right):

https://codereview.appspot.com/555360043/diff/551490043/scripts/lilypond-book...
scripts/lilypond-book.py:458: lock_file = os.path.join(options.lily_output_dir + ".lock")
On 2020/02/23 15:18:26, hahnjo wrote:
> At first glance I thought this was wrong and should have two arguments to join
> (otherwise the call is useless). After seeing that you only mkdir
> lily_output_dir below, maybe you want a file $(basename lily_output_dir).lock in
> the parent directory? That's a bold assumption that there is no / at the end...
leftover; removed.

https://codereview.appspot.com/555360043/diff/551490043/scripts/lilypond-book...
scripts/lilypond-book.py:460: while 1:
On 2020/02/23 15:18:26, hahnjo wrote:
> while True
Done.

https://codereview.appspot.com/555360043/diff/551490043/scripts/lilypond-book...
scripts/lilypond-book.py:477: os.close(lockfd)
On 2020/02/23 15:18:26, hahnjo wrote:
> You should close the file before removing it.
On the contrary: we don't have to close it at all (on exit, all files are closed automatically). If we close first, there is a larger chance of leaving the lock file hanging around.
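To illustrate the lock-file scheme being discussed here, a rough sketch only (not the code from the patch; the helper name and polling interval are assumptions), including the remove-before-close ordering argued for above:

  import errno
  import os
  import time

  def with_lock_file(lock_file, work):
      """Cooperative serialization via an exclusively created lock file."""
      while True:
          try:
              # O_EXCL makes creation fail if another process holds the lock.
              lockfd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
              break
          except OSError as e:
              if e.errno != errno.EEXIST:
                  raise
              time.sleep(0.25)  # another lilypond-book is running; poll again
      try:
          work()
      finally:
          # Remove first, then close: if we die between the two calls, the lock
          # file is already gone, and the descriptor is closed on exit anyway.
          os.remove(lock_file)
          os.close(lockfd)

The weakness discussed above remains: if the process is killed before the cleanup runs, the stale lock file blocks later invocations, which is what motivates the fcntl-based variant in the later patch sets.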
On 2020/02/23 15:54:54, hanwenn wrote:
> I think this is worth it because it simplifies the build system, and puts the
> locking in the place where we actually access the resource.

Let me disagree: It complicates lilypond-book with something that no (external) user of the script cares about. So IMHO adding brittle locking requires more justification than that.

> I take your point about dropped files; the best would be fcntl locks, but I am
> worried that they might not be supported on Windows. Would you know?
>
> Maybe we can just use fcntl locks on Unix, and Windows users should just not try
> to run parallel lp-book invocations.

Can we please first take a step back and see how much benefit there actually is?
fcntl
On 2020/02/23 15:54:54, hanwenn wrote:
> I think this is worth it because it simplifies the build system, and puts the
> locking in the place where we actually access the resource.

Is there any indication that letting Make run multiple instances of lilypond-book with every instance except one at a time locking up is going to be a net win for performance?

I still don't see what this is supposed to buy us over using CPU_COUNT for invoking parallel LilyPond instances. In particular since the parallel LilyPond instances are forked off at a time when LilyPond has completed its startup, and in the context of the current Guile-v2 integration, startup times are relevant. Even though considering the number of files processed in one LilyPond process, their overall impact should still be comparatively confined.
timing test
On 2020/02/23 16:05:08, dak wrote:
> On 2020/02/23 15:54:54, hanwenn wrote:
> > I think this is worth it because it simplifies the build system, and puts the
> > locking in the place where we actually access the resource.
>
> Is there any indication that letting Make run multiple instances of lilypond-book
> with every instance except one at a time locking up is going to be a net win for performance?

input/regression/lilypond-book:

  rm -rf out-tst; time make out=tst local-test -j4 CPU_COUNT=4

before: real 1m16.588s
after:  real 0m25.224s

> I still don't see what this is supposed to buy us over using CPU_COUNT for
> invoking parallel LilyPond instances. In particular since the parallel LilyPond
> instances are forked off at a time when LilyPond has completed its startup, and
> in the context of the current Guile-v2 integration, startup times are relevant.
> Even though considering the number of files processed in one LilyPond process,
> their overall impact should still be comparatively confined.

The problem is that several other things are serialized in the build because of lilypond-book.

I used fcntl locking, which is impervious to stale locks (exiting a process drops locks automatically).
On 2020/02/23 16:23:34, hanwenn wrote:
> On 2020/02/23 16:05:08, dak wrote:
> > On 2020/02/23 15:54:54, hanwenn wrote:
> > > I think this is worth it because it simplifies the build system, and puts the
> > > locking in the place where we actually access the resource.
> >
> > Is there any indication that letting Make run multiple instances of lilypond-book
> > with every instance except one at a time locking up is going to be a net win for performance?
>
> input/regression/lilypond-book:
>
>   rm -rf out-tst; time make out=tst local-test -j4 CPU_COUNT=4
>
> before: real 1m16.588s
> after:  real 0m25.224s

So the idea is not as much to run parallel instances of lilypond-book but rather to let lilypond-book itself do the serialization.

The net result will be that Make counts lilypond-book's use of 4 CPUs as just a single CPU, so unless the parallel makes run into a locking instance of lilypond-book, this will now result in a maximum of 7 jobs in parallel, right?
spaces
On 2020/02/23 16:29:20, dak wrote:
> On 2020/02/23 16:23:34, hanwenn wrote:
> > On 2020/02/23 16:05:08, dak wrote:
> > > On 2020/02/23 15:54:54, hanwenn wrote:
> > > > I think this is worth it because it simplifies the build system, and puts the
> > > > locking in the place where we actually access the resource.
> > >
> > > Is there any indication that letting Make run multiple instances of lilypond-book
> > > with every instance except one at a time locking up is going to be a net win for performance?
> >
> > input/regression/lilypond-book:
> >
> >   rm -rf out-tst; time make out=tst local-test -j4 CPU_COUNT=4
> >
> > before: real 1m16.588s
> > after:  real 0m25.224s
>
> So the idea is not as much to run parallel instances of lilypond-book but rather
> to let lilypond-book itself do the serialization.
>
> The net result will be that Make counts lilypond-book's use of 4 CPUs as just a
> single CPU, so unless the parallel makes run into a locking instance of
> lilypond-book, this will now result in a maximum of 7 jobs in parallel, right?

Correct.

I have another separate plan, which is to do

  cat $(find . -name '*.itely' -or -name '*.tely' | grep -Ev '(out|out-www)') > concat.itely

and then have a special rule run that through lp-book as a whole. That should also get rid of a bunch of inefficiencies.
eager checksums
On 2020/02/23 15:59:14, hahnjo wrote:
> On 2020/02/23 15:54:54, hanwenn wrote:
> > I think this is worth it because it simplifies the build system, and puts the
> > locking in the place where we actually access the resource.
>
> Let me disagree: It complicates lilypond-book with something that no (external)
> user of the script cares about. So IMHO adding brittle locking requires more
> justification than that.
>
> > I take your point about dropped files; the best would be fcntl locks, but I am
> > worried that they might not be supported on Windows. Would you know?
> >
> > Maybe we can just use fcntl locks on Unix, and Windows users should just not try
> > to run parallel lp-book invocations.
>
> Can we please first take a step back and see how much benefit there actually is?

To be fair, the current situation is that _anybody_ should just not try to run parallel lp-book invocations, whether from our build system, started manually from different shells with the same database, or in any other manner.

The lilybook database is quite a big hack with its main purpose being speeding up our doc build. I am not quite sure whether normal lilypond-book invocations would even use it. If they do, the lock might be separately useful to what is going on in our build process.
harden
Jonas, did you want to have another look?
On 2020/02/25 08:09:21, hanwenn wrote:
> Jonas, did you want to have another look?

Yes, hopefully later today, no guarantee though.
So I can see a consistent improvement by ~40s for 'make -j4 CPU_COUNT=4 test', going down from ~4m to 3m30s. The patch at https://codereview.appspot.com/547680043 explains that this is due to parallelism in input/regression/lilypond-book/. I see no influence on 'make -j4 CPU_COUNT=4 doc', staying flat at around 29m on my laptop.

If only looking at input/regression/lilypond-book/, can't we just use different 'lily_output_dir's for each target? That should still allow us to run in parallel without the locking solution proposed in this patch. Correct me if I'm wrong: This should have no negative influence since files from input/regression/lilypond-book/ are not reused in other parts of the tests / documentation.

Another solution might be to serialize only lilypond-book and let tex et al. run concurrently. That should also be harmless, right?

In total I'm still not convinced by this complexity.
On Tue, Feb 25, 2020 at 11:09 PM <jonas.hahnfeld@gmail.com> wrote:
> Another solution might be to serialize only lilypond-book and let tex et
> al. run concurrently. That should also be harmless, right?

But this is exactly what this patch does.

I don't understand your objection. Serializing mechanisms in the makefile are obscure and hard to understand, because build systems want to do as many things in parallel as possible.

A lock (a file lock, in this case) is the standard solution for serializing concurrent access to a shared resource (a standard problem). What is your objection against using a standard solution?

On a philosophical level, it is a lilypond-book implementation detail that it can't deal with concurrent invocation, so the remediation for this problem should be in lilypond-book too.

> In total I'm still not convinced by this complexity.
>
> https://codereview.appspot.com/555360043/

--
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
On 2020/02/26 07:59:36, hanwenn wrote:
> On Tue, Feb 25, 2020 at 11:09 PM <jonas.hahnfeld@gmail.com> wrote:
> > Another solution might be to serialize only lilypond-book and let tex et
> > al. run concurrently. That should also be harmless, right?
>
> But this is exactly what this patch does.

I meant "serialize only lilypond-book in the Makefile [...]", sorry for not being specific. I agree that this patch attempts to go this way in lilypond-book, and that's what I object to, see below.

> I don't understand your objection. Serializing mechanisms in the
> makefile are obscure and hard to understand, because build systems
> want to do as many things in parallel as possible.

... so it's the build system's responsibility to get things right. In our case this means: Do *not* call lilypond-book in parallel.

> A lock (a file lock, in this case) is the standard solution for
> serializing concurrent access to a shared resource (a standard
> problem). What is your objection against using a standard solution?

Yes, locks are a standard solution, but file locks are brittle. I've seen them fail far too often (ever had your apt-get / yum / pacman error out because there was a lock-file?) so I object to adding this complexity if it only helps for a single case in our build (i.e. input/regression/lilypond-book/).

> On a philosophical level, it is a lilypond-book implementation detail
> that it can't deal with concurrent invocation, so the remediation for
> this problem should be in lilypond-book too.

Let me disagree: It's an implementation detail of make that it runs things in parallel. IMHO a build system should ensure that the result of running with multiple jobs is the same as a sequential run.
On 2020/02/26 08:19:39, hahnjo wrote:
> On 2020/02/26 07:59:36, hanwenn wrote:
> > On Tue, Feb 25, 2020 at 11:09 PM <jonas.hahnfeld@gmail.com> wrote:
> > > Another solution might be to serialize only lilypond-book and let tex et
> > > al. run concurrently. That should also be harmless, right?
> >
> > But this is exactly what this patch does.
>
> I meant "serialize only lilypond-book in the Makefile [...]", sorry for not
> being specific. I agree that this patch attempts to go this way in
> lilypond-book, and that's what I object to, see below.
>
> > I don't understand your objection. Serializing mechanisms in the
> > makefile are obscure and hard to understand, because build systems
> > want to do as many things in parallel as possible.
>
> ... so it's the build system's responsibility to get things right. In our case
> this means: Do *not* call lilypond-book in parallel.
>
> > A lock (a file lock, in this case) is the standard solution for
> > serializing concurrent access to a shared resource (a standard
> > problem). What is your objection against using a standard solution?
>
> Yes, locks are a standard solution, but file locks are brittle. I've seen them
> fail far too often (ever had your apt-get / yum / pacman error out because there
> was a lock-file?) so I object to adding this complexity if it only helps for a
> single case in our build (i.e. input/regression/lilypond-book/).
>
> > On a philosophical level, it is a lilypond-book implementation detail
> > that it can't deal with concurrent invocation, so the remediation for
> > this problem should be in lilypond-book too.
>
> Let me disagree: It's an implementation detail of make that it runs things in
> parallel. IMHO a build system should ensure that the result of running with
> multiple jobs is the same as a sequential run.

That said: I'm also fine if some other developer accepts this patch. See my timing data above to get to your own conclusion. After all, my opinion is just one of a larger range.
On Wed, Feb 26, 2020 at 9:19 AM <jonas.hahnfeld@gmail.com> wrote:
> > A lock (a file lock, in this case) is the standard solution for
> > serializing concurrent access to a shared resource (a standard
> > problem). What is your objection against using a standard solution?
>
> Yes, locks are a standard solution, but file locks are brittle. I've
> seen them fail far too often (ever had your apt-get / yum / pacman error
> out because there was a lock-file?) so I object to adding this

No, not of late.

It's useful to distinguish between "file locks" and "lock files". The latter are a form of the former, but they rely on the locking process to remove the lock file if the process aborts. Git uses these files pervasively, the reason being that this is the only way to make locking work on NFS. Maybe you've seen problems with Git?

The fcntl locks used here are managed by the kernel. If the process holding the lock dies, the lock is freed. So there is no staleness (but they don't work on NFS). I challenge you to come up with a mechanism where one can observe brittle behavior.

In this patch, we create a "xxx.lock" file, which is a little ugly. Let me see if we can lock the directory directly.

> > On a philosophical level, it is a lilypond-book implementation detail
> > that it can't deal with concurrent invocation, so the remediation for
> > this problem should be in lilypond-book too.
>
> Let me disagree: It's an implementation detail of make that it runs
> things in parallel. IMHO a build system should ensure that the result of
> running with multiple jobs is the same as a sequential run.
>
> https://codereview.appspot.com/555360043/

--
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
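To make the distinction concrete, here is a minimal sketch of a kernel-managed fcntl lock in Python (not the actual patch; the helper name and file mode are assumptions):

  import fcntl
  import os

  def run_exclusively(lock_path, work):
      """Serialize concurrent invocations via a kernel-managed fcntl lock."""
      # The descriptor must stay open for as long as the lock is to be held.
      fd = os.open(lock_path, os.O_CREAT | os.O_WRONLY, 0o666)
      try:
          # Blocks until any other holder releases the lock; if the holder
          # dies, the kernel drops the lock, so it can never go stale.
          fcntl.lockf(fd, fcntl.LOCK_EX)
          work()
      finally:
          os.close(fd)  # closing the descriptor also releases the lock

With this scheme the lock file itself is never removed; it is only the object the kernel attaches the lock to, so a leftover "xxx.lock" on disk is harmless, unlike with the lock-file scheme sketched earlier in the thread.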
lockfilename
On Wed, Feb 26, 2020 at 9:59 AM Han-Wen Nienhuys <hanwenn@gmail.com> wrote:
> In this patch, we create a "xxx.lock" file, which is a little ugly.
> Let me see if we can lock the directory directly.

You can't (it has to be a file). See https://gavv.github.io/articles/file-locks/#common-features for more background.

--
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
On 2020/02/26 08:28:33, hahnjo wrote:
> On 2020/02/26 08:19:39, hahnjo wrote:
> > > On a philosophical level, it is a lilypond-book implementation detail
> > > that it can't deal with concurrent invocation, so the remediation for
> > > this problem should be in lilypond-book too.
> >
> > Let me disagree: It's an implementation detail of make that it runs things in
> > parallel. IMHO a build system should ensure that the result of running with
> > multiple jobs is the same as a sequential run.
>
> That said: I'm also fine if some other developer accepts this patch. See my
> timing data above to get to your own conclusion. After all, my opinion is just
> one of a larger range.

My take on this is that this "implementation detail" of parallel invocation resulting in awkward breakage is something that warrants fixing irrespective of our build system. All that the UG states here is

  ‘--lily-output-dir=DIR’
      Write lily-XXX files to directory DIR, link into ‘--output’
      directory. Use this option to save building time for documents in
      different directories which share a lot of identical snippets.

It doesn't state at all what happens in cases of contention. Fixing contention with a lock is a brute-force solution that just doesn't allow for parallelism, but it is a solution to the contention problem.

It is not a solution to lilypond-book starting more jobs than Make knows about. Or to all but one lilypond-book invocation not making any progress and blocking Make, which could instead start other actual single-process tasks. So I see this patch and its approach as an improvement to lilypond-book. I don't see that it solves the parallel build carnage: it just scales down the impact from having to choose between complete serialization and database failure.
On 2020/02/26 11:59:14, dak wrote:
> On 2020/02/26 08:28:33, hahnjo wrote:
> > On 2020/02/26 08:19:39, hahnjo wrote:
> > > > On a philosophical level, it is a lilypond-book implementation detail
> > > > that it can't deal with concurrent invocation, so the remediation for
> > > > this problem should be in lilypond-book too.
> > >
> > > Let me disagree: It's an implementation detail of make that it runs things in
> > > parallel. IMHO a build system should ensure that the result of running with
> > > multiple jobs is the same as a sequential run.
> >
> > That said: I'm also fine if some other developer accepts this patch. See my
> > timing data above to get to your own conclusion. After all, my opinion is just
> > one of a larger range.
>
> My take on this is that this "implementation detail" of parallel invocation
> resulting in awkward breakage is something that warrants fixing irrespective of
> our build system. All that the UG states here is
>
>   ‘--lily-output-dir=DIR’
>       Write lily-XXX files to directory DIR, link into ‘--output’
>       directory. Use this option to save building time for documents in
>       different directories which share a lot of identical snippets.
>
> It doesn't state at all what happens in cases of contention. Fixing contention
> with a lock is a brute-force solution that just doesn't allow for parallelism,
> but it is a solution to the contention problem.
>
> It is not a solution to lilypond-book starting more jobs than Make knows about.
> Or to all but one lilypond-book invocation not making any progress and blocking
> Make, which could instead start other actual single-process tasks. So I see this
> patch and its approach as an improvement to lilypond-book. I don't see that it
> solves the parallel build carnage: it just scales down the impact from having to
> choose between complete serialization and database failure.

David, I think you are saying this patch is LGTM - could you be explicit, so James understands what is going on?
On 2020/02/28 17:57:06, hanwenn wrote:
> On 2020/02/26 11:59:14, dak wrote:
> > It doesn't state at all what happens in cases of contention. Fixing contention
> > with a lock is a brute-force solution that just doesn't allow for parallelism,
> > but it is a solution to the contention problem.
> >
> > It is not a solution to lilypond-book starting more jobs than Make knows about.
> > Or to all but one lilypond-book invocation not making any progress and blocking
> > Make, which could instead start other actual single-process tasks. So I see this
> > patch and its approach as an improvement to lilypond-book. I don't see that it
> > solves the parallel build carnage: it just scales down the impact from having to
> > choose between complete serialization and database failure.
>
> David, I think you are saying this patch is LGTM - could you be explicit, so
> James understands what is going on?

I think this patch is an improvement over the status quo. It's sort of a crutch that works only on some systems and not on NFS, as far as I understand. And it doesn't actually work well as a job control measure in connection with parallel Make. But it does improve lilypond-book behavior on some systems. I think that a restricted form of locking is better than nothing.

I am incidentally not sure just what kind of file systems minimal VMs without a file system of their own work with: if they get an NFS view, this would not even work with Lilydev, which would be bad. But I don't know how VMs do file systems without a partition of their own.
rebase
On 2020/02/28 18:14:14, dak wrote:
> On 2020/02/28 17:57:06, hanwenn wrote:
> > On 2020/02/26 11:59:14, dak wrote:
> > > It doesn't state at all what happens in cases of contention. Fixing contention
> > > with a lock is a brute-force solution that just doesn't allow for parallelism,
> > > but it is a solution to the contention problem.
> > >
> > > It is not a solution to lilypond-book starting more jobs than Make knows about.
> > > Or to all but one lilypond-book invocation not making any progress and blocking
> > > Make, which could instead start other actual single-process tasks. So I see this
> > > patch and its approach as an improvement to lilypond-book. I don't see that it
> > > solves the parallel build carnage: it just scales down the impact from having to
> > > choose between complete serialization and database failure.
> >
> > David, I think you are saying this patch is LGTM - could you be explicit, so
> > James understands what is going on?
>
> I think this patch is an improvement over the status quo. It's sort of a crutch
> that works only on some systems and not on NFS, as far as I understand. And it
> doesn't actually work well as a job control measure in connection with parallel
> Make. But it does improve lilypond-book behavior on some systems. I think that
> a restricted form of locking is better than nothing.
>
> I am incidentally not sure just what kind of file systems minimal VMs without a
> file system of their own work with: if they get an NFS view, this would not even
> work with Lilydev, which would be bad. But I don't know how VMs do file systems
> without a partition of their own.

Sigh. I just noticed that, as opposed to the patch title, this does not just introduce a file lock for lilypond-book but _also_ changes the build system such that now almost double the number of allocated jobs get used. It would be good if different topics weren't conflated into single issues, so that it's easier to discuss what one is actually dealing with and make decisions based on the respective merits of the individual parts.

"It doesn't actually work well as a job control measure in connection with parallel Make" should likely have been an indicator of what I thought I was talking about.
On Fri, Mar 6, 2020 at 11:18 PM <dak@gnu.org> wrote:
>
> Sigh. I just noticed that, as opposed to the patch title, this does not
> just introduce a file lock for lilypond-book but _also_ changes the
> build system such that now almost double the number of allocated jobs
> get used. It would be good if different topics weren't conflated into
> single issues, so that it's easier to discuss what one is actually
> dealing with and make decisions based on the respective merits of the
> individual parts.
>
> "It doesn't actually work well as a job control measure in connection
> with parallel Make" should likely have been an indicator of what I
> thought I was talking about.

Can you tell me what problem you are currently experiencing?

--
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
Han-Wen Nienhuys <hanwenn@gmail.com> writes:

> On Fri, Mar 6, 2020 at 11:18 PM <dak@gnu.org> wrote:
>>
>> Sigh. I just noticed that, as opposed to the patch title, this does not
>> just introduce a file lock for lilypond-book but _also_ changes the
>> build system such that now almost double the number of allocated jobs
>> get used. It would be good if different topics weren't conflated into
>> single issues, so that it's easier to discuss what one is actually
>> dealing with and make decisions based on the respective merits of the
>> individual parts.
>>
>> "It doesn't actually work well as a job control measure in connection
>> with parallel Make" should likely have been an indicator of what I
>> thought I was talking about.
>
> Can you tell me what problem you are currently experiencing?

Harm has a system with memory pressure. That means that he so far has only been able to work with

  CPU_COUNT=2 make -j2 doc

Since now lilypond-doc is no longer serialised, he'd need to reduce to

  CPU_COUNT=1 make -j2 doc

or

  CPU_COUNT=2 make -j1 doc

to get similar memory utilisation, for a considerable loss in performance. I've taken a look at Make's jobserver implementation and it is pretty straightforward.

The real solution would, of course, be to make lilypond-book, with its directory-based database, not lock other instances of lilypond-book but take over their job load. However, the current interaction of lilypond-book is giving the whole work to lilypond, which splits into n copies with a fixed work load. To make that work, one would rather have one "job server" of LilyPond itself which does all the initialisation work and then waits for job requests. Upon receiving them, it forks off copies working on them.

Working with freshly forked copies would have the advantage of having reproducible stats not depending on the exact work distribution, and the disadvantage of things like typical font loading and symbol memoization in frequent code paths happening in each copy. On the other hand, the question of "gc between files?" would not be an issue, since one would just throw the current state of memory away. One would probably want fresh forks for regtests because of the stats and reproducibility, and would accept continuous forks for documentation building (I assume that continuous forks, by which I mean one instance of LilyPond processing several files in sequence like we do now, would be faster in the long run, but probably not all that much).

I previously thought of trying to pin down the job distribution of regtests upon make test-baseline so that only new regtests (rather than the preexisting ones) would get distributed arbitrarily on make check, but starting with fresh forks seems like a much better deal for reproducibility. Of course, that's all for the long haul.

To get back to your question: the consequences are worst when the job count is constrained due to memory pressure. My laptop has uncommonly large memory for its overall age and power, so I am not hit worst. The rough doubling of jobs does not cause me to run into swap space.

--
David Kastrup
On 2020/03/07 12:39:31, dak wrote:
> Han-Wen Nienhuys <hanwenn@gmail.com> writes:
>
> > On Fri, Mar 6, 2020 at 11:18 PM <dak@gnu.org> wrote:
> >>
> >> Sigh. I just noticed that, as opposed to the patch title, this does not
> >> just introduce a file lock for lilypond-book but _also_ changes the
> >> build system such that now almost double the number of allocated jobs
> >> get used. It would be good if different topics weren't conflated into
> >> single issues, so that it's easier to discuss what one is actually
> >> dealing with and make decisions based on the respective merits of the
> >> individual parts.
> >>
> >> "It doesn't actually work well as a job control measure in connection
> >> with parallel Make" should likely have been an indicator of what I
> >> thought I was talking about.
> >
> > Can you tell me what problem you are currently experiencing?
>
> Harm has a system with memory pressure. That means that he so far has
> only been able to work with
>
>   CPU_COUNT=2 make -j2 doc

Well,

  CPU_COUNT=3 make -j3 doc

is mostly no problem.

> Since now lilypond-doc is no longer serialised, he'd need to reduce to
>
>   CPU_COUNT=1 make -j2 doc
>
> or
>
>   CPU_COUNT=2 make -j1 doc

Let me check: putting this patch on top of current master, right? With guile-1 or guile-2?

I've little time atm, thus I'm not sure when I'm able to start testing...
thomasmorley65@gmail.com writes:

> On 2020/03/07 12:39:31, dak wrote:
>>
>> Harm has a system with memory pressure. That means that he so far has
>> only been able to work with
>>
>>   CPU_COUNT=2 make -j2 doc
>
> Well,
>
>   CPU_COUNT=3 make -j3 doc
>
> is mostly no problem.

Ok.

>> Since now lilypond-doc is no longer serialised, he'd need to reduce to
>>
>>   CPU_COUNT=1 make -j2 doc
>>
>> or
>>
>>   CPU_COUNT=2 make -j1 doc
>
> Let me check: putting this patch on top of current master, right?
> With guile-1 or guile-2?

I'd use Guile-1, for the reason that it runs faster, eats less memory, and is more repeatable by virtue of not crashing.

> I've little time atm, thus I'm not sure when I'm able to start
> testing...

The way this works is that running lilypond-book in one directory blocks running lilypond-book in other directories, but nothing else. So you can end up, using CPU_COUNT=3 make -j3, with one job of lilypond-book that starts up 3 copies of LilyPond for large workloads, as well as with 3 jobs in other directories. Once those jobs actually run into lilypond-book, they are stalled within lilypond-book without starting LilyPond processes until the first lilypond-book has finished.

So the worst memory use is when one copy of lilypond-book has finished with its LilyPond part and starts the EPS and PDF processing, another copy of lilypond-book takes over and starts its LilyPond processes, and something else happens in another directory.

--
David Kastrup
On Sat, Mar 7, 2020 at 4:30 PM David Kastrup <dak@gnu.org> wrote:
> that starts up 3 copies of LilyPond for large workloads, as well as with
> 3 jobs in other directories.

Can you point me to places within the build system where that would happen? AFAIK, our build is only parallel per directory, see here:

https://github.com/lilypond/lilypond/blob/825dd87d0b1b58e56d7c66ef1fc1dd672d9...

i.e. we use "&&" to serialize make commands that enter different directories.

--
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
On Sat, Mar 7, 2020 at 1:39 PM David Kastrup <dak@gnu.org> wrote:
> >> "It doesn't actually work well as a job control measure in connection
> >> with parallel Make" should likely have been an indicator of what I
> >> thought I was talking about.
> >
> > Can you tell me what problem you are currently experiencing?
>
> Harm has a system with memory pressure. That means that he so far has
> only been able to work with
>
>   CPU_COUNT=2 make -j2 doc
>
> Since now lilypond-doc is no longer serialised, he'd need to reduce to
> to get similar memory utilisation, for a considerable loss in
> performance. I've taken a look at Make's jobserver implementation and
> it is pretty straightforward. The real solution would, of course, be to
> make lilypond-book, with its directory-based database, not lock other
> instances of lilypond-book but take over their job load. However, the
> current interaction of lilypond-book is giving the whole work to
> lilypond, which splits into n copies with a fixed work load.

That's considerable extra complexity, and it wouldn't work for folks that are using lilypond-book for actual work, i.e. without a make jobserver.

Harm, what kind of machine is this? I should note that lilypond takes up to 600M of memory during the regtest, and I am pretty sure the rest of the jobs (tex, ghostscript) are peanuts compared to that (because jobs like TeX and GS process things page-by-page). This means that 1G was too little before, and 2G should be ample, so I am somewhat skeptical of your diagnosis. A 1G so-dimm (used) costs 3 EUR these days. I don't think it makes economic sense to spend time optimizing for this case.

> To get back to your question: the consequences are worst when the job
> count is constrained due to memory pressure. My laptop has uncommonly
> large memory for its overall age and power, so I am not hit worst. The
> rough doubling of jobs does not cause me to run into swap space.

I think something is off with the heap use (on GUILE 1.8 at least). We can do the Carver score (which is 100 pages) in 900M heap easily. The 600M number sounds too high, especially given the fact that the snippets are generally tiny fragments of music.

--
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
On Sun, Mar 8, 2020 at 10:35 AM Han-Wen Nienhuys <hanwenn@gmail.com> wrote:
> > To get back to your question: the consequences are worst when the job
> > count is constrained due to memory pressure. My laptop has uncommonly
> > large memory for its overall age and power, so I am not hit worst. The
> > rough doubling of jobs does not cause me to run into swap space.
>
> I think something is off with the heap use (on GUILE 1.8 at least). We
> can do the Carver score (which is 100 pages) in 900M heap easily. The
> 600M number sounds too high, especially given the fact that the
> snippets are generally tiny fragments of music.

GUILE 1.8:

  $ /usr/bin/time -v lilypond input/regression/mozart-hrn-3.ly
  ..
  Maximum resident set size (kbytes): 352280

GUILE 2.2:

  $ /usr/bin/time -v lilypond input/regression/mozart-hrn-3.ly
  ..
  Maximum resident set size (kbytes): 157904

I take some blame for this, because I wrote the heap stretching strategy for GUILE 1.8.

--
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
On 2020/03/07 15:30:33, dak wrote:
> thomasmorley65@gmail.com writes:
>
> > On 2020/03/07 12:39:31, dak wrote:
> >>
> >> Harm has a system with memory pressure. That means that he so far has
> >> only been able to work with
> >>
> >>   CPU_COUNT=2 make -j2 doc
> >
> > Well,
> >
> >   CPU_COUNT=3 make -j3 doc
> >
> > is mostly no problem.
>
> Ok.
>
> >> Since now lilypond-doc is no longer serialised, he'd need to reduce to
> >>
> >>   CPU_COUNT=1 make -j2 doc
> >>
> >> or
> >>
> >>   CPU_COUNT=2 make -j1 doc
> >
> > Let me check: putting this patch on top of current master, right?
> > With guile-1 or guile-2?
>
> I'd use Guile-1, for the reason that it runs faster, eats less memory,
> and is more repeatable by virtue of not crashing.
>
> > I've little time atm, thus I'm not sure when I'm able to start
> > testing...
>
> The way this works is that running lilypond-book in one directory
> blocks running lilypond-book in other directories, but nothing else. So
> you can end up, using CPU_COUNT=3 make -j3, with one job of lilypond-book
> that starts up 3 copies of LilyPond for large workloads, as well as with
> 3 jobs in other directories. Once those jobs actually run into
> lilypond-book, they are stalled within lilypond-book without starting
> LilyPond processes until the first lilypond-book has finished.
>
> So the worst memory use is when one copy of lilypond-book has
> finished with its LilyPond part and starts the EPS and PDF processing,
> another copy of lilypond-book takes over and starts its LilyPond
> processes, and something else happens in another directory.
>
> --
> David Kastrup

I did some testing, both using guile-1.

(1) Checking out febe487bb45c97f97377536a5d15da80cce80297 "stepmake: use patsubst for finding build-dir". I.e. the current patch is in.
(2) The same checkout, with 7ab9c8fa4faff7a513d0ecfbc7eecf7efd2b8ea8 "Add a FS lock to lilypond-book" reverted.

In both cases I did:

  time CPU_COUNT=5 make -j5

and

  time CPU_COUNT=5 make -j5 doc

While running 'make' and 'make doc', I did some other work in firefox and jEdit. Usually I get problems (meaning a heavy slowdown of those other tools) if all cores are working (no surprise) and as soon as SWAP exceeds 600 MB. Though, with both tests I don't experience a big difference. SWAP goes up to 1.1 GB (partly up to 1.2 GB). Timing values are comparable.

I've got the impression (1) performs slightly better, judging from the usability of other tools (firefox, jEdit, etc.).

So from my part, no objection against this patch.
commit 7ab9c8fa4faff7a513d0ecfbc7eecf7efd2b8ea8
Author: Han-Wen Nienhuys <hanwen@lilypond.org>
Date:   Sun Mar 1 17:47:53 2020 +0100

    Add a FS lock to lilypond-book