|
|
Created:
5 years ago by hanwenn Modified:
5 years ago CC:
lilypond-devel_gnu.org Visibility:
Public. |
Description Rewrite Mutable_properties based on vector<SCM>
Alists use are based on linked lists. They have poor locality, which
introduces more CPU cache misses. Since LilyPond is not multithreaded,
the CPU sits idle during these misses.
This is part a series of experiments to rearchitect property
storage. None of them seem especially promising currently.
Patch Set 1 #
Total comments: 8
Patch Set 2 : rebase #
MessagesTotal messages: 13
I gave this patch a try on my system, time lilypond MSDM.ly, 'real' time, 3 repetitions baseline (master, 0e7c26d40f): 0m42,533s; 0m42,547s; 0m42,430s Abstract Grob property storage into Mutable_properties. (https://codereview.appspot.com/561640043) 0m41,102s; 0m41,036s; 0m40,861s Rewrite Mutable_properties based on vector<SCM> (https://codereview.appspot.com/575990043/) 0m42,461s; 0m42,632s; 0m42,212s (I did not apply any of the other optimization patches posted in the last days.) If at all, Issue 561640043 seems to show some benefit. This _could_ be because Grob::Grob (Grob const &s) doesn't call ly_deep_copy anymore, but I've only looked into this very briefly because I think the introduction of Mutable_properties makes sense in any case. However I clearly don't see 7% improvement for this patch. Is there anything I missed?
Sign in to reply to this message.
On 2020/04/13 16:40:34, hahnjo wrote: > I gave this patch a try on my system, time lilypond MSDM.ly, 'real' time, 3 > repetitions > > baseline (master, 0e7c26d40f): > 0m42,533s; 0m42,547s; 0m42,430s > > Abstract Grob property storage into Mutable_properties. > (https://codereview.appspot.com/561640043) > 0m41,102s; 0m41,036s; 0m40,861s > > Rewrite Mutable_properties based on vector<SCM> > (https://codereview.appspot.com/575990043/) > 0m42,461s; 0m42,632s; 0m42,212s > > (I did not apply any of the other optimization patches posted in the last days.) > If at all, Issue 561640043 seems to show some benefit. This _could_ be because > Grob::Grob (Grob const &s) doesn't call ly_deep_copy anymore, but I've only > looked into this very briefly because I think the introduction of > Mutable_properties makes sense in any case. However I clearly don't see 7% > improvement for this patch. Is there anything I missed? No, I'll have a look at some other systems. You're right about the deep copy, and it's an explanation. What machine are you trying this with? For the record, I see the difference with GUILE 1.8 on Fedora 31 on a T460p (i5 6440HQ 2.6Ghz). I'll try with the other machines that I have for comparison.
Sign in to reply to this message.
On 2020/04/13 16:58:41, hanwenn wrote: > On 2020/04/13 16:40:34, hahnjo wrote: > > I gave this patch a try on my system, time lilypond MSDM.ly, 'real' time, 3 > > repetitions > > > > baseline (master, 0e7c26d40f): > > 0m42,533s; 0m42,547s; 0m42,430s > > > > Abstract Grob property storage into Mutable_properties. > > (https://codereview.appspot.com/561640043) > > 0m41,102s; 0m41,036s; 0m40,861s > > > > Rewrite Mutable_properties based on vector<SCM> > > (https://codereview.appspot.com/575990043/) > > 0m42,461s; 0m42,632s; 0m42,212s > > > > (I did not apply any of the other optimization patches posted in the last > days.) > > If at all, Issue 561640043 seems to show some benefit. This _could_ be because > > Grob::Grob (Grob const &s) doesn't call ly_deep_copy anymore, but I've only > > looked into this very briefly because I think the introduction of > > Mutable_properties makes sense in any case. However I clearly don't see 7% > > improvement for this patch. Is there anything I missed? > > No, I'll have a look at some other systems. You're right about the deep copy, > and it's an explanation. > > What machine are you trying this with? For the record, I see the difference with > GUILE 1.8 on Fedora 31 on a T460p (i5 6440HQ 2.6Ghz). I'll try with the other > machines that I have for comparison. Dell XPS13 from 2016 (i5-6200U) with Arch Linux, using GUILE 1.8
Sign in to reply to this message.
On 2020/04/13 16:58:41, hanwenn wrote: > On 2020/04/13 16:40:34, hahnjo wrote: > > I gave this patch a try on my system, time lilypond MSDM.ly, 'real' time, 3 > > repetitions > > > > baseline (master, 0e7c26d40f): > > 0m42,533s; 0m42,547s; 0m42,430s > > > > Abstract Grob property storage into Mutable_properties. > > (https://codereview.appspot.com/561640043) > > 0m41,102s; 0m41,036s; 0m40,861s > > > > Rewrite Mutable_properties based on vector<SCM> > > (https://codereview.appspot.com/575990043/) > > 0m42,461s; 0m42,632s; 0m42,212s > > > > (I did not apply any of the other optimization patches posted in the last > days.) > > If at all, Issue 561640043 seems to show some benefit. This _could_ be because > > Grob::Grob (Grob const &s) doesn't call ly_deep_copy anymore, but I've only > > looked into this very briefly because I think the introduction of > > Mutable_properties makes sense in any case. However I clearly don't see 7% > > improvement for this patch. Is there anything I missed? > > No, I'll have a look at some other systems. You're right about the deep copy, > and it's an explanation. > > What machine are you trying this with? For the record, I see the difference with > GUILE 1.8 on Fedora 31 on a T460p (i5 6440HQ 2.6Ghz). I'll try with the other > machines that I have for comparison. Grumbl. I think some of the timings must have used the -pg flag I used to gather profiling data :-(
Sign in to reply to this message.
On 2020/04/13 17:17:31, hanwenn wrote: > On 2020/04/13 16:58:41, hanwenn wrote: > > On 2020/04/13 16:40:34, hahnjo wrote: > > > I gave this patch a try on my system, time lilypond MSDM.ly, 'real' time, 3 > > > repetitions > > > > > > baseline (master, 0e7c26d40f): > > > 0m42,533s; 0m42,547s; 0m42,430s > > > > > > Abstract Grob property storage into Mutable_properties. > > > (https://codereview.appspot.com/561640043) > > > 0m41,102s; 0m41,036s; 0m40,861s > > > > > > Rewrite Mutable_properties based on vector<SCM> > > > (https://codereview.appspot.com/575990043/) > > > 0m42,461s; 0m42,632s; 0m42,212s > > > > > > (I did not apply any of the other optimization patches posted in the last > > days.) > > > If at all, Issue 561640043 seems to show some benefit. This _could_ be > because > > > Grob::Grob (Grob const &s) doesn't call ly_deep_copy anymore, but I've only > > > looked into this very briefly because I think the introduction of > > > Mutable_properties makes sense in any case. However I clearly don't see 7% > > > improvement for this patch. Is there anything I missed? > > > > No, I'll have a look at some other systems. You're right about the deep copy, > > and it's an explanation. > > > > What machine are you trying this with? For the record, I see the difference > with > > GUILE 1.8 on Fedora 31 on a T460p (i5 6440HQ 2.6Ghz). I'll try with the other > > machines that I have for comparison. > > Grumbl. I think some of the timings must have used the -pg flag I used to > gather profiling data :-( Actually, it looks like I have been measuring noise. Thermal throttling? It looks like I can get reproducible timings if I downclock the CPU from 2.6 to 2.0Ghz
Sign in to reply to this message.
On 2020/04/13 17:43:17, hanwenn wrote: > On 2020/04/13 17:17:31, hanwenn wrote: > > On 2020/04/13 16:58:41, hanwenn wrote: > > > On 2020/04/13 16:40:34, hahnjo wrote: > > > > I gave this patch a try on my system, time lilypond MSDM.ly, 'real' time, > 3 > > > > repetitions > > > > > > > > baseline (master, 0e7c26d40f): > > > > 0m42,533s; 0m42,547s; 0m42,430s > > > > > > > > Abstract Grob property storage into Mutable_properties. > > > > (https://codereview.appspot.com/561640043) > > > > 0m41,102s; 0m41,036s; 0m40,861s > > > > > > > > Rewrite Mutable_properties based on vector<SCM> > > > > (https://codereview.appspot.com/575990043/) > > > > 0m42,461s; 0m42,632s; 0m42,212s > > > > > > > > (I did not apply any of the other optimization patches posted in the last > > > days.) > > > > If at all, Issue 561640043 seems to show some benefit. This _could_ be > > because > > > > Grob::Grob (Grob const &s) doesn't call ly_deep_copy anymore, but I've > only > > > > looked into this very briefly because I think the introduction of > > > > Mutable_properties makes sense in any case. However I clearly don't see 7% > > > > improvement for this patch. Is there anything I missed? > > > > > > No, I'll have a look at some other systems. You're right about the deep > copy, > > > and it's an explanation. > > > > > > What machine are you trying this with? For the record, I see the difference > > with > > > GUILE 1.8 on Fedora 31 on a T460p (i5 6440HQ 2.6Ghz). I'll try with the > other > > > machines that I have for comparison. > > > > Grumbl. I think some of the timings must have used the -pg flag I used to > > gather profiling data :-( > > Actually, it looks like I have been measuring noise. Thermal throttling? > > It looks like I can get reproducible timings if I downclock the CPU from 2.6 to > 2.0Ghz With a downclocked CPU (I interleaved the runs for good measure), my timings match yours. for a in $(seq 1 3) ; do for bin in out/bin/lilypond.* ; do /usr/bin/time -v $bin -I carver MSDM ; done ; done >& timing-all.txt .. Command being timed: "out/bin/lilypond.array-alist -I carver MSDM" User time (seconds): 56.64 System time (seconds): 0.89 -- Command being timed: "out/bin/lilypond.baseline -I carver MSDM" User time (seconds): 57.15 System time (seconds): 0.76 -- Command being timed: "out/bin/lilypond.mutable-props -I carver MSDM" User time (seconds): 55.93 System time (seconds): 0.76 -- Command being timed: "out/bin/lilypond.array-alist -I carver MSDM" User time (seconds): 56.73 System time (seconds): 0.92 -- Command being timed: "out/bin/lilypond.baseline -I carver MSDM" User time (seconds): 56.58 System time (seconds): 0.85 -- Command being timed: "out/bin/lilypond.mutable-props -I carver MSDM" User time (seconds): 55.60 System time (seconds): 0.74 -- Command being timed: "out/bin/lilypond.array-alist -I carver MSDM" User time (seconds): 56.57 System time (seconds): 0.93 -- Command being timed: "out/bin/lilypond.baseline -I carver MSDM" User time (seconds): 56.44 System time (seconds): 0.77 -- Command being timed: "out/bin/lilypond.mutable-props -I carver MSDM" User time (seconds): 55.85 System time (seconds): 0.77
Sign in to reply to this message.
https://codereview.appspot.com/575990043/diff/577740043/lily/mutable-properti... File lily/mutable-properties.cc (right): https://codereview.appspot.com/575990043/diff/577740043/lily/mutable-properti... lily/mutable-properties.cc:50: if (SCM_EQ_P (key (i), k)) scm_is_eq https://codereview.appspot.com/575990043/diff/577740043/lily/mutable-properti... lily/mutable-properties.cc:71: if (SCM_EQ_P (key (i), k)) scm_is_eq https://codereview.appspot.com/575990043/diff/577740043/lily/mutable-properti... lily/mutable-properties.cc:97: if (SCM_EQ_P (key (i), k)) scm_is_eq https://codereview.appspot.com/575990043/diff/577740043/lily/mutable-properti... lily/mutable-properties.cc:109: Mutable_properties::mark () const This can segfault in GUILEv2 and/or depending on some other circumstances since the C++ members are only getting initialized after the GC hook is already operative. You need to sort the C++ structures potentially accessed by the GC hook into a Preinit class (grep for Preinit for existing uses) and inherit from that before inheriting from whatever operates the Smob. Then C++ guarantees an initialisation order consistent with GC hook operation.
Sign in to reply to this message.
On 2020/04/13 20:21:29, dak wrote: > https://codereview.appspot.com/575990043/diff/577740043/lily/mutable-properti... > lily/mutable-properties.cc:109: Mutable_properties::mark () const > This can segfault in GUILEv2 and/or depending on some other circumstances since > the C++ members are only getting initialized after the GC hook is already > operative. > > You need to sort the C++ structures potentially accessed by the GC hook into a > Preinit class (grep for Preinit for existing uses) and inherit from that before > inheriting from whatever operates the Smob. Then C++ guarantees an > initialisation order consistent with GC hook operation. Actually, a less involved implementation where the value cell is SCM is to just use a Scheme vector. That also significantly speeds up garbage collection relying on mark hooks because then only a single scm_gc_mark call is made and Guile does the rest internally.
Sign in to reply to this message.
On 2020/04/13 20:26:10, dak wrote: > On 2020/04/13 20:21:29, dak wrote: > > > > https://codereview.appspot.com/575990043/diff/577740043/lily/mutable-properti... > > lily/mutable-properties.cc:109: Mutable_properties::mark () const > > This can segfault in GUILEv2 and/or depending on some other circumstances > since > > the C++ members are only getting initialized after the GC hook is already > > operative. > > > > You need to sort the C++ structures potentially accessed by the GC hook into a > > Preinit class (grep for Preinit for existing uses) and inherit from that > before > > inheriting from whatever operates the Smob. Then C++ guarantees an > > initialisation order consistent with GC hook operation. > > Actually, a less involved implementation where the value cell is SCM is to just > use a Scheme vector. That also significantly speeds up garbage collection > relying on mark hooks because then only a single scm_gc_mark call is made and > Guile does the rest internally. I actually wanted to start that working on that, but I was happy with the numbers. Now that the numbers turned out to be bogus, I want to dig in deeper into measurements. I want to run lilypond through cachegrind to understand better what is happening. In my mind, this code should be strictly better than the alist-based solution, but that is not borne out by timing.
Sign in to reply to this message.
https://codereview.appspot.com/575990043/diff/577740043/lily/mutable-properti... File lily/mutable-properties.cc (right): https://codereview.appspot.com/575990043/diff/577740043/lily/mutable-properti... lily/mutable-properties.cc:50: if (SCM_EQ_P (key (i), k)) On 2020/04/13 20:21:29, dak wrote: > scm_is_eq Done. https://codereview.appspot.com/575990043/diff/577740043/lily/mutable-properti... lily/mutable-properties.cc:71: if (SCM_EQ_P (key (i), k)) On 2020/04/13 20:21:28, dak wrote: > scm_is_eq Done. https://codereview.appspot.com/575990043/diff/577740043/lily/mutable-properti... lily/mutable-properties.cc:97: if (SCM_EQ_P (key (i), k)) On 2020/04/13 20:21:28, dak wrote: > scm_is_eq Done. https://codereview.appspot.com/575990043/diff/577740043/lily/mutable-properti... lily/mutable-properties.cc:109: Mutable_properties::mark () const On 2020/04/13 20:21:29, dak wrote: > This can segfault in GUILEv2 and/or depending on some other circumstances since > the C++ members are only getting initialized after the GC hook is already > operative. > > You need to sort the C++ structures potentially accessed by the GC hook into a > Preinit class (grep for Preinit for existing uses) and inherit from that before > inheriting from whatever operates the Smob. Then C++ guarantees an > initialisation order consistent with GC hook operation. I read the comment about this, but this only applies to derived classes, right? This is set in the Grob class, which is the base class, so the vector is init'd before smobify_self is called.
Sign in to reply to this message.
On 2020/04/13 21:13:42, hanwenn wrote: > > > > You need to sort the C++ structures potentially accessed by the GC hook into a > > Preinit class (grep for Preinit for existing uses) and inherit from that > before > > inheriting from whatever operates the Smob. Then C++ guarantees an > > initialisation order consistent with GC hook operation. > > I read the comment about this, but this only applies to derived classes, right? > This is set in the Grob class, which is the base class, so the vector is init'd > before smobify_self is called. think-think-think-think... I think you are correct.
Sign in to reply to this message.
On 2020/04/13 17:43:17, hanwenn wrote: > > Actually, it looks like I have been measuring noise. Thermal throttling? > > It looks like I can get reproducible timings if I downclock the CPU from 2.6 to > 2.0Ghz Put the following line in the config /etc/modprobe.d/acpi.conf:options thinkpad-acpi fan_control=1 Then you'll be able to do echo level 7|sudo tee /proc/acpi/ibm/fan [CPU intensive stuff] echo level auto|sudo tee /proc/acpi/ibm/fan There is also level disengaged , but it's a lot more noise (and fan wear) for rather little gain. The fans for T61 tend to die a lot more than those for T400, but removing and reinstalling a fan only from a fan assembly (also containing heat sink and heat pipe) is more involved than just exchanging the fan assembly.
Sign in to reply to this message.
|