Issue 583750043: Use vectors rather than lists for skylines. (Closed)

Created:
4 years ago by hanwenn

Modified:
3 years, 11 months ago

Reviewers:
Dan Eble, hahnjo, carl.d.sorensen, dak, Carl

CC:
lilypond-devel_gnu.org

Visibility:
Public.

Description

Use vectors rather than lists for skylines.

Linked lists have poor locality. This yields a ~7% speedup on the Carver MSDM score.

Remove Skyline::normalize():

* this is rare; it never occurs in the entire regtest
* there is no inherent problem with having adjacent empty buildings.
* it is expensive to remove elements from the middle of a vector
* Skyline::normalize is responsible for ~1% of CPU in the MSDM score.

Benchmark data

benchmark for arguments: input/regression/mozart-hrn-3
raw data: {'30c845d383': [3.76, 3.77, 3.76], '83b4b71d01': [3.62, 3.59, 3.6]}
Delta against 30c845d383: Fix font-name-add-files.ly on GUILE v2
83b4b71d01 - Use vectors rather than lists for skylines.
med diff -0.160000
med diff -4.255319 % (83b4b71d01 is faster)

benchmark for arguments: -I carver MSDM
raw data: {'30c845d383': [53.51, 51.92, 52.25], '83b4b71d01': [49.5, 48.36, 48.41]}
Delta against 30c845d383: Fix font-name-add-files.ly on GUILE v2
83b4b71d01 - Use vectors rather than lists for skylines.
med diff -3.840000
med diff -7.349282 % (83b4b71d01 is faster)
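
The "med diff" figures above can be reproduced from the raw timings. The following is a minimal sketch, assuming "med diff" means the difference of the per-commit medians, expressed relative to the baseline median; the function names are hypothetical and not taken from the benchmark script.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Median of a small sample; takes a copy so the caller's data stays untouched.
double median (std::vector<double> v)
{
  assert (!v.empty ());
  std::sort (v.begin (), v.end ());
  auto n = v.size ();
  return n % 2 ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2;
}

// Relative change of the median in percent; negative means "after" is faster.
double median_diff_percent (const std::vector<double> &before,
                            const std::vector<double> &after)
{
  double base = median (before);
  return (median (after) - base) / base * 100.0;
}
```

Feeding in the three mozart-hrn-3 timings per commit gives medians of 3.76 and 3.60, that is a difference of -0.16 s or about -4.26 %; the Carver MSDM timings give 52.25 and 48.41, about -7.35 %.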

Patch Set 1 #

Patch Set 2 : reserve #

Patch Set 3 : ptr iso. ref #

Patch Set 4 : notes to self about optimization #

Patch Set 5 : update timings; rebase on interval building. #

Total comments: 4

Patch Set 6 : dan, diff against master #

Total comments: 27

Patch Set 7 : jonas #

Total comments: 2

Patch Set 8 : drop normalize() #

Total comments: 2

Patch Set 9 : jonas final comment #

Created: 3 years, 11 months ago

Stats (+119 lines, -145 lines)
M lily/include/skyline.hh: 3 chunks, +11 lines, -14 lines, 0 comments
M lily/skyline.cc: 24 chunks, +108 lines, -131 lines, 0 comments

Messages

Total messages: 43

hanwenn

This is likely a timing fluke due to thermal throttling too. Hold off on reviewing.

4 years ago (2020-04-13 18:51:22 UTC) #4

hahnjo

On 2020/04/13 18:51:22, hanwenn wrote: > This is likely a timing fluke due to thermal ...

4 years ago (2020-04-14 06:57:08 UTC) #5

hanwenn

Sure. On Tue, Apr 14, 2020 at 8:57 AM <jonas.hahnfeld@gmail.com> wrote: > > On 2020/04/13 ...

4 years ago (2020-04-14 07:25:02 UTC) #6

hanwenn

On 2020/04/17 14:58:36, hanwenn wrote: > update timings; rebase on interval building. I measured more ...

4 years ago (2020-04-17 15:02:38 UTC) #8

hahnjo

On 2020/04/17 15:02:38, hanwenn wrote: > On 2020/04/17 14:58:36, hanwenn wrote: > > update timings; ...

4 years ago (2020-04-17 18:53:56 UTC) #10

hanwenn

On Fri, Apr 17, 2020 at 8:53 PM <jonas.hahnfeld@gmail.com> wrote: > > > update timings; ...

4 years ago (2020-04-17 19:07:44 UTC) #11

Dan Eble

On 2020/04/17 19:07:44, hanwenn wrote: > The profile data itself is also a more precise ...

4 years ago (2020-04-17 22:44:12 UTC) #12

hanwenn

On 2020/04/17 22:44:12, Dan Eble wrote: > On 2020/04/17 19:07:44, hanwenn wrote: > > The ...

4 years ago (2020-04-18 11:13:11 UTC) #13

On 2020/04/17 22:44:12, Dan Eble wrote:
> On 2020/04/17 19:07:44, hanwenn wrote:
> > The profile data itself is also a more precise source for measuring
> > improvements that are close to the noise level, like
> > https://codereview.appspot.com/551730043/
> 
> Investigating improvements that are close to the noise level--never mind, I
> won't tell you how to spend your time.

But you do drop a cloaked suggestion.

The noise level is contextually determined: if you do more experiments under
better controlled circumstances, the noise level decreases. But it takes more
time, and it is time I can't do other work on LilyPond; there is only so much
coffee one can drink on a single day.  (You'd think that a 4-core machine would
be able to put the lilypond task on a dedicated CPU, but it looks like it doesn't
work like that in practice.)

I'm pretty sure there are still significant gains (say, 10%) to be had in
everything related to skyline processing. For example, we create (way too
detailed) glyph outlines, and even though they're constant, they're not cached.
We can probably also shortcut the glyph -> SCM lists -> skyline path by
constructing the skyline directly from the freetype outline. The skyline
functions construct both boxes and buildings, where just buildings would
suffice.
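
The caching idea mentioned above (constant glyph outlines that are recomputed every time) could look roughly like the following. This is only a sketch under assumptions: `Outline`, `Outline_cache` and the glyph-index key are hypothetical names for illustration and do not correspond to LilyPond's actual types.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a glyph outline: a list of (x, y) points.
using Outline = std::vector<std::pair<double, double>>;

// Memoizes outlines per glyph index so that each constant outline is
// computed only once.
class Outline_cache
{
public:
  explicit Outline_cache (std::function<Outline (unsigned)> compute)
    : compute_ (std::move (compute))
  {
    assert (static_cast<bool> (compute_));
  }

  // Returns the cached outline, computing it on the first request.
  const Outline &get (unsigned glyph_index)
  {
    auto it = cache_.find (glyph_index);
    if (it == cache_.end ())
      it = cache_.emplace (glyph_index, compute_ (glyph_index)).first;
    return it->second;
  }

  std::size_t size () const { return cache_.size (); }

private:
  std::function<Outline (unsigned)> compute_;
  std::unordered_map<unsigned, Outline> cache_;
};
```

A real cache would also have to key on the font and size, which this sketch leaves out.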

Many of these inefficiencies will be solved in smaller steps, with smaller (~1%)
gains. For the MSDM score, a 1% gain is 0.4 seconds, while we see 0.3s
variability between runs. So, unfortunately, yes, working on performance has to
happen at the noise level. 

> 5.5% is nice, though.  Thanks.
> 
> https://codereview.appspot.com/583750043/diff/555670043/lily/skyline.cc
> File lily/skyline.cc (right):
> 
>
https://codereview.appspot.com/583750043/diff/555670043/lily/skyline.cc#newco...
> lily/skyline.cc:74: for (vector<Building>::const_iterator i = b.begin (); i !=
> b.end (); i++)
> for (auto i : b)
> 
>
https://codereview.appspot.com/583750043/diff/555670043/lily/skyline.cc#newco...
> lily/skyline.cc:219: for (; cit != scp->end (); cit++)
> ++cit

both done.
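
For readers without the diff at hand, the two review comments amount to the following change of loop style. This is a sketch with a simplified, assumed `Building` struct, not the code from lily/skyline.cc.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for a skyline building (assumed fields).
struct Building
{
  double start_;
  double end_;
  double height_;
};

// Explicit const_iterator, using pre-increment (++i) as suggested.
double max_height_iter (const std::vector<Building> &b)
{
  assert (!b.empty ());
  double m = b.front ().height_;
  for (std::vector<Building>::const_iterator i = b.begin (); i != b.end (); ++i)
    m = i->height_ > m ? i->height_ : m;
  return m;
}

// Range-based loop: `auto const &` binds each element without copying it.
double max_height_range (const std::vector<Building> &b)
{
  assert (!b.empty ());
  double m = b.front ().height_;
  for (auto const &bld : b)
    m = bld.height_ > m ? bld.height_ : m;
  return m;
}
```

Note that a plain `for (auto i : b)` copies every element; binding by `auto const &` avoids the copy.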

hanwenn

I'm uploading a diff against master, so James can run this through tests. Intention is ...

4 years ago (2020-04-18 11:17:14 UTC) #15

hanwenn

https://codereview.appspot.com/583750043/diff/555670043/lily/skyline.cc File lily/skyline.cc (right): https://codereview.appspot.com/583750043/diff/555670043/lily/skyline.cc#newcode74 lily/skyline.cc:74: for (vector<Building>::const_iterator i = b.begin (); i != b.end ...

4 years ago (2020-04-18 11:25:36 UTC) #16

Dan Eble

https://codereview.appspot.com/583750043/diff/577780043/lily/skyline.cc File lily/skyline.cc (right): https://codereview.appspot.com/583750043/diff/577780043/lily/skyline.cc#newcode75 lily/skyline.cc:75: i.print (); I'm sorry, I gave you imperfect advice ...

4 years ago (2020-04-18 11:38:17 UTC) #17

hanwenn

On 2020/04/18 11:38:17, Dan Eble wrote: > https://codereview.appspot.com/583750043/diff/577780043/lily/skyline.cc > File lily/skyline.cc (right): > > https://codereview.appspot.com/583750043/diff/577780043/lily/skyline.cc#newcode75 ...

3 years, 12 months ago (2020-04-21 07:38:33 UTC) #18

dak

I am rather skeptical about the usefulness of such microoptimizations without influence on algorithmic complexity. ...

3 years, 12 months ago (2020-04-21 23:39:55 UTC) #19

Dan Eble

On 2020/04/21 23:39:55, dak wrote: > influence on algorithmic complexity. In return for better memory ...

3 years, 12 months ago (2020-04-21 23:59:33 UTC) #20

hanwenn

current master: 4f906b95aea99f2a47d5ba037f7421e5bb933c42 (origin/master) Issue 5908: get_path_list: use loop instead of tail recursion Command being ...

3 years, 12 months ago (2020-04-22 07:43:17 UTC) #21

current master:

4f906b95aea99f2a47d5ba037f7421e5bb933c42 (origin/master) Issue 5908:
get_path_list: use loop instead of tail recursion

Command being timed: "out/bin/lilypond.4f906b95ae -I carver MSDM"
User time (seconds): 34.30
Maximum resident set size (kbytes): 1139208

this patch:

889131d584f77f44acc8d801ce254d7d2ba85aa4 (HEAD, vector-skyline) Use
vectors rather than lists for skylines.
Command being timed: "out/bin/lilypond.889131d584 -I carver MSDM"
User time (seconds): 31.66
Maximum resident set size (kbytes): 1054432

I disagree with David's assessment on several theoretical grounds too:

* "less maintainable manner with hand-written optimisations". I don't
see these in this patch. This (together with the Tie_performer) is the
only place in LilyPond that uses lists. We could get rid of std::list
on maintenance grounds alone.

* The linked list has an overhead of 2 pointers on a 4x Real
data structure, i.e. 50% overhead. The vector has (assuming a 2x growth
strategy) up to 100% overhead. There is a little more overhead, but we could
tune that by adjusting the vector growth strategy. With a linked list,
there is no choice in space/time tradeoff.

* The linked list approach is much worse for fragmentation. It has to
allocate a ton of 48-byte objects, some of which have long lifetimes.
This will create a highly fragmented pool of 48-byte objects. We don't
have use for so many of these objects elsewhere. By contrast, the
vector approach will create a distribution of differently sized
objects, which can be recycled into other vectors.

* Linked lists provide O(1) random insertion, but this is exactly the
algorithmic property we don't need at all in this problem. By
contrast, the buildings are sorted, and if we store them in an array,
we can use binary search for efficient searching. For example, we
generate glyph outlines for each character in every text in the score.
With binary search, we can quickly check if the bbox of the char is
within the outline, and avoid doing the work to generate and merge the
outline. This likely cuts the number of merges by a factor of 2: the
bottom half of a text above the staff is obscured by the bottom of the
staff, so it doesn't need to be processed at all.
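
The binary-search argument in the last bullet can be sketched as follows. This is an illustration under simplifying assumptions, not the LilyPond implementation: buildings are flat-topped, sorted by x and contiguous (each one starts where the previous one ends), and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified flat-topped building covering [start_, end_] (assumed layout).
struct Building
{
  double start_;
  double end_;
  double height_;
};

// Binary search: first building whose end_ lies to the right of x.
std::vector<Building>::const_iterator
first_building_ending_after (const std::vector<Building> &sky, double x)
{
  return std::upper_bound (sky.begin (), sky.end (), x,
                           [] (double v, const Building &b) { return v < b.end_; });
}

// True if a box spanning [left, right] with top edge `top` lies entirely
// at or below the skyline, so merging it would not change anything.
bool box_is_hidden (const std::vector<Building> &sky,
                    double left, double right, double top)
{
  assert (left <= right);
  auto it = first_building_ending_after (sky, left);
  if (it == sky.end () || it->start_ > left)
    return false;               // nothing covers the left edge
  for (; it != sky.end (); ++it)
    {
      if (it->height_ < top)
        return false;           // the box pokes through here
      if (it->end_ >= right)
        return true;            // covered all the way to the right edge
    }
  return false;                 // the skyline ends before the box does
}
```

The lookup relies on random access into a sorted array, which `std::vector` offers and `std::list` does not.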

Finally, to Dan's point: I haven't looked at heap profiling. The
google heap-profiler is available at
https://gperftools.github.io/gperftools/heapprofile.html, and I would
be happy to comment on heap profiles that anyone wishes to collect. My
educated guess is that the skyline code, with or without this patch,
will figure highly in the stats, simply because we compute skylines in
so much detail. For now, I don't intend to try this out: the skylines
are such an obvious time sink that that is the area to optimize. This
is reinforced by my other patch for skylines
(https://codereview.appspot.com/547980044/).

On Wed, Apr 22, 2020 at 1:39 AM <dak@gnu.org> wrote:
>
> I am rather skeptical about the usefulness of such microoptimizations
> without influence on algorithmic complexity.  In return for better
> memory locality you buy into quite larger memory fragmentation, and we
> have scores of comparatively modest size already exhausting memory.  All
> that exhausted memory needs to get filled and processed, so it would
> rather seem like the true savings are not to be found in doing the same
> kind of work in a slightly faster but less maintainable manner with
> hand-written optimisations, but rather in figuring out why too much work
> is being done in the first place.
>
> The more one replaces standard tools and operations, the harder it
> becomes to figure out what kind of stuff actually goes wrong and fix it,
> or change the strategies and algorithms wholesale.
>
> https://codereview.appspot.com/583750043/

-- 
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen

Dan Eble

On 2020/04/22 07:43:17, hanwenn wrote: > Finally, to Dan's point: I haven't looked at heap ...

3 years, 12 months ago (2020-04-22 20:52:04 UTC) #22

dak

On 2020/04/22 07:43:17, hanwenn wrote: > current master: > > 4f906b95aea99f2a47d5ba037f7421e5bb933c42 (origin/master) Issue 5908: > ...

3 years, 12 months ago (2020-04-23 16:01:20 UTC) #23

On 2020/04/22 07:43:17, hanwenn wrote:
> current master:
> 
> 4f906b95aea99f2a47d5ba037f7421e5bb933c42 (origin/master) Issue 5908:
> get_path_list: use loop instead of tail recursion
> 
> Command being timed: "out/bin/lilypond.4f906b95ae -I carver MSDM"
> User time (seconds): 34.30
> Maximum resident set size (kbytes): 1139208
> 
> this patch:
> 
> 889131d584f77f44acc8d801ce254d7d2ba85aa4 (HEAD, vector-skyline) Use
> vectors rather than lists for skylines.
> Command being timed: "out/bin/lilypond.889131d584 -I carver MSDM"
> User time (seconds): 31.66
> Maximum resident set size (kbytes): 1054432
> 
> 
> I disagree David's assessment on several theoretical grounds too:
> 
> * "less maintainable manner with hand-written optimisations". I don't
> see these in this patch. This (together with the Tie_performer) is the
> only place in LilyPond that uses lists. We could get rid of std::list
> on maintenance grounds alone.

This patch may not be the best illustration of the problem, but it does leave
something to be desired as well.  When the flow and functionality
of the skyline code here does not depend on whether one uses vectors or lists,
the actual exchange should be of one typedef.  Then one is free to change the
implementation at a whim and when other conditions may change, like changes in
the memory support.  C++ is a painful language for one reason: not sacrificing
functionality while still being able to separate data types and algorithms in a
modular manner.  We pay the price for using C++ so we should also reap the
rewards in usability.  STL provides a unified interface to containers for a
reason.
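
The "one typedef" approach described above might be sketched like this. `Building_container` and `total_width` are hypothetical names chosen for illustration; whether the real skyline code can be written this way is exactly what the thread is debating.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Simplified stand-in for a skyline building (assumed fields).
struct Building
{
  double start_;
  double end_;
  double height_;
};

// Hypothetical single switch point: change this alias to swap the container.
using Building_container = std::vector<Building>;

// Uses only what every STL sequence container offers (begin/end iteration),
// so it compiles unchanged for std::vector and std::list.
template <typename Container>
double total_width (const Container &buildings)
{
  double w = 0;
  for (auto const &b : buildings)
    {
      assert (b.start_ <= b.end_);
      w += b.end_ - b.start_;
    }
  return w;
}
```

Code that needs random access or binary search would not be container-agnostic in this sense; it only compiles for containers that provide it.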

> 
> * The linked list has an overhead of 2 pointers, on a 4x Real
> datastructure, ie. 50% overhead. The vector has (assuming a 2x growth
> strategy) 100% overhead. There is a little more overhead, but we could
> tune that by adjustin the vector growth strategy. With a linked list,
> there is no choice in space/time tradeoff.

When I read "adjusting the vector growth strategy", that again sounds like
microoptimisation by replacing STL algorithms with something homespun.  That
just makes no sense since it ultimately will not buy us more than about 30% of
performance while locking us into a code base that can neither be easily
maintained nor brought up to speed in case STL improves or we want to plug in,
say, a Boost library.  If we want to close LilyPond to further development,
squeezing the last 30% of performance out in return for much worse
maintainability is the way to do it.
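
For context, some control over vector growth is available through the standard interface itself, without replacing any STL code. The sketch below uses `std::vector::reserve`; the `Building` layout and function names are assumptions for illustration, and the number of reallocations without `reserve` depends on the library's growth factor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout: four Reals, as described in the thread.
struct Building
{
  double start_;
  double end_;
  double y_intercept_;
  double slope_;
};

// Builds a skyline of n buildings with a single up-front allocation.
std::vector<Building> make_skyline (std::size_t n)
{
  std::vector<Building> sky;
  sky.reserve (n);              // capacity >= n: no reallocation in the loop
  for (std::size_t i = 0; i < n; ++i)
    sky.push_back (Building{static_cast<double> (i),
                            static_cast<double> (i + 1), 0.0, 0.0});
  assert (sky.size () == n);
  return sky;
}

// Counts how often the buffer is reallocated when growing without reserve.
std::size_t count_reallocations (std::size_t n)
{
  std::vector<Building> sky;
  std::size_t reallocs = 0;
  for (std::size_t i = 0; i < n; ++i)
    {
      auto before = sky.capacity ();
      sky.push_back (Building{0.0, 0.0, 0.0, 0.0});
      if (sky.capacity () != before)
        ++reallocs;
    }
  return reallocs;
}
```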

> * The linked list approach is much worse for fragmentation. It has to
> allocate a ton on 48-byte objects, some of which have long lifetime.
> This will create a highly fragmented pool of 48-byte objects. We don't
> have use for so many of these objects elsewhere. By contrast, the
> vector approach will create a distribution of differently sized
> objects, which can be recycled into other vectors.

Vectors are usually grown in fixed growth factors and the elements of the
vectors here are not something with a straightforward size such as SCM.  So we have
similar problems with vectors.

At any rate, if the code were written agnostic with regard to the actually used
container, there would be no need to burn a final decision into code and one
could revisit at some future time.  Or write yet-another-container that does a
better job at merging structures with not automatically balanced subdivisions.

I actually do have some half-finished code for that sitting somewhere that
postpones merges of significantly different sized subskylines.  One of the
problem areas was, not hard to guess, ensuring results that would not (or not
significantly) depend on merge order for numeric reasons because that makes
things awfully irreproducible.

dak

On 2020/04/23 16:01:20, dak wrote: > On 2020/04/22 07:43:17, hanwenn wrote: > > * The ...

3 years, 12 months ago (2020-04-23 17:43:46 UTC) #24

hanwenn

On Thu, Apr 23, 2020 at 6:01 PM <dak@gnu.org> wrote: > > I disagree David's ...

3 years, 12 months ago (2020-04-23 21:53:57 UTC) #25

On Thu, Apr 23, 2020 at 6:01 PM <dak@gnu.org> wrote:

> > I disagree David's assessment on several theoretical grounds too:
> >
> > * "less maintainable manner with hand-written optimisations". I don't
> > see these in this patch. This (together with the Tie_performer) is the
> > only place in LilyPond that uses lists. We could get rid of std::list
> > on maintenance grounds alone.
>
> This patch may not be the best illustration of the problem, but it does
> leave something to be desired as well.

I think you are trying to have a philosophical discussion here, but
when you say this, then James puts it back on review. Given that your
discussion below seems largely theoretical, I'm setting it back to
countdown. Holler if you disagree.

> > * The linked list has an overhead of 2 pointers, on a 4x Real
> > datastructure, ie. 50% overhead. The vector has (assuming a 2x growth
> > strategy) 100% overhead. There is a little more overhead, but we could
> > tune that by adjustin the vector growth strategy. With a linked list,
> > there is no choice in space/time tradeoff.
>
> When I read "adjustin the vector growth strategy", that again sounds
> like microoptimisation by replacing STL algorithms with something
> homespun.  That just makes no sense since it ultimately will not buy us
> more than about 30% of performance while locking us into a code base
> that can neither be easily maintained nor brought up to speed in case
> STL improves or we want to plug in, say, a Boost library.  If we want to
> close LilyPond to further development, squeezing the last 30% of
> performance out in return for lots worse in maintainability.

Well, my point is that you have the option to make the tradeoff. If
you use linked lists, you don't get the tradeoff.

The idea that you can just swap out one data structure for the other
by doing a single typedef change is in my experience a lie. Different
structures have different space/cpu tradeoffs, so you have to use them
in different ways.
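
One concrete case where the two containers call for different code is removing elements from the middle, as an operation like normalize() would. A sketch with an assumed `Building` struct and hypothetical function names:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Simplified building; it is "empty" when it has no horizontal extent.
struct Building
{
  double start_;
  double end_;
};

inline bool is_empty (const Building &b) { return b.end_ <= b.start_; }

// std::list: unlink each empty node in place, O(1) per removal.
void drop_empty (std::list<Building> &sky)
{
  for (auto it = sky.begin (); it != sky.end ();)
    it = is_empty (*it) ? sky.erase (it) : std::next (it);
  assert (std::none_of (sky.begin (), sky.end (), is_empty));
}

// std::vector: erasing one by one would shift the tail each time, so
// compact the survivors in a single pass and trim the end once.
void drop_empty (std::vector<Building> &sky)
{
  sky.erase (std::remove_if (sky.begin (), sky.end (), is_empty), sky.end ());
  assert (std::none_of (sky.begin (), sky.end (), is_empty));
}
```

Both versions produce the same result, but copying the list loop onto a vector unchanged would turn a linear pass into quadratic work in the worst case.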

> > * The linked list approach is much worse for fragmentation. It has to
> > allocate a ton on 48-byte objects, some of which have long lifetime.
> > This will create a highly fragmented pool of 48-byte objects. We don't
> > have use for so many of these objects elsewhere. By contrast, the
> > vector approach will create a distribution of differently sized
> > objects, which can be recycled into other vectors.
>
> Vectors are usually grown in fixed growth factors and the elements of
> the vectors here are not something with a straightforward size such SCM.
>  So we have similar problems with vectors.

?

If we have a vector<Building> of capacity 32, that takes 1536 bytes.
After it's merged into another skyline, we deallocate those 1536
bytes, so it can later be used as a vector of 192 Grob* or 64
Tie_configuration_variation. We use vectors of all kinds of sizes
pervasively, so there will always be an efficient reuse of that chunk
of memory.
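
The arithmetic behind these figures, as a sketch: the sizes are assumptions taken from the numbers quoted in this message (48 bytes per Building, 8-byte pointers on a typical 64-bit platform, and 24 bytes per Tie_configuration_variation as implied by "64"); they are not measured from LilyPond.

```cpp
#include <cassert>
#include <cstddef>

// Bytes held by a contiguous buffer of `capacity` elements.
inline std::size_t buffer_bytes (std::size_t capacity, std::size_t element_bytes)
{
  return capacity * element_bytes;
}

// How many elements of another type fit into a freed buffer of `bytes` bytes.
inline std::size_t elements_that_fit (std::size_t bytes, std::size_t element_bytes)
{
  assert (element_bytes > 0);
  return bytes / element_bytes;
}
```

With these assumptions, `buffer_bytes (32, 48)` is 1536, `elements_that_fit (1536, 8)` is 192, and `elements_that_fit (1536, 24)` is 64.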

> At any rate, if the code were written agnostic with regard to the
> actually used container, there would be no need to burn a final decision
> into code and one could revisit at some future time.  Or write
> yet-another-container that does a better job at merging structures with
> not automatically balanced subdivisions.

The real problem is not that the subdivisions are unbalanced: the
problem is that we first generate a lot of skyline data that we don't
need.

-- 
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen

dak

Han-Wen Nienhuys <hanwenn@gmail.com> writes: > On Thu, Apr 23, 2020 at 6:01 PM <dak@gnu.org> wrote: ...

3 years, 12 months ago (2020-04-23 22:17:35 UTC) #26

Han-Wen Nienhuys <hanwenn@gmail.com> writes:

> On Thu, Apr 23, 2020 at 6:01 PM <dak@gnu.org> wrote:
>
>> > I disagree David's assessment on several theoretical grounds too:
>> >
>> > * "less maintainable manner with hand-written optimisations". I don't
>> > see these in this patch. This (together with the Tie_performer) is the
>> > only place in LilyPond that uses lists. We could get rid of std::list
>> > on maintenance grounds alone.
>>
>> This patch may not be the best illustration of the problem, but it does
>> leave something to be desired as well.
>
> I think you are trying to have a philosphical discussion here, but
> when you say this, then James puts it back on review. Given that your
> discussion below seems largely theoretical, I'm setting it back to
> countdown. Holler if you disagree.

I disagree.

>> When I read "adjustin the vector growth strategy", that again sounds
>> like microoptimisation by replacing STL algorithms with something
>> homespun.  That just makes no sense since it ultimately will not buy us
>> more than about 30% of performance while locking us into a code base
>> that can neither be easily maintained nor brought up to speed in case
>> STL improves or we want to plug in, say, a Boost library.  If we want to
>> close LilyPond to further development, squeezing the last 30% of
>> performance out in return for lots worse in maintainability.
>
> Well, my point is that you have the option to make the tradeoff. If
> you use linked lists, you don't get the tradeoff.

You can still benchmark it as needed.

> The idea that you can just swap out one data structure for the other
> by doing a single typedef changes is in my experience a lie. Different
> structures have different space/cpu tradeoffs, so you have to use them
> in differen ways.

Can you state how iterating through a container in sorted order requires
different code using STL lists and STL vectors?

>> At any rate, if the code were written agnostic with regard to the
>> actually used container, there would be no need to burn a final decision
>> into code and one could revisit at some future time.  Or write
>> yet-another-container that does a better job at merging structures with
>> not automatically balanced subdivisions.
>
> The real problem is not that the subdivisions are unbalanced: the
> problem is that we first generate a lot of skyline data that we don't
> need.

Most of the skyline data is not needed because it does not survive a
merge with other skyline data.  That is different to other
divide-and-conquer kinds of algorithms because those algorithms retain
the amount of data while merging.  If the merged skylines are
consistently of quite different size, the divide-and-conquer O (n lg n)
performance moves more in the O (n^2) direction, an effect known from
Quicksort.
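
The effect of unbalanced merges can be illustrated with plain sorted sequences. This is a sketch only: unlike a skyline merge, `std::merge` keeps every element, so it shows the cost of the merge order, not the skyline algorithm itself. Balanced pairwise merging does on the order of n lg n element writes, folding pieces one at a time into a growing result on the order of n^2. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

using Seq = std::vector<int>;

// Merges two sorted sequences; adds the number of elements written to `work`.
Seq merge_counted (const Seq &a, const Seq &b, std::size_t &work)
{
  Seq out;
  out.reserve (a.size () + b.size ());
  std::merge (a.begin (), a.end (), b.begin (), b.end (),
              std::back_inserter (out));
  work += out.size ();
  return out;
}

// Unbalanced: fold each piece into one growing result, one at a time.
std::size_t work_sequential (const std::vector<Seq> &pieces)
{
  assert (!pieces.empty ());
  std::size_t work = 0;
  Seq acc = pieces.front ();
  for (std::size_t i = 1; i < pieces.size (); ++i)
    acc = merge_counted (acc, pieces[i], work);
  return work;
}

// Balanced: merge neighbours pairwise, halving the piece count per round.
std::size_t work_balanced (std::vector<Seq> pieces)
{
  assert (!pieces.empty ());
  std::size_t work = 0;
  while (pieces.size () > 1)
    {
      std::vector<Seq> next;
      for (std::size_t i = 0; i + 1 < pieces.size (); i += 2)
        next.push_back (merge_counted (pieces[i], pieces[i + 1], work));
      if (pieces.size () % 2)
        next.push_back (pieces.back ());
      pieces = std::move (next);
    }
  return work;
}
```

For eight single-element pieces the sequential fold writes 2 + 3 + ... + 8 = 35 elements, while the balanced scheme writes 8 per round over three rounds, 24 in total; the gap widens as the number of pieces grows.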

Now that's not at issue with this patch.  What I point out here is that
moving from one STL container to another STL container is a standard
programming technique that is _exactly_ why the STL library has
iterators, and C++11 has for loops that can easily iterate through
containers in sequence without even requiring convoluted iterator
declarations.  So there is just no point in not doing this in a manner
where it isn't hardcoding one container type throughout.

-- 
David Kastrup

hahnjo

I generally agree with David that having the code agnostic of the container would be ...

3 years, 12 months ago (2020-04-24 07:08:51 UTC) #27

hanwenn

Here is how I have experienced this discussion: DAK: this is micro-optimization that causes memory ...

3 years, 12 months ago (2020-04-24 12:34:01 UTC) #28

hanwenn

https://codereview.appspot.com/583750043/diff/577780043/lily/skyline.cc File lily/skyline.cc (right): https://codereview.appspot.com/583750043/diff/577780043/lily/skyline.cc#newcode183 lily/skyline.cc:183: vector<Building>::iterator i; On 2020/04/24 07:08:50, hahnjo wrote: > move ...

3 years, 12 months ago (2020-04-24 12:37:02 UTC) #30

hahnjo

Han-Wen, any reason not to use range-based loops in the places I pointed to in ...

3 years, 12 months ago (2020-04-24 13:04:37 UTC) #31

hanwenn

On Fri, Apr 24, 2020 at 3:04 PM <jonas.hahnfeld@gmail.com> wrote: > > Han-Wen, any reason ...

3 years, 12 months ago (2020-04-24 13:09:53 UTC) #32

dak

hanwenn@gmail.com writes: > Here is how I have experienced this discussion: > > DAK: this ...

3 years, 12 months ago (2020-04-24 13:15:10 UTC) #33

hahnjo

On 2020/04/24 13:09:53, hanwenn wrote: > On Fri, Apr 24, 2020 at 3:04 PM <mailto:jonas.hahnfeld@gmail.com> ...

3 years, 12 months ago (2020-04-24 13:16:17 UTC) #34

hanwenn

On Fri, Apr 24, 2020 at 3:15 PM David Kastrup <dak@gnu.org> wrote: > > > ...

3 years, 12 months ago (2020-04-24 13:18:11 UTC) #35

dak

Han-Wen Nienhuys <hanwenn@gmail.com> writes: > On Fri, Apr 24, 2020 at 3:15 PM David Kastrup ...

3 years, 12 months ago (2020-04-24 13:29:43 UTC) #36

dak

On 2020/04/24 13:15:10, dak wrote: > mailto:hanwenn@gmail.com writes: > > > Here is how I ...

3 years, 11 months ago (2020-04-26 10:26:17 UTC) #38

hahnjo

Looks good with respect to using auto. One question inline, might be a missing reference. ...

3 years, 11 months ago (2020-04-26 11:28:19 UTC) #39

hanwenn

https://codereview.appspot.com/583750043/diff/557770050/lily/skyline.cc File lily/skyline.cc (right): https://codereview.appspot.com/583750043/diff/557770050/lily/skyline.cc#newcode584 lily/skyline.cc:584: for (auto const b : buildings_) On 2020/04/26 11:28:19, ...

3 years, 11 months ago (2020-04-26 12:37:17 UTC) #40

hahnjo

LGTM (interesting that you found unused methods from 2012 :D)

3 years, 11 months ago (2020-04-26 15:01:05 UTC) #42

hanwenn

3 years, 11 months ago (2020-05-02 22:24:49 UTC) #43

commit eaf40071f56ca2ca337dc7684c0da3f307f070bd
Author: Han-Wen Nienhuys <hanwen@lilypond.org>
Date:   Fri Apr 17 16:37:44 2020 +0200

    Use vectors rather than lists for skylines.
