Created: 4 years, 2 months ago by hanwenn
Modified: 4 years, 2 months ago
CC: lilypond-devel@gnu.org
Visibility: Public
Description: Simplify and speed up uniquify

Previously we sorted the array twice. Instead, we use a hash set. This
makes the procedure O(N) rather than O(N log N).
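The hash-set approach can be sketched as follows (an illustration only, not the actual lily/grob.cc code; a generic element type stands in for `Grob *`):

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Sketch of the hash-set idea from the description.  Keeps the first
// occurrence of each element, preserves order, and runs in O(N)
// expected time instead of the O(N log N) of the sorting version.
template <typename T>
void
uniquify (std::vector<T> &v)
{
  std::unordered_set<T> seen;
  std::size_t j = 0;
  for (std::size_t i = 0; i < v.size (); i++)
    if (seen.find (v[i]) == seen.end ())
      {
        seen.insert (v[i]);
        v[j++] = v[i];
      }
  v.resize (j);
}
```

Compaction happens in place, so the only extra memory is the temporary hash set, which is freed when the function returns.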
Patch Set 1
Total comments: 2
Patch Set 2: vsize
Patch Set 3: dan eble
Total messages: 12
Not sure about the speedup (we might have small sizes often enough that the constant factor becomes relevant), but the code is quite a net win with regard to obviously doing what it should do. It probably takes a whole lot more allocation calls, but at least the impact on memory is just temporary. LGTM
On 2020/01/24 13:29:34, dak wrote:
> Not sure about the speedup (we might have small sizes often enough that the
> constant factor becomes relevant), but the code is quite a net win with regard
> to obviously doing what it should do. It probably takes a whole lot more
> allocation calls, but at least the impact on memory is just temporary.
>
> LGTM

I was hoping the standard hashmap would use probing, but it looks like it's chaining. If the allocation cost becomes problematic, we can use another hashmap instead.
On 2020/01/24 14:06:22, hanwenn wrote:
> If the allocation cost becomes problematic, we can use another hashmap
> instead.

"We" could profile it and know before pushing whether it is worth it.
On 2020/01/24 14:11:59, Dan Eble wrote:
> On 2020/01/24 14:06:22, hanwenn wrote:
> > If the allocation cost becomes problematic, we can use another hashmap
> > instead.
>
> "We" could profile it and know before pushing whether it is worth it.

I don't think that this is really performance-critical. Given the kind of headache we had with non-deterministic runs, something as simple as that is nice.

A more efficient way to do this via sorting would be to sort indirect pointers, go through them and null out every duplicate except the one at the smallest index in the array, and then compact the array to the non-null entries. But I still think I'd prefer Han-Wen's version.
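The sorting alternative described above could look like this (untested sketch of the idea; `uniquify_indirect` is a hypothetical name, and the committed patch uses the hash set instead):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Illustration of the indirect-pointer alternative: sort an index
// array by pointee, null out every duplicate except its earliest
// occurrence, then compact.  O(N log N), but stable and hash-free.
// Assumes the input contains no genuine null pointers.
template <typename T>
void
uniquify_indirect (std::vector<T *> &v)
{
  std::vector<std::size_t> idx (v.size ());
  for (std::size_t i = 0; i < idx.size (); i++)
    idx[i] = i;

  // Order indices by pointer value; equal pointers keep their
  // original relative order, so the first occurrence sorts first.
  // std::less gives a total order even for unrelated pointers.
  std::sort (idx.begin (), idx.end (),
             [&v] (std::size_t a, std::size_t b) {
               if (v[a] != v[b])
                 return std::less<T *> () (v[a], v[b]);
               return a < b;
             });

  // Null out all but the first occurrence of each duplicate.
  T *prev = nullptr;
  for (std::size_t k = 0; k < idx.size (); k++)
    {
      if (v[idx[k]] == prev)
        v[idx[k]] = nullptr;
      else
        prev = v[idx[k]];
    }

  // Compact the array to the non-null entries, preserving order.
  std::size_t j = 0;
  for (std::size_t i = 0; i < v.size (); i++)
    if (v[i])
      v[j++] = v[i];
  v.resize (j);
}
```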
https://codereview.appspot.com/583390043/diff/551390044/lily/grob.cc
File lily/grob.cc (right):

https://codereview.appspot.com/583390043/diff/551390044/lily/grob.cc#newcode971
lily/grob.cc:971: int j = 0;

vsize j = 0;
:-)
vsize (Patch Set 2)
https://codereview.appspot.com/583390043/diff/551390044/lily/grob.cc
File lily/grob.cc (right):

https://codereview.appspot.com/583390043/diff/551390044/lily/grob.cc#newcode971
lily/grob.cc:971: int j = 0;

On 2020/01/24 15:17:13, dak wrote:
> vsize j = 0;
> :-)

Done.
On 2020/01/24 15:17:02, dak wrote:
> On 2020/01/24 14:11:59, Dan Eble wrote:
> > On 2020/01/24 14:06:22, hanwenn wrote:
> > > If the allocation cost becomes problematic, we can use another hashmap
> > > instead.
> >
> > "We" could profile it and know before pushing whether it is worth it.
>
> I don't think that this is really performance-critical.

I accept your informed intuition, but I also have more suggestions.

Don't assume the hash function is perfect: create the table with more buckets, say 2*size.

Instead of calling find() then insert(), just call insert() and check whether it worked, something like this (untested):

  if (seen.insert (grobs[i]).second)
    grobs[j++] = grobs[i];

https://en.cppreference.com/w/cpp/container/unordered_set/insert
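The one-line suggestion above expands to a full function along these lines (assumed shape, illustration only; the generic element type and the surrounding code are not taken from lily/grob.cc):

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Complete version of the insert()-based loop suggested above.
// insert() returns std::pair<iterator, bool>; .second is true exactly
// when the element was newly inserted, so one hash lookup does the
// work of a separate find() plus insert().
template <typename T>
void
uniquify (std::vector<T> &grobs)
{
  // The constructor argument is a minimum bucket count hint.
  std::unordered_set<T> seen (grobs.size ());
  std::size_t j = 0;
  for (std::size_t i = 0; i < grobs.size (); i++)
    if (seen.insert (grobs[i]).second)
      grobs[j++] = grobs[i];
  grobs.resize (j);
}
```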
On 2020/01/24 15:50:22, Dan Eble wrote:
> On 2020/01/24 15:17:02, dak wrote:
> > On 2020/01/24 14:11:59, Dan Eble wrote:
> > > On 2020/01/24 14:06:22, hanwenn wrote:
> > > > If the allocation cost becomes problematic, we can use another hashmap
> > > > instead.
> > >
> > > "We" could profile it and know before pushing whether it is worth it.
> >
> > I don't think that this is really performance-critical.
>
> I accept your informed intuition, but I also have more suggestions.
>
> Don't assume the hash function is perfect: create the table with more buckets,
> say 2*size.

I'd not do that. It's second-guessing the implementation. The size should be good as a hint. In particular if there are actual duplicates...

> Instead of calling find() then insert(), just call insert() and check whether it
> worked, something like this (untested):
>
>   if (seen.insert(grobs[i]).second)
>     grobs[j++] = grobs[i];

Ok, that's a good idea.

> https://en.cppreference.com/w/cpp/container/unordered_set/insert
On Jan 24, 2020, at 11:11, dak@gnu.org wrote:
>> Don't assume the hash function is perfect: create the table with more
>> buckets, say 2*size.
>
> I'd not do that. It's second-guessing the implementation. The size
> should be good as a hint. In particular if there are actual
> duplicates...

You are right. I hadn't read the documentation closely enough. The constructor argument is a minimum bucket count.
dan eble (Patch Set 3)
On 2020/01/24 15:50:22, Dan Eble wrote:
> On 2020/01/24 15:17:02, dak wrote:
> > On 2020/01/24 14:11:59, Dan Eble wrote:
> > > On 2020/01/24 14:06:22, hanwenn wrote:
> > > > If the allocation cost becomes problematic, we can use another hashmap
> > > > instead.
> > >
> > > "We" could profile it and know before pushing whether it is worth it.
> >
> > I don't think that this is really performance-critical.
>
> I accept your informed intuition, but I also have more suggestions.
>
> Don't assume the hash function is perfect: create the table with more buckets,
> say 2*size.
>
> Instead of calling find() then insert(), just call insert() and check whether it
> worked, something like this (untested):
>
>   if (seen.insert(grobs[i]).second)
>     grobs[j++] = grobs[i];
>
> https://en.cppreference.com/w/cpp/container/unordered_set/insert

Done.