Hello golang-dev@googlegroups.com (cc: msolomon@google.com, sougou@google.com), I'd like you to review this change to https://dvyukov%40google.com@code.google.com/p/go/
(2012-12-28 17:58:13 UTC)
#1
(2012-12-28 20:34:08 UTC)
#3
On 2012/12/28 19:52:11, sougou wrote:
> vtocc test run on 10M queries using 100 connections:
> Old runtime StackSys started at 26MB and kept growing @2MB/min.
> New runtime StackSys started at 3MB and stayed there.
Wow! Looks impressive:
> memstats.Sys 125461816
> memstats.Sys 43672888
I am essentially trading speed for memory. I think I need to benchmark
performance when the global cache is accessed frequently. Do we have a parallel
benchmark with deep stacks?
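A parallel benchmark with deep stacks could be sketched roughly as follows. This is a hypothetical illustration in current Go, not the benchmark that was later added to the CL. The names deep, recurseDepth and BenchmarkDeepStacks are assumptions, and b.RunParallel only arrived in later Go releases. Current Go also uses contiguous, copied stacks rather than segments, so this exercises today's stack allocator rather than the segment cache discussed in this thread. Each iteration starts a fresh goroutine with a small initial stack, recurses deeply enough to force stack growth, and then lets the goroutine exit so that its stack is released.

```go
package main

import (
	"fmt"
	"testing"
)

// recurseDepth is a hypothetical depth chosen so that the call chain needs
// far more stack than a goroutine starts with.
const recurseDepth = 1024

// deep recurses n levels. The padding array enlarges every frame so the
// goroutine's stack must grow. The result (the number of odd values in
// 1..n) only exists so the work cannot be optimized away.
//
//go:noinline
func deep(n int) int {
	var pad [128]byte
	pad[0] = byte(n)
	if n == 0 {
		return 0
	}
	return deep(n-1) + int(pad[0]&1)
}

// BenchmarkDeepStacks (hypothetical) grows stacks in parallel: each iteration
// starts a goroutine with a small initial stack, recurses deeply, and exits,
// which releases its stack back to the runtime.
func BenchmarkDeepStacks(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		done := make(chan int)
		for pb.Next() {
			go func() { done <- deep(recurseDepth) }()
			if got := <-done; got != recurseDepth/2 {
				b.Errorf("deep(%d) = %d, want %d", recurseDepth, got, recurseDepth/2)
			}
		}
	})
}

func main() {
	fmt.Println(deep(3)) // prints 2
	fmt.Println(testing.Benchmark(BenchmarkDeepStacks))
}
```

Under `go test -bench`, the -cpu flag would run the same benchmark at several GOMAXPROCS values.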
(2012-12-28 20:59:30 UTC)
#4
I saw a 10% drop in throughput compared to a build from October (5.5k vs 6k+),
but the circumstances might not have been the same. I'll rerun the tests using
the same query sets, etc., and see what I get.
(2012-12-28 21:04:01 UTC)
#5
If you see some performance degradation, try tuning
StackPerThreadLWM/StackPerThreadHWM (to, say, 64/128).
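The following is a rough model of what these two knobs trade off. It is a hypothetical Go sketch of a per-thread free list backed by a shared, locked global pool, with hysteresis between a low and a high watermark. The types, the names threadCache/globalPool/churn, and the exact refill/flush policy are assumptions for illustration. The runtime code in this CL was written in C and may behave differently. The sketch refills a batch of lwm segments when the local list is empty, and flushes back down to lwm once the local list exceeds hwm.

```go
package main

import (
	"fmt"
	"sync"
)

// segment is a stand-in for a fixed-size stack segment (hypothetical).
type segment struct{ buf [64]byte }

// globalPool is the shared, mutex-protected cache behind all per-thread caches.
type globalPool struct {
	mu       sync.Mutex
	free     []*segment
	accesses int // lock acquisitions, i.e. trips to the global pool
}

// take removes up to n segments from the pool, allocating fresh ones if it runs dry.
func (g *globalPool) take(n int) []*segment {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.accesses++
	out := make([]*segment, 0, n)
	for len(out) < n && len(g.free) > 0 {
		last := len(g.free) - 1
		out = append(out, g.free[last])
		g.free = g.free[:last]
	}
	for len(out) < n {
		out = append(out, new(segment))
	}
	return out
}

// give returns segments to the pool.
func (g *globalPool) give(segs []*segment) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.accesses++
	g.free = append(g.free, segs...)
}

// threadCache is a per-thread cache with low/high watermarks (lwm >= 1, hwm >= lwm).
// It is not safe for concurrent use; only the global pool is shared.
type threadCache struct {
	lwm, hwm int
	local    []*segment
	global   *globalPool
}

func (c *threadCache) get() *segment {
	if len(c.local) == 0 {
		// Refill a batch of lwm segments under a single lock acquisition.
		c.local = c.global.take(c.lwm)
	}
	last := len(c.local) - 1
	s := c.local[last]
	c.local = c.local[:last]
	return s
}

func (c *threadCache) put(s *segment) {
	c.local = append(c.local, s)
	if len(c.local) > c.hwm {
		// Above the high watermark: hand everything beyond lwm back at once
		// (copied, so later appends to c.local cannot clobber it).
		c.global.give(append([]*segment(nil), c.local[c.lwm:]...))
		c.local = c.local[:c.lwm]
	}
}

// churn grows a stack by depth segments and shrinks it again, rounds times,
// and returns how often the global pool had to be locked.
func churn(lwm, hwm, depth, rounds int) int {
	g := &globalPool{}
	c := &threadCache{lwm: lwm, hwm: hwm, global: g}
	held := make([]*segment, 0, depth)
	for r := 0; r < rounds; r++ {
		for i := 0; i < depth; i++ {
			held = append(held, c.get())
		}
		for _, s := range held {
			c.put(s)
		}
		held = held[:0]
	}
	return g.accesses
}

func main() {
	fmt.Println("lwm=4,  hwm=8:  ", churn(4, 8, 16, 100))
	fmt.Println("lwm=64, hwm=128:", churn(64, 128, 16, 100))
}
```

In this toy workload, 64/128 needs a single trip to the global pool (the first refill). With 4/8, the code goes back to the pool several times per round. The price of larger watermarks is that each thread may keep up to hwm idle segments, which is exactly the kind of memory this CL set out to reclaim.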
(2012-12-29 13:00:52 UTC)
#6
Hey, thanks for an interesting patch. Just want to give some feedback here.
I have a ray tracer app whose memory consumption kept growing during a run (800x600 image
rendering, 100 rays per pixel, 10 goroutines, 1 goroutine per image row,
GOMAXPROCS=4): it started at 20 megs and ended up at 100 megs or so after 5
minutes of intensive CPU usage. I know there are no allocations in the ray tracer
itself, so it must be the stack segment cache. So I tried your patch set, and it's
amazing: memory consumption was 20 megs at the beginning (same as before) and 30
megs at the end. And I see no performance degradation whatsoever.
So, summary:
Before: 20 to 100 megs during run.
After: 20 to 30 megs during run.
In both cases rendering time was 5 minutes 5 seconds.
So thanks a lot for the patch; I was just trying to figure out what was causing
this behavior in my program.
(2012-12-29 13:17:21 UTC)
#7
Great results! Thanks for testing.
(2012-12-29 20:08:51 UTC)
#8
I've added a benchmark, BenchmarkStackGrowthDeep, that grows and shrinks the
stack in multiple goroutines. There is a significant slowdown on this synthetic
benchmark:
benchmark                    old ns/op   new ns/op      delta
BenchmarkStackGrowthDeep         94101      109271    +16.12%
BenchmarkStackGrowthDeep-2       47576       70916    +49.06%
BenchmarkStackGrowthDeep-4       25687       67188   +161.56%
BenchmarkStackGrowthDeep-8       13592       77776   +472.22%
BenchmarkStackGrowthDeep-16       9695       78721   +711.98%
BenchmarkStackGrowthDeep-32      11679       76796   +557.56%
BenchmarkStackGrowthDeep-64      12623       88951   +604.67%
I would prefer to fix the performance in a separate patch.
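The delta column is the relative change in ns/op, (new - old) / old * 100, and the -N suffix on each benchmark name is the GOMAXPROCS value it ran with. The small helper below reproduces two of the rows. The name delta is only for illustration.

```go
package main

import "fmt"

// delta returns the relative change from oldNs to newNs in percent.
func delta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	rows := []struct {
		name         string
		oldNs, newNs float64
	}{
		{"BenchmarkStackGrowthDeep", 94101, 109271},
		{"BenchmarkStackGrowthDeep-16", 9695, 78721},
	}
	for _, r := range rows {
		fmt.Printf("%s: %+.2f%%\n", r.name, delta(r.oldNs, r.newNs))
	}
	// Prints:
	// BenchmarkStackGrowthDeep: +16.12%
	// BenchmarkStackGrowthDeep-16: +711.98%
}
```

A delta of +711.98% means each operation took about 8.1 times as long with the patch.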
(2012-12-29 20:34:19 UTC)
#9
Is it related to the watermark values? They seem very small, and I
would expect the stack segment cache size to be roughly of the same
order of magnitude as the size of an OS thread (a few megabytes).
Rémy.
(2012-12-29 20:40:03 UTC)
#10
On Sun, Dec 30, 2012 at 12:34 AM, Rémy Oudompheng
<remyoudompheng@gmail.com> wrote:
> Is it related to the watermark values? They seem very small and I
> would expect the stack segment cache size to be roughly of the same
> order of magnitude of the size of an OS thread (a few megabytes).
Yes, it is related, but just increasing them won't help.
AFAIR, by default the Go runtime requests just 64k for the thread (g0) stack.
(2012-12-30 18:08:27 UTC)
#11
Please hold on. I am working on a similar patch that makes the slowdown more
reasonable:
benchmark                    old ns/op   new ns/op      delta
BenchmarkStackGrowthDeep         97231       94391     -2.92%
BenchmarkStackGrowthDeep-2       47230       58562    +23.99%
BenchmarkStackGrowthDeep-4       24993       49356    +97.48%
BenchmarkStackGrowthDeep-8       15105       30072    +99.09%
BenchmarkStackGrowthDeep-16      10005       15623    +56.15%
BenchmarkStackGrowthDeep-32      12517       13069     +4.41%
https://codereview.appspot.com/7029044/
The code is almost completely rewritten, so it makes little sense to review this
patch.
(2013-01-03 17:43:48 UTC)
#12
I've sent https://codereview.appspot.com/7029044/ for review. That change
increases per-thread caches (12MB StackSys consumption on the test instead of
4MB), but it scales much more gracefully.
Issue 6997052: code review 6997052: runtime: less aggressive per-thread stack segment caching
(Closed)
Created 11 years, 4 months ago by dvyukov
Modified 11 years, 3 months ago