Issue 84790044: code review 84790044: runtime: no longer skip stack growth test in short mode

0intro

Hello dvyukov (cc: golang-codereviews@googlegroups.com, rsc), I'd like you to review this change to https://code.google.com/p/go

11 years, 2 months ago (2014-04-09 21:16:46 UTC) #1

dave_cheney.net

LGTM. On Thu, Apr 10, 2014 at 7:16 AM, <0intro@gmail.com> wrote: > Reviewers: dvyukov, > ...

11 years, 2 months ago (2014-04-09 21:18:05 UTC) #2

bradfitz

LGTM There might be a few more of these too. At least one in http ...

11 years, 2 months ago (2014-04-09 21:23:49 UTC) #3

rsc

next time you are going to disable a test for one OS, please use a ...

11 years, 2 months ago (2014-04-10 00:37:31 UTC) #4

0intro

*** Submitted as https://code.google.com/p/go/source/detail?r=9ef11603cde8 *** runtime: no longer skip stack growth test in short mode ...

11 years, 2 months ago (2014-04-10 04:37:54 UTC) #5

gobot

This CL appears to have broken the linux-arm-cheney-imx6 builder. See http://build.golang.org/log/d18afd8e1ca997a4672687d4d32f9f68d2fb503c

11 years, 2 months ago (2014-04-10 06:00:16 UTC) #6

0intro

The test is failing on both linux/arm and solaris. The test seems rather slow on ...

11 years, 2 months ago (2014-04-10 06:03:51 UTC) #7

0intro

It also happen sporadically on darwin/amd64 and netbsd/amd64. -- David du Colombier

11 years, 2 months ago (2014-04-10 06:29:36 UTC) #8

aram2

I can't reproduce builder failures on any of my machines, investigating... -- Aram Hăvărneanu

11 years, 2 months ago (2014-04-10 09:55:04 UTC) #9

aram2

Hmm, although I can't reproduce the failure, I do see some interesting performance degradation for ...

11 years, 2 months ago (2014-04-10 10:05:46 UTC) #10

Hmm, although I can't reproduce the failure, I do see some interesting
performance degradation for TestStackGrowth with cpu >= 2.

  : oos:runtime; GOMAXPROCS=2 go test -v -cpu=1,2,4 -run TestStackGrowth
runtime
  === RUN TestStackGrowth
  === RUN TestStackGrowthCallback
  --- PASS: TestStackGrowthCallback (2.23 seconds)
  --- PASS: TestStackGrowth (2.23 seconds)
  === RUN TestStackGrowth-2
  === RUN TestStackGrowthCallback-2
  --- PASS: TestStackGrowthCallback-2 (6.07 seconds)
  --- PASS: TestStackGrowth-2 (28.45 seconds)
  === RUN TestStackGrowth-4
  === RUN TestStackGrowthCallback-4
  --- PASS: TestStackGrowthCallback-4 (12.44 seconds)
  --- PASS: TestStackGrowth-4 (21.40 seconds)
  PASS
  ok   runtime 67.345s
  : oos:runtime; GOMAXPROCS=2 go test -v -cpu=1,2,4 -run TestStackGrowth
runtime
  === RUN TestStackGrowth
  === RUN TestStackGrowthCallback
  --- PASS: TestStackGrowthCallback (2.30 seconds)
  --- PASS: TestStackGrowth (2.30 seconds)
  === RUN TestStackGrowth-2
  === RUN TestStackGrowthCallback-2
  --- PASS: TestStackGrowthCallback-2 (7.99 seconds)
  --- PASS: TestStackGrowth-2 (28.72 seconds)
  === RUN TestStackGrowth-4
  === RUN TestStackGrowthCallback-4
  --- PASS: TestStackGrowthCallback-4 (12.23 seconds)
  --- PASS: TestStackGrowth-4 (21.16 seconds)
  PASS
  ok   runtime 67.270s

Only one one machine though, this is fine:

  04757c51-b5c2-4d23-9a69-1e9e305bc4da:aram$ GOMAXPROCS=2
/home/aram/go-up/bin/go test -cpu=1,2,4 -v -run TestStackGrowth
runtime
  === RUN TestStackGrowth
  === RUN TestStackGrowthCallback
  --- PASS: TestStackGrowthCallback (2.48 seconds)
  --- PASS: TestStackGrowth (2.48 seconds)
  === RUN TestStackGrowth-2
  === RUN TestStackGrowthCallback-2
  --- PASS: TestStackGrowthCallback-2 (2.61 seconds)
  --- PASS: TestStackGrowth-2 (4.57 seconds)
  === RUN TestStackGrowth-4
  === RUN TestStackGrowthCallback-4
  --- PASS: TestStackGrowthCallback-4 (5.63 seconds)
  --- PASS: TestStackGrowth-4 (5.73 seconds)
  PASS
  ok      runtime 14.741s
  04757c51-b5c2-4d23-9a69-1e9e305bc4da:aram$ GOMAXPROCS=2
/home/aram/go-up/bin/go test -cpu=1,2,4 -v -run TestStackGrowth
runtime
  === RUN TestStackGrowth
  === RUN TestStackGrowthCallback
  --- PASS: TestStackGrowthCallback (2.34 seconds)
  --- PASS: TestStackGrowth (2.34 seconds)
  === RUN TestStackGrowth-2
  === RUN TestStackGrowthCallback-2
  --- PASS: TestStackGrowthCallback-2 (3.07 seconds)
  --- PASS: TestStackGrowth-2 (4.76 seconds)
  === RUN TestStackGrowth-4
  === RUN TestStackGrowthCallback-4
  --- PASS: TestStackGrowthCallback-4 (5.49 seconds)
  --- PASS: TestStackGrowth-4 (5.61 seconds)
  PASS
  ok      runtime 14.344s

I'll test on the builder too.

-- 
Aram Hăvărneanu

Sign in to reply to this message.

0intro

It also happen on the Plan 9 builder, but it's much less frequent than before ...

11 years, 2 months ago (2014-04-10 16:27:30 UTC) #11

rsc

On Thu, Apr 10, 2014 at 6:05 AM, Aram Hăvărneanu <aram.h@mgk.ro> wrote: > Hmm, although ...

11 years, 2 months ago (2014-04-10 17:28:23 UTC) #12

aram2

> This looks like what was happening on the Plan 9 builder. There, the > ...

11 years, 2 months ago (2014-04-10 17:55:59 UTC) #13

> This looks like what was happening on the Plan 9 builder. There, the
> problem was that stoptheworld called notetsleep called semasleep to sleep
> for 100us, but in the tsemacquire system call that rounded down to 0ms,
> so it was hard-spinning on the cpu instead of sleeping and therefore
> rescheduling. The fix was to round up so that some real sleeping (and
> therefore rescheduling) always happens in that situation. GOMAXPROCS>1
> would exacerbate this.

That was my first hypothesis, but I can't see it. It looks like there's
actual sleeping going on. Also, the problem seems specific to the kernel
version. I can see it on older kernels, but on latest kernel from Joyent
I can't trigger this problem at all.

FWIW, this is on the "bad" machine:

  : oos:runtime; time runtime.test -test.cpu=4 -test.run
TestStackGrowth$ runtime
  PASS

  real 0m31.928s
  user 0m4.615s
  sys 0m0.393s
  : oos:runtime;

And this is on the "good" machine:

  real    0m6.064s
  user    0m3.954s
  sys     0m0.592s

The faulty machine doesn't have higher user time, and the ratio of the two
user times is very close to the ratio of the performance of the two CPUs.

So I ran this:

  dtrace -n 'syscall:::entry /pid==$target/ { @num[probefunc] =
count(); }' -c "./runtime.test -test.v -test.cpu=4 -test.run
TestStackGrowth$ runtime"

And compared the output on the bad Solaris machine to the good Solaris
machine. This is the bad machine:

dtrace: description 'syscall:::entry ' matched 231 probes
=== RUN TestStackGrowth-4
--- PASS: TestStackGrowth-4 (29.70 seconds)
PASS
dtrace: pid 13974 has exited

  getpid                                                            1
  getrlimit                                                         1
  rexit                                                             1
  brk                                                               2
  priocntlsys                                                       3
  write                                                             3
  sysconfig                                                         4
  lwp_continue                                                      8
  lwp_create                                                        8
  uucopy                                                            8
  schedctl                                                          9
  sigaltstack                                                       9
  setcontext                                                       10
  lwp_sigmask                                                      25
  mmap                                                             26
  sigaction                                                        68
  nanosleep                                                      5601
  yield                                                         36150
  lwp_park                                                      48428

And this is the good machine:

dtrace: description 'syscall:::entry ' matched 233 probes
=== RUN TestStackGrowth-4
--- PASS: TestStackGrowth-4 (5.30 seconds)
PASS
dtrace: pid 95944 has exited

  close                                                             1
  getpid                                                            1
  getrlimit                                                         1
  open                                                              1
  read                                                              1
  rexit                                                             1
  brk                                                               2
  write                                                             3
  sysconfig                                                         4
  priocntlsys                                                       5
  lwp_continue                                                      9
  lwp_create                                                        9
  uucopy                                                            9
  schedctl                                                         10
  sigaltstack                                                      10
  setcontext                                                       11
  lwp_sigmask                                                      28
  mmap                                                             28
  sigaction                                                        68
  nanosleep                                                     13993
  lwp_park                                                      73866
  yield                                                        151500

The good machine yields a lot more than the bad machine. This could be
because either the program state is different (for whatever reason),
or the program state is the same but libc yields differently. Let's
test this out. We'll look at lwp_park and yield, and aggregate the
stacks. We'll use this:

  dtrace -n 'syscall::lwp_park:entry, syscall::yield:entry
/pid==$target/ { @num[ustack()] = count(); }' -c "./runtime.test
-test.v -test.cpu=4 -test.run TestStackGrowth$ runtime"

This is the bad machine:

              libc.so.1`yield+0xa
              libc.so.1`sema_post+0x1fc
              libc.so.1`sem_post+0x38
              runtime.test`runtime.asmsysvicall6+0x50
             3141

              libc.so.1`__lwp_park+0x17
              libc.so.1`sema_reltimedwait+0x28
              libc.so.1`sem_reltimedwait_np+0x4b
              runtime.test`runtime.asmsysvicall6+0x50
             4028

              libc.so.1`__lwp_park+0x17
              libc.so.1`sema_wait+0x13
              libc.so.1`sem_wait+0x38
              runtime.test`runtime.asmsysvicall6+0x50
            20658

              libc.so.1`__lwp_unpark+0x14
              libc.so.1`sem_post+0x38
              runtime.test`runtime.asmsysvicall6+0x50
            24678

              libc.so.1`yield+0xa
              runtime.test`runtime.asmsysvicall6+0x50
            33669

And this is the good machine

              libc.so.1`yield+0xa
              libc.so.1`sema_post+0x1fc
              libc.so.1`sem_post+0x38
              runtime.test`runtime.asmsysvicall6+0x50
              175

              libc.so.1`__lwp_park+0x17
              libc.so.1`sema_reltimedwait+0x28
              libc.so.1`sem_reltimedwait_np+0x4b
              runtime.test`runtime.asmsysvicall6+0x50
             4090

              libc.so.1`__lwp_park+0x17
              libc.so.1`sema_wait+0x13
              libc.so.1`sem_wait+0x38
              runtime.test`runtime.asmsysvicall6+0x50
            32882

              libc.so.1`__lwp_unpark+0x14
              libc.so.1`sem_post+0x38
              runtime.test`runtime.asmsysvicall6+0x50
            36964

              libc.so.1`yield+0xa
              runtime.test`runtime.asmsysvicall6+0x50
           162616

So, we call sem_post (which eventually yields) twenty times less often, we
call sem_reltimedwait_np the same number of times, sem_post and sem_wait
is called more on the good machine, but not by much, and yield is called
five times more often on the good machine.

Now we know something in the program calls yield more often. Yield is
called directly only by runtime·osyield. Sem_post is called only by
runtime·semawakeup.

This syscall data is interesting, but we still don't know exactly where
we waste time. Instead of guesssing, we can measure:

  /opt/DTT/procsystime -e ./runtime.test -test.v -test.cpu=4 -test.run
TestStackGrowth$ runtime

This is the bad machine:

/opt/DTT/procsystime -eT ./runtime.test -test.v -test.cpu=4 -test.run
TestStackGrowth$ runtime
=== RUN TestStackGrowth-4
--- PASS: TestStackGrowth-4 (27.24 seconds)
PASS

Elapsed Times for command ./runtime.test -test.v -test.cpu=4 -test.run
TestStackGrowth$ runtime,

         SYSCALL          TIME (ns)
          getpid               1672
       getrlimit               3374
      systeminfo               3375
           close               3855
       sysconfig               6766
             brk              14064
     sigaltstack              17449
      setcontext              20898
     priocntlsys              24026
            open              25306
            stat              33366
    lwp_continue              34503
     resolvepath              42401
     lwp_sigmask              42990
        schedctl              48513
         mmapobj              50641
           write              73168
          getcwd              78745
       sigaction              91300
         memcntl             155442
      lwp_create             195676
          uucopy             222416
            mmap             251470
           yield          166226654
       nanosleep        51001870920
        lwp_park       190318570199
          TOTAL:       241488109189

And this is the good machine:

04757c51-b5c2-4d23-9a69-1e9e305bc4da:runtime$ pfexec
/opt/DTT/procsystime -eT ./runtime.test -test.v -test.cpu=4 -test.run
TestStackGrowth$ runtime
=== RUN TestStackGrowth-4
--- PASS: TestStackGrowth-4 (5.05 seconds)
PASS

Elapsed Times for command ./runtime.test -test.v -test.cpu=4 -test.run
TestStackGrowth$ runtime,

         SYSCALL          TIME (ns)
          getpid               1882
       getrlimit               3183
            read               4265
      systeminfo               5092
       sysconfig              10346
             brk              11545
           close              12225
     sigaltstack              17543
    lwp_continue              24542
            stat              39306
      setcontext              40303
     resolvepath              49826
     lwp_sigmask              50834
           write              55122
        schedctl              55252
          getcwd              64392
            open              67225
         mmapobj              78281
       sigaction              86023
     priocntlsys             130477
      lwp_create             159157
          uucopy             181476
            mmap             314897
         memcntl             439203
           yield          448265812
       nanosleep        12854501047
        lwp_park        30347023675
          TOTAL:        43651692931

You might wonder why the total times are much greater than the real
times. That is because this tool measures time spent in system calls
accross all threads, and there are many threads in a Go process. The
lwp_park numbers are very skewed because many threads in a Go process
are parked almost all the time.

I suspect all threads are created early, they live for the whole process
lifetime, and only 4 threads are doing anything, the reast are parked. I'm
not insane to use this hypothesis without testing it, so let's test it:

  dtrace -qn 'syscall::lwp_create:entry, syscall::lwp_exit:entry
/pid==$target/ { printf("%d %d %s\n", timestamp, tid, probefunc) }' -c
"./runtime.test -test.v -test.cpu=4 -test.run TestStackGrowth$
runtime"

This is the bad machine:

=== RUN TestStackGrowth-4
22270995985559 1 lwp_create
22271148086316 1 lwp_create
22271148201956 3 lwp_create
22271866554369 4 lwp_create
22271877971811 1 lwp_create
22271878464099 5 lwp_create
22271890776999 7 lwp_create
22271984380961 4 lwp_create

And this is the good machine:

11721043519613302 1 lwp_create
11721043603792260 1 lwp_create
11721043603941641 3 lwp_create
11721044386989497 4 lwp_create
11721044388789803 1 lwp_create
11721044392657155 6 lwp_create
11721044417728140 7 lwp_create
11721044916077749 4 lwp_create
11721044916149965 8 lwp_create

I wasn't very wrong, but I wasn't perfectly accurate either. Threads are
created early and live long, but they are created in two steps. First,
a bunch of threads are created, then, with a delay of 0.1s for the
bad machine and 0.5s for the good machine, another set of threads is
created. For the bad machine this set only has one thread, for the good
machine two threads are created almost at the same time. What are these
threads? The obvious suspect is the printing this line

    === RUN TestStackGrowth-4

could create a thread, but this doesn't explain why two threads are
created for the good machine. Let's disable that write to eliminate
unknowns.

Bad machine:

23198516208356 1 lwp_create
23198678977029 1 lwp_create
23198679085298 3 lwp_create
23199387572555 4 lwp_create
23199388103588 4 lwp_create
23199388764763 5 lwp_create
23199390491253 5 lwp_create
23199450598099 8 lwp_create

Good machine:

11722471357043810 1 lwp_create
11722471491789019 1 lwp_create
11722471491917276 3 lwp_create
11722472251577077 4 lwp_create
11722472251859397 4 lwp_create
11722472252279168 5 lwp_create
11722472282499772 5 lwp_create
11722472440879247 6 lwp_create

There is still a 0.15 delay, but it's consistent accross machine, and at
least now they have the same number of threads! Unfortunately, we can't
affort not to use -v and still use procsystime, becase we can't time
the tests precisely. We can't use time(1) to call procsystime because
dtrace initialization takes long enough to invalidate the tests on the
good machine (I tried). The best strategy is to abandon procsystime and
use dtrace directly. Let's do this.

We'll just measure lwp_park block time and plot it. We use this:

  dtrace -n 'syscall::lwp_park:entry /pid==$target/ { self->ts =
timestamp } syscall::lwp_park:return /self->ts/ { @time[tid] =
quantize(timestamp - self->ts); self->ts = 0; } ' -c "./runtime.test
-test.cpu=4 -test.run TestStackGrowth$ runtime"

We get these nice plots (fixed width font required). For the bad machine:

        1
           value  ------------- Distribution ------------- count
            1024 |                                         0
            2048 |@@@@@@@                                  910
            4096 |@@@@@@@@@@@@                             1630
            8192 |@                                        100
           16384 |@@                                       229
           32768 |@@@@@@                                   743
           65536 |@@@@@                                    736
          131072 |@@                                       268
          262144 |@                                        100
          524288 |                                         6
         1048576 |                                         1
         2097152 |                                         0
         4194304 |                                         0
         8388608 |@@@@                                     515
        16777216 |@                                        134
        33554432 |                                         7
        67108864 |                                         7
       134217728 |                                         1
       268435456 |                                         1
       536870912 |                                         0
      1073741824 |                                         1
      2147483648 |                                         0
      4294967296 |                                         0
      8589934592 |                                         1
     17179869184 |                                         0

        2
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |                                         4
           16384 |                                         8
           32768 |                                         10
           65536 |                                         23
          131072 |@                                        53
          262144 |@@@@@                                    396
          524288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         2347
         1048576 |@                                        40
         2097152 |                                         7
         4194304 |                                         2
         8388608 |                                         5
        16777216 |                                         0

        3
           value  ------------- Distribution ------------- count
      8589934592 |                                         0
     17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
     34359738368 |                                         0

        4
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |                                         2
            2048 |@@@@@@@@                                 2149
            4096 |@@@@@@@@@@@@@@@@@@                       4619
            8192 |@@@@                                     1140
           16384 |@@@@                                     1028
           32768 |@@@                                      826
           65536 |                                         113
          131072 |                                         25
          262144 |                                         45
          524288 |                                         29
         1048576 |                                         15
         2097152 |                                         2
         4194304 |                                         4
         8388608 |                                         88
        16777216 |@                                        242
        33554432 |                                         41
        67108864 |                                         17
       134217728 |                                         14
       268435456 |                                         7
       536870912 |                                         3
      1073741824 |                                         0
        5
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |                                         1
            2048 |@@                                       231
            4096 |@@@@@@@@@@@@@@@                          1499
            8192 |@                                        110
           16384 |@                                        125
           32768 |@@@                                      328
           65536 |@@                                       197
          131072 |@@@                                      356
          262144 |                                         42
          524288 |                                         22
         1048576 |                                         50
         2097152 |                                         4
         4194304 |                                         1
         8388608 |@@@@@@@@                                 857
        16777216 |@@                                       251
        33554432 |                                         15
        67108864 |                                         6
       134217728 |                                         0
       268435456 |                                         0
       536870912 |                                         1
      1073741824 |                                         0
      2147483648 |                                         1
      4294967296 |                                         1
      8589934592 |                                         0

        6
           value  ------------- Distribution ------------- count
            1024 |                                         0
            2048 |@@                                       451
            4096 |@@@@@@@@@@@@@@@@                         3030
            8192 |@                                        179
           16384 |@@                                       315
           32768 |@@@                                      646
           65536 |@@                                       327
          131072 |@@@                                      557
          262144 |@@                                       342
          524288 |@                                        135
         1048576 |                                         45
         2097152 |                                         10
         4194304 |                                         2
         8388608 |@@@@@@@                                  1373
        16777216 |@@                                       331
        33554432 |                                         25
        67108864 |                                         15
       134217728 |                                         4
       268435456 |                                         0

        7
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |                                         1
            2048 |@@@@@                                    1181
            4096 |@@@@@@@@@@@@@@@@                         3972
            8192 |@@                                       414
           16384 |@@                                       555
           32768 |@@@@                                     874
           65536 |@                                        305
          131072 |@@                                       513
          262144 |@                                        221
          524288 |@                                        128
         1048576 |                                         6
         2097152 |                                         0
         4194304 |                                         2
         8388608 |@@@@@                                    1246
        16777216 |@                                        285
        33554432 |                                         28
        67108864 |                                         7
       134217728 |                                         5
       268435456 |                                         1
       536870912 |                                         0

        8
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |                                         3
            2048 |@@@@@@@                                  1631
            4096 |@@@@@@@@@@@                              2327
            8192 |@@                                       394
           16384 |@@@                                      617
           32768 |@@                                       397
           65536 |@@@                                      568
          131072 |@@@                                      590
          262144 |@                                        311
          524288 |                                         85
         1048576 |                                         8
         2097152 |                                         0
         4194304 |                                         0
         8388608 |@@@@@@@                                  1496
        16777216 |@                                        294
        33554432 |                                         13
        67108864 |                                         12
       134217728 |                                         3
       268435456 |                                         3
       536870912 |                                         0
      1073741824 |                                         1
      2147483648 |                                         0

        9
           value  ------------- Distribution ------------- count
            1024 |                                         0
            2048 |@@                                       3
            4096 |@@@@@@@@@                                16
            8192 |@                                        1
           16384 |@                                        1
           32768 |@@                                       3
           65536 |                                         0
          131072 |@@@@@                                    9
          262144 |@                                        1
          524288 |@@@                                      5
         1048576 |                                         0
         2097152 |                                         0
         4194304 |                                         0
         8388608 |@@@@@@@@@@@@@@@                          28
        16777216 |@@                                       4
        33554432 |                                         0
        67108864 |@                                        1
       134217728 |                                         0
       268435456 |                                         0
       536870912 |                                         0
      1073741824 |                                         0
      2147483648 |                                         0
      4294967296 |                                         0
      8589934592 |                                         0
     17179869184 |@                                        1
     34359738368 |                                         0

For the good machine:

        1
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |                                         9
            2048 |@@@@@@@@@                                1533
            4096 |@@@@@@@@@@@@                             1896
            8192 |@@                                       284
           16384 |@@                                       375
           32768 |@@@@                                     675
           65536 |@@@@@                                    890
          131072 |@                                        203
          262144 |@                                        100
          524288 |@@                                       353
         1048576 |@                                        130
         2097152 |                                         19
         4194304 |                                         17
         8388608 |                                         6
        16777216 |                                         1
        33554432 |                                         1
        67108864 |                                         1
       134217728 |                                         2
       268435456 |                                         0
       536870912 |                                         2
      1073741824 |                                         1
      2147483648 |                                         0

        2
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |                                         15
           16384 |                                         42
           32768 |@                                        64
           65536 |@                                        147
          131072 |@@@@@@@                                  769
          262144 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            3325
          524288 |@@                                       270
         1048576 |                                         13
         2097152 |                                         0
         4194304 |                                         0
         8388608 |                                         1
        16777216 |                                         0

        3
           value  ------------- Distribution ------------- count
      2147483648 |                                         0
      4294967296 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
      8589934592 |                                         0

        4
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |                                         3
            1024 |                                         27
            2048 |@@@@@@@@@                                2661
            4096 |@@@@@@@@                                 2475
            8192 |@                                        353
           16384 |@@@                                      849
           32768 |@@@@@                                    1461
           65536 |@@@@@@@                                  2068
          131072 |@                                        425
          262144 |                                         136
          524288 |@@@                                      894
         1048576 |@                                        302
         2097152 |                                         29
         4194304 |                                         32
         8388608 |                                         60
        16777216 |                                         12
        33554432 |                                         2
        67108864 |                                         1
       134217728 |                                         0
       268435456 |                                         1
       536870912 |                                         0

        5
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |                                         9
            2048 |@@@@@@@                                  2047
            4096 |@@@@@@@@@@@@@@                           4371
            8192 |@                                        438
           16384 |@@@                                      1008
           32768 |@@@@@                                    1387
           65536 |@@@@@                                    1562
          131072 |@                                        321
          262144 |                                         92
          524288 |@@                                       747
         1048576 |                                         110
         2097152 |                                         31
         4194304 |                                         20
         8388608 |                                         8
        16777216 |                                         6
        33554432 |                                         0
        67108864 |                                         2
       134217728 |                                         5
       268435456 |                                         0

        6
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |                                         100
            2048 |@@@@@@@@@@@@@@@@@@@@@                    9123
            4096 |@@@@@@@                                  3155
            8192 |@@@                                      1363
           16384 |@@@                                      1478
           32768 |@                                        626
           65536 |@                                        352
          131072 |                                         92
          262144 |                                         83
          524288 |                                         121
         1048576 |@                                        394
         2097152 |@                                        621
         4194304 |                                         36
         8388608 |                                         6
        16777216 |                                         0

        7
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |                                         10
            2048 |@@@@@@@                                  316
            4096 |@@@@                                     190
            8192 |@                                        62
           16384 |@@                                       111
           32768 |@@@@@@                                   254
           65536 |@@@@@@@                                  317
          131072 |@@@@                                     160
          262144 |                                         22
          524288 |@@@@@@                                   275
         1048576 |@                                        50
         2097152 |                                         2
         4194304 |                                         3
         8388608 |                                         3
        16777216 |                                         4
        33554432 |                                         3
        67108864 |                                         0
       134217728 |                                         0
       268435456 |                                         0
       536870912 |                                         0
      1073741824 |                                         0
      2147483648 |                                         1
      4294967296 |                                         0

        8
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |                                         14
            2048 |@@@@@                                    1271
            4096 |@@@@@@@@@@@@@                            3326
            8192 |@                                        253
           16384 |@@@                                      853
           32768 |@@@@@@                                   1414
           65536 |@@@@@@@                                  1755
          131072 |@                                        374
          262144 |                                         127
          524288 |@@@                                      657
         1048576 |@                                        164
         2097152 |                                         25
         4194304 |                                         19
         8388608 |                                         11
        16777216 |                                         4
        33554432 |                                         1
        67108864 |                                         5
       134217728 |                                         1
       268435456 |                                         1
       536870912 |                                         0

        9
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |                                         8
            2048 |@@@@@@@@@                                2014
            4096 |@@@@@@@@@                                2115
            8192 |@                                        300
           16384 |@@@                                      615
           32768 |@@@@@                                    1201
           65536 |@@@@@@@                                  1677
          131072 |@@                                       352
          262144 |@                                        118
          524288 |@@@                                      615
         1048576 |@                                        241
         2097152 |                                         29
         4194304 |                                         17
         8388608 |                                         7
        16777216 |                                         2
        33554432 |                                         3
        67108864 |                                         1
       134217728 |                                         0
       268435456 |                                         0
       536870912 |                                         0
      1073741824 |                                         1
      2147483648 |                                         0

Compare the plots for threads 1, 4, 5, 6, 7, 8. Let's take thread 7
for example:

bad:

          131072 |@@                                       513
          262144 |@                                        221
          524288 |@                                        128
         1048576 |                                         6
         2097152 |                                         0
         4194304 |                                         2
         8388608 |@@@@@                                    1246
        16777216 |@                                        285
        33554432 |                                         28

good:

          131072 |@@@@                                     160
          262144 |                                         22
          524288 |@@@@@@                                   275
         1048576 |@                                        50
         2097152 |                                         2
         4194304 |                                         3
         8388608 |                                         3
        16777216 |                                         4
        33554432 |                                         3

In general, the statistics for time spent parked match the low numbers,
but the bad machine spends a second "cluster" of time, centered around
10ms wait time. I don't know what that 10ms is yet.

-- 
Aram Hăvărneanu

Sign in to reply to this message.

rsc

Please hg sync to pick up CL 86620043 and then run with GODEBUG=gctrace=1.

11 years, 2 months ago (2014-04-10 18:35:11 UTC) #14

aram2

> Please hg sync to pick up CL 86620043 and then run with > GODEBUG=gctrace=1. ...

11 years, 2 months ago (2014-04-10 19:11:27 UTC) #15

rsc

grep hires /etc/system on both machines. The delay is being caused by the two or ...

11 years, 2 months ago (2014-04-11 03:56:52 UTC) #16

aram2

On Fri, Apr 11, 2014 at 5:56 AM, Russ Cox <rsc@golang.org> wrote: > grep hires ...

11 years, 2 months ago (2014-04-11 09:53:50 UTC) #17

aram2

I filled issue 7763 so we have a record of this. Unfortunately, using HPETs on ...

11 years, 2 months ago (2014-04-11 11:25:36 UTC) #18

aram2

Ok, so I replaced the sleeps with yields, and problem is solved: http://paste.ubuntu.com/7234785/ -- Aram ...

11 years, 2 months ago (2014-04-11 12:54:34 UTC) #19

aram2

Also, I decided agains the optional HPET thing, it's complicated and involves signals. I'll pass. ...

11 years, 2 months ago (2014-04-11 12:58:03 UTC) #20

rsc

On Fri, Apr 11, 2014 at 7:25 AM, Aram Hăvărneanu <aram.h@mgk.ro> wrote: > I filled ...

11 years, 2 months ago (2014-04-11 13:24:57 UTC) #21

aram2

11 years, 2 months ago (2014-04-11 13:35:13 UTC) #22

On Fri, Apr 11, 2014 at 3:24 PM, Russ Cox <rsc@golang.org> wrote:
> It's fine but after the release freeze is over.

No, it's ridiculously complicated, I'll pass.

> As for your bad-vs-good machines, I'm basically certain that the bad machine
> is using 100 Hz timers and the good machine is using 1000 Hz timers

Yes, you're right, the good machines ticks at 1kHz, I should have
measured before.

-- 
Aram Hăvărneanu

Sign in to reply to this message.

Issue 84790044: code review 84790044: runtime: no longer skip stack growth test in short mode (Closed)

Description

Patch Set 1 #

Patch Set 2 : diff -r 98e63c935796 https://code.google.com/p/go #

Patch Set 3 : diff -r cd08410c17bf https://code.google.com/p/go #

Messages