Issue 108700045: code review 108700045: runtime: implement monotonic clocks on windows

brainman

Hello golang-codereviews@googlegroups.com, I'd like you to review this change to https://go.googlecode.com/hg/

10 years, 11 months ago (2014-07-16 05:50:31 UTC) #1

brainman

Sorry, Dmitriy, but GetTickCount works for me. I will use it for now. But I ...

10 years, 11 months ago (2014-07-16 05:51:37 UTC) #2

dvyukov

https://codereview.appspot.com/108700045/diff/40001/src/pkg/runtime/os_windows.c File src/pkg/runtime/os_windows.c (right): https://codereview.appspot.com/108700045/diff/40001/src/pkg/runtime/os_windows.c#newcode274 src/pkg/runtime/os_windows.c:274: static uint64 tclow, tchigh; you declare it as uint64, ...

10 years, 11 months ago (2014-07-16 08:12:11 UTC) #3

https://codereview.appspot.com/108700045/diff/40001/src/pkg/runtime/os_windows.c
File src/pkg/runtime/os_windows.c (right):

https://codereview.appspot.com/108700045/diff/40001/src/pkg/runtime/os_window...
src/pkg/runtime/os_windows.c:274: static uint64 tclow, tchigh;
you declare it as uint64, but then use only low 32-bits and always shift
I would increment it by (1<<32) and then just directly add

https://codereview.appspot.com/108700045/diff/40001/src/pkg/runtime/os_window...
src/pkg/runtime/os_windows.c:279: tchigh++;
hello, data race
after fifty days, several threads can increment tchigh by fifty days

https://codereview.appspot.com/108700045/diff/40001/src/pkg/runtime/os_window...
src/pkg/runtime/os_windows.c:280: tclow = c;
you. absolutely. must. not. write. to. a shared. variable. on fast. path.
this kills performance

you need something along the lines of:

static Lock lock;
static uint64 high;
static uint32 phase;
uint32 t;

t = GetTickCount();
if((t>>31) != phase) {
  runtime.lock(&lock);
  if((t>>31) != phase) {
    if(phase == 1)
      high += 1ull<<32;
    phase = !phase;
  }
  runtime.unlock(&lock);
}
return high + t;

But even that is incorrect, because a thread can get t=1<<32-1, pass the phase
check; then another thread gets t=0 and increments high; then the first thread
reads already incremented high and returns value that is off by 50 days.

static Lock lock;
static uint32 high; // high bit is phase, 32 bits so that atomic load is
inexpensive
uint32 t, h;

t = GetTickCount();
h = runtime.atomicload(&high);
if((t^h)>>31) != 0) {
  runtime.lock(&lock);
  h = runtime.atomicload(&high);
  if((t^h)>>31) != 0) {
    if((h>>31) == 1)
      h += 1;
    h ^= 1<<31; // switch phase
    runtime.atomicstore(&high, h);
  }
  runtime.unlock(&lock);
}
return ((uint64)(h&~(1<<31)))<<32 + t;

This is incorrect as well. Because a slow thread can do double phase switch.
I think the following should do:

static Lock lock;
static uint32 high; // high bit is phase, 32 bits so that atomic load is
inexpensive
uint32 t, h;

t = GetTickCount();
h = runtime.atomicload(&high);
if((t^h)>>31) != 0) {  // phase check
  runtime.lock(&lock);
  t = GetTickCount();  // reload so that a slow thread don't do double phase
switch
  h = runtime.atomicload(&high);
  if((t^h)>>31) != 0) {
    if((h>>31) == 1)
      h += 1;
    h ^= 1<<31; // switch phase
    runtime.atomicstore(&high, h);
  }
  runtime.unlock(&lock);
}
return ((uint64)(h&~(1<<31)))<<32 + t;

Sign in to reply to this message.

dvyukov

On 2014/07/16 05:51:37, brainman wrote: > Sorry, Dmitriy, but GetTickCount works for me. I will ...

10 years, 11 months ago (2014-07-16 08:13:46 UTC) #4

brainman

Hello golang-codereviews@googlegroups.com, dvyukov@google.com (cc: golang-codereviews@googlegroups.com), Please take another look.

10 years, 11 months ago (2014-07-17 06:23:44 UTC) #5

brainman

On 2014/07/16 08:12:11, dvyukov wrote: > hello, data race > after fifty days, several threads ...

10 years, 11 months ago (2014-07-17 06:24:11 UTC) #6

On 2014/07/16 08:12:11, dvyukov wrote:
> hello, data race
> after fifty days, several threads can increment tchigh by fifty days
> ...

My fault. I was going to do something about it, but I have forgotten. Used your
code, but, please, check - I had to add some braces. I also concern about
performance (time package benchmarks). windows/386 is OK:

benchmark                          old ns/op       new ns/op       delta
BenchmarkAfterFunc                 11718750        11875000        +1.33%
BenchmarkAfter                     15625000000     15625000000     +0.00%
BenchmarkStop                      664062          679687          +2.35%
BenchmarkSimultaneousAfterFunc     835937          851562          +1.87%
BenchmarkStartStop                 368750          375000          +1.69%
BenchmarkTicker                    15625000        15625000        +0.00%
BenchmarkNow                       83.6            85.9            +2.75%
BenchmarkNowUnixNano               85.2            89.1            +4.58%
BenchmarkFormat                    773             757             -2.07%
BenchmarkFormatNow                 765             757             -1.05%
BenchmarkParse                     671             679             +1.19%
BenchmarkHour                      118             118             +0.00%
BenchmarkSecond                    116             115             -0.86%
BenchmarkYear                      207             203             -1.93%
BenchmarkDay                       214             206             -3.74%

But windows/amd64 has some significant slowdowns:

benchmark                          old ns/op      new ns/op       delta
BenchmarkAfterFunc                 5802192        5856515         +0.94%
BenchmarkAfter                     1968252000     15620631000     +693.63%
BenchmarkStop                      660316         648173          -1.84%
BenchmarkSimultaneousAfterFunc     859584         830782          -3.35%
BenchmarkStartStop                 326641         329376          +0.84%
BenchmarkTicker                    1954576        15619032        +699.10%
BenchmarkNow                       60.9           57.5            -5.58%
BenchmarkNowUnixNano               63.5           59.6            -6.14%
BenchmarkFormat                    576            585             +1.56%
BenchmarkFormatNow                 585            575             -1.71%
BenchmarkParse                     560            556             -0.71%
BenchmarkHour                      78.0           75.9            -2.69%
BenchmarkSecond                    74.4           72.9            -2.02%
BenchmarkYear                      103            99.5            -3.40%
BenchmarkDay                       114            113             -0.88%

I don't understand why. My windows/amd64 is running virtual, but still. Not sure
what to do about that slowness.

Alex

Sign in to reply to this message.

dvyukov

the code looks good to me to test it, I would (1) add 1<<31-1000 to ...

10 years, 11 months ago (2014-07-17 17:30:14 UTC) #7

brainman

On 2014/07/17 17:30:14, dvyukov wrote: > ... Thank you Dmitry. I will try it. Alex

10 years, 11 months ago (2014-07-18 07:53:43 UTC) #8

brainman

On 2014/07/17 17:30:14, dvyukov wrote: > > as for the slowdown, both tests set timers ...

10 years, 11 months ago (2014-07-22 05:13:23 UTC) #9

On 2014/07/17 17:30:14, dvyukov wrote:
> 
> as for the slowdown, both tests set timers for 1 ns. I see 2 possibilities:
(1)
> currently the timer thread never sleeps in this test; but with this change it
> tries to sleep for 1 ns, but OS actually suspends it to 1ms; ...

That is the case. addtimer starts timerproc on new thread. In old version (using
GetSystemTimeAsFileTime) time would change by the time timerproc starts, so it
didn't need to wait for 1 ns. New version (using GetTickCount) time is changing
slower (every 10-16 ms), so timeproc see same time as addtime does, so it
decides to wait. Minimum wait is around 15 ms on that system, so
time.NewTicker(1) actually starts ticking after 15 ms, while old version worked
instantaneously (no wait involved).

It seems GetTickCount is too grainy - it does not change often enough. I looked
at GetTickCount64, and it is only slightly better. I tried to use
QueryPerformanceCounter and QueryPerformanceFrequency (just a rough initial
version), and these look better then everything else:

benchmark                          old ns/op      new ns/op     delta
BenchmarkAfterFunc                 5937728        9101912       +53.29%
BenchmarkAfter                     1990310800     10024799      -99.50%
BenchmarkStop                      643091         1540098       +139.48%
BenchmarkSimultaneousAfterFunc     832063         1679752       +101.88%
BenchmarkStartStop                 321887         1117230       +247.09%
BenchmarkTicker                    1953200        1952223       -0.05%
BenchmarkNow                       58.4           53.3          -8.73%
BenchmarkNowUnixNano               58.3           50.2          -13.89%
BenchmarkFormat                    489            499           +2.04%
BenchmarkFormatNow                 495            500           +1.01%
BenchmarkParse                     474            468           -1.27%
BenchmarkHour                      63.7           66.8          +4.87%
BenchmarkSecond                    61.0           63.8          +4.59%
BenchmarkYear                      83.5           86.7          +3.83%
BenchmarkDay                       94.6           97.2          +2.75%

still there are some slow downs that I don't understand, but not as large as
previous version. Maybe we should use QPC/QPF for amd64 and GetTickCount for
386. What do you think? I still have not tried your suggestion of just reading
the clock directly ...

Alex

Sign in to reply to this message.

dvyukov

On 2014/07/22 05:13:23, brainman wrote: > On 2014/07/17 17:30:14, dvyukov wrote: > > > > ...

10 years, 11 months ago (2014-07-22 07:51:46 UTC) #10

On 2014/07/22 05:13:23, brainman wrote:
> On 2014/07/17 17:30:14, dvyukov wrote:
> > 
> > as for the slowdown, both tests set timers for 1 ns. I see 2 possibilities:
> (1)
> > currently the timer thread never sleeps in this test; but with this change
it
> > tries to sleep for 1 ns, but OS actually suspends it to 1ms; ...
> 
> That is the case. addtimer starts timerproc on new thread. In old version
(using
> GetSystemTimeAsFileTime) time would change by the time timerproc starts, so it
> didn't need to wait for 1 ns. New version (using GetTickCount) time is
changing
> slower (every 10-16 ms), so timeproc see same time as addtime does, so it
> decides to wait. Minimum wait is around 15 ms on that system, so
> time.NewTicker(1) actually starts ticking after 15 ms, while old version
worked
> instantaneously (no wait involved).
> 
> It seems GetTickCount is too grainy - it does not change often enough. I
looked
> at GetTickCount64, and it is only slightly better. I tried to use
> QueryPerformanceCounter and QueryPerformanceFrequency (just a rough initial
> version), and these look better then everything else:
> 
> benchmark                          old ns/op      new ns/op     delta
> BenchmarkAfterFunc                 5937728        9101912       +53.29%
> BenchmarkAfter                     1990310800     10024799      -99.50%
> BenchmarkStop                      643091         1540098       +139.48%
> BenchmarkSimultaneousAfterFunc     832063         1679752       +101.88%
> BenchmarkStartStop                 321887         1117230       +247.09%
> BenchmarkTicker                    1953200        1952223       -0.05%
> BenchmarkNow                       58.4           53.3          -8.73%
> BenchmarkNowUnixNano               58.3           50.2          -13.89%
> BenchmarkFormat                    489            499           +2.04%
> BenchmarkFormatNow                 495            500           +1.01%
> BenchmarkParse                     474            468           -1.27%
> BenchmarkHour                      63.7           66.8          +4.87%
> BenchmarkSecond                    61.0           63.8          +4.59%
> BenchmarkYear                      83.5           86.7          +3.83%
> BenchmarkDay                       94.6           97.2          +2.75%
> 
> still there are some slow downs that I don't understand, but not as large as
> previous version. Maybe we should use QPC/QPF for amd64 and GetTickCount for
> 386. What do you think? I still have not tried your suggestion of just reading
> the clock directly ...


Humm...  I am sure that I saw 1ms increments with GetTickCount (after we bump
system timer precision). Basically I run this:

package main

import (
	"fmt"
	"time"
	"unsafe"
)

func main() {
	last := time.Now()
	last1 := *((*time.Duration)(unsafe.Pointer(uintptr(0x7FFE0008)))) * 100
	for i := 0; i < 10000; i++ {
		now := time.Now()
		now1 := *((*time.Duration)(unsafe.Pointer(uintptr(0x7FFE0008)))) * 100
		fmt.Printf("%v %v %v %v %v\n", now.Sub(last)-(now1-last1), now.Sub(last),
now1-last1, now, now1)
		last = now
		last1 = now1
	}
}

and saw that time.Now() and the raw reads produce exactly the same values.

Maybe you just have an old OS, where GetTickCount is buggy?

Hummm. DO you have any ideas as to why QPC is slower on some benchmarks?
BenchmarkNow is runs faster, and QPC has enough prevision, so... why some
benchmarks become slower?...
In either case I don't think that these benchmarks are very representative
(especially tons of successive 1ns sleeps).

Sign in to reply to this message.

brainman

On 2014/07/22 07:51:46, dvyukov wrote: > >... I am sure that I saw 1ms increments ...

10 years, 11 months ago (2014-07-22 09:29:26 UTC) #11

dvyukov

On 2014/07/22 09:29:26, brainman wrote: > On 2014/07/22 07:51:46, dvyukov wrote: > > > >... ...

10 years, 11 months ago (2014-07-22 09:39:37 UTC) #12

On 2014/07/22 09:29:26, brainman wrote:
> On 2014/07/22 07:51:46, dvyukov wrote:
> > 
> >...  I am sure that I saw 1ms increments with GetTickCount (after we bump
> > system timer precision). …
> 
> It, probably, is 1ms. But it is much longer than what we have now, and whole
> logic changes because of it. Right now we just return without wait, like I
> explained above, but after using GetTickCount we end up waiting on semaphore.
> timerproc thread starts quickly enough (less than 1ms, I guess), so it sees
the
> same runtime.nanotime value as timer that is been added. So it all goes into
> wait for 1ns, but end up waiting around 15ms (we wait on windows event - I
don’t
> think it waits for less than 15ms). QPC and reading 0x7FFE0008 is similar to
the
> current approach, because they both change after about 100ns (much faster than
> GetTickCount).

Now I am puzzled about difference in behavior between GetTickCount and
0x7FFE0008. GetTickCount just reads that word...
Maybe there are two variables: one is at 0x7FFE0008 and is updated every 100ns,
and another is at 0x7FFE0004 and is updated every 1ms...


> I am bit suspicious of reading 0x7FFE0008 approach. Will it work everywhere?
> What about multi CPUs? It is not documented.

Well, yes, these are all valid questions.
All I can say is that I used 0x7FFE0008 like 10 years ago on XP.



> QPC, on the other hand, is very documented. And Microsoft recommends it above
> all. But it does not work on my XP
> https://code.google.com/p/go/issues/detail?id=6007#c36. Perhaps we should use
> QPC on amd64, and GetTickCount on 386?
> 
> > Hummm. DO you have any ideas as to why QPC is slower on some benchmarks?
> 
> No, I don’t. Should I investigate?
> 
> > BenchmarkNow is runs faster, and QPC has enough prevision, so... why some
> > benchmarks become slower?...
> 
> I don’t understand your question. Try again.


I mean that (1) QPC is faster than what we have now (BenchmarkNow measures
that), (2) QPC has high precision (at least 100ns, or maybe higher), so the huge
slowdown due to rescheduling that we observed with GetTickCount must not be
happening. Now the question is: What else is different between the current impl
and QPC what makes some of the benchmarks slower?
It's not very important question to answer, but it's just puzzling.


> > In either case I don't think that these benchmarks are very representative
> > (especially tons of successive 1ns sleeps).
> 
> I agree, for the most of these. But I am concerned that time.NewThread(1)
> becomes slow and expensive. Given that it is easily handled on unix, it is,
> probably, quite common in peoples code.
> 
> Alex

Sign in to reply to this message.

dvyukov

QPC with fallback to GetTickCount on XP looks fine to me. I hope there are ...

10 years, 11 months ago (2014-07-22 09:42:48 UTC) #13

patrick2

On 2014/07/22 09:39:37, dvyukov wrote: > I mean that (1) QPC is faster than what ...

10 years, 11 months ago (2014-07-22 09:51:59 UTC) #14

brainman

On 2014/07/22 09:39:37, dvyukov wrote: > > Now I am puzzled about difference in behavior ...

10 years, 11 months ago (2014-07-22 10:21:29 UTC) #15

aram2

On Tue, Jul 22, 2014 at 12:51 PM, <patrick@mezard.eu> wrote: > I have encountered *massive* ...

10 years, 11 months ago (2014-07-22 10:22:19 UTC) #16

brainman

On 2014/07/22 09:42:48, dvyukov wrote: > QPC with fallback to GetTickCount on XP looks fine ...

10 years, 11 months ago (2014-07-22 10:22:48 UTC) #17

brainman

On 2014/07/22 09:51:59, patrick2 wrote: > > Do the Windows hosts run on real hardware ...

10 years, 11 months ago (2014-07-22 10:23:31 UTC) #18

dvyukov

On Tue, Jul 22, 2014 at 2:21 PM, Aram Hăvărneanu <aram.h@mgk.ro> wrote: > On Tue, ...

10 years, 11 months ago (2014-07-22 10:29:01 UTC) #19

brainman

On 2014/07/22 10:29:01, dvyukov wrote: > > Alex, this can be a reason to not ...

10 years, 11 months ago (2014-07-22 11:04:59 UTC) #20

brainman

On 2014/07/22 07:51:46, dvyukov wrote: > > Humm... I am sure that I saw 1ms ...

10 years, 11 months ago (2014-07-23 02:27:35 UTC) #21

dvyukov

On 2014/07/23 02:27:35, brainman wrote: > On 2014/07/22 07:51:46, dvyukov wrote: > > > > ...

10 years, 11 months ago (2014-07-23 07:52:56 UTC) #22

On 2014/07/23 02:27:35, brainman wrote:
> On 2014/07/22 07:51:46, dvyukov wrote:
> > 
> > Humm...  I am sure that I saw 1ms increments with GetTickCount (after we
bump
> > system timer precision). Basically I run this:
> > 
> > package main
> > 
> > import (
> > 	"fmt"
> > 	"time"
> > 	"unsafe"
> > )
> > 
> > func main() {
> > 	last := time.Now()
> > 	last1 := *((*time.Duration)(unsafe.Pointer(uintptr(0x7FFE0008)))) * 100
> > 	for i := 0; i < 10000; i++ {
> > 		now := time.Now()
> > 		now1 := *((*time.Duration)(unsafe.Pointer(uintptr(0x7FFE0008)))) * 100
> > 		fmt.Printf("%v %v %v %v %v\n", now.Sub(last)-(now1-last1), now.Sub(last),
> > now1-last1, now, now1)
> > 		last = now
> > 		last1 = now1
> > 	}
> > }
> > 
> > and saw that time.Now() and the raw reads produce exactly the same values.
> > 
> 
> Just for the record. I measured time difference between values (when the value
> changes) returned from different Windows API in a loop like this. Both
> GetTickCount and GetTickCount64 change every 15-16 millisecond, while
> time.Now().UnixNano() (uses GetSystemTimeAsFileTime under covers) and "reading
> 0x7FFE0008" changes every 1 millisecond. This is all on windows/amd64.
> 
> I am not convinced that QueryPerformanceCounter is the answer either - it is
> probably too expensive / might take too long to run (comparing to all the
> above).
> 
> I am not sure what to do here.

Yeah...

I would go with 0x7FFE0008. It's known on the internet, so it's unlikely that it
will change (at least not in an existing OS version):
http://www.dcl.hpi.uni-potsdam.de/research/WRK/2007/08/getting-os-information...
http://uninformed.org/index.cgi?v=2&a=2&p=15
http://www.informit.com/articles/article.aspx?p=167857&seqNum=8
http://forum.sysinternals.com/0x7ffe0000-what-is-in-it_topic10012.html

This link also suggests that CRT depends on it:
https://github.com/sherpya/nloader/blob/master/README.md
I don't remember whether it's possible to link CRT statically in Visual Studio.
But if it is possible, then it means that existing binary applications compiled
with Visual Studio will also break, which is not going to happen.

It's definitely the fastest option. And it can also be the simplest option.

Sign in to reply to this message.

brainman

On 2014/07/23 07:52:56, dvyukov wrote: > > I would go with 0x7FFE0008. ... That works ...

10 years, 11 months ago (2014-07-24 07:43:44 UTC) #23

dvyukov

On 2014/07/24 07:43:44, brainman wrote: > On 2014/07/23 07:52:56, dvyukov wrote: > > > > ...

10 years, 11 months ago (2014-07-24 08:15:04 UTC) #24

On 2014/07/24 07:43:44, brainman wrote:
> On 2014/07/23 07:52:56, dvyukov wrote:
> > 
> > I would go with 0x7FFE0008. ...
> 
> That works very well on my windows 7 pc (windows/amd64) - thanks for the
links.
> But now I have problem with windows xp (windows/386). time.TestAfter (and
> time.Sleep) ends too early:
> 
> sleep_test.go:154: After(100ms) slept for only 93750000 ns
> 
> It appears, windows xp (unlike windows 7) have different time precisions:
> counter at 0x7FFE0008 changes every 1ms, while GetSystemTimeAsFileTime changes
> every 15-16ms. time.TestAfter uses both counters: it does time.Sleep (which
uses
> counter at 0x7FFE0008) and than measures that sleep with
> GetSystemTimeAsFileTime. Obviously that will not work - especially for small
> durations. For example sleeping for 43ms, then GetSystemTimeAsFileTime says
> we've slept for 31.25ms. It didn't increment by the next 15ms yet.
> 
> I don't see how we could resolve that unless we relax some tests. Or perhaps I
> could use GetTickCount on windows/386. GetTickCount changes every 15ms on
> windows, so at least the 2 will be more in sync. Still ...
> 
> What do you think?

Oh, boy.

GetTickCount ticks every 15-16 ms. Now they question is: does
GetSystemTimeAsFileTime tick as 15-16 ms as well, or it ticks at 15.25ms? If
it's the latter, can't we get the same situation when GetSystemTimeAsFileTime is
slightly behind (e.g. 0.75 ms behind)? Or it's GetTickCount that is always
behind (if they round it down)?

Since we are already using _KUSER_SHARED_DATA, maybe it will be simple to use
_KUSER_SHARED_DATA.TickCountLow (at 0x7FFE0000) on XP. I think that it ticks at
the same 15-16 ms on XP.

Another idea: can't we just round off the 0x7FFE0008 value on XP, so that it as
if ticks at the same 15-16ms? But I suspect that it can be tricky to round it to
exactly match GetSystemTimeAsFileTime...

Another observation: the test is actually incorrect, because it expects that
system time does not change. When you sleep for 1 second, system time can be any
random value after the sleep.

I don't have a good answer for you. But I would prefer good and fast code for
newer OSes, and a workaround for the XP bug.

Sign in to reply to this message.

dvyukov

On 2014/07/24 08:15:04, dvyukov wrote: > On 2014/07/24 07:43:44, brainman wrote: > > On 2014/07/23 ...

10 years, 11 months ago (2014-07-24 08:18:54 UTC) #25

On 2014/07/24 08:15:04, dvyukov wrote:
> On 2014/07/24 07:43:44, brainman wrote:
> > On 2014/07/23 07:52:56, dvyukov wrote:
> > > 
> > > I would go with 0x7FFE0008. ...
> > 
> > That works very well on my windows 7 pc (windows/amd64) - thanks for the
> links.
> > But now I have problem with windows xp (windows/386). time.TestAfter (and
> > time.Sleep) ends too early:
> > 
> > sleep_test.go:154: After(100ms) slept for only 93750000 ns
> > 
> > It appears, windows xp (unlike windows 7) have different time precisions:
> > counter at 0x7FFE0008 changes every 1ms, while GetSystemTimeAsFileTime
changes
> > every 15-16ms. time.TestAfter uses both counters: it does time.Sleep (which
> uses
> > counter at 0x7FFE0008) and than measures that sleep with
> > GetSystemTimeAsFileTime. Obviously that will not work - especially for small
> > durations. For example sleeping for 43ms, then GetSystemTimeAsFileTime says
> > we've slept for 31.25ms. It didn't increment by the next 15ms yet.
> > 
> > I don't see how we could resolve that unless we relax some tests. Or perhaps
I
> > could use GetTickCount on windows/386. GetTickCount changes every 15ms on
> > windows, so at least the 2 will be more in sync. Still ...
> > 
> > What do you think?
> 
> 
> Oh, boy.
> 
> GetTickCount ticks every 15-16 ms. Now they question is: does
> GetSystemTimeAsFileTime tick as 15-16 ms as well, or it ticks at 15.25ms? If
> it's the latter, can't we get the same situation when GetSystemTimeAsFileTime
is
> slightly behind (e.g. 0.75 ms behind)? Or it's GetTickCount that is always
> behind (if they round it down)?
> 
> Since we are already using _KUSER_SHARED_DATA, maybe it will be simple to use
> _KUSER_SHARED_DATA.TickCountLow (at 0x7FFE0000) on XP. I think that it ticks
at
> the same 15-16 ms on XP.
> 
> Another idea: can't we just round off the 0x7FFE0008 value on XP, so that it
as
> if ticks at the same 15-16ms? But I suspect that it can be tricky to round it
to
> exactly match GetSystemTimeAsFileTime...
> 
> Another observation: the test is actually incorrect, because it expects that
> system time does not change. When you sleep for 1 second, system time can be
any
> random value after the sleep.
> 
> I don't have a good answer for you. But I would prefer good and fast code for
> newer OSes, and a workaround for the XP bug.


Well, another possibility is to manually bump GetSystemTimeAsFileTime precision
on XP along the lines of:
When we query GetSystemTimeAsFileTime also query 0x7FFE0008 and remember these
values. When we next time query GetSystemTimeAsFileTime, if it returns the same
value as before, then query 0x7FFE0008, calculate the diff and add to
GetSystemTimeAsFileTime value. This way GetSystemTimeAsFileTime will tick at 1
ms at XP.
I don't know how complex a reliable monotonic solution will be.

Sign in to reply to this message.

brainman

I tested all possible candidate APIs again. This program: package main import ( "fmt" "runtime" ...

10 years, 11 months ago (2014-07-25 04:55:35 UTC) #26

I tested all possible candidate APIs again. This program:

package main

import (
	"fmt"
	"runtime"
	"syscall"
	"unsafe"
)

var (
	modkernel32        = syscall.NewLazyDLL("kernel32.dll")
	procGetTickCount64 = modkernel32.NewProc("GetTickCount64")
	procGetTickCount   = modkernel32.NewProc("GetTickCount")
)

func GetTickCount64() uint64 {
	r, _, _ := syscall.Syscall(procGetTickCount64.Addr(), 0, 0, 0, 0)
	return uint64(r)
}

func GetTickCount() uint32 {
	r, _, _ := syscall.Syscall(procGetTickCount.Addr(), 0, 0, 0, 0)
	return uint32(r)
}

func getSystemTimeAsFileTime() uint64 {
	var ft syscall.Filetime
	syscall.GetSystemTimeAsFileTime(&ft)
	return uint64(ft.Nanoseconds())
}

func getTickCount() uint64 {
	return uint64(GetTickCount()) * 1000000
}

func getTickCount64() uint64 {
	return GetTickCount64() * 1000000
}

type KSYSTEM_TIME struct {
	LowPart   uint32
	High1Time int32
	High2Time int32
}

var (
	// KUSER_SHARED_DATA
	interruptTimeAddr = (*KSYSTEM_TIME)(unsafe.Pointer(uintptr(0x7FFE0008)))
	systemTimeAddr    = (*KSYSTEM_TIME)(unsafe.Pointer(uintptr(0x7FFE0014)))
)

func readUSD(addr *KSYSTEM_TIME) uint64 {
	var t KSYSTEM_TIME
	for i := 0; ; i++ {
		t.High1Time = addr.High1Time
		t.LowPart = addr.LowPart
		t.High2Time = addr.High2Time
		if t.High1Time == t.High2Time {
			break
		}
		if i > 100 {
			panic("should have advanced by now")
		}
	}
	return *((*uint64)(unsafe.Pointer(&t))) * 100
}

func readInterruptTime() uint64 {
	return readUSD(interruptTimeAddr)
}

func readSystemTime() uint64 {
	return readUSD(systemTimeAddr)
}

func getprec(getticks func() uint64) []uint64 {
	r := make([]uint64, 0)
	c := getticks()
	for i := 0; i < 5; {
		c2 := getticks()
		if c != c2 {
			r = append(r, c2-c)
			i++
		}
	}
	return r
}

func main() {
	fmt.Printf("getSystemTimeAsFileTime: %v\n", getprec(getSystemTimeAsFileTime))
	fmt.Printf("getTickCount           : %v\n", getprec(getTickCount))
	if runtime.GOARCH == "amd64" {
		fmt.Printf("getTickCount64         : %v\n", getprec(getTickCount64))
	}
	fmt.Printf("readInterruptTime      : %v\n", getprec(readInterruptTime))
	fmt.Printf("readSystemTime         : %v\n", getprec(readSystemTime))
}

prints

C:\>u:\test
getSystemTimeAsFileTime: [15625000 15625000 15625000 15625000 15625000]
getTickCount           : [15000000 15000000 15000000 15000000 15000000]
readInterruptTime      : [976600 976600 976600 976600 976600]
readSystemTime         : [15625000 15625000 15625000 15625000 15625000]

on windows xp, and

C:\>u:\test
getSystemTimeAsFileTime: [977000 977000 977000 977000 977000]
getTickCount           : [16000000 16000000 16000000 16000000 16000000]
getTickCount64         : [15000000 15000000 15000000 15000000 15000000]
readInterruptTime      : [976500 976500 976500 976500 976500]
readSystemTime         : [977000 977000 977000 977000 977000]

on windows 7.

I still haven't made my mind about what to use here, so any suggestions are
welcome.

Alex

Sign in to reply to this message.

dvyukov

Do you know if timeBeginPeriod requires admin rights? If yes, then the numbers can be ...

10 years, 11 months ago (2014-07-25 07:02:40 UTC) #27

Do you know if timeBeginPeriod requires admin rights? If yes, then the
numbers can be different for program running w/o admin rights...


On Fri, Jul 25, 2014 at 8:55 AM,  <alex.brainman@gmail.com> wrote:
> I tested all possible candidate APIs again. This program:
>
>
> package main
>
> import (
>         "fmt"
>         "runtime"
>         "syscall"
>         "unsafe"
> )
>
> var (
>         modkernel32        = syscall.NewLazyDLL("kernel32.dll")
>         procGetTickCount64 = modkernel32.NewProc("GetTickCount64")
>         procGetTickCount   = modkernel32.NewProc("GetTickCount")
> )
>
> func GetTickCount64() uint64 {
>         r, _, _ := syscall.Syscall(procGetTickCount64.Addr(), 0, 0, 0, 0)
>         return uint64(r)
> }
>
> func GetTickCount() uint32 {
>         r, _, _ := syscall.Syscall(procGetTickCount.Addr(), 0, 0, 0, 0)
>         return uint32(r)
> }
>
> func getSystemTimeAsFileTime() uint64 {
>         var ft syscall.Filetime
>         syscall.GetSystemTimeAsFileTime(&ft)
>         return uint64(ft.Nanoseconds())
> }
>
> func getTickCount() uint64 {
>         return uint64(GetTickCount()) * 1000000
> }
>
> func getTickCount64() uint64 {
>         return GetTickCount64() * 1000000
> }
>
> type KSYSTEM_TIME struct {
>         LowPart   uint32
>         High1Time int32
>         High2Time int32
> }
>
> var (
>         // KUSER_SHARED_DATA
>         interruptTimeAddr =
> (*KSYSTEM_TIME)(unsafe.Pointer(uintptr(0x7FFE0008)))
>         systemTimeAddr    =
> (*KSYSTEM_TIME)(unsafe.Pointer(uintptr(0x7FFE0014)))
> )
>
> func readUSD(addr *KSYSTEM_TIME) uint64 {
>         var t KSYSTEM_TIME
>         for i := 0; ; i++ {
>                 t.High1Time = addr.High1Time
>                 t.LowPart = addr.LowPart
>                 t.High2Time = addr.High2Time
>                 if t.High1Time == t.High2Time {
>                         break
>                 }
>                 if i > 100 {
>                         panic("should have advanced by now")
>                 }
>         }
>         return *((*uint64)(unsafe.Pointer(&t))) * 100
> }
>
> func readInterruptTime() uint64 {
>         return readUSD(interruptTimeAddr)
> }
>
> func readSystemTime() uint64 {
>         return readUSD(systemTimeAddr)
> }
>
> func getprec(getticks func() uint64) []uint64 {
>         r := make([]uint64, 0)
>         c := getticks()
>         for i := 0; i < 5; {
>                 c2 := getticks()
>                 if c != c2 {
>                         r = append(r, c2-c)
>                         i++
>                 }
>         }
>         return r
> }
>
> func main() {
>         fmt.Printf("getSystemTimeAsFileTime: %v\n",
> getprec(getSystemTimeAsFileTime))
>         fmt.Printf("getTickCount           : %v\n", getprec(getTickCount))
>         if runtime.GOARCH == "amd64" {
>                 fmt.Printf("getTickCount64         : %v\n",
> getprec(getTickCount64))
>         }
>         fmt.Printf("readInterruptTime      : %v\n",
> getprec(readInterruptTime))
>         fmt.Printf("readSystemTime         : %v\n", getprec(readSystemTime))
> }
>
> prints
>
> C:\>u:\test
> getSystemTimeAsFileTime: [15625000 15625000 15625000 15625000 15625000]
> getTickCount           : [15000000 15000000 15000000 15000000 15000000]
> readInterruptTime      : [976600 976600 976600 976600 976600]
> readSystemTime         : [15625000 15625000 15625000 15625000 15625000]
>
> on windows xp, and
>
> C:\>u:\test
> getSystemTimeAsFileTime: [977000 977000 977000 977000 977000]
> getTickCount           : [16000000 16000000 16000000 16000000 16000000]
> getTickCount64         : [15000000 15000000 15000000 15000000 15000000]
> readInterruptTime      : [976500 976500 976500 976500 976500]
> readSystemTime         : [977000 977000 977000 977000 977000]
>
> on windows 7.
>
> I still haven't made my mind about what to use here, so any suggestions
> are welcome.
>
> Alex
>
>
> https://codereview.appspot.com/108700045/
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-codereviews" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-codereviews+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Sign in to reply to this message.

brainman

On 2014/07/25 07:02:40, dvyukov wrote: > Do you know if timeBeginPeriod requires admin rights? ...

10 years, 11 months ago (2014-07-25 07:13:03 UTC) #28

dvyukov

This is getting insane. I think just need to relaxed the sleep test on windows ...

10 years, 11 months ago (2014-07-25 07:42:39 UTC) #29

brainman

On 2014/07/25 07:42:39, dvyukov wrote: > > Btw, that SystemTime at 0x7FFE0014 is the same ...

10 years, 11 months ago (2014-07-25 07:45:33 UTC) #30

brainman

Hello golang-codereviews@googlegroups.com, dvyukov@google.com, patrick@mezard.eu, aram.h@mgk.ro (cc: golang-codereviews@googlegroups.com), Please take another look.

10 years, 11 months ago (2014-07-31 01:01:41 UTC) #31

brainman

Dmitry, Finally I am happy with result (as good as we can make it). It ...

10 years, 11 months ago (2014-07-31 01:02:07 UTC) #32

Dmitry,

Finally I am happy with result (as good as we can make it). It is simple and
fast. And benchmarks

windows/386

C:\go\root\src\pkg\time>benchcmp.exe old new
benchmark                          old ns/op       new ns/op      delta
BenchmarkAfterFunc                 12187656        12031404       -1.28%
BenchmarkAfter                     15625200000     1953150000     -87.50%
BenchmarkStop                      968762          937512         -3.23%
BenchmarkSimultaneousAfterFunc     968762          945324         -2.42%
BenchmarkStartStop                 478131          443755         -7.19%
BenchmarkTicker                    15625200        1953150        -87.50%
BenchmarkNow                       81.3            48.4           -40.47%
BenchmarkNowUnixNano               83.6            52.6           -37.08%
BenchmarkFormat                    820             835            +1.83%
BenchmarkFormatNow                 828             835            +0.85%
BenchmarkParse                     664             664            +0.00%
BenchmarkHour                      118             118            +0.00%
BenchmarkSecond                    115             115            +0.00%
BenchmarkYear                      200             210            +5.00%
BenchmarkDay                       204             214            +4.90%

and windows/amd64

C:\go\root\src\pkg\time>benchcmp.exe old new
benchmark                          old ns/op      new ns/op      delta
BenchmarkAfterFunc                 6087230        4948613        -18.71%
BenchmarkAfter                     1966268200     1966097100     -0.01%
BenchmarkStop                      745893         701270         -5.98%
BenchmarkSimultaneousAfterFunc     728807         719339         -1.30%
BenchmarkStartStop                 295818         268006         -9.40%
BenchmarkTicker                    1952600        1953400        +0.04%
BenchmarkNow                       50.3           33.5           -33.40%
BenchmarkNowUnixNano               50.9           34.2           -32.81%
BenchmarkFormat                    540            531            -1.67%
BenchmarkFormatNow                 533            535            +0.38%
BenchmarkParse                     474            488            +2.95%
BenchmarkHour                      64.5           66.3           +2.79%
BenchmarkSecond                    62.1           63.8           +2.74%
BenchmarkYear                      85.4           88.3           +3.40%
BenchmarkDay                       95.3           99.0           +3.88%

aren't bad. I have small concern. Reading of KSYSTEM_TIME fields in
runtime.systime suppose to happens in order. I hope that compiler or human not
reorder these. What do you think?

Alex

Sign in to reply to this message.

minux

this is much nicer! Thanks. https://codereview.appspot.com/108700045/diff/80001/src/pkg/runtime/os_windows.c File src/pkg/runtime/os_windows.c (right): https://codereview.appspot.com/108700045/diff/80001/src/pkg/runtime/os_windows.c#newcode274 src/pkg/runtime/os_windows.c:274: INTERRUPT_TIME = (KSYSTEM_TIME*)0x7ffe0008, let's ...

10 years, 11 months ago (2014-07-31 02:15:34 UTC) #33

brainman

https://codereview.appspot.com/108700045/diff/80001/src/pkg/runtime/os_windows.c File src/pkg/runtime/os_windows.c (right): https://codereview.appspot.com/108700045/diff/80001/src/pkg/runtime/os_windows.c#newcode274 src/pkg/runtime/os_windows.c:274: INTERRUPT_TIME = (KSYSTEM_TIME*)0x7ffe0008, On 2014/07/31 02:15:33, minux wrote: > ...

10 years, 11 months ago (2014-07-31 05:04:25 UTC) #34

brainman

Hello golang-codereviews@googlegroups.com, dvyukov@google.com, patrick@mezard.eu, aram.h@mgk.ro, minux@golang.org (cc: golang-codereviews@googlegroups.com), Please take another look.

10 years, 11 months ago (2014-07-31 05:04:40 UTC) #35

dvyukov

I like the runtime code too. LGTM if you agree with the fix https://codereview.appspot.com/108700045/diff/100001/src/pkg/runtime/os_windows.c File ...

10 years, 11 months ago (2014-07-31 08:17:02 UTC) #37

brainman

*** Submitted as https://code.google.com/p/go/source/detail?r=b673250d7b72 *** runtime: implement monotonic clocks on windows Update issue 6007. LGTM=minux, ...

10 years, 11 months ago (2014-08-01 01:18:21 UTC) #38

*** Submitted as https://code.google.com/p/go/source/detail?r=b673250d7b72 ***

runtime: implement monotonic clocks on windows

Update issue 6007.

LGTM=minux, dvyukov
R=golang-codereviews, dvyukov, patrick, aram.h, minux
CC=golang-codereviews
https://codereview.appspot.com/108700045

https://codereview.appspot.com/108700045/diff/100001/src/pkg/runtime/os_windo...
File src/pkg/runtime/os_windows.c (right):

https://codereview.appspot.com/108700045/diff/100001/src/pkg/runtime/os_windo...
src/pkg/runtime/os_windows.c:291: runtime·throw("interrupt/system time is
changing too fast");
On 2014/07/31 08:17:02, dvyukov wrote:
> this scares me
> 
> note that this algorithm is not lock-free for readers
> i.e. merely retrying the reader loop won't fix the inconsistency
> consider that the timer interrupt handler is preempted by an NMI when it has
> updated High2Time but not High1Time,  this routine can easily do 1000
iterations
> quickly and on all iterations observe inconsistent values and terminate the
> program
> 
> I would trust more the interrupt handler to do the correct thing. It does not
> matter if we burn more CPU and introduce additional delay before crashing the
> program on a buggy system.
> 
> I would do something along the lines of:
> 
> for(i = 1; i < 10000; i++) {
>   if((i%100) == 0)
>     runtime.osyield();
>   ...

Done. But I have moved if statement to the end of for loop body.

I don't want runtime.osyield to run every time we call runtime·systime. I
suspect more often then not we will find t.High1Time == t.High2Time on first
try, so why waste CPU in most common case.

https://codereview.appspot.com/108700045/diff/100001/src/pkg/time/sleep_test.go
File src/pkg/time/sleep_test.go (right):

https://codereview.appspot.com/108700045/diff/100001/src/pkg/time/sleep_test....
src/pkg/time/sleep_test.go:18: const windowsInaccuracy = 17 * Millisecond
On 2014/07/31 08:17:02, dvyukov wrote:
> please add a short comment with pointers here
> it needs to say that we are using different functions for time.Now and
sleeping,
> that can tick at different frequencies
> and it needs to spell the configuration where it is reproducible (e.g. 386 XP)
> so that it's not removed in future because nobody can reproduce the bahavior
> anymore

Done.

Sign in to reply to this message.

dvyukov

https://codereview.appspot.com/108700045/diff/100001/src/pkg/runtime/os_windows.c File src/pkg/runtime/os_windows.c (right): https://codereview.appspot.com/108700045/diff/100001/src/pkg/runtime/os_windows.c#newcode291 src/pkg/runtime/os_windows.c:291: runtime·throw("interrupt/system time is changing too fast"); On 2014/08/01 01:18:20, ...

10 years, 11 months ago (2014-08-05 12:59:11 UTC) #39

Message was sent while issue was closed.

https://codereview.appspot.com/108700045/diff/100001/src/pkg/runtime/os_windo...
File src/pkg/runtime/os_windows.c (right):

https://codereview.appspot.com/108700045/diff/100001/src/pkg/runtime/os_windo...
src/pkg/runtime/os_windows.c:291: runtime·throw("interrupt/system time is
changing too fast");
On 2014/08/01 01:18:20, brainman wrote:
> On 2014/07/31 08:17:02, dvyukov wrote:
> > this scares me
> > 
> > note that this algorithm is not lock-free for readers
> > i.e. merely retrying the reader loop won't fix the inconsistency
> > consider that the timer interrupt handler is preempted by an NMI when it has
> > updated High2Time but not High1Time,  this routine can easily do 1000
> iterations
> > quickly and on all iterations observe inconsistent values and terminate the
> > program
> > 
> > I would trust more the interrupt handler to do the correct thing. It does
not
> > matter if we burn more CPU and introduce additional delay before crashing
the
> > program on a buggy system.
> > 
> > I would do something along the lines of:
> > 
> > for(i = 1; i < 10000; i++) {
> >   if((i%100) == 0)
> >     runtime.osyield();
> >   ...
> 
> Done. But I have moved if statement to the end of for loop body.
> 
> I don't want runtime.osyield to run every time we call runtime·systime. I
> suspect more often then not we will find t.High1Time == t.High2Time on first
> try, so why waste CPU in most common case.


Note that I started loop counter with 1:

for(i = 1; i < 10000; i++) {
  if((i%100) == 0)
    runtime.osyield();
  ...


first osyield would have been called only on 99-th iteration.
With your code osyield is executed after the first failed attempt. I don't know
how important it is, but your code wastes more CPU in common case.

Sign in to reply to this message.

brainman

10 years, 11 months ago (2014-08-06 07:16:54 UTC) #40

Message was sent while issue was closed.

On 2014/08/05 12:59:11, dvyukov wrote:
> 
> first osyield would have been called only on 99-th iteration.
> With your code osyield is executed after the first failed attempt. ...

I knew I was missing something. https://codereview.appspot.com/117670043

Alex

Sign in to reply to this message.

Issue 108700045: code review 108700045: runtime: implement monotonic clocks on windows (Closed)

Description

Patch Set 1 #

Patch Set 2 : diff -r 395bf97d72a1 https://go.googlecode.com/hg/ #

Patch Set 3 : diff -r 395bf97d72a1 https://go.googlecode.com/hg/ #

Patch Set 4 : diff -r 8e05add24c7e https://go.googlecode.com/hg/ #

Patch Set 5 : diff -r b3d45fe62d73 https://go.googlecode.com/hg/ #

Patch Set 6 : diff -r 455d073a9dd7 https://go.googlecode.com/hg/ #

Patch Set 7 : diff -r d81fb87ee976 https://go.googlecode.com/hg/ #

Messages