|
|
Descriptionruntime: implement monotonic clocks on windows
Update issue 6007.
Patch Set 1 #Patch Set 2 : diff -r 395bf97d72a1 https://go.googlecode.com/hg/ #Patch Set 3 : diff -r 395bf97d72a1 https://go.googlecode.com/hg/ #
Total comments: 3
Patch Set 4 : diff -r 8e05add24c7e https://go.googlecode.com/hg/ #Patch Set 5 : diff -r b3d45fe62d73 https://go.googlecode.com/hg/ #
Total comments: 6
Patch Set 6 : diff -r 455d073a9dd7 https://go.googlecode.com/hg/ #
Total comments: 5
Patch Set 7 : diff -r d81fb87ee976 https://go.googlecode.com/hg/ #
MessagesTotal messages: 40
Hello golang-codereviews@googlegroups.com, I'd like you to review this change to https://go.googlecode.com/hg/
Sign in to reply to this message.
Sorry, Dmitriy, but GetTickCount works for me. I will use it for now. But I will try your approach too, later. Alex
Sign in to reply to this message.
https://codereview.appspot.com/108700045/diff/40001/src/pkg/runtime/os_windows.c File src/pkg/runtime/os_windows.c (right): https://codereview.appspot.com/108700045/diff/40001/src/pkg/runtime/os_window... src/pkg/runtime/os_windows.c:274: static uint64 tclow, tchigh; you declare it as uint64, but then use only low 32-bits and always shift I would increment it by (1<<32) and then just directly add https://codereview.appspot.com/108700045/diff/40001/src/pkg/runtime/os_window... src/pkg/runtime/os_windows.c:279: tchigh++; hello, data race after fifty days, several threads can increment tchigh by fifty days https://codereview.appspot.com/108700045/diff/40001/src/pkg/runtime/os_window... src/pkg/runtime/os_windows.c:280: tclow = c; you. absolutely. must. not. write. to. a shared. variable. on fast. path. this kills performance you need something along the lines of: static Lock lock; static uint64 high; static uint32 phase; uint32 t; t = GetTickCount(); if((t>>31) != phase) { runtime.lock(&lock); if((t>>31) != phase) { if(phase == 1) high += 1ull<<32; phase = !phase; } runtime.unlock(&lock); } return high + t; But even that is incorrect, because a thread can get t=1<<32-1, pass the phase check; then another thread gets t=0 and increments high; then the first thread reads already incremented high and returns value that is off by 50 days. static Lock lock; static uint32 high; // high bit is phase, 32 bits so that atomic load is inexpensive uint32 t, h; t = GetTickCount(); h = runtime.atomicload(&high); if((t^h)>>31) != 0) { runtime.lock(&lock); h = runtime.atomicload(&high); if((t^h)>>31) != 0) { if((h>>31) == 1) h += 1; h ^= 1<<31; // switch phase runtime.atomicstore(&high, h); } runtime.unlock(&lock); } return ((uint64)(h&~(1<<31)))<<32 + t; This is incorrect as well. Because a slow thread can do double phase switch. I think the following should do: static Lock lock; static uint32 high; // high bit is phase, 32 bits so that atomic load is inexpensive uint32 t, h; t = GetTickCount(); h = runtime.atomicload(&high); if((t^h)>>31) != 0) { // phase check runtime.lock(&lock); t = GetTickCount(); // reload so that a slow thread don't do double phase switch h = runtime.atomicload(&high); if((t^h)>>31) != 0) { if((h>>31) == 1) h += 1; h ^= 1<<31; // switch phase runtime.atomicstore(&high, h); } runtime.unlock(&lock); } return ((uint64)(h&~(1<<31)))<<32 + t;
Sign in to reply to this message.
On 2014/07/16 05:51:37, brainman wrote: > Sorry, Dmitriy, but GetTickCount works for me. I will use it for now. But I will > try your approach too, later. > > Alex Well, GetTickCount does exactly what I suggested. You just have some additional overheads for stdcall and GetTickCount code. But obviously it's more reliable, so I am fine with it.
Sign in to reply to this message.
Hello golang-codereviews@googlegroups.com, dvyukov@google.com (cc: golang-codereviews@googlegroups.com), Please take another look.
Sign in to reply to this message.
On 2014/07/16 08:12:11, dvyukov wrote: > hello, data race > after fifty days, several threads can increment tchigh by fifty days > ... My fault. I was going to do something about it, but I have forgotten. Used your code, but, please, check - I had to add some braces. I also concern about performance (time package benchmarks). windows/386 is OK: benchmark old ns/op new ns/op delta BenchmarkAfterFunc 11718750 11875000 +1.33% BenchmarkAfter 15625000000 15625000000 +0.00% BenchmarkStop 664062 679687 +2.35% BenchmarkSimultaneousAfterFunc 835937 851562 +1.87% BenchmarkStartStop 368750 375000 +1.69% BenchmarkTicker 15625000 15625000 +0.00% BenchmarkNow 83.6 85.9 +2.75% BenchmarkNowUnixNano 85.2 89.1 +4.58% BenchmarkFormat 773 757 -2.07% BenchmarkFormatNow 765 757 -1.05% BenchmarkParse 671 679 +1.19% BenchmarkHour 118 118 +0.00% BenchmarkSecond 116 115 -0.86% BenchmarkYear 207 203 -1.93% BenchmarkDay 214 206 -3.74% But windows/amd64 has some significant slowdowns: benchmark old ns/op new ns/op delta BenchmarkAfterFunc 5802192 5856515 +0.94% BenchmarkAfter 1968252000 15620631000 +693.63% BenchmarkStop 660316 648173 -1.84% BenchmarkSimultaneousAfterFunc 859584 830782 -3.35% BenchmarkStartStop 326641 329376 +0.84% BenchmarkTicker 1954576 15619032 +699.10% BenchmarkNow 60.9 57.5 -5.58% BenchmarkNowUnixNano 63.5 59.6 -6.14% BenchmarkFormat 576 585 +1.56% BenchmarkFormatNow 585 575 -1.71% BenchmarkParse 560 556 -0.71% BenchmarkHour 78.0 75.9 -2.69% BenchmarkSecond 74.4 72.9 -2.02% BenchmarkYear 103 99.5 -3.40% BenchmarkDay 114 113 -0.88% I don't understand why. My windows/amd64 is running virtual, but still. Not sure what to do about that slowness. Alex
Sign in to reply to this message.
the code looks good to me to test it, I would (1) add 1<<31-1000 to t, so that the first phase switch happens after 1 second, and then printf everything that happens in that function during switch, and ensure that it behaves as we think; (2) set high to 1<<31 and add -1000 to t, this will trigger the second phase switch after 1 second. as for the slowdown, both tests set timers for 1 ns. I see 2 possibilities: (1) currently the timer thread never sleeps in this test; but with this change it tries to sleep for 1 ns, but OS actually suspends it to 1ms; or (2) that weirdness that is described at the bottom of http://msdn.microsoft.com/en-us/library/windows/desktop/ms724397(v=vs.85).aspx i.e. we sleep till the time when the timer must fire, re-check runtime.now and conclude that it's not yet the time to execute the timer, sleep again, woken up again, re-check and conclude again that it's not yet the time to execute the timer, and so on; or (3) GetTickCount return different values on different CPUs (e.g. time on CPU 2 slightly lags behind time on CPU 1).
Sign in to reply to this message.
On 2014/07/17 17:30:14, dvyukov wrote: > ... Thank you Dmitry. I will try it. Alex
Sign in to reply to this message.
On 2014/07/17 17:30:14, dvyukov wrote: > > as for the slowdown, both tests set timers for 1 ns. I see 2 possibilities: (1) > currently the timer thread never sleeps in this test; but with this change it > tries to sleep for 1 ns, but OS actually suspends it to 1ms; ... That is the case. addtimer starts timerproc on new thread. In old version (using GetSystemTimeAsFileTime) time would change by the time timerproc starts, so it didn't need to wait for 1 ns. New version (using GetTickCount) time is changing slower (every 10-16 ms), so timeproc see same time as addtime does, so it decides to wait. Minimum wait is around 15 ms on that system, so time.NewTicker(1) actually starts ticking after 15 ms, while old version worked instantaneously (no wait involved). It seems GetTickCount is too grainy - it does not change often enough. I looked at GetTickCount64, and it is only slightly better. I tried to use QueryPerformanceCounter and QueryPerformanceFrequency (just a rough initial version), and these look better then everything else: benchmark old ns/op new ns/op delta BenchmarkAfterFunc 5937728 9101912 +53.29% BenchmarkAfter 1990310800 10024799 -99.50% BenchmarkStop 643091 1540098 +139.48% BenchmarkSimultaneousAfterFunc 832063 1679752 +101.88% BenchmarkStartStop 321887 1117230 +247.09% BenchmarkTicker 1953200 1952223 -0.05% BenchmarkNow 58.4 53.3 -8.73% BenchmarkNowUnixNano 58.3 50.2 -13.89% BenchmarkFormat 489 499 +2.04% BenchmarkFormatNow 495 500 +1.01% BenchmarkParse 474 468 -1.27% BenchmarkHour 63.7 66.8 +4.87% BenchmarkSecond 61.0 63.8 +4.59% BenchmarkYear 83.5 86.7 +3.83% BenchmarkDay 94.6 97.2 +2.75% still there are some slow downs that I don't understand, but not as large as previous version. Maybe we should use QPC/QPF for amd64 and GetTickCount for 386. What do you think? I still have not tried your suggestion of just reading the clock directly ... Alex
Sign in to reply to this message.
On 2014/07/22 05:13:23, brainman wrote: > On 2014/07/17 17:30:14, dvyukov wrote: > > > > as for the slowdown, both tests set timers for 1 ns. I see 2 possibilities: > (1) > > currently the timer thread never sleeps in this test; but with this change it > > tries to sleep for 1 ns, but OS actually suspends it to 1ms; ... > > That is the case. addtimer starts timerproc on new thread. In old version (using > GetSystemTimeAsFileTime) time would change by the time timerproc starts, so it > didn't need to wait for 1 ns. New version (using GetTickCount) time is changing > slower (every 10-16 ms), so timeproc see same time as addtime does, so it > decides to wait. Minimum wait is around 15 ms on that system, so > time.NewTicker(1) actually starts ticking after 15 ms, while old version worked > instantaneously (no wait involved). > > It seems GetTickCount is too grainy - it does not change often enough. I looked > at GetTickCount64, and it is only slightly better. I tried to use > QueryPerformanceCounter and QueryPerformanceFrequency (just a rough initial > version), and these look better then everything else: > > benchmark old ns/op new ns/op delta > BenchmarkAfterFunc 5937728 9101912 +53.29% > BenchmarkAfter 1990310800 10024799 -99.50% > BenchmarkStop 643091 1540098 +139.48% > BenchmarkSimultaneousAfterFunc 832063 1679752 +101.88% > BenchmarkStartStop 321887 1117230 +247.09% > BenchmarkTicker 1953200 1952223 -0.05% > BenchmarkNow 58.4 53.3 -8.73% > BenchmarkNowUnixNano 58.3 50.2 -13.89% > BenchmarkFormat 489 499 +2.04% > BenchmarkFormatNow 495 500 +1.01% > BenchmarkParse 474 468 -1.27% > BenchmarkHour 63.7 66.8 +4.87% > BenchmarkSecond 61.0 63.8 +4.59% > BenchmarkYear 83.5 86.7 +3.83% > BenchmarkDay 94.6 97.2 +2.75% > > still there are some slow downs that I don't understand, but not as large as > previous version. Maybe we should use QPC/QPF for amd64 and GetTickCount for > 386. What do you think? I still have not tried your suggestion of just reading > the clock directly ... Humm... I am sure that I saw 1ms increments with GetTickCount (after we bump system timer precision). Basically I run this: package main import ( "fmt" "time" "unsafe" ) func main() { last := time.Now() last1 := *((*time.Duration)(unsafe.Pointer(uintptr(0x7FFE0008)))) * 100 for i := 0; i < 10000; i++ { now := time.Now() now1 := *((*time.Duration)(unsafe.Pointer(uintptr(0x7FFE0008)))) * 100 fmt.Printf("%v %v %v %v %v\n", now.Sub(last)-(now1-last1), now.Sub(last), now1-last1, now, now1) last = now last1 = now1 } } and saw that time.Now() and the raw reads produce exactly the same values. Maybe you just have an old OS, where GetTickCount is buggy? Hummm. DO you have any ideas as to why QPC is slower on some benchmarks? BenchmarkNow is runs faster, and QPC has enough prevision, so... why some benchmarks become slower?... In either case I don't think that these benchmarks are very representative (especially tons of successive 1ns sleeps).
Sign in to reply to this message.
On 2014/07/22 07:51:46, dvyukov wrote: > >... I am sure that I saw 1ms increments with GetTickCount (after we bump > system timer precision). … It, probably, is 1ms. But it is much longer than what we have now, and whole logic changes because of it. Right now we just return without wait, like I explained above, but after using GetTickCount we end up waiting on semaphore. timerproc thread starts quickly enough (less than 1ms, I guess), so it sees the same runtime.nanotime value as timer that is been added. So it all goes into wait for 1ns, but end up waiting around 15ms (we wait on windows event - I don’t think it waits for less than 15ms). QPC and reading 0x7FFE0008 is similar to the current approach, because they both change after about 100ns (much faster than GetTickCount). I am bit suspicious of reading 0x7FFE0008 approach. Will it work everywhere? What about multi CPUs? It is not documented. QPC, on the other hand, is very documented. And Microsoft recommends it above all. But it does not work on my XP https://code.google.com/p/go/issues/detail?id=6007#c36. Perhaps we should use QPC on amd64, and GetTickCount on 386? > Hummm. DO you have any ideas as to why QPC is slower on some benchmarks? No, I don’t. Should I investigate? > BenchmarkNow is runs faster, and QPC has enough prevision, so... why some > benchmarks become slower?... I don’t understand your question. Try again. > In either case I don't think that these benchmarks are very representative > (especially tons of successive 1ns sleeps). I agree, for the most of these. But I am concerned that time.NewThread(1) becomes slow and expensive. Given that it is easily handled on unix, it is, probably, quite common in peoples code. Alex
Sign in to reply to this message.
On 2014/07/22 09:29:26, brainman wrote: > On 2014/07/22 07:51:46, dvyukov wrote: > > > >... I am sure that I saw 1ms increments with GetTickCount (after we bump > > system timer precision). … > > It, probably, is 1ms. But it is much longer than what we have now, and whole > logic changes because of it. Right now we just return without wait, like I > explained above, but after using GetTickCount we end up waiting on semaphore. > timerproc thread starts quickly enough (less than 1ms, I guess), so it sees the > same runtime.nanotime value as timer that is been added. So it all goes into > wait for 1ns, but end up waiting around 15ms (we wait on windows event - I don’t > think it waits for less than 15ms). QPC and reading 0x7FFE0008 is similar to the > current approach, because they both change after about 100ns (much faster than > GetTickCount). Now I am puzzled about difference in behavior between GetTickCount and 0x7FFE0008. GetTickCount just reads that word... Maybe there are two variables: one is at 0x7FFE0008 and is updated every 100ns, and another is at 0x7FFE0004 and is updated every 1ms... > I am bit suspicious of reading 0x7FFE0008 approach. Will it work everywhere? > What about multi CPUs? It is not documented. Well, yes, these are all valid questions. All I can say is that I used 0x7FFE0008 like 10 years ago on XP. > QPC, on the other hand, is very documented. And Microsoft recommends it above > all. But it does not work on my XP > https://code.google.com/p/go/issues/detail?id=6007#c36. Perhaps we should use > QPC on amd64, and GetTickCount on 386? > > > Hummm. DO you have any ideas as to why QPC is slower on some benchmarks? > > No, I don’t. Should I investigate? > > > BenchmarkNow is runs faster, and QPC has enough prevision, so... why some > > benchmarks become slower?... > > I don’t understand your question. Try again. I mean that (1) QPC is faster than what we have now (BenchmarkNow measures that), (2) QPC has high precision (at least 100ns, or maybe higher), so the huge slowdown due to rescheduling that we observed with GetTickCount must not be happening. Now the question is: What else is different between the current impl and QPC what makes some of the benchmarks slower? It's not very important question to answer, but it's just puzzling. > > In either case I don't think that these benchmarks are very representative > > (especially tons of successive 1ns sleeps). > > I agree, for the most of these. But I am concerned that time.NewThread(1) > becomes slow and expensive. Given that it is easily handled on unix, it is, > probably, quite common in peoples code. > > Alex
Sign in to reply to this message.
QPC with fallback to GetTickCount on XP looks fine to me. I hope there are no people running high-performance Go servers on XP :)
Sign in to reply to this message.
On 2014/07/22 09:39:37, dvyukov wrote: > I mean that (1) QPC is faster than what we have now (BenchmarkNow measures > that), (2) QPC has high precision (at least 100ns, or maybe higher), so the huge > slowdown due to rescheduling that we observed with GetTickCount must not be > happening. Now the question is: What else is different between the current impl > and QPC what makes some of the benchmarks slower? > It's not very important question to answer, but it's just puzzling. Do the Windows hosts run on real hardware or virtual machines? I have encountered *massive* slowdown when running code with a lot of QPC calls in virtual machines, in particular Win7 hosts in VMWare containers.
Sign in to reply to this message.
On 2014/07/22 09:39:37, dvyukov wrote: > > Now I am puzzled about difference in behavior between GetTickCount and > 0x7FFE0008. GetTickCount just reads that word... 0x7FFE0008 points to int64, while GetTickCount returns uint32. GetTickCount counts in ms as per the doco. > I mean that (1) QPC is faster than what we have now (BenchmarkNow measures > that), (2) QPC has high precision (at least 100ns, or maybe higher), so the huge > slowdown due to rescheduling that we observed with GetTickCount must not be > happening. Now the question is: What else is different between the current impl > and QPC what makes some of the benchmarks slower? > It's not very important question to answer, but it's just puzzling. I will try to find out. Alex
Sign in to reply to this message.
On Tue, Jul 22, 2014 at 12:51 PM, <patrick@mezard.eu> wrote: > I have encountered *massive* slowdown when running code with a lot of > QPC calls in virtual machines, in particular Win7 hosts in VMWare > containers. Yep, to be expected. QPC uses HPET, which is a scarce resource. If every VM thinks it owns HPET, things become ugly. The QEMU used in the KVM port to SmartOS actually has a separate embedded cyclics subsystem implementation because of this... It should be ported to upstream QEMU so KVM on Linux can use it too. -- Aram Hăvărneanu
Sign in to reply to this message.
On 2014/07/22 09:42:48, dvyukov wrote: > QPC with fallback to GetTickCount on XP looks fine to me. Sounds like a plan. > I hope there are no > people running high-performance Go servers on XP :) I don't run one myself. I doubt there other windows XP Go users. :-) Alex
Sign in to reply to this message.
On 2014/07/22 09:51:59, patrick2 wrote: > > Do the Windows hosts run on real hardware or virtual machines? > No. I run XP on real PC. Alex
Sign in to reply to this message.
On Tue, Jul 22, 2014 at 2:21 PM, Aram Hăvărneanu <aram.h@mgk.ro> wrote: > On Tue, Jul 22, 2014 at 12:51 PM, <patrick@mezard.eu> wrote: >> I have encountered *massive* slowdown when running code with a lot of >> QPC calls in virtual machines, in particular Win7 hosts in VMWare >> containers. > > Yep, to be expected. QPC uses HPET, which is a scarce resource. If > every VM thinks it owns HPET, things become ugly. The QEMU used in the > KVM port to SmartOS actually has a separate embedded cyclics subsystem > implementation because of this... It should be ported to upstream QEMU > so KVM on Linux can use it too. How bad can it get? We query time on every timer setup, timer fire; and besides that runtime queries time every time it polls network and just periodically (the frequency is 20us, but there are sleeps in between, and I think windows sleeps for at least 1 ms). Alex, this can be a reason to not use QPC, because people do run servers in VMs today.
Sign in to reply to this message.
On 2014/07/22 10:29:01, dvyukov wrote: > > Alex, this can be a reason to not use QPC, because people do run > servers in VMs today. Do you suggest we keep this CL as is. Maybe use GetTickCount64 when available? But GetTickCount64 won't improve our time.NewTicker(1) situation. It is marginally better precission than GettickCount, but not enough to make time.NewTicker(1) quick. On the other hand, it returns int64 counter, so we don't need to anything else with returned value. Alex
Sign in to reply to this message.
On 2014/07/22 07:51:46, dvyukov wrote: > > Humm... I am sure that I saw 1ms increments with GetTickCount (after we bump > system timer precision). Basically I run this: > > package main > > import ( > "fmt" > "time" > "unsafe" > ) > > func main() { > last := time.Now() > last1 := *((*time.Duration)(unsafe.Pointer(uintptr(0x7FFE0008)))) * 100 > for i := 0; i < 10000; i++ { > now := time.Now() > now1 := *((*time.Duration)(unsafe.Pointer(uintptr(0x7FFE0008)))) * 100 > fmt.Printf("%v %v %v %v %v\n", now.Sub(last)-(now1-last1), now.Sub(last), > now1-last1, now, now1) > last = now > last1 = now1 > } > } > > and saw that time.Now() and the raw reads produce exactly the same values. > Just for the record. I measured time difference between values (when the value changes) returned from different Windows API in a loop like this. Both GetTickCount and GetTickCount64 change every 15-16 millisecond, while time.Now().UnixNano() (uses GetSystemTimeAsFileTime under covers) and "reading 0x7FFE0008" changes every 1 millisecond. This is all on windows/amd64. I am not convinced that QueryPerformanceCounter is the answer either - it is probably too expensive / might take too long to run (comparing to all the above). I am not sure what to do here. Alex
Sign in to reply to this message.
On 2014/07/23 02:27:35, brainman wrote: > On 2014/07/22 07:51:46, dvyukov wrote: > > > > Humm... I am sure that I saw 1ms increments with GetTickCount (after we bump > > system timer precision). Basically I run this: > > > > package main > > > > import ( > > "fmt" > > "time" > > "unsafe" > > ) > > > > func main() { > > last := time.Now() > > last1 := *((*time.Duration)(unsafe.Pointer(uintptr(0x7FFE0008)))) * 100 > > for i := 0; i < 10000; i++ { > > now := time.Now() > > now1 := *((*time.Duration)(unsafe.Pointer(uintptr(0x7FFE0008)))) * 100 > > fmt.Printf("%v %v %v %v %v\n", now.Sub(last)-(now1-last1), now.Sub(last), > > now1-last1, now, now1) > > last = now > > last1 = now1 > > } > > } > > > > and saw that time.Now() and the raw reads produce exactly the same values. > > > > Just for the record. I measured time difference between values (when the value > changes) returned from different Windows API in a loop like this. Both > GetTickCount and GetTickCount64 change every 15-16 millisecond, while > time.Now().UnixNano() (uses GetSystemTimeAsFileTime under covers) and "reading > 0x7FFE0008" changes every 1 millisecond. This is all on windows/amd64. > > I am not convinced that QueryPerformanceCounter is the answer either - it is > probably too expensive / might take too long to run (comparing to all the > above). > > I am not sure what to do here. Yeah... I would go with 0x7FFE0008. It's known on the internet, so it's unlikely that it will change (at least not in an existing OS version): http://www.dcl.hpi.uni-potsdam.de/research/WRK/2007/08/getting-os-information... http://uninformed.org/index.cgi?v=2&a=2&p=15 http://www.informit.com/articles/article.aspx?p=167857&seqNum=8 http://forum.sysinternals.com/0x7ffe0000-what-is-in-it_topic10012.html This link also suggests that CRT depends on it: https://github.com/sherpya/nloader/blob/master/README.md I don't remember whether it's possible to link CRT statically in Visual Studio. But if it is possible, then it means that existing binary applications compiled with Visual Studio will also break, which is not going to happen. It's definitely the fastest option. And it can also be the simplest option.
Sign in to reply to this message.
On 2014/07/23 07:52:56, dvyukov wrote: > > I would go with 0x7FFE0008. ... That works very well on my windows 7 pc (windows/amd64) - thanks for the links. But now I have problem with windows xp (windows/386). time.TestAfter (and time.Sleep) ends too early: sleep_test.go:154: After(100ms) slept for only 93750000 ns It appears, windows xp (unlike windows 7) have different time precisions: counter at 0x7FFE0008 changes every 1ms, while GetSystemTimeAsFileTime changes every 15-16ms. time.TestAfter uses both counters: it does time.Sleep (which uses counter at 0x7FFE0008) and than measures that sleep with GetSystemTimeAsFileTime. Obviously that will not work - especially for small durations. For example sleeping for 43ms, then GetSystemTimeAsFileTime says we've slept for 31.25ms. It didn't increment by the next 15ms yet. I don't see how we could resolve that unless we relax some tests. Or perhaps I could use GetTickCount on windows/386. GetTickCount changes every 15ms on windows, so at least the 2 will be more in sync. Still ... What do you think? Alex
Sign in to reply to this message.
On 2014/07/24 07:43:44, brainman wrote: > On 2014/07/23 07:52:56, dvyukov wrote: > > > > I would go with 0x7FFE0008. ... > > That works very well on my windows 7 pc (windows/amd64) - thanks for the links. > But now I have problem with windows xp (windows/386). time.TestAfter (and > time.Sleep) ends too early: > > sleep_test.go:154: After(100ms) slept for only 93750000 ns > > It appears, windows xp (unlike windows 7) have different time precisions: > counter at 0x7FFE0008 changes every 1ms, while GetSystemTimeAsFileTime changes > every 15-16ms. time.TestAfter uses both counters: it does time.Sleep (which uses > counter at 0x7FFE0008) and than measures that sleep with > GetSystemTimeAsFileTime. Obviously that will not work - especially for small > durations. For example sleeping for 43ms, then GetSystemTimeAsFileTime says > we've slept for 31.25ms. It didn't increment by the next 15ms yet. > > I don't see how we could resolve that unless we relax some tests. Or perhaps I > could use GetTickCount on windows/386. GetTickCount changes every 15ms on > windows, so at least the 2 will be more in sync. Still ... > > What do you think? Oh, boy. GetTickCount ticks every 15-16 ms. Now they question is: does GetSystemTimeAsFileTime tick as 15-16 ms as well, or it ticks at 15.25ms? If it's the latter, can't we get the same situation when GetSystemTimeAsFileTime is slightly behind (e.g. 0.75 ms behind)? Or it's GetTickCount that is always behind (if they round it down)? Since we are already using _KUSER_SHARED_DATA, maybe it will be simple to use _KUSER_SHARED_DATA.TickCountLow (at 0x7FFE0000) on XP. I think that it ticks at the same 15-16 ms on XP. Another idea: can't we just round off the 0x7FFE0008 value on XP, so that it as if ticks at the same 15-16ms? But I suspect that it can be tricky to round it to exactly match GetSystemTimeAsFileTime... Another observation: the test is actually incorrect, because it expects that system time does not change. When you sleep for 1 second, system time can be any random value after the sleep. I don't have a good answer for you. But I would prefer good and fast code for newer OSes, and a workaround for the XP bug.
Sign in to reply to this message.
On 2014/07/24 08:15:04, dvyukov wrote: > On 2014/07/24 07:43:44, brainman wrote: > > On 2014/07/23 07:52:56, dvyukov wrote: > > > > > > I would go with 0x7FFE0008. ... > > > > That works very well on my windows 7 pc (windows/amd64) - thanks for the > links. > > But now I have problem with windows xp (windows/386). time.TestAfter (and > > time.Sleep) ends too early: > > > > sleep_test.go:154: After(100ms) slept for only 93750000 ns > > > > It appears, windows xp (unlike windows 7) have different time precisions: > > counter at 0x7FFE0008 changes every 1ms, while GetSystemTimeAsFileTime changes > > every 15-16ms. time.TestAfter uses both counters: it does time.Sleep (which > uses > > counter at 0x7FFE0008) and than measures that sleep with > > GetSystemTimeAsFileTime. Obviously that will not work - especially for small > > durations. For example sleeping for 43ms, then GetSystemTimeAsFileTime says > > we've slept for 31.25ms. It didn't increment by the next 15ms yet. > > > > I don't see how we could resolve that unless we relax some tests. Or perhaps I > > could use GetTickCount on windows/386. GetTickCount changes every 15ms on > > windows, so at least the 2 will be more in sync. Still ... > > > > What do you think? > > > Oh, boy. > > GetTickCount ticks every 15-16 ms. Now they question is: does > GetSystemTimeAsFileTime tick as 15-16 ms as well, or it ticks at 15.25ms? If > it's the latter, can't we get the same situation when GetSystemTimeAsFileTime is > slightly behind (e.g. 0.75 ms behind)? Or it's GetTickCount that is always > behind (if they round it down)? > > Since we are already using _KUSER_SHARED_DATA, maybe it will be simple to use > _KUSER_SHARED_DATA.TickCountLow (at 0x7FFE0000) on XP. I think that it ticks at > the same 15-16 ms on XP. > > Another idea: can't we just round off the 0x7FFE0008 value on XP, so that it as > if ticks at the same 15-16ms? But I suspect that it can be tricky to round it to > exactly match GetSystemTimeAsFileTime... > > Another observation: the test is actually incorrect, because it expects that > system time does not change. When you sleep for 1 second, system time can be any > random value after the sleep. > > I don't have a good answer for you. But I would prefer good and fast code for > newer OSes, and a workaround for the XP bug. Well, another possibility is to manually bump GetSystemTimeAsFileTime precision on XP along the lines of: When we query GetSystemTimeAsFileTime also query 0x7FFE0008 and remember these values. When we next time query GetSystemTimeAsFileTime, if it returns the same value as before, then query 0x7FFE0008, calculate the diff and add to GetSystemTimeAsFileTime value. This way GetSystemTimeAsFileTime will tick at 1 ms at XP. I don't know how complex a reliable monotonic solution will be.
Sign in to reply to this message.
I tested all possible candidate APIs again. This program: package main import ( "fmt" "runtime" "syscall" "unsafe" ) var ( modkernel32 = syscall.NewLazyDLL("kernel32.dll") procGetTickCount64 = modkernel32.NewProc("GetTickCount64") procGetTickCount = modkernel32.NewProc("GetTickCount") ) func GetTickCount64() uint64 { r, _, _ := syscall.Syscall(procGetTickCount64.Addr(), 0, 0, 0, 0) return uint64(r) } func GetTickCount() uint32 { r, _, _ := syscall.Syscall(procGetTickCount.Addr(), 0, 0, 0, 0) return uint32(r) } func getSystemTimeAsFileTime() uint64 { var ft syscall.Filetime syscall.GetSystemTimeAsFileTime(&ft) return uint64(ft.Nanoseconds()) } func getTickCount() uint64 { return uint64(GetTickCount()) * 1000000 } func getTickCount64() uint64 { return GetTickCount64() * 1000000 } type KSYSTEM_TIME struct { LowPart uint32 High1Time int32 High2Time int32 } var ( // KUSER_SHARED_DATA interruptTimeAddr = (*KSYSTEM_TIME)(unsafe.Pointer(uintptr(0x7FFE0008))) systemTimeAddr = (*KSYSTEM_TIME)(unsafe.Pointer(uintptr(0x7FFE0014))) ) func readUSD(addr *KSYSTEM_TIME) uint64 { var t KSYSTEM_TIME for i := 0; ; i++ { t.High1Time = addr.High1Time t.LowPart = addr.LowPart t.High2Time = addr.High2Time if t.High1Time == t.High2Time { break } if i > 100 { panic("should have advanced by now") } } return *((*uint64)(unsafe.Pointer(&t))) * 100 } func readInterruptTime() uint64 { return readUSD(interruptTimeAddr) } func readSystemTime() uint64 { return readUSD(systemTimeAddr) } func getprec(getticks func() uint64) []uint64 { r := make([]uint64, 0) c := getticks() for i := 0; i < 5; { c2 := getticks() if c != c2 { r = append(r, c2-c) i++ } } return r } func main() { fmt.Printf("getSystemTimeAsFileTime: %v\n", getprec(getSystemTimeAsFileTime)) fmt.Printf("getTickCount : %v\n", getprec(getTickCount)) if runtime.GOARCH == "amd64" { fmt.Printf("getTickCount64 : %v\n", getprec(getTickCount64)) } fmt.Printf("readInterruptTime : %v\n", getprec(readInterruptTime)) fmt.Printf("readSystemTime : %v\n", getprec(readSystemTime)) } prints C:\>u:\test getSystemTimeAsFileTime: [15625000 15625000 15625000 15625000 15625000] getTickCount : [15000000 15000000 15000000 15000000 15000000] readInterruptTime : [976600 976600 976600 976600 976600] readSystemTime : [15625000 15625000 15625000 15625000 15625000] on windows xp, and C:\>u:\test getSystemTimeAsFileTime: [977000 977000 977000 977000 977000] getTickCount : [16000000 16000000 16000000 16000000 16000000] getTickCount64 : [15000000 15000000 15000000 15000000 15000000] readInterruptTime : [976500 976500 976500 976500 976500] readSystemTime : [977000 977000 977000 977000 977000] on windows 7. I still haven't made my mind about what to use here, so any suggestions are welcome. Alex
Sign in to reply to this message.
Do you know if timeBeginPeriod requires admin rights? If yes, then the numbers can be different for program running w/o admin rights... On Fri, Jul 25, 2014 at 8:55 AM, <alex.brainman@gmail.com> wrote: > I tested all possible candidate APIs again. This program: > > > package main > > import ( > "fmt" > "runtime" > "syscall" > "unsafe" > ) > > var ( > modkernel32 = syscall.NewLazyDLL("kernel32.dll") > procGetTickCount64 = modkernel32.NewProc("GetTickCount64") > procGetTickCount = modkernel32.NewProc("GetTickCount") > ) > > func GetTickCount64() uint64 { > r, _, _ := syscall.Syscall(procGetTickCount64.Addr(), 0, 0, 0, 0) > return uint64(r) > } > > func GetTickCount() uint32 { > r, _, _ := syscall.Syscall(procGetTickCount.Addr(), 0, 0, 0, 0) > return uint32(r) > } > > func getSystemTimeAsFileTime() uint64 { > var ft syscall.Filetime > syscall.GetSystemTimeAsFileTime(&ft) > return uint64(ft.Nanoseconds()) > } > > func getTickCount() uint64 { > return uint64(GetTickCount()) * 1000000 > } > > func getTickCount64() uint64 { > return GetTickCount64() * 1000000 > } > > type KSYSTEM_TIME struct { > LowPart uint32 > High1Time int32 > High2Time int32 > } > > var ( > // KUSER_SHARED_DATA > interruptTimeAddr = > (*KSYSTEM_TIME)(unsafe.Pointer(uintptr(0x7FFE0008))) > systemTimeAddr = > (*KSYSTEM_TIME)(unsafe.Pointer(uintptr(0x7FFE0014))) > ) > > func readUSD(addr *KSYSTEM_TIME) uint64 { > var t KSYSTEM_TIME > for i := 0; ; i++ { > t.High1Time = addr.High1Time > t.LowPart = addr.LowPart > t.High2Time = addr.High2Time > if t.High1Time == t.High2Time { > break > } > if i > 100 { > panic("should have advanced by now") > } > } > return *((*uint64)(unsafe.Pointer(&t))) * 100 > } > > func readInterruptTime() uint64 { > return readUSD(interruptTimeAddr) > } > > func readSystemTime() uint64 { > return readUSD(systemTimeAddr) > } > > func getprec(getticks func() uint64) []uint64 { > r := make([]uint64, 0) > c := getticks() > for i := 0; i < 5; { > c2 := getticks() > if c != c2 { > r = append(r, c2-c) > i++ > } > } > return r > } > > func main() { > fmt.Printf("getSystemTimeAsFileTime: %v\n", > getprec(getSystemTimeAsFileTime)) > fmt.Printf("getTickCount : %v\n", getprec(getTickCount)) > if runtime.GOARCH == "amd64" { > fmt.Printf("getTickCount64 : %v\n", > getprec(getTickCount64)) > } > fmt.Printf("readInterruptTime : %v\n", > getprec(readInterruptTime)) > fmt.Printf("readSystemTime : %v\n", getprec(readSystemTime)) > } > > prints > > C:\>u:\test > getSystemTimeAsFileTime: [15625000 15625000 15625000 15625000 15625000] > getTickCount : [15000000 15000000 15000000 15000000 15000000] > readInterruptTime : [976600 976600 976600 976600 976600] > readSystemTime : [15625000 15625000 15625000 15625000 15625000] > > on windows xp, and > > C:\>u:\test > getSystemTimeAsFileTime: [977000 977000 977000 977000 977000] > getTickCount : [16000000 16000000 16000000 16000000 16000000] > getTickCount64 : [15000000 15000000 15000000 15000000 15000000] > readInterruptTime : [976500 976500 976500 976500 976500] > readSystemTime : [977000 977000 977000 977000 977000] > > on windows 7. > > I still haven't made my mind about what to use here, so any suggestions > are welcome. > > Alex > > > https://codereview.appspot.com/108700045/ > > -- > You received this message because you are subscribed to the Google Groups > "golang-codereviews" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to golang-codereviews+unsubscribe@googlegroups.com. > For more options, visit https://groups.google.com/d/optout.
Sign in to reply to this message.
On 2014/07/25 07:02:40, dvyukov wrote: > Do you know if timeBeginPeriod requires admin rights? ... I don't know. I suspect it does. > ... If yes, then the > numbers can be different for program running w/o admin rights... It will indeed. This is Go program with timeBeginPeriod, and with timeBeginPeriod removed: getSystemTimeAsFileTime: [976500 976500 976500 976500 976500] getTickCount : [15000000 15000000 15000000 15000000 15000000] getTickCount64 : [16000000 16000000 16000000 16000000 16000000] readInterruptTime : [976500 976500 976500 976500 976500] readSystemTime : [976500 976500 976500 976500 976500] getSystemTimeAsFileTime: [15625000 15625000 15625000 15625000 15625000] getTickCount : [16000000 16000000 16000000 16000000 16000000] getTickCount64 : [15000000 15000000 15000000 15000000 15000000] readInterruptTime : [15625000 15625000 15625000 15625000 15625000] readSystemTime : [15625000 15625000 15625000 15625000 15625000] Things aren't getting better. :-( Alex
Sign in to reply to this message.
This is getting insane. I think just need to relaxed the sleep test on windows and give it 16ms inconsistency window. Users should not rely on the condition that the test currently tests, because system time is essentially a random value. Btw, that SystemTime at 0x7FFE0014 is the same as SystemTimeAsFileTime, right? What do you think about using it for time.Now? If we rely on KUSER_SHARED_DATA, it does not matter whether we read one or two fields from it. Here are the "docs": http://uninformed.org/index.cgi?v=2&a=2&p=20
Sign in to reply to this message.
On 2014/07/25 07:42:39, dvyukov wrote: > > Btw, that SystemTime at 0x7FFE0014 is the same as SystemTimeAsFileTime, right? Looks that way to me. > What do you think about using it for time.Now? If we rely on KUSER_SHARED_DATA, > it does not matter whether we read one or two fields from it. I agree. Less code and faster. Alex
Sign in to reply to this message.
Hello golang-codereviews@googlegroups.com, dvyukov@google.com, patrick@mezard.eu, aram.h@mgk.ro (cc: golang-codereviews@googlegroups.com), Please take another look.
Sign in to reply to this message.
Dmitry, Finally I am happy with result (as good as we can make it). It is simple and fast. And benchmarks windows/386 C:\go\root\src\pkg\time>benchcmp.exe old new benchmark old ns/op new ns/op delta BenchmarkAfterFunc 12187656 12031404 -1.28% BenchmarkAfter 15625200000 1953150000 -87.50% BenchmarkStop 968762 937512 -3.23% BenchmarkSimultaneousAfterFunc 968762 945324 -2.42% BenchmarkStartStop 478131 443755 -7.19% BenchmarkTicker 15625200 1953150 -87.50% BenchmarkNow 81.3 48.4 -40.47% BenchmarkNowUnixNano 83.6 52.6 -37.08% BenchmarkFormat 820 835 +1.83% BenchmarkFormatNow 828 835 +0.85% BenchmarkParse 664 664 +0.00% BenchmarkHour 118 118 +0.00% BenchmarkSecond 115 115 +0.00% BenchmarkYear 200 210 +5.00% BenchmarkDay 204 214 +4.90% and windows/amd64 C:\go\root\src\pkg\time>benchcmp.exe old new benchmark old ns/op new ns/op delta BenchmarkAfterFunc 6087230 4948613 -18.71% BenchmarkAfter 1966268200 1966097100 -0.01% BenchmarkStop 745893 701270 -5.98% BenchmarkSimultaneousAfterFunc 728807 719339 -1.30% BenchmarkStartStop 295818 268006 -9.40% BenchmarkTicker 1952600 1953400 +0.04% BenchmarkNow 50.3 33.5 -33.40% BenchmarkNowUnixNano 50.9 34.2 -32.81% BenchmarkFormat 540 531 -1.67% BenchmarkFormatNow 533 535 +0.38% BenchmarkParse 474 488 +2.95% BenchmarkHour 64.5 66.3 +2.79% BenchmarkSecond 62.1 63.8 +2.74% BenchmarkYear 85.4 88.3 +3.40% BenchmarkDay 95.3 99.0 +3.88% aren't bad. I have small concern. Reading of KSYSTEM_TIME fields in runtime.systime suppose to happens in order. I hope that compiler or human not reorder these. What do you think? Alex
Sign in to reply to this message.
this is much nicer! Thanks. https://codereview.appspot.com/108700045/diff/80001/src/pkg/runtime/os_windows.c File src/pkg/runtime/os_windows.c (right): https://codereview.appspot.com/108700045/diff/80001/src/pkg/runtime/os_window... src/pkg/runtime/os_windows.c:274: INTERRUPT_TIME = (KSYSTEM_TIME*)0x7ffe0008, let's use const KSYSTEM_TIME* INTERRUPT_TIME instead of a enum. I don't C support non-integer-typed enums like this. https://codereview.appspot.com/108700045/diff/80001/src/pkg/runtime/os_window... src/pkg/runtime/os_windows.c:291: return *((int64*)&t); return t.High1Time << 32 | t.LowPart; no endian assumptions please. https://codereview.appspot.com/108700045/diff/80001/src/pkg/runtime/os_window... src/pkg/runtime/os_windows.c:293: runtime·throw("runtime.nanotime did not advance"); nanotime? this message is confusing. I think it should be interrupt/system time is changing too fast...
Sign in to reply to this message.
https://codereview.appspot.com/108700045/diff/80001/src/pkg/runtime/os_windows.c File src/pkg/runtime/os_windows.c (right): https://codereview.appspot.com/108700045/diff/80001/src/pkg/runtime/os_window... src/pkg/runtime/os_windows.c:274: INTERRUPT_TIME = (KSYSTEM_TIME*)0x7ffe0008, On 2014/07/31 02:15:33, minux wrote: > let's use const KSYSTEM_TIME* INTERRUPT_TIME instead of > a enum. > > I don't C support non-integer-typed enums like this. Done. https://codereview.appspot.com/108700045/diff/80001/src/pkg/runtime/os_window... src/pkg/runtime/os_windows.c:291: return *((int64*)&t); On 2014/07/31 02:15:33, minux wrote: > return t.High1Time << 32 | t.LowPart; > > no endian assumptions please. Done. https://codereview.appspot.com/108700045/diff/80001/src/pkg/runtime/os_window... src/pkg/runtime/os_windows.c:293: runtime·throw("runtime.nanotime did not advance"); On 2014/07/31 02:15:33, minux wrote: > nanotime? Code rot. > I think it should be interrupt/system time is changing too fast... Fine. Changed.
Sign in to reply to this message.
Hello golang-codereviews@googlegroups.com, dvyukov@google.com, patrick@mezard.eu, aram.h@mgk.ro, minux@golang.org (cc: golang-codereviews@googlegroups.com), Please take another look.
Sign in to reply to this message.
LGTM. Please wait for dmitry.
Sign in to reply to this message.
I like the runtime code too. LGTM if you agree with the fix https://codereview.appspot.com/108700045/diff/100001/src/pkg/runtime/os_windo... File src/pkg/runtime/os_windows.c (right): https://codereview.appspot.com/108700045/diff/100001/src/pkg/runtime/os_windo... src/pkg/runtime/os_windows.c:291: runtime·throw("interrupt/system time is changing too fast"); this scares me note that this algorithm is not lock-free for readers i.e. merely retrying the reader loop won't fix the inconsistency consider that the timer interrupt handler is preempted by an NMI when it has updated High2Time but not High1Time, this routine can easily do 1000 iterations quickly and on all iterations observe inconsistent values and terminate the program I would trust more the interrupt handler to do the correct thing. It does not matter if we burn more CPU and introduce additional delay before crashing the program on a buggy system. I would do something along the lines of: for(i = 1; i < 10000; i++) { if((i%100) == 0) runtime.osyield(); ... https://codereview.appspot.com/108700045/diff/100001/src/pkg/time/sleep_test.go File src/pkg/time/sleep_test.go (right): https://codereview.appspot.com/108700045/diff/100001/src/pkg/time/sleep_test.... src/pkg/time/sleep_test.go:18: const windowsInaccuracy = 17 * Millisecond please add a short comment with pointers here it needs to say that we are using different functions for time.Now and sleeping, that can tick at different frequencies and it needs to spell the configuration where it is reproducible (e.g. 386 XP) so that it's not removed in future because nobody can reproduce the bahavior anymore
Sign in to reply to this message.
*** Submitted as https://code.google.com/p/go/source/detail?r=b673250d7b72 *** runtime: implement monotonic clocks on windows Update issue 6007. LGTM=minux, dvyukov R=golang-codereviews, dvyukov, patrick, aram.h, minux CC=golang-codereviews https://codereview.appspot.com/108700045 https://codereview.appspot.com/108700045/diff/100001/src/pkg/runtime/os_windo... File src/pkg/runtime/os_windows.c (right): https://codereview.appspot.com/108700045/diff/100001/src/pkg/runtime/os_windo... src/pkg/runtime/os_windows.c:291: runtime·throw("interrupt/system time is changing too fast"); On 2014/07/31 08:17:02, dvyukov wrote: > this scares me > > note that this algorithm is not lock-free for readers > i.e. merely retrying the reader loop won't fix the inconsistency > consider that the timer interrupt handler is preempted by an NMI when it has > updated High2Time but not High1Time, this routine can easily do 1000 iterations > quickly and on all iterations observe inconsistent values and terminate the > program > > I would trust more the interrupt handler to do the correct thing. It does not > matter if we burn more CPU and introduce additional delay before crashing the > program on a buggy system. > > I would do something along the lines of: > > for(i = 1; i < 10000; i++) { > if((i%100) == 0) > runtime.osyield(); > ... Done. But I have moved if statement to the end of for loop body. I don't want runtime.osyield to run every time we call runtime·systime. I suspect more often then not we will find t.High1Time == t.High2Time on first try, so why waste CPU in most common case. https://codereview.appspot.com/108700045/diff/100001/src/pkg/time/sleep_test.go File src/pkg/time/sleep_test.go (right): https://codereview.appspot.com/108700045/diff/100001/src/pkg/time/sleep_test.... src/pkg/time/sleep_test.go:18: const windowsInaccuracy = 17 * Millisecond On 2014/07/31 08:17:02, dvyukov wrote: > please add a short comment with pointers here > it needs to say that we are using different functions for time.Now and sleeping, > that can tick at different frequencies > and it needs to spell the configuration where it is reproducible (e.g. 386 XP) > so that it's not removed in future because nobody can reproduce the bahavior > anymore Done.
Sign in to reply to this message.
Message was sent while issue was closed.
https://codereview.appspot.com/108700045/diff/100001/src/pkg/runtime/os_windo... File src/pkg/runtime/os_windows.c (right): https://codereview.appspot.com/108700045/diff/100001/src/pkg/runtime/os_windo... src/pkg/runtime/os_windows.c:291: runtime·throw("interrupt/system time is changing too fast"); On 2014/08/01 01:18:20, brainman wrote: > On 2014/07/31 08:17:02, dvyukov wrote: > > this scares me > > > > note that this algorithm is not lock-free for readers > > i.e. merely retrying the reader loop won't fix the inconsistency > > consider that the timer interrupt handler is preempted by an NMI when it has > > updated High2Time but not High1Time, this routine can easily do 1000 > iterations > > quickly and on all iterations observe inconsistent values and terminate the > > program > > > > I would trust more the interrupt handler to do the correct thing. It does not > > matter if we burn more CPU and introduce additional delay before crashing the > > program on a buggy system. > > > > I would do something along the lines of: > > > > for(i = 1; i < 10000; i++) { > > if((i%100) == 0) > > runtime.osyield(); > > ... > > Done. But I have moved if statement to the end of for loop body. > > I don't want runtime.osyield to run every time we call runtime·systime. I > suspect more often then not we will find t.High1Time == t.High2Time on first > try, so why waste CPU in most common case. Note that I started loop counter with 1: for(i = 1; i < 10000; i++) { if((i%100) == 0) runtime.osyield(); ... first osyield would have been called only on 99-th iteration. With your code osyield is executed after the first failed attempt. I don't know how important it is, but your code wastes more CPU in common case.
Sign in to reply to this message.
Message was sent while issue was closed.
On 2014/08/05 12:59:11, dvyukov wrote: > > first osyield would have been called only on 99-th iteration. > With your code osyield is executed after the first failed attempt. ... I knew I was missing something. https://codereview.appspot.com/117670043 Alex
Sign in to reply to this message.
|