|
|
Created:
12 years, 6 months ago by hector Modified:
12 years, 6 months ago Reviewers:
CC:
brainman, jp, rsc, dvyukov, golang-dev Visibility:
Public. |
Descriptionruntime: implement pprof support for windows
Credit to jp for proof of concept.
Patch Set 1 #Patch Set 2 : diff -r b7badceba5ec https://go.googlecode.com/hg/ #
Total comments: 23
Patch Set 3 : diff -r 2d8706ce68b4 https://go.googlecode.com/hg/ #
Total comments: 3
Patch Set 4 : diff -r 785d9a614916 https://go.googlecode.com/hg/ #
Total comments: 6
Patch Set 5 : diff -r 805742b53bf3 https://go.googlecode.com/hg/ #
Total comments: 2
Patch Set 6 : diff -r c588e886a8bd https://go.googlecode.com/hg/ #Patch Set 7 : diff -r b80728111ff9 https://go.googlecode.com/hg/ #
Total comments: 1
MessagesTotal messages: 47
Hello alex.brainman@gmail.com, jp@webmaster.ms, rsc@golang.org (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/
Sign in to reply to this message.
http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/386/sys.s File src/pkg/runtime/windows/386/sys.s (right): http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/386/sys.... src/pkg/runtime/windows/386/sys.s:102: CALL runtime·externalthreadhandler(SB) PUSHL $runtime·ctrlhandler1(SB) JMP runtime·externalthreadhandler(SB) ? http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/amd64/sys.s File src/pkg/runtime/windows/amd64/sys.s (right): http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/amd64/sy... src/pkg/runtime/windows/amd64/sys.s:107: RET JMP runtime·externalthreadhandler(SB) ? http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c File src/pkg/runtime/windows/thread.c (right): http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c... src/pkg/runtime/windows/thread.c:353: r->ContextFlags = 0x100001; // CONTEXT_CONTROL 0x00010001 on 386, 0x00100001 on AMD64 and ARM http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c... src/pkg/runtime/windows/thread.c:366: if((uintptr)runtime·stdcall(runtime·ResumeThread, 1, m->thandle) == -1) ResumeThread may be called before runtime·dosigprof() to minimize the stop time. http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c... src/pkg/runtime/windows/thread.c:381: ms = 1000 / hz; ms can be zero. then "If this parameter is zero, the timer is signaled once. If this parameter is greater than zero, the timer is periodic."
Sign in to reply to this message.
Nice and simple. As always <g>. Please, add "Fixes issue 2041" to CL description. http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/386/sys.s File src/pkg/runtime/windows/386/sys.s (right): http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/386/sys.... src/pkg/runtime/windows/386/sys.s:145: MOVL 0(SP), CX I don't think you have tested 386 version. I think you just want RET here, otherwise it crashes after first call into this function. http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c File src/pkg/runtime/windows/thread.c (right): http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c... src/pkg/runtime/windows/thread.c:61: runtime·stdcall(runtime·DuplicateHandle, 7, (uintptr)-1, (uintptr)-2, (uintptr)-1, &m->thandle, (uintptr)0, (uintptr)0, (uintptr)2); Please, use name consts here, or some comments for the rest of use <g>. What handle is -2? http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c... src/pkg/runtime/windows/thread.c:352: r = (Context*)((uintptr)(&rbuf[15]) & ~15); Again, please, say here that you are just aligning r. http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c... src/pkg/runtime/windows/thread.c:373: { I can see this function is called twice with hz!=0. First time from runtime/pprof.StartCPUProfile: runtime.resetcpuprofiler(0x64, 0x40e1c7) /root/hg/go/src/pkg/runtime/windows/thread.c:392 +0xd0 runtime.setcpuprofilerate(0x403a90, 0x64, 0x402a9c) /root/hg/go/src/pkg/runtime/proc.c:1586 +0xb6 runtime.SetCPUProfileRate(0x64, 0x0) /root/hg/go/src/pkg/runtime/cpuprof.c:167 +0x16e runtime/pprof.StartCPUProfile(0x10a13040, 0x10a133a0, 0x0, 0x0) /root/hg/go/src/pkg/runtime/pprof/pprof.go:147 +0x175 main.main() /root/t2/test.go:16 +0xe5 runtime.mainstart() /root/hg/go/src/pkg/runtime/386/asm.s:96 +0xf runtime.goexit() /root/hg/go/src/pkg/runtime/proc.c:246 and then from schedule: runtime.gosched() /root/hg/go/src/pkg/runtime/proc.c:804 +0x55 runtime.exitsyscall() /root/hg/go/src/pkg/runtime/proc.c:906 +0x7b runtime.cgocall(0x412268, 0x2715dc, 0x2715b8) /root/hg/go/src/pkg/runtime/cgocall.c:136 +0xf0 runtime.syscall(0x7c590cf7, 0x2, 0x27160c, 0x0, 0x2715f8, ...) /root/hg/go/src/pkg/runtime/windows/thread.c:256 +0x49 syscall.getprocaddress(0x7c570000, 0x10a03f80, 0x10a03f80, 0xcb5dce80) /root/hg/go/src/pkg/runtime/windows/syscall.c:18 +0x43 syscall.(*LazyProc).Addr(0x10a03660, 0x0, 0x1) /root/hg/go/src/pkg/syscall/syscall_windows.go:139 +0xe4 syscall.GetSystemTimeAsFileTime(0x10a001f0, 0x10a001f0) /root/hg/go/src/pkg/syscall/zsyscall_windows_386.go:465 +0x2a syscall.Gettimeofday(0x2716c4, 0x0, 0xffffffff) /root/hg/go/src/pkg/syscall/syscall_windows.go:420 +0x44 os.Time(0x0, 0x0, 0x0, 0x0, 0x0, ...) /root/hg/go/src/pkg/os/time.go:15 +0x58 time.Nanoseconds(0x5062f4, 0x40e1c7) /root/hg/go/src/pkg/time/sys.go:22 +0x21 time.Sleep(0xd964b800, 0x45, 0x0, 0x0) /root/hg/go/src/pkg/time/sys.go:33 +0x21 main.main() /root/t2/test.go:19 +0x10b runtime.mainstart() /root/hg/go/src/pkg/runtime/386/asm.s:96 +0xf runtime.goexit() /root/hg/go/src/pkg/runtime/proc.c:246 I don't think your function is designed for that. Is it?
Sign in to reply to this message.
http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c File src/pkg/runtime/windows/thread.c (right): http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c... src/pkg/runtime/windows/thread.c:61: runtime·stdcall(runtime·DuplicateHandle, 7, (uintptr)-1, (uintptr)-2, (uintptr)-1, &m->thandle, (uintptr)0, (uintptr)0, (uintptr)2); On 2011/09/07 07:04:22, brainman wrote: > Please, use name consts here, or some comments for the rest of use <g>. What > handle is -2? GetCurrentThread() always returns (HANDLE)(-2). It is a "pseudohandle". http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c... src/pkg/runtime/windows/thread.c:352: r = (Context*)((uintptr)(&rbuf[15]) & ~15); On 2011/09/07 07:04:22, brainman wrote: > Again, please, say here that you are just aligning r. This came from my code as is, so I answer. GetThreadContext parameter must be 16-bytes aligned. In Microsoft C it looks like: typedef struct DECLSPEC_ALIGN(16) _CONTEXT { .... }; http://msdn.microsoft.com/en-us/library/ms679284(v=vs.85).aspx I suppose, this requirement is needed for storing XMM registers into CONTEXT, which have to be 16-byte aligned in memory. It is the same reason why we align the stack doing a syscall (oh, well, now we do in just by substracting 0x60 from SP, but try to substract inncent-looking 0x68 and it will eventually crash), Crypto API fuctions uses XMM on the stack and wants the stack pointer to be 16-bytes aligned. As we do not ask for XMM registers content here (only CONTEXT_CONTROL), the alignment may be unnecessary here. But this assumption has to be carefully tested.
Sign in to reply to this message.
On 2011/09/07 07:20:18, jp wrote: > > GetCurrentThread() always returns (HANDLE)(-2). > It is a "pseudohandle". I didn't know that. And I will forget tomorrow. Please, add a comment about it. Or use a MS const, if there is one. Something that I could refer to next time I will look at this code. > > ..., the > alignment may be unnecessary here. But this assumption has to be carefully > tested. No argument here. But I didn't see anyone aligning their data in C this way. Perhaps it is me, but comment here will stop me wandering. If you think that comment is not necessary, just ignore my request. Alex
Sign in to reply to this message.
http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c File src/pkg/runtime/windows/thread.c (right): http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c... src/pkg/runtime/windows/thread.c:374: static void *timer; Maybe runtime·lock() or another mutex to protect 'static void *timer;' and avoid the risk of creating two timers ? http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c... src/pkg/runtime/windows/thread.c:377: if(hz == 0) { if(timer != 0) { // old timer to be deleted even if hz != 0 http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c... src/pkg/runtime/windows/thread.c:380: } else { } if(hz != 0) {
Sign in to reply to this message.
On 2011/09/07 07:30:49, brainman wrote: > But I didn't see anyone aligning their data in C this way. http://stackoverflow.com/questions/4696543/getthreadcontext-fails-after-a-suc...
Sign in to reply to this message.
http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/amd64/sys.s File src/pkg/runtime/windows/amd64/sys.s (right): http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/amd64/sy... src/pkg/runtime/windows/amd64/sys.s:137: PUSHQ 32(BP) What is suppose to be in 32(BP) ? The parameter Windows ran the thread with ? It was in CX (by amd64 calling convention)
Sign in to reply to this message.
http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/386/sys.s File src/pkg/runtime/windows/386/sys.s (right): http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/386/sys.... src/pkg/runtime/windows/386/sys.s:121: SUBL $512, SP // space for M I suggest that you add uintptr end[0]; to the bottom of both struct M and struct G. Then you can use SUBL $m_end, SP and SUBL $g_end, SP to carve out the right amount of space.
Sign in to reply to this message.
PTAL: I've changed it so that a timer is run for each M, due to the following comment in setcpuprofilerate(): // Stop profiler on this cpu so that it is safe to lock prof. // if a profiling signal came in while we had prof locked, // it would deadlock. runtime·resetcpuprofiler(0); The race is that one M stops the timer and enters the prof lock. Another M starts the timer again, and the timer fires. Now no-one can enter sigprof() because the first M still has the prof lock. http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/386/sys.s File src/pkg/runtime/windows/386/sys.s (right): http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/386/sys.... src/pkg/runtime/windows/386/sys.s:102: CALL runtime·externalthreadhandler(SB) On 2011/09/07 00:26:24, jp wrote: > PUSHL $runtime·ctrlhandler1(SB) > JMP runtime·externalthreadhandler(SB) > ? We need to return to Windows eventually. http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/386/sys.... src/pkg/runtime/windows/386/sys.s:121: SUBL $512, SP // space for M On 2011/09/07 18:28:36, rsc wrote: > I suggest that you add > > uintptr end[0]; > > to the bottom of both struct M and struct G. > Then you can use > > SUBL $m_end, SP > > and > > SUBL $g_end, SP > > to carve out the right amount of space. Done. http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/386/sys.... src/pkg/runtime/windows/386/sys.s:145: MOVL 0(SP), CX On 2011/09/07 07:04:22, brainman wrote: > I don't think you have tested 386 version. > I think you just want RET here, otherwise it crashes after first call into this > function. Done. http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/amd64/sys.s File src/pkg/runtime/windows/amd64/sys.s (right): http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/amd64/sy... src/pkg/runtime/windows/amd64/sys.s:107: RET On 2011/09/07 00:26:24, jp wrote: > JMP runtime·externalthreadhandler(SB) > ? I'm not sure what you mean? We need to eventually return to Windows. http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/amd64/sy... src/pkg/runtime/windows/amd64/sys.s:137: PUSHQ 32(BP) On 2011/09/07 07:50:58, jp wrote: > What is suppose to be in 32(BP) ? > The parameter Windows ran the thread with ? > It was in CX (by amd64 calling convention) Thanks. I've added a spill in the caller just in case. http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c File src/pkg/runtime/windows/thread.c (right): http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/thread.c... src/pkg/runtime/windows/thread.c:366: if((uintptr)runtime·stdcall(runtime·ResumeThread, 1, m->thandle) == -1) On 2011/09/07 00:26:24, jp wrote: > ResumeThread may be called before runtime·dosigprof() to minimize the stop time. That would be unsafe. The snapshot includes a stack trace, so we don't want it to be running while we do that.
Sign in to reply to this message.
LGTM. Nice.
Sign in to reply to this message.
http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/amd64/sys.s File src/pkg/runtime/windows/amd64/sys.s (right): http://codereview.appspot.com/4960057/diff/4/src/pkg/runtime/windows/amd64/sy... src/pkg/runtime/windows/amd64/sys.s:107: RET On 2011/09/07 22:40:23, hector wrote: > On 2011/09/07 00:26:24, jp wrote: > > JMP runtime·externalthreadhandler(SB) > > ? > > I'm not sure what you mean? We need to eventually return to Windows. I mean runtime·externalthreadhandler(SB) to be not a C-compatible function (it is never called from C) but a common part of runtime·ctrlhandler(SB) and runtime·timerhandler(SB). Thus, RET in runtime·externalthreadhandler(SB) will return to Windows. TEXT runtime·ctrlhandler(SB),7,$8 MOVQ $runtime·ctrlhandler1(SB), CX JMP start_of_the_common_part TEXT runtime·timerhandler(SB),7,$8 MOVQ $runtime·timerhandler1(SB), CX start_of_the_common_part: MOVQ CX, 0(SP) //TEXT runtime·externalthreadhandler(SB),7,$0 Nevermind, this optimization will work only on amd64, there are different RETs on 386. http://codereview.appspot.com/4960057/diff/14003/src/pkg/runtime/windows/386/... File src/pkg/runtime/windows/386/sys.s (right): http://codereview.appspot.com/4960057/diff/14003/src/pkg/runtime/windows/386/... src/pkg/runtime/windows/386/sys.s:104: JMP CX looks like POPL CX // to any register RET 4 http://codereview.appspot.com/4960057/diff/14003/src/pkg/runtime/windows/386/... src/pkg/runtime/windows/386/sys.s:111: JMP CX looks like POPL CX // to any register RET 8
Sign in to reply to this message.
Can we hold off submitting this until I check that WT_EXECUTEINPERSISTENTTHREAD is not a better flag for CreateTimerQueueTimer? I suspect the snapshot is not cheap, for example see comment in pprof.go: // at more than about 500 Hz, and our processing of the // signal is not cheap (mostly getting the stack trace). http://codereview.appspot.com/4960057/diff/14003/src/pkg/runtime/windows/386/... File src/pkg/runtime/windows/386/sys.s (right): http://codereview.appspot.com/4960057/diff/14003/src/pkg/runtime/windows/386/... src/pkg/runtime/windows/386/sys.s:104: JMP CX On 2011/09/08 07:24:15, jp wrote: > looks like > POPL CX // to any register > RET 4 Unfortunately the assembler doesn't support RETN n instructions.
Sign in to reply to this message.
Ok I've uploaded a new diff, hopefully this'll be good enough. PTAL On 8 September 2011 11:40, <hectorchu@gmail.com> wrote: > Can we hold off submitting this until I check that > WT_EXECUTEINPERSISTENTTHREAD is not a better flag for > CreateTimerQueueTimer? I suspect the snapshot is not cheap, for example > see comment in pprof.go: > > // at more than about 500 Hz, and our processing of the > // signal is not cheap (mostly getting the stack trace). > > > http://codereview.appspot.com/4960057/diff/14003/src/pkg/runtime/windows/386/... > File src/pkg/runtime/windows/386/sys.s (right): > > http://codereview.appspot.com/4960057/diff/14003/src/pkg/runtime/windows/386/... > src/pkg/runtime/windows/386/sys.s:104: JMP CX > On 2011/09/08 07:24:15, jp wrote: >> >> looks like >> POPL CX // to any register >> RET 4 > > Unfortunately the assembler doesn't support RETN n instructions. > > http://codereview.appspot.com/4960057/ >
Sign in to reply to this message.
On 2011/09/08 22:54:07, hector wrote: > Ok I've uploaded a new diff, hopefully this'll be good enough. > Would you mind to explain why you decided to use CreateTimerQueue in this last change. I suspect, it is performance related, but I couldn't find any specific details. I did find http://www.virtualdub.org/blog/pivot/entry.php?id=272 on the other hand. Perhaps, it is simpler just start a new thread ourselves and use Sleep() with precise time calculations. Alex
Sign in to reply to this message.
On Sep 9, 2011 2:19 AM, <alex.brainman@gmail.com> wrote: > > On 2011/09/08 22:54:07, hector wrote: >> >> Ok I've uploaded a new diff, hopefully this'll be good enough. > > > > Would you mind to explain why you decided to use CreateTimerQueue in > this last change. I suspect, it is performance related, but I couldn't > find any specific details. Timer queues are used to logically group related timers. I don't want to allow other dlls to mix their timers with our ones, and I suspect Windows creates a waitable timer for each queue. > I did find http://www.virtualdub.org/blog/pivot/entry.php?id=272 on the > other hand. Perhaps, it is simpler just start a new thread ourselves and > use Sleep() with precise time calculations. I think using the MM timers requires more coding effort for no discernible benefit to timing accuracy. > Alex > > > http://codereview.appspot.com/4960057/
Sign in to reply to this message.
LGTM. Anyone else wants to comment?
Sign in to reply to this message.
http://codereview.appspot.com/4960057/diff/24001/src/pkg/runtime/windows/thre... File src/pkg/runtime/windows/thread.c (right): http://codereview.appspot.com/4960057/diff/24001/src/pkg/runtime/windows/thre... src/pkg/runtime/windows/thread.c:361: if((uintptr)runtime·stdcall(runtime·SuspendThread, 1, m->hthread) == -1) The return type is int32. http://codereview.appspot.com/4960057/diff/24001/src/pkg/runtime/windows/thre... src/pkg/runtime/windows/thread.c:367: if((uintptr)runtime·stdcall(runtime·GetThreadContext, 2, m->hthread, r) == 0) The return type is int32. http://codereview.appspot.com/4960057/diff/24001/src/pkg/runtime/windows/thre... src/pkg/runtime/windows/thread.c:371: if((uintptr)runtime·stdcall(runtime·ResumeThread, 1, m->hthread) == -1) The return type is int32. http://codereview.appspot.com/4960057/diff/24001/src/pkg/runtime/windows/thre... src/pkg/runtime/windows/thread.c:388: ok = (uintptr)runtime·stdcall(runtime·CreateTimerQueueTimer, 7, The return type is int32.
Sign in to reply to this message.
http://codereview.appspot.com/4960057/diff/24001/src/pkg/runtime/windows/thre... File src/pkg/runtime/windows/thread.c (right): http://codereview.appspot.com/4960057/diff/24001/src/pkg/runtime/windows/thre... src/pkg/runtime/windows/thread.c:360: runtime·lock(&m->proflock); Why we need this lock?
Sign in to reply to this message.
http://codereview.appspot.com/4960057/diff/24001/src/pkg/runtime/windows/thre... File src/pkg/runtime/windows/thread.c (right): http://codereview.appspot.com/4960057/diff/24001/src/pkg/runtime/windows/thre... src/pkg/runtime/windows/thread.c:360: runtime·lock(&m->proflock); On 2011/09/09 10:57:30, dvyukov wrote: > Why we need this lock? In case two worker threads run this function on the same m concurrently, which is possible because the timer thread does not wait for a previous callback to finish before dispatching the next tick. In that case the first thread could resume m just after the second thread suspends it.
Sign in to reply to this message.
On 2011/09/07 22:40:23, hector wrote: > PTAL: > > I've changed it so that a timer is run for each M, due to the following comment > in setcpuprofilerate(): > > // Stop profiler on this cpu so that it is safe to lock prof. > // if a profiling signal came in while we had prof locked, > // it would deadlock. > runtime·resetcpuprofiler(0); > > The race is that one M stops the timer and enters the prof lock. Another M > starts the timer again, and the timer fires. Now no-one can enter sigprof() > because the first M still has the prof lock. I do not see a problem here. The first M will release prof lock eventually, and the program will continue without problems. The problem is with UNIX *signals*. Namely, an M holds prof lock, and *it* receives prof signal, which tried to reacquire prof lock recursively, the lock is not recursive, so the program deadlocks. Win32 timers fire in separate threads, there is no such problem. As far as I see one time will do. Does timer approach provides sufficient preciseness? I would consider the following approach. Create single dedicated high priority thread. The thread sleeps for the required duration, then wakes up, iterates over all Ms and obtains their contexts.
Sign in to reply to this message.
On 2011/09/09 11:08:56, hector wrote: > http://codereview.appspot.com/4960057/diff/24001/src/pkg/runtime/windows/thre... > File src/pkg/runtime/windows/thread.c (right): > > http://codereview.appspot.com/4960057/diff/24001/src/pkg/runtime/windows/thre... > src/pkg/runtime/windows/thread.c:360: runtime·lock(&m->proflock); > On 2011/09/09 10:57:30, dvyukov wrote: > > Why we need this lock? > > In case two worker threads run this function on the same m concurrently, which > is possible because the timer thread does not wait for a previous callback to > finish before dispatching the next tick. In that case the first thread could > resume m just after the second thread suspends it. As far as I see, it is not a problem. Suspend/Resume are recursive: If the function succeeds, the return value is the thread's previous suspend count... Each thread has a suspend count. If the suspend count is greater than zero, the thread is suspended; otherwise, the thread is not suspended and is eligible for execution.
Sign in to reply to this message.
On 2011/09/09 11:15:02, dvyukov wrote: > Does timer approach provides sufficient preciseness? Also it would have sense to set up timer resolution using timeBeginPeriod() "This function affects a global Windows setting. Windows uses the lowest value (that is, highest resolution) requested by any process. Setting a higher resolution can improve the accuracy of time-out intervals in wait functions. However, it can also reduce overall system performance, because the thread scheduler switches tasks more often."
Sign in to reply to this message.
On Fri, Sep 9, 2011 at 3:36 PM, <jp@webmaster.ms> wrote: > On 2011/09/09 11:15:02, dvyukov wrote: > > Does timer approach provides sufficient preciseness? >> > > Also it would have sense to set up timer resolution using > timeBeginPeriod() > > "This function affects a global Windows setting. Windows uses the lowest > value (that is, highest resolution) requested by any process. Setting a > higher resolution can improve the accuracy of time-out intervals in wait > functions. However, it can also reduce overall system performance, > because the thread scheduler switches tasks more often." > > Default Win32 scheduling interval seems to be 15ms, if we want more we need to increase frequency. To elaborate on the preciseness issue. If we have only 1 hardware thread, and the program has some idle intervals (periods when it does not occupy the M), prof timers are likely to be scheduled exactly during that intervals, and so it will observe that the M is always blocked in scheduler. Higher priority of the profiling thread may solve the potential issue. However all that requires further investigation.
Sign in to reply to this message.
On 2011/09/09 11:15:02, dvyukov wrote: > I do not see a problem here. The first M will release prof lock eventually, and > the program will continue without problems. I don't believe this is true. Here's the sequence again, assuming one timer for all threads: 1. m1 enters setcpuprofilerate, calls resetcpuprofiler(0) to stop the timer. 2. m2 is already in setcpuprofilerate and calls resetcpuprofiler(100), starting the timer. 3. m1 calls runtime·lock(&prof), locking prof. 4. The timer fires and all m's are suspended. It calls runtime·sigprof for each m. 5. In runtime·sigprof it tries to runtime·lock(&prof), but deadlocks because m1 is suspended and still holds the prof lock. > Does timer approach provides sufficient preciseness? I believe timer queues are sufficiently granular and timely. We don't require real-time precision, and I think Linux itself is not sufficiently granular, which I gathered from this comment in runtime/pprof/pprof.go: // The runtime routines allow a variable profiling rate, // but in practice operating systems cannot trigger signals // at more than about 500 Hz, and our processing of the // signal is not cheap (mostly getting the stack trace). // 100 Hz is a reasonable choice: it is frequent enough to // produce useful data, rare enough not to bog down the // system, and a nice round number to make it easy to // convert sample counts to seconds. Instead of requiring // each client to specify the frequency, we hard code it. > I would consider the following approach. > Create single dedicated high priority thread. The thread sleeps for the required > duration, then wakes up, iterates over all Ms and obtains their contexts. This implies one timer for all threads. I have demonstrated above that this can deadlock. > As far as I see, it is not a problem. Suspend/Resume are recursive: > If the function succeeds, the return value is the thread's previous suspend > count... > Each thread has a suspend count. If the suspend count is greater than zero, the > thread is suspended; otherwise, the thread is not suspended and is eligible for > execution. The problem with the scenario I outlined in the previous email is that if the first thread resumes m while the second thread just suspended it, then the second thread will try to read g's stack while m runs g.
Sign in to reply to this message.
On Fri, Sep 9, 2011 at 4:02 PM, <hectorchu@gmail.com> wrote: > On 2011/09/09 11:15:02, dvyukov wrote: > >> I do not see a problem here. The first M will release prof lock >> > eventually, and > >> the program will continue without problems. >> > > I don't believe this is true. Here's the sequence again, assuming one > timer for all threads: > 1. m1 enters setcpuprofilerate, calls resetcpuprofiler(0) to stop the > timer. > 2. m2 is already in setcpuprofilerate and calls resetcpuprofiler(100), > starting the timer. > 3. m1 calls runtime·lock(&prof), locking prof. > 4. The timer fires and all m's are suspended. It calls runtime·sigprof > for each m. > 5. In runtime·sigprof it tries to runtime·lock(&prof), but deadlocks > because m1 is suspended and still holds the prof lock. Youre right. But then can't the following scenario happen? 1. m1 calls resetcpuprofiler(100) 2. prof timer callback for m1 started executing 3. m1 calls resetcpuprofiler(0) 4. m1 locks prof lock 5. the callback suspends m1 6. the callback tries to lock prof lock 7. deadlock
Sign in to reply to this message.
On Fri, Sep 9, 2011 at 4:02 PM, <hectorchu@gmail.com> wrote: > > Does timer approach provides sufficient preciseness? >> > > I believe timer queues are sufficiently granular and timely. We don't > require real-time precision, and I think Linux itself is not > sufficiently granular, which I gathered from this comment in > runtime/pprof/pprof.go: > I mostly meant the following scenario: --- To elaborate on the preciseness issue. If we have only 1 hardware thread, and the program has some idle intervals (periods when it does not occupy the M), prof timers are likely to be scheduled exactly during that intervals, and so it will observe that the M is always blocked in scheduler. Higher priority of the profiling thread may solve the potential issue. However all that requires further investigation. ---
Sign in to reply to this message.
On Fri, Sep 9, 2011 at 4:02 PM, <hectorchu@gmail.com> wrote: > >> As far as I see, it is not a problem. Suspend/Resume are recursive: >> If the function succeeds, the return value is the thread's previous >> > suspend > >> count... >> Each thread has a suspend count. If the suspend count is greater than >> > zero, the > >> thread is suspended; otherwise, the thread is not suspended and is >> > eligible for > >> execution. >> > > The problem with the scenario I outlined in the previous email is that > if the first thread resumes m while the second thread just suspended it, > then, it will just decrease thread's suspension count from 2 to 1, thread won't be running. > then the second thread will try to read g's stack while m runs g. > >
Sign in to reply to this message.
On 2011/09/09 12:18:53, dvyukov wrote: > Youre right. But then can't the following scenario happen? > 1. m1 calls resetcpuprofiler(100) > 2. prof timer callback for m1 started executing > 3. m1 calls resetcpuprofiler(0) After this point, the timer thread will cancel the timer. It will also wait for any callbacks to finish. > 4. m1 locks prof lock > 5. the callback suspends m1 Now this isn't possible because the callback will not be running. > To elaborate on the preciseness issue. If we have only 1 hardware thread, > and the program has some idle intervals (periods when it does not occupy the > M), prof timers are likely to be scheduled exactly during that intervals, > and so it will observe that the M is always blocked in scheduler. This assumes the worker threads run at a lower priority? Here is how I understand it: 1. Timer thread runs a waitable timer for each timer queue and waits on it. I assume timer thread is at least normal priority. 2. When the timer is signalled, the timer thread will wake up and then queue an APC to a ready worker thread. 3. The worker thread dequeues the APC which will end up calling our function. Hence I do not see the reason why the worker threads are more likely to be scheduled during idle periods, unless they are running with lower priority. > then, it will just decrease thread's suspension count from 2 to 1, thread > won't be running. You are correct, I will remove the prof lock from M.
Sign in to reply to this message.
On Fri, Sep 9, 2011 at 5:01 PM, <hectorchu@gmail.com> wrote: > On 2011/09/09 12:18:53, dvyukov wrote: > >> Youre right. But then can't the following scenario happen? >> 1. m1 calls resetcpuprofiler(100) >> 2. prof timer callback for m1 started executing >> 3. m1 calls resetcpuprofiler(0) >> > > After this point, the timer thread will cancel the timer. It will also > wait for any callbacks to finish. Youre right again. Don't know why I assumed that timer callbacks can still be running. > > > 4. m1 locks prof lock >> 5. the callback suspends m1 >> > > Now this isn't possible because the callback will not be running. > > > To elaborate on the preciseness issue. If we have only 1 hardware >> > thread, > >> and the program has some idle intervals (periods when it does not >> > occupy the > >> M), prof timers are likely to be scheduled exactly during that >> > intervals, > >> and so it will observe that the M is always blocked in scheduler. >> > > This assumes the worker threads run at a lower priority? > Here is how I understand it: > 1. Timer thread runs a waitable timer for each timer queue and waits on > it. I assume timer thread is at least normal priority. > 2. When the timer is signalled, the timer thread will wake up and then > queue an APC to a ready worker thread. > 3. The worker thread dequeues the APC which will end up calling our > function. > Yeah, but all that can happen only when Go thread is not running. That is, it is either idle/blocked or preempted. Now the question is will OS preempt a thread of the same priority to run another thread? May you please profile the following program and post the results: func TestFoobar(t *testing.T) { for i := 0; i < 3e2; i++ { start := time. Nanoseconds() for time.Nanoseconds() - start < 3e6 { for j := 0; j < 1e3; j++ { func() { }() } for j := 0; j < 1e3; j++ { func() { }() } } time.Sleep(10e6) } } Linux profiler says: 45 50.6% 50.6% 89 100.0% _xtest_.TestFoobar 23 25.8% 76.4% 23 25.8% _xtest_._func_009 21 23.6% 100.0% 21 23.6% _xtest_._func_008 > Hence I do not see the reason why the worker threads are more likely to > be scheduled during idle periods, unless they are running with lower > priority. > >
Sign in to reply to this message.
On Fri, Sep 9, 2011 at 5:48 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Fri, Sep 9, 2011 at 5:01 PM, <hectorchu@gmail.com> wrote: > >> On 2011/09/09 12:18:53, dvyukov wrote: >> >>> Youre right. But then can't the following scenario happen? >>> 1. m1 calls resetcpuprofiler(100) >>> 2. prof timer callback for m1 started executing >>> 3. m1 calls resetcpuprofiler(0) >>> >> >> After this point, the timer thread will cancel the timer. It will also >> wait for any callbacks to finish. > > > Youre right again. Don't know why I assumed that timer callbacks can still > be running. > > >> >> >> 4. m1 locks prof lock >>> 5. the callback suspends m1 >>> >> >> Now this isn't possible because the callback will not be running. >> >> >> To elaborate on the preciseness issue. If we have only 1 hardware >>> >> thread, >> >>> and the program has some idle intervals (periods when it does not >>> >> occupy the >> >>> M), prof timers are likely to be scheduled exactly during that >>> >> intervals, >> >>> and so it will observe that the M is always blocked in scheduler. >>> >> >> This assumes the worker threads run at a lower priority? >> Here is how I understand it: >> 1. Timer thread runs a waitable timer for each timer queue and waits on >> it. I assume timer thread is at least normal priority. >> 2. When the timer is signalled, the timer thread will wake up and then >> queue an APC to a ready worker thread. >> 3. The worker thread dequeues the APC which will end up calling our >> function. >> > > Yeah, but all that can happen only when Go thread is not running. That is, > it is either idle/blocked or preempted. Now the question is will OS preempt > a thread of the same priority to run another thread? > > May you please profile the following program and post the results: > > func TestFoobar(t *testing.T) { > for i := 0; i < 3e2; i++ { > start := time. Nanoseconds() > for time.Nanoseconds() - start < 3e6 { > for j := 0; j < 1e3; j++ { > func() { > }() > } > for j := 0; j < 1e3; j++ { > func() { > }() > } > } > time.Sleep(10e6) > } > } > > Linux profiler says: > 45 50.6% 50.6% 89 100.0% _xtest_.TestFoobar > 23 25.8% 76.4% 23 25.8% _xtest_._func_009 > 21 23.6% 100.0% 21 23.6% _xtest_._func_008 > > ... and it should be run on a singlecore machine. Or alternatively you may put SetProcessAffinityMask() into osinit(), or use TaskManager to restrict affinity to single cpu.
Sign in to reply to this message.
On 2011/09/09 13:01:04, hector wrote: > > ... I will remove the prof lock from M. Do you still plan to do that? Alex
Sign in to reply to this message.
I didn't quite get the same consistent results running Dmitry's test so I started looking at running our own thread and sleeping for appropriate intervals. I'll let you all know how that goes once I've finished evaluating it. On 11 September 2011 09:49, <alex.brainman@gmail.com> wrote: > On 2011/09/09 13:01:04, hector wrote: > >> ... I will remove the prof lock from M. > > Do you still plan to do that? > > Alex > > http://codereview.appspot.com/4960057/ >
Sign in to reply to this message.
Ok I've determined that the most accurate way to run the timer is to use a single thread and a waitable timer, with thread priority set to highest and timer resolution set to 1ms. Please take another look. On 11 September 2011 09:59, Hector Chu <hectorchu@gmail.com> wrote: > I didn't quite get the same consistent results running Dmitry's test > so I started looking at running our own thread and sleeping for > appropriate intervals. I'll let you all know how that goes once I've > finished evaluating it. > > On 11 September 2011 09:49, <alex.brainman@gmail.com> wrote: >> On 2011/09/09 13:01:04, hector wrote: >> >>> ... I will remove the prof lock from M. >> >> Do you still plan to do that? >> >> Alex >> >> http://codereview.appspot.com/4960057/ >> >
Sign in to reply to this message.
LGTM, but I could not test it codereview issue 4960057 is out of date: patch and recent changes conflict (805742b53bf3->af0ac80bbb92) http://codereview.appspot.com/4960057/diff/30002/src/pkg/runtime/windows/thre... File src/pkg/runtime/windows/thread.c (right): http://codereview.appspot.com/4960057/diff/30002/src/pkg/runtime/windows/thre... src/pkg/runtime/windows/thread.c:70: runtime·stdcall(runtime·timeBeginPeriod, 1, (uintptr)1); I would delay it until you really need it. Not every program will by profiled.
Sign in to reply to this message.
Synced with latest, PTAL. http://codereview.appspot.com/4960057/diff/30002/src/pkg/runtime/windows/thre... File src/pkg/runtime/windows/thread.c (right): http://codereview.appspot.com/4960057/diff/30002/src/pkg/runtime/windows/thre... src/pkg/runtime/windows/thread.c:70: runtime·stdcall(runtime·timeBeginPeriod, 1, (uintptr)1); On 2011/09/15 02:45:40, brainman wrote: > I would delay it until you really need it. Not every program will by profiled. Yes, but the tick rate will also affect the accuracy elsewhere, for example time.Sleep(10e6) actually sleeps for 15ms without this line.
Sign in to reply to this message.
On 2011/09/15 06:17:30, hector wrote: > Synced with latest, PTAL. package main import ( "fmt" "log" "os" "time" "runtime/pprof" ) func main() { f, err := os.Create("a.prof") if err != nil { log.Fatal(err) } pprof.StartCPUProfile(f) defer pprof.StopCPUProfile() time.Sleep(3e9) fmt.Printf("All good\n") } prints: panic during panic windows/386 > Yes, but the tick rate will also affect the accuracy elsewhere, for example > time.Sleep(10e6) actually sleeps for 15ms without this line. But, according to doco, it is "global" setting. Should we change it, unless we have to? Alex
Sign in to reply to this message.
I am worried about this approach. This is using real time ticks to do the profiling, but if you do that you end up overcounting things that are just sitting there sleeping. This is the same thing 6prof does, and it does not produce terribly useful aggregate results. If you profile a mostly idle program for 30 seconds at 10 Hz, you should get significantly fewer than 300 samples. The profiling on Linux uses an execution-based timer: every time the whole process uses up another 10ms of CPU time (or whatever it is), the thread that was responsible for crossing that line gets a signal and causes a single profiling record. This way the executing threads get profiled - and only in proportion to how much they execute - and the ones that just sit there are ignored. I looked around the MSDN docs and I agree that there's no apparent way to do what Linux does. Maybe we can adjust the current code if we make it not include goroutines that are not executing. The easiest way to do that appears to be to refine the condition in profilem to if(gp != nil && gp != mp->g0 && gp->status != Gsyscall) Russ
Sign in to reply to this message.
On 15 September 2011 07:49, <alex.brainman@gmail.com> wrote: > panic during panic > windows/386 Thanks. I found and fixed a couple of issues, you shouldn't see this panic anymore. > But, according to doco, it is "global" setting. Should we change it, > unless we have to? I think we should, because programs written on other OSes might change their timing when we run them on Windows. The timing error can be significant if time.Sleep() is called in a loop. We shouldn't make people insert a syscall(timeBeginPeriod) if they want better accuracy. iTunes also calls this function, and programs using Sleep can change their timing depending on whether iTunes is running or not. On 15 September 2011 16:43, Russ Cox <rsc@golang.org> wrote: > Maybe we can adjust the current code if we make it not include > goroutines that are not executing. The easiest way to do that > appears to be to refine the condition in profilem to > if(gp != nil && gp != mp->g0 && gp->status != Gsyscall) Thanks, done.
Sign in to reply to this message.
I think I'm stammering, but it LGTM. http://codereview.appspot.com/4960057/diff/39002/src/pkg/runtime/windows/thre... File src/pkg/runtime/windows/thread.c (right): http://codereview.appspot.com/4960057/diff/39002/src/pkg/runtime/windows/thre... src/pkg/runtime/windows/thread.c:391: thread = runtime·atomicloadp(&mp->thread); Why do you prefer that to starting new thread suspended, like you had it before.
Sign in to reply to this message.
On Sep 16, 2011 3:56 AM, <alex.brainman@gmail.com> wrote: > > I think I'm stammering, but it LGTM. Heh :). It almost was good enough until Dmitry came along. Lucky for us he is busy with other work now ;). > http://codereview.appspot.com/4960057/diff/39002/src/pkg/runtime/windows/ > > File src/pkg/runtime/windows/thread.c (right): > > http://codereview.appspot.com/4960057/diff/39002/src/pkg/runtime/windows/thre... > src/pkg/runtime/windows/thread.c:391: thread = > runtime·atomicloadp(&mp->thread); > Why do you prefer that to starting new thread suspended, like you had it > before. Because I realised that starting it suspended wouldn't have stopped the race of the profile thread from reading mp->thread, because the new M is added to allm before newosproc is called. > http://codereview.appspot.com/4960057/
Sign in to reply to this message.
LGTM
Sign in to reply to this message.
On 2011/09/16 06:55:08, hector wrote: > On Sep 16, 2011 3:56 AM, <mailto:alex.brainman@gmail.com> wrote: > > > > I think I'm stammering, but it LGTM. > > Heh :). It almost was good enough until Dmitry came along. Lucky for us he > is busy with other work now ;). :) LGTM It's really unpleasant when profiling results are skewed. What I currently observe on Mac is well... sometimes I can make *some* sense out of it and sometimes I can't.
Sign in to reply to this message.
On Fri, Sep 16, 2011 at 14:09, <dvyukov@google.com> wrote: > It's really unpleasant when profiling results are skewed. What I > currently observe on Mac is well... sometimes I can make *some* sense > out of it and sometimes I can't. pprof is broken on the Mac, full stop. something like what this CL does might help it start working better. russ
Sign in to reply to this message.
On Fri, Sep 16, 2011 at 1:38 PM, Russ Cox <rsc@golang.org> wrote: > On Fri, Sep 16, 2011 at 14:09, <dvyukov@google.com> wrote: > > It's really unpleasant when profiling results are skewed. What I > > currently observe on Mac is well... sometimes I can make *some* sense > > out of it and sometimes I can't. > > pprof is broken on the Mac, full stop. > something like what this CL does might > help it start working better. > > Yeah, but with GOMAXPROCS>1 it produces results that I can make some sense of... sometimes. So for me as a user it's essentially the same as having pprof that is officially declared as working but produces skewed results.
Sign in to reply to this message.
On Fri, Sep 16, 2011 at 16:51, Dmitry Vyukov <dvyukov@google.com> wrote: > Yeah, but with GOMAXPROCS>1 it produces results that I can make some sense > of... sometimes. So for me as a user it's essentially the same as having > pprof that is officially declared as working but produces skewed results. I official declare pprof on OS X not working. :-) It's Apple's fault, too. Russ
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=9c5c0cbadb4d *** runtime: implement pprof support for windows Credit to jp for proof of concept. R=alex.brainman, jp, rsc, dvyukov CC=golang-dev http://codereview.appspot.com/4960057 Committer: Alex Brainman <alex.brainman@gmail.com>
Sign in to reply to this message.
|