| 
 | 
 | 
| Created: 13 years, 2 months ago by Sebastien Paolacci Modified: 13 years, 1 month ago Reviewers: CC: rsc, dvyukov, dave_cheney.net, mikio, bradfitz, remyoudompheng, golang-dev Visibility: Public. | Descriptionnet: spread fd over several pollservers.
Lighten contention without preventing further improvements on pollservers.
Connections are spread over Min(GOMAXPROCS, NumCPU, 8) pollserver instances.
Median of 10 runs, 4 cores @ 3.4GHz, amd/linux-3.2:
BenchmarkTCPOneShot                171917 ns/op   175194 ns/op      1.91%
BenchmarkTCPOneShot-2              101413 ns/op   109462 ns/op      7.94%
BenchmarkTCPOneShot-4               91796 ns/op    35712 ns/op    -61.10%
BenchmarkTCPOneShot-6               90938 ns/op    30607 ns/op    -66.34%
BenchmarkTCPOneShot-8               90374 ns/op    29150 ns/op    -67.75%
BenchmarkTCPOneShot-16             101089 ns/op   111526 ns/op     10.32%
BenchmarkTCPOneShotTimeout         174986 ns/op   178606 ns/op      2.07%
BenchmarkTCPOneShotTimeout-2       101585 ns/op   110678 ns/op      8.95%
BenchmarkTCPOneShotTimeout-4        91547 ns/op    35931 ns/op    -60.75%
BenchmarkTCPOneShotTimeout-6        91496 ns/op    31019 ns/op    -66.10%
BenchmarkTCPOneShotTimeout-8        90670 ns/op    29531 ns/op    -67.43%
BenchmarkTCPOneShotTimeout-16      101013 ns/op   106026 ns/op      4.96%
BenchmarkTCPPersistent              51731 ns/op    53324 ns/op      3.08%
BenchmarkTCPPersistent-2            32888 ns/op    30678 ns/op     -6.72%
BenchmarkTCPPersistent-4            25751 ns/op    15595 ns/op    -39.44%
BenchmarkTCPPersistent-6            26737 ns/op     9805 ns/op    -63.33%
BenchmarkTCPPersistent-8            26850 ns/op     9730 ns/op    -63.76%
BenchmarkTCPPersistent-16          104449 ns/op   102838 ns/op     -1.54%
BenchmarkTCPPersistentTimeout       51806 ns/op    53281 ns/op      2.85%
BenchmarkTCPPersistentTimeout-2     32956 ns/op    30895 ns/op     -6.25%
BenchmarkTCPPersistentTimeout-4     25994 ns/op    18111 ns/op    -30.33%
BenchmarkTCPPersistentTimeout-6     26679 ns/op     9846 ns/op    -63.09%
BenchmarkTCPPersistentTimeout-8     26810 ns/op     9727 ns/op    -63.72%
BenchmarkTCPPersistentTimeout-16   101652 ns/op   104410 ns/op      2.71%
   Patch Set 1 #Patch Set 2 : diff -r 47e3b5cdff68 https://code.google.com/p/go/ #
      Total comments: 1
      
     Patch Set 3 : diff -r 46e193d130ce https://code.google.com/p/go/ #Patch Set 4 : diff -r ee8276dbea49 https://code.google.com/p/go/ #
      Total comments: 2
      
     Patch Set 5 : diff -r ee8276dbea49 https://code.google.com/p/go/ #Patch Set 6 : diff -r 979c5e5b1308 https://code.google.com/p/go/ #
      Total comments: 5
      
     Patch Set 7 : diff -r 979c5e5b1308 https://code.google.com/p/go/ #Patch Set 8 : diff -r cfbcf8176d26 https://code.google.com/p/go/ #
 MessagesTotal messages: 37 
 PTAL. As already observed by Dmitry, numbers really get crazy (and stable) for any cpu number greater than 8. I didn't investigate around the "Persistent*" variants being slowed-down for cpu=1, it just seems counter intuitive at first. Sebastien 
            
              Sign in to reply to this message.
            
           
 This looks nice, thanks. Is the speedup correlated with number of cores? Should we use runtime.NumCPUs() as the number of poll servers instead of a fixed constant? Dmitriy? http://codereview.appspot.com/6496054/diff/1002/src/pkg/net/fd.go File src/pkg/net/fd.go (right): http://codereview.appspot.com/6496054/diff/1002/src/pkg/net/fd.go#newcode307 src/pkg/net/fd.go:307: if err = pollservers[fd.sysfd%pollN].WaitWrite(fd); err != nil { This calculation seems like it might change over time. Could you please add func (fd *netFD) pollServer() *pollServer { return pollservers[fd.sysfd % pollN] } and then use fd.pollServer().WaitWrite(fd) etc, so that the assignment detail is in just one place? 
            
              Sign in to reply to this message.
            
           
 I'm -1 on this change as it stands. Spreading the fds over a number of pollservers will reduce the latency between the fd becoming ready and the caller being signaled, but I think it is overkill to spawn N pollservers in the case of a simple program that just wants to do a http get, or respond to a few tcp requests. Would it be possible to make the process of spawning (and reaping pollservers) sensitive to the load on the net package ? On Tue, Sep 11, 2012 at 11:22 AM, <rsc@golang.org> wrote: > This looks nice, thanks. Is the speedup correlated with number of cores? > Should we use runtime.NumCPUs() as the number of poll servers instead of > a fixed constant? Dmitriy? > > > > http://codereview.appspot.com/6496054/diff/1002/src/pkg/net/fd.go > File src/pkg/net/fd.go (right): > > http://codereview.appspot.com/6496054/diff/1002/src/pkg/net/fd.go#newcode307 > src/pkg/net/fd.go:307: if err = > pollservers[fd.sysfd%pollN].WaitWrite(fd); err != nil { > This calculation seems like it might change over time. Could you please > add > > > func (fd *netFD) pollServer() *pollServer { > return pollservers[fd.sysfd % pollN] > } > > and then use fd.pollServer().WaitWrite(fd) etc, so that the assignment > detail is in just one place? > > http://codereview.appspot.com/6496054/ 
            
              Sign in to reply to this message.
            
           
 It would probably be okay to create poll servers on demand up to the max? 
            
              Sign in to reply to this message.
            
           
 SGTM On Tue, Sep 11, 2012 at 12:58 PM, Russ Cox <rsc@golang.org> wrote: > It would probably be okay to create poll servers on demand up to the max? 
            
              Sign in to reply to this message.
            
           
 On 2012/08/29 06:53:20, Sebastien Paolacci wrote: > PTAL. > > As already observed by Dmitry, numbers really get crazy (and stable) for any cpu > number greater than 8. Does anybody have an explanation? It seems to correlate with number of connections. Is it a kernel issue? > I didn't investigate around the "Persistent*" variants being slowed-down for > cpu=1, it just seems counter intuitive at first. I think it has to do with the fact that now you have 8 additional polling goroutines. It must go away once you limit number of poll servers. 
            
              Sign in to reply to this message.
            
           
 On 2012/09/11 01:22:15, rsc wrote: > This looks nice, thanks. Is the speedup correlated with number of cores? Should > we use runtime.NumCPUs() as the number of poll servers instead of a fixed > constant? Dmitriy? I think you are right. Because on a 64-way machine one will have the same contention as he has now on an 8-way machine. It should be limited by min(GOMAXPROCS, NumCPU), because it makes no sense to have 8 poll servers with GOMAXPROCS=1 (I believe that is what causes slowdown on BenchmarkTCPPersistent-1). 
            
              Sign in to reply to this message.
            
           
 On 2012/09/11 01:22:15, rsc wrote: > This looks nice, thanks. Is the speedup correlated with number of cores? It doesn't seems to be. On a 4 cores @ 3.4GHz, amd/linux-3.2.23, I have BenchmarkTCPOneShot 173950 ns/op 179293 ns/op 3.07% BenchmarkTCPOneShot-2 102390 ns/op 107469 ns/op 4.96% BenchmarkTCPOneShot-4 92847 ns/op 35918 ns/op -61.31% BenchmarkTCPOneShot-6 92034 ns/op 29586 ns/op -67.85% BenchmarkTCPOneShot-8 91204 ns/op 28002 ns/op -69.30% BenchmarkTCPOneShot-16 101174 ns/op 122650 ns/op 21.23% BenchmarkTCPOneShotTimeout 177841 ns/op 182579 ns/op 2.66% BenchmarkTCPOneShotTimeout-2 103251 ns/op 109657 ns/op 6.20% BenchmarkTCPOneShotTimeout-4 93524 ns/op 36019 ns/op -61.49% BenchmarkTCPOneShotTimeout-6 93358 ns/op 30165 ns/op -67.69% BenchmarkTCPOneShotTimeout-8 92107 ns/op 28542 ns/op -69.01% BenchmarkTCPOneShotTimeout-16 106491 ns/op 122996 ns/op 15.50% BenchmarkTCPPersistent 52489 ns/op 65834 ns/op 25.42% BenchmarkTCPPersistent-2 33433 ns/op 34018 ns/op 1.75% BenchmarkTCPPersistent-4 26408 ns/op 14033 ns/op -46.86% BenchmarkTCPPersistent-6 28270 ns/op 9917 ns/op -64.92% BenchmarkTCPPersistent-8 27838 ns/op 9671 ns/op -65.26% BenchmarkTCPPersistent-16 110090 ns/op 102760 ns/op -6.66% BenchmarkTCPPersistentTimeout 52612 ns/op 65356 ns/op 24.22% BenchmarkTCPPersistentTimeout-2 33480 ns/op 33967 ns/op 1.45% BenchmarkTCPPersistentTimeout-4 26460 ns/op 11179 ns/op -57.75% BenchmarkTCPPersistentTimeout-6 28208 ns/op 9916 ns/op -64.85% BenchmarkTCPPersistentTimeout-8 27660 ns/op 9696 ns/op -64.95% BenchmarkTCPPersistentTimeout-16 103784 ns/op 103534 ns/op -0.24% > Should > we use runtime.NumCPUs() as the number of poll servers instead of a fixed > constant? Dmitriy? That what I used first, but on a 12 core box.. argh 12 is just a weird number. More seriously, there's currently no scalabilty beyond 8 cores, so I arbitrary cut here. I would actually vote for min(NumCPU, 8), and then increase when it will make sense. > http://codereview.appspot.com/6496054/diff/1002/src/pkg/net/fd.go > File src/pkg/net/fd.go (right): > > http://codereview.appspot.com/6496054/diff/1002/src/pkg/net/fd.go#newcode307 > src/pkg/net/fd.go:307: if err = pollservers[fd.sysfd%pollN].WaitWrite(fd); err > != nil { > This calculation seems like it might change over time. Could you please add > > func (fd *netFD) pollServer() *pollServer { > return pollservers[fd.sysfd % pollN] > } > > and then use fd.pollServer().WaitWrite(fd) etc, so that the assignment detail is > in just one place? Done. 
            
              Sign in to reply to this message.
            
           
 On 2012/09/11 02:58:02, rsc wrote: > It would probably be okay to create poll servers on demand up to the max? A pollServer, despite its name, might not be a so heavy beast. I'll try to measure it tomorrow (out of curiosity), and go for a lazy instantiation unless someone really advocates the opposite. Sebastien 
            
              Sign in to reply to this message.
            
           
 On 2012/09/11 03:25:12, dvyukov wrote: > On 2012/08/29 06:53:20, Sebastien Paolacci wrote: > > PTAL. > > > > As already observed by Dmitry, numbers really get crazy (and stable) for any > cpu > > number greater than 8. > > Does anybody have an explanation? It seems to correlate with number of > connections. Is it a kernel issue? Kernel is the easy culprit.. but I honestly first thought about Go's scheduler;) . I have the very same behavior on both 2.6.32 and 3.2 kernels, but didn't found any time to further investigate. > > I didn't investigate around the "Persistent*" variants being slowed-down for > > cpu=1, it just seems counter intuitive at first. > > I think it has to do with the fact that now you have 8 additional polling > goroutines. It must go away once you limit number of poll servers. I think I still don't get it. I'm puzzled about how the connection persistence "alone" can affect that patch so much. It obviously must have some explanation, but it is not natural to me. I'll run some more tests (the patch was actually left alone since the initial upload). Sebastien 
            
              Sign in to reply to this message.
            
           
 On 2012/09/11 03:29:55, dvyukov wrote: > It should be limited by min(GOMAXPROCS, NumCPU), because it makes no sense to > have 8 poll servers with GOMAXPROCS=1 (I believe that is what causes slowdown on > BenchmarkTCPPersistent-1). I think we should avoid GOMAXPROCS and use something more static. Every process that is playing with GOMAXPROC, e.g `go test', would otherwise miss the point (and I would avoid dynamically reassigning fds). I would consider that anything beyond NumCPU is useless, that anything smaller is not really damaging, and that anything greater than 8 currently doesn't bring any improvements. Min(NumCPU, 8) + lazy instantiation as suggested by Dave/Russ ? Sebastien 
            
              Sign in to reply to this message.
            
           
 Hello rsc@golang.org, dvyukov@google.com, dave@cheney.net (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://code.google.com/p/go/ 
            
              Sign in to reply to this message.
            
           
 On 2012/09/11 19:30:26, Sebastien Paolacci wrote: > On 2012/09/11 02:58:02, rsc wrote: > > It would probably be okay to create poll servers on demand up to the max? > > A pollServer, despite its name, might not be a so heavy beast. I'll try to > measure it tomorrow (out of curiosity), and go for a lazy instantiation unless > someone really advocates the opposite. Each pollServer is a dedicated persistent thread + I think it may benefit from some amount of batching. 
            
              Sign in to reply to this message.
            
           
 On 2012/09/11 19:50:49, Sebastien Paolacci wrote: > On 2012/09/11 03:25:12, dvyukov wrote: > > On 2012/08/29 06:53:20, Sebastien Paolacci wrote: > > > PTAL. > > > > > > As already observed by Dmitry, numbers really get crazy (and stable) for any > > cpu > > > number greater than 8. > > > > Does anybody have an explanation? It seems to correlate with number of > > connections. Is it a kernel issue? > Kernel is the easy culprit.. but I honestly first thought about Go's scheduler;) > . I have the very same behavior on both 2.6.32 and 3.2 kernels, but didn't found > any time to further investigate. I observe it on 3.6.38 with both current and my new scheduler and poll server. > > > I didn't investigate around the "Persistent*" variants being slowed-down for > > > cpu=1, it just seems counter intuitive at first. > > > > I think it has to do with the fact that now you have 8 additional polling > > goroutines. It must go away once you limit number of poll servers. > I think I still don't get it. I'm puzzled about how the connection persistence > "alone" can affect that patch so much. > It obviously must have some explanation, but it is not natural to me. I'll run > some more tests (the patch was actually left alone since the initial upload). Each pollServer is a separate thread and goroutine. The hypothesis is when you have 8 pollServers with GOMAXPROCS=1, there are additional latencies and penalties caused by constant thread and goroutine switching where each goroutine does very little work. 
            
              Sign in to reply to this message.
            
           
 On 2012/09/11 20:05:38, Sebastien Paolacci wrote: > On 2012/09/11 03:29:55, dvyukov wrote: > > It should be limited by min(GOMAXPROCS, NumCPU), because it makes no sense to > > have 8 poll servers with GOMAXPROCS=1 (I believe that is what causes slowdown > on > > BenchmarkTCPPersistent-1). > I think we should avoid GOMAXPROCS and use something more static. Every process > that is playing with GOMAXPROC, e.g `go test', would otherwise miss the point > (and I would avoid dynamically reassigning fds). > > I would consider that anything beyond NumCPU is useless, that anything smaller > is not really damaging, and that anything greater than 8 currently doesn't bring > any improvements. > > Min(NumCPU, 8) + lazy instantiation as suggested by Dave/Russ ? 25% performance penalty does not look particularly cool. I would experiment with Min(GOMAXPROCS, NumCPU, 8). I think you do not need to re-distribute fds, just grow up to Min(GOMAXPROCS, NumCPU, 8) using current GOMAXPROCS value. I do not care a lot about programs that "play with GOMAXPROCS" (especially about go test). 
            
              Sign in to reply to this message.
            
           
 On 2012/09/16 05:09:07, dvyukov wrote: > 25% performance penalty does not look particularly cool. Indeed it is weird, fixed. The point was more about understanding why it just impacted the "Persistent" benchs, but decreasing the number of pollservers does actually fix the situation. > I would experiment with Min(GOMAXPROCS, NumCPU, 8). Done. >I think you do not need to re-distribute fds, just grow up > to Min(GOMAXPROCS, NumCPU, 8) using current GOMAXPROCS ? > value. I incidentally realized that assignment stability was not required at all, so the various requirements can be met (i.e changing GOMAXPROCS + Dave's lazy instantiation). >I do not care a lot about programs that "play with GOMAXPROCS"? >(especially about go test). I do care about those programs, I wouldn't even be able to bench that very patch otherwise;) Please take another look. Sebastien 
            
              Sign in to reply to this message.
            
           
 Hello rsc@golang.org, dvyukov@google.com, dave@cheney.net (cc: golang-dev@googlegroups.com), Please take another look. 
            
              Sign in to reply to this message.
            
           
 CL's description updated, here's last patch set result: Median of 10 runs, 4 cores @ 3.4GHz, amd/linux-3.2: BenchmarkTCPOneShot 171917 ns/op 175194 ns/op 1.91% BenchmarkTCPOneShot-2 101413 ns/op 109462 ns/op 7.94% BenchmarkTCPOneShot-4 91796 ns/op 35712 ns/op -61.10% BenchmarkTCPOneShot-6 90938 ns/op 30607 ns/op -66.34% BenchmarkTCPOneShot-8 90374 ns/op 29150 ns/op -67.75% BenchmarkTCPOneShot-16 101089 ns/op 111526 ns/op 10.32% BenchmarkTCPOneShotTimeout 174986 ns/op 178606 ns/op 2.07% BenchmarkTCPOneShotTimeout-2 101585 ns/op 110678 ns/op 8.95% BenchmarkTCPOneShotTimeout-4 91547 ns/op 35931 ns/op -60.75% BenchmarkTCPOneShotTimeout-6 91496 ns/op 31019 ns/op -66.10% BenchmarkTCPOneShotTimeout-8 90670 ns/op 29531 ns/op -67.43% BenchmarkTCPOneShotTimeout-16 101013 ns/op 106026 ns/op 4.96% BenchmarkTCPPersistent 51731 ns/op 53324 ns/op 3.08% BenchmarkTCPPersistent-2 32888 ns/op 30678 ns/op -6.72% BenchmarkTCPPersistent-4 25751 ns/op 15595 ns/op -39.44% BenchmarkTCPPersistent-6 26737 ns/op 9805 ns/op -63.33% BenchmarkTCPPersistent-8 26850 ns/op 9730 ns/op -63.76% BenchmarkTCPPersistent-16 104449 ns/op 102838 ns/op -1.54% BenchmarkTCPPersistentTimeout 51806 ns/op 53281 ns/op 2.85% BenchmarkTCPPersistentTimeout-2 32956 ns/op 30895 ns/op -6.25% BenchmarkTCPPersistentTimeout-4 25994 ns/op 18111 ns/op -30.33% BenchmarkTCPPersistentTimeout-6 26679 ns/op 9846 ns/op -63.09% BenchmarkTCPPersistentTimeout-8 26810 ns/op 9727 ns/op -63.72% BenchmarkTCPPersistentTimeout-16 101652 ns/op 104410 ns/op 2.71% 
            
              Sign in to reply to this message.
            
           
 LGTM 
            
              Sign in to reply to this message.
            
           
 https://codereview.appspot.com/6496054/diff/13002/src/pkg/net/sendfile_linux.go File src/pkg/net/sendfile_linux.go (right): https://codereview.appspot.com/6496054/diff/13002/src/pkg/net/sendfile_linux.... src/pkg/net/sendfile_linux.go:62: if err1 = c.pollServer().WaitWrite(c); err1 == nil { sendfile_freebsd.go needs the same change too. 
            
              Sign in to reply to this message.
            
           
 https://codereview.appspot.com/6496054/diff/13002/src/pkg/net/sendfile_linux.go File src/pkg/net/sendfile_linux.go (right): https://codereview.appspot.com/6496054/diff/13002/src/pkg/net/sendfile_linux.... src/pkg/net/sendfile_linux.go:62: if err1 = c.pollServer().WaitWrite(c); err1 == nil { On 2012/09/19 21:57:11, mikio wrote: > sendfile_freebsd.go needs the same change too. Done, thanks. 
            
              Sign in to reply to this message.
            
           
 LGTM 
            
              Sign in to reply to this message.
            
           
 So if GOMAXPROCS changes at runtime, the pollServer() accessor can lose its *pollServer? Why not just store the *pollServer in the *netFD? GOMAXPROCS changing at runtime used to sound crazy, but it's not uncommon now to adjust the number of VCPUs a VM gets at runtime due to changing demand. On Wed, Sep 19, 2012 at 3:19 PM, <mikioh.mikioh@gmail.com> wrote: > LGTM > > https://codereview.appspot.**com/6496054/<https://codereview.appspot.com/6496... > 
            
              Sign in to reply to this message.
            
           
 Hello Brad,
> So if GOMAXPROCS changes at runtime, the pollServer() accessor can 
> lose its *pollServer?
Yes and no. A same `*netFD' can indeed get buried in different pollServers over
the time, but we don't really care as long as the selected pollServer is
maintained for each single {wait,unblock} 
sequence, which always happens in e.g `fd.pollServer().WaitWrite(fd)'. I/O can't
also happen concurrently thanks to rio/wio mutexes.
All the tests pass, an aborting Close should however be racy if GOMAXPROCS
change in the between.
> Why not just store the *pollServer in the *netFD?
Indeed, seems to be the safest/easiest way to maintain hard coherency at the
*netFD level. Thanks.
> GOMAXPROCS changing at runtime used to sound crazy, but it's not
> uncommon now
It's exactly the purpose of the last patch set, the (quite used) benchmark
framework relies on that feature too.
Sebastien
            
              Sign in to reply to this message.
            
           
 I eventually manage to make `TestTCPListenClose' fail by injecting a dummy `rand.Int() % pollN' selection code in `pollserver()'. Even if it doesn't make so much sense, it should have worked. CL updated with Brad's suggestion, please take another look. Thanks, Sebastien 
            
              Sign in to reply to this message.
            
           
 Thank you for your proposal, but I remain uncomfortable with it. In return for additional scalability, a lot more code has been added in the hot path of newFD which will impact the latency of every accept and dial operation. I am concerned that this is a price that all users of net have to pay even if they are not interested in handling tens of thousands of active connections. https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go File src/pkg/net/fd_unix.go (right): https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go#newc... src/pkg/net/fd_unix.go:285: panic(err) if any error from newPollServer is a panic, newPollServer() itself should panic. 
            
              Sign in to reply to this message.
            
           
 Hello Dave,
I'm not sure I really got your point. It indeed adds few lines of code in the
"hot" `newFD()' (not in read and/or write), but if you look at them you'll find:
 - an access to an already burned `Once' (that was already present),   
 - an additional {modulo + test + two slice lookups} sequence.
Given that anyway going for few syscalls, two/three mutexes, a sleep/wake-up
sequence, and hopefully few real bytes of transfer, I can't see that as a major
latency impact. Benchmarks, given that few percents are really within the
confidence interval for that kind of run, seems decent (to me) even in the cpu=1
situation.
On the other side, handling few tens of thousands of active connections is
honestly not a corner-case.
Cheers,
Sebastien
            
              Sign in to reply to this message.
            
           
 https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go File src/pkg/net/fd_unix.go (right): https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go#newc... src/pkg/net/fd_unix.go:308: startServersOnce[k]() this line may crash if GOMAXPROCS has increased since the size it was during init(), when startServersOnce was initialized. 
            
              Sign in to reply to this message.
            
           
 https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go File src/pkg/net/fd_unix.go (right): https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go#newc... src/pkg/net/fd_unix.go:308: startServersOnce[k]() On 2012/09/20 18:43:18, bradfitz wrote: > this line may crash if GOMAXPROCS has increased since the size it was during > init(), when startServersOnce was initialized. init() does not call GOMAXPROCS, it calls NumCPU() 
            
              Sign in to reply to this message.
            
           
 https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go File src/pkg/net/fd_unix.go (right): https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go#newc... src/pkg/net/fd_unix.go:308: startServersOnce[k]() On 2012/09/20 19:15:11, dvyukov wrote: > On 2012/09/20 18:43:18, bradfitz wrote: > > this line may crash if GOMAXPROCS has increased since the size it was during > > init(), when startServersOnce was initialized. > > init() does not call GOMAXPROCS, it calls NumCPU() sorry, I see. Teaches me to read code on a bus. 
            
              Sign in to reply to this message.
            
           
 https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go File src/pkg/net/fd_unix.go (right): https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go#newc... src/pkg/net/fd_unix.go:303: pollN := runtime.GOMAXPROCS(0) Please move this out of newFD(). This change addresses inefficiency in current pollserver impl. I anticipate that pollserver impl will not only change in future, but also diverge for different OSes. The netFD part of this file implements generic functionality that won't diverge, so please split it from the partitioning logic (most OSes are perfectly able to implement a scalable pollserver, and so won't need it). 
            
              Sign in to reply to this message.
            
           
 On 2012/09/20 19:22:20, dvyukov wrote: https://codereview.appspot.com/6496054/diff/23001/src/pkg/net/fd_unix.go#newc... > src/pkg/net/fd_unix.go:303: pollN := runtime.GOMAXPROCS(0) > Please move this out of newFD(). Done. 
            
              Sign in to reply to this message.
            
           
 > I would consider that anything beyond NumCPU is useless, that anything > smaller is not really damaging, and that anything greater than 8 > currently doesn't bring any improvements. > > Min(NumCPU, 8) + lazy instantiation as suggested by Dave/Russ ? This is okay with me but I would like dvyukov to agree. Russ 
            
              Sign in to reply to this message.
            
           
 On 2012/09/21 14:36:13, rsc wrote: > > I would consider that anything beyond NumCPU is useless, that anything > > smaller is not really damaging, and that anything greater than 8 > > currently doesn't bring any improvements. > > > > Min(NumCPU, 8) + lazy instantiation as suggested by Dave/Russ ? > > This is okay with me but I would like dvyukov to agree. > > Russ Hello Russ, Just for the record, last patch round is stabilizing on `Min(GOMAXPROCS, NumCPU, 8)' with lazy instantiation. CL's description also mentions (but does not explain) that choice. Sebastien 
            
              Sign in to reply to this message.
            
           
 LGTM 
            
              Sign in to reply to this message.
            
           
 Is this going to be submitted? 
            
              Sign in to reply to this message.
            
           
 *** Submitted as http://code.google.com/p/go/source/detail?r=19cf9361902a *** net: spread fd over several pollservers. Lighten contention without preventing further improvements on pollservers. Connections are spread over Min(GOMAXPROCS, NumCPU, 8) pollserver instances. Median of 10 runs, 4 cores @ 3.4GHz, amd/linux-3.2: BenchmarkTCPOneShot 171917 ns/op 175194 ns/op 1.91% BenchmarkTCPOneShot-2 101413 ns/op 109462 ns/op 7.94% BenchmarkTCPOneShot-4 91796 ns/op 35712 ns/op -61.10% BenchmarkTCPOneShot-6 90938 ns/op 30607 ns/op -66.34% BenchmarkTCPOneShot-8 90374 ns/op 29150 ns/op -67.75% BenchmarkTCPOneShot-16 101089 ns/op 111526 ns/op 10.32% BenchmarkTCPOneShotTimeout 174986 ns/op 178606 ns/op 2.07% BenchmarkTCPOneShotTimeout-2 101585 ns/op 110678 ns/op 8.95% BenchmarkTCPOneShotTimeout-4 91547 ns/op 35931 ns/op -60.75% BenchmarkTCPOneShotTimeout-6 91496 ns/op 31019 ns/op -66.10% BenchmarkTCPOneShotTimeout-8 90670 ns/op 29531 ns/op -67.43% BenchmarkTCPOneShotTimeout-16 101013 ns/op 106026 ns/op 4.96% BenchmarkTCPPersistent 51731 ns/op 53324 ns/op 3.08% BenchmarkTCPPersistent-2 32888 ns/op 30678 ns/op -6.72% BenchmarkTCPPersistent-4 25751 ns/op 15595 ns/op -39.44% BenchmarkTCPPersistent-6 26737 ns/op 9805 ns/op -63.33% BenchmarkTCPPersistent-8 26850 ns/op 9730 ns/op -63.76% BenchmarkTCPPersistent-16 104449 ns/op 102838 ns/op -1.54% BenchmarkTCPPersistentTimeout 51806 ns/op 53281 ns/op 2.85% BenchmarkTCPPersistentTimeout-2 32956 ns/op 30895 ns/op -6.25% BenchmarkTCPPersistentTimeout-4 25994 ns/op 18111 ns/op -30.33% BenchmarkTCPPersistentTimeout-6 26679 ns/op 9846 ns/op -63.09% BenchmarkTCPPersistentTimeout-8 26810 ns/op 9727 ns/op -63.72% BenchmarkTCPPersistentTimeout-16 101652 ns/op 104410 ns/op 2.71% R=rsc, dvyukov, dave, mikioh.mikioh, bradfitz, remyoudompheng CC=golang-dev http://codereview.appspot.com/6496054 Committer: Russ Cox <rsc@golang.org> 
            
              Sign in to reply to this message.
            
           | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

