|
|
|
Created:
13 years, 9 months ago by akumar Modified:
13 years, 5 months ago Reviewers:
CC:
rsc, rminnich, npe1, aam, ality, golang-dev Visibility:
Public. |
Descriptionsyscall, os: Fix a fork-exec/wait race in Plan 9.
On Plan 9, only the parent of a given process can enter its wait
queue. When a Go program tries to fork-exec a child process
and subsequently waits for it to finish, the goroutines doing
these two tasks do not necessarily tie themselves to the same
(or any single) OS thread. In the case that the fork and the wait
system calls happen on different OS threads (say, due to a
goroutine being rescheduled somewhere along the way), the
wait() will either return an error or end up waiting for a
completely different child than was intended.
This change forces the fork and wait syscalls to happen in the
same goroutine and ties that goroutine to its OS thread until
the child exits. The PID of the child is recorded upon fork and
exit, and de-queued once the child's wait message has been read.
The Wait API, then, is translated into a synthetic implementation
that simply waits for the requested PID to show up in the queue
and then reads the associated stats.
Patch Set 1 #Patch Set 2 : diff -r 90c9121e26c3 https://go.googlecode.com/hg/ #Patch Set 3 : diff -r 90c9121e26c3 https://go.googlecode.com/hg/ #
Total comments: 26
Patch Set 4 : diff -r aa2f44cc25c0 https://go.googlecode.com/hg/ #
Total comments: 12
Patch Set 5 : diff -r 7ea3674ce4b5 https://code.google.com/p/go #
Total comments: 11
Patch Set 6 : diff -r 43203aa69a8a https://code.google.com/p/go #
Total comments: 19
Patch Set 7 : diff -r c53ac9baac67 https://code.google.com/p/go #
Total comments: 10
Patch Set 8 : diff -r 6519eef2c815 https://code.google.com/p/go #MessagesTotal messages: 27
Hello rsc@golang.org, rminnich@gmail.com, npe@plan9.bell-labs.com (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/
Sign in to reply to this message.
nice! tests now actually pass, wow! go test -short std broke at database/sql. it appears to be a race because go test -short in the directory itself passes. here's what i see: andrey 305474 0:01 0:26 797892K Pread go andrey 305475 0:00 0:00 797892K Tsemacqu go andrey 305476 0:00 0:02 797892K Semacqui go andrey 305486 0:00 0:00 797892K Semacqui go andrey 305487 0:00 0:00 797892K Semacqui go andrey 305794 0:00 0:00 794000K Semacqui sql.test andrey 305795 0:00 0:00 794000K Broken sql.test % acid 305795 /proc/305795/text:386 plan 9 executable /sys/lib/acid/port /sys/lib/acid/386 acid: lstk() <stdin>:2: (error) no stack frame: can't translate address 0xf06bb5fe acid: echo kill > /proc/305795/ctl % acid 305794 /proc/305794/text:386 plan 9 executable /sys/lib/acid/port /sys/lib/acid/386 acid: lstk() runtime.plan9_semacquire()+0x7 /usr/andrey/go/src/pkg/runtime/sys_plan9_386.s:50 runtime.semasleep(ns=0xffffffff)+0x50 /usr/andrey/go/src/pkg/runtime/thread_plan9.c:289 runtime.notesleep(n=0x1061e084)+0xb4 /usr/andrey/go/src/pkg/runtime/lock_sema.c:160 nextgandunlock()+0x19b /usr/andrey/go/src/pkg/runtime/proc.c:635 gp=0x1e2b74 schedule(gp=0x1061a080)+0xb4 /usr/andrey/go/src/pkg/runtime/proc.c:920 runtime.mcall(fn=0x1061a080)+0x34 /usr/andrey/go/src/pkg/runtime/asm_386.s:175 0x1061a080 ?file?:0 acid: % cd pkg/database/sql && go test -short PASS ok database/sql 0.375s % here are the passing tests before that: % time go test std -short '-timeout=120s' ok cmd/api 0.277s ? cmd/cgo [no test files] ok cmd/fix 2.337s ok cmd/go 0.297s ? cmd/godoc [no test files] ok cmd/gofmt 0.330s ? cmd/vet [no test files] ? cmd/yacc [no test files] ok archive/tar 0.153s ok archive/zip 0.690s ok bufio 0.218s ok bytes 0.415s ok compress/bzip2 0.433s ok compress/flate 1.425s ok compress/gzip 0.165s ok compress/lzw 0.550s ok compress/zlib 2.670s ok container/heap 0.143s ok container/list 0.139s ok container/ring 0.146s ? crypto [no test files] ok crypto/aes 0.178s ok crypto/cipher 0.209s ok crypto/des 0.219s ok crypto/dsa 0.300s ok crypto/ecdsa 0.199s ok crypto/elliptic 0.178s ok crypto/hmac 0.157s ok crypto/md5 0.159s ok crypto/rand 0.364s ok crypto/rc4 0.146s ok crypto/rsa 0.513s ok crypto/sha1 0.151s ok crypto/sha256 0.149s ok crypto/sha512 0.153s ok crypto/subtle 0.156s ok crypto/tls 0.623s ok crypto/x509 1.799s ? crypto/x509/pkix [no test files]
Sign in to reply to this message.
This gets us over a major hurdle. LGTM
Sign in to reply to this message.
what I encountered after a day of running go test with this CL in: - go test go/ast always fails with the last process in Pwrite state - occasionally go test will fail in a random place with the last proc in Broken state. the whole runtime appears to be blocked because after the broken process is killed the runtime will fire the (long expired) timeout set up by the go binary itself - if a binary is just being started a keyboard interrupt will occasionally cause: split stack overflow: 0x1096ff3c < 0x10970000 throw: runtime: split stack overflow - i saw one "throw: negative mcpu in scheduler" while running go test attached is a log of go test -short -timeout 30s where the hung processes (in broken state) are manually killed after a couple of minutes.
Sign in to reply to this message.
Yes, there are definitely more issues in the go runtime for Plan 9 to be resolved. :-) But from what I can see, these seem to be orthogonal to what this patch is doing; rather, they're coming up because you're now able to get further into the process. On 21 September 2012 18:01, andrey mirtchovski <mirtchovski@gmail.com> wrote: > what I encountered after a day of running go test with this CL in: > > - go test go/ast always fails with the last process in Pwrite state > - occasionally go test will fail in a random place with the last proc > in Broken state. the whole runtime appears to be blocked because after > the broken process is killed the runtime will fire the (long expired) > timeout set up by the go binary itself > - if a binary is just being started a keyboard interrupt will > occasionally cause: > > split stack overflow: 0x1096ff3c < 0x10970000 > throw: runtime: split stack overflow > > - i saw one "throw: negative mcpu in scheduler" while running go test > > attached is a log of go test -short -timeout 30s where the hung > processes (in broken state) are manually killed after a couple of > minutes.
Sign in to reply to this message.
http://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.go File src/pkg/syscall/exec_plan9.go (right): http://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.go... src/pkg/syscall/exec_plan9.go:519: // Plan 9, only the parent thread can ever enter the s/ever enter the wait queue/wait/ There's no queue. http://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.go... src/pkg/syscall/exec_plan9.go:539: defer runtime.UnlockOSThread() This will happen automatically when the goroutine dies. Delete. http://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.go... src/pkg/syscall/exec_plan9.go:542: forkc <- rval You should probably wait to do this until you have entered the pid in forks. That means you'll need a forkc <- rval in the next if statement too. http://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.go... src/pkg/syscall/exec_plan9.go:544: // No one should be waiting // If fork fails there is nothing to wait for. http://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.go... src/pkg/syscall/exec_plan9.go:546: if err != nil || pid == -1 { err := rval.err first http://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.go... src/pkg/syscall/exec_plan9.go:552: forks[pid] = true You need a lock here. http://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.go... src/pkg/syscall/exec_plan9.go:552: forks[pid] = true You can avoid having two maps by having waitmsg map[int]*Waitmsg and use nil to mean it's running and not done yet. Also, why is pid int? Didn't we just establish that was too short? http://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.go... src/pkg/syscall/exec_plan9.go:553: defer delete(forks, pid) You need a lock here (in the code that runs at defer). Probably better to just do the lock+delete at the bottom. The defer isn't saving anything. http://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan9.go File src/pkg/syscall/syscall_plan9.go (right): http://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan9... src/pkg/syscall/syscall_plan9.go:46: type Processes struct { Please don't export this. type processes struct Also you can put the sync.Mutex and the sync.Cond here. In fact you don't even need the type or the extra variables. var procs struct { once sync.Once sync.RWMutex cond sync.Cond forks map[int]bool waits map[int]Waitmsg } http://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan9... src/pkg/syscall/syscall_plan9.go:59: procs Processes In general please don't put unexported vars in the same block as exported vars. It complicates the godoc output. http://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan9... src/pkg/syscall/syscall_plan9.go:213: func makeProcs() { Please move all this to the other file, so that all the code manipulating procs is in one place. The procs struct too. http://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan9... src/pkg/syscall/syscall_plan9.go:239: return NewError("no forked process with the given pid") NewError("process not found") is fine. http://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan9... src/pkg/syscall/syscall_plan9.go:248: delete(waits, w.Pid) Deleting from the map while holding a reader lock is a no-no. I would just use a Mutex instead of an RWMutex. Everyone needs to write things.
Sign in to reply to this message.
PTAL. https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.go File src/pkg/syscall/exec_plan9.go (right): https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.g... src/pkg/syscall/exec_plan9.go:519: // Plan 9, only the parent thread can ever enter the On 2012/09/24 15:46:11, rsc wrote: > s/ever enter the wait queue/wait/ > There's no queue. Done. https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.g... src/pkg/syscall/exec_plan9.go:539: defer runtime.UnlockOSThread() On 2012/09/24 15:46:11, rsc wrote: > This will happen automatically when the goroutine dies. Delete. Done. https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.g... src/pkg/syscall/exec_plan9.go:542: forkc <- rval On 2012/09/24 15:46:11, rsc wrote: > You should probably wait to do this until you have entered the pid in forks. > That means you'll need a forkc <- rval in the next if statement too. Done. https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.g... src/pkg/syscall/exec_plan9.go:544: // No one should be waiting On 2012/09/24 15:46:11, rsc wrote: > // If fork fails there is nothing to wait for. Done. https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.g... src/pkg/syscall/exec_plan9.go:546: if err != nil || pid == -1 { On 2012/09/24 15:46:11, rsc wrote: > err := rval.err first Done. https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.g... src/pkg/syscall/exec_plan9.go:552: forks[pid] = true On 2012/09/24 15:46:11, rsc wrote: > You need a lock here. Done. https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.g... src/pkg/syscall/exec_plan9.go:552: forks[pid] = true On 2012/09/24 15:46:11, rsc wrote: > You can avoid having two maps by having > > waitmsg map[int]*Waitmsg > > and use nil to mean it's running and not done yet. > > Also, why is pid int? Didn't we just establish that was too short? Done. https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.g... src/pkg/syscall/exec_plan9.go:553: defer delete(forks, pid) On 2012/09/24 15:46:11, rsc wrote: > You need a lock here (in the code that runs at defer). Probably better to just > do the lock+delete at the bottom. The defer isn't saving anything. > Done. Took your suggestion about using just one map; this is no longer needed. https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan... File src/pkg/syscall/syscall_plan9.go (right): https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan... src/pkg/syscall/syscall_plan9.go:46: type Processes struct { On 2012/09/24 15:46:11, rsc wrote: > Please don't export this. > > type processes struct > > Also you can put the sync.Mutex and the sync.Cond here. > In fact you don't even need the type or the extra variables. > > var procs struct { > once sync.Once > sync.RWMutex > cond sync.Cond > forks map[int]bool > waits map[int]Waitmsg > } Done. https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan... src/pkg/syscall/syscall_plan9.go:59: procs Processes On 2012/09/24 15:46:11, rsc wrote: > In general please don't put unexported vars in the same block as exported vars. > It complicates the godoc output. Done. https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan... src/pkg/syscall/syscall_plan9.go:213: func makeProcs() { On 2012/09/24 15:46:11, rsc wrote: > Please move all this to the other file, so that all the code manipulating procs > is in one place. The procs struct too. Done. https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan... src/pkg/syscall/syscall_plan9.go:239: return NewError("no forked process with the given pid") On 2012/09/24 15:46:11, rsc wrote: > NewError("process not found") is fine. Done. https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan... src/pkg/syscall/syscall_plan9.go:248: delete(waits, w.Pid) On 2012/09/24 15:46:11, rsc wrote: > Deleting from the map while holding a reader lock is a no-no. I would just use a > Mutex instead of an RWMutex. Everyone needs to write things. > Done.
Sign in to reply to this message.
ping On 2012/09/30 21:22:13, akumar wrote: > PTAL. > > https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.go > File src/pkg/syscall/exec_plan9.go (right): > > https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.g... > src/pkg/syscall/exec_plan9.go:519: // Plan 9, only the parent thread can ever > enter the > On 2012/09/24 15:46:11, rsc wrote: > > s/ever enter the wait queue/wait/ > > There's no queue. > > Done. > > https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.g... > src/pkg/syscall/exec_plan9.go:539: defer runtime.UnlockOSThread() > On 2012/09/24 15:46:11, rsc wrote: > > This will happen automatically when the goroutine dies. Delete. > > Done. > > https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.g... > src/pkg/syscall/exec_plan9.go:542: forkc <- rval > On 2012/09/24 15:46:11, rsc wrote: > > You should probably wait to do this until you have entered the pid in forks. > > That means you'll need a forkc <- rval in the next if statement too. > > Done. > > https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.g... > src/pkg/syscall/exec_plan9.go:544: // No one should be waiting > On 2012/09/24 15:46:11, rsc wrote: > > // If fork fails there is nothing to wait for. > > Done. > > https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.g... > src/pkg/syscall/exec_plan9.go:546: if err != nil || pid == -1 { > On 2012/09/24 15:46:11, rsc wrote: > > err := rval.err first > > Done. > > https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.g... > src/pkg/syscall/exec_plan9.go:552: forks[pid] = true > On 2012/09/24 15:46:11, rsc wrote: > > You need a lock here. > > Done. > > https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.g... > src/pkg/syscall/exec_plan9.go:552: forks[pid] = true > On 2012/09/24 15:46:11, rsc wrote: > > You can avoid having two maps by having > > > > waitmsg map[int]*Waitmsg > > > > and use nil to mean it's running and not done yet. > > > > Also, why is pid int? Didn't we just establish that was too short? > > Done. > > https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/exec_plan9.g... > src/pkg/syscall/exec_plan9.go:553: defer delete(forks, pid) > On 2012/09/24 15:46:11, rsc wrote: > > You need a lock here (in the code that runs at defer). Probably better to just > > do the lock+delete at the bottom. The defer isn't saving anything. > > > > Done. > > Took your suggestion about using just one map; this is no longer needed. > > https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan... > File src/pkg/syscall/syscall_plan9.go (right): > > https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan... > src/pkg/syscall/syscall_plan9.go:46: type Processes struct { > On 2012/09/24 15:46:11, rsc wrote: > > Please don't export this. > > > > type processes struct > > > > Also you can put the sync.Mutex and the sync.Cond here. > > In fact you don't even need the type or the extra variables. > > > > var procs struct { > > once sync.Once > > sync.RWMutex > > cond sync.Cond > > forks map[int]bool > > waits map[int]Waitmsg > > } > > Done. > > https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan... > src/pkg/syscall/syscall_plan9.go:59: procs Processes > On 2012/09/24 15:46:11, rsc wrote: > > In general please don't put unexported vars in the same block as exported > vars. > > It complicates the godoc output. > > Done. > > https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan... > src/pkg/syscall/syscall_plan9.go:213: func makeProcs() { > On 2012/09/24 15:46:11, rsc wrote: > > Please move all this to the other file, so that all the code manipulating > procs > > is in one place. The procs struct too. > > Done. > > https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan... > src/pkg/syscall/syscall_plan9.go:239: return NewError("no forked process with > the given pid") > On 2012/09/24 15:46:11, rsc wrote: > > NewError("process not found") is fine. > > Done. > > https://codereview.appspot.com/6545051/diff/4001/src/pkg/syscall/syscall_plan... > src/pkg/syscall/syscall_plan9.go:248: delete(waits, w.Pid) > On 2012/09/24 15:46:11, rsc wrote: > > Deleting from the map while holding a reader lock is a no-no. I would just use > a > > Mutex instead of an RWMutex. Everyone needs to write things. > > > > Done.
Sign in to reply to this message.
Getting close. https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.go File src/pkg/syscall/exec_plan9.go (right): https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:516: cond *sync.Cond cond sync.Cond https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:522: procs.cond = sync.NewCond(&procs.Mutex) proc.cond.L = &procs.Mutex https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:569: for wmsg.Pid != pid { Why is this a for loop? There is only one thing to wait for. I expected (no loop): wmsg.err = Await(&wmsg) procs.Lock() waits[wmsg.Pid] = &wmsg procs.cond.Broadcast() procs.Unlock() https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:641: procs.cond.L.Lock() procs.Lock() defer procs.Unlock() https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:651: // Everytime a process exits, check Every time https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/syscall_pla... File src/pkg/syscall/syscall_plan9.go (right): https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/syscall_pla... src/pkg/syscall/syscall_plan9.go:186: err error We can't just add extra unexported fields to data structures like this. Define a separate type and use that instead.
Sign in to reply to this message.
What's the status of this CL? Thanks, Anthony
Sign in to reply to this message.
Hey, sorry, I went out of country and then got caught up in other projects. This and the other patch will be fixed within the next few days. Again, my apologies for letting such things linger. On 5 December 2012 21:48, Russ Cox <rsc@golang.org> wrote: > I think we're waiting for Akshat to address my last round of comments.
Sign in to reply to this message.
I think we're waiting for Akshat to address my last round of comments.
Sign in to reply to this message.
PTAL. https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.go File src/pkg/syscall/exec_plan9.go (right): https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:516: cond *sync.Cond On 2012/11/01 16:40:32, rsc wrote: > cond sync.Cond Done. https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:522: procs.cond = sync.NewCond(&procs.Mutex) On 2012/11/01 16:40:32, rsc wrote: > proc.cond.L = &procs.Mutex Done. https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:569: for wmsg.Pid != pid { On 2012/11/01 16:40:32, rsc wrote: > Why is this a for loop? There is only one thing to wait for. > I expected (no loop): > > wmsg.err = Await(&wmsg) > procs.Lock() > waits[wmsg.Pid] = &wmsg > procs.cond.Broadcast() > procs.Unlock() Ah, I think you're right. I was thinking that the same thread might be waiting for more than one kid and so we might receive the wait message for the wrong kid here; however, that check is already in WaitProcess and shouldn't be done here. Done. https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:641: procs.cond.L.Lock() On 2012/11/01 16:40:32, rsc wrote: > procs.Lock() > defer procs.Unlock() Done. https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:651: // Everytime a process exits, check On 2012/11/01 16:40:32, rsc wrote: > Every time Done. https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/syscall_pla... File src/pkg/syscall/syscall_plan9.go (right): https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/syscall_pla... src/pkg/syscall/syscall_plan9.go:186: err error On 2012/11/01 16:40:32, rsc wrote: > We can't just add extra unexported fields to data structures like this. Define a > separate type and use that instead. Done.
Sign in to reply to this message.
ping? On 17 December 2012 10:47, <seed@mail.nanosouffle.net> wrote: > PTAL. > > > https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.go > File src/pkg/syscall/exec_plan9.go (right): > > https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... > src/pkg/syscall/exec_plan9.go:516: cond *sync.Cond > On 2012/11/01 16:40:32, rsc wrote: >> >> cond sync.Cond > > > Done. > > https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... > src/pkg/syscall/exec_plan9.go:522: procs.cond = > sync.NewCond(&procs.Mutex) > On 2012/11/01 16:40:32, rsc wrote: >> >> proc.cond.L = &procs.Mutex > > > Done. > > https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... > src/pkg/syscall/exec_plan9.go:569: for wmsg.Pid != pid { > On 2012/11/01 16:40:32, rsc wrote: >> >> Why is this a for loop? There is only one thing to wait for. >> I expected (no loop): > > >> wmsg.err = Await(&wmsg) >> procs.Lock() >> waits[wmsg.Pid] = &wmsg >> procs.cond.Broadcast() >> procs.Unlock() > > > Ah, I think you're right. I was thinking that the same thread might be > waiting for more than one kid and so we might receive the wait message > for the wrong kid here; however, that check is already in WaitProcess > and shouldn't be done here. > > Done. > > https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... > src/pkg/syscall/exec_plan9.go:641: procs.cond.L.Lock() > On 2012/11/01 16:40:32, rsc wrote: >> >> procs.Lock() >> defer procs.Unlock() > > > Done. > > https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... > src/pkg/syscall/exec_plan9.go:651: // Everytime a process exits, check > On 2012/11/01 16:40:32, rsc wrote: >> >> Every time > > > Done. > > https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/syscall_pla... > File src/pkg/syscall/syscall_plan9.go (right): > > https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/syscall_pla... > src/pkg/syscall/syscall_plan9.go:186: err error > On 2012/11/01 16:40:32, rsc wrote: >> >> We can't just add extra unexported fields to data structures like > > this. Define a >> >> separate type and use that instead. > > > Done. > > https://codereview.appspot.com/6545051/
Sign in to reply to this message.
ping once more On 20 December 2012 13:33, Akshat Kumar <seed@mail.nanosouffle.net> wrote: > ping? > > On 17 December 2012 10:47, <seed@mail.nanosouffle.net> wrote: >> PTAL. >> >> >> https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.go >> File src/pkg/syscall/exec_plan9.go (right): >> >> https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... >> src/pkg/syscall/exec_plan9.go:516: cond *sync.Cond >> On 2012/11/01 16:40:32, rsc wrote: >>> >>> cond sync.Cond >> >> >> Done. >> >> https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... >> src/pkg/syscall/exec_plan9.go:522: procs.cond = >> sync.NewCond(&procs.Mutex) >> On 2012/11/01 16:40:32, rsc wrote: >>> >>> proc.cond.L = &procs.Mutex >> >> >> Done. >> >> https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... >> src/pkg/syscall/exec_plan9.go:569: for wmsg.Pid != pid { >> On 2012/11/01 16:40:32, rsc wrote: >>> >>> Why is this a for loop? There is only one thing to wait for. >>> I expected (no loop): >> >> >>> wmsg.err = Await(&wmsg) >>> procs.Lock() >>> waits[wmsg.Pid] = &wmsg >>> procs.cond.Broadcast() >>> procs.Unlock() >> >> >> Ah, I think you're right. I was thinking that the same thread might be >> waiting for more than one kid and so we might receive the wait message >> for the wrong kid here; however, that check is already in WaitProcess >> and shouldn't be done here. >> >> Done. >> >> https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... >> src/pkg/syscall/exec_plan9.go:641: procs.cond.L.Lock() >> On 2012/11/01 16:40:32, rsc wrote: >>> >>> procs.Lock() >>> defer procs.Unlock() >> >> >> Done. >> >> https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/exec_plan9.... >> src/pkg/syscall/exec_plan9.go:651: // Everytime a process exits, check >> On 2012/11/01 16:40:32, rsc wrote: >>> >>> Every time >> >> >> Done. >> >> https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/syscall_pla... >> File src/pkg/syscall/syscall_plan9.go (right): >> >> https://codereview.appspot.com/6545051/diff/10010/src/pkg/syscall/syscall_pla... >> src/pkg/syscall/syscall_plan9.go:186: err error >> On 2012/11/01 16:40:32, rsc wrote: >>> >>> We can't just add extra unexported fields to data structures like >> >> this. Define a >>> >>> separate type and use that instead. >> >> >> Done. >> >> https://codereview.appspot.com/6545051/
Sign in to reply to this message.
https://codereview.appspot.com/6545051/diff/28001/src/pkg/os/exec_plan9.go File src/pkg/os/exec_plan9.go (right): https://codereview.appspot.com/6545051/diff/28001/src/pkg/os/exec_plan9.go#ne... src/pkg/os/exec_plan9.go:79: for true { s/for true/for/ https://codereview.appspot.com/6545051/diff/28001/src/pkg/os/exec_plan9.go#ne... src/pkg/os/exec_plan9.go:79: for true { I don't think the loop is needed at all. When will WaitProcess succeed but return waitmsg.Pid != p.Pid? https://codereview.appspot.com/6545051/diff/28001/src/pkg/syscall/exec_plan9.go File src/pkg/syscall/exec_plan9.go (right): https://codereview.appspot.com/6545051/diff/28001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:512: waits map[int]*waitErr The use of the condition variable here is not quite right. If multiple goroutines try to wait for the same pid, one will get a result and the other will hang forever. I think everything can be a bit clearer if we avoid the condition variable and use a channel for each process: delete cond entirely and change waits to be a map[int]chan *waitErr. https://codereview.appspot.com/6545051/diff/28001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:517: procs.cond.L = &procs.Mutex Delete. https://codereview.appspot.com/6545051/diff/28001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:531: var ( There is one rval var here shared bewteen the goroutine and the parent. It is not technically incorrect, but it's confusing. Much clearer would be to put the declarations next to their uses, which will incidentally split the two different rvals into two different variables. type pidErr struct { pid int err error } forkc := make(chan pidErr, 1) go func() { runtime.LockOSThread() var rval pidErr rval.pid, rval.err = ... ... ch := make(chan *waitErr, 1) procs.Lock() waits[pid] = ch procs.Unlock() forkc <- rval var wmsg waitErr wmsg.err = Await(&wmsg.Waitmsg) ch <- &wmsg close(ch) }() rval := <-forkc return rval.pid, rval.err https://codereview.appspot.com/6545051/diff/28001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:636: // Make sure that there is a process procs.Lock() ch := waits[pid] procs.Unlock() var wmsg *waitErr if ch != nil { wmsg = <-ch procs.Lock() if waits[pid] == ch { delete(waits, pid) } procs.Unlock() } if wmsg == nil { // ch was missing or ch is closed return NewError("process not found") } if wmsg.err != nil { return wmsg.err } *w = wmsg.Waitmsg return nil
Sign in to reply to this message.
Thanks Russ, for these wonderful improvements. I hope I've implemented them properly. https://codereview.appspot.com/6545051/diff/28001/src/pkg/os/exec_plan9.go File src/pkg/os/exec_plan9.go (right): https://codereview.appspot.com/6545051/diff/28001/src/pkg/os/exec_plan9.go#ne... src/pkg/os/exec_plan9.go:79: for true { On 2013/01/07 05:10:00, rsc wrote: > I don't think the loop is needed at all. When will WaitProcess succeed but > return waitmsg.Pid != p.Pid? Done. https://codereview.appspot.com/6545051/diff/28001/src/pkg/syscall/exec_plan9.go File src/pkg/syscall/exec_plan9.go (right): https://codereview.appspot.com/6545051/diff/28001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:512: waits map[int]*waitErr On 2013/01/07 05:10:00, rsc wrote: > The use of the condition variable here is not quite right. If multiple > goroutines try to wait for the same pid, one will get a result and the other > will hang forever. I think everything can be a bit clearer if we avoid the > condition variable and use a channel for each process: delete cond entirely and > change waits to be a map[int]chan *waitErr. Done. https://codereview.appspot.com/6545051/diff/28001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:517: procs.cond.L = &procs.Mutex On 2013/01/07 05:10:00, rsc wrote: > Delete. Done. https://codereview.appspot.com/6545051/diff/28001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:531: var ( On 2013/01/07 05:10:00, rsc wrote: > There is one rval var here shared bewteen the goroutine and the parent. It is > not technically incorrect, but it's confusing. Much clearer would be to put the > declarations next to their uses, which will incidentally split the two different > rvals into two different variables. > > type pidErr struct { > pid int > err error > } > > forkc := make(chan pidErr, 1) > go func() { > runtime.LockOSThread() > var rval pidErr > rval.pid, rval.err = ... > ... > ch := make(chan *waitErr, 1) > procs.Lock() > waits[pid] = ch > procs.Unlock() > forkc <- rval > > var wmsg waitErr > wmsg.err = Await(&wmsg.Waitmsg) > ch <- &wmsg > close(ch) > }() > rval := <-forkc > return rval.pid, rval.err Done. https://codereview.appspot.com/6545051/diff/28001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:636: // Make sure that there is a process On 2013/01/07 05:10:00, rsc wrote: > procs.Lock() > ch := waits[pid] > procs.Unlock() > > var wmsg *waitErr > if ch != nil { > wmsg = <-ch > procs.Lock() > if waits[pid] == ch { > delete(waits, pid) > } > procs.Unlock() > } > if wmsg == nil { > // ch was missing or ch is closed > return NewError("process not found") > } > if wmsg.err != nil { > return wmsg.err > } > *w = wmsg.Waitmsg > return nil Done. I also added a check to make sure w != nil, but not cause an error, since we might want to wait on a process without caring for the Waitmsg.
Sign in to reply to this message.
On a second thought, I think this is not quite right: in buying the ability for multiple goroutines on the same go thread to wait on one process without a deadlock, we've lost a proper routing of wait messages in the case that multiple processes are started from one go thread (thus causing another deadlock). However, I think the fix is not too drastic and channels are the right way forward. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.go File src/pkg/syscall/exec_plan9.go (right): https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:560: ch <- &wmsg Note that there is no guarantee here that wmsg.Pid == pid, since this thread may have started more than one process. If indeed wmsg.Pid != pid, then anyone waiting on waits[wmsg.Pid] will continue to wait forever. I think the way to fix this is: if waits[wmsg.Pid] != nil { waits[wmsg.Pid] <- &wmsg close(waits[wmsg.Pid]) } (and not to do the close(ch) itself). https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:631: if ch != nil { If the above change is not made, then there are no guarantees anymore that WaitProcess() will wait for the given pid if the process is still running, since it simply gives up after the first wait message that is received, regardless of the pid. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:634: if waits[pid] == ch { Should this not be: waits[wmsg.Pid] ? Otherwise, this is a tautology.
Sign in to reply to this message.
Ah, looking into runtime.LockOSThread() again, I think your method indeed works because only the goroutine that is waiting for the fork-execed process can be running on the OS thread, until the wait finishes. That is, only one process can be forked at any time from any one OS thread. Then there is no confusion at all, so my proposed change might be unnecessarily cautious. Thanks and sorry for the noise! Akshat On 8 January 2013 02:12, <seed@mail.nanosouffle.net> wrote: > On a second thought, I think this is not quite right: in buying the > ability for multiple goroutines on the same go thread to wait on one > process without a deadlock, we've lost a proper routing of wait messages > in the case that multiple processes are started from one go thread (thus > causing another deadlock). > > However, I think the fix is not too drastic and channels are the right > way forward. > > > https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.go > File src/pkg/syscall/exec_plan9.go (right): > > https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... > src/pkg/syscall/exec_plan9.go:560: ch <- &wmsg > Note that there is no guarantee here that wmsg.Pid == pid, since this > thread may have started more than one process. If indeed wmsg.Pid != > pid, then anyone waiting on waits[wmsg.Pid] will continue to wait > forever. > > I think the way to fix this is: > > if waits[wmsg.Pid] != nil { > waits[wmsg.Pid] <- &wmsg > close(waits[wmsg.Pid]) > } > > (and not to do the close(ch) itself). > > https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... > src/pkg/syscall/exec_plan9.go:631: if ch != nil { > If the above change is not made, then there are no guarantees anymore > that WaitProcess() will wait for the given pid if the process is still > running, since it simply gives up after the first wait message that is > received, regardless of the pid. > > https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... > src/pkg/syscall/exec_plan9.go:634: if waits[pid] == ch { > Should this not be: waits[wmsg.Pid] ? Otherwise, this is a tautology. > > https://codereview.appspot.com/6545051/
Sign in to reply to this message.
This is pretty close. I have just a few nitpicks. Thanks for fixing this. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.go File src/pkg/syscall/exec_plan9.go (right): https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:514: func makeProcs() { I would just make this another init function and get rid of the sync.Once variable altogether. This is just my preference. I'd rather allocate package globals at startup then on the first call to a function. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:529: type pidErr struct { I think the code would read nicer if this was named "forkRet" and the variables were "r" or "ret" instead of "rval" throughout. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:540: pid := rval.pid Delete this line. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:542: if rval.err != nil || pid == 0 { Make this rval.pid. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:548: waits := procs.waits Delete this line. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:549: ch := make(chan *waitErr, 1) I would name this "waitc". https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:553: waits[pid] = ch Make this procs.waits. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:558: var wmsg waitErr Just "w" seems fine. The variable is localized to just these three lines. A longer name isn't worth it.
Sign in to reply to this message.
PTAL. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.go File src/pkg/syscall/exec_plan9.go (right): https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:514: func makeProcs() { On 2013/01/10 00:46:32, ality wrote: > I would just make this another init function > and get rid of the sync.Once variable altogether. > > This is just my preference. I'd rather allocate > package globals at startup then on the first call > to a function. The type and struct declarations here (which would have to be moved as well) are nicely localized with the fork, startProcess, and wait code; the init function above is nicely localized with fd code. I think this helps code readability and structure. There does not seem to be an elegant way to reconcile the two. Still, if there is a second opinion about this, I'm willing to neglect my aesthetic impulses. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:529: type pidErr struct { On 2013/01/10 00:46:32, ality wrote: > I think the code would read nicer if this was named "forkRet" > and the variables were "r" or "ret" instead of "rval" > throughout. Done. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:540: pid := rval.pid On 2013/01/10 00:46:32, ality wrote: > Delete this line. Done. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:542: if rval.err != nil || pid == 0 { On 2013/01/10 00:46:32, ality wrote: > Make this rval.pid. Done. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:548: waits := procs.waits On 2013/01/10 00:46:32, ality wrote: > Delete this line. Done. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:549: ch := make(chan *waitErr, 1) On 2013/01/10 00:46:32, ality wrote: > I would name this "waitc". Done. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:553: waits[pid] = ch On 2013/01/10 00:46:32, ality wrote: > Make this procs.waits. Done. https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:558: var wmsg waitErr On 2013/01/10 00:46:32, ality wrote: > Just "w" seems fine. The variable is localized to just these three lines. A > longer name isn't worth it. Done.
Sign in to reply to this message.
https://codereview.appspot.com/6545051/diff/44001/src/pkg/syscall/exec_plan9.go File src/pkg/syscall/exec_plan9.go (right): https://codereview.appspot.com/6545051/diff/44001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:546: procs.once.Do(makeProcs) This is unnecessary, since procs is locked just a few lines later. waits can be initialized then, with a simple if statement. https://codereview.appspot.com/6545051/diff/44001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:621: procs.once.Do(makeProcs) Delete. https://codereview.appspot.com/6545051/diff/44001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:622: waits := procs.waits Delete. https://codereview.appspot.com/6545051/diff/44001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:625: ch := waits[pid] s/waits/proc.waits/ https://codereview.appspot.com/6545051/diff/44001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:632: if waits[pid] == ch { s/waits/procs.waits/
Sign in to reply to this message.
PTAL. https://codereview.appspot.com/6545051/diff/44001/src/pkg/syscall/exec_plan9.go File src/pkg/syscall/exec_plan9.go (right): https://codereview.appspot.com/6545051/diff/44001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:546: procs.once.Do(makeProcs) On 2013/01/18 20:28:15, rsc wrote: > This is unnecessary, since procs is locked just a few lines later. waits can be > initialized then, with a simple if statement. Done. https://codereview.appspot.com/6545051/diff/44001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:621: procs.once.Do(makeProcs) On 2013/01/18 20:28:15, rsc wrote: > Delete. Done. https://codereview.appspot.com/6545051/diff/44001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:622: waits := procs.waits On 2013/01/18 20:28:15, rsc wrote: > Delete. Done. https://codereview.appspot.com/6545051/diff/44001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:625: ch := waits[pid] On 2013/01/18 20:28:15, rsc wrote: > s/waits/proc.waits/ Done. https://codereview.appspot.com/6545051/diff/44001/src/pkg/syscall/exec_plan9.... src/pkg/syscall/exec_plan9.go:632: if waits[pid] == ch { On 2013/01/18 20:28:15, rsc wrote: > s/waits/procs.waits/ Done.
Sign in to reply to this message.
LGTM
Sign in to reply to this message.
*** Submitted as https://code.google.com/p/go/source/detail?r=12eff5fd6f09 *** syscall, os: fix a fork-exec/wait race in Plan 9. On Plan 9, only the parent of a given process can enter its wait queue. When a Go program tries to fork-exec a child process and subsequently waits for it to finish, the goroutines doing these two tasks do not necessarily tie themselves to the same (or any single) OS thread. In the case that the fork and the wait system calls happen on different OS threads (say, due to a goroutine being rescheduled somewhere along the way), the wait() will either return an error or end up waiting for a completely different child than was intended. This change forces the fork and wait syscalls to happen in the same goroutine and ties that goroutine to its OS thread until the child exits. The PID of the child is recorded upon fork and exit, and de-queued once the child's wait message has been read. The Wait API, then, is translated into a synthetic implementation that simply waits for the requested PID to show up in the queue and then reads the associated stats. R=rsc, rminnich, npe, mirtchovski, ality CC=golang-dev https://codereview.appspot.com/6545051 Committer: Russ Cox <rsc@golang.org>
Sign in to reply to this message.
Russ, looking at package syscall again, I think my point below still stands: we are exposing at least two functions - ForkExec and Exec - separate from StartProcess that will do rfork()s. If either of these is done before a StartProcess, on the same OS thread, then the Await() in StartProcess may come back for the wrong pid, and we're not checking for this. I think my changes below should account for this. What do you think? On 8 January 2013 02:12, <seed@mail.nanosouffle.net> wrote: > On a second thought, I think this is not quite right: in buying the > ability for multiple goroutines on the same go thread to wait on one > process without a deadlock, we've lost a proper routing of wait messages > in the case that multiple processes are started from one go thread (thus > causing another deadlock). > > However, I think the fix is not too drastic and channels are the right > way forward. > > > https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.go > File src/pkg/syscall/exec_plan9.go (right): > > https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... > src/pkg/syscall/exec_plan9.go:560: ch <- &wmsg > Note that there is no guarantee here that wmsg.Pid == pid, since this > thread may have started more than one process. If indeed wmsg.Pid != > pid, then anyone waiting on waits[wmsg.Pid] will continue to wait > forever. > > I think the way to fix this is: > > if waits[wmsg.Pid] != nil { > waits[wmsg.Pid] <- &wmsg > close(waits[wmsg.Pid]) > } > > (and not to do the close(ch) itself). > > https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... > src/pkg/syscall/exec_plan9.go:631: if ch != nil { > If the above change is not made, then there are no guarantees anymore > that WaitProcess() will wait for the given pid if the process is still > running, since it simply gives up after the first wait message that is > received, regardless of the pid. > > https://codereview.appspot.com/6545051/diff/37001/src/pkg/syscall/exec_plan9.... > src/pkg/syscall/exec_plan9.go:634: if waits[pid] == ch { > Should this not be: waits[wmsg.Pid] ? Otherwise, this is a tautology. > > https://codereview.appspot.com/6545051/
Sign in to reply to this message.
|
