|
|
Description'close' race fixed/suggestions from rcs
Ref: Message-ID: <001485f6d9f46735c90479a3f25c@google.com>
Patch Set 1 : code review 164057: 'close' race fixed/suggestions from rcs #Patch Set 2 : code review 164057: 'close' race fixed/suggestions from rcs #Patch Set 3 : code review 164057: 'close' race fixed/suggestions from rcs #
Total comments: 4
MessagesTotal messages: 13
Changes as suggested by rsc on the list. These do NOT fix the issue for me. I notice the issue does occur even for GOMAXPROCS > 1, though it seems harder to trigger for large values.
Sign in to reply to this message.
Hello rsc, dho, agl (cc: cw), I'd like you to review the following change.
Sign in to reply to this message.
Probably these apply to dho's copy too. http://codereview.appspot.com/164057/diff/1023/1024 File src/pkg/net/fd.go (right): http://codereview.appspot.com/164057/diff/1023/1024#newcode208 src/pkg/net/fd.go:208: fd.incref(); I don't think this needs incref/decref. If the fd has been passed to the poll server, the goroutine who gave it to the poll server is in charge of keeping the ref count up. (If it did need an incref, that would be a bug, because the kernel has the sysfd right now too.) http://codereview.appspot.com/164057/diff/1023/1024#newcode500 src/pkg/net/fd.go:500: fd.incref(); Please move these two lines up above the comment about syscall/exec.go. Once syscall.ForkLock.RLock() has been called, no other locks should be acquired, and there's one in incref. http://codereview.appspot.com/164057/diff/1023/1026 File src/pkg/net/tcpsock.go (right): http://codereview.appspot.com/164057/diff/1023/1026#newcode76 src/pkg/net/tcpsock.go:76: fd.incref(); There's no need for the incref/decref here. This is a constructor so there are no other references to the fd, hence no concurrent Close. http://codereview.appspot.com/164057/diff/1023/1027 File src/pkg/net/udpsock.go (right): http://codereview.appspot.com/164057/diff/1023/1027#newcode76 src/pkg/net/udpsock.go:76: fd.incref(); No need here.
Sign in to reply to this message.
2009/12/1 <rsc@golang.org>: > Probably these apply to dho's copy too. Yep. Synched with these in my copy, #321 still passes; #360 still hanging. --dho
Sign in to reply to this message.
> Yep. Synched with these in my copy, #321 still passes; #360 still hanging. can you get a ktrace of the hang? i tried to strace it on linux and it made the hang go away.
Sign in to reply to this message.
2009/12/1 Russ Cox <rsc@golang.org>: >> Yep. Synched with these in my copy, #321 still passes; #360 still hanging. > > can you get a ktrace of the hang? i tried to strace it on linux > and it made the hang go away. I'm piddling in Linux right now. Stack: http://golang.pastebin.com/d71d96013 strace: http://golang.pastebin.com/d42d23c16 --dho
Sign in to reply to this message.
The strace is only showing one thread. You have to say -f to make it follow clone, and then at that point I think the bug disappears, because strace is making the thing run single-threaded. That goroutine with the messed up stack is probably executing on the other cpu, and we don't get its stack because the ABRT gets handled by the thread doing the epoll. I wonder if it would work to send the ABRT to the other thread. If you ls /proc/$pid/task you can see the "pids" for the other threads. Russ
Sign in to reply to this message.
I can still get it with -f for whatever reason: execve("./6.out", ["./6.out"], [/* 49 vars */]) = 0 brk(0) = 0x99049de000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af5141ae000 uname({sys="Linux", node="hoko", ...}) = 0 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af5141af000 arch_prctl(ARCH_SET_FS, 0x2af5141af6d0) = 0 rt_sigaction(SIGQUIT, {0x4386e9, ~[], SA_RESTORER|SA_STACK|SA_SIGINFO, 0x43870a}, NULL, 8) = 0 rt_sigaction(SIGILL, {0x4386e9, ~[], SA_RESTORER|SA_STACK|SA_SIGINFO, 0x43870a}, NULL, 8) = 0 rt_sigaction(SIGTRAP, {0x4386e9, ~[], SA_RESTORER|SA_STACK|SA_SIGINFO, 0x43870a}, NULL, 8) = 0 rt_sigaction(SIGABRT, {0x4386e9, ~[], SA_RESTORER|SA_STACK|SA_SIGINFO, 0x43870a}, NULL, 8) = 0 rt_sigaction(SIGBUS, {0x4386e9, ~[], SA_RESTORER|SA_STACK|SA_SIGINFO, 0x43870a}, NULL, 8) = 0 rt_sigaction(SIGFPE, {0x4386e9, ~[], SA_RESTORER|SA_STACK|SA_SIGINFO, 0x43870a}, NULL, 8) = 0 rt_sigaction(SIGSEGV, {0x4386e9, ~[], SA_RESTORER|SA_STACK|SA_SIGINFO, 0x43870a}, NULL, 8) = 0 rt_sigaction(SIGPIPE, {0x438709, ~[], SA_RESTORER|SA_STACK|SA_SIGINFO, 0x43870a}, NULL, 8) = 0 rt_sigaction(SIGCHLD, {0x438709, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x43870a}, NULL, 8) = 0 rt_sigaction(SIGWINCH, {0x438709, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x43870a}, NULL, 8) = 0 rt_sigaction(SIGSYS, {0x4386e9, ~[], SA_RESTORER|SA_STACK|SA_SIGINFO, 0x43870a}, NULL, 8) = 0 mmap(NULL, 131072, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af5141b0000 mmap(NULL, 1048576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af5141d0000 mmap(NULL, 2097152, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af5142d0000 mmap(NULL, 524288, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af5144d0000 mmap(NULL, 131072, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af514550000 sigaltstack({ss_sp=0x2af51421e100, ss_flags=0, ss_size=32768}, NULL) = 0 socket(PF_INET6, SOCK_STREAM, IPPROTO_TCP) = 3 close(3) = 0 write(1, "chanrecv: chan=", 15chanrecv: chan=) = 15 write(1, "0x2af5142400f0", 140x2af5142400f0) = 14 write(1, "\n", 1 ) = 1 socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 3 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 bind(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0 getsockname(3, {sa_family=AF_INET6, sin6_port=htons(35832), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0 getpeername(3, 0x2af5141d9850, [112]) = -1 ENOTCONN (Transport endpoint is not connected) pipe([4, 5]) = 0 fcntl(4, F_SETFD, FD_CLOEXEC) = 0 fcntl(5, F_SETFD, FD_CLOEXEC) = 0 fcntl(4, F_GETFL) = 0 (flags O_RDONLY) fcntl(4, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 fcntl(5, F_GETFL) = 0x1 (flags O_WRONLY) fcntl(5, F_SETFL, O_WRONLY|O_NONBLOCK) = 0 epoll_create(16) = 6 epoll_ctl(6, EPOLL_CTL_ADD, 4, {EPOLLIN|0x2000, {u32=4, u64=4}}) = 0 clone(Process 25859 attached child_stack=0x2af514244100, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_PARENT|CLONE_THREAD) = 25859 [pid 25858] fcntl(3, F_GETFL) = 0x2 (flags O_RDWR) [pid 25858] futex(0x4aaf68, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25859] gettid() = 25859 [pid 25859] sigaltstack({ss_sp=0x2af51424a100, ss_flags=0, ss_size=32768}, NULL) = 0 [pid 25859] futex(0x4aaf68, FUTEX_WAKE, 1 <unfinished ...> [pid 25858] <... futex resumed> ) = 0 [pid 25858] futex(0x6bac18, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25859] <... futex resumed> ) = 1 [pid 25859] futex(0x6bac18, FUTEX_WAKE, 1 <unfinished ...> [pid 25858] <... futex resumed> ) = 0 [pid 25858] fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 [pid 25858] listen(3, 128) = 0 [pid 25858] gettimeofday({1259698599, 491528}, NULL) = 0 [pid 25858] open("/etc/localtime", O_RDONLY|0x80000 /* O_??? */) = 7 [pid 25858] read(7, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0"..., 32768) = 3519 [pid 25858] read(7, "", 32768) = 0 [pid 25858] close(7) = 0 [pid 25858] write(2, "2009/12/01 15:16:39 Test WebSock"..., 672009/12/01 15:16:39 Test WebSocket server listening on [::]:35832 ) = 67 [pid 25858] clone(Process 25860 attached child_stack=0x2af514246300, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_PARENT|CLONE_THREAD) = 25860 [pid 25858] socket(PF_INET6, SOCK_STREAM, IPPROTO_IP <unfinished ...> [pid 25860] gettid( <unfinished ...> [pid 25858] <... socket resumed> ) = 7 [pid 25858] futex(0x4aaf68, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25859] <... futex resumed> ) = 1 [pid 25860] <... gettid resumed> ) = 25860 [pid 25859] epoll_wait(6, <unfinished ...> [pid 25860] sigaltstack({ss_sp=0x2af51425f100, ss_flags=0, ss_size=32768}, NULL) = 0 [pid 25860] futex(0x4aaf68, FUTEX_WAKE, 1) = 1 [pid 25858] <... futex resumed> ) = 0 [pid 25860] accept(3, <unfinished ...> [pid 25858] fcntl(7, F_SETFD, FD_CLOEXEC <unfinished ...> [pid 25860] <... accept resumed> 0x2af514268000, [112]) = -1 EAGAIN (Resource temporarily unavailable) [pid 25858] <... fcntl resumed> ) = 0 [pid 25860] write(5, "\0", 1 <unfinished ...> [pid 25858] setsockopt(7, SOL_SOCKET, SO_REUSEADDR, [1], 4 <unfinished ...> [pid 25860] <... write resumed> ) = 1 [pid 25858] <... setsockopt resumed> ) = 0 [pid 25858] connect(7, {sa_family=AF_INET6, sin6_port=htons(35832), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28 <unfinished ...> [pid 25860] write(1, "chanrecv: chan=", 15chanrecv: chan=) = 15 [pid 25860] write(1, "0x2af5142401e0", 140x2af5142401e0 <unfinished ...> [pid 25858] <... connect resumed> ) = 0 [pid 25860] <... write resumed> ) = 14 [pid 25860] write(1, "\n", 1 <unfinished ...> [pid 25858] futex(0x4aaf68, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25859] <... epoll_wait resumed> {{EPOLLIN, {u32=4, u64=4}}}, 1, -1) = 1 [pid 25860] <... write resumed> ) = 1 [pid 25859] futex(0x2af5142410a8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25860] futex(0x2af5142410a8, FUTEX_WAKE, 1 <unfinished ...> [pid 25859] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) [pid 25859] futex(0x6bac18, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25860] <... futex resumed> ) = 0 [pid 25860] futex(0x6bac18, FUTEX_WAKE, 1) = 1 [pid 25859] <... futex resumed> ) = 0 [pid 25860] getsockname(7, <unfinished ...> [pid 25859] read(4, <unfinished ...> [pid 25860] <... getsockname resumed> {sa_family=AF_INET6, sin6_port=htons(59749), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0 [pid 25859] <... read resumed> "\0", 100) = 1 [pid 25860] getpeername(7, <unfinished ...> [pid 25859] read(4, <unfinished ...> [pid 25860] <... getpeername resumed> {sa_family=AF_INET6, sin6_port=htons(35832), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0 [pid 25859] <... read resumed> 0x2af5141d9e00, 100) = -1 EAGAIN (Resource temporarily unavailable) [pid 25860] fcntl(7, F_GETFL) = 0x2 (flags O_RDWR) [pid 25860] futex(0x2af5142411c8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25859] write(1, "chanrecv: chan=", 15chanrecv: chan=) = 15 [pid 25859] write(1, "0x2af514240140", 140x2af514240140) = 14 [pid 25859] write(1, "\n", 1 ) = 1 [pid 25859] futex(0x2af5142411c8, FUTEX_WAKE, 1) = 1 [pid 25860] <... futex resumed> ) = 0 [pid 25859] epoll_ctl(6, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLONESHOT|0x2000, {u32=3, u64=3}}) = 0 [pid 25859] futex(0x2af5142410a8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25860] futex(0x2af5142410a8, FUTEX_WAKE, 1) = 0 [pid 25859] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) [pid 25860] fcntl(7, F_SETFL, O_RDWR|O_NONBLOCK) = 0 [pid 25859] write(1, "chanrecv: chan=", 15 <unfinished ...> [pid 25860] futex(0x2af5142411c8, FUTEX_WAIT, 3, {1073741824, 0}chanrecv: chan= <unfinished ...> [pid 25859] <... write resumed> ) = 15 [pid 25859] write(1, "0x2af514240140", 140x2af514240140) = 14 [pid 25859] write(1, "\n", 1 ) = 1 [pid 25859] write(1, "chanrecv: chan=", 15chanrecv: chan=) = 15 [pid 25859] write(1, "0x2af514240190", 140x2af514240190) = 14 [pid 25859] write(1, "\n", 1 ) = 1 [pid 25859] futex(0x2af5142411c8, FUTEX_WAKE, 1) = 1 [pid 25860] <... futex resumed> ) = 0 [pid 25859] epoll_wait(6, {{EPOLLIN, {u32=3, u64=3}}}, 1, -1) = 1 [pid 25860] setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0 [pid 25859] epoll_ctl(6, EPOLL_CTL_DEL, 3, NULL) = 0 [pid 25859] futex(0x2af5142410a8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25860] futex(0x2af5142410a8, FUTEX_WAKE, 1) = 1 [pid 25859] <... futex resumed> ) = 0 [pid 25860] write(7, "GET /echo HTTP/1.1\r\nUpgrade: Web"..., 106 <unfinished ...> [pid 25859] futex(0x4aaf68, FUTEX_WAKE, 1) = 1 [pid 25858] <... futex resumed> ) = 0 [pid 25859] epoll_wait(6, <unfinished ...> [pid 25858] accept(3, {sa_family=AF_INET6, sin6_port=htons(59749), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 8 [pid 25860] <... write resumed> ) = 106 [pid 25858] fcntl(8, F_SETFD, FD_CLOEXEC) = 0 [pid 25858] futex(0x4aaf68, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25860] futex(0x4aaf68, FUTEX_WAKE, 1) = 1 [pid 25858] <... futex resumed> ) = 0 [pid 25860] read(7, <unfinished ...> [pid 25858] fcntl(8, F_GETFL) = 0x2 (flags O_RDWR) [pid 25858] fcntl(8, F_SETFL, O_RDWR|O_NONBLOCK <unfinished ...> [pid 25860] <... read resumed> 0x2af51426b000, 4096) = -1 EAGAIN (Resource temporarily unavailable) [pid 25858] <... fcntl resumed> ) = 0 [pid 25860] write(5, "\0", 1) = 1 [pid 25860] futex(0x2af5142411c8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25858] futex(0x2af5142411c8, FUTEX_WAKE, 1 <unfinished ...> [pid 25859] <... epoll_wait resumed> {{EPOLLIN, {u32=4, u64=4}}}, 1, -1) = 1 [pid 25858] <... futex resumed> ) = 1 [pid 25860] <... futex resumed> ) = 0 [pid 25860] write(1, "chanrecv: chan=", 15chanrecv: chan= <unfinished ...> [pid 25859] futex(0x6bac18, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25860] <... write resumed> ) = 15 [pid 25860] write(1, "0x2af514240a00", 14 <unfinished ...> 0x2af514240a00[pid 25858] futex(0x6bac18, FUTEX_WAKE, 1 <unfinished ...> [pid 25860] <... write resumed> ) = 14 [pid 25859] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) [pid 25860] write(1, "\n", 1 <unfinished ...> [pid 25858] <... futex resumed> ) = 0 [pid 25860] <... write resumed> ) = 1 [pid 25859] futex(0x2af5142410a8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25858] setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0 [pid 25858] futex(0x4aaf68, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25860] futex(0x4aaf68, FUTEX_WAKE, 1) = 1 [pid 25858] <... futex resumed> ) = 0 [pid 25860] read(4, "\0", 100) = 1 [pid 25860] futex(0x2af5142411c8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25858] futex(0x2af5142411c8, FUTEX_WAKE, 1) = 1 [pid 25860] <... futex resumed> ) = 0 [pid 25858] accept(3, 0x2af5141d9a80, [112]) = -1 EAGAIN (Resource temporarily unavailable) [pid 25860] futex(0x2af5142410a8, FUTEX_WAKE, 1) = 1 [pid 25858] futex(0x6bac18, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25860] futex(0x6bac18, FUTEX_WAKE, 1 <unfinished ...> [pid 25859] <... futex resumed> ) = 0 [pid 25858] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) [pid 25860] <... futex resumed> ) = 0 [pid 25860] read(4, <unfinished ...> [pid 25859] futex(0x6bac18, FUTEX_WAKE, 1 <unfinished ...> [pid 25860] <... read resumed> 0x2af5141d9e00, 100) = -1 EAGAIN (Resource temporarily unavailable) [pid 25860] futex(0x6bac18, FUTEX_WAKE, 1 <unfinished ...> [pid 25858] futex(0x6bac18, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25860] <... futex resumed> ) = 0 [pid 25860] futex(0x2af5142411c8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25859] <... futex resumed> ) = 0 [pid 25858] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) [pid 25859] read(8, <unfinished ...> [pid 25858] write(1, "chanrecv: chan=", 15 <unfinished ...> [pid 25859] <... read resumed> "GET /echo HTTP/1.1\r\nUpgrade: Web"..., 4096) = 106 chanrecv: chan=[pid 25858] <... write resumed> ) = 15 [pid 25859] futex(0x2af5142410a8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25858] write(1, "0x2af514240140", 140x2af514240140) = 14 [pid 25858] write(1, "\n", 1 ) = 1 [pid 25858] futex(0x2af5142410a8, FUTEX_WAKE, 1) = 1 [pid 25859] <... futex resumed> ) = 0 [pid 25858] epoll_ctl(6, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLONESHOT|0x2000, {u32=7, u64=7}} <unfinished ...> [pid 25859] futex(0x2af5142411c8, FUTEX_WAKE, 1 <unfinished ...> [pid 25858] <... epoll_ctl resumed> ) = 0 [pid 25859] <... futex resumed> ) = 1 [pid 25860] <... futex resumed> ) = 0 [pid 25858] futex(0x6bac18, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25860] futex(0x6bac18, FUTEX_WAIT, 5, {1073741824, 0}) = -1 EAGAIN (Resource temporarily unavailable) [pid 25859] futex(0x6bac18, FUTEX_WAKE, 1 <unfinished ...> [pid 25860] futex(0x6bac18, FUTEX_WAKE, 1) = 0 [pid 25859] <... futex resumed> ) = 0 [pid 25858] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) [pid 25859] write(8, "HTTP/1.1 101 Web Socket Protocol"..., 164 <unfinished ...> [pid 25858] write(1, "chanrecv: chan=", 15chanrecv: chan= <unfinished ...> [pid 25859] <... write resumed> ) = 164 [pid 25858] <... write resumed> ) = 15 [pid 25859] futex(0x2af5142410a8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25858] write(1, "0x2af514240140", 140x2af514240140 <unfinished ...> [pid 25860] write(5, "\0", 1 <unfinished ...> [pid 25858] <... write resumed> ) = 14 [pid 25860] <... write resumed> ) = 1 [pid 25858] write(1, "\n", 1 ) = 1 [pid 25860] futex(0x6bac18, FUTEX_WAKE, 1 <unfinished ...> [pid 25858] futex(0x6bac18, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25860] <... futex resumed> ) = 0 [pid 25860] futex(0x2af5142411c8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25858] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) [pid 25858] futex(0x2af5142411c8, FUTEX_WAKE, 1 <unfinished ...> [pid 25860] <... futex resumed> ) = 0 [pid 25858] <... futex resumed> ) = 1 [pid 25858] epoll_ctl(6, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLONESHOT|0x2000, {u32=3, u64=3}}) = 0 [pid 25858] futex(0x4aaf68, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25860] futex(0x4aaf68, FUTEX_WAKE, 1) = 1 [pid 25860] read(8, <unfinished ...> [pid 25858] <... futex resumed> ) = 0 [pid 25860] <... read resumed> 0x2af514273000, 4096) = -1 EAGAIN (Resource temporarily unavailable) [pid 25858] write(1, "chanrecv: chan=", 15chanrecv: chan= <unfinished ...> [pid 25860] futex(0x2af5142411c8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25858] <... write resumed> ) = 15 [pid 25858] write(1, "0x2af5142401e0", 140x2af5142401e0) = 14 [pid 25858] write(1, "\n", 1 ) = 1 [pid 25858] write(1, "chanrecv: chan=", 15chanrecv: chan=) = 15 [pid 25858] write(1, "0x2af514240140", 140x2af514240140) = 14 [pid 25858] write(1, "\n", 1 ) = 1 [pid 25858] write(1, "chanrecv: chan=", 15chanrecv: chan=) = 15 [pid 25858] write(1, "0x2af514240190", 140x2af514240190) = 14 [pid 25858] write(1, "\n", 1 ) = 1 [pid 25858] futex(0x2af5142411c8, FUTEX_WAKE, 1) = 1 [pid 25860] <... futex resumed> ) = 0 [pid 25858] epoll_wait(6, {{EPOLLIN, {u32=4, u64=4}}}, 1, -1) = 1 [pid 25860] futex(0x4aaf68, FUTEX_WAKE, 1 <unfinished ...> [pid 25858] futex(0x4aaf68, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25860] <... futex resumed> ) = 0 [pid 25858] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) [pid 25860] write(5, "\0", 1 <unfinished ...> [pid 25858] read(4, <unfinished ...> [pid 25860] <... write resumed> ) = 1 [pid 25858] <... read resumed> "\0\0", 100) = 2 [pid 25860] write(1, "chanrecv: chan=", 15 <unfinished ...> [pid 25858] futex(0x4aaf68, FUTEX_WAIT, 3, {1073741824, 0}chanrecv: chan= <unfinished ...> [pid 25860] <... write resumed> ) = 15 [pid 25860] write(1, "0x2af5142403c0", 140x2af5142403c0) = 14 [pid 25860] write(1, "\n", 1 ) = 1 [pid 25860] read(4, 0x2af5141d9e00, 100) = -1 EAGAIN (Resource temporarily unavailable) [pid 25860] write(1, "chanrecv: chan=", 15chanrecv: chan=) = 15 [pid 25860] write(1, "0x2af514240140", 140x2af514240140) = 14 [pid 25860] write(1, "\n", 1 ) = 1 [pid 25860] epoll_ctl(6, EPOLL_CTL_ADD, 8, {EPOLLIN|EPOLLONESHOT|0x2000, {u32=8, u64=8}}) = 0 [pid 25860] write(1, "chanrecv: chan=", 15chanrecv: chan=) = 15 [pid 25860] write(1, "0x2af514240140", 140x2af514240140) = 14 [pid 25860] write(1, "\n", 1 ) = 1 [pid 25860] write(1, "chanrecv: chan=", 15chanrecv: chan=) = 15 [pid 25860] write(1, "0x2af514240190", 140x2af514240190) = 14 [pid 25860] write(1, "\n", 1 ) = 1 [pid 25860] epoll_wait(6, {{EPOLLIN, {u32=7, u64=7}}}, 1, -1) = 1 [pid 25860] epoll_ctl(6, EPOLL_CTL_DEL, 7, NULL) = 0 [pid 25860] futex(0x4aaf68, FUTEX_WAKE, 1) = 1 [pid 25858] <... futex resumed> ) = 0 [pid 25860] epoll_wait(6, <unfinished ...> [pid 25858] read(7, "HTTP/1.1 101 Web Socket Protocol"..., 4096) = 164 [pid 25858] write(7, "\0hello, world\n\377", 15 <unfinished ...> [pid 25860] <... epoll_wait resumed> {{EPOLLIN, {u32=8, u64=8}}}, 1, -1) = 1 [pid 25858] <... write resumed> ) = 15 [pid 25860] epoll_ctl(6, EPOLL_CTL_DEL, 8, NULL <unfinished ...> [pid 25858] read(7, <unfinished ...> [pid 25860] <... epoll_ctl resumed> ) = 0 [pid 25860] futex(0x2af5142410a8, FUTEX_WAKE, 1 <unfinished ...> [pid 25858] <... read resumed> 0x2af51426b000, 4096) = -1 EAGAIN (Resource temporarily unavailable) [pid 25860] <... futex resumed> ) = 1 [pid 25859] <... futex resumed> ) = 0 [pid 25858] futex(0x6bac18, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25859] futex(0x6bac18, FUTEX_WAIT, 5, {1073741824, 0} <unfinished ...> [pid 25858] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) [pid 25860] futex(0x6bac18, FUTEX_WAKE, 1 <unfinished ...> [pid 25858] futex(0x6bac18, FUTEX_WAKE, 1 <unfinished ...> [pid 25860] <... futex resumed> ) = 0 [pid 25859] <... futex resumed> ) = 0 [pid 25858] <... futex resumed> ) = 1 [pid 25859] read(8, <unfinished ...> [pid 25858] write(5, "\0", 1 <unfinished ...> [pid 25859] <... read resumed> "\0hello, world\n\377", 4096) = 15 [pid 25858] <... write resumed> ) = 1 [pid 25859] write(8, "\0hello, world\n\377", 15 <unfinished ...> [pid 25858] write(1, "chanrecv: chan=", 15chanrecv: chan= <unfinished ...> [pid 25860] epoll_wait(6, <unfinished ...> [pid 25859] <... write resumed> ) = 15 [pid 25858] <... write resumed> ) = 15 [pid 25860] <... epoll_wait resumed> {{EPOLLIN, {u32=4, u64=4}}}, 1, -1) = 1 [pid 25859] futex(0x2af5142410a8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25858] write(1, "0x2af514240a00", 140x2af514240a00 <unfinished ...> [pid 25860] futex(0x2af5142411c8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25858] <... write resumed> ) = 14 [pid 25858] write(1, "\n", 1 ) = 1 [pid 25858] futex(0x2af5142411c8, FUTEX_WAKE, 1 <unfinished ...> [pid 25860] <... futex resumed> ) = 0 [pid 25858] <... futex resumed> ) = 1 [pid 25860] read(4, <unfinished ...> [pid 25858] read(8, <unfinished ...> [pid 25860] <... read resumed> "\0", 100) = 1 [pid 25858] <... read resumed> 0x2af514273000, 4096) = -1 EAGAIN (Resource temporarily unavailable) [pid 25860] futex(0x2af5142411c8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25858] read(4, 0x2af5141d9e00, 100) = -1 EAGAIN (Resource temporarily unavailable) [pid 25858] write(1, "chanrecv: chan=", 15chanrecv: chan=) = 15 [pid 25858] write(1, "0x2af514240140", 140x2af514240140) = 14 [pid 25858] write(1, "\n", 1 ) = 1 [pid 25858] futex(0x2af5142411c8, FUTEX_WAKE, 1 <unfinished ...> [pid 25860] <... futex resumed> ) = 0 [pid 25858] <... futex resumed> ) = 1 [pid 25858] epoll_ctl(6, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLONESHOT|0x2000, {u32=7, u64=7}} <unfinished ...> [pid 25860] write(5, "\0", 1) = 1 [pid 25860] write(1, "chanrecv: chan=", 15chanrecv: chan=) = 15 [pid 25860] write(1, "0x2af5142403c0", 14 <unfinished ...> [pid 25858] <... epoll_ctl resumed> ) = 0 0x2af5142403c0[pid 25860] <... write resumed> ) = 14 [pid 25860] write(1, "\n", 1 ) = 1 [pid 25860] futex(0x2af5142411c8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25858] write(1, "chanrecv: chan=", 15chanrecv: chan=) = 15 [pid 25858] write(1, "0x2af514240140", 140x2af514240140) = 14 [pid 25858] write(1, "\n", 1 ) = 1 [pid 25858] epoll_ctl(6, EPOLL_CTL_ADD, 8, {EPOLLIN|EPOLLONESHOT|0x2000, {u32=8, u64=8}}) = 0 [pid 25858] write(1, "chanrecv: chan=", 15chanrecv: chan=) = 15 [pid 25858] write(1, "0x2af514240140", 140x2af514240140) = 14 [pid 25858] write(1, "\n", 1 ) = 1 [pid 25858] write(1, "chanrecv: chan=", 15chanrecv: chan=) = 15 [pid 25858] write(1, "0x2af514240190", 140x2af514240190) = 14 [pid 25858] write(1, "\n", 1 ) = 1 [pid 25858] epoll_wait(6, {{EPOLLIN, {u32=4, u64=4}}}, 1, -1) = 1 [pid 25858] read(4, "\0", 100) = 1 [pid 25858] read(4, 0x2af5141d9e00, 100) = -1 EAGAIN (Resource temporarily unavailable) [pid 25858] write(1, "chanrecv: chan=", 15chanrecv: chan=) = 15 [pid 25858] write(1, "0x2af514240140", 140x2af514240140) = 14 [pid 25858] write(1, "\n", 1 ) = 1 [pid 25858] write(1, "chanrecv: chan=", 15chanrecv: chan=) = 15 [pid 25858] write(1, "0x2af514240190", 140x2af514240190) = 14 [pid 25858] write(1, "\n", 1 ) = 1 [pid 25858] epoll_wait(6, {{EPOLLIN, {u32=7, u64=7}}}, 1, -1) = 1 [pid 25858] epoll_ctl(6, EPOLL_CTL_DEL, 7, NULL) = 0 [pid 25858] futex(0x2af5142411c8, FUTEX_WAKE, 1) = 1 [pid 25860] <... futex resumed> ) = 0 [pid 25860] futex(0x6bac18, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25858] futex(0x6bac18, FUTEX_WAKE, 1 <unfinished ...> [pid 25860] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) [pid 25858] <... futex resumed> ) = 0 [pid 25860] read(7, <unfinished ...> [pid 25858] epoll_wait(6, <unfinished ...> [pid 25860] <... read resumed> "\0hello, world\n\377", 4096) = 15 [pid 25860] shutdown(7, 2 /* send and receive */) = 0 [pid 25858] <... epoll_wait resumed> {{EPOLLIN|0x2000, {u32=8, u64=8}}}, 1, -1) = 1 [pid 25860] fcntl(7, F_GETFL <unfinished ...> [pid 25858] epoll_ctl(6, EPOLL_CTL_DEL, 8, NULL <unfinished ...> [pid 25860] <... fcntl resumed> ) = 0x802 (flags O_RDWR|O_NONBLOCK) [pid 25858] <... epoll_ctl resumed> ) = 0 [pid 25860] futex(0x2af5142411c8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25858] futex(0x2af5142411c8, FUTEX_WAKE, 1 <unfinished ...> [pid 25860] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) [pid 25860] futex(0x6bac18, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25858] <... futex resumed> ) = 0 [pid 25860] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) [pid 25858] futex(0x6bac18, FUTEX_WAKE, 1 <unfinished ...> [pid 25860] futex(0x2af5142410a8, FUTEX_WAKE, 1) = 1 [pid 25860] fcntl(7, F_SETFL, O_RDWR) = 0 [pid 25860] futex(0x2af5142411c8, FUTEX_WAIT, 3, {1073741824, 0} <unfinished ...> [pid 25859] <... futex resumed> ) = 0 [pid 25858] <... futex resumed> ) = 0 [pid 25859] futex(0x2af5142411c8, FUTEX_WAKE, 1 <unfinished ...> [pid 25858] epoll_wait(6, <unfinished ...> [pid 25859] <... futex resumed> ) = 1 [pid 25860] <... futex resumed> ) = 0 [pid 25859] read(8, <unfinished ...> [pid 25860] close(7 <unfinished ...> [pid 25859] <... read resumed> "", 4096) = 0 [pid 25860] <... close resumed> ) = 0 [pid 25860] futex(0x2af5142411c8, FUTEX_WAIT, 3, {1073741824, 0} SIGABRT: abort Faulting address: 0x23c00006507 PC=0x40193e websocket·*Conn·Read+0x35 /home/devon/golang/src/pkg/websocket/websocket.go:53 websocket·*Conn·Read(0xd3e83aa0, 0x2aee, 0xd3ec4000, 0x2aee, 0x8000, ...) io·Copy+0x81 /home/devon/golang/src/pkg/io/io.go:218 io·Copy(0xd3e7a210, 0x2aee, 0xd3e83aa0, 0x2aee, 0xd3e7a1e0, ...) websocket·echoServer+0x7c /home/devon/golang/src/pkg/websocket/websocket_test.go:20 websocket·echoServer(0xd3e83aa0, 0x2aee) websocket·Handler·ServeHTTP+0x558 /home/devon/golang/src/pkg/websocket/server.go:72 websocket·Handler·ServeHTTP(0x401e6a, 0x0, 0xd3e77c00, 0x2aee, 0xd3e96140, ...) http·*ServeMux·ServeHTTP+0x1fd /home/devon/golang/src/pkg/http/server.go:500 http·*ServeMux·ServeHTTP(0xd3e13298, 0x2aee, 0xd3e77c00, 0x2aee, 0xd3e96140, ...) http·*Conn·serve+0x7d /home/devon/golang/src/pkg/http/server.go:284 http·*Conn·serve(0xd3e77c00, 0x2aee) goexit /home/devon/golang/src/pkg/runtime/proc.c:135 goexit() 0x2aeed3e77c00 unknown pc goroutine 4 [4]: gosched+0x34 /home/devon/golang/src/pkg/runtime/proc.c:521 gosched() chanrecv+0x126 /home/devon/golang/src/pkg/runtime/chan.c:345 chanrecv(0xd3e831e0, 0x2aee, 0xd3e5f130, 0x2aee, 0x0, ...) runtime·chanrecv1+0x50 /home/devon/golang/src/pkg/runtime/chan.c:415 runtime·chanrecv1(0xd3e831e0, 0x2aee) net·*pollServer·WaitRead+0x49 /home/devon/golang/src/pkg/net/fd.go:277 net·*pollServer·WaitRead(0xd3e183c0, 0x2aee, 0xd3e96000, 0x2aee) net·*netFD·accept+0x2e2 /home/devon/golang/src/pkg/net/fd.go:507 net·*netFD·accept(0xd3e96000, 0x2aee, 0x41becd, 0x0, 0x432a95, ...) net·*TCPListener·AcceptTCP+0x5a /home/devon/golang/src/pkg/net/tcpsock.go:253 net·*TCPListener·AcceptTCP(0xd3e132d0, 0x2aee, 0xd3e8c090, 0x2aee, 0x0, ...) net·*TCPListener·Accept+0x34 /home/devon/golang/src/pkg/net/tcpsock.go:263 net·*TCPListener·Accept(0xd3e132d0, 0x2aee, 0x0, 0x0, 0x0, ...) http·Serve+0x66 /home/devon/golang/src/pkg/http/server.go:532 http·Serve(0xd3e187c0, 0x2aee, 0xd3e132d0, 0x2aee, 0xd3e7a600, ...) goexit /home/devon/golang/src/pkg/runtime/proc.c:135 goexit() 0x2aeed3e187c0 unknown pc goroutine 3 [3]: runtime·entersyscall+0x50 /home/devon/golang/src/pkg/runtime/proc.c:545 runtime·entersyscall() syscall·Syscall6+0x5 /home/devon/golang/src/pkg/syscall/asm_linux_amd64.s:41 syscall·Syscall6() syscall·EpollWait+0x84 /home/devon/golang/src/pkg/syscall/zsyscall_linux_amd64.go:132 syscall·EpollWait(0x6, 0x2aee, 0xd3e7c780, 0x2aee, 0x1, ...) net·*pollster·WaitFD+0xcd /home/devon/golang/src/pkg/net/fd_linux.go:116 net·*pollster·WaitFD(0xd3e7c100, 0x2aee, 0x0, 0x0, 0x72, ...) net·*pollServer·Run+0x9e /home/devon/golang/src/pkg/net/fd.go:236 net·*pollServer·Run(0xd3e183c0, 0x2aee) goexit /home/devon/golang/src/pkg/runtime/proc.c:135 goexit() 0x2aeed3e183c0 unknown pc goroutine 2 [1]: gosched+0x34 /home/devon/golang/src/pkg/runtime/proc.c:521 gosched() runtime·exitsyscall+0x84 /home/devon/golang/src/pkg/runtime/proc.c:582 runtime·exitsyscall() syscall·Syscall+0x64 /home/devon/golang/src/pkg/syscall/asm_linux_amd64.s:37 syscall·Syscall() syscall·Close+0x3b /home/devon/golang/src/pkg/syscall/zsyscall_linux_amd64.go:117 syscall·Close(0x7, 0x0, 0x0, 0x0) os·*File·Close+0x47 /home/devon/golang/src/pkg/os/file.go:93 os·*File·Close(0xd3e59ba0, 0x2aee, 0x0, 0x0) net·*netFD·decref+0x61 /home/devon/golang/src/pkg/net/fd.go:343 net·*netFD·decref(0xd3e960a0, 0x2aee) net·*netFD·Close+0x5a /home/devon/golang/src/pkg/net/fd.go:365 net·*netFD·Close(0xd3e960a0, 0x2aee, 0x1, 0x0) net·*TCPConn·Close+0x4c /home/devon/golang/src/pkg/net/tcpsock.go:111 net·*TCPConn·Close(0xd3e133f0, 0x2aee, 0x0, 0x0) websocket·*Conn·Close+0x36 /home/devon/golang/src/pkg/websocket/websocket.go:111 websocket·*Conn·Close(0xd3e83460, 0x2aee, 0xd, 0xd) websocket·TestEcho+0x4a8 /home/devon/golang/src/pkg/websocket/websocket_test.go:60 websocket·TestEcho(0xd3e57da0, 0x2aee) testing·tRunner+0x22 /home/devon/golang/src/pkg/testing/testing.go:132 testing·tRunner(0xd3e57da0, 0x2aee, 0x4a4990, 0x0) goexit /home/devon/golang/src/pkg/runtime/proc.c:135 goexit() 0x2aeed3e57da0 unknown pc goroutine 1 [4]: gosched+0x34 /home/devon/golang/src/pkg/runtime/proc.c:521 gosched() chanrecv+0x30e /home/devon/golang/src/pkg/runtime/chan.c:319 chanrecv(0xd3e830f0, 0x2aee, 0xd3e5bf68, 0x2aee, 0x0, ...) runtime·chanrecv1+0x50 /home/devon/golang/src/pkg/runtime/chan.c:415 runtime·chanrecv1(0xd3e830f0, 0x2aee) testing·Main+0x263 /home/devon/golang/src/pkg/testing/testing.go:159 testing·Main(0x4a4990, 0x0) main·main+0x23 /home/devon/golang/src/pkg/websocket/_testmain.go:14 main·main() mainstart+0xf /home/devon/golang/src/pkg/runtime/amd64/asm.s:54 mainstart() goexit /home/devon/golang/src/pkg/runtime/proc.c:135 goexit() rax 0x0 rbx 0x2aeed3e7c270 rcx 0x0 rdx 0x2aeed3ebeb08 rdi 0x2aeed3ebecd8 rsi 0x2aeed3ebed28 rbp 0x2aeed3eb4040 rsp 0x2aeed3ebec28 r8 0x2aeed3eb4040 r9 0x2aeed3ebea88 r10 0x41 r11 0x246 r12 0x11 r13 0x7ffff2477870 r14 0x2aeed3e84120 r15 0x2aeed3e8c090 rip 0x40193e rflags 0x246 cs 0x33 fs 0x0 gs 0x0 2009/12/1 Russ Cox <rsc@golang.org>: > The strace is only showing one thread. You have to say -f > to make it follow clone, and then at that point I think the > bug disappears, because strace is making the thing run > single-threaded. > > That goroutine with the messed up stack is probably > executing on the other cpu, and we don't get its stack > because the ABRT gets handled by the thread doing > the epoll. I wonder if it would work to send the ABRT > to the other thread. If you ls /proc/$pid/task you can > see the "pids" for the other threads. > > Russ >
Sign in to reply to this message.
On Tue, Dec 01, 2009 at 12:15:31PM -0800, Russ Cox wrote: > The strace is only showing one thread. You have to say -f to make > it follow clone, and then at that point I think the bug disappears, > because strace is making the thing run single-threaded. strace -f still shows the hang for me > That goroutine with the messed up stack is probably executing on the > other cpu, and we don't get its stack because the ABRT gets handled > by the thread doing the epoll. I don't understand why that would be ... do you mean because it's running (not sleeping) it's in a state where the trace logic can generate a usable stack trace? > I wonder if it would work to send the ABRT to the other thread. If > you ls /proc/$pid/task you can see the "pids" for the other threads. I don't think that will work, signals are process level entities not thread and surely the signal will be delivered to the first unblocked thread in the process?
Sign in to reply to this message.
>> That goroutine with the messed up stack is probably executing on the >> other cpu, and we don't get its stack because the ABRT gets handled >> by the thread doing the epoll. > > I don't understand why that would be ... do you mean because it's > running (not sleeping) it's in a state where the trace logic can > generate a usable stack trace? yes. the trace logic is just inspecting memory - a saved copy of the registers - not inspecting other active processes. >> I wonder if it would work to send the ABRT to the other thread. If >> you ls /proc/$pid/task you can see the "pids" for the other threads. > > I don't think that will work, signals are process level entities not > thread and surely the signal will be delivered to the first unblocked > thread in the process? i've grown skeptical of "surely" when dealing with linux. ;-) if you explicitly asked for a different thread you might just get it. russ
Sign in to reply to this message.
In any case, I'm doing a binary search between release and now to see if I can find where this started happening. Hopefully it's a `regression.' Seems like it's something that's been reported too recently, though Issue #271 doesn't give me any high hopes. --dho
Sign in to reply to this message.
2009/12/1 Devon H. O'Dell <devon.odell@gmail.com>: > In any case, I'm doing a binary search between release and now to see > if I can find where this started happening. Hopefully it's a > `regression.' Seems like it's something that's been reported too > recently, though Issue #271 doesn't give me any high hopes. Or not. websockets are 3 days old, and the race in #321 has existed for quite some time. --dho
Sign in to reply to this message.
|