ohNet hang after TimerManager thread crashes
19-04-2013, 08:00 AM
(This post was last modified: 19-04-2013 08:13 AM by andreww.)
Post: #11
RE: ohNet hang after TimerManager thread crashes
I had another look at the stack dumps. I think LWP 8015 could well be the timer thread. It does not appear in the Java stack dump, and it appears early on between two other threads both created by ohNet. Since it is not in the Java stack dump, I would assume that it has never run Java code at all.
For reference, here's a mapping between Java thread names and LWP numbers:
Code:
  Thread-378 is LWP 19630

Is the hang reproducible? If it's a Debian-based distro, have you installed the libc6-dbg package? I think that package should provide the debug information needed for gdb to give an accurate stack trace through frames in libc. When debug information is missing, gdb tries to work it out automatically, and I think it is a bit poor at doing so on armel.

Also, here's a snippet of Python (2.7) that takes the Java stack dump and prints out the mapping of thread names to LWP numbers:
Code:
  import re
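Only the first line of that snippet (`import re`) survives here. The following is a minimal sketch of the same idea, not the original script. It assumes the HotSpot (jstack) thread-dump format, in which each thread header starts with the quoted thread name and contains `nid=0x...`. That field is the native thread id in hex, which on Linux is the LWP number gdb shows. The names `lwp_mapping` and `format_mapping` are hypothetical, and the code should run under both Python 2.7 and Python 3.

```python
import re

# Assumed HotSpot (jstack) thread header, e.g.
#   "Thread-378" #12 prio=5 os_prio=0 tid=0x... nid=0x4cae waiting on condition
# nid is the native thread id in hex; on Linux that is the LWP number gdb shows.
THREAD_HEADER = re.compile(r'^"(?P<name>.*?)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)')


def lwp_mapping(dump_text):
    """Return (thread_name, lwp) pairs in the order they appear in the dump."""
    mapping = []
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            # int(..., 16) accepts the "0x" prefix, so no slicing is needed.
            mapping.append((match.group("name"), int(match.group("nid"), 16)))
    return mapping


def format_mapping(dump_text):
    """Format each pair as "<thread name> is LWP <number>"."""
    return ["%s is LWP %d" % (name, lwp) for name, lwp in lwp_mapping(dump_text)]
```

For example, `print("\n".join(format_mapping(open("dump.txt").read())))` prints one line per Java thread. Here `dump.txt` is a placeholder for the saved jstack output.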
19-04-2013, 08:55 AM
Post: #12
RE: ohNet hang after TimerManager thread crashes
Thanks for looking into this. See comments inline below.
(19-04-2013 08:00 AM)andreww Wrote:
I had another look at the stack dumps. I think LWP 8015 could well be the timer thread. It does not appear in the Java stack dump, and it appears early on between two other threads both created by ohNet. Since it is not in the Java stack dump, I would assume that it has never run Java code at all.

I don't think LWP 8015 is the TimerManager thread. I've done a thread dump of MinimServer running normally on the QNAP (see attachment). In it, the TimerManager thread's LWP number is 3 less than that of the NetworkAdapterChangeNotifier thread, indicating that it was created before the NetworkAdapterChangeNotifier thread. The "normal" dump also contains a thread (LWP 5293) whose LWP number is 1 more than that of the NetworkAdapterChangeNotifier thread (as is the case for LWP 8015). Its stack trace looks the same as that of LWP 8015, and it isn't the TimerManager thread.

Quote:For reference, here's a mapping between Java thread names and LWP numbers:
Thanks!

Quote:Is the hang reproducible?
No, I have only seen it once. I have seen other hangs that might have been similar, but at the time of those hangs I didn't have a working gdb on the QNAP, so I couldn't get a native thread dump.

Quote:If it's a Debian-based distro, have you installed the libc6-dbg package? I think that package should provide the debug information needed for gdb to give an accurate stack trace through frames in libc. When debug information is missing, gdb tries to work it out automatically, and I think it is a bit poor at doing so on armel.
The QNAP doesn't use Debian. I'll look at libc6-dbg to see if it's possible to retrofit it into the QNAP environment.

Quote:Also, here's a snippet of Python (2.7) that takes the Java stack dump and prints out the mapping of thread names to LWP numbers:
Thanks!
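The argument above relies on Linux handing out thread IDs from the same increasing counter as process IDs. A smaller LWP therefore normally means the thread was created earlier, unless the counter wrapped at pid_max in between. The sketch below makes that comparison. Its sample values are the only numbers derivable from the normal dump described above: LWP 5293 is one more than NetworkAdapterChangeNotifier, which puts that thread at 5292. TimerManager is three less, at 5289. The function name `creation_order` is hypothetical.

```python
def creation_order(lwps, anchor):
    """Sort threads by LWP and give each one's offset from the anchor thread.

    Assumes LWPs increase with creation time (true on Linux until the ID
    counter wraps at pid_max), so a negative offset suggests the thread was
    created before the anchor and a positive one after it.
    """
    base = lwps[anchor]
    ordered = sorted(lwps.items(), key=lambda item: item[1])
    return [(name, lwp - base) for name, lwp in ordered]


# LWPs derived from the "normal" dump described in the post.
normal_dump = {
    "TimerManager": 5289,
    "NetworkAdapterChangeNotifier": 5292,
    "LWP 5293 (unnamed)": 5293,
}
for name, offset in creation_order(normal_dump, "NetworkAdapterChangeNotifier"):
    print("%-30s %+d" % (name, offset))
```

In the hung process, LWP 8015 has offset +1 from NetworkAdapterChangeNotifier. That matches the unnamed LWP 5293 rather than TimerManager's -3.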
25-04-2013, 09:30 AM
Post: #13
RE: ohNet hang after TimerManager thread crashes
I'd like to fix the MinimServer bug of not terminating the ohNet process after an unhandled exception fatal error call.
To ensure the Visual Studio debugger is invoked on Windows, I'd like to terminate the process by calling Os::Quit or abort(). I don't think this is possible with the current Java bindings. Would you be willing to accept a patch to the Java bindings to provide this capability?
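ohNet's fix lives in C++ and the Java/C bindings. The Python sketch below is only an analog of the same idea, not the ohNet API: when any thread hits an unhandled exception, abort the whole process rather than leave it running in a half-dead state. `install_abort_on_thread_crash` is a hypothetical name, and `threading.excepthook` requires Python 3.8 or later.

```python
import os
import threading


def install_abort_on_thread_crash(abort=os.abort):
    """Make an unhandled exception in any thread terminate the whole process.

    Python's default hook prints the traceback and lets the other threads keep
    running, so a process can sit waiting forever on work the dead thread was
    supposed to do. `abort` defaults to os.abort (SIGABRT, which produces a core
    dump on Unix by default) and is a parameter only so the hook can be tested.
    """
    previous_hook = threading.excepthook  # requires Python 3.8+

    def hook(args):
        # args is a threading.ExceptHookArgs with exc_type, exc_value, thread, ...
        previous_hook(args)  # report the traceback first
        abort()  # then stop the process instead of letting it hang

    threading.excepthook = hook
```

Aborting rather than exiting cleanly is deliberate here. It leaves a core dump (or a debugger attach point) at the moment of failure, which is the same motivation given above for preferring abort().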
25-04-2013, 09:50 AM
Post: #14
RE: ohNet hang after TimerManager thread crashes
(25-04-2013 09:30 AM)simoncn Wrote:
I'd like to fix the MinimServer bug of not terminating the ohNet process after an unhandled exception fatal error call.

That sounds like a good idea. It'd be great if you were able to provide a patch for this.
29-04-2013, 10:28 AM
Post: #15
RE: ohNet hang after TimerManager thread crashes
(25-04-2013 09:50 AM)simonc Wrote:
That sounds like a good idea. It'd be great if you were able to provide a patch for this.

The patch is attached. It adds a new exitProcess() method to the Library class. The following files are affected:
  OpenHome/Net/Bindings/Java/org/openhome/net/core/Library.java
  OpenHome/Net/Bindings/Java/Library.c
  OpenHome/Net/Bindings/Java/Library.h
  OpenHome/Net/Bindings/C/OhNetC.cpp
  OpenHome/Net/Bindings/C/OhNet.h
I've tested this on Windows, Linux and Mac.
30-04-2013, 09:14 AM
Post: #16
RE: ohNet hang after TimerManager thread crashes
(29-04-2013 10:28 AM)simoncn Wrote:
(25-04-2013 09:50 AM)simonc Wrote:
That sounds like a good idea. It'd be great if you were able to provide a patch for this.

Thanks very much. I've applied this locally so it should be on github later today. Note that I made a couple of small changes to your patch. The function is now called abortProcess(), because I thought this made it slightly clearer that it is not intended to be called during normal execution of a program. I also added an equivalent C# API.
30-04-2013, 10:05 AM
Post: #17
RE: ohNet hang after TimerManager thread crashes
(30-04-2013 09:14 AM)simonc Wrote:
Thanks very much. I've applied this locally so it should be on github later today.

Thanks very much! I chose the name exitProcess() to correspond to the Java System.exit() call, but I think you're right that it might cause misunderstanding.