Thread Rating:
  • 0 Votes - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Race conditions in EventServerUpnp
14-02-2013, 10:33 PM
Post: #11
RE: Race conditions in EventServerUpnp
(14-02-2013 05:20 PM)simonc Wrote:  Now committed. I changed the patch very slightly (RemovePendingAdds iterates from end to start of its list rather than tweaking a loop counter) but its otherwise as you suggested.

Unfortunately, this change has an undesirable side-effect. If multiple threads are waiting for the same Sid, it would be better to signal them in the same order that their PendingSubscription objects were added to the list. This will improve the chances of these threads arriving at the UpdateSequenceNumber() call in the correct order.

Do you think tweaking the loop counter isn't safe, or are you concerned that it looks ugly? Is there a better way to write this loop using a start-to-end iteration order?

Quote:Thanks again for taking the time to do this!

You're very welcome!
Find all posts by this user
15-02-2013, 09:22 AM
Post: #12
RE: Race conditions in EventServerUpnp
(14-02-2013 10:33 PM)simoncn Wrote:  Unfortunately, this change has an undesirable side-effect. If multiple threads are waiting for the same Sid, it would be better to signal them in the same order that their PendingSubscription objects were added to the list. This will improve the chances of these threads arriving at the UpdateSequenceNumber() call in the correct order.

Good point. I'll change this to match your proposed diff.

(14-02-2013 10:33 PM)simoncn Wrote:  Do you think tweaking the loop counter isn't safe, or are you concerned that it looks ugly? Is there a better way to write this loop using a start-to-end iteration order?

Tweaking the loop counter is safe. Its not a terribly common pattern though so I've sometimes found that people are more likely to make mistakes when editing code like this. I can't suggest a better pattern though so I've changed to start-to-end iteration with counter tweaking.
Find all posts by this user
15-02-2013, 04:47 PM
Post: #13
RE: Race conditions in EventServerUpnp
The latest code should be up there now.

I hope I've also fixed (or greatly reduced) the instability in the code publication process so future updates ought to come out a bit more promptly again.
Find all posts by this user
18-02-2013, 06:00 PM
Post: #14
RE: Race conditions in EventServerUpnp
(03-02-2013 11:32 AM)simoncn Wrote:  I think a global mutex on EventServerUpnp is required to ensure correct ordering of event message processing for the initial stages until the subscription for each event message is known. This will require some reorganization of the current code.

I've implemented a patch for this (attached). When I looked through the code in more detail, it turned out to be a lot simpler to do this than I was expecting.

There's already a mutex in SocketTcpServer that ensures serialization of the calls from SocketTcpServer::Accept to Socket::Accept by multiple session threads (lines 544/546/553/559 of Network.cpp). At present, this mutex is always released on line 556, which allows another thread to call Socket::Accept. To allow for cases where the release of the mutex should be delayed, the iMutex.Signal() call on line 553 can be replaced by a call to a new virtual function in the SocketTcpServer class which just calls iMutex.Signal(). This allows a subclass of SocketTcpServer to override this virtual function and delay releasing the mutex until the subclass has done some additional processing. I've named this virtual function ConnectionAccepted.

To use this mechanism for subscription event notifications, the class EventUpnpServer needs to become a subclass of SocketTcpServer instead of a "wrapper" class for SocketTcpServer. This is a very small change to the existing code. As a subclass, it can override the ConnectionAccepted virtual function and delay the release of the SocketTcpServer mutex until the session thread has reached the UpdateSequenceNumber call in EventSessionUpnp::Run on line 116 of EventUpnp.cpp. If the code in EventSessionUpnp::Run takes the alternative path that calls WaitForPendingAdd, the mutex is released by WaitForPendingAdd after it has added the PendingSubscription object to the list but before it waits on the semaphore.

Here's a summary of the code changes:

1) In Network.cpp and Network.h, add the new virtual function SocketTcpServer::ConnectionAccepted and modify SocketTcpServer::Accept to call this virtual function.

2) In Network.cpp and Network.h, add a new method SocketTcpServer::Unlock to allow the mutex to be released by a subclass.

3) In EventUpnp.cpp and EventUpnp.h, change EventUpnpServer to be a subclass of SocketTcpServer.

4) In EventUpnp.cpp and EventUpnp.h, give the EventSessionUpnp class an iServer reference to the EventServerUpnp object. It needs this reference so it can call the Unlock method to release the SocketTcpServer mutex.

5) In EventUpnp.cpp, modify the code in EventSessionUpnp::Run to call iServer.Unlock(). Also, pass iServer as an argument on the call to WaitForPendingAdd so that WaitForPendingAdd can call iServer.Unlock().

6) In CpiSubscription.cpp and CpiSubscription.h, add an EventServerUpnp& parameter to WaitForPendingAdd and add an Unlock() call to the code in WaitForPendingAdd.

I've compiled and tested these changes, and they seem to work OK as far as I'm able to test them. Together with the other recent changes in this area, they ensure that all event notification messages for established subscriptions are processed in the correct order, without causing any undesirable side-effects or adding a lot of complexity.


Attached File(s)
.zip  ConnectionAccepted.zip (Size: 1.89 KB / Downloads: 1)
Find all posts by this user
19-02-2013, 10:06 AM
Post: #15
RE: Race conditions in EventServerUpnp
(18-02-2013 06:00 PM)simoncn Wrote:  I think a global mutex on EventServerUpnp is required to ensure correct ordering of event message processing for the initial stages until the subscription for each event message is known. This will require some reorganization of the current code.

Sorry, I'm afraid I'm not keen to accept this patch. It seems like a relatively disruptive/risky change for minor benefit.

I understand that we can currently generate unnecessary resubscribes. Overall this seems less bad than serialising all evented updates. I'm also very uncomfortable about the extensions to SocketTcpServer. (Adding functionality to a base class for the exclusive use of a single sub-class doesn't feel right.)

If you disagree, can you explain the impact of the current behaviour please? As far as I can tell, the worst that happens is that we sometimes process events out of order, triggering a re-subscribe. This re-subscribe is wasteful but doesn't seem like a big deal (assuming it happens relatively rarely).

If re-subscribes are happening very frequently, you could possibly review how often state variables are being updated; a slower rate of update would remove most races and reduce network load more effectively than serialising event processing.

If it isn't practical to change the rate of event generation, your current approach of reducing the number of event server threads to 1 seems like the best approach.
Find all posts by this user
19-02-2013, 11:03 AM (This post was last modified: 19-02-2013 11:27 AM by simoncn.)
Post: #16
RE: Race conditions in EventServerUpnp
(19-02-2013 10:06 AM)simonc Wrote:  Sorry, I'm afraid I'm not keen to accept this patch. It seems like a relatively disruptive/risky change for minor benefit.

I'm sorry to hear that.

Quote:I understand that we can currently generate unnecessary resubscribes. Overall this seems less bad than serialising all evented updates.

They would only be serialised until they are allocated to a subscription. Most of the processing happens after this.

Quote:I'm also very uncomfortable about the extensions to SocketTcpServer. (Adding functionality to a base class for the exclusive use of a single sub-class doesn't feel right.)

It's not adding functionality: it's providing an extension point that subclasses can use if they need to extend the period of serialization for any reason. There is only one such subclass at present, but there might be others in the future.

Quote:If you disagree, can you explain the impact of the current behaviour please? As far as I can tell, the worst that happens is that we sometimes process events out of order, triggering a re-subscribe. This re-subscribe is wasteful but doesn't seem like a big deal (assuming it happens relatively rarely).

It means that the out-of-order updates are missed by the subscriber, and further updates may also be missed while the resubscribe happens.

It's hard to say how frequently this happens. The symptom would be that the subscriber is less responsive to updates.

Quote:If re-subscribes are happening very frequently, you could possibly review how often state variables are being updated; a slower rate of update would remove most races and reduce network load more effectively than serialising event processing.

I'm currently throttling the rate of updates to no more than one every 50ms. I could reduce this further, but this would make the subscriber less responsive, which is somewhat equivalent to the current situation.

Quote:If it isn't practical to change the rate of event generation, your current approach of reducing the number of event server threads to 1 seems like the best approach.

Given that you're not willing to accept the patch, this seems like the best workaround for me.
(Edit: see comment below.)
(19-02-2013 11:03 AM)simoncn Wrote:  
Quote:If it isn't practical to change the rate of event generation, your current approach of reducing the number of event server threads to 1 seems like the best approach.

Given that you're not willing to accept the patch, this seems like the best workaround for me.

Unfortunately, there's a problem with this. If the single event server thread blocks in WaitForPendingAdd, no event notifications can be processed for any subscription. The thread could be blocked for as long as 2000 ms.

Would you be willing to accept a patch to WaitForPendingAdd to release the event server thread for other processing instead of blocking it? The event server thread would need to be re-acquired later by WaitForPendingAdd when the semaphore is signalled, and I'm not sure if there's a clean way to do this.
Find all posts by this user
19-02-2013, 01:10 PM
Post: #17
RE: Race conditions in EventServerUpnp
(19-02-2013 11:03 AM)simoncn Wrote:  
(19-02-2013 10:06 AM)simonc Wrote:  If you disagree, can you explain the impact of the current behaviour please? As far as I can tell, the worst that happens is that we sometimes process events out of order, triggering a re-subscribe. This re-subscribe is wasteful but doesn't seem like a big deal (assuming it happens relatively rarely).

It means that the out-of-order updates are missed by the subscriber, and further updates may also be missed while the resubscribe happens.

It's hard to say how frequently this happens. The symptom would be that the subscriber is less responsive to updates.

I appreciate that you feel strongly about this but I'm still not clear whether there is any user impact to the current behaviour.

I'd suggest that we park this issue until we find evidence that control points are triggering regular unnecessary re-subscribes and that this is making them less responsive.

If you do find that this happens, I'd be interested to understand more about your use case. Specifically, which service(s) and state variable(s) does it happen with and how is the service spec'd to behave.
Find all posts by this user
19-02-2013, 01:48 PM
Post: #18
RE: Race conditions in EventServerUpnp
(19-02-2013 01:10 PM)simonc Wrote:  I appreciate that you feel strongly about this but I'm still not clear whether there is any user impact to the current behaviour.

I'd suggest that we park this issue until we find evidence that control points are triggering regular unnecessary re-subscribes and that this is making them less responsive.

If you do find that this happens, I'd be interested to understand more about your use case. Specifically, which service(s) and state variable(s) does it happen with and how is the service spec'd to behave.

I'm not sure how I can park this without having an acceptable interim solution. With the current implementation of WaitForPendingAdd, I can't use a single event server thread as you suggested previously. Have you ruled out changing WaitForPendingAdd to prevent it from blocking the thread?
Find all posts by this user
19-02-2013, 02:04 PM (This post was last modified: 19-02-2013 02:33 PM by simonc.)
Post: #19
RE: Race conditions in EventServerUpnp
(19-02-2013 01:48 PM)simoncn Wrote:  
(19-02-2013 01:10 PM)simonc Wrote:  I appreciate that you feel strongly about this but I'm still not clear whether there is any user impact to the current behaviour.

I'd suggest that we park this issue until we find evidence that control points are triggering regular unnecessary re-subscribes and that this is making them less responsive.

If you do find that this happens, I'd be interested to understand more about your use case. Specifically, which service(s) and state variable(s) does it happen with and how is the service spec'd to behave.

I'm not sure how I can park this without having an acceptable interim solution. With the current implementation of WaitForPendingAdd, I can't use a single event server thread as you suggested previously. Have you ruled out changing WaitForPendingAdd to prevent it from blocking the thread?

I don't understand what the actual problem is.

This thread started with reports of two issues - a crash and a race condition that resulted in unnecessary resubscribes. I think the crash is fixed; the resubscribe race is still present but I don't believe will cause any appreciable issues.

Are you worried that current behaviour with the resubscribe race just seems wrong? If so, I'd encourage you to live with it until you can point to a user-visible problem it causes.

Alternatively, is the current behaviour already causing a significant problem? If it is, can you please explain what that problem is? Other background information about the service(s) this happens with would also be very useful.
Find all posts by this user
19-02-2013, 03:16 PM
Post: #20
RE: Race conditions in EventServerUpnp
(19-02-2013 02:04 PM)simonc Wrote:  I don't understand what the actual problem is.

This thread started with reports of two issues - a crash and a race condition that resulted in unnecessary resubscribes. I think the crash is fixed; the resubscribe race is still present but I don't believe will cause any appreciable issues.

Are you worried that current behaviour with the resubscribe race just seems wrong? If so, I'd encourage you to live with it until you can point to a user-visible problem it causes.

Alternatively, is the current behaviour already causing a significant problem? If it is, can you please explain what that problem is? Other background information about the service(s) this happens with would also be very useful.

I have a custom service that sends regular updates to a custom control point. When the control point receives an update, it refreshes its status. I'm using an evented property to do this, as discussed in this thread, which gives more details about the application use case.

The updates and refreshes need to be fairly frequent to ensure acceptable responsiveness for the control point. In the referenced discussion, you suggested that ohNet could handle sending an event message every 5ms. You also said that no level of traffic would overload either the device or control point stacks.

After experimenting with various values for the interval between event messages, I have arrived at a figure of 50 ms as an acceptable compromise between network overload and application responsiveness. This seemed a very conservative figure given the comment about ohNet being able to handle sending messages every 5ms.

My application requirement is for these events to be delivered on an established subscription every 50ms, without events being lost because of race conditions or resubscribes. If an update is missing or delayed, the control point status won't be refreshed in a timely manner. If the ohNet control point stack is unable to do this, I'm surprised and disappointed, and I'll need to implement another way to transmit these events that doesn't use ohNet.
Find all posts by this user


Forum Jump: