
16 July 2011

AOS service does not stop and stays pending (what happens when the AOS stops)

In my article about stopping AOS services via PowerShell, I implemented a timeout for stopping the AOS gracefully before killing the service. In that script I set the timeout to 60 seconds, which should be more than enough under normal circumstances. If you need a different timeout, just change it. In the end, though, you need to be sure that the AOS service is stopped, and if it isn't, it has to be stopped by killing its process.
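The "ask politely, wait up to a timeout, then kill" logic is not specific to the AOS. The following is only a rough, platform-neutral sketch of that pattern, not the PowerShell script from the other article (which works through ServiceController and the SCM). It uses Python's standard subprocess module on an ordinary child process. The helper name stop_gracefully and the 60-second default are illustrative assumptions.

```python
import subprocess


def stop_gracefully(proc: subprocess.Popen, timeout: float = 60.0) -> str:
    """Ask a child process to stop, wait up to `timeout` seconds, then kill it.

    Hypothetical helper illustrating the graceful-stop-with-fallback pattern.
    It operates on a subprocess.Popen child, not on a Windows service.
    """
    if proc.poll() is not None:
        return "already stopped"
    # Polite request: sends SIGTERM on POSIX.
    # (On Windows, terminate() already calls TerminateProcess, i.e. a hard kill.)
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
        return "stopped"
    except subprocess.TimeoutExpired:
        # The process did not react within the timeout: force it down.
        proc.kill()
        proc.wait()
        return "killed"


if __name__ == "__main__":
    import sys

    # A child that ignores SIGTERM, simulating a process that "hangs" while stopping.
    # It prints "ready" only after the handler is installed, to avoid a race.
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); "
         "print('ready', flush=True); time.sleep(30)"],
        stdout=subprocess.PIPE, text=True,
    )
    child.stdout.readline()  # wait until the child is ready
    print(stop_gracefully(child, timeout=2))
    child.stdout.close()
```

On Linux or macOS the demo should end up in the kill branch, because the child ignores the polite request, just as a hanging AOS ignores the STOP message.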

Why can an AOS get stuck in a pending state?
First, an explanation of what the AOS does when it is stopped: the AOS is managed by the service control manager (MSDN/Wiki) (SCM). The SCM sends the control messages STOP, PAUSE, CONTINUE and SHUTDOWN to the AOS, which reacts to them. In my script, I use the .NET class ServiceController to work with the SCM very comfortably, because this class wraps the Win32 API (P/Invoke) and gives you a safe way to work with Windows services. Calling its Stop method simply sends a STOP message via the SCM to the AOS. If the main thread of the AOS service has time to handle that message, it first sets the service to a pending state and then triggers the shutdown.
The shutdown of an AOS first interrupts all user sessions (you might find the message "Object Server 01: RPC error: Client provided an invalid session ID 9" in the event log), then terminates the server session ("Object Server 01: Server main session is being destroyed."). Once all these sessions are closed, it removes the RPC interface from the RPC run-time library registry and stops listening for RPC calls. Only then does the SCM set the status of the AOS to stopped. (This article on MSDN shows, in a very simple example, what the AOS does to dispose of the RPC interface as well.)
So there shouldn't be any problem when stopping the AOS. Unfortunately, the AOS sometimes fails to stop gracefully. This happens only occasionally, when the AOS is under stress, so it is difficult to reproduce. However, because the shutdown waits until the server session is terminated, it is enough to freeze the thread of an AOS session with a sleep to reproduce it:
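// Must be a method of a class; the server modifier makes it run on the AOS
public server static void FreezeAOSSessionThread()
{
    ;
    // Block the current AOS session thread for 60,000 ms (60 seconds)
    sleep(60000);
}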
Calling sleep on the client would not work, because the client sessions are simply killed. The client can never prevent the AOS from stopping and keep it in a pending state. (In all the cases I could trace back, the situation on the AOS was similar: the AOS session could not be terminated.) If you are looking for the cause, take a dump of the AOS with ADPlus and check the X++ call stack. Once again, thank you Tariq! Attach ADPlus to the process and kill the process with Task Manager; the dump will then be taken automatically.
Killing the AOS might terminate open connections to the SQL Server as well. But because CUD (create, update and delete) operations are done within transactions, killing the AOS service shouldn't break database integrity, as long as no custom code violates this best practice. Just keep in mind that you will lose all data from uncommitted transactions.
