Transaction manager high availability

The WebSphere Application Server Transaction Manager writes to its transaction recovery logs when it handles global transactions that involve two or more resources. Transaction recovery logs are stored on disk and are used for recovering in-flight transactions from system crashes or process failures. To enable WebSphere application server transaction peer recovery, it is necessary to place the recovery logs on a highly available file system, such as IBM SAN FS or NAS, for all the application servers within the same cluster to access. All application servers must be able to read from and write to the logs.

For a peer server to recover in-flight transactions, any database locks associated with the failed transactions should be released prior to the recovery. You need to use the lease-based exclusive locking protocol, such as Common Internet File System (CIFS) or Network File System (NFS) Version4, to access remote recovery logs from WebSphere application server nodes. Without the lease-based locking support, if one of the nodes crashes, locks held by all the processes on that node will not automatically be released. As a result, the transactions cannot be completed, and database access can be impaired due to the unreleased locks

In the event of a server failure, the transaction service of the failed application server is out of service. Also, the in-flight transactions that have not be committed might leave locks in the database, which blocks the peer server from gaining access to the locked records. There are only two ways to complete the transactions and release the locks. One is to restart the failed server and the other is to start an application server process on another box that has access to the transaction logs. Using the new HAManager support, a highly available file system and a lease-based locking protocol, a recovery process will be started in a peer member of the cluster. The recovery locks are released and in-flight transactions are committed.