Avoiding Split Brain

Oracle RAC Cluster Tips by Burleson Consulting

This is an excerpt from the bestselling book Oracle Grid & Real Application Clusters.  To get immediate access to the code depot of working RAC scripts, buy it directly from the publisher and save more than 30%.


Server components can be expected to fail. It is the responsibility of the cluster to detect failures, monitor the nodes, and stabilize the application running on the cluster. Cluster systems are designed to handle special situations such as amnesia and split brain conditions.

Amnesia occurs when the cluster restarts after a shutdown with cluster data that is older than the data in effect at the time of the shutdown. This can happen if multiple versions of the framework data are stored on disk and a new incarnation of the cluster is started when the latest version is not available.
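One common way to guard against amnesia can be sketched as follows. This is a minimal illustration, not the method of any particular cluster product: it assumes each copy of the framework data carries a generation counter, and that every change is committed to a majority of the copies before it is acknowledged. Under that assumption, any majority of readable copies must include the latest version. The names `ConfigCopy` and `pick_startup_config` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfigCopy:
    node: str        # node that stores this copy (hypothetical field)
    generation: int  # assumed to increase with every committed change


def pick_startup_config(readable, total_copies):
    """Return the newest readable copy, or None if starting would risk amnesia.

    Assumes every change was written to a majority of copies, so a
    majority of readable copies always contains the latest generation.
    """
    if len(readable) * 2 <= total_copies:
        return None  # no majority: the latest version may be missing
    return max(readable, key=lambda copy: copy.generation)
```

With three copies in total, a restart that can read only one stale copy is refused, while a restart that can read any two copies proceeds with the newer of the two.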

A split brain condition occurs when a single cluster has a failure that results in reconfiguration of the cluster into multiple partitions, with each partition forming its own sub-cluster without knowledge of the existence of the others. This leads to collisions and corruption of shared data, as each sub-cluster assumes ownership of the shared data.

As an example, when two systems have access to shared storage, the integrity of the data depends on the systems communicating through heartbeats over the private interconnects. When the private links are lost or fail, or when one of the systems is hung or too busy to send or receive heartbeats, each system concludes that the other has exited the cluster. Each then tries to become the master or form a sub-cluster and claim exclusive access to the shared storage. This condition leads to split brain.
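The two-node scenario above can be illustrated with a small simulation. This is only a sketch: the `Node` class, the timeout value, and the timestamps are hypothetical, and real cluster software tracks heartbeats in far more detail.

```python
from dataclasses import dataclass, field

HEARTBEAT_TIMEOUT = 3.0  # seconds; hypothetical value chosen for illustration


@dataclass
class Node:
    name: str
    # peer name -> time the last heartbeat from that peer arrived
    last_heard: dict = field(default_factory=dict)

    def receive_heartbeat(self, peer, now):
        self.last_heard[peer] = now

    def believes_dead(self, peer, now):
        last = self.last_heard.get(peer)
        return last is None or now - last > HEARTBEAT_TIMEOUT


a, b = Node("a"), Node("b")

# The private interconnect is healthy at t=0 and t=1.
for t in (0.0, 1.0):
    a.receive_heartbeat("b", t)
    b.receive_heartbeat("a", t)

# The interconnect then fails, but both nodes keep running.
now = 5.0
split_brain_risk = a.believes_dead("b", now) and b.believes_dead("a", now)
```

At `now = 5.0` both nodes have gone longer than the timeout without hearing from each other, so each would conclude that the other has left the cluster even though both are still alive. Without fencing, both would go on to claim the shared storage.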

There are well-defined methods, known as fencing, for avoiding this situation. The two basic approaches to fencing are resource-based fencing and system reset fencing, known as STOMITH or STONITH.

Resource-based fencing includes I/O fencing and the maintenance of quorum disks. In resource-based fencing, a hardware mechanism immediately disables or disallows access to shared resources. If the shared resource is a SCSI disk or disk array, one can use SCSI reserve/release or, better yet, persistent reserve/release operations. If the shared resource is a Fibre Channel disk or disk array, one can instruct a Fibre Channel switch to deny the problem node access to shared resources. In general, the errant node itself is left undisturbed, and its resources are instructed to deny access to it. If the node is later able to become part of a cluster with quorum, it then goes through the normal channels to reacquire its resources.
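The idea behind resource-based fencing can be sketched with a simulated shared disk that accepts writes only from registered nodes. This is loosely modeled on the registration idea behind persistent reservations, but it is not a SCSI or Fibre Channel API; the `SharedDisk` class and its methods are hypothetical.

```python
class SharedDisk:
    """Simulated shared disk that enforces access by registration."""

    def __init__(self):
        self.registered = set()  # nodes currently allowed to do I/O
        self.blocks = {}         # block number -> data

    def register(self, node):
        self.registered.add(node)

    def preempt(self, by, victim):
        """A registered node removes another node's registration."""
        if by not in self.registered:
            raise PermissionError(f"{by} is not registered")
        self.registered.discard(victim)

    def write(self, node, block, data):
        # The resource itself refuses I/O from a fenced node;
        # the node is not reset or otherwise disturbed.
        if node not in self.registered:
            raise PermissionError(f"{node} is fenced from this disk")
        self.blocks[block] = data
```

After `disk.preempt("a", "b")`, any `disk.write("b", ...)` raises `PermissionError` while node `b` keeps running. Once `b` rejoins a cluster with quorum, it can call `register` again to reacquire access.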

STOMITH stands for Shoot the Other Machine in the Head; the variant STONITH stands for Shoot the Other Node in the Head. STOMITH fencing takes a completely different approach. In STOMITH systems, the errant cluster node is simply reset and forced to reboot. When it rejoins the cluster, it acquires resources in the normal way. In many cases, STOMITH operations are performed via smart power switches, which simply remove power from the errant node for a brief period of time.
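A minimal sketch of the STOMITH sequence follows, using a simulated power switch. The `SimulatedPowerSwitch` class and the `fence_and_take_over` function are hypothetical and do not correspond to any real device interface. The point illustrated is the ordering: the survivor takes over resources only after the power-off has been confirmed.

```python
class SimulatedPowerSwitch:
    """Stand-in for a smart power switch; not a real device API."""

    def __init__(self, nodes, reachable=True):
        self.powered = {node: True for node in nodes}
        self.reachable = reachable  # False simulates an unreachable switch
        self.events = []

    def power_off(self, node):
        if not self.reachable:
            return False  # the fence could not be confirmed
        self.powered[node] = False
        self.events.append(("off", node))
        return True

    def power_on(self, node):
        self.powered[node] = True
        self.events.append(("on", node))


def fence_and_take_over(switch, survivor, victim, ownership):
    """Reset the victim, then move its resources to the survivor."""
    if not switch.power_off(victim):
        return False  # never take over without a confirmed fence
    for resource, owner in ownership.items():
        if owner == victim:
            ownership[resource] = survivor
    switch.power_on(victim)  # the victim reboots and rejoins normally
    return True
```

If the switch cannot be reached, the function returns `False` and leaves ownership unchanged, since taking over while the other node might still be writing is exactly the split brain case.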

However, the implementation of split brain avoidance varies from vendor to vendor, and also depends on the type of shared storage in use for the cluster. For example, Sun Cluster avoids split brain by using the majority vote principle coupled with quorum disks, while a Linux cluster running PolyServe Matrix Server employs fabric fencing. The next section examines these techniques in detail.
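The majority vote principle with a quorum disk can be sketched generically as below. This is not Sun Cluster's actual algorithm or vote assignment; it assumes one vote per node and a configurable number of votes for the quorum disk, and the function name `has_quorum` is hypothetical.

```python
def has_quorum(partition_size, total_nodes, holds_quorum_disk,
               quorum_disk_votes=1):
    """Return True if a partition holds a strict majority of all votes.

    Assumes one vote per node plus quorum_disk_votes for whichever
    partition has won ownership of the quorum disk.
    """
    total_votes = total_nodes + quorum_disk_votes
    votes = partition_size
    if holds_quorum_disk:
        votes += quorum_disk_votes
    return votes * 2 > total_votes
```

In a two-node cluster with a one-vote quorum disk there are three votes in total. After the interconnect fails, the node that acquires the quorum disk holds two votes and continues, while the other node holds one vote and must stop, so at most one partition survives.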

 


This is an excerpt from the bestselling book Oracle Grid & Real Application Clusters, Rampant TechPress, by Mike Ault and Madhu Tumma.

You can buy it direct from the publisher for 30%-off and get instant access to the code depot of Oracle tuning scripts.

http://www.rampant-books.com/book_2004_1_10g_grid.htm


 

 
 
 

Copyright © 1996 -  2017

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.
