High Availability for pSeries and RS/6000
Complete Availability Line-up
HACMP: can keep mission-critical
applications highly available within a location through application fallover
and monitoring.
HAGEO: can quickly restore access
to data following a location failure. HAGEO provides the same functionality
as GeoRM, and, together with HACMP, it can automate failover, recovery
and reintegration between geographically separate locations.
GeoRM: can provide an environment
to mirror data to another location. It can provide remote data mirroring
for backup of regional (up to seven) locations to a single centralized
server or between just two geographically separate servers. GeoRM and HAGEO
are mutually exclusive and cannot be used in conjunction with each other.
HACMP
Highlights
- Combines world-class, easy-to-use, 24x7 clustering technology with
IBM advanced systems technologies
- Significantly reduces planned and unplanned outages, allowing for
cluster upgrades and system maintenance without interrupting operations
- Offers multiple data backup and recovery methods to meet disaster
management needs
Need for High Availability
What happens when IT systems fail? During the business day, IT investments
are hard at work: recording customer activities, tracking inventory, keeping
company statistics, providing employees with the computing power needed
to generate business revenue. But what happens when those systems fail?
The cost of computer downtime is widely documented; unplanned outages
cost real money and increase the total cost of ownership (TCO) for IT.
Planned outages for system maintenance can also impact business performance.
Keeping systems highly available should be the top goal of every system
administrator or corporate CIO. What every business needs are high-availability
(HA) solutions that keep a company's IT investment running 24x7, allow
end users to never experience any system outages, and let system maintenance
occur without causing downtime.
IBM HA Clustering Solution
Better protect critical business applications from failures with the
capabilities of IBM High Availability Cluster Multiprocessing for
AIX 5L V5.1.0 (HACMP V5.1). For over 10 years, HACMP has been providing
reliable high-availability services, monitoring capabilities and dependable
detection of application failures. HACMP manages the fallover of business
application environments to backup servers. And with the introduction
of the new optional package, HACMP/XD (Extended Distance), HACMP will
also manage fallover to backup servers at remote sites. HACMP/XD provides
long distance remote fallover for ESS/PPRC peers, and unlimited distance
fallover for IP connected peers using proven IBM HAGEO (High Availability
Geographic Cluster) technology. Now there is a single, world-class source
of protection for mission-critical applications.
HACMP makes use of redundant hardware configured in a cluster to keep
an application running, restarting it on a backup server if necessary.
This minimizes expensive downtime for both planned and unplanned outages
and provides flexibility to accommodate changing business needs. Up to
32 servers can participate in an HACMP cluster - ideal for an environment
requiring horizontal growth with rock-solid reliability. HACMP can also
detect software problems that are not severe enough to interrupt proper
operation of the system, such as process failure or exhaustion of system
resources. HACMP monitors, detects and reacts to such failure events,
allowing the system to stay available during random, unexpected software
problems. HACMP can be configured to react to hundreds of system events.
Using HACMP can virtually eliminate planned outages, since users, applications
and data can be moved to backup systems during scheduled system maintenance.
HACMP clusters can be configured to meet complex and varied application
availability and recovery needs.
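The monitor, detect and react cycle described above can be illustrated with a minimal Python sketch. This is not HACMP code and does not use any HACMP interface; the class names, the missed-heartbeat threshold and the rule "move to the first live node in list order" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    missed_heartbeats: int = 0
    alive: bool = True


@dataclass
class Cluster:
    # Hypothetical model: nodes are listed in takeover priority order.
    nodes: list
    owner: dict = field(default_factory=dict)  # application -> node name
    max_missed: int = 3  # assumed threshold, not an HACMP default

    def start(self, app, node_name):
        """Record which node currently runs the application."""
        self.owner[app] = node_name

    def heartbeat(self, node_name, received):
        """Record one heartbeat interval for a node and react to a failure."""
        node = next(n for n in self.nodes if n.name == node_name)
        if received:
            node.missed_heartbeats = 0
            return
        node.missed_heartbeats += 1
        if node.alive and node.missed_heartbeats >= self.max_missed:
            node.alive = False
            self._fallover(node_name)

    def _fallover(self, failed):
        """Move every application owned by the failed node to the first live node."""
        backup = next((n.name for n in self.nodes if n.alive), None)
        for app, node_name in self.owner.items():
            if node_name == failed:
                self.owner[app] = backup  # None if no live node is left
```

In this sketch an application stays on its node until the assumed number of consecutive heartbeats has been missed, and only then is it restarted elsewhere. A real cluster manager would also handle resource release, network failures and reintegration.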
Benefits of HACMP
HACMP takes advantage of AIX 5L - the high-performance, scalable
UNIX operating system from IBM - and exploits its systems and network
management capabilities. AIX 5L is one of the world's most open UNIX operating
systems and includes functions to improve usability, security, system
availability, and performance. These include improved availability of
mirrored data and enhancements to AIX Workload Manager that help
solve problems of mixed workloads by quickly and dynamically providing
resource availability to critical applications. Used across the IBM eServer
pSeries line of on demand servers along with the Reliable Scalable
Cluster Technology (RSCT) infrastructure technology layer in AIX 5L, HACMP
can provide both horizontal and vertical scalability without downtime.
High Availability Cluster Enhancements
HACMP V5.1 requires AIX 5L and builds upon its features. New HACMP V5.1
functions include:
- Reduced fallover time using fast disk takeover which happens within
10 seconds
- Streamlined configuration interface which requires only six user inputs
to build a simple HA cluster
- New non-IP heartbeating protection over disks where no additional
hardware is required
- Enhanced security mechanism, removing the need for /.rhosts
- Increased administration productivity through faster cluster verification
and synchronization
- Greater control over resources owning application startup and fallover
behaviour
- More cluster status information readily available in the cluster monitor
- Addition of multiple disaster recovery technologies to keep the system
accessible if disaster strikes
Business Continuity
The HACMP/XD (Extended Distance) optional feature is a must for customers
with business-critical data who want to mirror data between separate sites
to aid in disaster recovery. This applies to businesses of any size, with
multiple sites or regional operations, or wherever decentralization of
data is desired. HACMP/XD is an attractive and affordable high-availability
solution for small- and medium-sized enterprises, and for small- and medium-sized
business units of large enterprises. "High availability" should
be a fundamental buying criterion for business-critical and e-business
applications.
In a single package, HACMP/XD offers multiple technologies for achieving
long distance data mirroring, fallover, and resynchronisation.
- HACMP/XD supports IBM Enterprise Storage Server (ESS) Peer-to-Peer
Remote Copy (PPRC). This allows HACMP clusters to support automatic
fallover of disks that are PPRC pairs and creates a powerful solution
for customers on ESS with PPRC. By automating the management of PPRC,
recovery time is minimized after an outage, regardless of whether the
clustered environment is local or geographically dispersed. HACMP/XD
in combination with PPRC manages a clustered environment to ensure mirroring
of critical data is maintained at all times.
- HACMP/XD IP-based mirroring will provide the well-known unlimited
distance data mirroring of the IBM High Availability Geographic Cluster
(HAGEO) for AIX product. IP-based mirroring allows a cluster of pSeries
servers to be placed in two widely separated geographic locations, each
maintaining an exact replica of the application and data. Data synchronization
during production, fallover, recovery, and restoration is provided.
HACMP/XD is independent of the disk storage used. RAID or mirroring
can be used for local protection. HACMP/XD IP-based mirroring is done
at the logical volume layer.
Complementary Cluster Software
IBM also offers a broad range of additional tools to aid in efficiently
building, managing and expanding HA clusters in AIX 5L environments. These
include:
- Integrated Cluster File System utilizing General Parallel File System
(GPFS) for AIX V1.5. GPFS is a high-performance, shared-disk file system
using standard UNIX file system interfaces and providing concurrent
access to data from all nodes in a cluster.
- Workload Manager for AIX, which provides resource balancing between
applications
- Geographic Remote Mirror (GeoRM) for AIX to provide unlimited
distance data mirroring for backup/recovery
- Tivoli for enterprise level systems management and monitoring
New Generation of On Demand Servers
HACMP runs on IBM eServer pSeries, the server platform of choice for
UNIX-based on demand applications. This technology-driven line of servers
offers the availability, scalability and range of performance demanded
by today's growing on demand business environments. It combines the benefits
of high-performance copper chip and RISC technology with AIX 5L for reliable
handling of mission-critical applications.
pSeries is part of the IBM eServer product line, a generation of servers
featuring innovative technology, logical partitioning, outstanding scalability
and availability, broad support of open standards for application flexibility,
and a full range of new tools to manage IT infrastructure in an on demand
world.
Gaining the IBM Advantage
HA solutions are often single-sourced to reduce the risk of failures,
since each element of the solution is designed and tested for proven
reliability. This can be a critical decision factor for business
environments, and IBM provides the advantage of a single source for pSeries
servers, the AIX 5L operating system, IBM TotalStorage offerings and HACMP
solutions.
The IBM eServer product line is backed by comprehensive offerings and
resources that provide value at every stage of IT implementation. These
include High Availability Cluster Implementation Services, an offering
which provides basic and customized assistance for installation of HACMP
clusters. This service is customisable with the following elements:
- High Availability Cluster Proof of Concept Review
- Planning and design of a pSeries Availability Cluster
- Installation and configuration of a pSeries Availability Cluster
- Applications integration assistance
- Development and execution of a Cluster Test Plan
- Enhanced monitoring and reporting setup
- Operations planning and operations documentation development
- Migration/upgrades services
Based on an assessment of the complete system environment, IBM availability
experts can design a customer solution to meet the target availability
level for on demand business needs.
GeoRM for AIX and HAGEO for AIX
Highlights
- Provides disaster recovery and resynchronisation capability for geographically
separated sites
- Protects data against total location failure by remote mirroring of
data
- Supports unlimited distance between participating sites
- Performs automatic site takeover and recovery
- Integrates tightly with IBM's High Availability Cluster Multiprocessing
(HACMP) for AIX clustering software
GeoRM/HAGEO Key Features
Key features:
- Support for both UDP and TCP transport options.
- 64-bit kernel support for the TCP protocol.
- Choice of "Write Ordering by Volume Group" under the TCP
transport option which can realize performance gains.
- Tighter integration with HACMP simplifies configuration of both products.
- Allows automatic detection and response to site and network failures
in the geographic cluster without user intervention.
- Provides load balancing across the links, enhanced by choosing
the fastest path.
- Removes the AIX limitation of three mirror copies of a disk and allows
three copies at each geographic site.
- Wider range of data transmission rates, allowing more efficient use
of networks and better tuning of network utilization.
- Support for maximum sized logical volumes
Disaster Recovery Excellence
Today, keeping a business operational increasingly means keeping critical
data and information systems available around the clock. To compete successfully
in the global marketplace, companies are striving to protect critical
information systems to help minimize costly business impacts, such as
lost sales, decreased customer satisfaction and reduced employee productivity.
One aspect of high availability is protection against location disasters,
such as power outages, hardware or software failures, and natural disasters.
This is accomplished by eliminating the system and the site as points
of failure.
Two software products provide differing levels of disaster recovery features
for IBM eServer pSeries and IBM RS/6000 UNIX systems.
Geographic Remote Mirror (GeoRM) for AIX protects critical data by duplicating
the most up-to-date data reliably and quickly at a remote location. High
Availability Geographic Cluster (HAGEO) for AIX helps keep mission-critical
systems and applications operational in the event of disasters.
HAGEO provides the geographic mirroring functions of GeoRM and adds automatic
failover and recovery capabilities.
GeoRM
GeoRM is a data mirroring product that provides a point-to-point method
of duplicating the customer data in real-time over unlimited geographic
distances. Since GeoRM is both database and file system independent, there
is no modification required of applications that utilize GeoRM's mirroring
capabilities.
Businesses can be assured that GeoRM is designed to mirror any data destined
for one server (the source server) across any IP-based network to another
server (the target server). A total failure (e.g., CPU, disk, network,
power) of the source server at the local site will not cause the loss
of data on the target server at the remote site.
GeoRM has the ability to continue operations while recovering from a
server failure. Since a target server can support up to seven source servers
in GeoRM, the flexibility to design the correct backup configuration serves
all types of business recovery needs and allows business applications
to continue running on the takeover system while you recover from a disaster
or planned outage. Each of these source and target servers can be as near
(in the same room) or as far (halfway around the world) as required.
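The seven-sources-per-target limit mentioned above lends itself to a simple planning check. The following Python sketch is only an illustration: the function name and the list-of-pairs input format are assumptions, not part of GeoRM.

```python
MAX_SOURCES_PER_TARGET = 7  # limit stated in the text above


def check_topology(mirrors):
    """Check a planned layout given as (source_server, target_server) pairs.

    Returns a list of problem descriptions; an empty list means every
    target server has at most seven distinct source servers.
    """
    sources = {}
    for source, target in mirrors:
        sources.setdefault(target, set()).add(source)
    return [
        f"{target}: {len(srcs)} sources exceeds {MAX_SOURCES_PER_TARGET}"
        for target, srcs in sorted(sources.items())
        if len(srcs) > MAX_SOURCES_PER_TARGET
    ]
```

Seven regional servers mirroring to one central server pass the check; adding an eighth source to the same target is reported as a problem.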
GeoRM offers a wide range of mirroring configurations, from the most
stringent data integrity mode to a higher performance mode. Data
between the GeoRM sites can be mirrored in three modes:
- Synchronous mode helps ensure that the same data exists on both sites
at the completion of every write. This mode provides a high level of
data integrity.
- Synchronous with mirror write consistency helps ensure that both sites
can be restored with identical data, even in the event of a site failure
in mid-transaction. This mode provides data integrity and better performance
results.
- Asynchronous mode writes to the local disk without waiting for the
remote write to complete. Some data may not yet be on the remote site when
a site failure occurs. This mode is chosen when performance is the highest
priority in disaster recovery.
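The trade-off between the three modes can be seen in a toy Python model. This sketch is not GeoRM code: the class, the in-memory dictionaries standing in for disks, and in particular the "in flight" log used for the mirror write consistency mode are assumptions chosen to illustrate the idea, since the text does not describe the internal mechanism.

```python
from collections import deque
from enum import Enum


class Mode(Enum):
    SYNC = "synchronous"
    SYNC_MWC = "synchronous with mirror write consistency"
    ASYNC = "asynchronous"


class MirroredVolume:
    """Toy model of a local volume mirrored to a remote site."""

    def __init__(self, mode):
        self.mode = mode
        self.local = {}
        self.remote = {}
        self.pending = deque()  # async writes not yet sent to the remote site
        self.in_flight = set()  # assumed MWC log: remote write not yet confirmed

    def write(self, block, data):
        if self.mode is Mode.ASYNC:
            # Return as soon as the local write is done; the remote copy lags.
            self.local[block] = data
            self.pending.append((block, data))
            return
        if self.mode is Mode.SYNC_MWC:
            # Note the block first so a mid-write failure can be reconciled later.
            self.in_flight.add(block)
        self.local[block] = data
        self.remote[block] = data  # the write completes only after the remote copy
        self.in_flight.discard(block)

    def drain(self):
        """Send queued asynchronous writes to the remote site in order."""
        while self.pending:
            block, data = self.pending.popleft()
            self.remote[block] = data
```

In the synchronous modes the remote copy matches the local copy when `write` returns. In asynchronous mode anything still in `pending` at the moment of a site failure would be missing at the remote site, which is the exposure the third bullet describes.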
GeoRM is suitable for all customers, from small and medium-sized companies
to large corporate enterprises. It is scalable and flexible across the
entire range of IBM AIX servers.
HAGEO
HAGEO supports the same critical data mirroring functions as GeoRM, such as
point-to-point mirroring, three mirroring modes, and backup configuration
flexibility. Not only is data protected, but HAGEO also has built-in features
to automatically respond to site and communication failures and provide
for automatic site takeover.
An HAGEO cluster consists of two geographically separated sites, supporting
a total of eight systems. There are three types of disaster protection:
remote hot backup, remote mutual takeover and concurrent access.
Remote Hot Backup
A remote geographic site is designated as the hot backup site. This backup
site includes hardware, system and application software, and application
data and files. It is live and ready to take over the current workload.
In the event of a failure, the failed site's application workload automatically
transfers to the remote hot backup site.
Remote Mutual Takeover
Remote mutual takeover takes remote hot backup a little further and allows
geographically separated system sites to be designated as hot backups
for each other. Should either site experience a failure, the other acts
as a hot backup and automatically takes over the designated application
workload of the failed site. Two different workloads running at two different
sites are protected!
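The difference between remote hot backup and remote mutual takeover comes down to which sites are designated as backups for which. A minimal Python sketch, using hypothetical site and workload names and a plain dictionary rather than any HAGEO configuration format:

```python
def takeover(assignments, backup_for, failed_site):
    """Return the workload placement after failed_site goes down.

    assignments: site -> list of workloads currently running there
    backup_for:  site -> site designated as its hot backup
    The input dictionaries are left unchanged.
    """
    survivor = backup_for[failed_site]
    result = {site: list(loads) for site, loads in assignments.items()}
    result[survivor].extend(result[failed_site])
    result[failed_site] = []
    return result


# Mutual takeover: each site is the designated backup of the other.
assignments = {"site_a": ["orders"], "site_b": ["billing"]}
backup_for = {"site_a": "site_b", "site_b": "site_a"}
after_failure = takeover(assignments, backup_for, "site_a")
```

With a one-way mapping such as `{"site_a": "site_b"}` the same function models remote hot backup, where only one site is protected by the other.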
Concurrent Access
Concurrent access configurations have systems at both sites concurrently
updating the same database. Users run instances of the same application
at both sites for increased system throughput and extremely fast failover.
HAGEO is one of the few products to have this ability.
Remote System Recovery
Because of the above types of disaster protection, after a failed site
has been restored to operation, HAGEO can resynchronise mission-critical
data and reintegrate the failed system with the remote hot backup. HAGEO
updates the failed system with a current mirror of application data and
files processed by the backup system after the failed system ceased operations.
Upon completing restoration of an up-to-date data and file mirror, the
HAGEO cluster will resume synchronized system operations, including the
mirroring of real-time data and files between the system sites. This can
occur while the remote backup is currently in user operation.
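One common way to bring a restored site up to date is to remember which blocks changed while it was down and copy only those. The Python sketch below illustrates that idea; it is an assumption for illustration, not a description of how HAGEO tracks or transfers changes, and all names are hypothetical.

```python
class SiteCopy:
    """In-memory stand-in for the data held at one site."""

    def __init__(self):
        self.blocks = {}


class BackupSite(SiteCopy):
    """Site that keeps running and mirrors to a peer when the peer is up."""

    def __init__(self):
        super().__init__()
        self.peer_down = False
        self.dirty = set()  # blocks written while the peer was unavailable

    def write(self, block, data, peer):
        self.blocks[block] = data
        if self.peer_down:
            self.dirty.add(block)  # remember for later resynchronisation
        else:
            peer.blocks[block] = data  # normal mirroring

    def resynchronise(self, peer):
        """Copy every block changed during the outage, then resume mirroring."""
        for block in sorted(self.dirty):
            peer.blocks[block] = self.blocks[block]
        copied = len(self.dirty)
        self.dirty.clear()
        self.peer_down = False
        return copied
```

After `resynchronise` returns, later writes are mirrored to the restored site again, matching the resumption of synchronized operations described above.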
Complete Availability Line-Up
HAGEO is complemented by High Availability Cluster Multiprocessing (HACMP)
for AIX, which can be used for local or campus disaster survivability
with real-time automated fallover and reintegration for up to 32 servers.
HACMP can protect against local system and application failures, preserve
data integrity and consistency, and maintain cluster operations during
unplanned and planned downtime. This strong line-up provides IBM AIX system
customers with a wide choice of high availability and disaster recovery
technologies.