Outils personnels

Failover database sur Standard Edition

De wikiGite

Introduction

The goal of this procedure is to install and configure an Oracle failover database. Like Standby or DataGuard databases, the failover db is on another server, but unlike them, Oracle is down on this server, so it requires no license. The instance is started ONLY if the production server crashes, and the production db is down or dead.

To recover the most recent datas from the primary server, a manual “standby -like” process is configured. First we transfer manually the archive logs and a snapshot copy of the control files every 15 minutes from primary to failover server. Then, at night, we transfer the daily Rman backup from primary to failover server.

In case of crash, the failover control files will be replaced by the copy from the primary database, and datafiles will be restored from the backup. Eventually some archive logs are restored from the backup file if rman find that it lakes in the archive directory. As the control files are most recent than restored files, the archived logs are used to recover datas to the last SCN.

The loss of records is maximum of 15 minutes, which is the delay, adjustable in the scheduled task, between two copy of the control files from the production server.

Items to install

In this procedure, the servers will be called TEST1 for the production server, and TEST2 for the failover server.

Oracle 10g

Install Oracle 10g binaries on the two servers, and create a database on each with the same directory tree for datafiles, and the same SID.

It's important because the control files used when activating the failover database will be those from production server, with the SID and all the paths on this server.

We consider that you know how installing Oracle, and here are only the specific configurations.

The databases will have “ORA” for SID in the rest of the procedure. Replace ORA with your SID in the commands and services name.

Put the production database in ARCHIVELOG mode.

On the failover server, go to the “Manage” mmc by right-clicking on “My computer”, then Services and Applications, then Services, and edit the properties of OracleServiceORA. Choose “manual” as startup type, and stop it.

Do the same with the DBConsoleORA service.

CwRsync

Rsync is a tool, coming from unix systems, which synchronize two or more directories on a same computer or between two or more computers.

CwRsync (for “CygWing Rsync”) is a Windows port of this tool. It can synchronize files through two ways :

  • using SSH tunneling, it's the default on most unix machines, where ssh server and client are installed with the system (on Windows servers you must install and configure OpenSSH)
  • using the rsync client-server specific protocol

We'll use here the second solution which is easier to implement.

Get the last version of server and client here :
http://sourceforge.net/project/showfiles.php?group_id=69227&package_id=68081

The latest version is 2.1.5 in the writing of this procedure.

Install by uncompress, then by launching the executable, the server on the production server, and the client on the failover server. Let all parameters by default.

Configuration

We consider in this part that you know how scheduling a task in Windows.

Oracle10g on the production server

Daily backup

First, we create some RMAN scripts to back up the database, the archivelog files, and to generate a snapshot copy of the control files. Put all the files in a proper directory, for example “C:\ORABACKUP\RMAN”.

A directory “C:\ORABACKUP\ORA” must be created too to receive the backups (see the Rman “CONFIGURE CHANNEL” parameter bellow).

Backup command script RMAN_BCK_FULL.BAT:

@echo off
set ORACLE_SID=ORA
rman log=rman_config.log target / cmdfile rman_config.rman
rman log=rman_full.log target / cmdfile rman_bck_full.rman
rman log=rman_valid_restore.rman.log target / cmdfile rman_valid_restore.rman
rman log=rman_maintenance.rman.log target / cmdfile rman_maintenance.rman
rman log=rman_orafiles_copy.rman.log target / cmdfile rman_orafiles_copy.rman

This command file is scheduled at 04:00 AM each day. It launches some Rman scripts. The first, RMAN_CONFIG.RMAN, force the configuration of Rman parameters :

# rman_conf
#This script configure RMAN. It must be run only once.
CONFIGURE DEFAULT DEVICE TYPE TO DISK;
CONFIGURE RETENTION POLICY TO REDUNDANCY 2;
CONFIGURE DEVICE TYPE DISK PARALLELISM 1;
CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT 'C:\orabackup\ORA\%U';
#Autobackup control file to flash_recovery_area
CONFIGURE CONTROLFILE AUTOBACKUP ON;

Then we start the backup with RMAN_BCK_FULL.RMAN :

# rman_bck_full
# This script do a FULL backup
# Run it once a week
# The database must be in ARCHIVELOG mode to do a hot backup
BACKUP AS COMPRESSED BACKUPSET DATABASE PLUS ARCHIVELOG DELETE ALL INPUT ;
BACKUP SPFILE;
ALLOCATE CHANNEL FOR MAINTENANCE TYPE DISK;
DELETE NOPROMPT OBSOLETE DEVICE TYPE DISK;
RELEASE CHANNEL;

We validate the backup with RMAN_VALID_RESTORE.RMAN :

# rman_valid_restore
#Run this script when you want to control that RMAN can restore database with existing backups
RESTORE DATABASE VALIDATE;

We do some maintenance tasks with RMAN_MAINTENANCE.RMAN :

# rman_maintenance
# It check for obsolete backups (see REDUDANCY parameter) and delete them
CROSSCHECK BACKUP OF DATABASE;
CROSSCHECK ARCHIVELOG ALL;
REPORT NEED BACKUP;
DELETE NOPROMPT OBSOLETE;
# To delete a specific backup do this :
#LIST BACKUP;
#DELETE BACKUPPIECE numero_BP;
# Print all existing backups
LIST BACKUP SUMMARY

And finally we back up the SPFILE and the PASSWORD file of the database with RMAN_ORAFILES_COPY.RMAN :

# rman__orafiles_copy
#Run this script when you want to backup (host copy) SPFILE and PWD file
HOST 'COPY C:\oracle\product\10.2.0\db_1\dbs\SPFILEORA.ORA C:\orabackup\ORA\';
HOST 'COPY C:\oracle\product\10.2.0\db_1\database\PWDORA.ORA C:\orabackup\ORA\';

Archive log forced and control files snapshot

At regular intervals, we force a redo log switch to generate an archive log file, and we back up the control files for sending a copy to the failover server. The following script, RMAN_BCK_CONTROLFILE.BAT, is scheduled every 15 minutes :

@echo off
set ORACLE_SID=ORA
del /Q /F c:\oradata\ORA\archives\control.back
sqlplus /nolog @switchlog.sql
rman log=rman_bck_controlfile.log target / cmdfile rman_bck_controlfile.rman

It lauches a sql script SWITCHLOG.SQL :

connect / as sysdba
alter system switch logfile;
exit

It launches also RMAN_BCK_CONTROLFILE.RMAN script :

# rman_bck_controlfile
# This script do a backup of controlfile
BACKUP AS COPY CURRENT CONTROLFILE FORMAT 'C:\oradata\ORA\Archives\control.back';

The copy of the control file is recorded in the “Archives” directory of the database, the same where the archive logs files are created, because all this files will be synchronized with the failover server.

Oracle 10g on the failover server

On this server, we copy all the command and Rman scripts from the production server. The daily backup script will be used if this server become the production server.

You can schedule a task for the backup. Just deactivates it (on the scheduled task properties, uncheck the “Enabled” box at the bottom of the first tab), you will enable it if necessary.

Moreover, we create a script to restore and recover the database if the production server is down. Here is the RMAN_RESTORE_RECOVERY.CMD command file :

@echo off
REM Restore control files
copy C:\oradata\ORA\Archives\control.back C:\oradata\ORA\control01.ctl
copy C:\oradata\ORA\Archives\control.back C:\oradata\ORA\control02.ctl
copy C:\oradata\ORA\Archives\control.back C:\oradata\ORA\control03.ctl
REM delete old redo logs
del /Q /F C:\oradata\ORA\REDO*.LOG
REM Start Windows service
net start OracleServiceORA
REM Restore and recover database
rman log=rman_restore_recovery.log @rman_restore_recovery.rman

It replaces the control files with the last snapshot received from the production server, then delete the redo logs (there will be re-created by the RESETLOGS option of the OPEN statement, later). Then we start the Oracle service.

At this point, the service try to start the database, but the datafiles are not up to date, and the start fails.

Now we mount, restore, recover and open (with RESETLOGS) the database with the Rman script RMAN_RESTORE_RECOVERY.RMAN :

# rman_restore_recovery
#Do a full database reco very
#we need all configuration files :
#spfile, tnsnames.ora, and listener.ora at right location
#Put the DBID on the following line :
SET DBID 3309712888;
CONNECT TARGET /;
STARTUP NOMOUNT;
RUN
{
  ALTER DATABASE MOUNT;
#  SET UNTIL TIME 'SYSDATE-3';
  RESTORE DATABASE CHECK READONLY;
  RECOVER DATABASE;
  ALTER DATABASE OPEN RESETLOGS;
}
#if an ORA-01152 error occurs, do that :
#SQL> recover database until cancel using backup controlfile;
#SQL> Alter database open resetlogs;

Note that we must put the DBID of the production database (which will be restored here) in this file. It's necessary because Oracle is not started up at the launch of Rman, so it can't find this DBID in memory.

To find it, do the following on the production server, on a command line :

C:\> set ORACLE_SID=ORA
C:\> RMAN TARGET /
Recovery Manager: Release 9.2.0.4.0 - 64bit Production

Copyright (c) 1995, 2002, Oracle Corporation.  All rights reserved.

connected to target database: ORA (DBID=3309712888)
using target database controlfile instead of recovery catalog

RMAN> exit
C:\> 

Copy the DBID and report it in the RMAN_RESTORE_RECOVERY.RMAN file.

CwRsync

Server side on the production server

Here, the installation of the server cwRsync must have created a “Rsync_server” Windows service. Check it. Note that a new user has been created for this service (see in the “Log On” tab), be sure to give a READ privilege on the two directories used by Rsync (backup and archive directory, see bellow).

Then go to %PROGRAMFILES%\cwrsync_server directory, and edit the rsyncd.conf file. In this file, we configure the shares which will be managed by the rsync service.

Modify the “[test]” sample block, it becomes a “[orabackup]” share pointing to the directory where are stored the daily backups, then create a “[Oraarchives]” share on the directory where are stored the archive log files and the snapshot copy of the control files :

use chroot = false
strict modes = false
hosts allow = *
log file = rsyncd.log
pid file = rsyncd.pid
# Module definitions
# Remember cygwin naming conventions : c:\work becomes /cygwin/c/work
#
[Orabackup]
path = /cygdrive/c/orabackup/ORA
read only = true
transfer logging = yes

[Oraarchives]
path = /cygdrive/c/oradata/ORA/Archives
read only = true
transfer logging = yes

Note the CygWin notation of the directory path, beginning with “cygdrive”, then the drive letter, then the full path, and the use of “/” instead of “\”.

Restart the rsyncd service in the management console to load the new configuration.

Client side on the failover server

On this server, we create two command scripts. Use a copy of the sample file provided with Rsync client, “cwrsync.cmd” in the directory %PROGRAMFILES%\cwRsync, because all the required variables are initialized at the beginning of the script.

We'll give here only the useful line added at the bottom of the file, note all the variables and comments that we leave unchanged.

The first script, CWRSYNC_ORABACKUP.CMD, synchronizes the backup every night from the production server, it is scheduled at 04:00 AM :

rsync -r TEST1::Orabackup /cygdrive/c/orabackup/ORA --delete
  • The “-r” option means that the remote subdirectories of the share will be synchronized too
  • then we give the rsync server host name and the share name
  • then the local directory (in cygwin notation) to which the files will be synchronized
  • finally, the “--delete” option tells rsync to delete local files if there are deleted on the remote host.

The second script, CWRSYNC_ORAARCHIVES.CMD, synchronizes the archive log files and the control files snapshot, it is scheduled every 15 minutes :

rsync -r TEST1::Oraarchives /cygdrive/c/oradata/ORA/Archives --delete

Steps to switch to the failover database

On normal operations, the production database on TEST1 is up, and the failover database on TEST2 is down.

If a crash occurs on TEST1 :

  • Be sure to completely stop TEST1 host to avoid conflicts (Host IP, Oracle SID...)
  • Launch RMAN_RESTORE_RECOVERY.CMD on TEST2 to recover and start the database
  • Change the IP address of the server, give it the IP of TEST1, so the clients will find it without changing their TNSNAME.ORA file

OR

  • Modify DNS, change the IP of “TEST1” entry to point to “TEST2”

OR

  • Use a DNS alias, just change the hostname behind the alias to point to “TEST2”
  • Disable the scheduled tasks on CWRSYNC_ORABACKUP.CMD and CWRSYNC_ORAARCHIVES.CMD in the tasks properties
  • Enable the backup (RMAN_BCK_FULL.BAT) scheduled task
  • Change the startup type of OracleServiceORA from “manual” to “automatic” in the service management console.

Steps to switch back to the production server

  • Re-install and configure the server, Oracle, Rsync server, and the command and Rman scripts on TEST1.
  • Stop the database on TEST2, actually in production. Back up it by copying all the database files (cold backup) to TEST1
  • Start the database on TEST1
  • Change the IP addresses (on TEST2 and TEST1) OR the DNS, so that the clients find the database on TEST1
  • On test2, change the startup type of OracleServiceORA from “automatic” to “manual”
  • Re-activate the Rsync tasks (Orabackup and Oraarchives)
  • De-activate the backup task