Resize diskgroup without downtime in Exadata [ID 1272569.1]

Posted by ruanrong on 2011-08-22

This note provides a smooth and repeatable procedure to resize Exadata disk groups without re-installation or a system outage.


CAUTION 1: DO NOT TRY TO RESIZE THE DISKS WHEN A DISK GROUP’S SPACE IS EXHAUSTED. You need a minimum free space of “required_mirror_free_mb” for these methods to work. Use the following SQL to check the current space usage.


SQL> select name, total_mb, free_mb, required_mirror_free_mb from v$asm_diskgroup;

Note: The “free_mb” value returned by the query must be greater than “required_mirror_free_mb”.


CAUTION 2: BEFORE YOU START RESIZING DISKS, TAKE A CLEAN BACKUP OF THE DATABASE.


CAUTION 3: CHECK THE SPACE USAGE ON EACH DISK. IF FOR ANY REASON ONE OR MORE OF THE MOUNTED DISKS SHOWS FREE SPACE CLOSE TO 0 (ZERO) MB, REBALANCING MAY FAIL.


SQL query to check per-disk space:
SQL> select name,total_mb,free_mb from v$asm_disk where mount_status='CACHED';
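A variant of the same query (a sketch, not part of the original note) sorts the mounted disks by remaining free space, which makes near-zero disks easy to spot:

SQL> select name, total_mb, free_mb
     from v$asm_disk
     where mount_status = 'CACHED'
     order by free_mb;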

Resizing disk groups on the Database Machine after installation and data loading is a big challenge for customers. This note addresses that challenge by providing different methods to resize the disk groups without downtime while avoiding interleaving of disk partitions.

There are three methods by which you can achieve this; all three are discussed below.

The most time-consuming step is ASM rebalancing. The methods below optimize rebalancing to reduce the total time for the entire operation.

Method-1:
This is the best method. The number of rebalance operations equals the number of cells plus one, multiplied by the number of disk groups (rebalance ops = (#cells + 1) x #DGs). Using this method, redundancy is maintained at all times. However, it needs at least TWICE the free space of “required_mirror_free_mb”.

Method-2:
The number of rebalance operations equals the number of cells multiplied by the number of disk groups (rebalance ops = #cells x #DGs). Using this method, redundancy is not maintained during a small window of the rebalance. The minimum free space required is equal to “required_mirror_free_mb”.

Method-3:
EASY and SIMPLE. The number of rebalance operations equals twice the number of cells, multiplied by the number of disk groups (rebalance ops = 2 x #cells x #DGs). Using this method, redundancy is maintained at all times, but the time spent rebalancing is long because of the high number of rebalance operations. The minimum free space required is equal to “required_mirror_free_mb”.
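As a worked example, for the half-rack configuration used below (7 cells, 2 disk groups), the formulas give: Method-1 = (7 + 1) x 2 = 16 rebalances, Method-2 = 7 x 2 = 14 rebalances, and Method-3 = 2 x 7 x 2 = 28 rebalances.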

--- Document History ---

Author: Amit Das, Philip Newlan, Aiman Al-Khammash, Su Tang, Mike Pettigrew

Create Date 08-DEC-2010
Update Date dd-mon-yyyy
Expire Date dd-mon-yyyy (ignore after this date)

--- Resize diskgroup without downtime in Exadata ---


These steps apply to an Exadata configuration of any size.
I am showing the steps for a half-rack Database Machine configuration with the following compute and cell node names:
Compute Nodes: exacomp1, exacomp2, exacomp3, and exacomp4
Cell Nodes: exacell1, exacell2, exacell3, … and exacell7
Diskgroup Names: DATA and RECO.

Method-1

Step 1: Check CAUTION 1, CAUTION 2, and CAUTION 3. Also check the free space of the diskgroups; this method needs TWICE the free space of “required_mirror_free_mb”.
If you qualify, then proceed with this method.
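As a quick check, a query along these lines (a sketch, not part of the original note) flags each disk group against Method-1’s requirement of free space greater than twice “required_mirror_free_mb”:

SQL> -- method1_check is an illustrative alias, not an ASM column
SQL> select name, free_mb, required_mirror_free_mb,
            case when free_mb > 2 * required_mirror_free_mb
                 then 'OK for Method-1'
                 else 'NOT enough free space'
            end as method1_check
     from v$asm_diskgroup;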

Step 2: Take a complete backup of DATABASE.

Step 3: Log into the ASM instance as SYSASM on the first compute node (e.g. exacomp1) and check the status in GV$ASM_OPERATION. A SELECT * on this view must return ZERO rows before you proceed to the next step.
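For example, a simple count of in-flight operations (a sketch) is enough:

SQL> select count(*) from gv$asm_operation;

A result of 0 means no rebalance or resync is running and you can continue.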

Step 4: Drop the disks in the failgroup of the first cell (e.g. EXACELL1) from the DATA diskgroup using compute node-1. In Exadata, each cell is represented as an individual failgroup.


SQL> alter diskgroup DATA drop disks in failgroup EXACELL1 rebalance power 11 NOWAIT;

Note: The default power is 4, and max is 11. Depending on the database load, you can choose the appropriate power to avoid impacting your business processes.
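If the chosen power turns out to be too aggressive while a rebalance is running, it can be changed on the fly; a sketch (the value 4 here is only illustrative):

SQL> alter diskgroup DATA rebalance power 4;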

Step 5: Drop the disks in the failgroup of the first cell (e.g. EXACELL1) from the RECO diskgroup using compute node-2. Log into the ASM instance as SYSASM on the second compute node (e.g. exacomp2).


SQL> alter diskgroup RECO drop disks in failgroup EXACELL1 rebalance power 11 NOWAIT;

Note: We use the second compute node so that the rebalance for the RECO diskgroup runs in parallel. If we ran Step 5 from the first compute node, the RECO rebalance would be serialized after the rebalance of the diskgroup DATA.

Step 6: Wait until the REBALANCING of all the diskgroups finishes. To monitor the progress of the rebalance, query “GV$ASM_OPERATION” from the ASM instance.

SQL> select * from gv$asm_operation;
You will see output similar to the following while the rebalance is in progress:

INST_ID GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
------- ------------ ----- ---- ----- ------ ----- -------- -------- ----------- ----------
      1            3 REBAL RUN     11     11 13201    39425     6405           4


When rebalance finishes, this view will return ZERO records.

Before proceeding to the next step, please double check the DROPPED disk status. Use the following query:

SQL> column mount_status format A15
SQL> column name format A35
SQL> column path format A55
SQL> set lines 200
SQL> set pages 100
SQL> select mount_status,name,failgroup,path from v$asm_disk where FAILGROUP='EXACELL1';

Note: MOUNT_STATUS should not be “CACHED” for DROPPED Disks.
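A quick pass/fail version of the same check (a sketch): a count of 0 confirms that no disk of the dropped failgroup is still CACHED.

SQL> select count(*)
     from v$asm_disk
     where failgroup = 'EXACELL1' and mount_status = 'CACHED';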

Step 7: Drop the griddisks from the relevant cell (e.g. EXACELL1).
Go to the correct cell and run the following CellCLI commands.
#Drop griddisk

cellcli> drop griddisk all prefix='RECO' force;
cellcli> drop griddisk all prefix='DATA' force;

#To verify the griddisk status, run:
cellcli> list griddisk

Step 8: Recreate the griddisks on the relevant cell (e.g. EXACELL1).
#Create the griddisks with the correct size in gigabytes using CellCLI on the relevant cell.

cellcli> CREATE GRIDDISK ALL PREFIX='DATA', size=inGigaByte; //like 300G//
cellcli> CREATE GRIDDISK ALL PREFIX='RECO', size=inGigaByte;

#Check the griddisk details to verify they meet your requirements.

cellcli> list griddisk DATA_CD_01_exacell1 detail
cellcli> list griddisk RECO_CD_01_exacell1 detail

Step 9: This is the most important step for reducing the rebalance count. Follow the process below exactly for the DATA diskgroup.
• Add the griddisks of the relevant cell back to the DATA diskgroup and, in the same statement, drop the disks of the next cell (see the sample command below).

#A sample command for the DATA diskgroup is below; build your own command with the correct IP address and cell name.

SQL> alter diskgroup DATA add disk
'o/192.168.62.9/DATA_CD_03_exacell1', 'o/192.168.62.9/DATA_CD_08_exacell1',
'o/192.168.62.9/DATA_CD_04_exacell1', 'o/192.168.62.9/DATA_CD_01_exacell1',
'o/192.168.62.9/DATA_CD_10_exacell1', 'o/192.168.62.9/DATA_CD_07_exacell1',
'o/192.168.62.9/DATA_CD_11_exacell1', 'o/192.168.62.9/DATA_CD_00_exacell1',
'o/192.168.62.9/DATA_CD_09_exacell1', 'o/192.168.62.9/DATA_CD_06_exacell1',
'o/192.168.62.9/DATA_CD_02_exacell1', 'o/192.168.62.9/DATA_CD_05_exacell1' drop disks in failgroup EXACELL2 rebalance power 11 NOWAIT;
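Rather than typing the twelve paths by hand, a query like the following (a sketch; the IP address and cell name are from this example configuration) lists the freshly created griddisks that ASM can see, ready to paste into the ADD DISK clause:

SQL> select path
     from v$asm_disk
     where header_status in ('CANDIDATE', 'PROVISIONED')
     and path like 'o/192.168.62.9/DATA%';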

Step 10: Do exactly the same thing as Step 9 for the RECO diskgroup from the second compute node, so that the ASM rebalance for RECO runs in parallel.
• Add the griddisks of the relevant cell (e.g. EXACELL1) back to the RECO diskgroup and drop the disks of the next cell (e.g. EXACELL2) from the RECO diskgroup with the appropriate rebalance power.
#Build a command for the RECO diskgroup similar to the one in Step 9.

Step 11: Wait until the rebalance of all the diskgroups finishes. Use the query from Step 6 to monitor.

Step 12: Repeat Steps 6 through 11 for the remaining cells.


Method-2

Step 1: Check CAUTION 1, CAUTION 2, and CAUTION 3. Also check the free space of the diskgroups; this method needs free space EQUAL to “required_mirror_free_mb”.
If you qualify, then proceed with this method.

Step 2: Take a complete backup of DATABASE.

Step 3: Log into the ASM instance as SYSASM on the first compute node (e.g. exacomp1) and check the status in GV$ASM_OPERATION. A SELECT * on this view must return ZERO rows before you proceed to the next step.

Step 4: Drop the disks in the failgroup of the first cell (e.g. EXACELL1) from the DATA diskgroup using compute node-1. In Exadata, each cell is represented as an individual failgroup.

Run the drop failgroup commands from the SQL prompt of ASM:

SQL> alter diskgroup DATA drop disks in failgroup EXACELL1 force rebalance power 0;
SQL> alter diskgroup RECO drop disks in failgroup EXACELL1 force rebalance power 0;

Note: I am using the FORCE option with rebalance power 0 because I do not want a rebalance to run while the DROP disk command executes.
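Before touching the cell, a check like the following (a sketch, mirroring the verification query used elsewhere in this note) confirms that the drop has been recorded; the cell’s disks should no longer show a MOUNT_STATUS of CACHED:

SQL> select name, mount_status, header_status
     from v$asm_disk
     where failgroup = 'EXACELL1';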

Step 5: Drop the griddisks from the relevant cell (e.g. EXACELL1).
Go to the correct cell; for this example, I will log into CellCLI on exacell1.
#Drop the griddisks using CellCLI commands on the cell

cellcli> drop griddisk all prefix='RECO' force;
cellcli> drop griddisk all prefix='DATA' force;

#To verify the griddisk status, run:
cellcli> list griddisk

Step 6: Recreate the griddisks on the relevant cell (e.g. EXACELL1).

#Create the griddisks with the correct size using CellCLI

cellcli> CREATE GRIDDISK ALL PREFIX='DATA', size=inGigaByte; //like 300G//
cellcli> CREATE GRIDDISK ALL PREFIX='RECO', size=inGigaByte;

#Check the griddisk details to verify they meet your requirements.

cellcli> list griddisk DATA_CD_01_exacell1 detail
cellcli> list griddisk RECO_CD_01_exacell1 detail


Step 7: Add the disks back to the DATA diskgroup from compute node-1.

Add the griddisks of the relevant cell back to the DATA diskgroup with rebalance power 11.
#A sample command for the DATA diskgroup is below; build your own command with the correct IP address and cell name.

SQL> alter diskgroup DATA add disk
'o/192.168.62.9/DATA_CD_03_exacell1', 'o/192.168.62.9/DATA_CD_08_exacell1',
'o/192.168.62.9/DATA_CD_04_exacell1', 'o/192.168.62.9/DATA_CD_01_exacell1',
'o/192.168.62.9/DATA_CD_10_exacell1', 'o/192.168.62.9/DATA_CD_07_exacell1',
'o/192.168.62.9/DATA_CD_11_exacell1', 'o/192.168.62.9/DATA_CD_00_exacell1',
'o/192.168.62.9/DATA_CD_09_exacell1', 'o/192.168.62.9/DATA_CD_06_exacell1',
'o/192.168.62.9/DATA_CD_02_exacell1', 'o/192.168.62.9/DATA_CD_05_exacell1'
rebalance power 11 NOWAIT;

Step 8: Do exactly the same thing as Step 7 for the RECO diskgroup from the second compute node to run the ASM rebalance in parallel.

Note: We use the second compute node so that the rebalance for the RECO diskgroup runs in parallel. If we ran Step 8 from the first node, the RECO rebalance would be serialized after the rebalance of the diskgroup DATA.

Step 9: Wait until the rebalance of all the diskgroups finishes. To monitor the progress of the rebalance, query “GV$ASM_OPERATION”:

SQL> select * from gv$asm_operation;

You will see output similar to the following while the rebalance is in progress:

INST_ID GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
------- ------------ ----- ---- ----- ------ ----- -------- -------- ----------- ----------
      1            3 REBAL RUN     11     11 13201    39425     6405           4

When rebalance finishes, this view will return ZERO records.

Before proceeding to the next step, please double check the DROPPED disk status. Use the following query:

SQL> column mount_status format A15
SQL> column name format A35
SQL> column path format A55
SQL> set lines 200
SQL> set pages 100
SQL> select mount_status,name,failgroup,path from v$asm_disk where FAILGROUP='EXACELL1';
Note: MOUNT_STATUS should not be “CACHED” for DROPPED disks.

Step 10: Repeat Steps 4 through 9 for the remaining cells.

Method-3

Step 1: Check CAUTION 1, CAUTION 2, and CAUTION 3. Also check the free space of the diskgroups; this method needs free space EQUAL to “required_mirror_free_mb”.
If you qualify, then proceed with this method.

Step 2: Take a complete backup of DATABASE.

Step 3: Log into the ASM instance as SYSASM on the first compute node (e.g. exacomp1) and check the status in GV$ASM_OPERATION. A SELECT * on this view must return ZERO rows before you proceed to the next step.

Step 4: Drop the disks in the failgroup of the first cell (e.g. EXACELL1) from the DATA diskgroup using compute node-1. In Exadata, each cell is represented as an individual failgroup.

SQL> alter diskgroup DATA drop disks in failgroup EXACELL1 rebalance power 11 NOWAIT;

Note: The default power is 4, and max is 11. Depending on the database load, you can choose the appropriate power to avoid impacting your business processes.

Step 5: Drop the disks in the failgroup of the first cell (e.g. EXACELL1) from the RECO diskgroup using compute node-2. Log into the ASM instance as SYSASM on the second compute node (e.g. exacomp2).

SQL> alter diskgroup RECO drop disks in failgroup EXACELL1 rebalance power 11 NOWAIT;

Note: We use the second compute node so that the rebalance for the RECO diskgroup runs in parallel. If we ran Step 5 from the first node, the RECO rebalance would be serialized after the rebalance of the diskgroup DATA.

Step 6: Wait until the rebalance of all the diskgroups finishes. To monitor the progress of the rebalance, query “GV$ASM_OPERATION”:

SQL> select * from gv$asm_operation;

You will see output similar to the following while the rebalance is in progress:

INST_ID GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
------- ------------ ----- ---- ----- ------ ----- -------- -------- ----------- ----------
      1            3 REBAL RUN     11     11 13201    39425     6405           4
When rebalance finishes, this view will return ZERO records.

Before proceeding to the next step, please double check the DROPPED disk status. Use the following query:

SQL> column mount_status format A15
SQL> column name format A35
SQL> column path format A55
SQL> set lines 200
SQL> set pages 100
SQL> select mount_status,name,failgroup,path from v$asm_disk where FAILGROUP='EXACELL1';

Note: MOUNT_STATUS should not be “CACHED” for DROPPED disks.

Step 7: Drop the griddisks from the relevant cell (e.g. EXACELL1).
Go to the correct cell; for this example, I will log into CellCLI on exacell1.

#Drop griddisk
cellcli> drop griddisk all prefix='RECO' force;
cellcli> drop griddisk all prefix='DATA' force;
#To verify the griddisk status, run:
cellcli> list griddisk

Step 8: Recreate the griddisks on the relevant cell (e.g. EXACELL1).

#Create the griddisks with the correct size
cellcli> CREATE GRIDDISK ALL PREFIX='DATA', size=inGigaByte; //like 300G//
cellcli> CREATE GRIDDISK ALL PREFIX='RECO', size=inGigaByte;

#Check the griddisk details to verify they meet your requirements.

cellcli> list griddisk DATA_CD_01_exacell1 detail
cellcli> list griddisk RECO_CD_01_exacell1 detail

Step 9: Add the disks back to the DATA diskgroup from compute node-1. Add the griddisks of the relevant cell back to the DATA diskgroup with rebalance power 11.
#A sample command for the DATA diskgroup is below; build your own command with the correct IP address and cell name.

SQL> alter diskgroup DATA add disk
'o/192.168.62.9/DATA_CD_03_exacell1', 'o/192.168.62.9/DATA_CD_08_exacell1',
'o/192.168.62.9/DATA_CD_04_exacell1', 'o/192.168.62.9/DATA_CD_01_exacell1',
'o/192.168.62.9/DATA_CD_10_exacell1', 'o/192.168.62.9/DATA_CD_07_exacell1',
'o/192.168.62.9/DATA_CD_11_exacell1', 'o/192.168.62.9/DATA_CD_00_exacell1',
'o/192.168.62.9/DATA_CD_09_exacell1', 'o/192.168.62.9/DATA_CD_06_exacell1',
'o/192.168.62.9/DATA_CD_02_exacell1', 'o/192.168.62.9/DATA_CD_05_exacell1'
rebalance power 11 NOWAIT;

Step 10: Do exactly the same thing as Step 9 for the RECO diskgroup from the second compute node to run the ASM rebalance in parallel.

Step 11: Repeat Steps 4 through 10 serially for the remaining cells.

--- Summary ---

This procedure provides a simple way to resize all of your diskgroups with no system downtime.
The rebalance steps are time-consuming, and the entire process may take 2 days for a full-rack Database Machine. The total time depends on the number of storage cells, the existing disk usage, and the load on the system.

