A very good MOS article about ASM rebalance

Posted by sundog315 on 2014-03-27

 

ORA-15041 IN A DISKGROUP ALTHOUGH FREE_MB REPORTS SUFFICIENT SPACE (Doc ID 460155.1)
--------------------------------------------------------------------------------

Modified: 2013-1-3    Type: BULLETIN
 

In this Document


 Purpose
 Scope
 Details
  Capacity of disks within ASM diskgroup are different
  ASM instance is shut down normal/immediate before a rebalance is completed
  Disk is in DROPPING / HUNG state
  After an add disk command the rebalance is still in place

--------------------------------------------------------------------------------


Applies to:
Oracle Server - Enterprise Edition - Version 10.1.0.2 to 11.2.0.4 [Release 10.1 to 11.2]
Information in this document applies to any platform.


***Checked for relevance on 02-Jan-2013***
Purpose
ORA-15041 is the most common space error reported when disk free space is not sufficient
to complete a file allocation request. Due to imbalanced space distribution, ORA-15041 can still
be encountered although the ASM views report sufficient free space. This article is intended to
help the reader understand the most common reasons for an ORA-15041 error when sufficient
free space is reported in the ASM views.
Error:  ORA-15041  (ORA-15041)
Text:   diskgroup space exhausted 
---------------------------------------------------------------------------
Cause:  The diskgroup ran out of space. 
Action: Add more disks to the diskgroup, or delete some existing files. 
Scope
This article is intended to help the reader understand the most common reasons
for ORA-15041 errors in ASM diskgroups.
Details
ASM spreads file extents evenly across all the disks in a diskgroup according to
capacity, free space within the diskgroup and its redundancy type. The total free megabytes in an
ASM diskgroup are reported in the FREE_MB column, but the maximum space that can actually be
allocated depends on the type of redundancy, as summarized below:
Redundancy type       Max. space
----------------      -------------------------
External              FREE_MB of diskgroup
Normal                1/2 FREE_MB of diskgroup
High                  1/3 FREE_MB of diskgroup

The V$ASM_DISKGROUP.FREE_MB column is simply the sum of the free megabytes reported for each
disk in V$ASM_DISK. USABLE_FILE_MB in V$ASM_DISKGROUP, on the other hand, indicates the amount
of free space, adjusted for mirroring, that is available for new allocations.
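
The redundancy table above amounts to a simple division, which the following minimal Python sketch works through. The function name `max_file_mb` and the `COPIES` mapping are hypothetical names chosen for illustration; they are not part of any Oracle API. The sketch follows only the simplified rule in the table, so the real USABLE_FILE_MB value can be lower because it may also reserve space for re-mirroring.

```python
# Simplified rule from the table: usable space = FREE_MB / number of copies.
# Assumption: no space is held back for re-mirroring after a disk failure.
COPIES = {"EXTERNAL": 1, "NORMAL": 2, "HIGH": 3}


def max_file_mb(free_mb: float, redundancy: str) -> float:
    """Upper bound on new file data (in MB) for a diskgroup, per the table."""
    return free_mb / COPIES[redundancy.upper()]


if __name__ == "__main__":
    # 781 is the diskgroup FREE_MB value used in an example later in this note.
    for kind in ("EXTERNAL", "NORMAL", "HIGH"):
        print(kind, max_file_mb(781, kind))
```

For a normal redundancy diskgroup reporting 781 MB free, this gives roughly 390 MB, which matches the "approx. 350-400M" estimate used in the rebalance example later in this note.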
Although the FREE_MB and USABLE_FILE_MB columns report sufficient free space, an ORA-15041 error
can still be encountered due to imbalanced free space between disks. The reason is
that one disk lacking sufficient free space makes it impossible to do any allocation in the
diskgroup, because every file must be evenly allocated across all disks per the ASM striping
policy.

An unbalanced disk configuration and certain operations on ASM disks can create this type of
problem. The problem is usually resolved by a successful rebalance, as long
as all disks have the same storage capacity and there are no underlying hardware problems.
The most common reasons can be classified as follows:

1- Capacity of disks within ASM diskgroup are different
2- ASM instance is shut down normal/immediate before a rebalance is completed
3- Disk is in DROPPING / HUNG state
4- After an add disk command the rebalance is still in place
   
   
   
   
Capacity of disks within ASM diskgroup are different
When an ASM diskgroup contains disks of different capacity, one disk lacking free
space makes it impossible to do any allocation on the other disks as well.
This is expected behaviour because every file must be evenly allocated across
all disks. Rebalancing and allocation attempt to make the percentage of allocated
space about the same on every disk. File allocation may fail with ORA-15041
in case of imbalanced space distribution.

Extents are allocated evenly across the disks according to the capacity
(TOTAL_MB) of the disks. If all ASM disks are of the same size (e.g. 10 disks, 50GB
each), the ASM allocator places extents on each disk in a sequence. The first disk
for allocation is chosen randomly, but all subsequent disks for extent allocation
are chosen to evenly spread each file across all disks and to evenly fill all
disks.
   
On the other hand, if ASM disks are not of the same size (e.g. disk 1 is 10GB,
disk 2 is 50GB and disks 3-10 are 10GB), ASM allocator will place one extent on
disk 1, five extents on disk 2, one extent on disk 3 and so on. This is to
ensure balanced disk utilization.
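
The 1:5:1 pattern described above is just the disk capacities reduced to their smallest whole-number ratio. The following Python sketch illustrates that arithmetic only; `extents_per_round` is a hypothetical helper name, and the code makes no claim about how the ASM allocator is actually implemented.

```python
from functools import reduce
from math import gcd


def extents_per_round(total_sizes):
    """Smallest whole-number extent counts proportional to disk capacity.

    total_sizes: list of integer disk capacities in one common unit.
    """
    divisor = reduce(gcd, total_sizes)
    return [size // divisor for size in total_sizes]


if __name__ == "__main__":
    # Disk 1 is 10GB, disk 2 is 50GB and disks 3-10 are 10GB (sizes in GB).
    sizes_gb = [10, 50] + [10] * 8
    print(extents_per_round(sizes_gb))
```

With equally sized disks the same function returns one extent per disk, which corresponds to the round-robin placement described for the 10 x 50GB case.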

Extent allocation also differs with the type of redundancy. If redundancy is
NORMAL/HIGH, no allocation is possible when free space on any of the disks
is not sufficient for the requested allocation size. On the other hand, in
an external redundancy diskgroup, ASM distributes the extents evenly across
the disks according to the capacity (TOTAL_MB) of the disks, and the allocation
continues as long as at least two disks have enough space to complete
the allocation.

The following test demonstrates the space allocation behaviour according
to redundancy type when the diskgroups contain disks of different capacity.

DG_EXT: External redundancy diskgroup with 3 disks: 447M, 447M and 70M.
DG_NOR: Normal redundancy diskgroup with 3 disks: 447M, 447M and 70M.

ASM/Database instance version is 10.2.0.3.
To test file allocation on the different redundancy types
(external/normal), files of different sizes are created on the diskgroups.
To leave the same amount of free space after each file creation,
the files created on the external redundancy diskgroup are twice the size
of the files created on the normal redundancy diskgroup. The
following table shows the free space after each file creation and the
point where the ORA-15041 error is encountered on each type of diskgroup.

FREE_MB(0): Initial free space on each disk.
FREE_MB(1): After 200M/100M files are created on the external/normal redundancy
diskgroup.
FREE_MB(2): After 100M/50M files are created on the external/normal redundancy
diskgroup.
FREE_MB(3): After a 500M file is created on the external redundancy diskgroup only.

NAME              TOTAL_MB FREE_MB(0)  FREE_MB(1)  FREE_MB(2) FREE_MB(3)
                                       +200M/+100M +100M/+50M  +500M/-
------------    ---------- ---------- ------------ ---------- ----------
DG_EXT_0001            447        423          330        283         50 **
DG_EXT_0000            447        421          326        279         48 **
DG_EXT_0002             70         66           51         43          8 **
DG_NOR_0001            447        395          301    ORA-15041 *      X
DG_NOR_0000            447        395          301    ORA-15041 *      X
DG_NOR_0002             70         18            1    ORA-15041 *      X

* File allocation (50MB) on the normal redundancy diskgroup fails with ORA-15041
when there is no more than 50M on a single disk. This is mainly because the normal
redundancy diskgroup can't allocate the primary/secondary extents of the file on
separate disks due to insufficient space.

** On the other hand, file allocation (100M + 500M) succeeds for the external
redundancy diskgroup, as this type of redundancy only stripes data over the
available free space on all disks. Further tests show that file creation in
external redundancy is possible as long as there is some space on at least
two disks in the diskgroup.

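
The outcome of the test above can be approximated with a first-order model: assume a new file consumes space on each disk strictly in proportion to that disk's TOTAL_MB, and check whether every disk can hold its share. This is a simplified sketch, not ASM's actual allocator; `fits_proportionally` is a hypothetical name, and, as noted above, an external redundancy diskgroup is in practice more lenient than this model.

```python
def fits_proportionally(file_mb, copies, disks):
    """Return True if every disk can hold its capacity-proportional share.

    disks:  list of (total_mb, free_mb) tuples.
    copies: 1 for external, 2 for normal, 3 for high redundancy.
    Assumption: space is consumed strictly in proportion to TOTAL_MB.
    """
    group_total = sum(total for total, _ in disks)
    needed = file_mb * copies
    return all(needed * total / group_total <= free for total, free in disks)


if __name__ == "__main__":
    # FREE_MB(1) column of DG_NOR, then a 50M file stored as two copies.
    dg_nor = [(447, 301), (447, 301), (70, 1)]
    print("DG_NOR 50M:", fits_proportionally(50, 2, dg_nor))

    # FREE_MB(2) column of DG_EXT, then a 500M file stored as one copy.
    dg_ext = [(447, 283), (447, 279), (70, 43)]
    print("DG_EXT 500M:", fits_proportionally(500, 1, dg_ext))
```

Under this model the 50M file on DG_NOR does not fit because the 70M disk has only 1 MB free, while the 500M file on DG_EXT fits on all three disks, which agrees with the table.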
ASM instance is shut down normal/immediate before a rebalance is completed
A rebalance is stopped if the ASM instance is shut down, and it is expected
that the rebalance resumes after the instance is restarted. However, due to
a known issue (Unpublished Bug 5089819), if the ASM instance is shut down with the
normal/immediate option, the rebalance does not start again upon the next startup.
The ASM instance must either be shut down with the abort option, or the rebalance
must be restarted manually.

NAME                           GROUP_NUMBER   TOTAL_MB    FREE_MB
------------------------------ ------------ ---------- ----------
DG_NORMAL                                 2        894        386
NAME                           GROUP_NUMBER   TOTAL_MB    FREE_MB
------------------------------ ------------ ---------- ----------
DG_NORMAL_0000                            2        447        193
DG_NORMAL_0001                            2        447        193

     
A new disk is being added:

alter diskgroup dg_normal add disk '/dev/hdb10';
   

select * from v$asm_operation;

GROUP_NUMBER OPERA STAT      POWER      SOFAR   EST_WORK   EST_RATE
------------ ----- ---- ---------- ---------- ---------- ----------
 2           REBAL RUN           1          75       194        181

The ASM instance is shut down:

SQL> shutdown immediate
ASM diskgroups dismounted
ASM instance shutdown
SQL>


Upon startup, no rebalance operation is running and the free
space on the ASM disks is not balanced.

NAME                           GROUP_NUMBER   TOTAL_MB    FREE_MB
------------------------------ ------------ ---------- ----------
DG_NORMAL                                 2       1341        781

NAME                           GROUP_NUMBER   TOTAL_MB    FREE_MB
------------------------------ ------------ ---------- ----------
DG_NORMAL_0000                            2        447        240
DG_NORMAL_0001                            2        447        239
DG_NORMAL_0002                            2        447        302

Normally, we should be able to create a file of approx. 350-400M, as
the ASM diskgroup reports sufficient space.

SQL> alter tablespace test1 add datafile '+DG_NORMAL' size 370m;
*
ERROR at line 1:
ORA-01119: error in creating database file '+DG_NORMAL'
ORA-17502: ksfdcre:4 Failed to create file +DG_NORMAL
ORA-15041: diskgroup space exhausted

SQL> alter diskgroup dg_normal rebalance power 11;
SQL> alter tablespace test1 add datafile '+DG_NORMAL' size 370m;

Tablespace altered

A new rebalance remedies this situation. The diskgroup has the following free
space figures after the 370MB file is created.

NAME                           GROUP_NUMBER   TOTAL_MB    FREE_MB
------------------------------ ------------ ---------- ----------
DG_NORMAL                                 2       1341         70

NAME                           GROUP_NUMBER   TOTAL_MB    FREE_MB
------------------------------ ------------ ---------- ----------
DG_NORMAL_0000                            2        447         24
DG_NORMAL_0001                            2        447         22
DG_NORMAL_0002                            2        447         24

Disk is in DROPPING / HUNG state
Free space in an ASM diskgroup can become imbalanced if a drop disk operation fails for any
reason (lack of space, disk crash, etc.). Disks may be stuck in the DROPPING
state in this case.

The most common reasons for the DROPPING state are that a careless drop disk command
is submitted on a diskgroup running at full capacity, or that dropping the disk reduces
the amount of available disk space to less than that required for all the existing
extents. After the drop disk command, a rebalance is triggered and completes, but
disks remain in the DROPPING state.

It is no longer possible to allocate space from the diskgroup, and no free space is
reported in v$asm_diskgroup either. To resolve the problem, you can either add more
disks to provide extra space or undrop the disk to roll back the drop.

- Add more disks
Adding more disks implicitly starts a rebalance and provides extra space for the rebalance
to complete. Once the data is copied off the dropping disks, they are expelled from the diskgroup.

alter diskgroup <diskgroup_name> add disk 'path';

- Undrop the disk
When an undrop command is issued, it simply rolls back the drop. If the drop
has not gone too far, ASM is able to re-integrate the disks into
the diskgroup. UNDROP DISKS implicitly triggers a rebalance, which rolls back the drop
and makes the space available to the diskgroup again. Space should be balanced between disks
once the command completes.

alter diskgroup <diskgroup_name> undrop disks;

While the disks are running near capacity, imagine that a drop disk brings the
disk state to HUNG. The drop cannot complete due to lack of space, because the
current extents do not fit on the remaining disks.

NAME                             TOTAL_MB    FREE_MB STATE
------------------------------ ---------- ---------- --------
DG2_0001                              447         97 NORMAL
DG2_0000                              447         90 NORMAL
DG2_0002                              447         97 NORMAL


NAME                             TOTAL_MB    FREE_MB TYPE
------------------------------ ---------- ---------- ------
DG2                                  1341        284 EXTERN

alter diskgroup DG2 drop disk DG2_0002;

While the rebalance is running, the disk state stays at DROPPING, but it changes to
HUNG after the rebalance is completed.

NAME                             TOTAL_MB    FREE_MB STATE
------------------------------ ---------- ---------- --------
DG2_0001                              447          7 NORMAL
DG2_0000                              447          6 NORMAL
DG2_0002                              447        271 DROPPING

 


After the rebalance is complete, the disk state is HUNG because the disk can't be expelled
from the diskgroup.

NAME                             TOTAL_MB    FREE_MB STATE
------------------------------ ---------- ---------- --------
DG2_0001                              447          0 NORMAL
DG2_0000                              447          0 NORMAL
DG2_0002                              447        284 HUNG
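
The HUNG outcome above can be anticipated with a quick capacity check before issuing the drop: the data currently stored in the diskgroup must fit on the disks that remain. The Python sketch below uses the DG2 figures from before the drop; `drop_can_complete` is a hypothetical helper, and the check assumes an external redundancy diskgroup, since failure group placement for normal/high redundancy is not modelled.

```python
def drop_can_complete(disks, drop_name):
    """Return True if the remaining disks can hold all data in the diskgroup.

    disks: dict mapping disk name -> (total_mb, free_mb).
    Assumption: external redundancy, so only raw capacity matters.
    """
    used = sum(total - free for total, free in disks.values())
    remaining = sum(
        total for name, (total, _) in disks.items() if name != drop_name
    )
    return used <= remaining


if __name__ == "__main__":
    # TOTAL_MB / FREE_MB of DG2 before the drop disk command.
    dg2 = {
        "DG2_0000": (447, 90),
        "DG2_0001": (447, 97),
        "DG2_0002": (447, 97),
    }
    print(drop_can_complete(dg2, "DG2_0002"))
```

Here 1057 MB are in use while the two remaining disks offer only 894 MB, so the check reports that the drop cannot complete.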


alter diskgroup DG2 undrop disks;

Undrop disk triggers a new rebalance implicitly and resolves the problem.
This state can also be resolved with ADD DISK by providing extra space for
the rebalance to complete. Once the data is copied off the dropping disks,
they are expelled from the diskgroup.

After an add disk command the rebalance is still in place
   
When a disk is added to a disk group its space is not immediately available
for allocation. Since every file must be evenly allocated, extents must be
rebalanced off other disks to the new disk to make space evenly available.
   

Free space will be available in the course of time while rebalance is
progressing. Since rebalance takes a while, users may not be able to allocate
files and could get out of space errors (ORA-15041).

 

  
The following test shows how the free space on each disk changes while the rebalance is
running.

NAME                           GROUP_NUMBER   TOTAL_MB    FREE_MB
------------------------------ ------------ ---------- ----------
TEST_DG                                   2        894         48

PATH                 GROUP_NUMBER   TOTAL_MB    FREE_MB
-------------------- ------------ ---------- ----------
/dev/hdb8                       2        447         24
/dev/hdb9                       2        447         24
     
     
Two more 447MB disks are added to the diskgroup.

alter diskgroup test_dg add disk '/dev/hdb10','/dev/hdb11';

Free space is reported immediately in v$asm_diskgroup; however, it is imbalanced
because the rebalance is not completed yet.

NAME          TOTAL_MB    FREE_MB(1) FREE_MB(2) ... FREE_MB(3)
------------- ---------- ---------- ----------      ----------
TEST_DG         1788            888        887             885

PATH             TOTAL_MB FREE_MB(1) FREE_MB(2) ... FREE_MB(3)
-------------- ---------- ---------- ----------     ----------
/dev/hdb8             447         36        144            221
/dev/hdb9             447         36        144            220
/dev/hdb10            447        409        300            223
/dev/hdb11            447        407        299            221

Free space becomes balanced as the rebalance progresses. Until the
rebalance completes, large file allocations may fail with ORA-15041 errors
although free space is reported in v$asm_diskgroup.
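
One way to read the three snapshots above is to track how far apart the fullest and emptiest disks are, in percentage points of used space. The following Python sketch computes that spread for each FREE_MB column; `used_pct_spread` is a hypothetical metric introduced here for illustration and is not something ASM reports.

```python
def used_pct_spread(disks):
    """Gap between the fullest and emptiest disk, in percentage points.

    disks: list of (total_mb, free_mb) tuples.
    """
    used = [100.0 * (total - free) / total for total, free in disks]
    return max(used) - min(used)


if __name__ == "__main__":
    # FREE_MB per disk (/dev/hdb8, hdb9, hdb10, hdb11), all with TOTAL_MB = 447.
    snapshots = {
        "FREE_MB(1)": [36, 36, 409, 407],
        "FREE_MB(2)": [144, 144, 300, 299],
        "FREE_MB(3)": [221, 220, 223, 221],
    }
    for label, free in snapshots.items():
        spread = used_pct_spread([(447, f) for f in free])
        print(f"{label}: {spread:.1f} points")
```

The spread shrinks from snapshot to snapshot and ends below one percentage point, which is the state in which large allocations stop failing on the older, fuller disks.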

As a workaround, WAIT option can be used with add disk command.  When the WAIT
option is used, add disk command doesn't return until rebalance is complete.
This may provide more intuitive results when running disks with near to full
capacity.



From the "ITPUB Blog", link: http://blog.itpub.net/19423/viewspace-1130911/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
