A Peek Inside Oracle ASM

Luca Canali, CERN, Jan 2006.

Overview

Oracle ASM (Automatic Storage Management), available with Oracle Database 10g, provides a solution to streamline storage management and provisioning. ASM provides volume and (cluster) filesystem management where the I/O subsystem is handled directly by the Oracle kernel [Ref 1,2]. With Oracle 10g and ASM it is possible to build a scalable and highly available storage infrastructure on low-cost hardware [Ref 3]. A typical example uses SATA disks with Fibre Channel controllers arranged in a SAN. A scalable architecture of this kind, combining ASM and Oracle 10g RAC on Linux on low-cost hardware, is deployed at CERN.

Oracle documentation and whitepapers [Ref 1-5] provide the necessary information to set up Oracle ASM instances and to configure storage with them. Configuration details and performance metrics are exposed via a few V$ views. These are also exposed via the command-line interface, asmcmd, and the graphical interface of OEM.

Metadata, however, are partially hidden from the end user (ASM 10g R1 and R2): the mapping between physical storage, ASM allocation units and database files is not completely exposed via V$ views. The author has found that it is nevertheless possible to query additional information via undocumented X$ tables. For example, it is possible to determine the exact physical location on disk of each extent (or mirror copy of an extent) for each file allocated on ASM. This kind of information can be put to profit by Oracle practitioners wanting to extend their knowledge of the inner workings of ASM, or to diagnose features and/or bugs in the rebalancing algorithms [Ref 6].

Direct access to datafiles (data peeking) and/or emergency data rescue, possibly automated in a small utility, are further applications and extensions of the findings documented here.

 

Oracle ASM Metadata

Database storage is organized by ASM using the following components: each LUN is mapped as a disk (possibly using asmlib on Linux); disks are grouped together into disk groups; each disk group can be segmented into fail groups. Typically two fail groups are created per disk group, and ASM takes care of writing mirrored data in two or more copies allocated in different fail groups. Oracle files are allocated by ASM from the pool of storage defined by the disk groups, as specified by the DBA. ASM takes care of mirroring and striping the data, following the S.A.M.E. concept [Ref 5].

A typical configuration [Ref 1-4] uses 2 disk groups: a data disk group (datafiles, redo logs, spfile, tempfiles and controlfile) and a recovery disk group for the flash recovery area (archived logs, disk backups, multiplexed redolog and controlfile members). Configuration of the disk groups is performed via SQL commands [Ref 4].
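For reference, a diskgroup of this kind can be created with a statement along the following lines (a sketch with hypothetical asmlib disk names, patterned after the ALTER DISKGROUP command shown in Example 3):

```sql
-- Sketch only: disk names are hypothetical placeholders.
-- NORMAL REDUNDANCY gives 2-way mirroring across the two failgroups.
create diskgroup data_dg1 normal redundancy
failgroup fg1 disk 'ORCL:DISKA_1','ORCL:DISKA_2'
failgroup fg2 disk 'ORCL:DISKB_1','ORCL:DISKB_2';
```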

 

The following V$ views expose the configuration and some usage statistics, such as the number of physical reads and writes. All these views are accessible from the ASM instance (while some of them display no records when queried from a database instance).

 

View Name               Based on                 Description
--------------------    ---------------------    -------------------------------------------------------
V$ASM_DISKGROUP         X$KFGRP                  performs disk discovery and lists disk groups
V$ASM_DISKGROUP_STAT    X$KFGRP_STAT             lists disk groups
V$ASM_DISK              X$KFDSK, X$KFKID         performs disk discovery and lists disks + usage metrics
V$ASM_DISK_STAT         X$KFDSK_STAT, X$KFKID    lists disks + usage metrics
V$ASM_FILE              X$KFFIL                  lists ASM files (1 row per file)
V$ASM_ALIAS             X$KFALS                  lists ASM aliases (files, directories)
V$ASM_CLIENT            X$KFTMTA                 lists DB instances connected to ASM
V$ASM_OPERATION         X$KFGMG                  lists running rebalancing operations
N.A.                    X$KFFXP                  extent mapping table for ASM files

 

From the table above we can see that the V$ASM_* views are based on X$KF* (i.e. X$ tables with KF as a prefix). There are more such tables that are not used to build V$ASM_* views: X$KFFXP, X$KFDAT, X$KFCBH, X$KFCCE, X$KFBH, X$KFDPARTNER, X$KFCLLE. Note: the findings reported here are based on querying the documented dictionary views V$FIXED_VIEW_DEFINITION and V$FIXED_TABLE.

By querying the undocumented X$ tables listed above, the author has found that the extent mapping table for ASM is contained in X$KFFXP (see also Ref 6 and 7).

 

ASM Allocation table

X$KFFXP contains the physical allocation table for each ASM file, that is, the mapping between ASM files (identified by the columns NUMBER_KFFXP and COMPOUND_KFFXP) and their location on disk. Space on disk (identified by GROUP_KFFXP and DISK_KFFXP) is segmented into 1 MB allocation units (column AU_KFFXP). ASM files are correspondingly allocated in extents that are mapped to the disk allocation units. When mirroring is used, each extent is allocated to 2 or 3 allocation units (2-way or 3-way mirroring).
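From the Example 2 listing further below, the extent numbering appears to follow a simple pattern for 2-way mirroring (the author's hypothesis, not documented behavior): each logical extent number XNUM_KFFXP maps to two consecutive physical extent numbers PXN_KFFXP, one per mirror side LXN_KFFXP. A minimal Python sketch of the hypothesized relation:

```python
# Hypothesized relation between X$KFFXP columns for a 2-way mirrored file:
# each logical extent (XNUM_KFFXP) has two physical extents (PXN_KFFXP),
# one per mirror side (LXN_KFFXP = 0 primary, 1 mirror copy).
def pxn(xnum: int, lxn: int, mirror_sides: int = 2) -> int:
    """Physical extent number for logical extent xnum, mirror side lxn."""
    return mirror_sides * xnum + lxn

# Matches the Example 2 listing: xnum 0 -> pxn 0,1; xnum 1 -> pxn 2,3; ...
for xnum in range(3):
    print(xnum, [pxn(xnum, lxn) for lxn in (0, 1)])
```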

 

X$KFFXP description

 

By querying X$KFFXP on a test database running ASM 10g R2 and RAC, the following description of X$KFFXP has been inferred (and remains partly speculative):

 

Column Name       Description
--------------    --------------------------------------------------------------
ADDR              table address/identifier
INDX              row identifier
INST_ID           instance number (RAC)
NUMBER_KFFXP      ASM file number. Join with v$asm_file and v$asm_alias
COMPOUND_KFFXP    file identifier. Join with compound_index in v$asm_file
INCARN_KFFXP      file incarnation id. Join with incarnation in v$asm_file
PXN_KFFXP         physical extent number per file
XNUM_KFFXP        logical extent number per file (mirrored extents share the same value)
GROUP_KFFXP       ASM disk group number. Join with v$asm_disk and v$asm_diskgroup
DISK_KFFXP        disk number where the extent is allocated. Join with v$asm_disk
AU_KFFXP          position of the allocation unit, relative to the beginning of the disk. The allocation unit size (1 MB) is reported in v$asm_diskgroup
LXN_KFFXP         0,1 identify primary/mirror extent; 2 identifies file header allocation units (hypothesis)
FLAGS_KFFXP       not known
CHK_KFFXP         not known

 

Example 1 – Direct file retrieval from ASM disks

The server parameter file (spfile) for the database instance can be stored in an ASM diskgroup. This is a small file, useful to illustrate how to use the X$KFFXP table.

 

1. We find the disk group and file number of the database spfile:

sys@+ASM1> select GROUP_NUMBER,FILE_NUMBER,BYTES from v$asm_file where type='PARAMETERFILE';

 

GROUP_NUMBER FILE_NUMBER      BYTES

------------ ----------- ----------

           1         267       3584

2. We find the number and location of the extents where the spfile is written:

sys@+ASM1> select DISK_KFFXP,AU_KFFXP,PXN_KFFXP,XNUM_KFFXP,LXN_KFFXP from x$kffxp where GROUP_KFFXP=1 and NUMBER_KFFXP=267;

 

DISK_KFFXP   AU_KFFXP  PXN_KFFXP XNUM_KFFXP  LXN_KFFXP

---------- ---------- ---------- ---------- ----------

        24       3820          0          0          0

         0        176          1          0          1

 

3. From steps 1 and 2 above we know that the spfile is 3584 bytes long and is stored in 2 mirrored extents: one on disk 0, the other on disk 24 (both in disk group 1). We can find the OS path of the disks with the following query (note: the test system was on Linux using asmlib):

sys@+ASM1> select failgroup,disk_number,path from v$asm_disk where GROUP_NUMBER=1 and DISK_NUMBER in (0,24);    

FAILGROUP  DISK_NUMBER PATH

---------- ----------- --------------------

FG1                 24 ORCL:ITSTOR08_2_EXT

FG2                  0 ORCL:ITSTOR11_10_EXT

 

4. We can now confirm with OS commands that the mapping is correct. 'dd' allows the sysadmin to read the disks directly (bs=1M sets the block size, while skip=176 means that the command starts reading at an offset of 176 MB). Using the disk name found in step 3 (only disk 0 is demonstrated here) and the offsets found in step 2, we can confirm that the spfile data is at the expected physical location.

$ dd if=/dev/oracleasm/disks/ITSTOR11_10_EXT bs=1M count=1 skip=176|strings|head -4

test12.__db_cache_size=1476395008

test11.__db_cache_size=1476395008

test12.__java_pool_size=16777216

test11.__java_pool_size=16777216

 

We can see the first 4 lines of the spfile are printed out, as expected.
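The dd arithmetic can also be reproduced with a short script: the byte offset of an allocation unit is simply AU_KFFXP times the 1 MB allocation unit size. The sketch below is a hypothetical helper, shown with an ordinary file in mind; reading an actual ASM disk device (such as the path from step 3) works the same way but requires read permission on the device:

```python
AU_SIZE = 1024 * 1024  # ASM allocation unit size: 1 MB

def read_allocation_unit(path: str, au: int) -> bytes:
    """Read one allocation unit, equivalent to: dd if=path bs=1M count=1 skip=au."""
    with open(path, "rb") as f:
        f.seek(au * AU_SIZE)   # same as dd skip=au with bs=1M
        return f.read(AU_SIZE)

# For the spfile extent of Example 1 (disk 0, allocation unit 176):
# data = read_allocation_unit("/dev/oracleasm/disks/ITSTOR11_10_EXT", 176)
```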

 

Example 2 – 100MB datafile Extent Allocation

The goal of this example is to query the datafile allocation mapping on ASM for a diskgroup with normal redundancy (2-way mirroring). A tablespace with one datafile of 100 MB has been created on an ASM diskgroup composed of 4 disks arranged in two failgroups: disks 20 and 24 (failgroup FG1) and disks 7 and 8 (failgroup FG2). The space allocation mapping (between ASM file extents and allocation units on disk) for such a file can be queried like this:

 

select DISK_KFFXP,AU_KFFXP,PXN_KFFXP,XNUM_KFFXP,LXN_KFFXP from x$kffxp where GROUP_KFFXP=1 and NUMBER_KFFXP=271;

 

DISK_KFFXP   AU_KFFXP  PXN_KFFXP XNUM_KFFXP  LXN_KFFXP

---------- ---------- ---------- ---------- ----------

        20      13682          0          0          0

         7      14358          1          0          1

        24      13699          2          1          0

         8      14364          3          1          1

         8      14366          4          2          0

        24      13678          5          2          1

. . .

. . .

         8      14490        198         99          0

        24      13750        199         99          1

        20      13791        200        100          0

         7      14493        201        100          1

         7      14384          0 2147483648          0

        24      13714          1 2147483648          1

     65534 4294967294          2 2147483648          2

 

205 rows selected.

 

We can see that 205 allocation units of 1 MB each are listed for the tablespace created with a size of 100 MB. The tablespace's datafile size is 101 MB (the extra megabyte is allocated by Oracle for tablespace internal structures). Since the diskgroup used has 'normal redundancy', i.e. 2-way mirroring, this accounts for 101x2=202 allocation units. The remaining 3 allocation units are listed as the last 3 rows of the x$kffxp output and are identifiable by XNUM_KFFXP=2147483648. These are most likely file headers and/or metadata (to be further investigated).
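The accounting above can be condensed into a small helper (illustrative only; the 3 metadata units are an observed value in this test, not a documented constant):

```python
def expected_allocation_units(file_mb: int, mirror_sides: int = 2,
                              metadata_aus: int = 3) -> int:
    """Total X$KFFXP rows expected for a file of file_mb 1 MB extents.

    mirror_sides: 2 for a 'normal redundancy' (2-way mirrored) diskgroup.
    metadata_aus: the extra units observed with XNUM_KFFXP=2147483648
                  (file header/metadata, per the hypothesis in the text).
    """
    return file_mb * mirror_sides + metadata_aus

# 100 MB tablespace -> 101 MB datafile -> 205 rows, as in the listing above
print(expected_allocation_units(101))  # -> 205
```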

One of the goals of ASM is to provide uniform allocation of datafiles across the available spindles, in order to maximize throughput and IOPS. The following query investigates how the extents are allocated across the four disks of the test diskgroup used for this example.

 

select DISK_KFFXP, count(*)

    from x$kffxp

    where GROUP_KFFXP=1

        and NUMBER_KFFXP=271

        and XNUM_KFFXP!=2147483648

    group by DISK_KFFXP ;

 

DISK_KFFXP   COUNT(*)

---------- ----------

         7         51

         8         50

        20         51

        24         50

 

We can see that space is allocated uniformly across the different disks (and failgroups). With the following query we can also see that primary and mirror extents (identified by the value of LXN_KFFXP) are mixed uniformly on each disk. Note: ASM needs to read only the primary extent of a mirrored pair for read operations, while both extents must be accessed for a write.

 

sys@+ASM1>

    select DISK_KFFXP, LXN_KFFXP, count(*)

    from x$kffxp

    where GROUP_KFFXP=1

          and NUMBER_KFFXP=271

        and XNUM_KFFXP!=2147483648

    group by

           DISK_KFFXP, LXN_KFFXP

    order by

          DISK_KFFXP, LXN_KFFXP;

 

DISK_KFFXP  LXN_KFFXP   COUNT(*)

---------- ---------- ----------

         7          0         25

         7          1         26

         8          0         25

         8          1         25

        20          0         26

        20          1         25

        24          0         25

        24          1         25
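The balanced primary/mirror layout can be mimicked by a toy allocator (an illustrative model only, not ASM's actual algorithm): primary extents rotate round-robin over all disks, and each mirror goes to a disk of the other failgroup. With the Example 2 disks this reproduces the 50-51 counts observed above:

```python
from collections import Counter
from itertools import cycle

# Toy model of extent-level mirroring (illustrative only, not ASM's real
# allocator): failgroups as in Example 2, disks 20/24 in FG1, 7/8 in FG2.
FAILGROUPS = {"FG1": [20, 24], "FG2": [7, 8]}
DISK_TO_FG = {d: fg for fg, disks in FAILGROUPS.items() for d in disks}
ALL_DISKS = [20, 7, 24, 8]  # interleave failgroups for the round-robin

def allocate(n_extents: int):
    """Yield (disk, lxn) pairs: primary round-robin, mirror in the other FG."""
    primary_rr = cycle(ALL_DISKS)
    mirror_rr = {fg: cycle(disks) for fg, disks in FAILGROUPS.items()}
    for _ in range(n_extents):
        primary = next(primary_rr)
        other_fg = "FG2" if DISK_TO_FG[primary] == "FG1" else "FG1"
        yield (primary, 0)                  # primary extent, lxn 0
        yield (next(mirror_rr[other_fg]), 1)  # mirror extent, lxn 1

# 101 logical extents -> 202 allocation units over the 4 disks
counts = Counter(disk for disk, _ in allocate(101))
print(counts)  # roughly 50-51 allocation units per disk, as observed
```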

 

Example 3 – diskgroup rebalancing

This example illustrates the outcome of an 'online' diskgroup enlargement: four additional disks (disks 0, 1, 2 and 3) have been added to the diskgroup used in Example 2 above, and the diskgroup has then been rebalanced (see SQL below). After this operation the ASM space allocation map for the test datafile has been queried again. The outcome, shown below, is that the file is again spread uniformly over all the disks available in the diskgroup, as expected.

 

alter diskgroup test1_datadg1

add

failgroup fg1 disk 'ORCL:ITSTOR08_1_EXT','ORCL:ITSTOR08_3_EXT'

failgroup fg2 disk 'ORCL:ITSTOR11_3_EXT','ORCL:ITSTOR11_4_EXT';

alter diskgroup test1_datadg1 rebalance power 4;

 

select a.disk_number, a.failgroup, count(*) N#_AU

from x$kffxp x join v$asm_disk a

     on x.disk_kffxp=a.disk_number and x.GROUP_KFFXP=a.group_number

where GROUP_KFFXP=1

and NUMBER_KFFXP=271

and XNUM_KFFXP!=2147483648

group by a.failgroup, a.disk_number

order by a.failgroup, a.disk_number;

 

DISK_NUMBER FAILGROUP       N#_AU

----------- ---------- ----------

          2 FG1                25

          3 FG1                25

         20 FG1                26

         24 FG1                25

          0 FG2                25

          1 FG2                26

          7 FG2                25

          8 FG2                25

 

We can see that the datafile is spread uniformly across the disks and correctly mirrored across the two failgroups.
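The per-disk counts after the rebalance follow from simple arithmetic: 202 allocation units spread over 8 disks give 25 each, with the 2-unit remainder landing on two disks. A quick illustrative check:

```python
def ideal_spread(total_aus: int, n_disks: int) -> list:
    """Even spread of allocation units: base count, remainder on a few disks."""
    base, rem = divmod(total_aus, n_disks)
    return [base + 1] * rem + [base] * (n_disks - rem)

# 101 MB datafile, 2-way mirrored -> 202 AUs over 8 disks
print(ideal_spread(202, 8))  # -> [26, 26, 25, 25, 25, 25, 25, 25]
```

This matches the query output above: two disks carry 26 allocation units and the remaining six carry 25.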

 

Summary

Oracle ASM is a powerful and easy-to-use volume manager and filesystem for Oracle 10g and 10g RAC. Configuration details and performance metrics of the configured ASM disks and diskgroups are exposed via the V$ASM_* views. However, the space allocation mapping (ASM file extents to disk allocation units) is not fully documented. This paper details how queries on the X$KFFXP internal table can be used to work around this limitation. A set of working examples has been discussed to demonstrate the findings and to directly explore some inner workings of ASM. As expected, it was found that datafiles are automatically spread over the available disks in a diskgroup; that mirroring is taken care of by ASM at the extent level (as opposed to the volume-level mirroring found in many other volume managers); and that 'online' disk additions to a diskgroup spread datafiles uniformly over a larger number of spindles in a 'transparent' way, which can be used to improve performance and possibly reduce the impact of 'hot spots'.

A few open points remain to be investigated, such as the role of the 3 extra allocation units allocated for each datafile, documented in Example 2 (see rows where XNUM_KFFXP=2147483648). Rebalancing operations have been shown to (re)distribute datafile extents uniformly over the available disks. From the example given, it seems that the ASM rebalancing algorithm does not utilize workload metrics (from v$asm_disk_stat) to spread datafiles (for example, by spreading apart the hot parts of datafiles). A few additional X$KF* tables have been identified (see above), but their purpose has not yet been documented.

 

References

 

1. N. Vengurlekar, 2005, http://www.oracle.com/technology/products/database/asm/pdf/asm_10gr2_bptwp_sept05.pdf

2. A. Shakian, OOW 2005, "Take the Guesswork out of DB Tuning", http://www.oracle.com/pls/wocprod/docs/page/ocom/technology/products/database/asm/pdf/take%20the%20guesswork%20out%20of%20db%20tuning%2001-06.pdf

3. J. Loaiza and S. Lee, OTN 2005, http://www.oracle.com/technology/deploy/availability/pdf/1262_Loaiza_WP.pdf

4. Oracle Database 10g Administrator's Guide, "Using Automatic Storage Management"

5. J. Loaiza, OTN 1999, "Optimal Storage Configuration Made Easy", http://www.oracle.com/technology/deploy/availability/pdf/oow2000_same.pdf

6. MetaLink Bug 4306135

7. S. Rognes, 2004, posting to the oracle-l mailing list