A peek Inside Oracle ASM Metadata

Luca Canali, CERN, Jan 2006.

Overview

Oracle ASM (Automatic Storage Management) is a feature of Oracle 10g intended to streamline storage management and provisioning. ASM is a cluster filesystem and volume manager specialized for Oracle workload [Ref 1,2]. With Oracle 10g and ASM it is possible to build a scalable and highly available storage infrastructure build on low-cost hardware [Ref 3]. A typical example is using SATA HD with fiber channel controllers arranged in a SAN network. A scalable architecture built on low-cost HW is deployed at CERN combining ASM and Oracle 10g RAC on Linux [Ref 4].

Oracle documentation and whitepapers [Ref 1,2,3,5] provide the necessary information to set up Oracle ASM instances and configure the storage with it. Configuration details and performance metrics are exposed via a few V$ views. Other possibilities are the command line interface, asmcmd (10g R2) and the graphical interface of OEM.

Metadata are however partially hidden to the end user. That is the mapping between physical storage, ASM allocation units, and database files is not completely exposed via V$ views. The author has found that is however possible to query such information via undocumented X$ tables. For example, it is possible to determine the exact physical location on disk of each extent (or mirror copies of extents) for each file allocated on ASM (and if needed access the data directly via the OS). This kind of information can be put to profit by Oracle practitioners wanting to extend their knowledge of the inner workings of the ASM or wanting to diagnose hotspots and ASM rebalancing issues [Ref 6].

Direct access to ASM files (data peeking), possibly automated in a small ‘utility’, for didactical scopes or for emergency data rescue is a possible extension of the findings documented here (see also at the end of this document, ‘addendum’ link a)

 

Oracle ASM Recap

Database storage is organized by ASM using the following components: each LUN is mapped as a disk (possibly using asmlib on Linux). Disks are grouped together into disk groups. Each disk group can be segmented in one or more fail groups. Typically 2 fail groups are created for a normal redundancy disk group, ASM takes then care of writing mirror data in two copies allocated in different fail groups [Ref 5]. Oracle files are allocated by ASM from the pool of storage defined by the disk groups, as specified by the DBA. ASM takes care of mirroring and striping the data, following the S.A.M.E. concept [Ref 7].

A typical configuration [Ref 1-5] uses 2 disk groups: a data disk group (datafiles, redologs, spfile, tempfile and controlfile) and a recovery disk group for the flash recovery area (archive logs, disk backups, redolog and controlfile multiplexed members). Configuration of the disk groups is performed via SQL commands [Ref 4].

 

The following V$ views are available to expose the configuration and some usage statistics, such as the number of physical reads and writes. All these views are accessible from the ASM instance (while some of them display no records when queried from the database instance).


 

View Name

Based on

Description

V$ASM_DISKGROUP

X$KFGRP

performs disk discovery and lists diskgroups

V$ASM_DISKGROUP_STAT

X$KFGRP_STAT

lists diskgroups

V$ASM_DISK

X$KFDSK, X$KFKID

perform disk discovery and lists disks + usage metrics

V$ASM_DISK_STAT

X$KFDSK_STAT, X$KFKID

List disks + usage metrics

V$ASM_FILE

X$KFFIL

lists ASM files (1 row per file)

V$ASM_ALIAS

X$KFALS

lists ASM aliases (files, directories)

V$ASM_CLIENT

X$KFTMTA

lists instances DB instances connected to ASM

V$OPERATION

X$KFGMG

lists running rebalancing operations

N.A.

X$KFFXP

Extent mapping table for ASM files

 

From the table above we can see that the V$ASM_* views are based on X$KF* (i.e. X$ tables with KF as a prefix). There are more of such tables that are not used to build V$ASM_* views: X$KFFXP, X$KFDAT, X$KFCBH, X$KFCCE, X$KFBH, X$KFDPARTNER, X$KFCLLE. Note: the findings reported here are based on querying the documented dictionary views: V$FIXED_VIEW_DEFINITION and V$FIXED_TABLE.

By querying the undocumented X$ tables listed above the author has found that the extent mapping table for ASM is contained in X$KFFXP (see also Ref 7 and 8).

 

ASM Allocation table

X$KFFXP contains the physical allocation table for each ASM file, that is it contains the mapping between ASM files (identified by columns NUMBER_KFFXP and COMPOUND_KFFXP) and their location on disk is. Space on disk (identified by GROUP_KFFXP, DISK_KFFXP) is segmented in 1MB chunks called allocation units (column AU_KFFXP). ASM files are correspondingly allocated in extents that are mapped to the Disk allocation unit. When mirroring is used, each extent is allocated to 2 or 3 allocation units (2-way or 3-way mirroring).

X$KFFXP description

By querying X$KFFXP on a test database running ASM 10g R2 and RAC the following description for X$KFFXP has been speculated:

 

Column Name

Description

ADDR

table address/identifier

INDX

row identifier

INST_ID

instance number (RAC)

NUMBER_KFFXP

ASM file number. Join with v$asm_file and v$asm_alias

COMPOUND_KFFXP

File identifier. Join with compound_index in v$asm_file

INCARN_KFFXP

File incarnation id. Join with incarnation in v$asm_file

PXN_KFFXP

Extent number per file

XNUM_KFFXP

Logical extent number per file (mirrored extents have the same value)

GROUP_KFFXP

ASM disk group number. Join with v$asm_disk and v$asm_diskgroup

DISK_KFFXP

Disk number where the extent is allocated. Join with v$asm_disk

AU_KFFXP

Relative position of the allocation unit from the beginning of the disk. The allocation unit size (1 MB) in v$asm_diskgroup

LXN_KFFXP

0,1 used to identify primary/mirror extent, 2 identifies file header allocation unit (hypothesis)

FLAGS_KFFXP

N.K.

CHK_KFFXP  

N.K.

 

Example 1 – Direct file retrival from ASM disks

The server parameter file (spfile) for the database instance is stored in an ASM diskgroup. This is a small file, useful to illustrate how to use the X$KFFXP table.

 

1. We find the disk group and file number of the database spfile:

sys@+ASM1> select GROUP_NUMBER,FILE_NUMBER,BYTES from v$asm_file where type='PARAMETERFILE';

 

GROUP_NUMBER FILE_NUMBER      BYTES

------------ ----------- ----------

           1         267       3584

2. We find the number and location of the extents where the spfile is written:

sys@+ASM1> select DISK_KFFXP,AU_KFFXP,PXN_KFFXP,XNUM_KFFXP,LXN_KFFXP from x$kffxp where GROUP_KFFXP=1 and NUMBER_KFFXP=267;

 

DISK_KFFXP   AU_KFFXP  PXN_KFFXP XNUM_KFFXP  LXN_KFFXP

---------- ---------- ---------- ---------- ----------

        24       3820          0          0          0

         0        176          1          0          1

 

3. From steps 1. and 2. above we know that the spfile is 3584 bytes long and is stored in 2 mirrored extents: one on disk 0, the other on disk 24 (on disk group). We can find the OS path of the disks with the following query (note the test system used was on Linux using asmlib):

sys@+ASM1> select failgroup,disk_number,path from v$asm_disk where GROUP_NUMBER=1 and DISK_NUMBER in (0,24);    

FAILGROUP  DISK_NUMBER PATH

---------- ----------- --------------------

FG1                 24 ORCL:ITSTOR08_2_EXT

FG2                  0 ORCL:ITSTOR11_10_EXT

 

4.  We can now confirm with OS commands that the mapping is correct. ‘dd’ allows the sysadmin to read the disks directly (bs=1M is the block size, while skip=176 means that the command starts reading at the offset 176M) . Using the disk name found in step 3 (only disk 0 demonstrated here) and the offsets found in step 2 we can confirm that the spfile data is at the expected physical location.

$ dd if=/dev/oracleasm/disks/ITSTOR11_10_EXT bs=1M count=1 skip=176|strings|head -4

test12.__db_cache_size=1476395008

test11.__db_cache_size=1476395008

test12.__java_pool_size=16777216

test11.__java_pool_size=16777216

 

We can see the first 4 lines of the spfile are printed out, as expected.

 

Example 2 – 100MB datafile Extent Allocation

This example demonstrates how to query the datafile allocation map on ASM for a disk groups with normal redundancy (2-way mirroring). A tablespace with 1 datafile of size 100MB has been created in an ASM disk group, composed of 4 disks arranged in two fail groups: disk 20 and 24 (fail group FG1), disk 7 and 8 make (fail group FG2). The space allocation mapping (mapping between ASM file extents and allocation units on disk) for such a file can be queried like this:

 

select DISK_KFFXP,AU_KFFXP,PXN_KFFXP,XNUM_KFFXP,LXN_KFFXP from x$kffxp where GROUP_KFFXP=1 and NUMBER_KFFXP=271;

 

DISK_KFFXP   AU_KFFXP  PXN_KFFXP XNUM_KFFXP  LXN_KFFXP

---------- ---------- ---------- ---------- ----------

        20      13682          0          0          0

         7      14358          1          0          1

        24      13699          2          1          0

         8      14364          3          1          1

         8      14366          4          2          0

        24      13678          5          2          1

. . .

. . .

         8      14490        198         99          0

        24      13750        199         99          1

        20      13791        200        100          0

         7      14493        201        100          1

         7      14384          0 2147483648          0

        24      13714          1 2147483648          1

     65534 4294967294          2 2147483648          2

 

205 rows selected.

 

We can see that 205 allocation units of 1MB each are listed for the tablespace created with a datafile of 100MB nominal size. The actual datafile size is 101MB (the extra megabyte is allocated by Oracle for tablespace internal structures independently of the use of ASM). The disk group used for this test has ‘normal redundancy’, i.e. 2-way mirroring, this accounts for 101x2=202 allocation units. The remaining 3 allocation units are listed in the last 3 rows in the x$kffxp table and are identifiable by aving XNUM_KFFXP=2147483648. These are most likely file headers and/or metadata (this conclusion has still to be further investigated).

One of the goals of ASM is providing uniform allocation of datafiles across the available spindles. This is a key feature of ASM and has the scope of maximizing performance (throughput, IOPS), lower latency times and increase HA. The following query investigates how the extents are allocated across the four available disks in the test disk group used for this example.

 

select DISK_KFFXP, count(*)

    from x$kffxp

    where GROUP_KFFXP=1

        and NUMBER_KFFXP=271

        and XNUM_KFFXP!=2147483648

    group by DISK_KFFXP ;

 

DISK_KFFXP   COUNT(*)

---------- ----------

         7         51

         8         50

        20         51

        24         50

 

We can see that space is allocated in a uniform way between the different disks (and fail groups). With the following query we can see that the primary and mirror extents (identified by the value of LXN_KFFXP) are mixed uniformly for each disk.

Note: Oracle needs only has to read the primary extent of a mirrored pair of ASM-allocated extents for read operations (while both extents need to be accessed for a write operation). Therefore a uniform primary/mirror extent allocation provides for better performance of read operations.

 

sys@+ASM1>

    select DISK_KFFXP, LXN_KFFXP, count(*)

    from x$kffxp

    where GROUP_KFFXP=1

          and NUMBER_KFFXP=271

        and XNUM_KFFXP!=2147483648

    group by

           DISK_KFFXP, LXN_KFFXP

    order by

          DISK_KFFXP, LXN_KFFXP;

 

DISK_KFFXP  LXN_KFFXP   COUNT(*)

---------- ---------- ----------

         7          0         25

         7          1         26

         8          0         25

         8          1         25

        20          0         26

        20          1         25

        24          0         25

        24          1         25

 

Example 3 – diskgroup rebalancing

This example illustrates the outcome of an ‘online’ disk group enlargement operation: four additional disks have been added to the disk group used in the example 2 above (disks 2, 3, 0, 1 will be added) and the disk group has then be rebalanced (see SQL below). After this operation the ASM space allocation map for the test datafile has been queried again. The outcome, shown below, is that the file is again uniformly spread over all the disks available in the disk group, as expected.

 

alter diskgroup test1_datadg1

add

failgroup fg1 disk 'ORCL:ITSTOR08_1_EXT','ORCL:ITSTOR08_3_EXT'

failgroup fg2 disk 'ORCL:ITSTOR11_3_EXT','ORCL:ITSTOR11_4_EXT';

alter diskgroup test1_datadg1 rebalance power 4;

 

select a.disk_number, a.failgroup, count(*) N#_AU

from x$kffxp x join v$asm_disk a

     on x.disk_kffxp=a.disk_number and x.GROUP_KFFXP=a.group_number

where GROUP_KFFXP=1

and NUMBER_KFFXP=271

and XNUM_KFFXP!=2147483648

group by a.failgroup, a.disk_number

order by a.failgroup, a.disk_number;

 

DISK_NUMBER FAILGROUP       N#_AU

----------- ---------- ----------

          2 FG1                25

          3 FG1                25

         20 FG1                26

         24 FG1                25

          0 FG2                25

          1 FG2                26

          7 FG2                25

          8 FG2                25

 

We can see that the datafile, after the online redefinition of the disk group, is spread uniformly across the disks and correctly mirrored across the two fail groups, as expected.

 

Summary

Oracle ASM is a powerful and easy to use volume manager and filesystem for Oracle 10g and 10g RAC. Configuration details and performance metrics of the configured ASM disks and disk groups are exposed via V$ASM_* views. However, the space allocation mapping (ASM file extents to disk allocation unit mapping) is not fully documented. This paper details how queries on the X$KFFXP internal table can be used to work around this limitation. A set of working examples has been discussed to demonstrate the findings and to directly explore some inner workings of ASM. As expected, it was found that datafiles are automatically spread over the available disks in a disk group, that mirroring is taken care by ASM and it is done at the extent level (as opposed to volume-level mirroring found in many other volume managers), and that ‘online’ disk additions to a disk group allow to spread datafiles uniformly over a large number of spindles in a ‘transparent’ way and can be used to improve performance and possibly reduce the impact of ‘hot spots’.

A few open points remain to be investigated, such as: the role of the ‘extra 3 allocation unit’ allocated for each datafile, that were documented in example N.2 (see rows where XNUM_KFFXP=2147483648). Rebalancing operations have been demonstrated to (re)distribute datafile extents uniformly over the available disks. ASM rebalancing algorithm apparently does not utilize workload metrics (from v$asm_disk_stat) to spread datafiles (such as spread apart hot parts of the datafiles) but seems to use a simpler ‘round robin’ algorithm. A few additional X$KF* tables have been identified (see above), but their purposes have not yet been documented.

 

References

 

1. N. Vengurlekar, 2005, http://www.oracle.com/technology/products/database/asm/pdf/asm_10gr2_bptwp_sept05.pdf

2. A. Shakian, OOW 2005, take the guesswork out of db tuning, http://www.oracle.com/pls/wocprod/docs/page/ocom/technology/products/database/asm/pdf/take%20the%20guesswork%20out%20of%20db%20tuning%2001-06.pdf

3. J Loaiza and S Lee, OTN 2005, http://www.oracle.com/technology/deploy/availability/pdf/1262_Loaiza_WP.pdf

4. L. Canali, scalable Oracle 10g architecture, https://twiki.cern.ch/twiki/pub/PSSGroup/HAandPerf/Architecture_description.pdf

5. Oracle 10g administrator’s guide, “Using Automatic Storage Management”

6. Metalink, Bug N. 4306135

7. J Loaiza, OTN 1999, Optimal storage configuration made easy, http://www.oracle.com/technology/deploy/availability/pdf/oow2000_same.pdf

8. S. Rognes, 2004, posting to oracle-l mailing list

 

Addendum

a. http://www.freelists.org/archives/oracle-l/05-2006/msg00395.html

b. https://twiki.cern.ch/twiki/bin/view/PSSGroup/HAandPerf

c. http://oracloid.blogspot.com/2006/05/vldb-with-asm.html