Rucio in Hadoop

Introduction

What's the initial reason this data has been imported? The main developers are Thomas Beermann and Mario Lassnig.

The Data

The data is stored in the analytix cluster in the directory:

/user/rucio01/

Apache Server / Rucio Daemon logs

Stores log files for simple cat / grep analysis.
  • read directly from log file and continuously streamed via Flume to HDFS
  • simple text log files
  • ~23GB per day
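
The cat / grep style of analysis mentioned above can also be done from Python when the filter needs more than one pattern or some post-processing. The following is a minimal sketch, not code used by the Rucio team: the function name `grep_lines` and the sample log lines are hypothetical, and the real log format is not described on this page.

```python
import re
from typing import Iterable, Iterator


def grep_lines(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield every line that matches the regular expression, like `grep -E`."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


# Hypothetical sample lines; the real log layout may differ.
sample_log = [
    "GET /replicas 200\n",
    "ERROR timeout while contacting database\n",
    "GET /dids 500\n",
]

# Print only the lines containing the word ERROR.
for matched in grep_lines(sample_log, r"ERROR"):
    print(matched, end="")
```

Because the function accepts any iterable of lines, it can be fed from `sys.stdin`, so the output of a command such as `hdfs dfs -cat` on a log file can be piped straight into it.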

Traces

Traces record updates of the last access time of files and datasets and will be used for the popularity reports.
  • sent to an ActiveMQ broker and continuously streamed via Flume to HDFS
  • text files with one JSON-encoded dictionary per trace
  • ~5GB per day - 6M entries
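
Since each trace is one JSON-encoded dictionary per line, a popularity-style count can be sketched in a few lines of Python. This is an illustration only: the field names `scope` and `filename` are assumptions about the trace schema, which is not documented on this page, and `count_file_accesses` is a hypothetical helper, not one of the analysis scripts listed below.

```python
import json
from collections import Counter
from typing import Iterable


def count_file_accesses(lines: Iterable[str]) -> Counter:
    """Count traces per (scope, filename) pair from JSON-lines input."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            trace = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(trace, dict):
            continue  # each trace is expected to be a dictionary
        # "scope" and "filename" are assumed field names.
        scope = trace.get("scope")
        name = trace.get("filename")
        if scope is None or name is None:
            continue
        counts[(scope, name)] += 1
    return counts
```

Passing an open trace file (or `sys.stdin`) to `count_file_accesses` returns a `Counter`, so `counts.most_common(10)` gives the ten most frequently accessed files. At ~5GB per day this single-process approach only suits samples; the full data set is better handled on the cluster.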

Oracle Dumps

Contain:
  • daily reports for operations / site admins for consistency checks
  • file replicas / unique files per storage endpoint
  • primary / custodial dataset replicas
  • number of replicas per dataset / last access times
Import and sizes:
  • daily Sqoop dumps of most important tables to HDFS
  • bz2 compressed, tab-separated text files, ~16GB compressed size
    • DIDs: 550.000.000 entries
    • Rules: 7.500.000 entries
    • Replicas: 690.000.000 entries
    • Dataset Locks: 8.000.000 entries
    • RSEs: 700 entries
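
The dumps are bz2-compressed, tab-separated text, so a local copy can be streamed row by row with the Python standard library without decompressing it to disk first. The sketch below assumes UTF-8 text and unquoted fields; `read_dump` is a hypothetical helper, and the column layout of each table is not specified here, so rows are returned as plain lists of strings.

```python
import bz2
import csv
from typing import Iterator, List


def read_dump(path: str) -> Iterator[List[str]]:
    """Stream rows from a bz2-compressed, tab-separated dump file."""
    # Text mode decompresses on the fly; newline="" lets csv handle line ends.
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE treats quote characters as ordinary data (an assumption).
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
```

Because `read_dump` is a generator, only one row is held in memory at a time, which matters for tables with hundreds of millions of entries.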

Table with column descriptions.

Column name   Type   Description
A1            B2     C2
A3            B3     C3

Analysis Code

Where is it?

Table with all the scripts and short descriptions on what they do.

Name                Type   Description
FilePopularity.pig  pig    Calculates how many times a file has been accessed during...
A3                  B3     C3

Table with all the UDFs and their descriptions.

UDF return type input parameters Description
A1 B2 C2
A3 B3 C3

Additional information

Talks:


Major updates:
-- IlijaVukotic - 2014-11-19

Responsible: IlijaVukotic
Last reviewed by: Never reviewed

Topic revision: r4 - 2014-12-10 - IlijaVukotic