The EDISON project


Notes by Maria Dimou - CERN representative

The Horizon 2020 EU-funded EDISON project aims to define the new Data Scientist profession, so that universities can define their courses and companies can define the competencies they require. It does so by: defining the Data Scientist profile; providing a web-based tool to (self-)evaluate one's compliance with the profile; and providing a basis for a Data Science professional certification. It is a 2-year project (started September 2015).


What is a Data Scientist?

A practitioner who has sufficient knowledge to dig through the life cycle of Big Data up to the delivery of results of value for science and industry. He/she has a focused interest in digging into data while looking for something usable. Communication skills are very important in this profession, as are innovation potential, cost reduction etc. Data scientists range from data analyst to system administrator and librarian. Check here existing NIST (ACM, IEEE) work on the subject and more. The table of relevant current and future professions is so large that it would be better to talk about Data ScienceS rather than Data Science (as we say Natural Sciences). A survey will be out soon to poll various specialists' communities for input on the Data Science professional. The table of specialisations will be debated with the ESCO (European Skills, Competences, Qualifications and Occupations) framework and platform.

Why is this new profession important?

Because data analysis, statistics and data mining will "discover hidden and obscure relationships between processes and events, which will lead to new discoveries and innovation".


EDISON Champions Conference - Warsaw 2017/06/19-20

Highlights relevant to CERN

  1. Event agenda.
  2. CERN's answers to the Data Scientist Survey
  3. Actions since the Krakow meeting in September 2016:
    1. A pilot test of the toolkit with Dutch ministries to evaluate the computing "literacy" of their elected members.
    2. Collaboration with selected academic establishments. Indeed, at the EPFL Research Day on June 8th, the IC Dean (Computing) James Larus announced a new Master's programme in Data Science.
    3. Proposing to CERN to become another pilot tester of the toolkit.
    4. Reminder: The document repository HERE.

In more detail

EDISON meeting - Krakow 2016/09/27

Highlights relevant to CERN

Given CERN's experience with large data production, analysis and analytics, Maria Dimou suggests the following actions:

  • Find reviewers in the lab for the 4 main project documents.
  • Publicise the (imminent) survey on how the data scientist's job is understood.
  • Create a Vidyo room for holding meetings remotely with the main project players.
  • Establish a collaboration with the Research Data Alliance (RDA) because of their global education mission.
  • Check the portal and read about its functionality in the CERN Edison twiki.

In more detail

The meeting agenda is attached to this twiki. Maria's free-format notes from the discussion:

The goals of the 27/9/2016 meeting were to collect participants' thoughts on:

  • the emerging components of the EDISON Data Science Framework (EDSF) - the four documents discussed
  • how best to exploit the EDSF, particularly in the context of the European Research Data Infrastructure.

The full Expert Liaison Groups (ELG) members' list is . CERN is part of the Employers group.

The documents discussed were:

Data Science Competence Framework

Data Science Body of Knowledge

Data Science Model Curriculum

Data Science Professional Profiles

These documents classify data specialists in Information and Communications Technologies (ICT) into 'families'. The job-definition exercise is reminiscent of the CERN benchmark jobs, but the families are far more numerous because there are many sub-divisions.

  • One commercial partner of the project participated: Jasper de Vries from the company Kadenza in the Netherlands, a consultant to many companies dealing with data, e.g. insurance companies.
  • Donatella Castelli from the National Research Council in Italy.
  • A Community Portal is being developed in Rome for the post-EDISON era, as the project is due to finish next year.
  • Data Scientists (DS) are trained via various channels: not only universities and company trainings but also self-education via MOOCs and other online courses.
  • There is currently no agreed answer to "WHAT is a DS". Hence, the project aims at building the job profile and the career path, and also at helping universities to build a COMMON portfolio for accreditation and certification.
  • The champion universities are Southampton, Perugia, Frankfurt, Luzern and Bradford. The next conference of champion universities is in February 2017 in Madrid. EDISON is trying to increase the number of participating universities.
  • Part of the project's challenge is to formalise the definition of people who inevitably move from an individual farm model of data use to a supermarket model, i.e. pooling data together and using it across disciplines.
  • Kathrin Beck - computational linguist (natural language recognition) at the Max Planck Institute in Munich and representative of the Research Data Alliance (RDA) global initiative. The RDA has 4345 members from 111 countries and started 6 years ago. Many commercial companies are interested, and government agencies wish to encourage participation because of the RDA's explosive growth. It is EU-funded for Europe; the USA and Australia are also active. They organise training events, webinars and face-to-face training courses, all free of charge. People speak about what they do in their work and how this can be useful to others. They also have an Atlas of Knowledge (AoK). They wish to share their educational material with others and to UNIFY the definition of metadata, not only for searching purposes but also for archiving. This will mean standardisation of nomenclature for volumes. They are SEEKING collaboration with other training providers.
  • Prof. Dimitar Trajanov - Ss. Cyril and Methodius University in Skopje. Data science is not a separate faculty there. Machine learning and some web-based course material are the closest they have. They also have some students' projects that indirectly merge data from different disciplines, e.g. food - drug interactions. Their research results were fed into the Global Open Drug Data platform (GODD). They are now using Google BigQuery in Education for introductory classes in Data Science and Data Analytics. Data visualisation and language processing for non-English speakers are a challenge. It is very important to deal with "messy data", not clean datasets, as the commercial world is full of data to parse that is not uniform.

  • Presenters from the EDISON project team were Steve Brewer (University of Southampton), Malgorzata Krakovian (EGI Foundation) and Yuri Demchenko (University of Amsterdam), the main editor of the 4 documents. They explained the distinction between Certification (recognition of an individual) and Accreditation (recognition of a school). EDISON is approaching the EU, companies' HR departments and standardisation bodies to gain acceptance of the Framework Definition. It is very important to know how to deal with "messy data", not clean datasets, as the commercial world is full of data to parse that is not uniform.

  • Sustainability plan (Themis Athanasiadou - EGI Outreach): there is a chicken-and-egg problem, since establishing the EDISON brand is a prerequisite for MOOC providers to register their courses with EDISON. EDISON does provide intellectual, human, virtual and physical resources. For the project to earn the money needed to keep the Body of Knowledge and the Data Scientist profile definition up to date, some services should be provided. One such service is the 'registration' of external courses with the EDISON quality warranty. Another is the portal with virtual labs on competence benchmarks, where data scientists can practise before a job interview. This portal is being developed in Rome by the Italian project partners. When one takes the competency test on the portal and is found lacking in some area of expertise, the portal offers the right online course and prompts the user to enroll. Activities running in parallel to EDISON include the IBM Workbench and many more known in the Netherlands.

  • Yuri Demchenko (EDISON papers' editor): After defining the Competence Framework we have to build an Online Educational Environment. It is worth the effort. Companies like IBM put 30% of their resources into education and organise courses for which they charge $3K per person, so the field is booming. There is a European Commission 'directive' for national initiatives to introduce digital education at all levels of schooling.

Various deadlines:

  • Fast Track to Innovation (FTI), the European Commission programme funding activities that bring research results to market: 25/10/2016.
  • Erasmus+: March 2017.
  • EDISON2: 29/03/2017. Crowd funding is also envisaged.


Yuri Demchenko said that Pascale Goy from CERN HR L&D gave a talk last March at EMBL in the UK about CERN's plans to re-train its personnel. The slides are attached to this twiki.

-- MariaDimou - 2016-09-29

