Computing challenges of Big-Data

Lecturer(s): Stéphane DERRODE
Course ⋅ 16 h
Study ⋅ 12 h

Objectives

In many scientific fields, such as biology and the environmental sciences, the rapid evolution of scientific instruments and the intensive use of computer simulation have, in recent years, produced very large volumes of data. Scientific applications now face new problems, mainly in storing and exploiting these data. This course introduces the major problems raised by the emergence of these data flows (storage, querying, analysis and visualization) and presents some of the technological solutions currently available. The ethical and legal issues raised by collecting and exploiting these data are also examined.

Keywords

Big Data, Data scientist, NoSQL, Hadoop, Big Data analytics, Open Data, Linked open data

Programme

  • Big Data: an introduction to the issues, perspectives and applications
  • The problem of large databases (NoSQL, NewSQL)
  • Big Data and business model: the case of intermediation
  • Open Data: open public data
  • Big-Data Analytics: the basics of analyzing large volumes of data
  • Data representation and visualization
  • Three BEs (practical sessions): visualization, Apache Hadoop, and the web of data (SPARQL)

Learning Outcomes

  • Upon completion of this MOS, students will be able to:
      - Understand the issues, opportunities and ethical problems raised by Big Data.
      - Create simple Hadoop/MapReduce programs to exploit distributed data.
      - Manipulate NoSQL databases using a modern DBMS (e.g. MongoDB).

Assessment

Grade = 50% knowledge + 50% know-how
Knowledge grade = 100% final exam
Know-how mark = 50% bibliographic synthesis + 50% report on BE
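
The nested weighting above can be expanded as a short Python sketch. The function name, the parameter names and the sample marks are illustrative assumptions; the course description does not state the marking scale, so the sketch only requires that all three marks use the same one.

```python
def final_grade(exam: float, synthesis: float, be_report: float) -> float:
    """Combine the three marks using the weights from the assessment rule."""
    # Knowledge grade = 100% final exam
    knowledge = exam
    # Know-how mark = 50% bibliographic synthesis + 50% report on BE
    know_how = 0.5 * synthesis + 0.5 * be_report
    # Grade = 50% knowledge + 50% know-how
    return 0.5 * knowledge + 0.5 * know_how


# Hypothetical marks, all on the same (assumed) scale.
print(final_grade(exam=12.0, synthesis=14.0, be_report=16.0))
```

In effect, the final exam carries 50% of the grade, and the bibliographic synthesis and the BE report carry 25% each.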