In many scientific fields, such as biology and the environmental sciences, the rapid evolution of scientific instruments and the intensive use of computer simulation have, in the last few years, led to the production of very large volumes of data. Scientific applications now face new problems, mainly related to the storage and exploitation of these data. This course introduces the major problems raised by the emergence of these data flows (storage, querying, analysis and visualization) and presents some of the technological solutions currently proposed. The ethical and legal issues raised by the collection and exploitation of these data are also examined.
Big Data, Data scientist, NoSQL, Hadoop, Big Data analytics, Open Data, Linked open data
- Big Data: an introduction to the issues, perspectives and applications
- The problem of large databases (NoSQL, NewSQL)
- Big Data and business models: the case of intermediation
- Open Data: open public data
- Big Data analytics: the basics of analyzing large volumes of data
- Data representation and visualization
- Three lab sessions (BE): one on visualization, one on Apache Hadoop and one on the web of data (SPARQL).
Upon completion of this MOS, students will be able to:
- Identify the issues, opportunities and ethical problems raised by Big Data.
- Create simple Hadoop/MapReduce programs to exploit distributed data.
- Manipulate NoSQL databases using a modern DBMS (e.g. MongoDB).
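As an illustration of the kind of program the MapReduce objective refers to, here is a minimal word-count sketch of the map, shuffle and reduce phases. It runs in plain Python on an in-memory list and does not use Hadoop itself; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example, not material taken from the course.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for a distributed data set.
    sample = ["big data open data", "linked open data"]
    print(word_count(sample))
```

In a real Hadoop job the map and reduce functions would run on different nodes and the shuffle would be handled by the framework; the sketch only shows how the work is divided between the phases.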
Grade = 50% knowledge + 50% know-how
Knowledge grade = 100% final exam
Know-how grade = 50% bibliographic synthesis + 50% BE report