Course group - D-BDAT-S9
Information systems (IS) are becoming increasingly powerful and
reactive in order to support production, decision making and collaborative
working, and to increase customer or client satisfaction within
companies, institutions and society at large. The functions of an IS are
developed by processing clearly defined information, generally stored in
databases within the IS. Nevertheless, increasing volumes of data are
now being generated from various sources: sensors, software programmes,
activity records and other information considered “volatile” and
not of immediate use. This data is sometimes stored for a certain period
of time but is rarely exploited, because of its volume, the diversity of
its formats, the speed at which it accumulates and its lack of
compatibility with existing data processing tools.
The term “Big Data” refers to such data, characterised by a significant
volume, a variety of formats and a high velocity of generation (the three Vs),
to which must be added a fourth V: the potential value of its exploitation.
This value lies in the knowledge that can be extracted. Companies working
with Big Data also observe that these questions are now being dealt with by
professional engineers, which justifies training as many students as
possible, as future generalist engineers, in the basics of Big Data
processing and its current technologies. It is becoming imperative
to train our students in these new methods and techniques so that they can
face a challenge which will become increasingly present in the
years to come.
Large volumes of data have to be stored in a consistent manner; it is therefore necessary to be able to manipulate software and paradigms adapted to this scale. The algorithmic and mathematical processing methods are likewise adapted to the context of Big Data (a minimal sketch of such a paradigm is given at the end of this section). There are four main axes of teaching:
The units are:
The Big Data specialisation relies essentially on notions acquired in the Computer Science core curriculum and in the Probability and Statistics unit of the Mathematics core curriculum. The data to be processed during the teaching sessions does not require any specific prior knowledge. Certain notions covered in the Data Mining unit, such as classification and clustering, have been or will be presented in the Data Science major, but from a more statistical angle. In addition, the Big Data setting requires a generalisation of these notions, and hence of the techniques which implement them, since on the one hand the volume of data is very large, and on the other it is heterogeneous (including text, graphs and dynamic data), factors which go beyond the numerical data dealt with in the Data Science major. Students who have followed the Computer Science major or the targeted courses on software programming and computing may be more at ease than others during the practical sessions, but not significantly so. The decision to build this specialisation with little reliance on existing majors is motivated by the need to respond to the issue of Big Data management in a wide range of industrial and research fields.
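As a minimal illustration of the processing paradigms mentioned above, the sketch below shows a map/reduce-style word count in plain Python. The document collection, function names and sequential execution are illustrative assumptions only; a real Big Data framework would distribute the map and reduce phases across many machines.

```python
# A minimal sketch of a map/reduce-style word count, assuming only the
# Python standard library; "documents" is an illustrative in-memory
# stand-in for a distributed collection of records.
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit (key, value) pairs: here, (word, 1) for every word in a record."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_phase(pairs):
    """Group the pairs by key and aggregate the values for each key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


if __name__ == "__main__":
    documents = [
        "big data volume variety velocity",
        "value comes from the knowledge extracted from big data",
    ]
    # In a real framework the two phases run in parallel on many nodes;
    # here they are chained sequentially for illustration.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    print(reduce_phase(mapped))
```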