
Directory of Expertises


Soumaya Yacout

Department of Mathematical and Industrial Engineering


Research interests

Algorithms and Tools for Big Data Analysis and Automated Real Time Optimal or Near Optimal Decision Making for Industrial Systems

Data science and data engineering have arguably become some of the most important research fields in this century. These fields are based on fundamental branches of science and engineering, namely, information technology, sensor technology, statistics, operations research, optimization, artificial intelligence, data mining and machine learning.
Along with human-centric applications, some of these techniques are now being recommended by researchers for machine-centric applications, in which data are generated by machines and decisions are also made by machines based on 'Machine to Machine (M2M)' learning.
Presently, an important research question is how to exploit the available Big Data sets since, by definition, they consist of large volumes of data, acquired at high velocity and in a variety of forms. Traditional data-processing and analysis techniques are inadequate for such data.
The objective of this project is to develop algorithms and tools that are designed specifically to analyze and to extract knowledge from Big Data that are obtained from engineering systems. The extracted knowledge should lead to an understanding of how various components of a complex system influence each other and interact with their environment, and how an accurate prediction of the degradation can be obtained in a parallel computing framework.
The proposed methodology is based on an approach called Logical Analysis of Data (LAD), a data mining and machine learning approach based on Boolean logical reasoning. It extracts knowledge in the form of patterns that distinguish and characterize sets of data, and that identify phenomena of interest. Different LAD algorithms used to extract patterns in supervised and unsupervised learning will be considered in parallel computing frameworks, namely enumeration techniques, mixed integer linear programming, and metaheuristic algorithms, mainly genetic algorithms and ant colony algorithms. The two parallel frameworks that will be used are Hadoop MapReduce and Spark; both are open source and therefore available to the public.
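To make the notion of a pattern concrete, the following is a minimal Python sketch of an enumeration technique on already-binarized data. It is an illustration under stated assumptions, not the project's implementation: the function names (`covers`, `enumerate_patterns`), the data layout (tuples of 0/1 values), and the `max_degree` limit are all hypothetical. A pattern here is a conjunction of literals that covers at least one positive observation and no negative observation; only minimal patterns are kept.

```python
from itertools import combinations, product


def covers(term, row):
    """True if every (feature_index, required_value) literal of the term holds in the row."""
    return all(row[i] == v for i, v in term)


def enumerate_patterns(positives, negatives, max_degree=2):
    """Enumerate minimal pure positive patterns up to max_degree literals.

    Assumes non-empty lists of equal-length 0/1 tuples (hypothetical layout).
    """
    n_features = len(positives[0])
    patterns = []
    for degree in range(1, max_degree + 1):
        for features in combinations(range(n_features), degree):
            for values in product((0, 1), repeat=degree):
                term = tuple(zip(features, values))
                # Skip terms that contain a shorter pattern already found.
                if any(set(p) <= set(term) for p in patterns):
                    continue
                covers_positive = any(covers(term, row) for row in positives)
                covers_negative = any(covers(term, row) for row in negatives)
                if covers_positive and not covers_negative:
                    patterns.append(term)
    return patterns


positives = [(1, 1, 0), (1, 0, 1)]
negatives = [(0, 1, 0), (0, 0, 1)]
print(enumerate_patterns(positives, negatives))
```

Exhaustive enumeration grows quickly with the number of features and the allowed degree, which is one reason the text also considers mixed integer linear programming, metaheuristics, and parallel frameworks.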
We intend to present the scaled-up algorithms to the scientific community in an open source environment, so that every interested individual can use them, improve upon them and add to them. The impact of this research is the possibility of discovering and understanding complex physical phenomena that are not yet fully understood, and of exploiting this knowledge in decision making. Depending on the specific applications in which these algorithms are used, this knowledge can lead to an increase in safety and security, energy savings, protection of the environment, and increased efficiency in consuming natural resources. It will also lead to intelligent systems that can make the right decision at the right moment. Eventually, this will lead to self-sustaining and sustainable systems.

Diagnosis and Prognosis of Industrial Process Performance and Condition Based Maintenance

Engineering systems, such as aircraft, transportation systems, manufacturing systems, and industrial processes such as mining, are becoming more complex and are subject to failure modes that are difficult to detect and to explain. This situation has a negative impact on the systems' reliability, availability, safety, serviceability, maintainability and productivity. On-line, real-time fault detection, analysis, diagnosis and prognosis tools can assist operators in their task of making the systems functional and safe to use, quickly and efficiently. Recent advancements in Condition-Based Maintenance (CBM) and Equipment Health Management (EHM) have produced new and innovative methods for the diagnosis and prognosis of system condition. Yet these methods still have limited applicability due to their structural limitations, their complexity, and their lack of readiness for use. In most industries, the most widely used diagnostic systems are still based on human expertise. This expertise, although very valuable, has limitations, in particular in situations where multiple failure modes or multiple inputs to the system interact in a new, correlated manner that has not been seen or documented before. Moreover, this expertise can be lost through age, retirement or resignation, leading to a loss of knowledge. Even if provisions are made for knowledge transmission, this process may be time consuming and exhausting. Equipment Health Management systems need improvements in order to overcome these limitations and to increase their efficiency, accessibility, applicability, and explanatory power.

This research aims at developing an integrated system for diagnosis and prognosis that is generic and applicable to a variety of engineering systems. The system is capable of processing data with different data analytics techniques. Via data mining, it extracts hidden information in the form of patterns and finds correlations between inputs and root causes that can explain different known and unknown performances, in an exploitation phase and an exploration phase. The tool is based on the artificial intelligence notions of machine learning and testing, and uses a reinforcement learning approach to improve its learning capabilities with time, via accumulated knowledge, and as the computational capabilities of computers are enhanced.

New software called cbmLAD has been developed at École Polytechnique and is now available for use.


Decision making based on Knowledge Discovery and Data Exploitation for Industrial Processes

Condition monitoring is the process of monitoring the operating characteristics of a process so that changes and trends in the monitored characteristics can be used to predict the process's present and future performance and to control its output.

Recently, there have been considerable research efforts to develop condition monitoring technologies for industrial processes and systems. The use of these technologies has resulted in the acquisition of large amounts of data and has given life to new fields of research, namely data mining and knowledge discovery from databases, and data analytics. In the past, one of the main problems in industrial settings was the lack of data needed to support the decision making process. Nowadays, most companies use one or more condition monitoring technologies and possess considerable databases containing performance indicators for their processes. Consequently, researchers are now interested in finding techniques to extract information and interpret it accurately.

The objective of the proposed research is to use a new approach called Logical Analysis of Data (LAD) in order to exploit the databases of industrial processes, to assist in decision making, and to improve process performance. LAD is a data mining and artificial intelligence approach based on pattern recognition. It is a combinatorics and optimization-based data analysis technique. Historical data containing performance indicators and process output measurements are exploited in order to generate patterns that characterize the process performance. Data fusion techniques based on the generated patterns are used to combine information coming from different sensors. Pattern-based clustering techniques are developed. Multi-class classification problems are solved, and reinforcement pattern-based learning techniques with an intelligent agent are developed.
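As an illustration of how generated patterns can support a decision, the sketch below classifies a binarized observation with an equally weighted discriminant: the fraction of positive patterns that cover it minus the fraction of negative patterns that cover it. This is a simplified two-class example under assumptions, not the method used in cbmLAD; the function names, the pattern encoding as tuples of (feature index, required value) literals, the equal weights, and the class labels are all hypothetical.

```python
def covers(pattern, observation):
    """True if every (feature_index, required_value) literal holds in the observation."""
    return all(observation[i] == v for i, v in pattern)


def discriminant(observation, positive_patterns, negative_patterns):
    """Share of covering positive patterns minus share of covering negative patterns.

    Assumes both pattern lists are non-empty (hypothetical simplification).
    """
    pos = sum(covers(p, observation) for p in positive_patterns) / len(positive_patterns)
    neg = sum(covers(p, observation) for p in negative_patterns) / len(negative_patterns)
    return pos - neg


def classify(observation, positive_patterns, negative_patterns):
    """Label an observation by the sign of its discriminant score."""
    score = discriminant(observation, positive_patterns, negative_patterns)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "unclassified"  # no pattern, or balanced evidence


# Hypothetical patterns over three binarized indicators.
positive_patterns = [((0, 1),), ((1, 1), (2, 0))]
negative_patterns = [((0, 0), (2, 1))]

for obs in [(1, 1, 0), (0, 0, 1), (0, 0, 0)]:
    print(obs, classify(obs, positive_patterns, negative_patterns))
```

Because each decision traces back to the specific patterns that cover the observation, the result can be explained in terms of the underlying indicators.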


Research unit(s)

NSERC subjects

  • 1606 Operations management


© École Polytechnique de Montréal