Let me start this blog by highlighting the challenges and needs these sectors are experiencing, especially around the convergence of HPC, Big Data (BD) and AI. The large amounts of data generated today hide immense value, and extracting that value is what drives the current transformation of the HPC world. Cross-stack applications (i.e., those mixing HPC, BD and AI tasks) therefore require innovative workload management solutions, including new ways of orchestrating workflows across the various software and hardware layers of the execution infrastructure. More specifically, BD and AI tasks have different requirements than HPC tasks, both in the type of resources they use and in how those resources are allocated for the computations. Current batch job schedulers inherit a historical orientation towards CPU-centric workloads and largely ignore the emerging technologies available in modern HPC systems, even though interest in hardware accelerators keeps growing because these devices expose massive, more efficient parallelism to applications.
Now that I have quickly covered the pains and challenges, let's see how to address them. The ACROSS (HPC Big Data Artificial Intelligence Cross Stack Platform Towards Exascale) project aims at building an exascale-ready, HPC- and data-driven execution platform supporting modern complex workflows that mix HPC, BD and AI high-level tasks. It does so by leveraging an innovative software environment running on advanced heterogeneous infrastructure components (including GPUs, FPGAs and neuromorphic processors), as well as innovative smart resource allocation policies and job scheduling algorithms, down to the management of tasks inside jobs (pipelines, DAGs). The envisioned ACROSS platform will thus provide high performance while maximizing resource utilization and energy efficiency. Through this execution platform, the ACROSS project will also demonstrate value creation and innovation for pilots in the aeronautics, weather and climate forecasting (supporting smart farming and innovative water management), and energy and carbon sequestration sectors. Moreover, it will show the potential of the new EuroHPC supercomputer ecosystem.
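To make the idea of a cross-stack workflow more concrete, here is a minimal sketch in Python. All names here are my own illustration, not the ACROSS API: it simply models a pipeline mixing an HPC simulation, a BD preprocessing step and an AI training step as a small task DAG, of the kind such a platform would have to orchestrate.

```python
from dataclasses import dataclass, field

# Hypothetical task kinds found in a cross-stack workflow.
HPC, BD, AI = "HPC", "BD", "AI"

@dataclass
class Task:
    name: str
    kind: str                                  # HPC, BD or AI
    deps: list = field(default_factory=list)   # upstream tasks

# A toy cross-stack pipeline: simulate -> extract features -> train a model.
simulate = Task("cfd_simulation", HPC)
extract = Task("feature_extraction", BD, deps=[simulate])
train = Task("surrogate_training", AI, deps=[extract])

def topological_order(tasks):
    """Return tasks in an order that respects their dependencies."""
    done, order = set(), []
    def visit(task):
        if task.name in done:
            return
        for dep in task.deps:
            visit(dep)
        done.add(task.name)
        order.append(task)
    for task in tasks:
        visit(task)
    return order

for task in topological_order([train]):
    print(f"run {task.kind} task: {task.name}")
```

Even in this toy form, each task kind would pull in a different software stack and a different class of hardware, which is exactly why a single CPU-centric batch scheduler is not enough.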
I would like to mention that the ACROSS project started on March 1, 2021 and will run for three years in total.
Another point I would like to make: Atos is delighted to be part of this project, helping by providing technologies and a testbed covering both HPC solutions and value-added software such as the Atos Codex AI Suite for data-driven business transformation. The participation of Atos (through its BULL SAS affiliate) in the ACROSS project is an opportunity to leverage hardware acceleration of computation (in particular for AI) and of orchestration. These advanced technologies will be embedded in the ACROSS platform and adapted to support the execution of large-scale HPC simulations combined with Big Data and deep learning computation steps, with seamless exploitation of hardware heterogeneity.
On one hand, Atos's contribution will help the ACROSS project address the challenges of hardware accelerators by dedicating a large effort to the integration and validation of acceleration technologies (including GPUs, FPGAs and other neural network accelerators) within a coherent platform supporting the pilots' use cases. ACROSS will thus search for the best solution in terms of programming models, performance and productivity to be provided by the envisioned accelerated execution platform.
On the other hand, Atos is involved in addressing current HPC orchestration problems, such as having to deal with distinct single-workflow schedulers and the lack of a centralized scheduler. This will be achieved by introducing the ability to declaratively describe the target platform and all the software stacks needed to run increasingly complex applications with differing needs in terms of architectural support and software stack. Specifically, Atos will work on the envisioned three-level workflow management solution by extending orchestration components for managing HPC, Big Data and ML workflows. The highest level receives the workflow description as a set of high-level tasks mixing HPC, BD and AI; the middle level schedules the jobs that map those high-level tasks, selecting the most appropriate set of resources (also considering heterogeneous hardware accelerators) to improve energy efficiency and maximize performance and resource utilization; the lowest level dispatches the workload inside the jobs (e.g., task DAGs) among the selected resources to maximize performance.
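To illustrate the kind of decision the middle level has to make, here is a minimal Python sketch. The resource catalogue, compatibility table and cost model are all assumptions of mine for illustration, not ACROSS components: the point is only that placing a task on heterogeneous hardware means trading runtime against energy.

```python
# Hypothetical resource catalogue: (name, kind, relative speed, power draw in W).
RESOURCES = [
    ("node-cpu", "cpu", 1.0, 200.0),
    ("node-gpu", "gpu", 8.0, 450.0),
    ("node-fpga", "fpga", 4.0, 90.0),
]

# Which resource kinds each task kind can run on (illustrative only).
COMPATIBLE = {"HPC": {"cpu", "gpu"}, "BD": {"cpu"}, "AI": {"gpu", "fpga"}}

def schedule(task_kind, work_units, alpha=0.5):
    """Pick the resource minimizing a blend of runtime and energy.

    alpha=1.0 optimizes purely for performance, alpha=0.0 purely for energy.
    """
    best, best_cost = None, float("inf")
    for name, kind, speed, power in RESOURCES:
        if kind not in COMPATIBLE[task_kind]:
            continue
        runtime = work_units / speed   # seconds, in toy units
        energy = runtime * power       # joules
        cost = alpha * runtime + (1 - alpha) * energy
        if cost < best_cost:
            best, best_cost = name, cost
    return best

# An AI task lands on the FPGA when energy matters most...
print(schedule("AI", 100.0, alpha=0.0))   # -> node-fpga
# ...and on the GPU when raw performance dominates.
print(schedule("AI", 100.0, alpha=1.0))   # -> node-gpu
```

The real middle-level scheduler will of course be far richer (queueing, topology awareness, co-scheduling with other jobs), but the performance-versus-energy trade-off it navigates is the one this toy model makes visible.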
Now it is time to conclude: the ACROSS project will enable, or drastically improve the performance of, a wide range of computationally intensive tasks, such as large-scale complex simulations and advanced AI-enabled analysis based on complex neural networks (ML/DL), including classification and feature extraction on huge data sets; all with the aim of generating innovation and value creation in key societal and industrial sectors in Europe.
Hope this was helpful and of value to you.
Claire Chen, R&D Cooperative Project Manager – Atos Big Data and Security – R&D