Reliability Enhancement of Post-Von Neumann Hardware Accelerators

Submitted by Fernando FERNA… on Tue 28/01/2025 - 14:04

Team

TARAN

Website of the team

https://team.inria.fr/taran/

Date of the beginning of the PhD (if already known)

As soon as possible

Place

Inria Rennes

Laboratory

IRISA - UMR 6074

Description of the subject

Artificial Intelligence (AI) is increasingly indispensable across many sectors of society because of its potential to transform conventional applications, from smart homes to safety-critical systems such as autonomous driving and space exploration. Deep neural networks (DNNs) are state-of-the-art AI methods that outperform other approaches in language processing, image and video classification, audio and radar processing, and instance segmentation [1–3]. Notably, DNNs such as OpenAI GPT-4, Meta LLaMA2, and Mistral Mixture of Experts have captured public interest with their high accuracy.

Because they are resource-intensive, DNNs require powerful dedicated hardware accelerators, such as GPUs and TPUs. However, large hardware accelerators are unsuitable for embedded safety-critical systems because of their high energy consumption. Unconventional accelerator architectures, such as those based on processing-in-memory (PIM) [4] and neuromorphic computing [5], have been proposed as energy-efficient alternatives to traditional GPUs and TPUs for deploying complex DNNs in critical applications where both power and performance are key requirements.

When PIM and neuromorphic accelerators are used in safety- and mission-critical applications, such as avionics and space exploration, it is essential to characterize their reliability in order to implement effective fault tolerance methods. This reliability characterization can be conducted by either fault simulation or physical fault injection. Fault simulation allows specific fault sites to be identified in both hardware and software; however, it needs realistic fault models to avoid misleading conclusions. Physical fault injection, such as exposing the system to a radiation environment, provides realistic estimates of error rates but restricts the ability to track fault propagation, because observations are limited to the application's output. In this Ph.D. project, we will identify hardware and software vulnerabilities in PIM and neuromorphic accelerators for DNNs using radiation experiments and fault simulation. Additionally, we will propose fault mitigation techniques.
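To make the fault simulation approach more concrete, here is a minimal sketch in Python. It is illustrative only and is not the project's tooling. It assumes a single-bit-flip fault model on float32 weights, and a toy dot product stands in for a DNN layer. The names `flip_bit`, `inject_and_classify`, and `run_campaign`, as well as the tolerance value, are hypothetical.

```python
import random
import struct
from collections import Counter


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant, 31 = sign) of a float32 encoding."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    (word,) = struct.unpack("<I", struct.pack("<f", value))
    (faulty,) = struct.unpack("<f", struct.pack("<I", word ^ (1 << bit)))
    return faulty


def dot(weights, inputs):
    """Toy workload standing in for one DNN layer (assumption)."""
    return sum(w * x for w, x in zip(weights, inputs))


def inject_and_classify(weights, inputs, rng, tolerance=1e-6):
    """Inject one random bit flip into one weight and classify the outcome."""
    golden = dot(weights, inputs)  # fault-free reference output
    index = rng.randrange(len(weights))
    bit = rng.randrange(32)
    faulty_weights = list(weights)
    faulty_weights[index] = flip_bit(weights[index], bit)
    observed = dot(faulty_weights, inputs)
    # A NaN output fails this comparison, so it is counted as corrupted.
    masked = abs(observed - golden) <= tolerance
    return index, bit, "masked" if masked else "corrupted"


def run_campaign(weights, inputs, trials, seed=0):
    """Repeat independent injections and count masked vs. corrupted outputs."""
    rng = random.Random(seed)
    outcomes = Counter()
    for _ in range(trials):
        _, _, status = inject_and_classify(weights, inputs, rng)
        outcomes[status] += 1
    return outcomes


if __name__ == "__main__":
    print(run_campaign([0.5, -1.25, 2.0], [1.0, 0.0, 3.0], trials=1000))
```

Unlike a radiation experiment, each injection here records the exact fault site (weight index and bit position), which is what allows propagation to be traced. The results are only as meaningful as the assumed fault model.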

The Ph.D. student will characterize the radiation-induced impact on system reliability for different DNN model architectures and assess how the acceleration enabled by PIM affects the final error rate. The results will be combined with software simulation data for a detailed fault propagation analysis, with the aim of deploying effective hardening solutions tailored to PIM accelerators executing DNNs. The Ph.D. student will participate in international experiments and internships at laboratories such as Rutherford Appleton Laboratory in the UK and Los Alamos National Laboratory in the USA. The student will also take part in conferences and international projects and publish their research in prestigious scientific venues. This will help them develop their research skills and build a network with professionals in their field.

About the city and the university: Rennes is a vibrant and student-friendly city in northwestern France. The city has a thriving student culture, with plenty of bars, restaurants, and cultural events, as well as an affordable cost of living. Additionally, Rennes is rated as one of the best European cities to live in [6].

Rennes is home to the University of Rennes, one of the largest universities in France. The University of Rennes has a strong focus on innovation and technology. It is home to many world-renowned research institutes, including INSA, IRISA, and Inria Rennes. These institutes offer a wide range of Ph.D. programs in computer science, covering topics such as artificial intelligence, machine learning, data science, and hardware and software engineering. Ph.D. students in Rennes benefit from close relationships with faculty and access to state-of-the-art facilities. They also have the opportunity to collaborate with leading researchers worldwide.

Team’s LinkedIn page: https://fr.linkedin.com/company/taran-team

Bibliography

[1] Xiaohua Zhai et al., Scaling Vision Transformers, IEEE/CVF CVPR, 2022

[2] Chong Chen et al., Compound fault diagnosis for industrial robots based on dual-transformer networks, Journal of Manufacturing Systems, 2023

[3] Yuxin Fang et al., EVA-02: A Visual Representation for Neon Genesis, CVPR 2023

[4] Laguna A. F. et al., In-Memory Computing based Accelerator for Transformer Networks for Long Sequences, IEEE DATE 2021

[5] Yao M. et al., Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips, ICLR 2024

[6] European Commission, Quality of life in European cities, 2024, https://ec.europa.eu/regional_policy/information-sources/maps/quality-of-life_en

Researchers

Kritikakou, Angeliki

Type of supervision

Director

Laboratory

IRISA

Department

D3 - Architecture

Team

TARAN

Fernandes dos Santos, Fernando