Incremental Deep Learning for Embedded Systems

Submitted by Elisa FROMONT on
Team
Date of the beginning of the PhD (if already known)
1/10/2024
Place
IRISA
Laboratory
IRISA - UMR 6074
Description of the subject

This PhD will take place in the context of the Adapting project of the PEPR AI programme. This project focuses on designing adaptive embedded hardware architectures for AI. In this context, our team aims to design new incremental machine learning algorithms that can serve as use cases in the Adapting project for the other researchers, who will focus on hardware architecture design.

Continual learning [1], also known as lifelong learning or incremental learning, is a machine learning paradigm that focuses on the ability of a model to learn and adapt continuously over time as it encounters new data, tasks, or concepts, without forgetting or catastrophically overwriting what it has previously learned. In continual/incremental learning, the learned model should retain knowledge about previous tasks or data while incorporating new information. In this PhD, we will focus on designing new resource-efficient incremental learning algorithms that can run on embedded systems with their associated resource and privacy constraints. These constraints include limited computational power and memory, and the need for energy efficiency. They also involve real-time processing with low latency and often deterministic behavior. Updating embedded models is complex due to hardware limitations and the need for efficient updates while handling data locally to enhance privacy and security.
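
To make this incremental setting concrete, the following minimal sketch (our own illustration in PyTorch, with synthetic data and a tiny stand-in model rather than a foundation model) runs the naive protocol: a single model is fine-tuned on a sequence of contexts and re-evaluated on every context seen so far after each stage, which is how forgetting is typically measured. All names, sizes, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of sequential (naive) fine-tuning over a stream of contexts.
# Synthetic data and a tiny MLP stand in for real tasks and a foundation model.
import torch
from torch import nn

torch.manual_seed(0)

def make_context(shift):
    """Synthetic binary classification task; each context shifts the input distribution."""
    x = torch.randn(512, 16) + shift
    y = (x.sum(dim=1) > shift * 16).long()
    return x, y

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
contexts = [make_context(shift) for shift in (0.0, 1.0, 2.0)]

@torch.no_grad()
def accuracy(x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

for stage, (x, y) in enumerate(contexts):
    # Naive fine-tuning: train on the current context only.
    for _ in range(200):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    # Evaluate on every context seen so far: drops on earlier ones reveal forgetting.
    accs = [round(accuracy(xp, yp), 3) for xp, yp in contexts[: stage + 1]]
    print(f"after stage {stage}: accuracies on seen contexts = {accs}")
```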

This PhD will focus on foundation models, such as well-known Large Language Models (LLMs, e.g. GPT-3.5, Mixtral, Llama 3, …) or multimodal models (building, for example, on Vision Transformer (ViT) architectures, such as GPT-4o, Sora, Dall-E 3), and on their ability to evolve continuously in an embedded environment.

Incremental learning consists of multi-stage training in which the context (data domain, classes, or task) evolves between training stages. This paradigm faces the stability-plasticity dilemma: the model must continuously adapt to new contexts while avoiding catastrophic forgetting (performance on past contexts must not deteriorate). The naive approach is the standard fine-tuning strategy, in which the parameters of the model (or part of them) are updated from one stage to the next by training on the new context only. It is known to be effective for adaptation but is prone to catastrophic forgetting. Several works address this challenge with replay strategies [3], which maintain a buffer of examples from past contexts that is preserved across the different training stages. Alternatives [2][4][5][6][7] have been proposed to avoid storing old data, for instance by learning to preserve the latent representation of the new data through the training stages [2]. However, current incremental foundation models do not meet important desiderata for embedded models (e.g. privacy, restricted hardware resources and low energy consumption, robustness, no need for an oracle, or neurogenesis). Our project will explore models and methods that satisfy these requirements for edge AI.

Another line of research comprises parameter-oriented approaches: the goal is to associate part of the model's parameters with a specific context in order to maximize performance on both past and current contexts. This is mainly done by mapping parameters to contexts [4], parameter pruning [5], or parameter addition [6]. Among the parameter-oriented techniques, a recent trend focuses on prompt tuning for transformer architectures [7]: it consists in learning input tokens which are prepended to the input to condition the behavior of the model, whose weights are frozen (see the sketch below). While parameter pruning and mapping lead to saturation after many training stages, the parameter-addition approach leads to a growing number of parameters.
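
As an illustration of the prompt-tuning idea mentioned above, here is a minimal, hypothetical PyTorch sketch in which a small, randomly initialised transformer stands in for a frozen (pretrained) foundation model and only the prepended soft-prompt tokens are trained. All module names, sizes, and hyperparameters are assumptions made for illustration, not the reference implementation of any cited method.

```python
# Sketch of prompt tuning: only the prepended "soft prompt" tokens are trained,
# while all backbone weights stay frozen. Sizes and modules are illustrative.
import torch
from torch import nn

vocab_size, d_model, n_prompt, n_classes = 1000, 64, 8, 5

# Stand-in frozen backbone (in practice: a pretrained foundation model).
embedding = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
)
head = nn.Linear(d_model, n_classes)
for module in (embedding, encoder, head):
    for p in module.parameters():
        p.requires_grad_(False)

# The only trainable parameters: one soft prompt per context.
soft_prompt = nn.Parameter(torch.randn(n_prompt, d_model) * 0.02)
opt = torch.optim.Adam([soft_prompt], lr=1e-2)

def forward(token_ids):
    emb = embedding(token_ids)                         # (batch, seq, d_model)
    prompt = soft_prompt.unsqueeze(0).expand(emb.size(0), -1, -1)
    hidden = encoder(torch.cat([prompt, emb], dim=1))  # prepend prompt tokens
    return head(hidden.mean(dim=1))                    # pooled classification

# One illustrative update step on random data.
token_ids = torch.randint(0, vocab_size, (16, 32))
labels = torch.randint(0, n_classes, (16,))
loss = nn.functional.cross_entropy(forward(token_ids), labels)
loss.backward()
opt.step()
print("trainable parameters:", soft_prompt.numel())
```

In an incremental setting, one such soft prompt could be learned per context while the backbone stays frozen; the number of trainable parameters then remains very small, which is attractive under the embedded constraints listed above.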

A complementary approach is the knowledge-editing strategy [8], which aims at correcting specific knowledge by locating and updating the responsible neurons.
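
To give an intuition of what updating the responsible neurons can mean, the following deliberately simplified sketch applies a rank-one edit to a single linear layer so that a chosen key activation now maps to a desired value. It omits the locating step and the covariance-weighted formulation used by methods such as MEMIT [8]; all quantities are synthetic and the layer is a stand-in, not an actual transformer weight.

```python
# Simplified illustration of a rank-one knowledge edit: change a linear layer so
# that a chosen "key" activation k maps to a desired "value" v_star, with a
# minimal-norm modification of the weights.
import torch

torch.manual_seed(0)
d_in, d_out = 32, 32
W = torch.randn(d_out, d_in)   # weights of the targeted (MLP) layer
k = torch.randn(d_in)          # key: activation encoding the fact to edit
v_star = torch.randn(d_out)    # value: the desired output for that key

# Rank-one update: W_new = W + (v_star - W k) k^T / (k^T k), so that W_new k = v_star.
residual = v_star - W @ k
W_new = W + torch.outer(residual, k) / (k @ k)

print("edited output matches target:", torch.allclose(W_new @ k, v_star, atol=1e-5))
print("Frobenius norm of the change:", torch.norm(W_new - W).item())
```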

Objectives

The objective of this PhD is to investigate incremental learning of foundation models on embedded systems.
The graduate student will initially address the question: “Are foundation models prone to catastrophic forgetting with standard fine-tuning techniques?” by conducting comprehensive studies and experiments.
Subsequently, she/he will explore how state-of-the-art (SOTA) methods can be applied to these models, assessing the extent to which these existing techniques provide effective solutions to the problem of incremental learning. This phase will involve adapting and optimizing SOTA methods to suit the constraints and requirements of embedded systems (efficiency, privacy, low energy consumption, robustness, etc.).
Finally, the PhD student will develop and propose new methodologies specifically designed to enable foundation models to learn incrementally when deployed on embedded systems, ensuring that these models maintain performance and adaptability over time.

Application process

The position is funded for 3 years (this is the standard duration of a PhD in France). The net salary is around … euros. The PhD student will be based in Rennes and will make several short stays in Grenoble during the 3-year contract.
Applications will be processed on a first-come, first-served basis until June 15, 2024.

What we are looking for

The ideal candidate will possess the following skills and attitudes:

  • A master's degree in computer science, AI, or a closely related discipline.
  • A strong background in computer science and mathematics.
  • A scientific attitude and the ability to reason through problems.
  • Excellent programming skills.
  • The ability to communicate clearly and precisely in English, both in writing and orally.
  • A pro-active and independent attitude as well as the ability to function well in a team environment.
  • Strong motivation for pursuing a Ph.D. and for working in Rennes or Grenoble.

Application Material and Procedure

Applicants should send the following documents to the entire supervision team (see the four email addresses in the Contacts section below):

  • An academic CV.
  • An official transcript listing classes taken and grades received.
  • A short statement (maximum of 2 pages) that explains what your research interests are and why you would like to work with us.
  • A copy of your most recent report written in English (e.g. your master's report).
  • Any relevant published research papers (in English, preferably in PDF format). If you have several papers, please indicate which three you consider the most important.
  • A list of references (e.g., supervisors).
Bibliography

  1. Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Ales Leonardis, Gregory Slabaugh, Tinne Tuytelaars, “A continual learning survey: Defying forgetting in classification tasks”, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 44(7), 2022.

  2. Li et al., “Learning without Forgetting”, European Conference on Computer Vision (ECCV), 2016.
    https://arxiv.org/pdf/1606.09282.pdf

  3. Chaudhry et al., “Continual learning with tiny episodic memories”, preprint, 2019.
    https://arxiv.org/pdf/1902.10486

  4. Serra et al., “Overcoming catastrophic forgetting with hard attention to the task”, International Conference on Machine Learning (ICML), 2018.
    https://arxiv.org/pdf/1801.01423.pdf

  5. Mallya et al., “PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning”, Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
    https://openaccess.thecvf.com/content_cvpr_2018/papers/Mallya_PackNet_Adding_Multiple_CVPR_2018_paper.pdf

  6. Hu et al., “LoRA: Low-Rank Adaptation of Large Language Models”, International Conference on Learning Representations (ICLR), 2022.
    https://arxiv.org/pdf/2106.09685.pdf

  7. Lester et al., “The Power of Scale for Parameter-Efficient Prompt Tuning”, Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
    https://arxiv.org/pdf/2104.08691.pdf

  8. Meng et al., “Mass-Editing Memory in a Transformer”, International Conference on Learning Representations (ICLR), 2023.

Researchers

Lastname, Firstname
Fromont Elisa
Type of supervision
Director
Laboratory
IRISA
Team

Lastname, Firstname
Coquenet Denis
Type of supervision
Supervisor (optional)
Laboratory
IRISA
Team

Lastname, Firstname
MERMILLOD Martial
Type of supervision
Director
Laboratory
LPNC

Lastname, Firstname
REYBOZ Marina
Type of supervision
Supervisor (optional)
Laboratory
CEA
Contacts
Name
Fromont Elisa
Email
elisa.fromont@irisa.fr
Name
Coquenet Denis
Email
denis.coquenet@irisa.fr
Name
MERMILLOD Martial
Email
martial.mermillod@univ-grenoble-alpes.fr
Name
REYBOZ Marina
Email
marina.reyboz@cea.fr
Keywords
AI, Deep Learning, Embedded Systems, LLM, Incremental Learning, Continual Learning