ARLA: Using Reinforcement Learning to Strengthen DNNs

IEEE Computer Society Team
Published 04/01/2025

Deep neural networks will be crucial to future human–machine teams aiming to modernize safety-critical systems. Yet DNNs have at least two key problems:

  • Researchers have proposed many defense schemes against a wide range of attack vectors, yet none has fully secured DNNs against adversarial examples (AEs).
  • This vulnerability to AEs makes DNNs' role in safety-critical systems problematic.

Enter the Adversarial Reinforcement Learning Agent (ARLA), a novel AE attack based on reinforcement learning that was designed to discover DNN vulnerabilities and generate AEs to exploit them.

ARLA is described in detail in Matthew Akers and Armon Barton’s Computer magazine article, “Forming Adversarial Example Attacks Against Deep Neural Networks With Reinforcement Learning.” Here, we offer a glimpse at ARLA’s approach and its capabilities.

The Reinforcement Learning Approach


ARLA is the first adversarial attack based on reinforcement learning (RL); in RL, an agent

  • Uses its sensors to observe an unknown environment
  • Makes decisions and receives feedback (rewards or penalties) on those decisions via changes in its sensory perceptions
  • Uses trial and error to learn actions that maximize its expected rewards

The authors offer a simple example of this in a Pac-Man RL agent acting in a 2D grid:

  • The agent senses its location and its distance from pellets and from ghosts.
  • To maximize its expected reward, it learns to avoid ghosts while eating the maximum number of pellets in the shortest amount of time. (A toy version of this setup is sketched below.)
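To make the learning loop concrete, here is a minimal tabular Q-learning sketch in a toy gridworld standing in for the Pac-Man setting. The grid layout, rewards, and hyperparameters are illustrative assumptions, not code from the article:

```python
import random
from collections import defaultdict

# Toy gridworld standing in for the Pac-Man example (illustrative only):
# reach the pellet, avoid the ghost.
ROWS, COLS = 3, 4
START, PELLET, GHOST = (0, 0), (2, 3), (1, 2)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def step(state, action):
    """Apply a move; walls keep the agent inside the grid."""
    r = min(max(state[0] + action[0], 0), ROWS - 1)
    c = min(max(state[1] + action[1], 0), COLS - 1)
    if (r, c) == GHOST:
        return (r, c), -10.0, True    # caught: big penalty, episode ends
    if (r, c) == PELLET:
        return (r, c), +10.0, True    # pellet eaten: reward, episode ends
    return (r, c), -0.1, False        # small step cost favors short paths

Q = defaultdict(float)                # Q[(state, action)] -> estimated return
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(2000):
    state, done = START, False
    while not done:
        # epsilon-greedy: explore randomly sometimes, otherwise exploit
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        # one-step Q-learning update toward reward + discounted best next value
        best_next = 0.0 if done else max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = nxt
```

After training, the greedy policy read off from Q steers the agent around the ghost cell and toward the pellet along a short path.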

The agent learns which state–action pairs generate the most rewards, but because its knowledge of the environment is always partial, RL entails an exploration/exploitation tradeoff:

  • During exploration, the agent randomly chooses actions to broaden its knowledge of the environment.
  • During exploitation, the agent uses existing knowledge to estimate actions with the highest reward.

To ensure that the agent continues to explore rather than simply exploit what it already knows, it is given a policy that determines how much time it spends on each of the two activities. This balance is tuned to achieve the best possible test performance.
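A common way to implement such a policy is epsilon-greedy action selection with an annealed exploration rate. The sketch below is illustrative; the schedule values are assumptions, not tuned values from the paper:

```python
# Illustrative epsilon schedule (an assumption, not the article's values):
# start fully exploratory and decay toward mostly greedy behavior.
def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly anneal the exploration probability over decay_steps."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

# e.g. epsilon_at(0) == 1.0 (pure exploration),
#      epsilon_at(10_000) == 0.05 (mostly exploitation thereafter)
```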

The ARLA Attack


ARLA uses double Q-learning with a dueling deep Q-network agent architecture. At a high level, ARLA

  • Uses a benign sample image as a learning environment to generate AEs
  • Seeks the AE with the smallest Euclidean (L2) distance from the original sample, as sketched below
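The article names these agent components; purely as an illustration, the sketch below shows what a dueling Q-network and a double-Q target typically look like in PyTorch. The layer sizes and discount factor are assumptions, not ARLA's actual architecture or training details:

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Dueling architecture: separate state-value and advantage streams.
    Layer sizes here are illustrative assumptions, not ARLA's network."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.value = nn.Linear(128, 1)               # V(s)
        self.advantage = nn.Linear(128, n_actions)   # A(s, a)

    def forward(self, obs):
        h = self.features(obs)
        v, a = self.value(h), self.advantage(h)
        # Combine streams; subtracting the mean advantage keeps Q identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

def double_q_target(reward, done, next_obs, online, target, gamma=0.99):
    """Double Q-learning target for batched tensors: the online net picks the
    next action, the target net evaluates it, reducing overestimation bias."""
    with torch.no_grad():
        next_action = online(next_obs).argmax(dim=1, keepdim=True)
        next_q = target(next_obs).gather(1, next_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q
```

In ARLA's setting, the agent's reward would presumably need to trade off fooling the target DNN against keeping the perturbed image close to the original in Euclidean distance; the exact reward shaping is detailed in the article itself.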

In experiments, the authors report that ARLA significantly degraded the accuracy of five CIFAR-10 DNNs—four of which used a state-of-the-art defense. They also compared ARLA to other state-of-the-art attacks and found evidence that ARLA is adaptive, making it a useful tool for testing the reliability of DNNs before they are deployed.

Dig Deeper


DNNs used in image recognition are especially susceptible to perturbed or noisy data. As the authors point out, an RL approach to adversarial testing such as ARLA could be used to develop robust testing protocols to identify these and other DNN vulnerabilities to adversarial attacks.
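As a concrete (and far weaker) cousin of such testing, a simple robustness probe compares a model's accuracy on clean images with its accuracy on the same images plus random noise. The function below is a generic sketch, not part of ARLA; `model`, `images`, and `labels` are assumed PyTorch inputs with pixel values in [0, 1]:

```python
import torch

def accuracy_under_noise(model, images, labels, sigma=0.05):
    """Illustrative robustness probe (not ARLA): compare accuracy on clean
    images vs. the same images with small Gaussian noise added."""
    model.eval()
    with torch.no_grad():
        clean_acc = (model(images).argmax(1) == labels).float().mean().item()
        # Assumes pixel values in [0, 1]; clamp keeps noisy images valid.
        noisy = (images + sigma * torch.randn_like(images)).clamp(0.0, 1.0)
        noisy_acc = (model(noisy).argmax(1) == labels).float().mean().item()
    return clean_acc, noisy_acc
```

An attack like ARLA is far more damaging than such a probe because its perturbations are optimized against the specific model rather than drawn at random.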

For full details about the innovative ARLA approach, its results, and future research areas, read Akers and Barton's "Forming Adversarial Example Attacks Against Deep Neural Networks With Reinforcement Learning."