Dejan Ničković is a Senior Scientist at the Center for Digital Safety and Security of the Austrian Institute of Technology (AIT).
His research interests include cyber-physical systems, runtime verification, testing, contract-based design, and real-time systems.
Robustness verification of Deep Neural Networks
Deep Neural Networks (DNNs) have become the dominant solution for perception, prediction, and decision-making tasks across a wide range of domains. As these systems are increasingly deployed in safety-critical applications, robustness has emerged as one of their most important properties, reflecting the ability of a model to maintain correct and predictable behavior under perturbations, uncertainties, and distributional variations. In this presentation, we will formally introduce the notion of robustness and discuss the challenges associated with assessing and guaranteeing robustness in DNNs. We will provide an overview of the main verification and analysis techniques that have been developed to reason about robustness, highlighting their underlying principles, strengths, and limitations. More specifically, we will examine approaches for robustness testing, verification of local robustness, and verification of global robustness, together with their representative application scenarios. Finally, we will look beyond traditional neural network architectures and discuss emerging challenges in reasoning about the robustness of modern DNN-based systems, such as large-scale foundation models.
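For reference, one common way to formalize local robustness is sketched below. This is an assumption made for illustration, and the presentation may use a different definition. Here f is a classifier with per-class outputs f_i, x_0 is a reference input, ε is a perturbation budget, and the norm index p is a modelling choice; none of these symbols appear in the abstract itself.

```latex
% Local robustness of classifier f at x_0 with radius \epsilon (illustrative definition):
% every input within distance \epsilon of x_0 receives the same predicted class as x_0.
\forall x.\quad \lVert x - x_0 \rVert_p \le \epsilon
\;\Longrightarrow\;
\arg\max_i f_i(x) = \arg\max_i f_i(x_0)
```

Under this reading, local robustness is a property of a single reference input, whereas global robustness asks for a comparable guarantee over the whole input domain.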
Sebastian Elbaum is the Lowell Professor in the Department of Computer Science at the University of Virginia, where he co-leads the Lab for Engineering Safe Software (LESS Lab). He aims to build dependable autonomous systems. He is the recipient of a National Science Foundation CAREER Award, an IBM Innovation Award, a Google Faculty Research Award, an Amazon Scholar recognition, an FSE Test of Time Award, and multiple ACM SIGSOFT Distinguished Paper and best paper awards. He regularly serves on program committees at the top software engineering and robotics conferences, and has served as Program Co-Chair for ISSTA, ESEM, and ICSE, and as Steering Committee Chair for ICSE. He is an Adjunct Senior Fellow for Emerging Computing Technologies at the Council on Foreign Relations, connecting autonomous systems and AI with policy in national security. He is a council member for the CRA Computing Community Consortium. He is an ACM Fellow and an IEEE Fellow.
Sensors, Rules, Models: Rethinking V&V for Autonomous Vehicles
The promise of autonomous vehicles (AVs) hinges on a single, uncompromising prerequisite: rigorous driving assurances. Yet, specifying, testing, and verifying these safety architectures remains an incredible challenge. This difficulty is driven by a trio of distinct hurdles: first, the vast, unpredictable environment that must be mapped against intricate driving rules; second, the acute semantic gap dividing high-dimensional raw sensor data from high-level symbolic safety concepts; and third, the relentless pace of model evolution, where spiraling complexity causes late-stage development costs to escalate prohibitively. Reflecting on our work within this high-stakes landscape, I will walk you through a family of end-to-end analyses that overcome these hurdles by breaking the boundaries of traditional validation and verification. I will show you the core insights behind these techniques, lay out the empirical evidence of their efficacy and their impact in practice, and put hard questions and real-world problems to the room, challenging us to collectively map out the remaining frontiers in autonomous systems safety.
Son Tong is an R&D Manager at Siemens, working on ADAS, Digital Twin, and AI. His work focuses on algorithm development, testing, and validation in the automotive industry.
He currently leads an R&D team of research engineers and industrial Ph.D. students working on control, AI, generative AI, simulation, and autonomous driving.
Engineering Workflow, Reimagined with Agentic AI
Modern engineering relies on a wide range of tools for requirements management, design, simulation, optimization, validation, data analytics, and AI. This talk explores a future vision in which these tools collaborate as autonomous AI agents within an Agentic AI framework, helping engineers automate workflows, accelerate development, and improve performance while maintaining engineer-centric supervision. The session concludes with an introduction to the European AI-BOOST Generative AI Challenge on ADAS scenario generation.
Fabrizio Pastore is an Associate Professor / Chief Scientist II at the SnT Centre for Security, Reliability and Trust of the University of Luxembourg.
His research interests concern Software Engineering and, more specifically, Automated Software Quality Assurance. He works with embedded and cyber-physical systems, AI-enabled components, and mobile devices. His research contributions concern DNN testing and explanation, requirements-driven testing, model-based testing, metamorphic testing, security testing, anomaly detection, and fault localization.
Supporting Automated Testing and Safety Analysis of Deep Neural Networks for Autonomous Systems
Deep neural networks are a building block of perception layers for safety-critical systems; therefore, engineers need solutions to cost-effectively identify and characterize the portions of the input space where a DNN may underperform to improve systems’ robustness.
This talk provides an overview of recent research approaches combining simulators, search algorithms, and diffusion models, which can cost-effectively determine worst-case scenarios if driven by appropriate fitness functions. Further, the talk will provide an overview of failure explanation approaches that enable the detection of the situations in which a DNN may fail and, consequently, enable the identification of countermeasures.
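The search-driven idea can be sketched minimally as follows, under stated assumptions. The scenario parameters, the `dnn_error` function standing in for "simulate the scenario and measure the DNN's error", and the use of plain hill climbing are all hypothetical choices for illustration. The approaches covered in the talk may rely on different search algorithms, simulators, and fitness functions.

```python
import random

# Hypothetical scenario parameter ranges (assumptions for illustration only).
BOUNDS = {"fog_density": (0.0, 1.0), "pedestrian_distance_m": (5.0, 50.0)}


def dnn_error(scenario):
    """Toy stand-in for 'run the simulator, feed the DNN, measure its error'.

    The value grows with fog density and pedestrian distance; a real fitness
    function would be computed from the system under test.
    """
    fog = scenario["fog_density"]
    dist = scenario["pedestrian_distance_m"]
    return 0.7 * fog + 0.3 * (dist - 5.0) / 45.0


def clamp(value, low, high):
    """Keep a parameter value inside its allowed range."""
    return max(low, min(high, value))


def mutate(scenario, rng, step=0.1):
    """Perturb each parameter by up to `step` of its range, staying in bounds."""
    child = {}
    for name, (low, high) in BOUNDS.items():
        delta = rng.uniform(-step, step) * (high - low)
        child[name] = clamp(scenario[name] + delta, low, high)
    return child


def search_worst_case(fitness, iterations=500, seed=0):
    """Hill climbing: accept a mutated scenario only if its fitness is higher."""
    rng = random.Random(seed)
    best = {name: rng.uniform(low, high) for name, (low, high) in BOUNDS.items()}
    best_score = fitness(best)
    for _ in range(iterations):
        candidate = mutate(best, rng)
        score = fitness(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    scenario, score = search_worst_case(dnn_error)
    print(scenario, round(score, 3))
```

The fitness function is the only link between the search and the system under test, which is why its choice determines whether the scenarios found are genuinely worst-case.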
Ericsson AB, Sweden; Carleton University, Canada; Mälardalen University, Sweden
Bio
Dr. Sigrid Eldh leads research on Quality and Software Test at Ericsson AB in Stockholm and has worked at Ericsson since 1994.
She is also a Senior Lecturer at Mälardalen University and an Adjunct Professor at Carleton University in Ottawa, Canada.
Testing Agents and Agent Testing in Industry
The future of developing and testing any software (even AI) is AI-enabled. By carefully crafting agents within an orchestration framework, you can now build your own testing agents. With the right frameworks and setup, you can interactively produce a relatively trustworthy test environment; the limit will be your own knowledge and ingenuity. In industry, more of the testing is now handed over to a series of agents that aid in different tasks, with the goal of delivering quality systems. We will address several layers of testing and verification approaches throughout the software life cycle.
The Italian Institute of Artificial Intelligence for Industry, Italy
Bio
Dr. Nicola Franco is Research Director and directs the AI Security (AIS) Lab.
He is a former research scientist at the Fraunhofer Institute for Cognitive Systems in Munich, where he worked on adversarial machine learning and collaborated with industry partners and public agencies.
He holds a Ph.D. in Machine Learning, awarded cum laude, from the Technical University of Munich, and received the Best Paper Award in AI Safety at IJCAI 2023.
Hands-on AI Red Teaming: Attacking and Defending LLM Agents with HackAgent
AI agents are changing the way applications interact with users, data, and external tools. At the same time, they introduce new attack surfaces that cannot be fully addressed with traditional software security techniques. Prompt injection, malicious instructions, unsafe tool use, and multimodal manipulation can all influence how these systems behave. In this seminar, participants will learn the fundamentals of adversarial attacks against LLMs and multimodal LLMs, with a strong focus on practical experimentation. After a brief overview of the main concepts and threat models, participants will engage in a hands-on session using HackAgent to explore realistic attack scenarios in a controlled environment.
Wissam Mallouli is a cybersecurity expert, researcher, and technology leader currently serving as Chief Technology Officer (CTO) at Montimage, a Paris-based company specializing in network monitoring and security solutions.
He holds a Ph.D. in Computer Science (cybersecurity) from Télécom & Management SudParis, France, obtained in 2008, and an engineering degree in telecommunications. Mallouli has extensive experience in network security, software testing, and AI-driven cybersecurity.
His work focuses on areas such as intrusion detection, automated security testing, risk management, and resilient systems for 5G, IoT, and cloud environments. He is actively involved in numerous European research and innovation projects, contributing to the design of AI-assisted and trustworthy cybersecurity solutions, particularly for SMEs.
In addition to his industrial role, he contributes to academia through publications, teaching, conference organization, and training activities, helping bridge the gap between research and real-world cybersecurity applications.
Smart Network Fuzzer for Advanced Security Testing
This talk introduces a smart network fuzzer designed to enhance security testing of complex and dynamic networked systems. The approach leverages intelligent traffic generation and adaptive fuzzing strategies to uncover vulnerabilities that traditional testing methods may miss.
By combining protocol awareness, behavioral analysis, and AI-driven techniques, the solution enables efficient exploration of attack surfaces in modern environments such as cloud-native infrastructures and software-defined networks. The presentation will include a demonstration showcasing how the fuzzer can detect anomalies, trigger unexpected behaviors, and support proactive cybersecurity validation.
Dr. Gül Çalikli is a Senior Lecturer (Associate Professor) in Software Engineering at the School of Computing Science, University of Glasgow.
Her research field is empirical software engineering, with a focus on human aspects, data analytics, and machine learning. Her vision is to enhance software practitioners’ decision-making and improve software quality through techniques based on cognitive psychology and human-in-the-loop ML systems.
Context Matters: Building and Evaluating Trustworthy AI for Software Engineering
Large language models are rapidly transforming software engineering, enabling developers to generate code, refactor software, and automatically produce unit tests. Yet two fundamental questions remain: what information should AI systems use to assist developers effectively, and how should we evaluate their capabilities on realistic software projects? This lecture will present recent research on context-aware AI for software engineering. First, it will explain how augmenting AI coding assistants with developers' gaze data enables prompts to adapt to developers' cognitive states, thereby improving code comprehension and readability. This will be followed by a discussion of context-aware LLM-based unit test generation, showing how repository structure, dependency information, and retrieved examples influence the quality of generated tests. Finally, I will present a realistic evaluation framework that benchmarks AI-generated tests on post-cutoff open-source repositories using repository-native execution environments and quality measures such as mutation testing, code coverage, and maintainability, revealing substantial differences between benchmark performance and real-world behaviour.
Ezio Bartocci is a Full Professor at TU Wien, where he leads the Trustworthy Cyber-Physical Systems (TrustCPS) Research Group within the Cyber-Physical Systems Research Unit.
His research focuses on formal methods and computational tools for ensuring the safety, security, energy efficiency, and correctness of AI-based cyber-physical systems, with a strong emphasis on sustainability.
Ken Friedl is an engineer and researcher at BMW Group, Germany.
His work focuses on Data Science, Conversational AI, and Robotic System Integration, with applications in intelligent and autonomous automotive systems.
Benchmarking Generative AI for the In-Car Domain: Methods, Tools, and Hands-On Evaluation (Part 1/3)
Generative AI in the in-car domain, particularly in conversational speech-based assistants, is redefining in-vehicle interactions through natural dialogue, contextual awareness, and proactive assistance within driving contexts.
This talk centers on AI-specific testing and benchmarking for voice-based systems. We will talk about BMW’s evolution from a hardware-driven manufacturer to an integrated software solutions provider, and about how the advent of non-deterministic AI paved the way for flexible, scalable testing solutions. As LLM-based voice assistants become embedded in modern cars, testing strategies must consider synthetic data generation, automated evaluation, and end-to-end benchmarking to ensure a reliable, safe, and therefore trustworthy customer experience across navigation, infotainment, and safety-relevant domains.
Lev Sorokin is a Researcher at BMW Group, where he currently focuses on testing conversational agents based on Generative AI.
He has a background in formal methods and software engineering, with research interests in the validation of dependable systems. His work also includes simulation-based testing using evolutionary approaches to ensure the safety of autonomous driving systems.
Benchmarking Generative AI for the In-Car Domain: Methods, Tools, and Hands-On Evaluation (Part 2-3)
As generative AI becomes part of customer-facing and potentially safety-relevant workflows, systematic evaluation is essential to assess its capabilities, limitations, and robustness. This talk presents scalable and automated testing approaches for evaluating generative-AI-based conversational assistants.
Participants will gain insights into the design of benchmarking tasks, the construction of datasets, and the assessment of system performance using fully automated metric evaluation. In the accompanying hands-on session, participants will evaluate an AI-powered in-car assistant on realistic vehicle-related use cases. Through guided experiments, attendees will learn how to design benchmark scenarios, analyze system behavior, and interpret evaluation outcomes.
The session concludes with a discussion of current research challenges and emerging directions for the development and evaluation of trustworthy conversational AI systems.