There are many hurdles to the development, adoption and operation of autonomous vehicles, including regulatory compliance, customer acceptance and finding viable business models. Underlying all of these issues is development complexity, with its associated risks and costs. This paper addresses one of the most significant success factors for developing and operating autonomous cars: data-driven development and the ability to safely and efficiently collect, process and truly understand vast amounts of data across billions of environment scenarios.

Vehicle sensors (radar, lidar, camera, ultrasonic) and real-time maps are sources of real-world data, but on their own they cannot provide sufficient coverage of the situations an autonomous car may encounter. Additional synthetic data therefore needs to be created, for example through generated scenarios and the output of open- and closed-loop simulated systems. This synthetic data will be many orders of magnitude larger than the real-world data, most likely somewhere between hundreds of petabytes and a few exabytes.

Figure 1: Real and synthetic data considerations


The automotive industry is moving beyond advanced driver assistance (L2+) to conditionally, highly and eventually fully automated vehicles (L3 – L5). Data challenges will mount as the number of scenarios needed for development and validation grows into the millions and billions:

  • Data volumes and complexity exceed the capabilities of established IT architectures and even push the public cloud to its limits.
  • Automotive OEMs and suppliers struggle to transform their existing development processes to meet autonomous driving (AD) development challenges. These challenges require a data- and AI-centered approach based on a seamless combination of real and synthetic data.
  • New technologies and data streams require new concepts. The automotive industry needs to move not only to a software-centric culture but also to a data-centric one.

To be clear, these are key challenges specific to data-driven development for AD. They come on top of the overall challenges of creating a software-defined vehicle. Furthermore, as the number of AD vehicles on the road increases, this in-service fleet will consume and generate growing amounts of real-time data that will require intelligent processing at the edge, in the vehicle itself.

Our perspective

Data and AI play a much more significant role in AD development than in traditional, sequential development processes (V-model). Training and validating the system on real-world and synthetic data becomes essential. Therefore, existing development processes, tools, methods, architectures and even cultures must be adapted to an agile, data-driven development approach.

Figure 2: AD process triangle


From our perspective, three important areas within data-driven development need to be considered to “solve” the AD process triangle efficiently:

  1. Data: Data management and AI at scale – integrating real, hybrid and synthetic data
  2. Software: Continuous integration and deployment (CI/CD) through the complete software and hardware development process, covering the complete life cycle of the vehicle
  3. Automation: Automate everything to establish continuous validation and improvement of driving functions with billions of scenarios

Data and software processes need to be brought together in a continuous loop from the very beginning.

Data and AI loop

Massive amounts of real and synthetic data are continuously processed to generate ground truth data and to train and retrain algorithms to cover billions of scenarios. This process is never-ending: reality changes faster than we can write this report, and new corner cases need to be found and filtered as quickly as possible.
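To make the corner-case filtering step more concrete, here is a minimal sketch of one possible mining rule, assuming each processed frame carries a model confidence score and an object count from both the model and the ground truth. The `Frame` record, `CONFIDENCE_THRESHOLD` and the two criteria (low confidence or disagreement with ground truth) are hypothetical illustrations, not the method described in this paper.

```python
from dataclasses import dataclass

# Hypothetical record for one recorded or synthetic scenario frame.
@dataclass(frozen=True)
class Frame:
    scenario_id: str
    source: str              # "real", "synthetic" or "hybrid"
    confidence: float        # model detection confidence, 0.0 to 1.0
    predicted_objects: int   # objects the model detected
    ground_truth_objects: int  # objects in the generated ground truth

# Assumed threshold; a real pipeline would tune this per sensor and object class.
CONFIDENCE_THRESHOLD = 0.6

def is_corner_case(frame: Frame) -> bool:
    """Flag frames where the model is unsure or disagrees with ground truth."""
    low_confidence = frame.confidence < CONFIDENCE_THRESHOLD
    miscount = frame.predicted_objects != frame.ground_truth_objects
    return low_confidence or miscount

def mine_corner_cases(frames):
    """Return the sorted, de-duplicated scenario IDs to feed into retraining."""
    return sorted({f.scenario_id for f in frames if is_corner_case(f)})

frames = [
    Frame("s1", "real", 0.95, 3, 3),
    Frame("s2", "synthetic", 0.40, 2, 2),
    Frame("s3", "hybrid", 0.90, 1, 2),
    Frame("s1", "real", 0.92, 4, 4),
]
print(mine_corner_cases(frames))  # ['s2', 's3']
```

In a continuous loop, the returned scenario IDs would be appended to the next retraining set, and the rule itself would be revised as new failure patterns appear.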

Software and hardware loop

The trained AI needs to be brought into the software and the car in parallel. For each system level (component, function, sub-system, whole system), we need to run a set of simulation tests with real, synthetic and hybrid data. These tests need to be executed in simulated environments with various hardware-in-the-loop (HIL) steps, all the way down to the vehicle level. This full scope of behavior-, bit-, timing- and binary-equal test execution, multiplied by all available test scenarios, creates the test coverage needed to prove correct behavior across all operational design domains. Ultimately, this software and hardware loop is also what fulfills automotive safety requirements.
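The multiplication described above can be illustrated with a short sketch that enumerates one test job per combination of system level, data source, execution stage and scenario. The system levels and data sources come from the text; the three execution stage names are assumptions chosen for illustration, since actual HIL steps vary from program to program.

```python
from itertools import product

SYSTEM_LEVELS = ["component", "function", "sub-system", "whole system"]
DATA_SOURCES = ["real", "synthetic", "hybrid"]
# Assumed execution stages; real programs define their own HIL steps.
EXECUTION_STAGES = ["software-in-the-loop", "hardware-in-the-loop", "vehicle"]

def build_test_matrix(scenarios):
    """Yield one test job per (level, data source, stage, scenario) combination."""
    for level, source, stage, scenario in product(
        SYSTEM_LEVELS, DATA_SOURCES, EXECUTION_STAGES, scenarios
    ):
        yield {"level": level, "source": source, "stage": stage, "scenario": scenario}

scenarios = [f"scenario-{i}" for i in range(1000)]
job_count = sum(1 for _ in build_test_matrix(scenarios))
print(job_count)  # 36000
```

Because the matrix is generated lazily, jobs can be streamed to a scheduler rather than held in memory. The job count grows linearly with the number of scenarios: under these assumptions, 36 jobs per scenario means a billion scenarios would yield 36 billion test jobs.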

Automate everything

Running and integrating the ‘data and AI loop’ and the ‘software and hardware loop’ requires continuous data management, processing on a new scale, and CI/CD. Build processes and systems must be seamlessly integrated with data-driven development processes and systems. With a development environment capable of automatically triggering, executing and analyzing all the above-described types of tests for each code commit, feedback cycles will be shortened tremendously.
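A per-commit trigger of this kind could look roughly like the following sketch, assuming test stages are ordered from cheapest to most expensive and the pipeline stops at the first failure. The `Stage` and `Report` types, the stage names and the lambda stand-ins for real test runners are all hypothetical, and fail-fast ordering is one design choice among several.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Stage:
    name: str
    run: Callable[[str], bool]  # takes a commit ID, returns True on pass

@dataclass
class Report:
    commit: str
    passed: List[str] = field(default_factory=list)
    failed: Optional[str] = None

def on_commit(commit: str, stages: List[Stage]) -> Report:
    """Run stages in order (cheapest first) and stop at the first failure."""
    report = Report(commit)
    for stage in stages:
        if stage.run(commit):
            report.passed.append(stage.name)
        else:
            report.failed = stage.name
            break  # fail fast: report back before running costlier stages
    return report

# Stand-in runners; a real pipeline would dispatch to simulation and HIL rigs.
stages = [
    Stage("unit tests", lambda c: True),
    Stage("scenario simulation", lambda c: not c.startswith("bad")),
    Stage("hardware-in-the-loop", lambda c: True),
]
print(on_commit("bad456", stages).failed)  # scenario simulation
```

Failing fast shortens the feedback cycle for the developer who made the commit; a nightly run without the early exit could still execute every stage to keep full coverage statistics.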

Additionally, this allows development experts to spend almost 100% of their time developing functionality for autonomous vehicles.

Figure 3: The complexity of needing hybrid data in development



Data-driven/AI-based development will accelerate and change the course of software development for autonomous vehicle systems.

It is essential that data and software development processes interact continuously and are not seen as sequential development processes within isolated functional development teams and departments.

This cannot be achieved with legacy in-car software architectures or their evolutionary development; that approach would only hold back the data-processing capability and AI requirements of the future. We need true software/hardware separation, more compute power and greater processor specialization. We also need more memory, centrally accessible sensor data and dynamic payload deployment, all while preserving the integrity of the overall safety-related system.

To make this happen, automotive players urgently need to set up programs to restructure the development toolchain – including all development streams – with automotive, embedded, IT, system testing and data science experts. They need a scalable foundation and system design that can be extended step by step.

Simulation and virtualization are game changers for development: they enable fast feedback (CI/CD) while maintaining overall maturity and quality levels, even as much more complex software systems are implemented in less time.




About the author


Matthias Bauhammer is the global offering leader for DXC Robotic Drive, responsible for the end-to-end AD development consulting, services and platform used by automotive OEMs globally to run open, massively scalable, data-driven development and validation workloads that connect cloud, on-premises and embedded environments. Previously, he was head of business development and strategic projects for DXC’s automotive industry AI and analytics programs. Matthias has more than 20 years’ global experience in analytics, data management and AI.


Franz Gaber
Head of program management, Luxoft

Christoph Hennig
Automotive program manager, Luxoft

Guenter Koch
Global solution lead, DXC Robotic Drive


This perspective was originally published by Luxoft, a DXC Technology company, as part of a joint white paper series with EY, entitled Automotive trends: The future of mobility.