The vehicle of the future is all about data.
Data management plays a critical role in autonomous driving development. Today, data and metadata management for autonomous driving (AD) and advanced driver assistance systems (ADAS) is increasingly moving from on-premises systems to the cloud. Over the last two years, we have noticed a clear preference for cloud-based AD/ADAS data processing.
This paper investigates what is driving the move to the cloud, identifies inhibitors, and evaluates hybrid solutions and how they can be optimised for a “best-of-breed” implementation.
Why the move to cloud
Building AD/ADAS functionality as innovative features in the next generation of premium car models is an ambitious and complex undertaking, requiring the addition of multiple development streams to established processes and schedules. This leads to several unknowns that are relevant for both the data and compute platforms.
Automakers need to make “build versus buy” decisions, and suppliers need to build prototypes to win bids for subsystems of an overall solution in the ADAS vehicle. These activities lead to uncertainties in timelines, volumes and ramp-up curves for the number of cars, duration of their operation and data collection. In many cases the projects are first-of-a-kind implementations with challenges never encountered before.
On-premises implementations of data lakes and compute clusters often come online too early or too late. They require very careful planning and must be flexible enough to ramp capacity up or down smoothly. They also demand capital and resources committed up front, risk being underutilised and have a large carbon footprint.
In contrast, cloud-based implementations have much quicker implementation times, with shorter budgeting, planning, designing, sizing and delivery times. Ramping up capacity for a variety of cloud services can be done as needed, so you only pay for what you consume. Similarly, it’s easy to experiment with cloud services in a very agile way to identify the best solution. Cloud consumption metering enables billing to departments or individual projects in a very straightforward way, whereas in an on-premises environment additional software and new processes are required.
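As a concrete illustration of such project-level metering, on AWS this can be as simple as applying consistent cost-allocation tags to the resources a project consumes. The sketch below uses boto3; the bucket name, instance ID and tag values are placeholders, not a prescribed scheme.

```python
import boto3

# Illustrative project metadata; keys and values are placeholders.
PROJECT_TAGS = [{"Key": "cost-center", "Value": "adas-perception"},
                {"Key": "environment", "Value": "sil-reprocessing"}]

# Tag the raw-data bucket so its storage and request costs roll up to the project.
s3 = boto3.client("s3")
s3.put_bucket_tagging(
    Bucket="example-adas-raw-data",          # placeholder bucket name
    Tagging={"TagSet": PROJECT_TAGS},
)

# Tag compute resources the same way. Once these tags are activated as
# cost-allocation tags in the billing console, spend can be filtered and
# charged back per project or department.
ec2 = boto3.client("ec2")
ec2.create_tags(Resources=["i-0123456789abcdef0"], Tags=PROJECT_TAGS)
```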
The move to the cloud comes with the expectation of saving costs and improving the carbon footprint, which often leads to an overall “cloud first” and datacenter exit strategy. Although AD/ADAS workload implementations usually have a very large footprint, they are often isolated and not part of a corporate cloud strategy. Because they are difficult to move, AD/ADAS workloads are usually among the last environments to remain on-premises in an enterprise-level cloud migration.
Let’s examine the technical considerations that affect the move to cloud.
Technical considerations
In AD/ADAS, data is generated outside of the cloud, and the final processing step also takes place outside of it. As a brand new vehicle is developed, fleets of R&D cars typically clock about 1 million miles on test drives. Data that meets quality standards needs to be ingested securely into the cloud at rates of up to 1-2 petabytes a day. DXC advises running a variety of data quality checks before consuming cloud capacity. We have also seen automakers commit to a 100% cloud base for data processing and perform data quality checks only after a cloud upload. It is important, though, that businesses not move forward with a direct upload to the cloud without thoroughly exploring any factors that might affect these implementations.
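A minimal sketch of such a pre-upload quality gate is shown below; the required channels, file layout, thresholds and checksum convention are purely illustrative and would differ per logger and sensor set.

```python
import hashlib
from pathlib import Path

# Hypothetical thresholds; real gates depend on the sensor set and recording format.
MIN_FILE_SIZE_BYTES = 1_000_000                     # discard obviously truncated files
REQUIRED_CHANNELS = {"camera_front", "lidar_top", "can_bus", "gnss"}

def passes_quality_gate(recording_dir: Path) -> bool:
    """Return True only if a drive recording is worth uploading to the cloud."""
    files = list(recording_dir.glob("*.bin"))
    if not files:
        return False
    # 1. Completeness: every required sensor channel must be present.
    channels = {f.stem.split("__")[0] for f in files}
    if not REQUIRED_CHANNELS.issubset(channels):
        return False
    # 2. Plausibility: reject recordings containing truncated files.
    if any(f.stat().st_size < MIN_FILE_SIZE_BYTES for f in files):
        return False
    # 3. Integrity: verify checksums written by the in-vehicle logger, if present.
    for f in files:
        sidecar = f.with_suffix(".sha256")
        if sidecar.exists():
            if hashlib.sha256(f.read_bytes()).hexdigest() != sidecar.read_text().strip():
                return False
    return True

# Only recordings that pass the gate are handed to the upload pipeline, so cloud
# storage and ingest capacity are not spent on unusable data.
uploads = [d for d in Path("/ingest/drives").iterdir()
           if d.is_dir() and passes_quality_gate(d)]
```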
Hardware-in-the-loop (HIL) testing
The ultimate step in the data processing chain before deploying code in real cars is hardware-in-the-loop (HIL) testing, sometimes referred to as “hardware-open-loop” (HOL). As the name implies, this is a step outside of the cloud. In HIL/HOL testing, data needs to be provided to the electronic control unit (ECU) being tested, called the “device under test” (DUT), in real time or near real time. Given the amount of data and the iterative nature of the work to fine-tune ADAS algorithms, large HIL farms with petabyte-sized buffer storage are required. HIL farms constitute a massive on-prem element that needs to be operated in a cloud co-location site, as networking throughput requirements are in the terabyte per second (TB/s) range.
Today the ideal environment for AD/ADAS data processing appears to be Docker containers with Kubernetes, augmented with vendor- or community-supported management products such as Amazon Elastic Kubernetes Service (EKS) or Azure Kubernetes Service (AKS). That approach delivers excellent scalability and portability with little overhead to manage those environments. In addition, the container storage interface (CSI) is supported by many storage products, so interoperability is broadly assured.
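To illustrate how a containerised workload is typically driven on such a cluster, the sketch below submits a single reprocessing task as a Kubernetes Job via the official Python client. The image name, namespace, command and resource requests are placeholders rather than a reference implementation.

```python
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running inside the cluster

# One containerised reprocessing task; image and arguments are placeholders.
container = client.V1Container(
    name="sil-reprocess",
    image="registry.example.com/adas/sil-runner:latest",
    command=["python", "reprocess.py", "--drive", "drive-2024-0001"],
    resources=client.V1ResourceRequirements(requests={"cpu": "8", "memory": "32Gi"}),
)

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="sil-reprocess-drive-2024-0001"),
    spec=client.V1JobSpec(
        backoff_limit=2,                      # retry a failed pod at most twice
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "sil-reprocess"}),
            spec=client.V1PodSpec(restart_policy="Never", containers=[container]),
        ),
    ),
)

# The same pattern scales to thousands of jobs fanned out across the cluster.
client.BatchV1Api().create_namespaced_job(namespace="adas", body=job)
```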
On the storage side, cloud object storage excels with the highest values of data durability (99.999999999% or even higher). Let’s reflect on that for a moment. The raw data collected in a driving campaign comes from, as an example, 10 cars each driving 200 days per year, or 2,000 driving days in total. If one of those cars is out of action for a day more than expected, the raw data volume shrinks to 99.95% of the planned total, not significant enough to impact the driving campaign. If a comparably small fraction of raw data already stored were lost, the impact would again be minimal. Eleven-nines durability is therefore overkill for raw ADAS data; low-durability, high-throughput and cheap storage classes would be more appropriate, but this particular combination is not really offered by hyperscalers today. Of course, the outstanding technical properties of object storage are important for metadata and derived data (such as KPIs), but those are only a small fraction of the total volume. For the raw data, the least expensive storage classes should be selected.
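The arithmetic behind that comparison, using the figures above, is simple enough to spell out:

```python
# Worked version of the fleet arithmetic above.
cars = 10
driving_days_per_car = 200
total_driving_days = cars * driving_days_per_car                  # 2,000 driving days

lost_days = 1                                                     # one car out for a day
remaining = (total_driving_days - lost_days) / total_driving_days
print(f"Raw data collected: {remaining:.2%} of plan")             # 99.95%

# Eleven-nines durability corresponds to an expected annual loss of roughly one
# object in 100 billion, orders of magnitude below the variability that the
# fleet itself introduces.
durability = 0.99999999999
print(f"Expected annual object loss rate: {1 - durability:.0e}")  # 1e-11
```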
Cost optimisation in the cloud
Cloud architectures for AD/ADAS are being recommended by the major hyperscalers such as Amazon Web Services and Microsoft. The major building blocks are always object store services and virtualised compute capacity. Specialised native services are used for data analysis and management, AI workloads and ultimately a variety of auxiliary services around topics such as security, networking, monitoring and logging. Our experience is that costs for auxiliary cloud services and their management services are approximately the same as the discounts for the core services provided by the hyperscalers.
What about storage costs? List prices of object storage in the cloud are in the range of $20-25 per terabyte per month. There are many options to consider, such as frequency of access and transfers out of that store, that could be used to lower the cost. However, many additional factors are hard to predict and control: for example, there is an extra cost for unplanned access to cold storage. In the end, though, the standard choices and cost figures lead to a reasonable estimate. For 100 petabytes held over 3 years (consumption, not overall capacity), the cost would be $74-92 million.
How does this compare to on-prem storage? We have performed a benchmark and collected quotations for storage with the high throughput capabilities required for AD/ADAS use cases; they come in at roughly $150,000 per petabyte, covering hardware, software and maintenance. Datacenter and service costs could increase that by a factor of 2.5 to 3. The total cost of ownership for 100 PB over 3 years is therefore estimated at $37.5-45 million (again, this is for overall capacity, not consumption). On-prem (capacity) versus cloud (consumption) is thus not an equivalent comparison, though these high-level estimates shed light on the potential costs.
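Both estimates can be reproduced with back-of-the-envelope arithmetic; in the sketch below the prices are taken from the discussion above, while the 1 PB = 1,024 TB conversion and the use of undiscounted list prices are simplifying assumptions.

```python
# Reproduce the storage cost figures above.
volume_pb, months, tb_per_pb = 100, 36, 1024

# Cloud object storage: $20-25 per TB per month, 100 PB held on average (consumption).
cloud_low  = 20 * volume_pb * tb_per_pb * months      # ~$74M
cloud_high = 25 * volume_pb * tb_per_pb * months      # ~$92M

# On-prem: ~$150,000 per PB over 3 years for hardware, software and maintenance,
# multiplied by 2.5-3 for datacenter and service costs (capacity, not consumption).
onprem_base = 150_000 * volume_pb
onprem_low, onprem_high = 2.5 * onprem_base, 3 * onprem_base      # $37.5M-$45M

print(f"Cloud (consumption): ${cloud_low/1e6:.0f}M - ${cloud_high/1e6:.0f}M over 3 years")
print(f"On-prem (capacity):  ${onprem_low/1e6:.1f}M - ${onprem_high/1e6:.0f}M over 3 years")
```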
Note, too, that currently most data is not deleted or even archived in large volumes; there are many reasons to keep it readily available for immediate access. DXC Technology maintains, however, that a better understanding of the data could enable a drastic reduction of “active data,” which is a prerequisite for moving to higher levels of autonomous driving. More complex functions require highly selective data processing, so that algorithms can be tested against data for critical driving situations only, at affordable cost and within timescales that align with overall automotive planning cycles.
What about compute? We have recent data for servers with 48 physical cores or 96 vCPUs (cloud providers use a 1:2 ratio through hyperthreading). On-prem costs for hardware, maintenance and operating systems are in the range of $12,000-15,000 per server; additional costs increase this to about $45,000 per server over 3 years. Similar capacity from the cloud, based on 3-year reserved instances, costs almost double that. Again, it is important to note that this is not the actual cost of consumption; for consumption, the cloud costs depend on the workload pattern.
Our observation is that compute workloads for simulations fluctuate, so reliance on reserved instances should be limited. There is variation driven by the development cycles of the AD/ADAS R&D engineers, by month or quarter for example, but there are also daily and weekly patterns introduced by uncoordinated schedules for job submission, mainly for simulation and data re-processing workloads. Average utilisation is in the range of only 20-50%, which builds a case for computing in the cloud, where on-demand consumption is more cost effective than reserved or on-prem capacity. The use of spot instances, with discount rates in the range of 50-90%, strengthens that business case further.
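To make the utilisation argument concrete, the sketch below compares the effective cost per utilised server-hour. The roughly $45,000 (on-prem) and $90,000 (3-year reserved) per-server figures come from the discussion above; the on-demand hourly rate, the 30% utilisation and the 70% spot discount are illustrative assumptions chosen within the ranges mentioned.

```python
# Effective cost per *utilised* server-hour under different consumption models.
HOURS_3Y = 3 * 365 * 24                    # ~26,280 hours in three years

onprem_3y   = 45_000                       # per server over 3 years (from the text)
reserved_3y = 90_000                       # comparable 3-year reserved cloud capacity

utilisation       = 0.30                   # assumed, within the observed 20-50% range
ondemand_per_hour = 5.00                   # assumed on-demand rate for a 96-vCPU instance
spot_discount     = 0.70                   # assumed, within the observed 50-90% range

# Fixed capacity is paid for whether or not it is used.
per_hour_onprem   = onprem_3y   / (HOURS_3Y * utilisation)     # ~$5.70
per_hour_reserved = reserved_3y / (HOURS_3Y * utilisation)     # ~$11.40

# On-demand and spot capacity bill only the hours actually consumed.
per_hour_ondemand = ondemand_per_hour                          # $5.00
per_hour_spot     = ondemand_per_hour * (1 - spot_discount)    # $1.50

for label, cost in [("on-prem", per_hour_onprem), ("reserved", per_hour_reserved),
                    ("on-demand", per_hour_ondemand), ("spot", per_hour_spot)]:
    print(f"{label:>9}: ${cost:.2f} per utilised server-hour")
```

At low utilisation the fixed-capacity options carry the cost of idle hours, which is exactly why fluctuating simulation workloads favour on-demand and spot capacity.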
How DXC can help provide an end-to-end solution
As data is generated and consumed outside of the cloud, any solution for AD/ADAS data processing will be a hybrid implementation — one that either minimises the cloud or maximises it.
Keeping data in one place would seem to be the most sensible solution in many circumstances, as it eliminates the cost and time of petabyte-scale data movement; even then, the cloud can serve as an archive for an on-premises installation if no technical option exists on-prem for this coldest data tier. In view of the considerations above, the most sensible on-premises installations are very large, largely static data volumes of several dozen petabytes that are nevertheless accessed frequently, combined with HIL farms. Quite clearly, the cloud has very strong advantages for compute workloads, such as simulation and software-in-the-loop (SIL) reprocessing, especially for short-running jobs.
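Where the cloud does take on that archive role, the coldest tier can be managed with a simple lifecycle rule rather than manual housekeeping. The sketch below shows one way to do this on AWS S3; the bucket name, prefix and retention period are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Move raw recordings to the cheapest archive class after 30 days, while metadata
# and derived data under other prefixes stay in standard storage. Names and
# periods are placeholders for illustration.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-adas-raw-data",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-raw-recordings",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "DEEP_ARCHIVE"}],
            }
        ]
    },
)
```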
In short, two wildly differing options can be considered: Put all highly valuable data into the cloud, or keep it on-premises. Between these scenarios, hybrid solutions could be built; the best way to balance the two will depend on several factors:
- Size, technical capability and amount of technical debt on legacy environments related to the use of individual workstations or relational databases
- Synergy effects with the corporate cloud strategy
- Commercial agreements with hyperscalers, in particular for data transfers
- Synergy effects with processing other vehicle data or use cases
- Co-operation requirements with partners and their cloud strategy
- Predictability of data and compute volume growth
- Timewise variability of workloads
- HIL processing requirements
- Data analysis, reduction, cataloguing and management capabilities
DXC is increasingly focused on the last point, as we engage with multiple automakers and Tier 1 automotive suppliers. Strong data management capabilities will be an important element in building the best possible hybrid environment.
The future of data and compute platforms for AD/ADAS data processing is hybrid on-prem/cloud, making use of technical solutions where they have the best performance/price ratio, while also creating and implementing strong data management solutions.
Learn more about DXC's Automotive and Hybrid Cloud expertise.