The room was silent. We’d just finished presenting our virtual reality (VR) training pilot results — impressive learning improvements, enthusiastic feedback, clear benefits. Then came the question I’d been dreading: “Great. Now how do we roll this out to locations across the U.S. and Canada?”

Everyone in learning and development (L&D) has been there. You’ve proven the concept. The pilot worked beautifully. Leadership is excited. And then reality hits: scaling innovative training technology from a controlled pilot to enterprise-wide deployment is where most programs die.

Three years later, we’ve done it. At Amazon, we led the deployment of VR-based driver training, a program that now supports facilities across North America. The results have exceeded expectations. But the journey from pilot to production taught us lessons that no vendor presentation could have prepared us for.

If you’re stuck in pilot purgatory with any learning technology, here are the five challenges that nearly ended our program — and how we solved them.

Challenge 1: Hardware Logistics at Scale

The pilot was manageable. The enterprise rollout was far more complex: many locations with different layouts and varying capabilities, and devices to ship across two countries spanning four time zones.

Our first attempt nearly failed within weeks. We bulk-ordered headsets, shipped them to facilities, and then the problems started. Headsets arrived damaged. Setup instructions that made sense to us baffled facilities without technical staff. Wi-Fi networks weren’t configured for VR bandwidth. Charging stations became bottlenecks. And because communities were still navigating the COVID-19 pandemic, maintaining hygiene and personal safety standards was critical.

We completely rethought our approach. First, we created a facility readiness assessment, evaluating network capability and local support before shipping anything. Second, we developed plug-and-play kits with everything needed in standardized packaging. Third, we implemented tiered support. Finally, we established mandatory hygiene protocols with disposable face covers and sanitizing schedules.

The lesson: In our experience, logistics can kill a scaling effort faster than any other factor. We eventually spent more time designing logistical processes than selecting the VR platform. Any operational detail we got wrong could have derailed the rollout.

Challenge 2: Standardization Vs. Localization

One of VR’s promises is consistent delivery — every trainee experiences the same scenarios, right? Wrong, unless you put in significant effort.

Different facilities had different needs. Urban routes needed different scenarios than rural ones. Northern facilities needed winter weather content. We faced a dilemma: standardization, which is easier to manage, or localization, which is more relevant?

Our solution was a hybrid model of standardized and custom content. We implemented standard defensive driving, hazard perception and customer interaction content across all locations. We addressed custom needs with additional content, built on the standard template by teams focused on case-by-case differences.

This preserved consistency while allowing customization where it mattered. Facilities felt ownership while we maintained quality control.

The lesson: Perfect standardization is a myth at scale. Build flexibility into your system from the beginning.

Challenge 3: The Instructor Problem

Introducing VR changed the instructor’s role, and not everyone welcomed that change.

Traditional driver training is instructor-led. The instructor’s expertise and experience are central. VR scenarios are self-directed, which shifts instructors into a less hands-on, facilitator role. Some embraced this shift while others did not.

We quickly realized the need to highlight the new opportunities this technology presented. VR doesn’t replace instructors; in fact, it frees them for higher-value activities like personalized coaching. We invested in them by asking training instructors to become local subject matter experts. We incorporated their feedback into scenario development. We measured overall trainee performance and delivery process errors, not just VR completion rates.

When instructors saw that VR-trained drivers performed better on tests and made fewer errors, they recognized that VR supported their goals. We celebrated facility successes publicly, making instructors advocates of the program, not spectators.

The lesson: Technology change is people change. If the people delivering training don’t believe in your solution, the program will fail through passive resistance.

Challenge 4: Measuring What Matters

VR engagement metrics are easy to track — completion rates, time in headset, scenario attempts. But these don’t answer what truly matters: Is this making drivers better?

We tracked immediate metrics like completion rates and performance scores. But what we needed were outcome metrics, such as service quality and training time to competency.

The critical connection we discovered changed how we thought about VR’s value. The impact wasn’t just skill development. The bigger benefit was pattern recognition. Drivers who had experienced scenarios in VR recognized similar situations earlier as they developed in real life, giving them more time to respond.

The lesson: Define success criteria before you scale. Track outcomes that matter to stakeholders, not just metrics that are easy to collect.

Challenge 5: Sustaining the Program

Eighteen months in, we hit an unexpected obstacle: usage tailed off.

It wasn’t a technology problem. Equipment worked fine. Instructors were trained. But VR sessions were getting skipped, pushed to “when there’s time” or completed without engagement. Novelty wore off. What was exciting in month one became routine by month twelve. And “routine” easily became “optional” when operational pressures mounted.

We made VR training part of standard work rather than optional. We committed to periodic content refreshes, including scenarios based on real recent incidents. We shared success stories monthly. We established continuous improvement processes, showing that the program evolved based on user feedback. And we maintained dedicated program management even when it felt like the program should be self-sustaining.

The lesson: Launch is not success. Sustaining a program at scale requires different strategies than getting to scale.

The Results

After three years supporting many locations, was it worth it? Unequivocally yes.

VR-trained drivers showed better performance on training assessments, faster time-to-competency and improved hazard recognition. We saw a reduction in delivery errors and an improved customer experience. Active usage continues at all facilities, with strong instructor buy-in and full curriculum integration.

But the success didn’t come from VR being magic. It came from solving the scaling challenges that prevent most programs from reaching their potential.

In my next article, I’ll share the five-principle framework we developed for scaling any learning technology — lessons applicable far beyond VR training.