In many factories today, quality checks still happen selectively: only a small share of products is inspected manually. It’s practical, but it also means defects can slip through, and important insights get lost.
AI-powered camera systems offer a different approach. They can monitor every product passing through the line and instantly decide whether something is good or defective. Beyond quality control, this creates valuable data that helps optimize the entire production process and reduce waste.
Training the object detection models that locate each product in an image faces two fundamental challenges:
The Labeling Bottleneck: Training an object detection model requires large numbers of labeled images. Every single image must be annotated by hand: someone has to draw a bounding box around each product and mark whether it is good or defective. This manual process is time-consuming, costly, and scales poorly. In many industrial AI projects, it quickly becomes the main bottleneck.
The Generalization Problem: Real-world conditions are messy and unpredictable. The model needs to handle different lighting, unusual angles, and objects stacked on top of each other. But collecting training data that covers every possible scenario is nearly impossible.
A smarter approach: hybrid learning
To avoid months of manual labeling, we use a hybrid learning strategy: instead of relying solely on real factory images, we combine them with large amounts of automatically generated, photorealistic simulation data.
Using Blender (the same tool used to create animated movies), we built a virtual production setup: a conveyor belt, realistic lighting, camera positions, and 3D models of the products. The advantage of simulation is simple but powerful:
Every generated image already comes with perfect labels.
No drawing boxes, no manual effort, just clean training data at scale.
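To make this concrete, here is a minimal sketch of how such automatic labeling can work with Blender's Python API (bpy). It assumes a scene that already contains a camera and a product mesh; the object name, class index, and output paths are illustrative, not our exact pipeline.

```python
import bpy
from bpy_extras.object_utils import world_to_camera_view
from mathutils import Vector

scene = bpy.context.scene
cam = scene.camera
obj = bpy.data.objects["product"]  # hypothetical object name

# Project the 8 corners of the object's 3D bounding box into the camera view.
# world_to_camera_view returns normalized coordinates with the origin at the
# bottom-left of the rendered frame.
corners = [obj.matrix_world @ Vector(c) for c in obj.bound_box]
coords = [world_to_camera_view(scene, cam, c) for c in corners]
xs = [min(max(c.x, 0.0), 1.0) for c in coords]
ys = [min(max(1.0 - c.y, 0.0), 1.0) for c in coords]  # flip y: YOLO origin is top-left

# YOLO label format: "class x_center y_center width height", all normalized.
x_c, y_c = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
w, h = max(xs) - min(xs), max(ys) - min(ys)
label = f"0 {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"  # class 0 = "good" (illustrative)

# Render the image and write the matching label file next to it.
scene.render.filepath = "/tmp/sim/img_0001.png"
bpy.ops.render.render(write_still=True)
with open("/tmp/sim/img_0001.txt", "w") as f:
    f.write(label + "\n")
```

Because the renderer knows exactly where the product sits in the frame, the label is derived from the scene geometry itself and is correct by construction.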
We can also create scenarios that would be hard or rare to capture in real life, as the sketch after this list shows:
- unusual lighting conditions,
- background changes,
- product variations,
- rare defect types,
- edge cases a real camera might only see once in a while.
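A minimal randomization sketch, continuing the Blender setup above. The light and object names are assumptions; a real pipeline would also randomize materials, backgrounds, camera pose, and defect geometry.

```python
import random
import bpy

light = bpy.data.objects["key_light"]  # hypothetical light name
obj = bpy.data.objects["product"]      # hypothetical product name

def randomize_scene():
    # Vary lighting intensity and tint to cover unusual conditions.
    light.data.energy = random.uniform(200, 2000)
    light.data.color = (1.0, random.uniform(0.7, 1.0), random.uniform(0.5, 1.0))
    # Vary product pose and position so the model sees unusual angles.
    obj.rotation_euler = (0.0, 0.0, random.uniform(0.0, 6.283))
    obj.location.x = random.uniform(-0.2, 0.2)

for i in range(1000):
    randomize_scene()
    bpy.context.scene.render.filepath = f"/tmp/sim/img_{i:04d}.png"
    bpy.ops.render.render(write_still=True)
    # ...followed by the label export shown above for each frame.
```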
This exposes the model to a far more diverse set of situations. By mixing real images with simulated ones, YOLO models learn to generalize better, meaning they perform well not only on typical cases but also on unusual situations that occur in day-to-day production.
This way, we only need a much smaller set of real images—the simulated dataset provides the breadth and variation—allowing the model to be trained both efficiently and effectively. To apply this approach to a different production line, only a new 3D model of the product is required. Whether it's a connector, a medical device, or any other manufactured item, the Blender setup and the overall training process remain the same.
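Training on the mixed dataset then looks much like any other YOLO run. Here is a sketch using the ultralytics package, assuming a dataset YAML that lists both real and simulated image folders; paths, model size, and class names are illustrative.

```python
from ultralytics import YOLO

# mixed.yaml (illustrative):
#   path: /data/product_inspection
#   train: [images/real_train, images/sim_train]   # mix real + simulated
#   val: images/real_val                           # validate on real images only
#   names: {0: good, 1: defective}

model = YOLO("yolov8n.pt")  # start from a pretrained checkpoint
model.train(data="mixed.yaml", epochs=100, imgsz=640)
metrics = model.val()       # mAP on the real validation split
```

Validating only on real images is a deliberate choice in this sketch: it measures how well what was learned from simulation actually transfers to production conditions.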
This method directly addresses the challenges of labeling and generalization described above and offers a scalable, highly practical solution for modern production line monitoring.
Inspired by progress in other industries
Hybrid learning is already proving its value in areas like autonomous driving, where companies rely on large-scale simulations to train models for rare or difficult-to-capture scenarios.
This work was done at Panda GmbH under the supervision of Michael Welsch.
