Autonomous systems are no longer confined to screens and servers. As artificial intelligence migrates into robots, industrial sensors, and edge devices, the governance frameworks designed for pure software agents are starting to buckle. The core issue isn’t just whether these systems can complete tasks, but how their actions are tested, monitored, and halted when they interact with the physical world. A machine that fumbles a digital classification can be fixed with a patch. A robot that misinterprets a command on a factory floor can cause real damage.
The industrial robotics sector already illustrates the scale of this shift. The International Federation of Robotics reported that 542,000 industrial robots were installed worldwide in 2024, more than double the annual figure from a decade earlier. By 2028, that number is expected to exceed 700,000. Market researchers at Grand View Research estimate the global Physical AI market at $81.64 billion in 2025, with projections reaching nearly $960 billion by 2033. Those numbers, however, depend heavily on how vendors define intelligence in physical systems, a definition that remains slippery and contested.
From Model Output to Machine Movement
The governance challenge for Physical AI is fundamentally different from software-only automation. Physical systems operate in and around workplaces, infrastructure, and people. They connect to equipment that demands clear safety limits. A model output can become a robot trajectory, a machine instruction, or a decision based on sensor data. That makes safety thresholds and escalation paths essential parts of system design, not afterthoughts.
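In practice, that means putting a gate between the model and the actuators. The sketch below is purely illustrative, with made-up limits and names rather than anything drawn from a real robot stack: a proposed action is checked against hard physical limits, and anything that is plausible but risky is escalated to a human instead of executed.

```python
from dataclasses import dataclass

# Hypothetical limits; real values come from the robot's mechanical spec
# and the site's safety case, not from the model.
MAX_JOINT_VELOCITY = 0.5   # rad/s
MAX_CONTACT_FORCE = 20.0   # newtons

@dataclass
class ProposedAction:
    joint_velocities: list[float]   # commanded joint speeds, rad/s
    expected_force: float           # estimated contact force, N
    description: str                # natural-language summary from the model

def gate_action(action: ProposedAction) -> str:
    """Decide whether a model-proposed action runs, halts, or escalates."""
    if any(abs(v) > MAX_JOINT_VELOCITY for v in action.joint_velocities):
        return "halt"           # hard limit: never exceed rated joint speed
    if action.expected_force > MAX_CONTACT_FORCE:
        return "escalate"       # plausible but risky: ask a human operator
    return "execute"

if __name__ == "__main__":
    action = ProposedAction([0.2, 0.1, 0.4], expected_force=25.0,
                            description="press part into fixture")
    print(gate_action(action))   # -> "escalate"
```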
Google DeepMind’s robotics work offers a concrete example of how large AI models are being adapted for this environment. In March 2025, the company introduced Gemini Robotics and Gemini Robotics-ER, both built on the Gemini 2.0 architecture. Gemini Robotics is a vision-language-action model designed to control robots directly, while Gemini Robotics-ER focuses on embodied reasoning, including spatial understanding and task planning. A robot using such a model may need to identify an object, parse a natural-language instruction, plan a sequence of movements, and then assess whether the task was completed correctly. That creates a layered control problem that blends model behavior with the mechanical limits of the system.
Google DeepMind has stated that useful robots require three traits: generality, interactivity, and dexterity. Generality covers unfamiliar objects and environments. Interactivity relates to human input and changing conditions. Dexterity refers to physical tasks that demand precise movement. In its launch materials, the company showed Gemini Robotics performing multi-step manipulations like folding paper, packing items into a bag, and handling objects it had never seen during training. Sound impressive? It is. But it also raises a question: how do you verify that a model understands when to stop, retry, or ask for help?
Success Detection and the Retry Loop
In robotics, success detection matters enormously. The system must decide whether a task is complete, whether it should retry, or whether it should halt entirely. Google DeepMind’s Gemini Robotics-ER 1.6, introduced in April 2026, bundles these functions into a single, newer model. The company describes it as supporting spatial logic, task planning, and success detection, with the ability to reason through intermediate steps and decide whether to move forward or try again. That is a far cry from a standard language model that just predicts the next token.
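In code, that decision loop is unglamorous but critical. The sketch below assumes hypothetical execute_step, check_success, and ask_for_help callbacks; in a real system those would be backed by the robot's controller and whatever perception stack judges task completion. The point is that retries are bounded and repeated failure escalates instead of looping forever.

```python
import time

MAX_RETRIES = 3

def run_task(steps, execute_step, check_success, ask_for_help):
    """Run task steps with explicit success detection, retries, and escalation.

    execute_step(step) performs one action; check_success(step) returns True
    if the observed world state matches the step's goal. Both are placeholders
    for whatever controller and perception stack the robot actually uses.
    """
    for step in steps:
        for attempt in range(1, MAX_RETRIES + 1):
            execute_step(step)
            if check_success(step):
                break                      # goal state reached, move on
            time.sleep(0.5)                # brief settle time before retrying
        else:
            # Repeated failure is a signal, not a nuisance: stop and escalate
            # rather than blindly re-running the same motion.
            ask_for_help(step)
            return False
    return True
```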
For developers, Google’s documentation says Gemini Robotics-ER 1.6 is available in preview through the Gemini API. It is described as a vision-language model that brings Gemini’s agentic capabilities to robotics, including visual interpretation, spatial reasoning, and planning from natural language commands. Google AI Studio provides a developer environment for working with these models, putting testing and prompting closer to the engineers building agentic applications. That proximity is both a blessing and a burden: more control, but also more responsibility for safety.
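Querying such a model from the Gemini API looks much like any other Gemini call. The sketch below uses the google-genai Python SDK; the model identifier is a placeholder, and the actual preview name and availability should be checked against Google's current documentation before relying on it.

```python
# A minimal sketch of asking a robotics-oriented Gemini model for a plan.
# Assumes the google-genai SDK (`pip install google-genai`) and an API key;
# the model name below is a placeholder, not a confirmed identifier.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

prompt = (
    "You see a workbench with a screwdriver and an open parts bin. "
    "List the steps to place the screwdriver into the bin, and state "
    "how you would confirm the task succeeded."
)

response = client.models.generate_content(
    model="gemini-robotics-er-preview",   # placeholder model id
    contents=prompt,
)
print(response.text)
```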
Safety Controls Move into System Design
Governance becomes significantly more complex when these systems can call tools, generate code, or trigger physical actions. Controls must define what data the system can access, which tools it can invoke, which actions require human approval, and how every activity is logged for review. A 2026 McKinsey report on AI trust found that only about one third of organizations reported maturity levels of three or higher in strategy, governance, and agentic AI governance, even as AI systems take on more autonomous functions across industries.
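Reduced to code, those controls are mundane: an allowlist, an approval gate, and an append-only log. The sketch below is a generic illustration, with every tool name and policy invented for the example rather than taken from any particular framework.

```python
import json
import time

# Hypothetical policy: which tools the agent may call, and which need approval.
ALLOWED_TOOLS = {"read_sensor", "move_arm", "open_gripper"}
REQUIRES_APPROVAL = {"move_arm"}          # physical motion needs human sign-off

def audit(event: dict) -> None:
    """Append-only log so every tool call can be reviewed later."""
    event["ts"] = time.time()
    with open("agent_audit.jsonl", "a") as f:
        f.write(json.dumps(event) + "\n")

def invoke_tool(name: str, args: dict, approved_by: str | None = None):
    if name not in ALLOWED_TOOLS:
        audit({"tool": name, "args": args, "outcome": "blocked"})
        raise PermissionError(f"tool {name!r} is not on the allowlist")
    if name in REQUIRES_APPROVAL and approved_by is None:
        audit({"tool": name, "args": args, "outcome": "pending_approval"})
        raise PermissionError(f"tool {name!r} requires human approval")
    audit({"tool": name, "args": args, "outcome": "executed",
           "approved_by": approved_by})
    # ... dispatch to the real tool implementation here ...
```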
In robotics, safety includes the physical behavior of the machine itself. Google DeepMind has described robot safety as a layered problem, covering lower-level controls like collision avoidance, force limits, and stability, as well as higher-level reasoning about whether a requested action is safe in context. The company also introduced ASIMOV, a dataset for evaluating semantic safety in robotics and embodied AI. The goal, they say, is to test whether systems can understand safety-related instructions and avoid unsafe behavior in physical settings.
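A rough sketch of that layering, with every name and threshold invented for illustration, pairs hard controller-level checks with a semantic check. Here the semantic layer is a crude keyword stand-in; a real system would use a learned evaluator benchmarked against something like the ASIMOV-style examples described above.

```python
from dataclasses import dataclass

@dataclass
class Command:
    force: float        # commanded contact force, N
    speed: float        # tool speed, m/s
    force_limit: float
    speed_limit: float

# Hypothetical blocklist standing in for a real semantic-safety evaluator.
UNSAFE_HINTS = ("near person", "bypass guard", "override interlock")

def low_level_ok(cmd: Command) -> bool:
    """Hard physical constraints enforced on the controller itself."""
    return cmd.force <= cmd.force_limit and cmd.speed <= cmd.speed_limit

def semantically_safe(instruction: str) -> bool:
    """Crude stand-in for higher-level contextual safety reasoning."""
    text = instruction.lower()
    return not any(hint in text for hint in UNSAFE_HINTS)

def approve(cmd: Command, instruction: str) -> bool:
    # Both layers must pass; neither one substitutes for the other.
    return low_level_ok(cmd) and semantically_safe(instruction)
```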
The same controls used for software agents (access rights, audit trails, refusal behavior) become much harder to manage when systems are bolted to robots, embedded in sensors, or wired into industrial equipment. A language model can refuse a toxic prompt; a robot might still need to physically stop before it crushes something. That gap between digital refusal and mechanical inertia is where governance for Physical AI will be tested, and where the next generation of safety engineering will need to focus.
The market is moving fast, and the technology is evolving even faster. But governance is not something you retrofit onto a robot after deployment. It has to be designed into the architecture, from the model’s internal reasoning all the way down to the torque limits on a servo motor. As Physical AI systems become more capable and more common, the question won’t be whether they can do the work, but whether we can trust them not to break things while doing it.