Nonlinear MPC for Quadruped Manipulator

Nonlinear MPC for Quadruped Manipulator

The rapid advancement of robotics has revolutionized various industries, particularly in Search and Rescue (SAR), where robots are becoming key tools for hazardous environments. Among these, legged-manipulator platforms have attracted research interest for their ability to combine mobility and manipulation in complex, unstructured terrains. These platforms can traverse rough terrain and interact with objects, making them ideal for SAR operations.

Project Overview

At the Technical University of Madrid, the RobCib research group has been actively developing advanced robotic systems for SAR operations. The primary focus of this project is the development, implementation, and testing of a Nonlinear Model Predictive Control (NMPC) system for the Unitree AlienGo quadruped robot equipped with a Z1 robotic arm. Through this advanced controller, the main objective is to enhance the robot’s capabilities for navigating unstable terrain and performing tasks such as door opening. The project integrates three modules involving computer vision, planning and NMPC control, and is built in ROS Noetic.

Based on OCS2 and ros-control, the developed NMPC-WBC controller facilitates tasks such as whole-body planning, end-effector motion tracking, and maintaining stability under force disturbances. The project builds upon the remarkable open-source contributions by SkyWoodz and QiayuanL.

NMPC Control Module

NMPC Control System

NMPC is an advanced control strategy designed to handle dynamic systems with nonlinear behavior. It predicts future system behavior and optimizes control actions in real-time. In this project, NMPC is used to control both locomotion and manipulation of the quadruped robot. The NMPC optimization problem is formulated as it follows: {minu()[iϕi(x(ti+1))+titi+1li(x(t),u(t),t)dt]s.t.x(t0)=x0Initial statex˙(t)=fi(x(t),u(t),t)System flow mapx(ti+1+)=j(x(ti+1))System jump mapg1i(x(t),u(t),t)=0State-input equality constraintsg2i(x(t),t)=0State-only equality constraintshi(x(t),u(t),t)0Inequality constraintsfor ti<t<ti+1 and i{0,1,,I1}\left\{ \begin{aligned} & \underset{\bm{u}(\cdot)}{\text{min}} & & \left[ \sum_{i}\phi_i(\bm{x}(t_{i+1})) + \int_{t_i}^{t_{i+1}} l_i(\bm{x}(t), \bm{u}(t), t) \, dt \right] & \\ & \text{s.t.} & & \bm{x}(t_0) = \bm{x}_0 & \text{Initial state} \\ & & & \bm{\dot{x}}(t) = \bm{f}_i(\bm{x}(t), \bm{u}(t), t) & \text{System flow map} \\ & & & \bm{x}(t_{i+1}^+) = \bm{j}(\bm{x}(t_{i+1})) & \text{System jump map} \\ & & & \bm{g}_{1i}(\bm{x}(t), \bm{u}(t), t) = 0 & \text{State-input equality constraints} \\ & & & \bm{g}_{2i}(\bm{x}(t), t) = 0 & \text{State-only equality constraints} \\ & & & \bm{h}_i(\bm{x}(t), \bm{u}(t), t) \geq 0 & \text{Inequality constraints} \\ & & & \text{for } t_i < t < t_{i+1} \text{ and } i \in \{0, 1, \cdots, I-1\} \end{aligned} \right.

In the problem, tit_i is the switching time, and tIt_I is the final time. For each mode, the nonlinear cost function consists of a pre-jump cost, ϕi()\phi_i(\cdot), that indicates the desirability of ending states before a switch occurs; and a cumulative cost over each interval, li()l_i(\cdot), that accounts for example for the error from a reference trajectory. The system flow map is driven by the system dynamics fi()\bm{f}_i(\cdot), and the system jump map, j()\bm{j}(\cdot), defines how the state x\bm{x} changes discontinuously at the time of switching ti+1t_{i+1}, reflecting changes due to new foot contact state.

State and Input Definition

The robot has a total of 18 DoF, 12 corresponding to the quadruped and 6 of the Z1 arm. This way, the optimal control problem is formulated according with the system state x\bm{x} and input u\bm{u} defined as: x=[hcomT,qbT,qjT]T, u=[weT,vjT]T,\bm{x} = [\bm{h}_{com}^T, \bm{q}_{b}^T, \bm{q}_{j}^T]^T, \ \bm{u} = [\bm{w}_{e}^T, \bm{v}_{j}^T]^T,

where hcomR6\bm{h}_{com}\in\mathbb{R}^6 is the collection of the normalized centroidal momentum, qbR6\bm{q}_{b}\in\mathbb{R}^6 is the base pose and qjR18\bm{q}_{j}\in\mathbb{R}^{18} are the joint positions. weR18\bm{w}_{e}\in\mathbb{R}^{18} are the contact wrenches acting on the end-effectors (note that it is assumed that the foot contacts are only subjected to 3-DoF forces and the manipulator end-effector is subjected to a 6-DoF wrench since it can also suffer a torque actuation) and vjR18\bm{v}_{j}\in\mathbb{R}^{18} are the joint velocities.

Cost Formulation

The NMPC cost formulation is simple and just defined by the quadratic cost of tracking the error of all states and the input: L(x,u,t)=xxrefQ2+uurefR2Φ(x)=xIxIrefQ2,\begin{aligned} &L(\bm{x}, \bm{u}, t) = \|\bm{x} - \bm{x}^{ref}\|_{\bm{Q}}^2 + \|\bm{u} - \bm{u}^{ref}\|_{\bm{R}}^2 \\ &\Phi(\bm{x}) = \|\bm{x}_I - \bm{x}_I^{ref}\|_{\bm{Q}}^2, \end{aligned}

where QR30×30\bm{Q}\in\mathbb{R}^{30\times30} is the standard weight matrix, and RR36×36\bm{R}\in\mathbb{R}^{36\times36} is the control weight matrix; both positive definite matrices that act as important tuning parameters of the NMPC planner.

Constraints

With respect to the constraints, four types of state-input equality constraints and four types of inequality constraints are implemented. The state-input equality constraints are the following: {vei=0    eiSfooteZero closed foot contact velocitieswei=0Zero open contact wrenchveizvswing(t)=0    eiSfooteSwinging foot contact trajectorypeipdesired=0    eiSarmeEnd-effector position trackingfee(t)+uinput=0End-effector force tracking\left\{ \begin{aligned} & \bm{v}_{e_i} = \bm{0} \;\mid\; e_i\in\mathcal{S}_{foot}^e & \text{Zero closed foot contact velocities} \\ & \bm{w}_{e_i} = \bm{0}& \text{Zero open contact wrench}\\ & v_{e_i}^z - v_{swing}(t) = 0 \;\mid\; e_i\in\mathcal{S}_{foot}^e & \text{Swinging foot contact trajectory} \\ & \bm{p}_{e_i} - \bm{p}_{\text{desired}} = \bm{0} \;\mid\; e_i\in\mathcal{S}_{arm}^e & \text{End-effector position tracking} \\ & -\bm{f}_{\text{ee}}(t) + \bm{u}_{\text{input}} = 0 & \text{End-effector force tracking} \end{aligned} \right.

where vei\bm{v}_{e_i} is the linear velocity of contact point eie_i expressed in the inertial frame I\mathcal{I}, and Sfoote\mathcal{S}_{foot}^e is the set of foot end-effectors. Sarme\mathcal{S}_{arm}^e is the set of arm end-effectors, and pei\bm{p}_{e_i} and pdesired\bm{p}_{\text{desired}} are the current and desired end-effector positions, respectively. fee(t)\bm{f}_{\text{ee}}(t) is the force exerted on the end-effector and uinput\bm{u}_{\text{input}} the force commands generated by the NMPC system. The equality constraints have the following implications: the foot of a stance leg should not separate or slip with respect to the ground, wrenches wei\bm{w}_{e_i} at open contacts vanish, all swing legs should track a reference trajectory vswing(t)v_{swing}(t) along the surface normal, and the end-effector position tracks the desired one, as well as the externally applied force. The inequality constraints are the following: {qjminqjqjmaxJoint positions operational limitsvjminvjvjmaxJoint velocities operational limitsd(x)ϵ1np×10np×1Robot self-collisions avoidanceμsfeizfeix2+feiy2+δ20    End-effector force inside friction cone\left\{ \begin{aligned} & \bm{q}_j^{min}\leq \bm{q}_j\leq \bm{q}_j^{max} & \text{\small Joint positions operational limits} \\ & \bm{v}_j^{min}\leq \bm{v}_j\leq \bm{v}_j^{max} & \text{\small Joint velocities operational limits} \\ & \bm{d}(\bm{x}) - \epsilon\cdot\bm{1}_{n_{p}\times1}\geq\bm{0}_{n_{p}\times1} & \text{\small Robot self-collisions avoidance}\\ & \mu_s f_{e_{i}}^z - \sqrt{{f_{e_{i}}^x}^2 + {f_{e_{i}}^y}^2 + \delta^2} \geq 0 \;\; & \text{\small End-effector force inside friction cone} \end{aligned} \right.

where d(x)Rnp\bm{d}(\bm{x})\in\mathbb{R}^{n_p} are the signed distances between npn_p pairs of links that are represented with primitive collision bodies, and ϵ0\epsilon\geq0 is the minimum allowable distance between collision pairs. μs\mu_s is the static friction coefficient and δ0\delta\neq0 is needed to smoothen the friction constraints. The inequality constraints have the following implications: the joint positions must not overpass their physical limits, the same applies to the joint velocities, robot’s self-collisions are avoided with a certain margin ϵ\epsilon, and finally, the forces acting on the feet lie in their respective friction cones, that is, the maximum static friction is not overpassed.

Optimization Problem Numerical Solving

To solve this optimal control problem, a direct multiple shooting method is formulated to transcribe the optimal control problem to a non linear program problem, which is solved using SQP. The QP subproblem is solved using HPIPM. The outputs of the NMPC planner are the optimized state and control input, x\bm{x}^* and u\bm{u}^*, which are passed to the WBC controller. The time horizon of the NMPC planner is T=1T=1 s, and the loop computes feedforward trajectories at an average update rate of 100 Hz.

Whole-Body Controller (WBC)

The optimal reference plans for the base and limbs are tracked by a WBC controller that tries to fulfill a set of prioritized tasks. These tasks are formulated as a hierarchical QP that optimizes for the generalized accelerations and contact forces. The purpose of the WBC is not to directly track the optimal ground reaction forces but to capture their influence by tracking the reference motion they induce on the base to keep the robot balanced. This is because the WBC adjusts the optimized forces from the NMPC if they violate any high priority objectives. The only exception are the arm contact forces since the WBC has no knowledge of the manipulated object’s dynamics. WBC only considers the current moment, and each task corresponds to equality or inequality constraints dependent on decision variables. The decision variables for the WBC are: xwbc=[q¨T,weT]T,\bm{x}_{wbc} = {\left[\ddot{\bm{q}}^T, \bm{w}_e^T \right]}^T,

where q¨=[q¨b,q¨j]TR24\ddot{\bm{q}}=\left[\ddot{\bm{q}}_b, \ddot{\bm{q}}_j\right]^T\in\mathbb{R}^{24} are the accelerations of the generalized coordinates. The computation of these decision variables is made through a series of calculations from x\bm{x}^* and u\bm{u}^*. The following table shows the WBC prioritized tasks list:

The WBC solves the QP problem in the null space of the higher priority tasks’ linear constraints and tries to minimize the slacking variables of inequality constraints. This approach can consider the full nonlinear rigid body dynamics and ensure strict hierarchy results with good real-time performance. The outputs from the WBC are the optimized joint position, velocity and torque references for the low-level control module: qj\bm{q}_j^*, q˙j\dot{\bm{q}}_j^*, and τj\bm{\tau}_j^*, respectively. The actuator torque commands, τa\bm{\tau}_{a}, are generated by the low-level control module running at 2 kHz, according to the following equation: τa=τj+Kp(qjqj)+Kd(q˙jq˙j),\bm{\tau}_{a} = \bm{\tau}_j^* + \bm{K}_p(\bm{q}_j^* - \bm{q}_j) + \bm{K}_d(\dot{\bm{q}}_j^* - \dot{\bm{q}}_j),

where Kp\bm{K}_p and Kd\bm{K}_d are matrices with control tuning parameters corresponding to the proportional and differential gains of each of the 18 joints of the quadruped manipulator.

State Estimator

The state estimator consists of a Extended Kalman filter (EKF) which fuses the motor encoder readings, the visual odometry from the AlienGo front camera, and the IMU measurements to estimate the base pose and angular velocities, and the joint positions and velocities. This state estimation is fed into both the whole body planner and the controller. The WBC, along with the state estimator constitute the main control loop running at 500 Hz.

Vision Module

An advanced computer vision system allows the robot to autonomously interact with its environment. The system is designed to detect and manipulate door handles, enabling the robot to open push doors without human intervention. It uses information from two depth cameras to accurately position the robot and manipulate objects in controlled environments. Two YOLOv8 models were trained for door detection and handle classification and segmentation, and four ROS nodes were programmed to perform the different vision-related door opening tasks.

Planning Module

The planner module developed in this project is not a general-purpose navigation planner for obstacle avoidance, as this was beyond the scope of this project. Instead, the primary focus was on the control aspects of the robot, leveraging the existing NMPC controller for locomotion tasks, and developing a specific planner for door manipulation. In general locomotive tasks, such as the SAR environments where the robot will be tested, the trajectory publisher integrated within the NMPC controller assumes the role of the planner, generating references for the base from the end-effector commands. For door-opening tasks, a specialized planner was developed within the qm_planner\texttt{qm\_planner} ROS package.

Simulation and Real-World Testing

Simulations were conducted using the Gazebo platform to evaluate the robot’s performance in various SAR scenarios such as navigating uneven terrain, climbing stairs, and opening doors. In addition to simulations, real-world tests were performed to validate the system’s performance. These tests provided insights into the robot’s capabilities in practical SAR scenarios.

Code


© 2024. All rights reserved.