Position, Orientation and Coordinate Transformations

To represent the relative pose (position and orientation) of one rigid body with respect to another, coordinate frames are attached to each body. Then the geometric relationship between these two coordinate frames is specified.

Position

The position of the origin of one frame with respect to another frame can be described with a translation vector (3x1):

\[\begin{split}\boldsymbol{t} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}\end{split}\]

Orientation

The orientation of one coordinate frame relative to another frame can be described with a rotation matrix (3x3):

\[\begin{split}\boldsymbol{R} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \\ \end{bmatrix}\end{split}\]

Pose

A homogeneous transformation matrix (4x4) can simultaneously represent the position and orientation of one coordinate frame relative to another:

\[\begin{split}\boldsymbol{H} = \begin{bmatrix} \boldsymbol{R} & \boldsymbol{t}\\ 0 & 1\\ \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & x\\ r_{21} & r_{22} & r_{23} & y\\ r_{31} & r_{32} & r_{33} & z\\ 0 & 0 & 0 & 1\\ \end{bmatrix}\end{split}\]
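As a minimal sketch of the block structure above, the following plain-Python snippet assembles a homogeneous matrix from a rotation matrix and a translation vector, and splits it again. The helper names `make_homogeneous` and `split_homogeneous` and the example values are assumptions made for illustration; in practice a linear algebra library would normally be used.

```python
def make_homogeneous(R, t):
    """Assemble a 4x4 homogeneous matrix from a 3x3 rotation R and a 3-element translation t."""
    H = [list(R[i]) + [t[i]] for i in range(3)]  # top three rows: [R | t]
    H.append([0.0, 0.0, 0.0, 1.0])               # bottom row: [0 0 0 1]
    return H


def split_homogeneous(H):
    """Recover the rotation matrix and translation vector from a 4x4 homogeneous matrix."""
    R = [row[:3] for row in H[:3]]
    t = [row[3] for row in H[:3]]
    return R, t


# Illustrative values: a 90 degree rotation about z and an arbitrary translation.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [10.0, 20.0, 30.0]

H = make_homogeneous(R, t)
for row in H:
    print(row)
```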

Roll-Pitch-Yaw

A rigid body possesses at most three rotational degrees of freedom. This means that an arbitrary rigid body orientation can be represented using only three independent quantities. One such minimal representation of orientation is the roll-pitch-yaw representation, defined by roll (\(\phi\)), pitch (\(\theta\)), and yaw (\(\psi\)) angles:

\[\boldsymbol{R}(\phi, \theta, \psi)\]

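The advantage of the roll-pitch-yaw representation is that it requires only three values, which are also geometrically easier to interpret than the values of other representations. The disadvantages are:

  • The values are not continuous: the angles wrap around at the ends of their range, so two nearly identical orientations can have very different angle values.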

  • The final orientation depends on:

    • The order of the rotations.

    • Whether the rotations are applied about moving axes (intrinsic rotations) or fixed axes (extrinsic rotations).

The roll-pitch-yaw representation is common among robot vendors. However, not all vendors use the same convention. Knowing the convention, i.e. the order of the rotations and whether they are intrinsic or extrinsic, is necessary for converting roll-pitch-yaw angles to other representations.
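The effect of the convention can be illustrated with a small sketch. It assumes one particular convention, \(\boldsymbol{R} = \boldsymbol{R}_z(\psi)\boldsymbol{R}_y(\theta)\boldsymbol{R}_x(\phi)\) (extrinsic rotations about fixed x, y, z axes, equivalently intrinsic z-y'-x''), which is only one of several in use and may not match a given robot. The same three angles are then composed in the opposite order to show that the result differs. All function names are hypothetical.

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rpy_to_matrix(roll, pitch, yaw):
    # ASSUMED convention: R = Rz(yaw) * Ry(pitch) * Rx(roll).
    # Check the vendor documentation before reusing this order.
    return matmul(rot_z(yaw), matmul(rot_y(pitch), rot_x(roll)))


roll, pitch, yaw = math.radians(10), math.radians(20), math.radians(30)

R_zyx = rpy_to_matrix(roll, pitch, yaw)
# Same angles, opposite multiplication order: Rx(roll) * Ry(pitch) * Rz(yaw).
R_xyz = matmul(rot_x(roll), matmul(rot_y(pitch), rot_z(yaw)))

# Largest element-wise difference between the two results; it is non-zero,
# so the two conventions describe different orientations.
max_diff = max(abs(R_zyx[i][j] - R_xyz[i][j]) for i in range(3) for j in range(3))
print(max_diff)
```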

Axis-Angle / Rotation Vector

Axis-angle representation describes a rotation using:

  • A unit vector (\(\boldsymbol{u}\)) indicating the direction of the axis of rotation.

  • An angle (\(\theta\)) describing the magnitude of rotation about the axis.

This gives four parameters in total. To reduce the number of parameters to three, it is common to multiply each element of the unit vector by the angle; the result is the rotation vector (\(\boldsymbol{r}\)):

\[\boldsymbol{r} = \begin{bmatrix} r_x & r_y & r_z \end{bmatrix} = \begin{bmatrix} u_x \theta & u_y \theta & u_z \theta \end{bmatrix}\]

The advantage of the axis-angle representation over roll-pitch-yaw angles is that it is free from the continuity and rotation-sequence issues. However, it is hard to relate the numerical values of the rotation vector to the physical orientation. Another disadvantage is that a rotation vector cannot be applied directly to a 3D point; it must first be converted to a different representation, such as a rotation matrix or a unit quaternion.
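One common way to perform such a conversion is Rodrigues' rotation formula, \(\boldsymbol{R} = \boldsymbol{I} + \sin\theta\,\boldsymbol{K} + (1 - \cos\theta)\,\boldsymbol{K}^2\), where \(\boldsymbol{K}\) is the skew-symmetric matrix built from \(\boldsymbol{u}\). The sketch below applies it to a rotation vector. The function name `rotation_vector_to_matrix` and the small-angle threshold are assumptions chosen for illustration.

```python
import math


def rotation_vector_to_matrix(r):
    """Convert a rotation vector r = u * theta to a 3x3 rotation matrix (Rodrigues' formula)."""
    theta = math.sqrt(sum(component * component for component in r))
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if theta < 1e-12:  # assumed threshold: treat a (near) zero vector as no rotation
        return identity

    # Unit axis u and its skew-symmetric (cross-product) matrix K.
    ux, uy, uz = (component / theta for component in r)
    K = [[0.0, -uz, uy],
         [uz, 0.0, -ux],
         [-uy, ux, 0.0]]
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]

    sin_t = math.sin(theta)
    one_minus_cos_t = 1.0 - math.cos(theta)
    return [[identity[i][j] + sin_t * K[i][j] + one_minus_cos_t * K2[i][j]
             for j in range(3)] for i in range(3)]


# Illustrative value: a 90 degree rotation about the z-axis.
r = [0.0, 0.0, math.pi / 2]
R = rotation_vector_to_matrix(r)
for row in R:
    print([round(value, 6) for value in row])
```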

Unit Quaternion

Unit quaternions provide a simple but robust way to encode the axis-angle representation with four parameters:

\[\boldsymbol{q} = \begin{bmatrix} q_{w} & q_{x} & q_{y} & q_{z} \end{bmatrix}\]

Unit quaternions are often considered the best way to represent the orientation between two coordinate frames, because they are more compact (four values instead of nine), more numerically stable, and more efficient than rotation matrices. Unlike the roll-pitch-yaw representation, unit quaternions do not suffer from gimbal lock (a configuration in which the orientation cannot be represented uniquely), and unlike the axis-angle representation, they can be applied directly to 3D points.
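The link to axis-angle and the direct application to a point can be sketched as follows. The quaternion is built as \(q_w = \cos(\theta/2)\), \((q_x, q_y, q_z) = \boldsymbol{u}\sin(\theta/2)\), and a point is rotated with the product \(\boldsymbol{q}\,\boldsymbol{p}\,\boldsymbol{q}^{*}\), where the point is treated as a quaternion with zero scalar part. The element order \([q_w, q_x, q_y, q_z]\) follows the definition above; the function names are hypothetical.

```python
import math


def quaternion_from_axis_angle(u, theta):
    """Unit quaternion (w, x, y, z) for a rotation of theta radians about unit axis u."""
    half = theta / 2.0
    s = math.sin(half)
    return (math.cos(half), u[0] * s, u[1] * s, u[2] * s)


def quaternion_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def rotate_point(q, p):
    """Rotate 3D point p by unit quaternion q using q * p * conj(q)."""
    w, x, y, z = q
    q_conj = (w, -x, -y, -z)
    p_quat = (0.0, p[0], p[1], p[2])  # point as a pure quaternion
    _, px, py, pz = quaternion_multiply(quaternion_multiply(q, p_quat), q_conj)
    return (px, py, pz)


# Illustrative value: rotate the x unit vector 90 degrees about the z-axis.
q = quaternion_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
p_rotated = rotate_point(q, (1.0, 0.0, 0.0))
print([round(value, 6) for value in p_rotated])
```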

Coordinate Transformations

Any coordinate transformation of a rigid body in 3D can be described with a rotation and a translation. For example, a point (or a point cloud) can be transformed from one coordinate frame to another coordinate frame. This can be performed with a rotation matrix and a translation vector:

\[p_{1} = \boldsymbol{R} p_{0} + \boldsymbol{t}\]

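The same can be done with a homogeneous transformation matrix describing the pose between the two frames. In this case the points are expressed in homogeneous coordinates, i.e. as 4x1 vectors with 1 appended as the fourth element: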

\[p_{1} = \boldsymbol{H} p_{0}\]
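The two forms can be compared in a short sketch. It transforms one point with a rotation matrix and translation vector, then with the corresponding homogeneous matrix after appending 1 to the point, and the results agree. The function names and numeric values are assumptions for illustration; a point cloud would be handled by applying the same operation to every point.

```python
def transform_point(R, t, p):
    """Compute p1 = R * p0 + t for a single 3D point."""
    return [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]


def transform_point_homogeneous(H, p):
    """Compute p1 = H * p0, with p0 extended to homogeneous coordinates [x, y, z, 1]."""
    p_h = list(p) + [1.0]
    result_h = [sum(H[i][k] * p_h[k] for k in range(4)) for i in range(4)]
    return result_h[:3]  # drop the trailing 1


# Illustrative values: 90 degree rotation about z and an arbitrary translation.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [10.0, 20.0, 30.0]
H = [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

p0 = [1.0, 2.0, 3.0]
p1 = transform_point(R, t, p0)
p1_h = transform_point_homogeneous(H, p0)
print(p1)    # [8.0, 21.0, 33.0]
print(p1_h)  # [8.0, 21.0, 33.0]
```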