Position, Orientation and Coordinate Transformations

Coordinate frames are attached to rigid bodies to represent the relative pose (position and orientation) between them. The pose of one rigid body relative to another is then specified by the geometric relationship between their two coordinate frames.


The position of the origin of one frame with respect to another frame can be described with a translation vector (3x1):

\[\begin{split}\boldsymbol{t} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}\end{split}\]


The orientation of one coordinate frame relative to another frame can be described with a rotation matrix (3x3):

\[\begin{split}\boldsymbol{R} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \\ \end{bmatrix}\end{split}\]


A homogeneous transformation matrix (4x4) simultaneously represents position and orientation of one coordinate frame relative to another coordinate frame:

\[\begin{split}\boldsymbol{H} = \begin{bmatrix} \boldsymbol{R} & \boldsymbol{t}\\ 0 & 1\\ \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & x\\ r_{21} & r_{22} & r_{23} & y\\ r_{31} & r_{32} & r_{33} & z\\ 0 & 0 & 0 & 1\\ \end{bmatrix}\end{split}\]
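The block structure above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `make_homogeneous` and `split_homogeneous` are hypothetical, and matrices are stored as nested lists rather than with any particular library.

```python
def make_homogeneous(R, t):
    """Assemble a 4x4 homogeneous matrix from a 3x3 rotation R and a 3x1 translation t."""
    H = [list(R[i]) + [t[i]] for i in range(3)]  # rows [r_i1, r_i2, r_i3, t_i]
    H.append([0.0, 0.0, 0.0, 1.0])               # constant bottom row
    return H


def split_homogeneous(H):
    """Recover the rotation block and the translation column from H."""
    R = [row[:3] for row in H[:3]]
    t = [row[3] for row in H[:3]]
    return R, t


# Example values (assumed for illustration): a 90 degree rotation about z
# combined with a translation.
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
t = [10.0, 20.0, 30.0]
H = make_homogeneous(R, t)
```

Keeping the rotation and translation in one matrix means a chain of poses can be combined by ordinary matrix multiplication.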


A rigid body possesses at most three rotational degrees of freedom. In other words, three independent quantities are sufficient to describe an arbitrary rigid body orientation. One such minimal representation of orientation is the roll-pitch-yaw representation, defined by roll (\(\phi\)), pitch (\(\theta\)), and yaw (\(\psi\)) angles:

\[\boldsymbol{R}(\phi, \theta, \psi)\]

The advantage of roll-pitch-yaw representation is that it requires only three values. These values are also geometrically easy to understand compared to other representations. The disadvantages are:

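  • The values are not continuous: the angles wrap around (for example at ±180°), so a small change in orientation can produce a large jump in the values.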

  • The final orientation depends on:

    • The order of applying the rotations.

    • Applying rotations about moving axes (intrinsic rotations) or fixed axes (extrinsic rotations).

The roll-pitch-yaw representation is common among robot vendors. However, not all robot vendors assume the same convention. Understanding the convention, i.e., the order of rotations and whether they are intrinsic or extrinsic, is necessary to convert roll-pitch-yaw angles to other representations.
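To make the dependence on convention concrete, here is a minimal sketch that converts roll-pitch-yaw angles to a rotation matrix. It assumes one particular convention, chosen only for illustration: extrinsic rotations about the fixed x, y, then z axes, which gives \(\boldsymbol{R} = \boldsymbol{R}_z(\psi) \boldsymbol{R}_y(\theta) \boldsymbol{R}_x(\phi)\). A given robot vendor may use a different one. All function names are hypothetical.

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rpy_to_matrix(roll, pitch, yaw):
    """ASSUMED convention: extrinsic x-y-z, i.e. R = Rz(yaw) * Ry(pitch) * Rx(roll).

    The same matrix results from intrinsic rotations in the order z, y', x''.
    """
    return matmul(rot_z(yaw), matmul(rot_y(pitch), rot_x(roll)))


# Same two 90 degree rotations, applied in opposite order.
R_a = rpy_to_matrix(math.pi / 2, math.pi / 2, 0.0)           # Ry * Rx
R_b = matmul(rot_x(math.pi / 2), rot_y(math.pi / 2))         # Rx * Ry
```

`R_a` and `R_b` are different matrices even though both are built from the same two angles, which is why the order and the intrinsic/extrinsic choice must be known before converting.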

Axis-Angle / Rotation Vector

The axis-angle representation describes a rotation with the following:

  • A unit vector (\(\boldsymbol{u}\)) indicating the direction of the rotation axis.

  • An angle (\(\theta\)) describing the magnitude of rotation about the axis.

There are four parameters in total. Multiplying each element of the unit vector by the angle is a common way to reduce the number of parameters to three; the result is the rotation vector (\(\boldsymbol{r}\)):

\[\boldsymbol{r} = \begin{bmatrix} r_x & r_y & r_z \end{bmatrix} = \begin{bmatrix} u_x \theta & u_y \theta & u_z \theta \end{bmatrix}\]

The advantage of the axis-angle representation over roll-pitch-yaw angles is that it is free from continuity and rotational sequence issues. However, it is hard to relate the numerical values of the rotation vector to the physical orientation. Another disadvantage is that a rotation vector cannot be applied to a 3D point with a simple product; in practice it is first converted to a different representation, such as a rotation matrix.
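One common way to perform that conversion is Rodrigues' rotation formula. The sketch below is an illustration in plain Python, not a reference implementation; the function name `rotation_vector_to_matrix` and the small-angle threshold are assumptions.

```python
import math


def rotation_vector_to_matrix(r):
    """Convert a rotation vector r = theta * u into a 3x3 rotation matrix
    using Rodrigues' rotation formula."""
    theta = math.sqrt(sum(x * x for x in r))  # rotation angle = length of r
    if theta < 1e-12:                         # assumed threshold: treat as no rotation
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    ux, uy, uz = (x / theta for x in r)       # unit axis u
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + ux * ux * v,      ux * uy * v - uz * s, ux * uz * v + uy * s],
        [uy * ux * v + uz * s, c + uy * uy * v,      uy * uz * v - ux * s],
        [uz * ux * v - uy * s, uz * uy * v + ux * s, c + uz * uz * v],
    ]


# Example: 90 degrees about the z axis, i.e. r = [0, 0, pi/2].
R = rotation_vector_to_matrix([0.0, 0.0, math.pi / 2])
```

For this example the result is, up to rounding, the familiar matrix that maps the x axis onto the y axis.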

Unit Quaternion

Unit quaternions are a simple but robust way to encode the axis-angle representation with four parameters.

\[\boldsymbol{q} = \begin{bmatrix} q_{w} & q_{x} & q_{y} & q_{z} \end{bmatrix}\]

Unit quaternions are often regarded as the preferred way of representing orientation between two coordinate frames because they are more compact than rotation matrices (four values instead of nine), easier to keep numerically valid (renormalizing a quaternion is simpler than re-orthogonalizing a matrix), and cheaper to compose. Compared to the roll-pitch-yaw representation, unit quaternions do not suffer from gimbal lock (impossibility to represent orientation uniquely). Also, unlike the axis-angle representation, it is possible to apply unit quaternions directly to 3D points.
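As a sketch of how this works, the code below builds a unit quaternion from an axis and an angle using \(q_w = \cos(\theta/2)\) and \((q_x, q_y, q_z) = \boldsymbol{u} \sin(\theta/2)\), then rotates a point with the product \(\boldsymbol{q} \, p \, \boldsymbol{q}^{*}\). It assumes the \((q_w, q_x, q_y, q_z)\) element order shown above and the Hamilton product convention; the function names are hypothetical.

```python
import math


def quaternion_from_axis_angle(u, theta):
    """Unit quaternion (w, x, y, z) for a rotation of theta about unit axis u."""
    half = 0.5 * theta
    s = math.sin(half)
    return (math.cos(half), u[0] * s, u[1] * s, u[2] * s)


def quaternion_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotate_point(q, p):
    """Rotate 3D point p by unit quaternion q via q * (0, p) * conj(q)."""
    qw, qx, qy, qz = q
    conj = (qw, -qx, -qy, -qz)
    _, x, y, z = quaternion_multiply(quaternion_multiply(q, (0.0, *p)), conj)
    return (x, y, z)


# Example: rotate the x unit vector by 90 degrees about the z axis.
q = quaternion_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
p_rotated = rotate_point(q, (1.0, 0.0, 0.0))
```

Vendors and libraries differ in whether the scalar part is stored first or last, so the element order should be checked before exchanging quaternions between systems.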

Coordinate Transformations

Any coordinate transformation of a rigid body in 3D can be described with a rotation and a translation. For example, it is possible to transform a point (or a point cloud) from one coordinate frame to another coordinate frame. This can be performed with a rotation matrix and a translation vector:

\[p_{1} = \boldsymbol{R} p_{0} + \boldsymbol{t}\]

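The same can be done with a homogeneous transformation matrix describing the pose between the two frames, provided the points are written in homogeneous coordinates, i.e., with a 1 appended as the fourth element: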

\[p_{1} = \boldsymbol{H} p_{0}\]
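The two forms can be compared with a short sketch in plain Python. The rotation, translation, and point values are made up for illustration, and the function names are hypothetical; note that the homogeneous version appends a 1 to the point before multiplying and drops it afterwards.

```python
def transform_point(R, t, p):
    """p1 = R * p0 + t for a 3D point p0."""
    return [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]


def transform_point_homogeneous(H, p):
    """p1 = H * p0 with p0 extended to homogeneous coordinates [x, y, z, 1]."""
    ph = list(p) + [1.0]
    out = [sum(H[i][k] * ph[k] for k in range(4)) for i in range(4)]
    return out[:3]  # drop the trailing 1


# Assumed example pose: 90 degree rotation about z plus a translation.
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
t = [10.0, 20.0, 30.0]
H = [[0.0, -1.0, 0.0, 10.0],
     [1.0,  0.0, 0.0, 20.0],
     [0.0,  0.0, 1.0, 30.0],
     [0.0,  0.0, 0.0,  1.0]]

p0 = [1.0, 2.0, 3.0]
p1_rt = transform_point(R, t, p0)
p1_h = transform_point_homogeneous(H, p0)

# A point cloud is transformed by applying the same pose to every point.
cloud = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
cloud_transformed = [transform_point_homogeneous(H, p) for p in cloud]
```

Both calls give the same transformed point, since the homogeneous matrix only packages the rotation and translation into a single multiplication.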