PoSDK Standard Pose Conventions

To keep the library's internal operations consistent and to make exchanges with external data unambiguous, PoSDK adopts the following standard conventions for camera pose representation.

1. Global Camera Pose (`types::GlobalPoses`)

In PoSDK, the global pose of a single camera (i.e., the camera’s orientation and position relative to the world coordinate system) is defined by a rotation matrix \(R\) and a translation vector \(\mathbf{t}\). By default, PoSDK uses the PoseFormat::RwTw format.

PoseFormat::RwTw (Default)

Meaning: A 3D feature point \(\mathbf{X}_w\) in the world coordinate system is transformed to a normalized image feature point \(\mathbf{x}_c\) in the camera coordinate system using the formula:

\[\mathbf{x}_c \sim R_w^c \cdot (\mathbf{X}_w - \mathbf{t}_w)\]

Description:
- \(R_w^c\) (types::GlobalPoses::rotations[i]):
  - A 3x3 rotation matrix.
  - Represents the rotation transformation from World coordinate system to Camera coordinate system.
- \(\mathbf{t}_w\) (types::GlobalPoses::translations[i]):
  - A 3x1 translation vector.
  - Represents the position of the camera center in the world coordinate system.
- For the \(i\)-th camera, the pose is written as \([R_w^i, \mathbf{t}_w^i]\).
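The RwTw formula can be sketched in a few lines of C++. This is a minimal illustration, not PoSDK code: to stay self-contained it uses `std::array` instead of Eigen, and the names `Vec3`, `Mat3`, `MatVec`, and `ProjectRwTw` are hypothetical. It subtracts the camera center, rotates into the camera frame, and divides by the depth to obtain normalized image coordinates.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;  // row-major 3x3 matrix

// Returns M * v.
Vec3 MatVec(const Mat3& M, const Vec3& v) {
    Vec3 y{0.0, 0.0, 0.0};
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) {
            y[r] += M[r][c] * v[c];
        }
    }
    return y;
}

// RwTw convention: the point in camera coordinates is R_w^c * (X_w - t_w),
// where t_w is the camera center in world coordinates. Dividing by the depth
// (third component) gives the normalized image point (x, y, 1).
Vec3 ProjectRwTw(const Mat3& R_wc, const Vec3& t_w, const Vec3& X_w) {
    const Vec3 diff{X_w[0] - t_w[0], X_w[1] - t_w[1], X_w[2] - t_w[2]};
    const Vec3 X_c = MatVec(R_wc, diff);
    assert(X_c[2] > 0.0 && "point must lie in front of the camera");
    return {X_c[0] / X_c[2], X_c[1] / X_c[2], 1.0};
}
```

For example, with \(R_w^c = I\) and \(\mathbf{t}_w = (0, 0, -1)^T\), the world point \((1, 2, 1)^T\) lies at depth 2 in front of the camera and maps to \(\mathbf{x}_c = (0.5, 1, 1)^T\).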

PoseFormat::RwTc

PoSDK also supports another common pose representation, PoseFormat::RwTc, and provides functions to convert between the two formats.

Meaning: A 3D feature point \(\mathbf{X}_w\) in the world coordinate system is transformed to a normalized image feature point \(\mathbf{x}_c\) in the camera coordinate system using the formula:

\[\mathbf{x}_c \sim R_w^c \cdot \mathbf{X}_w + \mathbf{t}_c\]

Description:
- \(R_w^c\): Same definition as in RwTw, representing rotation from world to camera.
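- \(\mathbf{t}_c\): A 3x1 translation vector representing the coordinates of the world coordinate system origin in the camera coordinate system. Note that this differs from \(\mathbf{t}_w\) in RwTw.
- Relationship with RwTw: expanding \(R_w^c \cdot (\mathbf{X}_w - \mathbf{t}_w)\) gives \(\mathbf{t}_c = -R_w^c \cdot \mathbf{t}_w\), and conversely \(\mathbf{t}_w = -(R_w^c)^T \cdot \mathbf{t}_c\).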

The types::GlobalPoses class provides the member function ConvertPoseFormat(target_format, ...), as well as the convenience functions RwTw_to_RwTc(...) and RwTc_to_RwTw(...), to convert between the two formats.
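The relationship between the two translation conventions can be sketched as follows. The names `TwToTc`, `TcToTw`, and `IsOrthonormal` are hypothetical stand-ins, not the PoSDK functions named above, and the matrices are plain `std::array` values. The reverse conversion uses \((R_w^c)^{-1} = (R_w^c)^T\), so it asserts that the rotation is orthonormal.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;  // row-major 3x3 matrix

// True if R * R^T equals the identity within tol, so that R^T = R^{-1}.
bool IsOrthonormal(const Mat3& R, double tol = 1e-9) {
    for (int r = 0; r < 3; ++r) {
        for (int s = 0; s < 3; ++s) {
            double dot = 0.0;
            for (int c = 0; c < 3; ++c) dot += R[r][c] * R[s][c];
            if (std::abs(dot - (r == s ? 1.0 : 0.0)) > tol) return false;
        }
    }
    return true;
}

// RwTw -> RwTc: t_c = -R_w^c * t_w (world origin expressed in the camera frame).
Vec3 TwToTc(const Mat3& R_wc, const Vec3& t_w) {
    Vec3 t_c{0.0, 0.0, 0.0};
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) t_c[r] -= R_wc[r][c] * t_w[c];
    }
    return t_c;
}

// RwTc -> RwTw: t_w = -(R_w^c)^T * t_c (camera center in world coordinates).
// Relies on (R_w^c)^{-1} = (R_w^c)^T, which holds only for a rotation matrix.
Vec3 TcToTw(const Mat3& R_wc, const Vec3& t_c) {
    assert(IsOrthonormal(R_wc) && "R_w^c must be a rotation matrix");
    Vec3 t_w{0.0, 0.0, 0.0};
    for (int r = 0; r < 3; ++r) {
        // (R^T t_c)[c] = sum over r of R[r][c] * t_c[r]
        for (int c = 0; c < 3; ++c) t_w[c] -= R_wc[r][c] * t_c[r];
    }
    return t_w;
}
```

Converting \(\mathbf{t}_w = (1, 2, 3)^T\) with a 90° rotation about the \(z\)-axis gives \(\mathbf{t}_c = (2, -1, -3)^T\), and converting back recovers \(\mathbf{t}_w\).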

2. Relative Camera Pose (`types::RelativePose`)

The relative pose between two cameras (view \(i\) and view \(j\)) is defined by a rotation matrix \(R_{ij}\) and a translation vector \(\mathbf{t}_{ij}\).

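Meaning: A 3D point with coordinates \(\mathbf{X}_i\) in the camera \(i\) coordinate system is transformed to its coordinates \(\mathbf{X}_j\) in the camera \(j\) coordinate system using the formula:

\[\mathbf{X}_j = R_{ij} \cdot \mathbf{X}_i + \mathbf{t}_{ij}\]

In terms of the normalized image observations \(\mathbf{x}_i\) and \(\mathbf{x}_j\), which satisfy \(\mathbf{X}_i = \lambda_i \mathbf{x}_i\) and \(\mathbf{X}_j = \lambda_j \mathbf{x}_j\) with depths \(\lambda_i, \lambda_j > 0\), this reads \(\lambda_j \mathbf{x}_j = R_{ij} \cdot \lambda_i \mathbf{x}_i + \mathbf{t}_{ij}\). The depth \(\lambda_i\) cannot be dropped, because \(\mathbf{t}_{ij}\) does not scale with it.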

Description:
- \(R_{ij}\) (types::RelativePose::Rij):
  - A 3x3 rotation matrix.
  - Represents the rotation transformation from the camera \(i\) coordinate system to the camera \(j\) coordinate system.
- \(\mathbf{t}_{ij}\) (types::RelativePose::tij):
  - A 3x1 translation vector.
  - Represents the coordinates of camera \(i\)’s center (the origin of its coordinate system) in the camera \(j\) coordinate system.

3. Convert `GlobalPoses` to `RelativePose`

If the global poses of two cameras are known (using PoSDK’s default RwTw format as an example):

Camera \(i\): \((R_w^i, \mathbf{t}_w^i)\)
Camera \(j\): \((R_w^j, \mathbf{t}_w^j)\)

Then their relative pose \((R_{ij}, \mathbf{t}_{ij})\) is calculated as follows:

Relative rotation \(R_{ij}\):

\[R_{ij} = R_w^j \cdot (R_w^i)^T\]

Relative translation \(\mathbf{t}_{ij}\):

\[\mathbf{t}_{ij} = R_w^j \cdot (\mathbf{t}_w^i - \mathbf{t}_w^j)\]
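Both formulas follow from writing a point's camera-frame coordinates as \(\mathbf{X}_i = R_w^i (\mathbf{X}_w - \mathbf{t}_w^i)\), solving for \(\mathbf{X}_w = (R_w^i)^T \mathbf{X}_i + \mathbf{t}_w^i\), and substituting into \(\mathbf{X}_j = R_w^j (\mathbf{X}_w - \mathbf{t}_w^j)\). A minimal C++ sketch of the conversion follows; the names `RelPose`, `MulTransposed`, and `RelativeFromGlobal` are hypothetical, and `std::array` replaces PoSDK's own types.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;  // row-major 3x3 matrix

// Relative pose mapping camera-i coordinates to camera-j coordinates.
struct RelPose {
    Mat3 R_ij;
    Vec3 t_ij;
};

// Returns M * v.
Vec3 MatVec(const Mat3& M, const Vec3& v) {
    Vec3 y{0.0, 0.0, 0.0};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c) y[r] += M[r][c] * v[c];
    return y;
}

// Returns A * B^T.
Mat3 MulTransposed(const Mat3& A, const Mat3& B) {
    Mat3 C{};  // zero-initialized
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            for (int k = 0; k < 3; ++k) C[r][c] += A[r][k] * B[c][k];
    return C;
}

// True if R * R^T equals the identity within tol.
bool IsOrthonormal(const Mat3& R, double tol = 1e-9) {
    const Mat3 RRt = MulTransposed(R, R);
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            if (std::abs(RRt[r][c] - (r == c ? 1.0 : 0.0)) > tol) return false;
    return true;
}

// Inputs are global poses in RwTw format (t_i, t_j are the camera centers).
// R_ij = R_j * R_i^T,  t_ij = R_j * (t_i - t_j).
RelPose RelativeFromGlobal(const Mat3& R_i, const Vec3& t_i,
                           const Mat3& R_j, const Vec3& t_j) {
    assert(IsOrthonormal(R_i) && "R_i^T is used as the inverse of R_i");
    const Vec3 dt{t_i[0] - t_j[0], t_i[1] - t_j[1], t_i[2] - t_j[2]};
    return {MulTransposed(R_j, R_i), MatVec(R_j, dt)};
}
```

With camera \(i\) at the world origin (\(R_w^i = I\), \(\mathbf{t}_w^i = \mathbf{0}\)) and camera \(j\) rotated 90° about \(z\) with center \((1, 0, 0)^T\), the sketch gives \(\mathbf{t}_{ij} = (0, -1, 0)^T\), and \(R_{ij}\mathbf{X}_i + \mathbf{t}_{ij}\) for the world point \((2, 3, 4)^T\) matches the direct computation \(R_w^j(\mathbf{X}_w - \mathbf{t}_w^j) = (-3, 1, 4)^T\).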

4. Image Coordinate Forms

4.1 Image Pixel Coordinates (2D)

Pixel coordinates: \(\mathbf{p} = (u, v)^T\), with the origin at the top-left corner of the image, \(u\) pointing right and \(v\) pointing down.
- Units are pixels
- Typically represented in PoSDK as std::vector<cv::Point2f> or Eigen::Matrix<double, 2, N>

4.2 Homogeneous Pixel Coordinates (3D)

Homogeneous pixel coordinates: \(\mathbf{p}_h = (u, v, 1)^T\), the homogeneous form of the pixel coordinates.
- Convenient for expressing projections as matrix multiplications
- After a transformation, divide by the third component to recover \((u, v)\); for example, a world point \(\mathbf{X}_w\) observed by a camera with RwTw pose \((R_w^c, \mathbf{t}_w)\) projects as
\[\mathbf{p}_h \sim K \cdot [R_w^c \mid -R_w^c \cdot \mathbf{t}_w] \cdot \begin{bmatrix} \mathbf{X}_w \\ 1 \end{bmatrix}\]

Here \(K\) is the camera intrinsic matrix \(K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}\)

where \(f_x, f_y\) are the focal lengths and \((c_x, c_y)\) is the principal point, all expressed in pixels.
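The projection equation can be sketched as a small C++ function that returns pixel coordinates. It is a minimal example under the pinhole model with the RwTw pose convention; the names `MatVec` and `ProjectToPixel` are hypothetical, and `std::array` replaces the Eigen and OpenCV types listed for pixel coordinates.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;  // row-major 3x3 matrix

// Returns M * v.
Vec3 MatVec(const Mat3& M, const Vec3& v) {
    Vec3 y{0.0, 0.0, 0.0};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c) y[r] += M[r][c] * v[c];
    return y;
}

// p_h ~ K * R_w^c * (X_w - t_w), which equals K * [R_w^c | -R_w^c * t_w] * [X_w; 1].
// Returns the pixel coordinates (u, v) after dividing by the third component.
std::array<double, 2> ProjectToPixel(const Mat3& K, const Mat3& R_wc,
                                     const Vec3& t_w, const Vec3& X_w) {
    const Vec3 diff{X_w[0] - t_w[0], X_w[1] - t_w[1], X_w[2] - t_w[2]};
    const Vec3 X_c = MatVec(R_wc, diff);  // point in camera coordinates
    assert(X_c[2] > 0.0 && "point must lie in front of the camera");
    const Vec3 p_h = MatVec(K, X_c);      // homogeneous pixel coordinates
    return {p_h[0] / p_h[2], p_h[1] / p_h[2]};
}
```

With \(f_x = f_y = 100\), \((c_x, c_y) = (320, 240)\), \(R_w^c = I\), and \(\mathbf{t}_w = (0, 0, -1)^T\), the world point \((1, 2, 1)^T\) gives \(\mathbf{p}_h \sim (740, 680, 2)^T\), i.e. pixel \((370, 340)\).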

4.3 Normalized Image Coordinates (3D)

Normalized image coordinates: \(\mathbf{x} = (x, y, 1)^T\), the point where the viewing ray through an image feature meets the plane \(z = 1\) of the camera coordinate system, i.e. a (non-unit) vector from the camera center toward the feature.
- Dimensionless: pixel offsets from the principal point divided by the focal length, \(x = (u - c_x)/f_x\) and \(y = (v - c_y)/f_y\)
- Obtained from pixel coordinates through the inverse of the camera intrinsic matrix \(K\):
\[\mathbf{x} \sim K^{-1} \cdot \mathbf{p}_h\]
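For the zero-skew \(K\) defined for homogeneous pixel coordinates, the inverse has the closed form \(K^{-1} = \begin{bmatrix} 1/f_x & 0 & -c_x/f_x \\ 0 & 1/f_y & -c_y/f_y \\ 0 & 0 & 1 \end{bmatrix}\), so the mapping needs no general matrix inversion. A minimal sketch; the `Intrinsics` struct and `PixelToNormalized` function are hypothetical, not PoSDK API:

```cpp
#include <array>
#include <cassert>

// Hypothetical pinhole intrinsics with zero skew, all in pixels.
struct Intrinsics {
    double fx, fy, cx, cy;
};

// x = K^{-1} * (u, v, 1)^T using the closed-form inverse; the third
// component is exactly 1 because the last row of K is (0, 0, 1).
std::array<double, 3> PixelToNormalized(const Intrinsics& K, double u, double v) {
    assert(K.fx != 0.0 && K.fy != 0.0 && "focal lengths must be nonzero");
    return {(u - K.cx) / K.fx, (v - K.cy) / K.fy, 1.0};
}
```

Applied to pixel \((370, 340)\) with \(f_x = f_y = 100\) and \((c_x, c_y) = (320, 240)\), it returns \((0.5, 1, 1)^T\).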

4.4 Bearing Vector (3D)

Bearing Vector: \(\mathbf{b} = \frac{\mathbf{x}}{\|\mathbf{x}\|}\)
- Obtained by scaling the normalized image coordinates \(\mathbf{x}\) to unit length, so \(\|\mathbf{b}\| = 1\)
- Represents the unit direction vector from the camera center to the image feature point
- In PoSDK, types::BearingVectors is an Eigen::Matrix<double, 3, Eigen::Dynamic>, where each column is a 3D bearing vector
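Computing a bearing vector is a single normalization. A minimal sketch with a hypothetical `ToBearing` helper on `std::array`; for a `types::BearingVectors` matrix the same normalization applies to each column.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// b = x / ||x||: scale the normalized image coordinates to unit length.
Vec3 ToBearing(const Vec3& x) {
    const double n = std::sqrt(x[0] * x[0] + x[1] * x[1] + x[2] * x[2]);
    assert(n > 0.0 && "cannot normalize a zero vector");
    return {x[0] / n, x[1] / n, x[2] / n};
}
```

For \(\mathbf{x} = (0, 0.75, 1)^T\), \(\|\mathbf{x}\| = 1.25\) and \(\mathbf{b} = (0, 0.6, 0.8)^T\).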