PoSDK Standard Pose Conventions
To keep internal library operations consistent and make interaction with external data unambiguous, PoSDK uses the following standard conventions for camera pose representation.
1. Global Camera Pose (types::GlobalPoses)
In PoSDK, the global pose of a single camera (i.e., the camera’s orientation and position relative to the world coordinate system) is defined by a rotation matrix \(R\) and a translation vector \(\mathbf{t}\). By default, PoSDK uses the PoseFormat::RwTw format.
PoseFormat::RwTw (Default)
Meaning: A 3D feature point \(\mathbf{X}_w\) in the world coordinate system is transformed to a normalized image feature point \(\mathbf{x}_c\) in the camera coordinate system using the formula:
\[\mathbf{x}_c \sim R_w^c \cdot (\mathbf{X}_w - \mathbf{t}_w)\]
Description:
\(R_w^c\) (types::GlobalPoses::rotations[i]): A 3x3 rotation matrix.
Represents the rotation transformation from World coordinate system to Camera coordinate system.
\(\mathbf{t}_w\) (types::GlobalPoses::translations[i]): A 3x1 translation vector.
Represents the position of the camera center in the world coordinate system.
For the \(i\)-th camera, the pose will be written as [\(R_w^i, \mathbf{t}_w^i\)].
PoseFormat::RwTc
PoSDK also supports another common pose representation format PoseFormat::RwTc and provides conversion functions between the two formats.
Meaning: A 3D feature point \(\mathbf{X}_w\) in the world coordinate system is transformed to a normalized image feature point \(\mathbf{x}_c\) in the camera coordinate system using the formula:
\[\mathbf{x}_c \sim R_w^c \cdot \mathbf{X}_w + \mathbf{t}_c\]
Description:
\(R_w^c\): Same definition as in RwTw, representing rotation from world to camera.
\(\mathbf{t}_c\): A 3x1 translation vector representing the coordinates of the world coordinate system origin in the camera coordinate system. Note this differs from \(\mathbf{t}_w\) in RwTw.
Relationship with RwTw: \(\mathbf{t}_c = -R_w^c \cdot \mathbf{t}_w\)
The types::GlobalPoses class provides the member function ConvertPoseFormat(target_format, ...), as well as the convenience functions RwTw_to_RwTc(...) and RwTc_to_RwTw(...), to convert between these two formats.
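The relationship between the two formats can be checked numerically. The following is a minimal sketch, not PoSDK code: it uses plain `std::array` instead of Eigen so that it compiles without dependencies, and the names `Vec3`, `Mat3`, `TwToTc`, `TcToTw`, `WorldToCameraRwTw`, and `WorldToCameraRwTc` are hypothetical helpers introduced only for illustration.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major: R[row][col]

// Matrix-vector product R * v.
Vec3 Mul(const Mat3& R, const Vec3& v) {
    Vec3 out{};
    for (int r = 0; r < 3; ++r)
        out[r] = R[r][0] * v[0] + R[r][1] * v[1] + R[r][2] * v[2];
    return out;
}

// Product R^T * v (for a rotation matrix, R^T is the inverse rotation).
Vec3 MulTransposed(const Mat3& R, const Vec3& v) {
    Vec3 out{};
    for (int c = 0; c < 3; ++c)
        out[c] = R[0][c] * v[0] + R[1][c] * v[1] + R[2][c] * v[2];
    return out;
}

// RwTw -> RwTc: t_c = -R_w^c * t_w.
Vec3 TwToTc(const Mat3& R_wc, const Vec3& t_w) {
    const Vec3 Rt = Mul(R_wc, t_w);
    return {-Rt[0], -Rt[1], -Rt[2]};
}

// RwTc -> RwTw: t_w = -(R_w^c)^T * t_c.
Vec3 TcToTw(const Mat3& R_wc, const Vec3& t_c) {
    const Vec3 Rt = MulTransposed(R_wc, t_c);
    return {-Rt[0], -Rt[1], -Rt[2]};
}

// Camera-frame point under RwTw: X_c = R_w^c * (X_w - t_w).
Vec3 WorldToCameraRwTw(const Mat3& R_wc, const Vec3& t_w, const Vec3& X_w) {
    return Mul(R_wc, {X_w[0] - t_w[0], X_w[1] - t_w[1], X_w[2] - t_w[2]});
}

// Camera-frame point under RwTc: X_c = R_w^c * X_w + t_c.
Vec3 WorldToCameraRwTc(const Mat3& R_wc, const Vec3& t_c, const Vec3& X_w) {
    const Vec3 RX = Mul(R_wc, X_w);
    return {RX[0] + t_c[0], RX[1] + t_c[1], RX[2] + t_c[2]};
}
```

Both `WorldToCamera...` helpers should return the same camera-frame point when `t_c` is obtained from `t_w` via `TwToTc`, which is a quick way to confirm that a pose has been converted rather than merely relabeled.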
2. Relative Camera Pose (types::RelativePose)
The relative pose between two cameras (view \(i\) and view \(j\)) is defined by a rotation matrix \(R_{ij}\) and a translation vector \(\mathbf{t}_{ij}\).
Meaning: A normalized image feature observation \(\mathbf{x}_i\) in the camera \(i\) coordinate system is transformed to the corresponding point \(\mathbf{x}_j\) in the camera \(j\) coordinate system using the formula:
\[\mathbf{x}_j \sim R_{ij} \cdot \mathbf{x}_i + \mathbf{t}_{ij}\]
Description:
\(R_{ij}\) (types::RelativePose::Rij): A 3x3 rotation matrix.
Represents the rotation transformation from Camera \(i\) coordinate system to Camera \(j\) coordinate system.
\(\mathbf{t}_{ij}\) (types::RelativePose::tij): A 3x1 translation vector.
Represents the coordinates of camera \(i\)’s origin in camera \(j\) coordinate system.
3. Convert GlobalPoses to RelativePose
If the global poses of two cameras are known (using PoSDK’s default RwTw format as an example):
Camera \(i\): \((R_w^i, \mathbf{t}_w^i)\)
Camera \(j\): \((R_w^j, \mathbf{t}_w^j)\)
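Then their relative pose \((R_{ij}, \mathbf{t}_{ij})\) is calculated as follows:
Relative rotation \(R_{ij}\):
\[R_{ij} = R_w^j \cdot (R_w^i)^T\]
Relative translation \(\mathbf{t}_{ij}\):
\[\mathbf{t}_{ij} = R_w^j \cdot (\mathbf{t}_w^i - \mathbf{t}_w^j)\]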
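As an illustration of this conversion, the sketch below computes \(R_{ij} = R_w^j (R_w^i)^T\) and \(\mathbf{t}_{ij} = R_w^j (\mathbf{t}_w^i - \mathbf{t}_w^j)\) from two global poses in RwTw format. These expressions follow from the RwTw and relative pose definitions above. The code is a standard-library sketch only: `RelativePoseSketch` and `GlobalToRelative` are hypothetical names, not part of the PoSDK API.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major: R[row][col]

// Hypothetical stand-in for a relative pose (Rij, tij).
struct RelativePoseSketch {
    Mat3 Rij;
    Vec3 tij;
};

// Inputs are global poses in RwTw format:
//   Ri, Rj: world-to-camera rotations; ti, tj: camera centers in the world frame.
RelativePoseSketch GlobalToRelative(const Mat3& Ri, const Vec3& ti,
                                    const Mat3& Rj, const Vec3& tj) {
    RelativePoseSketch rel{};
    // R_ij = R_j * R_i^T, so entry (r, c) is the dot product of
    // row r of R_j with row c of R_i.
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            rel.Rij[r][c] = Rj[r][0] * Ri[c][0] + Rj[r][1] * Ri[c][1] +
                            Rj[r][2] * Ri[c][2];
    // t_ij = R_j * (t_i - t_j): camera i's center expressed in camera j's frame.
    const Vec3 d{ti[0] - tj[0], ti[1] - tj[1], ti[2] - tj[2]};
    for (int r = 0; r < 3; ++r)
        rel.tij[r] = Rj[r][0] * d[0] + Rj[r][1] * d[1] + Rj[r][2] * d[2];
    return rel;
}
```

Passing the same pose for both cameras should yield the identity rotation and a zero translation, which makes a convenient sanity check.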
4. Image Coordinate Forms
4.1 Image Pixel Coordinates (2D)
Pixel coordinates: \(\mathbf{p} = (u, v)^T\), origin at top-left corner of image, \(u\) to the right, \(v\) downward.
Units are pixels
In PoSDK typically represented as std::vector<cv::Point2f> or Eigen::Matrix<double, 2, N>
4.2 Homogeneous Pixel Coordinates (3D)
Homogeneous pixel coordinates: \(\mathbf{p}_h = (u, v, 1)^T\), homogeneous representation of pixel coordinates.
Convenient for matrix multiplication transformations
After transformation, normalization is required through the third component
For a world point \(\mathbf{X}_w\) and a camera pose in the default RwTw format, the projection to homogeneous pixel coordinates is:
\[\mathbf{p}_h \sim K \cdot [R_w^c \mid -R_w^c \cdot \mathbf{t}_w] \cdot \begin{bmatrix} \mathbf{X}_w \\ 1 \end{bmatrix}\]
Here \(K\) is the camera intrinsic matrix:
\[K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}\]
where \(f_x, f_y\) are the focal lengths in pixels and \(c_x, c_y\) are the principal point coordinates.
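The projection and the final division by the third component can be sketched as follows. This is an illustrative standard-library example, assuming a zero-skew intrinsic matrix and a point in front of the camera (nonzero depth). `Intrinsics`, `Pixel`, and `ProjectRwTw` are hypothetical names, not PoSDK types.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major: R[row][col]

// Zero-skew pinhole intrinsics: K = [fx 0 cx; 0 fy cy; 0 0 1].
struct Intrinsics {
    double fx, fy, cx, cy;
};

struct Pixel {
    double u, v;
};

// Projects a world point to pixel coordinates for a pose in RwTw format.
// Assumes the point has nonzero depth in the camera frame.
Pixel ProjectRwTw(const Intrinsics& K, const Mat3& R_wc, const Vec3& t_w,
                  const Vec3& X_w) {
    // X_c = R_w^c * (X_w - t_w), equivalent to [R | -R t_w] * [X_w; 1].
    const Vec3 d{X_w[0] - t_w[0], X_w[1] - t_w[1], X_w[2] - t_w[2]};
    Vec3 X_c{};
    for (int r = 0; r < 3; ++r)
        X_c[r] = R_wc[r][0] * d[0] + R_wc[r][1] * d[1] + R_wc[r][2] * d[2];
    // p_h = K * X_c (homogeneous pixel coordinates, defined up to scale).
    const double ph0 = K.fx * X_c[0] + K.cx * X_c[2];
    const double ph1 = K.fy * X_c[1] + K.cy * X_c[2];
    const double ph2 = X_c[2];
    // Normalize through the third component to obtain (u, v).
    return {ph0 / ph2, ph1 / ph2};
}
```

A point on the optical axis projects to the principal point \((c_x, c_y)\) regardless of its depth, which is an easy property to test against.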
4.3 Normalized Image Coordinates (3D)
Normalized image coordinates: \(\mathbf{x} = (x, y, 1)^T\), representing a vector from the camera center pointing to a point on the image.
Dimensionless: distances on the image plane are measured in units of the focal length (the image plane sits at depth \(z = 1\))
Obtained from pixel coordinates through the inverse transformation of camera intrinsic matrix \(K\):
\[\mathbf{x} \sim K^{-1} \cdot \mathbf{p}_h\]
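For a zero-skew intrinsic matrix, \(K^{-1}\) has a closed form, so the conversion reduces to subtracting the principal point and dividing by the focal length. The sketch below assumes that zero-skew model. `Intrinsics` and `PixelToNormalized` are hypothetical names used only for illustration.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;

// Zero-skew pinhole intrinsics: K = [fx 0 cx; 0 fy cy; 0 0 1].
struct Intrinsics {
    double fx, fy, cx, cy;
};

// Applies K^{-1} to the homogeneous pixel (u, v, 1)^T.
// For zero skew, K^{-1} = [1/fx 0 -cx/fx; 0 1/fy -cy/fy; 0 0 1],
// so the third component stays 1 and no rescaling is needed.
Vec3 PixelToNormalized(const Intrinsics& K, double u, double v) {
    return {(u - K.cx) / K.fx, (v - K.cy) / K.fy, 1.0};
}
```

With \(f_x = f_y = 500\), \(c_x = 320\), \(c_y = 240\), the pixel \((445, 490)\) maps to \((0.25, 0.5, 1)^T\).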
4.4 Bearing Vector (3D)
Bearing vector: \(\mathbf{b} = \frac{\mathbf{x}}{\|\mathbf{x}\|}\)
Obtained by scaling the normalized image coordinates \(\mathbf{x}\) to unit length
Unit vector (magnitude is always 1)
Represents the unit direction vector from camera center to image feature point
In PoSDK, types::BearingVectors is an Eigen::Matrix<double, 3, Eigen::Dynamic>, where each column is a 3D bearing vector
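Computing a bearing vector from normalized image coordinates is a single division by the Euclidean norm. The following is a minimal sketch for one vector using only the standard library. `ToBearing` is a hypothetical helper, and PoSDK itself stores bearing vectors as columns of an Eigen matrix rather than as `std::array`.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Scales x to unit length: b = x / ||x||.
// Normalized image coordinates have third component 1, so ||x|| >= 1
// and the division is always well defined for such inputs.
Vec3 ToBearing(const Vec3& x) {
    const double norm = std::sqrt(x[0] * x[0] + x[1] * x[1] + x[2] * x[2]);
    return {x[0] / norm, x[1] / norm, x[2] / norm};
}
```

For example, \(\mathbf{x} = (2, 2, 1)^T\) has norm 3, so the bearing vector is \((2/3, 2/3, 1/3)^T\).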