# PoSDK Standard Pose Conventions

To keep the library's internal operations consistent and its interaction with external data unambiguous, PoSDK uses the following standard conventions for camera pose representation.

## 1. Global Camera Pose (`types::GlobalPoses`)

In PoSDK, the global pose of a single camera (i.e., the camera's orientation and position relative to the world coordinate system) is defined by a rotation matrix $R$ and a translation vector $\mathbf{t}$. By default, PoSDK uses the **`PoseFormat::RwTw`** format.

### PoseFormat::RwTw (Default)

- **Meaning**: A 3D feature point $\mathbf{X}_w$ in the world coordinate system is transformed to a normalized image feature point $\mathbf{x}_c$ in the camera coordinate system using the formula:
  $$\mathbf{x}_c \sim R_w^c \cdot (\mathbf{X}_w - \mathbf{t}_w)$$
- **Description**:
  - $R_w^c$ (`types::GlobalPoses::rotations[i]`):
    - A 3x3 rotation matrix.
    - Represents the rotation transformation from the **World coordinate system** to the **Camera coordinate system**.
  - $\mathbf{t}_w$ (`types::GlobalPoses::translations[i]`):
    - A 3x1 translation vector.
    - Represents **the position of the camera center in the world coordinate system**.
  - For the $i$-th camera, the pose is written as $(R_w^i, \mathbf{t}_w^i)$.

### PoseFormat::RwTc

PoSDK also supports another common pose representation format, **`PoseFormat::RwTc`**, and provides conversion functions between the two formats.

- **Meaning**: A 3D feature point $\mathbf{X}_w$ in the world coordinate system is transformed to a normalized image feature point $\mathbf{x}_c$ in the camera coordinate system using the formula:
  $$\mathbf{x}_c \sim R_w^c \cdot \mathbf{X}_w + \mathbf{t}_c$$
- **Description**:
  - $R_w^c$: Same definition as in `RwTw`, representing the rotation from world to camera.
  - $\mathbf{t}_c$: A 3x1 translation vector representing **the coordinates of the world coordinate system origin in the camera coordinate system**. Note that this differs from $\mathbf{t}_w$ in `RwTw`.
- **Relationship with RwTw**: $\mathbf{t}_c = -R_w^c \cdot \mathbf{t}_w$

> The `types::GlobalPoses` class provides the member function `ConvertPoseFormat(target_format, ...)`, as well as the convenience functions `RwTw_to_RwTc(...)` and `RwTc_to_RwTw(...)`, to convert between these two formats.

## 2. Relative Camera Pose (`types::RelativePose`)

The relative pose between two cameras (view $i$ and view $j$) is defined by a rotation matrix $R_{ij}$ and a translation vector $\mathbf{t}_{ij}$.

- **Meaning**: A normalized image feature observation $\mathbf{x}_i$ in the camera $i$ coordinate system is transformed to the corresponding point $\mathbf{x}_j$ in the camera $j$ coordinate system using the formula:
  $$\mathbf{x}_j \sim R_{ij} \cdot \mathbf{x}_i + \mathbf{t}_{ij}$$
- **Description**:
  - $R_{ij}$ (`types::RelativePose::Rij`):
    - A 3x3 rotation matrix.
    - Represents the rotation transformation from the **Camera $i$ coordinate system** to the **Camera $j$ coordinate system**.
  - $\mathbf{t}_{ij}$ (`types::RelativePose::tij`):
    - A 3x1 translation vector.
    - Represents **the coordinates of camera $i$'s origin in the camera $j$ coordinate system**.

## 3. Convert `GlobalPoses` to `RelativePose`

If the global poses of two cameras are known (using PoSDK's default `RwTw` format as an example):

- Camera $i$: $(R_w^i, \mathbf{t}_w^i)$
- Camera $j$: $(R_w^j, \mathbf{t}_w^j)$

Then their relative pose $(R_{ij}, \mathbf{t}_{ij})$ is calculated as follows:

1. **Relative rotation $R_{ij}$**:
   $$R_{ij} = R_w^j \cdot (R_w^i)^T$$
2. **Relative translation $\mathbf{t}_{ij}$**:
   $$\mathbf{t}_{ij} = R_w^j \cdot (\mathbf{t}_w^i - \mathbf{t}_w^j)$$

## 4. Image Coordinate Forms

### 4.1 Image Pixel Coordinates (2D)

- **Pixel coordinates**: $\mathbf{p} = (u, v)^T$, origin at the top-left corner of the image, $u$ to the right, $v$ downward.
  - Units are pixels
  - In PoSDK typically represented as `std::vector` or `Eigen::Matrix`

### 4.2 Homogeneous Pixel Coordinates (3D)

- **Homogeneous pixel coordinates**: $\mathbf{p}_h = (u, v, 1)^T$, homogeneous representation of pixel coordinates.
  - Convenient for matrix multiplication transformations
  - After a transformation, the result must be normalized by dividing by its third component

Using the default `RwTw` pose, a world point $\mathbf{X}_w$ projects to homogeneous pixel coordinates as:

$$\mathbf{p}_h \sim K \cdot [R_w^c \mid -R_w^c \cdot \mathbf{t}_w] \cdot \begin{bmatrix} \mathbf{X}_w \\ 1 \end{bmatrix}$$

Here $K$ is the camera intrinsic matrix

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

where $f_x, f_y$ are the focal lengths and $c_x, c_y$ are the principal point coordinates.

### 4.3 Normalized Image Coordinates (3D)

- **Normalized image coordinates**: $\mathbf{x} = (x, y, 1)^T$, representing a vector from the camera center pointing to a point on the image.
  - Dimensionless: pixel offsets from the principal point divided by the focal length
  - Obtained from pixel coordinates through the inverse of the camera intrinsic matrix $K$:
    $$\mathbf{x} \sim K^{-1} \cdot \mathbf{p}_h$$

### 4.4 Bearing Vector (3D)

- **Bearing vector**: $\mathbf{b} = \frac{\mathbf{x}}{\lVert \mathbf{x} \rVert}$
  - Result of normalizing the normalized image coordinates $\mathbf{x}$ to unit length
  - Unit vector with constant magnitude of 1
  - Represents the unit direction vector from the camera center to the image feature point
  - In PoSDK, `types::BearingVectors` is an `Eigen::Matrix`, where each column is a 3D bearing vector
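
The conventions above can be checked numerically. The following is a minimal sketch in plain Python, not PoSDK code: the helper names (`rwtw_to_rwtc`, `relative_pose`, `pixel_to_bearing`) and all numeric values are hypothetical and chosen only for illustration. It assumes a zero-skew intrinsic matrix $K$ and works with unnormalized camera-frame coordinates, so the relations hold with exact equality rather than only up to scale.

```python
import math

def matvec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[r][c] * v[c] for c in range(3)) for r in range(3)]

def matmul(A, B):
    """Multiply two 3x3 matrices."""
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def transpose(M):
    return [[M[c][r] for c in range(3)] for r in range(3)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def rot_z(theta):
    """Rotation about the z axis, used here only to build test rotations."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rwtw_to_rwtc(R, t_w):
    """Hypothetical helper: t_c = -R_w^c * t_w, rotation unchanged."""
    return R, [-x for x in matvec(R, t_w)]

def relative_pose(R_i, t_i, R_j, t_j):
    """Hypothetical helper: R_ij = R_j R_i^T, t_ij = R_j (t_i - t_j), inputs in RwTw."""
    return matmul(R_j, transpose(R_i)), matvec(R_j, sub(t_i, t_j))

def pixel_to_bearing(u, v, fx, fy, cx, cy):
    """x ~ K^{-1} (u, v, 1)^T for a zero-skew K, then b = x / ||x||."""
    x = [(u - cx) / fx, (v - cy) / fy, 1.0]
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]

# Two cameras in RwTw format and one world point (arbitrary values).
R_i, t_i = rot_z(0.3), [1.0, 0.0, 0.0]
R_j, t_j = rot_z(-0.2), [0.0, 2.0, 0.5]
X_w = [0.5, -1.0, 4.0]

# Both global formats give the same camera-frame coordinates.
x_i = matvec(R_i, sub(X_w, t_i))           # RwTw: R (X_w - t_w)
_, tc_i = rwtw_to_rwtc(R_i, t_i)
x_i_rwtc = add(matvec(R_i, X_w), tc_i)     # RwTc: R X_w + t_c

# The relative pose maps camera-i coordinates to camera-j coordinates.
R_ij, t_ij = relative_pose(R_i, t_i, R_j, t_j)
x_j = matvec(R_j, sub(X_w, t_j))
x_j_from_i = add(matvec(R_ij, x_i), t_ij)

# Pixel -> normalized image coordinates -> unit bearing vector.
b = pixel_to_bearing(700.0, 300.0, 800.0, 800.0, 640.0, 360.0)

print(x_i, x_i_rwtc)
print(x_j, x_j_from_i)
print(b)
```

Each printed pair should agree to floating-point precision, and `b` should have unit length. A check like this is a quick way to confirm that externally supplied poses really follow the `RwTw` convention before they are passed into the library.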