提交测试

2024-01-16 17:22:21 +08:00
parent 92862c0372
commit 73635fda01
654 changed files with 178015 additions and 2 deletions
--- a/docs/notes/differentiable_camera.rst
+++ b/docs/notes/differentiable_camera.rst
@@ -0,0 +1,237 @@
+Differentiable Camera
+*********************
+
+.. _differentiable_camera:
+
+Camera class
+============
+
+.. _camera_class:
+
+:class:`kaolin.render.camera.Camera` is a one-stop class for all camera related differentiable / non-differentiable transformations.
+Camera objects are represented by *batched* instances of 2 submodules:
+
+    - :ref:`CameraExtrinsics <camera_extrinsics_class>`: The extrinsics properties of the camera (position, orientation).
+      These are usually embedded in the view matrix, used to transform vertices from world space to camera space.
+    - :ref:`CameraIntrinsics <camera_intrinsics_class>`: The intrinsics properties of the lens
+      (such as field of view / focal length in the case of pinhole cameras).
+      Intrinsics parameters vary between different lens type,
+      and therefore multiple CameraIntrinsics subclasses exist,
+      to support different types of cameras: pinhole / perspective, orthographic, fisheye, and so forth.
+      For pinehole and orthographic lens, the intrinsics are embedded in a projection matrix.
+      The intrinsics module can be used to transform vertices from camera space to Normalized Device Coordinates.
+
+.. note::
+    To avoid tedious invocation of camera functions through
+    ``camera.extrinsics.someop()`` and ``camera.intrinsics.someop()``, kaolin overrides the ``__get_attributes__``
+    function to forward any function calls of ``camera.someop()`` to
+    the appropriate extrinsics / intrinsics submodule.
+
+The entire pipeline of transformations can be summarized as (ignoring homogeneous coordinates)::
+
+    World Space                                         Camera View Space
+         V         ---CameraExtrinsics.transform()--->         V'          ---CameraIntrinsics.transform()---
+    Shape~(B, 3)            (view matrix)                  Shape~(B, 3)                                     |
+                                                                                                            |
+                                                                           (linear lens: projection matrix) |
+                                                                                  + homogeneus -> 3D        |
+                                                                                                            V
+                                                                                 Normalized Device Coordinates (NDC)
+                                                                                            Shape~(B, 3)
+    When using view / projection matrices, conversion to homogeneous coordinates is required.
+    Alternatively, the `transform()` function takes care of such projections under the hood when needed.
+
+How to apply transformations with kaolin's Camera:
+    1. Linear camera types, such as the commonly used pinhole camera,
+       support the :func:`view_projection_matrix()` method.
+       The returned matrix can be used to transform vertices through pytorch's matrix multiplication, or even be
+       passed to shaders as a uniform.
+    2. All Cameras are guaranteed to support a general :func:`transform()` function
+       which maps coordinates from world space to Normalized Device Coordinates space.
+       For some lens types which perform non linear transformations,
+       the :func:`view_projection_matrix()` is non-defined.
+       Therefore the camera transformation must be applied through
+       a dedicated function. For linear cameras,
+       :func:`transform()` may use matrices under the hood.
+    3. Camera parameters may also be queried directly.
+       This is useful when implementing camera params aware code such as ray tracers.
+How to control kaolin's Camera:
+    - :class:`CameraExtrinsics`: is packed with useful methods for controlling the camera position and orientation:
+      :func:`translate() <CameraExtrinsics.translate()>`,
+      :func:`rotate() <CameraExtrinsics.rotate()>`,
+      :func:`move_forward() <CameraExtrinsics.move_forward()>`,
+      :func:`move_up() <CameraExtrinsics.move_up()>`,
+      :func:`move_right() <CameraExtrinsics.move_right()>`,
+      :func:`cam_pos() <CameraExtrinsics.cam_pos()>`,
+      :func:`cam_up() <CameraExtrinsics.cam_up()>`,
+      :func:`cam_forward() <CameraExtrinsics.cam_forward()>`,
+      :func:`cam_up() <CameraExtrinsics.cam_up()>`.
+    - :class:`CameraIntrinsics`: exposes a lens :func:`zoom() <CameraIntrinsics.zoom()>`
+      operation. The exact functionality depends on the camera type.
+How to optimize the Camera parameters:
+    - Both :class:`CameraExtrinsics`: and :class:`CameraIntrinsics` maintain
+      :class:`torch.Tensor` buffers of parameters which support pytorch differentiable operations.
+    - Setting ``camera.requires_grad_(True)`` will turn on the optimization mode.
+    - The :func:`gradient_mask` function can be used to mask out gradients of specific Camera parameters.
+
+    .. note::
+        :class:`CameraExtrinsics`: supports multiple representions of camera parameters
+        (see: :func:`switch_backend <CameraExtrinsics.switch_backend()>`).
+        Specific representations are better fit for optimization
+        (e.g.: they maintain an orthogonal view matrix).
+        Kaolin will automatically switch to using those representations when gradient flow is enabled
+        For non-differentiable uses, the default representation may provide better
+        speed and numerical accuracy.
+
+Other useful camera properties:
+    - Cameras follow pytorch in part, and support arbitrary ``dtype`` and ``device`` types through the
+      :func:`to()`, :func:`cpu()`, :func:`cuda()`, :func:`half()`, :func:`float()`, :func:`double()`
+      methods and :func:`dtype`, :func:`device` properties.
+    - :class:`CameraExtrinsics`: and :class:`CameraIntrinsics`: individually support the :func:`requires_grad`
+      property.
+    - Cameras implement :func:`torch.allclose` for comparing camera parameters under controlled numerical accuracy.
+      The operator ``==`` is reserved for comparison by ref.
+    - Cameras support batching, either through construction, or through the :func:`cat()` method.
+
+    .. note::
+        Since kaolin's cameras are batched, the view/projection matrices are of shapes :math:`(\text{num_cameras}, 4, 4)`,
+        and some operations, such as :func:`transform()` may return values as shapes of :math:`(\text{num_cameras}, \text{num_vectors}, 3)`.
+
+Concluding remarks on coordinate systems and other confusing conventions:
+    - kaolin's Cameras assume column major matrices, for example, the inverse view matrix (cam2world) is defined as:
+
+      .. math::
+          \begin{bmatrix}
+              r1 & u1 & f1 & px \\
+              r2 & u2 & f2 & py \\
+              r3 & u3 & f3 & pz \\
+              0 & 0 & 0 & 1
+          \end{bmatrix}
+
+      This sometimes causes confusion as the view matrix (world2cam) uses a transposed 3x3 submatrix component,
+      which despite this transposition is still column major (observed through the last `t` column):
+
+      .. math::
+          \begin{bmatrix}
+              r1 & r2 & r3 & tx \\
+              u1 & u2 & u3 & ty \\
+              f1 & f2 & f3 & tz \\
+              0 & 0 & 0 & 1
+          \end{bmatrix}
+
+    - kaolin's cameras do not assume any specific coordinate system for the camera axes. By default, the
+      right handed cartesian coordinate system is used. Other coordinate systems are supported through
+      :func:`change_coordinate_system() <CameraExtrinsics.change_coordinate_system()>`
+      and the ``coordinates.py`` module::
+
+            Y
+            ^
+            |
+            |---------> X
+           /
+         Z        - kaolin's NDC space is assumed to be left handed (depth goes inwards to the screen).
+
+      The default range of values is [-1, 1].
+
+CameraExtrinsics class
+======================
+
+.. _camera_extrinsics_class:
+
+    :class:`kaolin.render.camera.CameraExtrinsics` holds the extrinsics parameters of a camera: position and orientation in space.
+
+    This class maintains the view matrix of camera, used to transform points from world coordinates
+    to camera / eye / view space coordinates.
+
+    This view matrix maintained by this class is column-major, and can be described by the 4x4 block matrix:
+
+    .. math::
+
+        \begin{bmatrix}
+            R & t \\
+            0 & 1
+        \end{bmatrix}
+
+    where **R** is a 3x3 rotation matrix and **t** is a 3x1 translation vector for the orientation and position
+    respectively.
+
+    This class is batched and may hold information from multiple cameras.
+
+    :class:`CameraExtrinsics` relies on a dynamic representation backend to manage the tradeoff between various choices
+    such as speed, or support for differentiable rigid transformations.
+    Parameters are stored as a single tensor of shape :math:`(\text{num_cameras}, K)`,
+    where K is a representation specific number of parameters.
+    Transformations and matrices returned by this class support differentiable torch operations,
+    which in turn may update the extrinsic parameters of the camera::
+
+                                 convert_to_mat
+            Backend                 ---- >            Extrinsics
+        Representation R                             View Matrix M
+        Shape (num_cameras, K),                    Shape (num_cameras, 4, 4)
+                                    < ----
+                                 convert_from_mat
+
+    .. note::
+
+        Unless specified manually with :func:`switch_backend`,
+        kaolin will choose the optimal representation backend depending on the status of ``requires_grad``.
+    .. note::
+
+        Users should be aware, but not concerned about the conversion from internal representations to view matrices.
+        kaolin performs these conversions where and if needed.
+
+    Supported backends:
+
+        - **"matrix_se3"**\: A flattened view matrix representation, containing the full information of
+          special euclidean transformations (translations and rotations).
+          This representation is quickly converted to a view matrix, but differentiable ops may cause
+          the view matrix to learn an incorrect, non-orthogonal transformation.
+        - **"matrix_6dof_rotation"**\: A compact representation with 6 degrees of freedom, ensuring the view matrix
+          remains orthogonal under optimizations. The conversion to matrix requires a single Gram-Schmidt step.
+
+        .. seealso::
+
+            `On the Continuity of Rotation Representations in Neural Networks, Zhou et al. 2019
+            <https://arxiv.org/abs/1812.07035>`_
+
+    Unless stated explicitly, the definition of the camera coordinate system used by this class is up to the
+    choice of the user.
+    Practitioners should be mindful of conventions when pairing the view matrix managed by this class with a projection
+    matrix.
+
+CameraIntrinsics class
+======================
+
+.. _camera_intrinsics_class:
+
+    :class:`kaolin.render.camera.CameraIntrinsics` holds the intrinsics parameters of a camera:
+    how it should project from camera space to normalized screen / clip space.
+
+    The instrinsics are determined by the camera type, meaning parameters may differ according to the lens structure.
+    Typical computer graphics systems commonly assume the intrinsics of a pinhole camera (see: :class:`PinholeIntrinsics` class).
+    One implication is that some camera types do not use a linear projection (i.e: Fisheye lens).
+
+    There are therefore numerous ways to use CameraIntrinsics subclasses:
+
+        1. Access intrinsics parameters directly.
+        This may typically benefit use cases such as ray generators.
+        2. The :func:`transform()` method is supported by all CameraIntrinsics subclasses,
+        both linear and non-linear transformations, to project vectors from camera space to normalized screen space.
+        This method is implemented using differential pytorch operations.
+        3. Certain CameraIntrinsics subclasses which perform linear projections, may expose the transformation matrix
+        via dedicated methods.
+        For example, :class:`PinholeIntrinsics` exposes a :func:`projection_matrix()` method.
+        This may typically be useful for rasterization based rendering pipelines (i.e: OpenGL vertex shaders).
+
+    This class is batched and may hold information from multiple cameras.
+    Parameters are stored as a single tensor of shape :math:`(\text{num_cameras}, K)` where K is the number of
+    intrinsic parameters.
+
+    currently there are two subclasses of intrinsics: :class:`kaolin.render.camera.OrthographicIntrinsics` and
+    :class:`kaolin.render.camera.PinholeIntrinsics`.
+
+API Documentation:
+------------------
+
+* Check all the camera classes and functions at the :ref:`API documentation<kaolin.render.camera>`.
+