First, to provide a clearer guidance throught this document we provide defintions for some basic concepts arising from differential topology and algebra:
Principal fiber bundle: a fiber bundle attached to a group $G$ whose free and transitive action is applied over each fiber so that each one is a principal homogeneous space. The group is also the structure group of the bundle. Every vector bundle has an associated principal fiber bundle, whose structural group is a subgroup $H$ of the general linear group $GL(r, \mathbb{R})$; in the real case usually, $H$ is a "classical" subgroup of $GL(r, \mathbb{R})$ such as the orthogonal $O(r)$, special linear $SL(r)$ or special orthogonal group $SO(r) = O(r)\cap SL(r)$. However, a principal group is not necessarily a vector bundle, because its fiber can be a group, a quotient of groups (a grassmannian, e.g.) or other objects associated to references (feature vectors, e.g.)
Homeomorphism: a biyective and bicontinous application between two topological groups. It can be viewed just as a continuous homeomorphism between the same two groups. They are mappings that preserve the topological properties of a given space.
There are two different transformation groups:
Before being in more detailed explanation it is necessary to define two common concepts used in recognition and reconstruction.
There are two more concepts commonly used in pattern recognition:
The idea consists of defining the supervised learning by using functions (escalar or vector) and/or distributions (vector or tensor) defined on manifolds. Hence, our approach is a natural extension of Learning Manifolds strategies which allows to incorproate differential methods to the Statistical Learning Theory. Distributions are collections of vector fields which are defined over manifolds which parametrize states of the system; so, the manifold learning framework can be included in our approach. In our case, distributions are estimated from statistical properties evaluated on collections of small displacements (vector fields) arising from tracking control points in video sequences. Fibrations and fiber bundles are mandatory in order to link these heterogeneous elements in a more compact framework. Furthermore, this approach introduces some relevant advantages:
Detectors are functionals over the base space $B$ of a fibration, while descriptors are functionals which are defined onto the fiber $F_b$ of the fibration at each point $b\in B$. In this way, patterns and models share information using morphisms between fibrations and fiber bundles.
A basic hierarchy between patterns and models is formulated in terms of groupoids on the space base (given by homeomorphisms, typically) and transformation groups on the fiber $F_b$, respectively. Gauge groups allow to deal with both of them since they can be applied both continuous mechanics and classical mechanics including Hamilton (PDE systems) and Euler formulations (energy minimization)
It allows to model dynamic phenomena. A vector field is a geometric representation of a linear differential equation; it proveides a model for motion's particle at a control point $b\in B$ in the optical flow of a video; its behavior is extended to the neighbors $b'\in U_b$ for a small open set $U_b$ centered at $b\in B$ vector-field. In a similar way, a PDE (Partial Differential Equations) provides models for deformations and their extensions (propagation or restoration models, e.g.), which can also be viewed as a section of a fibration. Let us remark that a fibration supports discontinuities at topological behavior of the fiber attached to near base points of the fibration, which are not allowed for fiber vector bundles (v.b.). Hence, fibrations are more flexible than v.b.
vector-field. A vector field can be modeled as a local section $S: U \rightarrow E$ of a vector fiber bundle (a local section is a smooth application with the feature vectors $Fb$ as values such that $\pi{\xi}\circ s = 1_{B}$ where $1_B$ denotes the identity map of $B$ (this condition is introduced to guarantee compatibility between data en in the base and the fiber). ↩
Space-temporal variations in observable "geometric quantities"' can be represented as a displacement along a manifold which is modified along the path. The displacement is formally given by a covariant derivative defined on a v.b. or their extensions to fibrations. In order to achieve a feedback between recognition and 3D reconstruction we would need a functor which assigns to each fibration (the locus where live patterns for recognition) corresponding to a collection of observable features in a fiber bundle (ideal model given by configurations of vectors) representing a simplified vision of reality. Inversely, the v.b. structure can be considered as PL-approach to more general fibrations structures. This claim can be extended to principal groups and gauge fibrations, because the linearisation of the group of homeomorphisms fixing a base point $b\in B$ is $GL()$.
Global Optimization procedures based in energy functionals, such as Total Variation (TV) from @chan2005aspects, can be applied over fibrations in a "probabilistic" context and over fiber bundles space in a "deterministic" context. The process consists of convoluting the reconstruction function using the flux divergence of the simultaneous recognition process. Then, the "probabilistic" version is equivalent to minimize an energy functional using a Hausdorff distance over a fibration to prevent uncontrolled variations in shape irregularities. The "deterministic" version is equivalent to minimize an energy functional using a weighted sum of a L1-norm (for scalar component linked to potential energy) and a L2 norm (for vector component linked to kinetic energy) over a vectorial fiber bundle. The use of L2-norm on the tangent space $T_{b}B$ is equivalent to the reduction of the general linear group to the euclidean group in the tangent space at each base point $b\in B$.
This section summarizes the idea behind the general framework using concepts and results from local differential topology. The visualization of a 3D reconstruction can be represented as a transformation belonging to a classical group. For instance,
The use of structural groups for each framework allows to represent the displacement of references linked to feature vectors (elements of a fiber $Fb$) and its interpretation in the euclidean or the affine framework (including apparent deformations due to projections onto the image plane). But these representations provide a support for kinematic aspects, because the tangent space $T{b}B\ := \pi^{-1}_{\xi}(b) \simeq \mathbb{R}^{n}$ for the tangent space to descriptors is a cartesian space which provides the support for euclidean and affine structures. The initial idea of SFM methods is modeled on k-tuples of points in the ordinary space $\mathbb{R}^{3}$ which are tracked along $m$ images giving a spatial configuration and its planar projections. The spatial configuration is represented by $n = 3k$ data giving a point of $\mathbb{R}^{n}$, and planar configurations are represented by $p = 2km$ data giving a point of $\mathbb{R}^{p}$. Hence the correspondence between spatial configurations and their planar projections is ideally represented by a map $\mathbf{f}: \mathbb{R}^n \rightarrow \mathbb{R}^p$ which we suppose locally continuous, i.e. $\mathbf{f} \in C^{0} (n, p)$.
In these terms, the most general groupoids acting on source and target space are given by the homeomorphisms $Homeom{\underline{x}} (\mathbb{R}^{n})$ preserving a base point $\underline{x}\in \mathbb{R}^{n}$ (given by 3D configuration of control points) and the homeomorphisms $Homeom{\underline{y}} (\mathbb{R}^{p})$ preserving the base point $\underline{y}\in \mathbb{R}^{p}$ corresponding to the projection of the configuration $\underline{x}$ on each image plane. The linearization of each one of such homeomorphism is the general lineal group, i.e. $T{e} Homeom{\underline{x}} (\mathbb{R}^{n}) = GL(n; \mathbb{R})$ and similarly for the target space. This simple remark is the key for connecting (apparent or real) deformations with the linear approach based in feature vectors; in other words for connecting recognition with reconstruction issues.
Furthermore, this approach can be extended in a natural way from static to dynamic case. Indeed, motion kinematics associated to the camera motion can be read in terms of differential calculus linked to the map $\mathbf{f} \in C^{0} (n, p)$, including aspects such as optical flow or others. Let us remark that this approach is compatible with rigid motions in the source space and apparent deformations on images for the whole configurations.
To ease the management of the above information is convenient to linearize the above application. This transformation is motivated by a reformulation of the optica flow in terms of control points to be tracked. Following the idea of the optical flow from @tomasi1992shape and extended by @ma2001optimization, we can define a base application
Its differential provides a linearization represented by a transformation using the fiber tangent bundles of the global cartesian spaces
The tangent space $\mathbb{TR}^n$ at each point is the fiber $T\underline{x} \mathbb{R}^{n+}$; in particular, each section on the fiber is represented by the position and the current velocity of the base configuration, given by all the tracked feature points $\mathbf{P}{1}, \ldots, \mathbf{P}_{k}$ with their velocities. The fiber tangent bundle of the cartesian space is trivial:
The above arguments can be easiliy extended for "true" representations (euclidean framework) and to allow apparent deformations (arising from projections). It suffices to consider superimposed structures (euclidean, affine) to the cartesian space and extend the general linear group to the semidirect product by the group of traslations.
In the general case of arbitrary deformations (to compare similar but not identical shapes), it is necessary to work with gauge transformations on fibrations which provide the natural extension of structural groups (classical groups including euclidean, affine, similariity, projective groups) on vector bundles. In the extended case, the tangent space can be viewed as the fiber of a fibration whose base is a general deformation of the configurations space. The latter space can be linearized to recover structure from motion and viceversa in a more general framework including possible deformations. The fibration approach introduces two advantages:
As resume a fibration or fiber bundle extends the notion of a vector fiber bundle including different options such as: the fiber of the fibration can be a variety (not necessarily a manifold, including singularities and/or discrete collections of points, e.g.) of arbitrary dimension, such as the geometric elements recovered from 3D reconstruction (points, lines, planes, curves, etc). The same idea is used extensively in the field of manifold learning for non-linear dimensionality reduction (a generalization of the Principal Component Analysis method). In general, the fibers of a fibration can be varieties of arbitrary dimension and shape which can be patched together by lifting the information arising from the base space.
The introduction of classical groups and their linearization on tangent vector bundles (as linear approach to manifolds) has been already performed by @ma2001optimization with a remarkable simplification for optimization issues in 3D Reconstruction processes. In this work, we propose a similar scheme for gauge transformations acting on varieties which can be "fibered" by locally trivial fibrations. Any classical group can be used to recognize objects from their projections; however, this approach is strongly limited by small variations of objects to be recognized, because data are not related between them through a classical transformation.
Our scheme, allows an extension of the structural viewpoint (compatible with apparent deformations) with a based-appearances approaches linked to configurations of points and/or feature vectors. A key idea is to extend the action of the structural group of the principal fiber bundle for reconstructed objects to gauge group acting on a fibration. So, two simple homeomorphisms acting on source and target spaces convert the space of vector maps $\mathbf{f} \in C^{0}(n, p)$ in a Gauge space as prototype for active recognition and reconstruction from motion.