A camera can be approximated by a projective model. We use the classic pinhole projection model, which describes how a 3D world point is projected onto the image plane of the camera. The camera is modeled as a light-sensitive surface composed of the following elements:
The projection is represented by two sets of parameters, called intrinsic and extrinsic parameters. The intrinsic parameters model the optical components, accounting for the distortions and aberrations that the lens introduces into the image. The extrinsic parameters represent the camera position and orientation. The projection matrix $\mathbf{P}$ can then be decomposed into two basic matrices:
In a more compact matrix notation:
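A common compact form of this decomposition, under the pinhole model with rotation $\mathbf{R}$ and translation $\mathbf{t}$ (the symbols here follow standard usage and may differ from the notation used elsewhere in this document), is:

$$\lambda\,\tilde{\mathbf{x}} = \mathbf{P}\,\tilde{\mathbf{X}}, \qquad \mathbf{P} = \mathbf{K}\,[\,\mathbf{R} \mid \mathbf{t}\,]$$

where $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{X}}$ are the homogeneous image and world coordinates and $\lambda$ is an arbitrary scale factor.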
The calibration matrix $\mathbf{K}$ of a camera is expressed in terms of the intrinsic parameters of that camera:
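One standard parameterization, assuming focal lengths $f_x$, $f_y$ in pixels, skew $s$, and principal point $(c_x, c_y)$, is:

$$\mathbf{K} = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}$$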
The calibration matrix makes it possible to relate the fundamental matrix $\mathbf{F}$ and the essential matrix $\mathbf{E}$, according to @hartley2003multiple
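The standard form of this relation in @hartley2003multiple, for a pair of cameras with calibration matrices $\mathbf{K}$ and $\mathbf{K}'$, is:

$$\mathbf{E} = \mathbf{K}'^{\top}\,\mathbf{F}\,\mathbf{K}$$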
The intrinsic parameters of a given camera can be computed with the algorithm of @zhang2000flexible, as implemented in OpenCV. The lens transforms a point $r$ in the theoretical image plane into a point $r'$ on the real image plane according to the FOV model, as explained by @devernay2001straight. The following equations describe how distortion affects the pixel coordinates:
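As an illustration, the FOV model of @devernay2001straight relates the distorted radial distance $r'$ to the undistorted one $r$ through a single parameter $\omega$ (the field of view, in radians). A minimal Python sketch, assuming the standard form of the model:

```python
import math

def fov_distort(r_u, omega):
    # Distorted radius r' as a function of the undistorted radius r
    # (FOV model, single parameter omega = field of view in radians).
    return math.atan(2.0 * r_u * math.tan(omega / 2.0)) / omega

def fov_undistort(r_d, omega):
    # Inverse mapping: recover the undistorted radius from the distorted one.
    return math.tan(r_d * omega) / (2.0 * math.tan(omega / 2.0))
```

The two mappings are exact analytic inverses of each other, which is one practical advantage of the FOV model over truncated polynomial distortion models.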
There is a $4 \times 4$ matrix that allows recovering the world 3D point from the image pixel coordinates $(x, y)$ combined with the disparity $d$ obtained from the disparity map computed by stereo correspondence.
where $(c_x, c_y)$ is the principal point of the camera, $T_x$ is the baseline between the two cameras, and $f$ is the focal length in mm.
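As a sketch of this reprojection, the following assumes a matrix of the form used by OpenCV's `reprojectImageTo3D` (with identical principal points in both rectified images, the focal length expressed in pixels, and ignoring the sign convention OpenCV applies to $T_x$):

```python
def make_Q(cx, cy, f, Tx):
    # Reprojection matrix mapping (x, y, d, 1) to homogeneous (X, Y, Z, W).
    # f must be in the same units as the pixel coordinates; Tx is the baseline.
    # Sign conventions vary: OpenCV's stereoRectify places -1/Tx in the last row.
    return [
        [1.0, 0.0, 0.0,      -cx],
        [0.0, 1.0, 0.0,      -cy],
        [0.0, 0.0, 0.0,       f ],
        [0.0, 0.0, 1.0 / Tx,  0.0],
    ]

def reproject(Q, x, y, d):
    # Multiply Q by the homogeneous pixel/disparity vector and dehomogenize.
    v = (x, y, d, 1.0)
    X, Y, Z, W = (sum(Q[i][j] * v[j] for j in range(4)) for i in range(4))
    return (X / W, Y / W, Z / W)
```

Note that this reduces to the familiar triangulation formulas $Z = f\,T_x / d$, $X = (x - c_x)\,Z / f$, $Y = (y - c_y)\,Z / f$.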