Programming Computer Vision with Python

Published by 李鬆峰 on 2013-09-23

Chapter 5. Multiple View Geometry

This chapter will show you how to handle multiple views and how to use the geometric relationships between them to recover camera positions and 3D structure. With images taken from different viewpoints, it is possible to compute 3D scene points as well as camera locations from feature matches. We introduce the necessary tools and show a complete 3D reconstruction example. The last part of the chapter shows how to compute dense depth reconstructions from stereo images.

5.1 Epipolar Geometry

Multiple view geometry is the field studying the relationship between cameras and features when there are correspondences between many images that are taken from varying viewpoints. The image features are usually interest points, and we will focus on that case throughout this chapter. The most important constellation is two-view geometry. With two views of a scene and corresponding points in these views, there are geometric constraints on the image points as a result of the relative orientation of the cameras, the properties of the cameras, and the position of the 3D points. These geometric relationships are described by what is called epipolar geometry. This section will give a very short description of the basic components we need. For more details on the subject, see [13].

Without any prior knowledge of the cameras, there is an inherent ambiguity in that a 3D point, X, transformed with an arbitrary (4 × 4) homography H as HX will have the same image point in a camera PH^{-1} as the original point in the camera P. Expressed with the camera equation, this is

    \lambda x = P X = (P H^{-1})(H X) = \hat{P} \hat{X}.

Because of this ambiguity, when analyzing two-view geometry we can always transform the cameras with a homography to simplify matters. Often this homography is just a rigid transformation to change the coordinate system. A good choice is to set the origin and coordinate axes to align with the first camera so that

    P_1 = K_1 [I \mid 0] \quad \text{and} \quad P_2 = K_2 [R \mid t].

Here we use the same notation as in Chapter 4; K1 and K2 are the calibration matrices, R is the rotation of the second camera, and t is the translation of the second camera. Using these camera matrices, one can derive a condition for the projection of a point X to image points x1 and x2 (with P1 and P2, respectively). This condition is what makes it possible to recover the camera matrices from corresponding image points. The following equation must be satisfied:

Equation 5-1.

    x_2^T F x_1 = 0,

where

    F = K_2^{-T} S_t R K_1^{-1}

and the matrix S_t is the skew-symmetric matrix

Equation 5-2.

    S_t = \begin{bmatrix} 0 & -t_3 & t_2 \\ t_3 & 0 & -t_1 \\ -t_2 & t_1 & 0 \end{bmatrix}.
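To make Equations 5-1 and 5-2 concrete, here is a minimal NumPy sketch (not code from the book; the calibration, rotation, and translation values are made-up examples) that builds F from K1, K2, R, and t and checks the epipolar constraint on a projected 3D point:

    import numpy as np

    def skew(t):
        # Skew-symmetric matrix S_t from Equation 5-2, so that skew(t) @ v = t x v
        return np.array([[0, -t[2], t[1]],
                         [t[2], 0, -t[0]],
                         [-t[1], t[0], 0]])

    # Made-up example calibration and relative pose (assumptions, not book data)
    K1 = np.diag([1000.0, 1000.0, 1.0])
    K2 = np.diag([1200.0, 1200.0, 1.0])
    theta = 0.1  # small rotation about the y-axis
    R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(theta), 0.0, np.cos(theta)]])
    t = np.array([1.0, 0.2, 0.1])

    # Camera matrices P1 = K1 [I | 0] and P2 = K2 [R | t]
    P1 = K1 @ np.hstack((np.eye(3), np.zeros((3, 1))))
    P2 = K2 @ np.hstack((R, t.reshape(3, 1)))

    # Fundamental matrix F = K2^{-T} S_t R K1^{-1}
    F = np.linalg.inv(K2).T @ skew(t) @ R @ np.linalg.inv(K1)

    # Project a 3D point into both views and verify x2^T F x1 = 0
    X = np.array([0.5, -0.3, 4.0, 1.0])   # homogeneous 3D point
    x1 = P1 @ X
    x1 = x1 / x1[2]
    x2 = P2 @ X
    x2 = x2 / x2[2]
    print(x2 @ F @ x1)                    # ~0 up to floating-point error
    print(np.linalg.matrix_rank(F))       # 2, since det(F) = 0

Any scene point visible in both views would give the same ~0 result; that is exactly what makes the constraint usable for estimating F from correspondences.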

Equation (5-1) is called the epipolar constraint. The matrix F in the epipolar constraint is called the fundamental matrix and, as you can see, it is expressed in components of the two camera matrices (their relative rotation R and translation t). The fundamental matrix has rank 2 and det(F) = 0. This will be used in algorithms for estimating F. The equations above mean that the camera matrices can be recovered from F, which in turn can be computed from point correspondences, as we will see later. Without knowing the internal calibration (K1 and K2), the camera matrices are only recoverable up to a projective transformation. With known calibration, the reconstruction will be metric. A metric reconstruction is a 3D reconstruction that correctly represents distances and angles. [13]

There is one final piece of geometry needed before we can proceed to actually using this theory on some image data. Given a point in one of the images, for example x2 in the second view, equation (5-1) defines a line in the first image, since

    x_2^T F x_1 = 0

determines a line: all points x1 in the first image satisfying the equation belong to it. This line is called an epipolar line corresponding to the point x2. This means that a point corresponding to x2 must lie on this line. The fundamental matrix can therefore help the search for correspondences by restricting the search to this line.

Figure 5-1. An illustration of epipolar geometry. A 3D point X is projected to x1 and x2, in the two views, respectively. The baseline between the two camera centers, C1 and C2, intersects the image planes in the epipoles, e1 and e2. The lines l1 and l2 are called epipolar lines.

The epipolar lines all meet in a point, e, called the epipole. The epipole is actually the image point corresponding to the projection of the other camera center. This point can be outside the actual image, depending on the relative orientation of the cameras. Since the epipole lies on all epipolar lines, it must satisfy F e1 = 0. It can therefore be computed as the null vector of F, as we will see later. The other epipole can be computed from the relation e2^T F = 0. The epipoles and the epipolar lines are illustrated in Figure 5-1.
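As a sketch of how the epipoles could be computed in practice (compute_epipole is a hypothetical helper name, not necessarily the book's; F and x2 are carried over from the previous snippet), the null vector can be read off the SVD of F:

    import numpy as np

    def compute_epipole(F):
        # Right null vector of F: the epipole e satisfying F e = 0.
        # The last row of Vt is the singular vector of the smallest singular
        # value, which spans the null space of a rank-2 matrix.
        U, S, Vt = np.linalg.svd(F)
        e = Vt[-1]
        return e / e[2]                # normalize homogeneous coordinates

    e1 = compute_epipole(F)            # F e1 = 0 (epipole in the first view)
    e2 = compute_epipole(F.T)          # e2^T F = 0 (epipole in the second view)

    # Epipolar line in the first image corresponding to x2 in the second view
    l1 = F.T @ x2
    print(l1 @ e1)                     # ~0: e1 lies on every epipolar line

Using the transpose for e2 works because e2^T F = 0 is the same as F^T e2 = 0, so the left null vector of F is the right null vector of F^T.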
