Structure from Motion (slide notes; from "Structure-from-motion.ppt", 51 slides)
What can we compute from a collection of pictures?
- 3D structure
- camera poses and parameters

This is one of the most important and exciting results in computer vision from the 90s. It is difficult in practice, largely due to the numerical computation involved, but it is very powerful: two SIGGRAPH papers built on it this year, each with several sketches! (Show a few demo videos.) Now let's see how this works.

Input: a collection of pictures.
Output: (1) camera parameters; (2) sparse 3D scene structure.

Consider one camera first. What is the relation between pixels and rays in space? Under the pinhole model with focal length f, a world point (X, Y, Z) maps to the image point (fX/Z, fY/Z). In homogeneous coordinates:

  x = diag(f, f, 1) [I | 0] X,   i.e.  x = P X  with  P = diag(f, f, 1) [I | 0].

If the camera is rotated by R and centred at C, the point in camera coordinates is X_cam = R (X - C), so with K = diag(f, f, 1):

  x = K [I | 0] X_cam = K R [I | -C] X = K [R | t] X,   with  t = -R C.

So P = K [R | t] is a 3x4 matrix with 7 degrees of freedom: 1 from the focal length, 3 from rotation, 3 from translation. This is the simplified projective camera model: x = P X.

Consider one camera: P (3x4) has 7 degrees of freedom. Given one image, we observe x. Can we recover X or P?
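The camera model above can be sketched in a few lines of numpy. This is a minimal illustration, not code from the slides; the focal length, rotation, centre, and test point are made-up values:

```python
import numpy as np

# Simplified projective camera from the slides: P = K [R | t],
# with a single intrinsic parameter f (focal length), so K = diag(f, f, 1).
f = 800.0
K = np.diag([f, f, 1.0])

# Camera with identity rotation, centred at C = (0, 0, -5); t = -R C.
R = np.eye(3)
C = np.array([0.0, 0.0, -5.0])
t = -R @ C

P = K @ np.hstack([R, t.reshape(3, 1)])    # 3x4 projection matrix

# Project a world point X = (1, 2, 5) given in homogeneous coordinates.
X = np.array([1.0, 2.0, 5.0, 1.0])
x = P @ X                                  # homogeneous pixel
x = x[:2] / x[2]                           # dehomogenize: (f X'/Z', f Y'/Z')
print(x)                                   # prints [ 80. 160.]
```

Here X_cam = R(X - C) = (1, 2, 10), so the pixel is (800·1/10, 800·2/10) = (80, 160), as the dehomogenization step shows.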
If P is known, what do we know about X? (x constrains X to a ray through the camera centre.) If X is known, can we recover P? The number of unknowns is 7, and each correspondence X <-> x gives 2 equations, so we need 2n >= 7, i.e. n >= 4. This is a camera calibration problem.

Input: n >= 4 world-to-image point correspondences Xi <-> xi.
Output: camera parameters P = K [R | t].

Direct Linear Transform (DLT): from xi ~ P Xi we get xi x (P Xi) = 0, where the cross product with x = (x, y, w) can be written with the skew-symmetric matrix

  [x]_x = [  0  -w   y
             w   0  -x
            -y   x   0 ].

Stacking these equations for n >= 4 points gives a linear system A p = 0 in the entries p of P. Minimize ||A p|| subject to ||p|| = 1 using the SVD A = U D V^T; p is the last column of V.

Objective: given n >= 4 3D-to-2D point correspondences Xi <-> xi, determine P.
Algorithm:
(i) Normalization: X~i = U Xi, x~i = T xi.
(ii) Linear solution: DLT.
(iii) Minimization of geometric error: iterative optimization (Levenberg-Marquardt).
(iv) Denormalization: P = T^-1 P~ U.

Implementation in practice: the camera centre C is the point for which P C = 0, i.e. the right null vector of P.

How do we recover K, R and t from P? Write P = K [R | t] = K [R | -RC] = KR [I | -C], so M = KR and P = M [I | -C]. Perform an RQ decomposition of M, so that K is an upper-triangular matrix and R an orthonormal matrix.

This is what we learn from one camera. Now let's consider two cameras. Two questions:
(i) Correspondence geometry: given an image point x in the first image, how does this constrain the position of the corresponding point x' in the second image?
(ii) Camera geometry (motion): given a set of corresponding image points xi <-> xi', i = 1, ..., n, what are the cameras P and P' for the two views?

The answer to the first question is the Fundamental Matrix F: x'^T F x = 0. The fundamental matrix relates corresponding pixels. If the intrinsic parameters (i.e. the focal length, in our camera model) of both cameras are known, as K and K', then we can derive (not here) that

  K'^T F K = [t]_x R,

where t and R are the translation and rotation of the second camera,
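The DLT calibration step above can be sketched as follows. This is a minimal sketch, not the slides' implementation: the function name is illustrative, the normalization and Levenberg-Marquardt refinement steps are omitted, and since the sketch solves for a general P (11 degrees of freedom, not the 7-dof simplified camera) it uses n >= 6 points:

```python
import numpy as np

def calibrate_dlt(Xs, xs):
    """Recover P (3x4, up to scale) from world points Xs (n x 3) and
    pixels xs (n x 2) by solving A p = 0 with the SVD (DLT).
    A general P has 11 dof, so n >= 6 points are needed here; the
    slides' n >= 4 refers to the simplified 7-dof camera."""
    rows = []
    for (Xw, Yw, Zw), (u, v) in zip(Xs, xs):
        X = [Xw, Yw, Zw, 1.0]
        # Each correspondence x_i x (P X_i) = 0 gives two independent rows.
        rows.append([0.0] * 4 + [-c for c in X] + [v * c for c in X])
        rows.append(X + [0.0] * 4 + [-u * c for c in X])
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)        # p = last column of V

# Usage: recover a known camera from 6 exact synthetic correspondences.
P_true = np.array([[700., 0., 320., 10.],
                   [0., 700., 240., 20.],
                   [0., 0., 1., 5.]])
rng = np.random.default_rng(0)
Xs = rng.uniform(1.0, 4.0, size=(6, 3))
xh = (P_true @ np.hstack([Xs, np.ones((6, 1))]).T).T
xs = xh[:, :2] / xh[:, 2:3]
P_est = calibrate_dlt(Xs, xs)          # equals P_true up to scale
```

With exact data, P_est reprojects the points onto xs; with noisy data, the normalization and nonlinear (LM) steps of the full algorithm become essential.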
i.e. P = [I | 0] and P' = [R | t]. The good thing is that x'^T F x = 0 is linear in the entries of F, so the fundamental matrix can be computed from a set of pixel correspondences x <-> x'. Separating known from unknown, each correspondence (x, y) <-> (x', y') gives

  x'x f11 + x'y f12 + x' f13 + y'x f21 + y'y f22 + y' f23 + x f31 + y f32 + f33 = 0,

i.e. (data) . (unknowns), linear:

  [x'x, x'y, x', y'x, y'y, y', x, y, 1] . [f11, f12, f13, f21, f22, f23, f31, f32, f33]^T = 0.

Stacking n correspondences gives a linear system A f = 0:

  [ x'1 x1  x'1 y1  x'1  y'1 x1  y'1 y1  y'1  x1  y1  1
    ...
    x'n xn  x'n yn  x'n  y'n xn  y'n yn  y'n  xn  yn  1 ] f = 0.

How many correspondences do we need? F has 9 entries defined only up to scale, so 8.

What can we do now?
(1) Given F, K and K', we can estimate the relative translation and rotation of the two cameras.
(2) Given 8 correspondences x <-> x', we can compute F.
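The linear (eight-point) estimation of F can be sketched like this. A minimal sketch only, with an illustrative function name: Hartley's coordinate normalization, which the practical algorithm needs for noisy data, is omitted, and the synthetic cameras in the usage example are made-up values:

```python
import numpy as np

def eight_point(x1, x2):
    """Estimate F from n >= 8 correspondences x1[i] <-> x2[i]
    (each an (x, y) pixel pair), using x2^T F x1 = 0: solve A f = 0
    by SVD, then enforce rank 2."""
    A = np.array([[xp*x, xp*y, xp, yp*x, yp*y, yp, x, y, 1.0]
                  for (x, y), (xp, yp) in zip(x1, x2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # A fundamental matrix has rank 2: zero the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt

# Usage: a synthetic two-view pair, P1 = K[I|0], P2 = K[R|t].
rng = np.random.default_rng(1)
K = np.array([[700., 0., 320.], [0., 700., 240.], [0., 0., 1.]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0., s], [0., 1., 0.], [-s, 0., c]])
t = np.array([1.0, 0.2, 0.1])
Xs = np.hstack([rng.uniform(-1, 1, (10, 2)), rng.uniform(4, 8, (10, 1))])

def proj(P, X):
    xh = P @ np.append(X, 1.0)
    return xh[:2] / xh[2]

P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t.reshape(3, 1)])
x1 = np.array([proj(P1, X) for X in Xs])
x2 = np.array([proj(P2, X) for X in Xs])
F = eight_point(x1, x2)   # x2_h^T F x1_h ~ 0 for every pair
```

With exact correspondences the recovered F satisfies the epipolar constraint up to numerical error; in practice RANSAC wraps this estimator to reject mismatches.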
Given K and K', and 8 correspondences x <-> x', we can therefore compute F and hence P = [I | 0] and P' = [R | t]. This answers the second question (camera geometry), and F itself answers the first (correspondence geometry).

But how do we make this automatic?
(1) Estimating the intrinsics K and K' (auto-calibration) will not be discussed here; it involves a lot of projective geometry.
(2) Let's see how to find correspondences automatically, i.e. feature detection and matching.

Lowe's SIFT features are invariant to position, orientation and scale:
- Scale: look for strong responses of a DoG (Difference-of-Gaussian) filter over scale space; only consider local maxima in both position and scale.
- Orientation: create a histogram of local gradient directions computed at the selected scale; assign the canonical orientation at the peak of the smoothed histogram.
- Each key specifies stable coordinates (x, y, scale, orientation).

Simple matching: for each feature in image 1, find the feature in image 2 that is most similar (compare the two descriptor vectors), and vice versa; keep the mutual best matches. On top of this one can design a very robust RANSAC-type algorithm.

What have we learnt so far? Now consider more than two cameras.

Objective: given N images Q1, ..., QN with reasonable overlaps, compute N camera projection matrices P1, ..., PN, where each Pi = Ki [Ri | ti]; Ki is the intrinsic matrix, Ri and ti the rotation and translation respectively.

Algorithm:
(1) Find M tracks T = {T1, T2, ..., TM}:
  (i) for every pair of images Qi, Qj: detect SIFT feature points in Qi and Qj, and match them robustly (RANSAC);
  (ii) match features across multiple images to construct tracks.
(2) Estimate P1 ... PN and a 3D position for each track X1 ... XM:
  (i) select one well-conditioned pair of images Q1, Q2; let T12 be their associated overlapping tracks;
  (ii) estimate K1 and K2, and compute P1, P2 and the 3D positions of T12 from the fundamental matrix;
  (iii) incrementally add a new camera Pk into the system, estimating its camera matrix by DLT (calibration);
  (iv) repeat (iii) until all the cameras are estimated.
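The "mutual best matches" rule used in the pair-wise matching of step (1) can be sketched in a few lines. A minimal sketch with toy descriptors (the function name and data are illustrative; the slides mention correlation as the similarity, whereas this sketch uses Euclidean distance, which gives the same ordering for L2-normalized descriptors):

```python
import numpy as np

def mutual_best_matches(desc1, desc2):
    """For each feature in image 1 find the most similar feature in
    image 2 and vice versa; keep only the mutual best matches."""
    # Pairwise squared distances, shape (n1, n2).
    d = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(axis=2)
    best12 = d.argmin(axis=1)   # best partner in image 2 for each feature in image 1
    best21 = d.argmin(axis=0)   # best partner in image 1 for each feature in image 2
    return [(i, j) for i, j in enumerate(best12) if best21[j] == i]

# Usage: three toy 2-D descriptors in image 1, two in image 2;
# feature 2 of image 1 has no reliable counterpart and is dropped.
d1 = np.array([[0., 0.], [10., 0.], [5., 5.]])
d2 = np.array([[10.1, 0.], [0.1, 0.]])
print(mutual_best_matches(d1, d2))   # [(0, 1), (1, 0)]
```

Mutual consistency removes many one-sided mismatches cheaply; the surviving matches are then verified geometrically with RANSAC.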
However, this won't work!
The fix is to follow each linear estimate with more robust non-linear optimization:
(1) Find the M tracks as before.
(2) Estimate P1 ... PN and a 3D position for each track:
  (i) select one well-conditioned pair of images Q1, Q2; let T12 be their associated overlapping tracks;
  (ii) estimate K1 and K2, compute P1, P2 and the 3D positions of T12 from the fundamental matrix, then non-linearly minimize the reprojection errors (Levenberg-Marquardt);
  (iii) incrementally add a new camera Pk into the system, estimate an initial value by DLT, then non-linearly optimize the whole system;
  (iv) repeat (iii) until all the cameras are estimated.

Tired? Recall the camera calibration algorithm: (i) normalization X~i = U Xi, x~i = T xi; (ii) linear solution by DLT; (iii) minimization of geometric error by iterative optimization (Levenberg-Marquardt); (iv) denormalization P = T^-1 P~ U. The same linear-then-nonlinear pattern initializes and refines every new camera.

We are lucky: for the first time, a huge amount of visual data is easily accessible, and high-level descriptions of these data are also becoming available. How do we explore them? Analyze them? Use them wisely?

What is the contribution of this paper? How do we extract high-level information?
- Computer vision and machine learning tools: structure from motion and other computer vision tools have become robust enough for graphics applications.
- The Internet: image search.
- Human labels: games with a purpose.

What is the space of all the pictures: in the past, the present, the future? What is the space of all the videos: in the past, the present, the future? What else? Using search engines? Using human computation power?

Book: "Multiple View Geometry in Computer Vision", Hartley and Zisserman.
Online tutorial: http://www.cs.unc.edu/~marc/tutorial.pdf and http://www.cs.unc.edu/~marc/tutorial/
Matlab toolbox: http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/TORR1/index.html

The End