Computer Vision (Fall 2024)

Administrative Matters

Instructor: Lin ZHANG

TA: Yuanyu Zheng (wechat: zheng18702147239, email: zheng19845@qq.com)

Evaluation: attendance 10%, homework (3 assignments) 30%, final project 20%, final written exam 40%, extra bonus 5%.

Introduction to Computer Vision: Principles, Algorithms, and Practices (updated Oct. 06, 2024)


Lecture Slides

Slides

Reading Materials

Introduction to Computer Vision

1. Computer Vision, Wiki Page, https://en.wikipedia.org/wiki/Computer_vision

2. Erlangen Program, https://en.wikipedia.org/wiki/Erlangen_program

3. Xinyu Huang et al., "The ApolloScape open dataset for autonomous driving and its application", IEEE Trans. PAMI, 2020.

4. Lin Zhang et al., "Towards contactless palmprint recognition: A novel device, a new benchmark, and a collaborative representation based identification approach", Pattern Recognition, 2017.

5. Yingyi Zhang, Lin Zhang et al., "Pay by showing your palm: A study of palmprint verification on mobile platforms", in: Proc. ICME, 2019.

6. Tianjun Zhang, Nlong Zhao, Ying Shen, Xuan Shao, Lin Zhang*, and Yicong Zhou, "ROECS: A robust semi-direct pipeline towards online extrinsics correction of the surround-view system", in: Proc. ACM Int'l Conf. Multimedia, 2021.

7. Lin Zhang et al., "Simulation of atmospheric visibility impairment", IEEE Trans. Image Processing, 2021.

8. Zhong Wang, Lin Zhang et al., "D-LIOM: Tightly-coupled direct LiDAR-inertial odometry and mapping", IEEE Trans. Multimedia, 2023.

9. Xuan Shao, Lin Zhang et al., "MOFISSLAM: A multi-object semantic SLAM system with front-view, inertial and surround-view sensors for indoor parking", IEEE Trans. Circuits and Systems for Video Technology, vol. 32, no. 7, 2022.

10. Tianjun Zhang, Lin Zhang*, Yang Chen, and Yicong Zhou, "CVIDS: A collaborative localization and dense mapping framework for multi-agent based visual-inertial SLAM", IEEE Trans. Image Processing, vol. 31, pp. 6562-6576, 2022.

Local Interest Point Detectors

1. 01-harrisCornerDetector. This program implements the Harris corner detector and reproduces the "corner detection" example from our lecture.

2. C. Harris and M. Stephens, "A combined corner and edge detector", in: Proc. Alvey Vision Conference, 1988.

3. D.G. Lowe, "Distinctive image features from scale-invariant keypoints", IJCV, 2004.
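The Harris response computed by a program such as 01-harrisCornerDetector can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the course code: the image is a small grayscale list of lists, gradients are central differences, the window is an unweighted 3x3 box rather than a Gaussian, and k = 0.04 is a commonly used value chosen here as an assumption.

```python
def harris_response(img, k=0.04):
    """Harris response R = det(M) - k * trace(M)^2 for every pixel of a 2-D list.

    Assumptions: central-difference gradients, an unweighted 3x3 window,
    and R = 0 on the two-pixel border where the window is incomplete.
    """
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0

    r = [[0.0] * w for _ in range(h)]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            sxx = syy = sxy = 0.0
            # Structure tensor M = [[sxx, sxy], [sxy, syy]] summed over the window.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            trace = sxx + syy
            r[y][x] = (sxx * syy - sxy * sxy) - k * trace * trace
    return r


# A bright 6x6 square on a dark 12x12 background.
image = [[1.0 if 3 <= y < 9 and 3 <= x < 9 else 0.0 for x in range(12)]
         for y in range(12)]
R = harris_response(image)
print(R[3][3], R[3][6], R[6][6])  # corner > 0, edge < 0, flat region = 0
```

Pixels where R is a local maximum above a threshold are reported as corners; along a straight edge R is negative, and in flat regions it is zero, which is what the square example shows.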

Local Feature Descriptors and Matching

1. 02-harrisCornerDescriptorMatching. This program implements Harris corner detection and matching.

2. 03-openSIFTVS. This program implements SIFT interest point detection, descriptor construction, and matching in C++. It is a Visual Studio 2017 project.

3. PanoramaStichingUsingSIFTRANSAC. This MATLAB program implements SIFT-based panorama stitching within the RANSAC framework.

4. SIFT implementation (MATLAB)
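RANSAC, as used in PanoramaStichingUsingSIFTRANSAC, repeatedly fits a model to a minimal random sample of matches and keeps the hypothesis supported by the most inliers. The sketch below assumes a pure 2-D translation model so that a single match is a minimal sample; panorama stitching normally needs a richer model (e.g., a homography, whose minimal sample is four matches), and the threshold and iteration count are illustrative assumptions.

```python
import random


def ransac_translation(matches, threshold=2.0, iterations=100, seed=0):
    """Estimate a 2-D translation (tx, ty) from putative point matches with RANSAC.

    matches: non-empty list of ((x1, y1), (x2, y2)) pairs, possibly with outliers.
    Returns the translation refitted on the inliers and the inlier indices.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: one match fully determines a translation hypothesis.
        (x1, y1), (x2, y2) = rng.choice(matches)
        tx, ty = x2 - x1, y2 - y1
        # Verification: count matches whose transfer error is within the threshold.
        inliers = [i for i, ((a, b), (c, d)) in enumerate(matches)
                   if (a + tx - c) ** 2 + (b + ty - d) ** 2 <= threshold ** 2]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on all inliers: the least-squares translation is the mean offset.
    n = len(best_inliers)
    tx = sum(matches[i][1][0] - matches[i][0][0] for i in best_inliers) / n
    ty = sum(matches[i][1][1] - matches[i][0][1] for i in best_inliers) / n
    return (tx, ty), best_inliers


# Eight correct matches shifted by (10, -5) plus two gross outliers.
good = [((x, y), (x + 10, y - 5)) for x, y in
        [(0, 0), (5, 2), (9, 7), (3, 8), (12, 1), (6, 6), (1, 4), (8, 3)]]
bad = [((2, 2), (40, 40)), ((7, 1), (-30, 15))]
t, inliers = ransac_translation(good + bad)
print(t)  # (10.0, -5.0)
```

Swapping in a different model only changes the minimal-sample fit and the residual test; the hypothesize-and-verify loop stays the same.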

Math Prerequisite I: Projective Geometry

Introduction to Projective Geometry on Wikipedia
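A central idea of projective geometry is that, in homogeneous coordinates, incidence computations in the plane reduce to cross products: the line through points p and q is p × q, two lines meet at l1 × l2, and parallel lines meet at a point at infinity whose last coordinate is 0. A small pure-Python sketch (the helper names are illustrative):

```python
def cross(u, v):
    """Cross product of two homogeneous 3-vectors (points or lines)."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dehomogenize(p):
    """Map (x, y, w) to (x/w, y/w); w == 0 is a point at infinity (returns None)."""
    x, y, w = p
    return None if w == 0 else (x / w, y / w)


# Euclidean points (x, y) are embedded as (x, y, 1).
l1 = cross((0, 0, 1), (1, 1, 1))   # the line y = x
l2 = cross((0, 2, 1), (2, 0, 1))   # the line x + y = 2
meet = dehomogenize(cross(l1, l2))

# The parallel lines y = 0 and y = 1 meet at a point at infinity.
h1 = cross((0, 0, 1), (1, 0, 1))
h2 = cross((0, 1, 1), (1, 1, 1))
ideal = cross(h1, h2)
print(meet, ideal)  # (1.0, 1.0) (-1, 0, 0)
```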

Math Prerequisite II: Nonlinear Least-Squares

K. Madsen et al., "Methods for nonlinear least-squares problems", Technical University of Denmark, 2004.
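The Levenberg-Marquardt method treated in the Madsen et al. notes can be sketched for a two-parameter model. Everything specific here is an illustrative assumption: the model y = a·exp(b·t), the synthetic noise-free data, solving the damped 2x2 normal equations by Cramer's rule, and a simple damping update (halve μ after a successful step, multiply it by 10 after a failed one) in place of a more refined update rule.

```python
import math


def levenberg_marquardt(ts, ys, a, b, iterations=50, mu=1e-3):
    """Fit y = a * exp(b * t) by Levenberg-Marquardt.

    Each step solves (J^T J + mu * I) delta = -J^T r with r_i = a*exp(b*t_i) - y_i.
    Damping rule (an assumption): halve mu after a success, x10 after a failure.
    """

    def cost(a, b):
        # Half the sum of squared residuals.
        return 0.5 * sum((a * math.exp(b * t) - y) ** 2 for t, y in zip(ts, ys))

    for _ in range(iterations):
        # Accumulate the 2x2 matrix J^T J and the gradient J^T r.
        h11 = h12 = h22 = g1 = g2 = 0.0
        for t, y in zip(ts, ys):
            e = math.exp(b * t)
            r = a * e - y
            ja, jb = e, a * t * e  # dr/da and dr/db
            h11 += ja * ja
            h12 += ja * jb
            h22 += jb * jb
            g1 += ja * r
            g2 += jb * r
        # Solve the damped normal equations with Cramer's rule.
        m11, m22 = h11 + mu, h22 + mu
        det = m11 * m22 - h12 * h12
        da = (-g1 * m22 + g2 * h12) / det
        db = (-g2 * m11 + g1 * h12) / det
        if cost(a + da, b + db) < cost(a, b):
            a, b, mu = a + da, b + db, mu * 0.5  # accept: closer to Gauss-Newton
        else:
            mu *= 10.0  # reject: shorter, more gradient-descent-like step
    return a, b


ts = [0.5 * i for i in range(7)]
ys = [2.0 * math.exp(0.5 * t) for t in ts]  # synthetic, noise-free data
a_hat, b_hat = levenberg_marquardt(ts, ys, a=1.8, b=0.45)
print(a_hat, b_hat)  # close to 2.0 and 0.5
```

Small μ makes each step behave like Gauss-Newton, while large μ shortens it toward a gradient-descent step; this is the trade-off the damping parameter controls.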

Measurement Using a Single Camera

1. Z. Zhang, "A flexible new technique for camera calibration", IEEE Trans. PAMI, 2000.

2. Rodrigues' rotation formula

3. Why do we need at least two calibration board images?

4. cameraCalibratorImgs. This package provides a set of calibration-board images captured by a camera, together with the checkerboard pattern (PDF) used in our lecture.

5. imageUndistortUsingIntrinsicsMatlab. This demo shows how to undistort an image using the camera intrinsics.

6. monoCalib. This demo is based on the OpenCV source code and is fully consistent with the theoretical discussions in our lectures. The code is compiled with VS2017 + OpenCV 4.5.5 on Windows 11. Since it is a pure C++ project, it can be ported straightforwardly to other platforms (macOS or Ubuntu) if you like.

7. fisheyeCameraCalib. This demo shows how to use OpenCV routines to calibrate a fisheye camera and how to use the camera intrinsics to undistort video online. The code is compiled with VS2017 + OpenCV 4.5.5 on Windows 11.

8. surround-view. This demo shows how to synthesize a surround view from four fisheye videos. The camera intrinsic parameters, the homographies between the four views and the road, and the raw videos captured by the four cameras are provided. The code is compiled with VS2017 + OpenCV 4.5.5 on Windows 11.
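Rodrigues' formula converts a rotation vector r = θk (unit axis k, angle θ), the compact form in which OpenCV's calibrateCamera returns per-view extrinsic rotations, into a rotation matrix R = I + sin θ K + (1 − cos θ) K², where K is the cross-product matrix of k. A minimal pure-Python sketch; in the C++ demos the same conversion is available as cv::Rodrigues.

```python
import math


def rodrigues(rvec):
    """Convert a rotation vector r = theta * k (|k| = 1) into a 3x3 rotation matrix.

    Uses R = I + sin(theta) K + (1 - cos(theta)) K^2, where K is the
    skew-symmetric (cross-product) matrix of the unit axis k.
    """
    theta = math.sqrt(sum(v * v for v in rvec))
    if theta < 1e-12:
        # The axis is undefined for a zero rotation; return the identity.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (v / theta for v in rvec)
    K = [[0.0, -kz, ky],
         [kz, 0.0, -kx],
         [-ky, kx, 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    s, one_minus_cos = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + one_minus_cos * K2[i][j]
             for j in range(3)] for i in range(3)]


# A rotation of 90 degrees about the z-axis maps the x-axis onto the y-axis.
R = rodrigues([0.0, 0.0, math.pi / 2])
x_rotated = [sum(R[i][j] * v for j, v in enumerate([1.0, 0.0, 0.0])) for i in range(3)]
print(x_rotated)  # approximately [0, 1, 0]
```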

Basics of Machine Learning, with a Special Emphasis on CNNs

1. K. He et al., "Deep residual learning for image recognition", CVPR, 2016.
2. G. Huang et al., "Densely connected convolutional networks", CVPR, 2017.
3. J. Redmon et al., "YOLO9000: Better, faster, stronger", CVPR, 2017.

4. J. Redmon et al., "YOLOv3: An incremental improvement", arXiv, 2018.
5. GitHub repository for YOLOv4, https://github.com/AlexeyAB/darknet

6. Ultralytics YOLOv8, https://ultralytics.com/yolov8

7. J.R. Terven et al., "A comprehensive review of YOLO: From YOLOv1 to YOLOv8 and beyond", arXiv:2304.0050, 2023.
8. The evolution of CNN architectures

Visual Perception Practices in Autonomous Driving

1. Lin Zhang et al., "Vision-based parking-slot detection: A DCNN-based approach and a large-scale benchmark dataset", IEEE Trans. Image Processing, 2018. (project website)
2. Xuan Shao, Lin Zhang et al., "MOFISSLAM: A multi-object semantic SLAM system with front-view, inertial and surround-view sensors for indoor parking", IEEE Trans. Circuits and Systems for Video Technology, 2022. (project website)

3. Tianjun Zhang, Nlong Zhao, Ying Shen, Xuan Shao, Lin Zhang*, and Yicong Zhou, "ROECS: A robust semi-direct pipeline towards online extrinsics correction of the surround-view system", in: Proc. ACM Int'l Conf. Multimedia, 2021. (project website)

Introduction to Numerical Geometry

1. Example to demonstrate fast marching (FastMarching.rar)
2. Example to show Euclidean isometry removal by PCA (EuclideanIsometryRemoval.rar)
3. Code for ICP-based 3D shape matching (ICP.rar)
4. Lin Zhang et al., "3D face recognition based on multiple keypoint descriptors and sparse representation", PLoS ONE, 2014.

5. Lin Zhang et al., "3D palmprint identification using block-wise features and collaborative representation", IEEE Trans. Pattern Analysis and Machine Intelligence, 2015.
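The ICP.rar code matches 3D shapes; the alternation at the heart of ICP (nearest-neighbour correspondences, then a closed-form rigid fit) can be sketched in 2-D, where the optimal rotation has a simple atan2 solution. This is a simplified illustration, not the course code: it assumes 2-D point sets, brute-force matching, and a fixed iteration count, whereas a standard 3-D version typically computes the rigid fit with an SVD.

```python
import math


def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired points src -> dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy  # centre both sets
        s_dot += px * qx + py * qy
        s_cross += px * qy - py * qx
    theta = math.atan2(s_cross, s_dot)  # angle maximising sum of (R p') . q'
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def icp_2d(source, target, iterations=20):
    """Point-to-point ICP: alternate nearest-neighbour matching and a closed-form rigid fit."""
    moved = list(source)
    theta, tx, ty = 0.0, 0.0, 0.0
    for _ in range(iterations):
        # 1) For each current source point, take the closest target point (brute force).
        matched = [min(target, key=lambda q, p=p: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in moved]
        # 2) Refit the full transform from the original source to the matched points.
        theta, tx, ty = best_rigid_2d(source, matched)
        c, s = math.cos(theta), math.sin(theta)
        moved = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in source]
    return theta, tx, ty, moved


angle = math.radians(5.0)
ca, sa = math.cos(angle), math.sin(angle)
source = [(0.0, 0.0), (4.0, 0.0), (8.0, 0.0), (0.0, 3.0), (0.0, 6.0), (5.0, 5.0)]
# Target: the same shape rotated by 5 degrees, shifted by (0.1, 0.05), listed in reverse order.
target = [(ca * x - sa * y + 0.1, sa * x + ca * y + 0.05) for x, y in reversed(source)]
theta, tx, ty, aligned = icp_2d(source, target)
print(math.degrees(theta), tx, ty)  # approximately 5.0, 0.1, 0.05
```

Because the correspondences are re-estimated every iteration, the point order of the two sets does not matter; the misalignment does, since ICP only converges to the right answer when the initial pose is close enough for most nearest neighbours to be correct.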

Assignments

Notes:

1. Compress all files into a single .rar file named "CV_studentID_yourName_Assignment2.rar"; the email subject should follow the format "CV_studentID_yourName_Assignment2". If you want to resubmit, append "_R1" or "_R2" to both the .rar file name and the email subject, e.g., "CV_studentID_yourName_Assignment2_R1".

2. For the programming assignments, please make sure your program runs successfully on the TA's machine.

3. All the documents you hand in, including the comments in the source code, should be in English.

4. Please send your solutions to the TA (zheng19845@qq.com) and confirm with the TA that your email has been received.

1. Assignment 1 (Due: Oct. 27, 2024): scores, reference solution from Kai LAN, reference solution from Juekai LIN

2. Assignment 2 (Due: Dec. 10, 2024): scores, reference solutions

3. Assignment 3 (Due: Dec. 29, 2024): experiment instructions, template for the experimental report, test video for speed-bump detection

Projects

Notes:

1. Students work in groups of 2 or 3, each group on a selected topic.

2. At the end of this semester, you need to hand in the source code of the project and a report (in PPT format), and then give a presentation on your results. All documents, including the comments in the program, should be in English. The source code should be neat and clear, with clear comments on the key components, functions, and statements. The report should contain at least the following parts: background introduction, system structure design, key algorithms used, experimental results, and references.

3. Try your best to make the system as good as possible. Creative ideas are highly encouraged; if the innovation is significant, we could develop it into a conference paper!

Topics

1. Panorama Stitching (<=3 groups)

2. Camera Calibration Tool (<=4 groups)

3. Payment by Scanning Palmprint (<=3 groups)

4. Speedbump Detection and Distance Measurement (<=3 groups)

5. Binocular Stereo (<=4 groups)

6. 3D Gaussian Splatting on Mobile Devices (<=4 groups)

7. LiDAR-Inertial-Camera Calibration and SLAM (<=3 groups)

8. Depth Estimation and Dense Reconstruction with the Monocular Camera (<=4 groups)

9. Robot Navigation (<=1 group)

Main References

D. Forsyth and J. Ponce
Computer Vision: A Modern Approach (2nd Edition),
Prentice Hall, 2013
Online version available here

Richard Hartley and Andrew Zisserman

Multiple View Geometry in Computer Vision (2nd Edition)

Cambridge University Press, 2004

Online version available here

Created on: Aug. 30, 2024

Last updated on: Dec. 16, 2024