Introduction: Conventional single-shot object detection neural networks, such as YOLO, have achieved remarkable success in identifying and localizing objects within 2D images using axis-aligned rectangular bounding boxes. While effective for many applications, these 2D representations lack crucial information about the object's true 3D pose, dimensions, and orientation in the real world. This limitation becomes significant in applications requiring a deeper understanding of the scene, such as autonomous driving, robotics, and augmented reality.
The goal of this thesis is to extend the capabilities of single-shot object detection networks by developing, training, and evaluating a model that directly predicts oriented 3D bounding boxes from a single monocular image. This includes estimating the object's 3D location, its dimensions (length, width, height), and its 3D orientation.
Motivation: Accurate and efficient monocular 3D object detection is a crucial task in various computer vision applications. Relying on a single camera offers advantages in terms of cost, simplicity, and ease of deployment compared to multi-camera or LiDAR-based systems. This thesis aims to contribute to the advancement of monocular 3D object detection by exploring and implementing a single-shot approach capable of predicting oriented 3D bounding boxes.
Tasks:
Literature Review on Monocular 3D Object Detection CNNs:
- Conduct a comprehensive review of existing research in monocular 3D object detection using Convolutional Neural Networks (CNNs).
- Address the issue of camera calibration: Thoroughly examine how different methods handle camera calibration parameters (intrinsic and extrinsic) and their impact on the accuracy of 3D object detection.
- Analyze the strengths and weaknesses of different network architectures.
Investigation of System Constraints:
- Identify and analyze the inherent challenges and constraints of monocular 3D object detection compared to methods utilizing depth information.
- Consider factors such as:
  - Scale ambiguity: The difficulty in determining the absolute size and distance of an object from a single 2D image.
  - Occlusion: How occluded objects can affect the accuracy of 3D bounding box prediction.
  - Viewpoint variation: The impact of different viewing angles on the perceived shape and size of objects.
  - Computational resources: Consider the computational complexity and real-time requirements for potential applications.
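Here is a concrete illustration of scale ambiguity. Under a pinhole camera model, an upright object of real height H at depth Z appears with image height h = f * H / Z, where f is the focal length in pixels from the intrinsic calibration. The minimal Python sketch below uses hypothetical function names and numbers chosen only for illustration. It shows that objects of different size can produce identical image evidence, so recovering depth needs both a size prior and a correct focal length.

```python
def projected_height_px(object_height_m: float, depth_m: float, focal_px: float) -> float:
    """Pinhole model for an upright object facing the camera: h = f * H / Z."""
    return focal_px * object_height_m / depth_m


def depth_from_height(object_height_m: float, image_height_px: float, focal_px: float) -> float:
    """Inverse relation Z = f * H / h; only solvable once a real height H is assumed."""
    return focal_px * object_height_m / image_height_px


f = 1000.0  # hypothetical focal length in pixels, taken from the intrinsic calibration

# A 1.5 m tall object at 20 m and a 3.0 m tall object at 40 m have the same image height.
h_small = projected_height_px(1.5, 20.0, f)
h_large = projected_height_px(3.0, 40.0, f)
print(h_small, h_large)  # 75.0 75.0

# The recovered depth scales linearly with the assumed height (and with f),
# so a wrong size prior or a mis-calibrated focal length shifts the 3D box.
print(depth_from_height(1.5, h_small, f))  # 20.0
print(depth_from_height(3.0, h_small, f))  # 40.0
```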
Design and Development of an Oriented 3D Bounding Box CNN:
- Based on the literature review and the identified system constraints, design a novel single-shot object detection CNN architecture, or adapt an existing one, to predict oriented 3D bounding boxes.
- This will involve:
  - Choosing an appropriate backbone network.
  - Designing the output layers to predict the parameters of the 3D bounding box (e.g., center coordinates, dimensions, Euler angles or quaternions for orientation).
  - Defining a suitable loss function that incorporates the different aspects of 3D bounding box prediction.
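One common way to parameterize such an output head is to predict the projected 2D box center (u, v), its depth Z, the dimensions (length, width, height), and a yaw angle about the vertical axis. This is assumed here for illustration and is not prescribed by this topic. The camera intrinsics (fx, fy, cx, cy) are then required to lift each prediction into 3D, which is one place where calibration directly affects accuracy. The sketch below also assumes a KITTI-like camera frame (x right, y down, z forward) and yaw-only rotation. All function names and values are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def backproject_center(u: float, v: float, depth: float,
                       fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift the predicted image-plane center (u, v) and its depth Z to camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def project(point: Vec3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point (Z > 0) to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def box_corners(center: Vec3, length: float, width: float, height: float,
                yaw: float) -> List[Vec3]:
    """Eight corners of a box rotated by `yaw` about the camera's vertical (y) axis.

    Assumed convention: `center` is the geometric box center, `length` lies along
    the local x axis (heading), `width` along local z, `height` along y.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    px, py, pz = center
    corners = []
    for dl in (0.5, -0.5):
        for dw in (0.5, -0.5):
            for dh in (0.5, -0.5):
                lx, ly, lz = dl * length, dh * height, dw * width
                # Rotation about y: x' = cos*x + sin*z, z' = -sin*x + cos*z
                corners.append((px + c * lx + s * lz, py + ly, pz - s * lx + c * lz))
    return corners


# Hypothetical intrinsics and network outputs for one detection.
fx = fy = 1000.0
cx, cy = 640.0, 360.0
center = backproject_center(740.0, 410.0, 20.0, fx, fy, cx, cy)
print(center)  # (2.0, 1.0, 20.0)
corners = box_corners(center, 4.2, 1.8, 1.5, math.radians(30))
pixels = [project(p, fx, fy, cx, cy) for p in corners]  # 2D points for visualization
```

Restricting orientation to yaw suits ground vehicles. Predicting full Euler angles or a quaternion would replace the single rotation about y with a general 3x3 rotation matrix.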
Training and Evaluation on Real-World Data:
- Select suitable real-world datasets with 3D object annotations (both proprietary Kapsch data and public datasets).
- Implement and train the model.
- Evaluate the performance of the trained model using appropriate 3D object detection metrics (e.g., Average Precision with different IoU thresholds in 3D space).
- Analyze the results, identify limitations, and discuss potential future improvements.
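The 3D Average Precision mentioned above is built on an IoU between predicted and ground-truth boxes. For boxes that rotate only about the vertical axis, which is a common simplification, the intersection volume factors into a bird's-eye-view polygon overlap times a vertical overlap. The sketch below assumes the box format (x, y, z, l, w, h, yaw) with y as the vertical box center and x-z as the ground plane. It clips one rotated footprint against the other using the Sutherland-Hodgman algorithm. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float, float, float, float]  # x, y, z, l, w, h, yaw


def bev_corners(box: Box) -> List[Point]:
    """Counter-clockwise footprint of the box in the ground (x, z) plane."""
    x, _, z, l, w, _, yaw = box
    c, s = math.cos(yaw), math.sin(yaw)
    pts = []
    for dl, dw in ((0.5, 0.5), (-0.5, 0.5), (-0.5, -0.5), (0.5, -0.5)):
        lx, lz = dl * l, dw * w
        pts.append((x + c * lx + s * lz, z - s * lx + c * lz))
    return pts


def _cross(a: Point, b: Point, p: Point) -> float:
    """Positive if p lies to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
    """Intersection of segment p -> q with the infinite line through a and b."""
    cp, cq = _cross(a, b, p), _cross(a, b, q)
    t = cp / (cp - cq)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def clip_polygon(subject: Sequence[Point], clipper: Sequence[Point]) -> List[Point]:
    """Sutherland-Hodgman clipping of `subject` by the convex CCW polygon `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inputs, output = output, []
        if not inputs:
            break
        prev = inputs[-1]
        for cur in inputs:
            if _cross(a, b, cur) >= 0:
                if _cross(a, b, prev) < 0:
                    output.append(_intersect(prev, cur, a, b))
                output.append(cur)
            elif _cross(a, b, prev) >= 0:
                output.append(_intersect(prev, cur, a, b))
            prev = cur
    return output


def polygon_area(pts: Sequence[Point]) -> float:
    """Shoelace formula; returns 0 for degenerate polygons."""
    n = len(pts)
    if n < 3:
        return 0.0
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2.0


def iou_3d(a: Box, b: Box) -> float:
    """3D IoU of two yaw-only boxes: BEV intersection area times vertical overlap."""
    bev_inter = polygon_area(clip_polygon(bev_corners(a), bev_corners(b)))
    top = min(a[1] + a[5] / 2, b[1] + b[5] / 2)
    bottom = max(a[1] - a[5] / 2, b[1] - b[5] / 2)
    inter = bev_inter * max(0.0, top - bottom)
    union = a[3] * a[4] * a[5] + b[3] * b[4] * b[5] - inter
    return inter / union if union > 0 else 0.0


# Two 4 m x 2 m x 1.5 m boxes offset by half their length: IoU is 1/3.
print(iou_3d((0, 0, 0, 4, 2, 1.5, 0), (2, 0, 0, 4, 2, 1.5, 0)))
```

For AP, a detection would typically count as a true positive only if this IoU exceeds the chosen threshold (for example 0.5 or 0.7) and the detection is matched to a not-yet-matched ground-truth box. Full 3D rotations (pitch and roll) would require a general convex-polyhedron intersection instead.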
Expected Deliverables:
- A comprehensive literature review on monocular 3D object detection CNNs.
- A detailed description of the designed and implemented model architecture.
- A thorough evaluation of the model's performance on real-world data.
- A written thesis document summarizing the research process, findings, and conclusions.
- Potentially, a working implementation of the developed model.
This thesis provides an excellent opportunity to delve into the challenging and rapidly evolving field of monocular 3D object detection. The student will gain practical experience in literature review, deep learning model design, implementation, training, and evaluation on real-world data.
Your Profile:
Required:
Background studies in Computer Science, Software Engineering, Information Technology, Geoinformatics, or related fields
Fluent English skills
Interest in technology
Willingness and ability to work independently
Excellent communication and teamwork skills
Conscientiousness and reliability
Strong analytical skills with a precise and structured approach
Start: Immediately
Duration: 3–6 months
Successful completion of the master’s thesis will be rewarded with €3,000.
Contact:
Edwin Frühwirth