E2-VINS: An Event-Enhanced Visual-Inertial SLAM Scheme for Dynamic Environments

Jiafeng Huang1,  Shengjie Zhao1,  Lin Zhang1,  and Hongwei Dai2

1School of Software Engineering, Tongji University, Shanghai, China

2Jiangsu Ocean University, Lianyungang, China


Introduction

This is the website for our paper "E2-VINS: An Event-Enhanced Visual-Inertial SLAM Scheme for Dynamic Environments".

Simultaneous Localization and Mapping (SLAM) technology has garnered significant interest in the robotic vision community over the past few decades. The rapid development of SLAM has led to its widespread application across various fields, including autonomous driving, robot navigation, and virtual reality. Although SLAM, especially Visual-Inertial SLAM (VI-SLAM), has made substantial progress, most classic algorithms in this field are designed under the assumption that the observed scene is static. In complex real-world environments, dynamic objects such as pedestrians and vehicles can seriously degrade the robustness and accuracy of such systems. Event cameras, recently introduced motion-sensitive biomimetic sensors, efficiently capture scene changes (referred to as "events") with high temporal resolution, offering new opportunities to enhance VI-SLAM performance in dynamic environments. By integrating this innovative kind of sensor, we propose E2-VINS, the first event-enhanced Visual-Inertial SLAM framework specifically designed for dynamic environments.
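For readers unfamiliar with the sensor, each event encodes a single per-pixel brightness change. The sketch below shows the typical event record; the class and field names are illustrative only and are not tied to any particular camera driver or to our implementation.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """One asynchronous brightness-change measurement. Pixels fire
    independently of each other, which is what gives event cameras
    their microsecond-scale temporal resolution."""
    x: int         # pixel column
    y: int         # pixel row
    t: float       # timestamp in seconds
    polarity: int  # +1 for a brightness increase, -1 for a decrease

# An event stream is simply a time-ordered sequence of such records:
stream = [Event(120, 64, 0.001200, +1), Event(121, 64, 0.001304, -1)]
```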


Overall Framework

The overall pipeline of our proposed E2-VINS system is shown in Figure 1. "Preprocessing" involves IMU pre-integration, event motion compensation, and feature tracking on the RGB frames. The compensated events are used to generate event-based dynamicity metrics that measure the dynamicity of each pixel. Based on these metrics, weights for the visual residuals of different pixels are adaptively assigned, referred to as dynamicity weights. Finally, E2-VINS optimizes the system state (camera poses and map points) together with the dynamicity weights; a simplified sketch of this weighting scheme is given after Figure 1.

Figure 1. The overall pipeline of E2-VINS.
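To make the weighting idea concrete, the following Python sketch shows one plausible realization of the event-based dynamicity pipeline: events are rotationally compensated with the IMU-predicted motion, accumulated into a per-pixel dynamicity map, and mapped to residual weights. The homography-based compensation, the exponential weighting function, and the parameter alpha are our illustrative assumptions; the paper defines its own metric and estimates the weights within the optimization.

```python
import numpy as np

def compensate_events(ex, ey, R_delta, K):
    """Warp event coordinates with the IMU-predicted rotation via the
    rotational homography H = K @ R_delta @ K^{-1}. After warping, events
    triggered by pure camera rotation align with scene edges, while events
    from independently moving objects remain spatially dense."""
    H = K @ R_delta @ np.linalg.inv(K)
    pts = np.stack([ex, ey, np.ones_like(ex)])      # (3, N) homogeneous
    warped = H @ pts
    return warped[0] / warped[2], warped[1] / warped[2]

def dynamicity_map(wx, wy, shape):
    """Accumulate compensated events into a per-pixel count image and
    normalize it to [0, 1]; larger values suggest more dynamic pixels."""
    counts = np.zeros(shape)
    xi = np.clip(np.round(wx).astype(int), 0, shape[1] - 1)
    yi = np.clip(np.round(wy).astype(int), 0, shape[0] - 1)
    np.add.at(counts, (yi, xi), 1.0)
    return counts / max(counts.max(), 1.0)

def dynamicity_weight(d, alpha=5.0):
    """Map per-pixel dynamicity d in [0, 1] to a residual weight in (0, 1].
    alpha is a guessed hyper-parameter controlling how strongly features
    on dynamic pixels are suppressed in the visual residuals."""
    return np.exp(-alpha * d)

# Toy usage: two events, identity rotation, a 240x320 sensor.
K = np.array([[300.0, 0.0, 160.0],
              [0.0, 300.0, 120.0],
              [0.0,   0.0,   1.0]])
ex = np.array([120.0, 121.0])
ey = np.array([64.0, 64.0])
wx, wy = compensate_events(ex, ey, np.eye(3), K)
D = dynamicity_map(wx, wy, shape=(240, 320))
W = dynamicity_weight(D)
# In the back end, a feature observed at pixel (u, v) would contribute
# W[v, u] * ||reprojection residual||^2 to the cost, so features on
# dynamic objects are adaptively down-weighted.
```

In the actual system the dynamicity weights are estimated together with the system state rather than computed in closed form as above, but the sketch conveys how compensated event density drives the per-pixel weighting.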

Performances

- Qualitative Results

Figure 2. Visualization of some typical qualitative results.
The weighted feature image frames obtained by E2-VINS on three test sequences from the VIODE dataset (city_day, city_night, and parking_lot) and four sequences from the ECMD dataset (Dense_street_day_easy_a, Dense_street_day_easy_b, Dense_street_night_easy_a, and Urban_road_day_easy_a) are shown on the left side. Features with low weights (from dynamic objects) are plotted as red points, while features with high weights (from static objects) are shown as green ones. The trajectories of E2-VINS and the compared algorithms on the Dense_street_day_medium_a and Dense_street_night_easy_a sequences of the ECMD dataset are illustrated on the right side.

- Quantitative Results

Table 1. Mean Position Error (%) of E2-VINS and compared SLAM systems on the VIODE dataset.

Table 2. Mean Position Error (%) of E2-VINS and compared SLAM systems on the ECMD dataset.

Considering all metrics comprehensively, E2-VINS achieves the best performance among all compared schemes.

Source Code

E2-VINS Code