SuperEvent Project Page

Qualitative Results

Unseen sequence rec1499023756 of the DAVIS Driving Dataset 2020 (DDD20).

Unseen sequence rec1501614399 of the DAVIS Driving Dataset 2020 (DDD20).

Unseen sequence Indoor 45° downward facing 14 of the UZH-FPV Drone Racing dataset.

Unseen sequence Varying Robust of the Vision for Visibility Dataset (ViViD++).

Abstract

Event-based keypoint detection and matching hold significant potential, enabling the integration of event sensors into highly optimized Visual SLAM systems developed for frame cameras over decades of research. Unfortunately, existing approaches struggle with the motion-dependent appearance of keypoints and the complex noise prevalent in event streams, resulting in severely limited feature matching capabilities and poor performance on downstream tasks. To mitigate this problem, we propose SuperEvent, a data-driven approach to predict stable keypoints with expressive descriptors. Due to the absence of event datasets with ground-truth keypoint labels, we leverage existing frame-based keypoint detectors on readily available event-aligned and synchronized gray-scale frames for self-supervision: we generate temporally sparse keypoint pseudo-labels considering that events are a product of both scene appearance and camera motion. Combined with our novel, information-rich event representation, we enable SuperEvent to learn robust keypoint detection and description in event streams. Finally, we demonstrate the usefulness of SuperEvent by its integration into a modern sparse keypoint and descriptor-based SLAM framework originally developed for traditional cameras, surpassing the state-of-the-art in event-based SLAM by a wide margin.
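To make the pairing of events with synchronized frames more concrete, the following is a minimal Python sketch under stated assumptions. It selects the events in a time window ending at a frame timestamp and accumulates them into a two-channel polarity count image. This is a generic, commonly used event representation and not the information-rich representation proposed for SuperEvent. The function names `slice_events` and `polarity_histogram`, the window length, and the array layout are all hypothetical choices made for illustration.

```python
import numpy as np


def slice_events(t, t_end, window):
    """Return the index range [lo, hi) of events with t_end - window < t <= t_end.

    Assumes the timestamp array `t` is sorted in ascending order.
    """
    lo = np.searchsorted(t, t_end - window, side="right")
    hi = np.searchsorted(t, t_end, side="right")
    return lo, hi


def polarity_histogram(x, y, p, height, width):
    """Accumulate events into a (2, H, W) count image.

    Channel 0 counts negative events, channel 1 counts positive events.
    `x` and `y` are integer pixel coordinates; `p` is the polarity,
    where any value > 0 is treated as positive (an assumption).
    """
    hist = np.zeros((2, height, width), dtype=np.float32)
    channel = (np.asarray(p) > 0).astype(np.intp)
    # np.add.at handles repeated indices correctly, unlike hist[...] += 1.
    np.add.at(hist, (channel, np.asarray(y), np.asarray(x)), 1.0)
    return hist


if __name__ == "__main__":
    # Toy event stream: pixel coordinates, timestamps in seconds, polarities.
    x = np.array([0, 1, 1, 2])
    y = np.array([0, 0, 0, 1])
    t = np.array([0.0, 0.01, 0.02, 0.05])
    p = np.array([1, -1, 1, 1])

    # Events in the 20 ms window ending at a (hypothetical) frame timestamp.
    lo, hi = slice_events(t, t_end=0.02, window=0.02)
    hist = polarity_histogram(x[lo:hi], y[lo:hi], p[lo:hi], height=2, width=3)
    print(hist.shape, hist.sum())
```

In a self-supervised setup of the kind the abstract describes, keypoints detected on the frame at `t_end` by a frame-based detector could then serve as pseudo-labels for the event tensor built from the same window. The detector and the label generation are left out here because their details are not given on this page.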

Data processing pipeline for training and inference of SuperEvent.

Advantages of Event Camera Matching

Frame with motion blur (UZH-FPV, Outdoor 45° downward facing 2).

Frame with motion blur (UZH-FPV, Indoor 45° downward facing 14).

Frame with motion blur (UZH-FPV, Outdoor 45° downward facing 1).

HDR: frame overexposed (DDD20, rec1501614399).

HDR: frame underexposed (DDD20, rec1502599151).

HDR: frame underexposed because the light is turned off, producing many negative events (ViViD++, Varying Robust).

Stereo Event VI-SLAM (OKVIS2 Integration)

Small-scale trajectory estimation on the sequence mocap-desk of the TUM-VIE dataset.

Large-scale trajectory estimation with loop closure on the sequence loop-floor0 of the TUM-VIE dataset.

Quantitative results on the TUM-VIE mocap sequences.

@misc{burkhardt2025superevent,
  title={SuperEvent: Cross-Modal Learning of Event-based Keypoint Detection},
  author={Yannick Burkhardt and Simon Schaefer and Stefan Leutenegger},
  year={2025},
  url={https://arxiv.org/abs/2504.00139}
}

SuperEvent: Cross-Modal Learning of Event-based Keypoint Detection for SLAM