ICCV 2021 Multi-camera Multiple People Tracking Workshop

We propose to organize this Multi-camera Multiple People Tracking workshop, aiming to bring the academic and industrial communities together to tackle indoor multiple people tracking using multiple RGB cameras.


Multiple object tracking is a fundamental task in computer vision and a core research topic in understanding visual content. It has numerous applications in indoor navigation, motion capture, human-computer interaction, and robotics.

For single-camera tracking, there are several datasets and benchmarks, which have stimulated new tracking models that exploit sequential information. For multi-camera tracking, data collection and labeling are much more difficult. As a result, only a few small datasets are available, and there is no common benchmark.

To help the community develop more efficient and novel tracking algorithms, we have constructed a multi-camera multiple people tracking dataset, which we expect can also serve as an evaluation benchmark for this task. The dataset will be released to the research community, and we invite academic and industrial researchers to participate in this multi-camera multiple people tracking contest.

Multi-camera Multiple People Tracking (MMP-Tracking) Challenge

The MMP-Tracking Challenge aims to push forward the state of the art in multi-camera multiple people tracking algorithms. Participants will receive an annotated training set and a test set without annotations.

There are two subtracks in this challenge: (1) evaluating tracking results in the topdown view; (2) evaluating tracking results in each camera view, then aggregating them into the final metric.

For subtrack (1), using the camera calibration files provided, one can map the ground plane in each camera view into a world coordinate system shared by all cameras. The world coordinates are then discretized with a voxel size of 20 mm to obtain the topdown view map. We provide ground truth labels of person footpoint coordinates in the topdown view map. During evaluation, we compute false positives (FP), false negatives (FN) and true positives (TP) by assigning detected person footpoints to ground truth footpoints using Hungarian matching. A detected footpoint can be assigned to a ground truth footpoint only if they are less than 0.5 m apart (a distance of 25 pixels in the topdown map). After obtaining FP, FN and TP, we compute IDF1 and MOTA as our final scores.
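The matching step for subtrack (1) can be illustrated with a minimal sketch. It is not the official evaluation code: the function names, the millimetre world units, the map origin at (0, 0) and the large cost used for forbidden pairs are all assumptions made for illustration. A compact pure-Python Hungarian algorithm is included so the example has no dependencies.

```python
import math

VOXEL_MM = 20.0               # topdown cell size stated in the text
GATE_PX = 500.0 / VOXEL_MM    # 0.5 m gate expressed in topdown pixels (25)
BIG = 1e6                     # assumed cost for forbidden pairs and padding

def to_topdown(x_mm, y_mm):
    """Discretize world coordinates (assumed in mm, origin at 0,0) to map cells."""
    return (int(x_mm // VOXEL_MM), int(y_mm // VOXEL_MM))

def hungarian(cost):
    """Minimum-cost assignment on a square matrix; returns the column of each row."""
    n = len(cost)
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)   # row and column potentials
    p, way = [0] * (n + 1), [0] * (n + 1)     # p[j]: row (1-based) matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [math.inf] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                              # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    row_to_col = [0] * n
    for j in range(1, n + 1):
        row_to_col[p[j] - 1] = j - 1
    return row_to_col

def match_footpoints(dets, gts, gate=GATE_PX):
    """Match detected to ground truth footpoints (topdown pixels) under the gate."""
    n = max(len(dets), len(gts))
    cost = [[BIG] * n for _ in range(n)]       # padded square cost matrix
    for i, d in enumerate(dets):
        for j, g in enumerate(gts):
            dist = math.dist(d, g)
            if dist < gate:                    # "less than 0.5 m" is a strict bound
                cost[i][j] = dist
    assign = hungarian(cost)
    matches = [(i, j) for i, j in enumerate(assign) if cost[i][j] < BIG]
    tp = len(matches)
    return matches, tp, len(dets) - tp, len(gts) - tp   # matches, TP, FP, FN

dets = [to_topdown(1000, 1000), to_topdown(3000, 3000), to_topdown(9000, 9000)]
gts = [(52, 50), (150, 160)]
matches, tp, fp, fn = match_footpoints(dets, gts)
print(matches, tp, fp, fn)
```

In this example the third detection has no ground truth footpoint within the gate, so it counts as a false positive. Identity-aware scores such as IDF1 and MOTA additionally require tracking identities across frames, which this per-frame sketch does not cover.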

For subtrack (2), following the evaluation procedure widely used in MOT, we compute FP, FN and TP for each camera view independently (a detected bounding box and a ground truth bounding box can be matched only if their IoU > 0.5). After obtaining results from each camera view, we average over all of them to compute our final metrics. We use IDF1 and MOTA as our final scores.
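The per-camera procedure for subtrack (2) can be sketched as below. This is a hedged illustration rather than the official scoring code: the (x1, y1, x2, y2) box layout, the function names and the unweighted mean over cameras are assumptions. The MOTA and IDF1 formulas are the standard definitions, MOTA = 1 - (FN + FP + IDSW) / GT and IDF1 = 2 IDTP / (2 IDTP + IDFP + IDFN).

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2); this layout is an assumption."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def can_match(det_box, gt_box, threshold=0.5):
    """A detection may be matched to a ground truth box only if IoU > 0.5."""
    return iou(det_box, gt_box) > threshold

def mota(fp, fn, id_switches, num_gt):
    """Standard MOTA: 1 - (FN + FP + ID switches) / number of ground truth boxes."""
    if num_gt == 0:
        return float("nan")
    return 1.0 - (fp + fn + id_switches) / num_gt

def idf1(idtp, idfp, idfn):
    """Standard IDF1 computed from identity-level TP, FP and FN counts."""
    return 2 * idtp / (2 * idtp + idfp + idfn)

def average_over_cameras(per_camera):
    """Unweighted mean of each metric over camera views (assumed aggregation)."""
    names = next(iter(per_camera.values())).keys()
    return {m: sum(c[m] for c in per_camera.values()) / len(per_camera) for m in names}

# Hypothetical per-camera counts, for illustration only.
per_camera = {
    "cam1": {"mota": mota(fp=2, fn=3, id_switches=1, num_gt=20), "idf1": idf1(8, 2, 2)},
    "cam2": {"mota": mota(fp=5, fn=4, id_switches=1, num_gt=20), "idf1": idf1(6, 4, 4)},
}
print(average_over_cameras(per_camera))
```

Whether the final score is an unweighted mean of per-camera metrics or is weighted by the number of ground truth boxes per camera is not stated in the text, so the aggregation here should be read as one plausible interpretation.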

Our evaluation code is based on py-motmetrics and can be downloaded by participants for local use. We use CodaLab as our evaluation server.


  • July 18th: Training and validation data available
  • July 25th: Testing phase begins
  • Sep 30th: Competition ends (challenge paper submission - optional)

Invited Speakers

Dr. Xin Wang

Dr. Xin Wang is a Ph.D. student at UC Berkeley, working with Prof. Trevor Darrell and Prof. Joseph E. Gonzalez. She is part of the BAIR Lab, RISE Lab, and BDD Lab. Her research interests lie at the intersection of computer vision, machine learning, and learning systems.

Prof. Haibin Ling

SUNY Empire Innovation Professor at Stony Brook University. His research interests include computer vision, augmented reality, medical image analysis, visual privacy protection, and human-computer interaction. He received the Best Student Paper Award at ACM UIST in 2003 and the NSF CAREER Award in 2014. He serves as an associate editor for IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), Pattern Recognition (PR), and Computer Vision and Image Understanding (CVIU). He has served as an area chair multiple times for CVPR and ECCV.

Prof. Mubarak Shah

Trustee Chair Professor of Computer Science and founding director of the Center for Research in Computer Vision at UCF. His research interests include video surveillance, visual tracking, human activity recognition, visual analysis of crowded scenes, video registration, and UAV video analysis. Dr. Shah is a fellow of the National Academy of Inventors, IEEE, AAAS, IAPR and SPIE. In 2006, he was awarded the Pegasus Professor award, the highest award at UCF.

Advisory Committee

Dr. Tatjana Chavdarova

Postdoctoral researcher in the Machine Learning and Optimization (MLO) lab at EPFL.

Prof. Haibin Ling

SUNY Empire Innovation Professor, Dept of Computer Science, Stony Brook University.

Prof. Ying Wu

Full professor in the Department of Electrical Engineering and Computer Science and the Department of Computer Science at Northwestern University.

Prof. Jiebo Luo

Full Professor in the Department of Computer Science, University of Rochester.

Prof. Ming-Hsuan Yang

Professor in Electrical Engineering and Computer Science at University of California, Merced.

Prof. Mubarak Shah

UCF Trustee Chair Professor, Director.


Xiaotian Han, Microsoft

Quanzeng You, Microsoft

Peng Chu, Microsoft

Will Boyd, Microsoft

Jia Li, DawnLight

Houdong Hu, Microsoft

Jiang Wang, Microsoft

Zicheng Liu, Microsoft
