ICCV 2021 Multi-camera Multiple People Tracking Workshop

We propose to organize this Multi-camera Multiple People Tracking workshop to bring the academic and industry communities together to tackle indoor multiple people tracking using multiple RGB cameras.


Multiple object tracking is one of the most fundamental tasks in computer vision and a core research topic in understanding visual content. It has numerous applications in indoor navigation, motion capture, human-computer interaction, robotics, etc.

For single-camera tracking, there are several datasets and benchmarks, which have stimulated novel ideas for tracking models that use sequential information. For multi-camera tracking, data collection and labeling are much more difficult. As a result, only a few small datasets are available and there are no common benchmarks.

To support the community in developing more efficient and novel tracking algorithms, we have constructed a multi-camera multiple people tracking dataset. We expect that our dataset can also serve as an evaluation benchmark for this task. The dataset will be released to the research community, and we invite academic and industrial research teams to participate in this multi-camera multiple people tracking challenge.

Multi-camera Multiple People Tracking (MMP-Tracking) Challenge

The MMP-Tracking Challenge aims to push forward the state of the art in multi-camera multiple people tracking algorithms. Participants will receive an annotated training set and a test set without annotations.

There are two subtracks in this challenge: (1) evaluating tracking results from the topdown view; (2) evaluating tracking results from each camera view, then aggregating them into the final metric.

For subtrack (1), using the provided camera calibration files, one can map the ground plane in each camera view into a world coordinate system shared by all cameras. The world coordinates are then discretized with a voxel size of 20mm to obtain the topdown view map. We provide ground truth labels of person footpoint coordinates in the topdown view map. During evaluation, we compute false positives (FP), false negatives (FN) and true positives (TP) by assigning detected person footpoints to ground truth footpoints using Hungarian matching. A detected footpoint can be assigned to a ground truth footpoint only if they are less than 0.5m apart (a 25-pixel distance in the topdown map). After obtaining FP, FN and TP, we compute IDF1 and MOTA as our final scores.
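The subtrack (1) matching step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the official evaluation code: the function names, the map origin, and the large padding cost are hypothetical, and the assignment solver is a textbook Hungarian implementation rather than the one used by py-motmetrics. It handles a single frame and ignores identities, so it yields TP, FP and FN but not IDF1 or MOTA.

```python
import math

VOXEL_MM = 20      # topdown cell size stated in the text
MAX_DIST_PX = 25   # 0.5 m / 20 mm, the matching gate stated in the text
BIG = 1e6          # cost for forbidden or padded pairs (illustrative choice)


def world_to_topdown(x_mm, y_mm, origin_mm=(0.0, 0.0)):
    """Map a ground-plane point in millimetres to a topdown cell index.
    The origin is an assumption; the real one comes from the calibration."""
    return (int((x_mm - origin_mm[0]) // VOXEL_MM),
            int((y_mm - origin_mm[1]) // VOXEL_MM))


def solve_assignment(cost):
    """Hungarian algorithm for a square cost matrix.
    Returns a list where entry i is the column assigned to row i."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (n + 1)   # column potentials
    p = [0] * (n + 1)     # p[j] = row (1-based) currently matched to column j
    way = [0] * (n + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:    # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    col_of_row = [0] * n
    for j in range(1, n + 1):
        col_of_row[p[j] - 1] = j - 1
    return col_of_row


def match_footpoints(gt, det):
    """Return (TP, FP, FN) for one frame of topdown footpoints in pixels."""
    n = max(len(gt), len(det))
    if n == 0:
        return 0, 0, 0
    # Pad to a square matrix; pairs outside the gate keep the BIG cost.
    cost = [[BIG] * n for _ in range(n)]
    for i, g in enumerate(gt):
        for j, d in enumerate(det):
            dist = math.dist(g, d)
            if dist < MAX_DIST_PX:
                cost[i][j] = dist
    cols = solve_assignment(cost)
    tp = sum(1 for i, j in enumerate(cols) if cost[i][j] < BIG)
    return tp, len(det) - tp, len(gt) - tp
```

For example, with ground truth footpoints at (50, 25) and (100, 100) and detections at (52, 25), (300, 300) and (101, 110), `match_footpoints` returns 2 true positives, 1 false positive and 0 false negatives. Giving forbidden pairs a very large cost makes the solver prefer as many in-gate matches as possible before minimizing total distance.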

For subtrack (2), following the evaluation procedure widely used in MOT, we compute FP, FN and TP for each camera view independently (a detected bounding box and a ground truth bounding box can be matched only if their IoU > 0.5). After obtaining results from each camera view, we average over all of them to compute our final metrics. We use IDF1 and MOTA as our final scores.
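The matching gate and the per-camera aggregation for subtrack (2) can be illustrated as follows. This is a hedged sketch, not the official evaluation code: the function names and the (x1, y1, x2, y2) box format are assumptions, MOTA follows its standard definition (which also counts identity switches), and the aggregation is assumed to be an unweighted mean of per-camera scores.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def can_match(gt_box, det_box, threshold=0.5):
    """A detection may be matched to a ground truth box only if IoU > 0.5."""
    return iou(gt_box, det_box) > threshold


def mota(fp, fn, id_switches, num_gt):
    """Standard MOTA: 1 - (FP + FN + IDSW) / number of ground truth boxes."""
    return 1.0 - (fp + fn + id_switches) / num_gt


def average_over_cameras(per_camera_scores):
    """Unweighted mean of per-camera scores (assumed aggregation rule)."""
    return sum(per_camera_scores) / len(per_camera_scores)
```

A box shifted by half its width against an identical box has an IoU of 1/3 and is therefore rejected by `can_match`, while a shift of one fifth of the width gives an IoU of 2/3 and is accepted.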

Our evaluation code is based on py-motmetrics. Participants can download the evaluation code for local use. We use CodaLab as our evaluation server.


  • Track 1 - Topdown View (Rank, Team):
    1st: Alibaba Group (Fei Du, Jiasheng Tang, et al.)
    2nd: Hikvision Research Institute (Jie Zhang, et al.)
    3rd: University of Science and Technology of China (Size Wu)
    4th: Fujitsu Research (Shoichi Masui, Fan Yang, et al.)
  • Track 2 - Camera View (Rank, Team):
    1st: Hikvision Research Institute (Jie Zhang, et al.)
    2nd: Wyze Labs, Inc., AI Team (Hung-Min Hsu, et al.)
    3rd: Information Processing Lab, University of Washington (Cheng-Yen Yang, et al.)


  • July 18th: Training and validation data available
  • July 25th: Testing phase begins
  • Sep 30th: Competition ends (challenge paper submission - optional)

Time (EDT), Oct 16th | Session | Note
10:30am ~ 10:35am | Opening Remark | Dr. Zicheng Liu
10:35am ~ 11:20am | Keynote: Multi-Object and Multi-Camera Tracking | Prof. Mubarak Shah
11:20am ~ 11:35am | Dataset Introduction and Winner Announcement |
11:35am ~ 12:05pm | Special Session: Ethics, privacy and diversity in people tracking | Host: Dr. Zicheng Liu; Panelists: Dr. Houdong Hu, Prof. Ming-Hsuan Yang, Prof. Haibin Ling
12:05pm ~ 12:10pm | Break |
12:10pm ~ 12:55pm | Keynote: Towards robust cross-domain object detection | Dr. Xin Wang
12:55pm ~ 1:10pm | Winner Talk: From Camera-view Tracklets to Topdown-view Trajectories: Information Aggregation at Different Levels | Fei Du, Alibaba Group
1:10pm ~ 1:25pm | Winner Talk | Size Wu, University of Science and Technology of China
1:25pm ~ 1:40pm | Winner Talk: Head-based Multi-camera Multiple People 3D Tracking | Fan Yang, Fujitsu Research
1:40pm ~ 1:45pm | Break |
1:45pm ~ 2:00pm | Winner Talk: Multi-camera Multiple People Tracking | Jie Zhang, Hikvision Research Institute
2:00pm ~ 2:15pm | Winner Talk: Multi-Expert Multiple Object Tracking (ME-MOT) | Hung-Min Hsu, Wyze Labs, Inc. AI Team
2:15pm ~ 2:30pm | Winner Talk | Cheng-Yen Yang, Information Processing Lab, University of Washington
2:30pm ~ 3:15pm | Keynote: Benchmarking and Diagnosis of Visual Tracking Algorithms | Prof. Haibin Ling

Workshop Meeting Recordings

Invited Speakers

Dr. Xin Wang

Dr. Xin Wang is a Ph.D. student at UC Berkeley, working with Prof. Trevor Darrell and Prof. Joseph E. Gonzalez. She is part of the BAIR Lab, RISE Lab, and BDD Lab. Her research interest lies at the intersection of computer vision, machine learning and learning systems.

Prof. Haibin Ling

SUNY Empire Innovation Professor at Stony Brook University. His research interests include computer vision, augmented reality, medical image analysis, visual privacy protection, and human-computer interaction. He received the Best Student Paper Award at ACM UIST in 2003 and the NSF CAREER Award in 2014. He serves as an associate editor for IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), Pattern Recognition (PR), and Computer Vision and Image Understanding (CVIU). He has served as an Area Chair for CVPR and ECCV multiple times.

Prof. Mubarak Shah

Prof. Shah, Trustee Chair Professor of Computer Science, is the founding director of the Center for Research in Computer Vision at UCF. His research interests include video surveillance, visual tracking, human activity recognition, visual analysis of crowded scenes, video registration, UAV video analysis, etc. Dr. Shah is a fellow of the National Academy of Inventors, IEEE, AAAS, IAPR and SPIE. In 2006, he was awarded a Pegasus Professor award, the highest award at UCF.

Advisory Committee

Dr. Tatjana Chavdarova

Postdoctoral researcher in the Machine Learning and Optimization (MLO) lab at EPFL.

Prof. Haibin Ling

SUNY Empire Innovation Professor, Dept of Computer Science, Stony Brook University.

Prof. Ying Wu

Full professor in the Department of Electrical Engineering and Computer Science and the Department of Computer Science at Northwestern University.

Prof. Jiebo Luo

Full Professor in the Department of Computer Science, University of Rochester.

Prof. Ming-Hsuan Yang

Professor in Electrical Engineering and Computer Science at University of California, Merced.

Prof. Mubarak Shah

UCF Trustee Chair Professor, Director of the Center for Research in Computer Vision.


Xiaotian Han, Microsoft

Quanzeng You, Microsoft

Peng Chu, Microsoft

Will Boyd, Microsoft

Jia Li, DawnLight

Houdong Hu, Microsoft

Jiang Wang, Microsoft

Zicheng Liu, Microsoft
