Submission Procedure and Metrics

Submission Procedure

To participate in this contest, registration is required. Participants can register by sending an email to Gustavo Fernandez.

Participants have to send their results in a text file. This text file contains the necessary information (basically camera number, frame number, object id, object position, etc.) in a pre-defined format specified by the organisers. A description of this file, with some examples, will be available for download. The submission will consist of either a zip or a tar file containing:

  • Text files with the results obtained by running the algorithm on the contest datasets, and
  • a PDF file (maximum 2 pages) describing the algorithm.

The compressed file should be sent no later than May 31st, 2012 (23:59:59 UTC).

All received submissions will be evaluated using the performance metrics listed below, and the organisers will compare the results. Authors of the best results will be invited to present their work during the workshop (a 20-minute oral presentation plus 5-10 minutes of discussion). They will also be invited to write a paper for a journal special issue (Computer Journal, to be confirmed). Submitting a paper to the main ICPR conference in parallel with the contest submission is encouraged.

After the contest, the organisers will publish a summary describing the tasks, challenges, datasets, participants, methods, results, and possible future work.

Performance Metrics

The performance metrics are based on state-of-the-art metrics in the area of object tracking and multi-camera systems [1, 2, 3, 6, 7, 8]. New performance metrics extending these may also be introduced.
The ground truth is defined at frame level using a bounding box enclosing the object. The mapping between ground truth and output result will be done using an overlap measure between both bounding boxes, ground truth GT and algorithm result AR. Given a ground-truth track GTi and an algorithm result track ARj, we use the same definitions as in [7] for the spatial overlap ratio and the temporal overlap ratio (a short computational sketch of both ratios follows the definitions):

  • Spatial overlap is defined as the overlapping ratio SO(GTik , ARjk) between both tracks in a specific frame k:

     SO \left ( GT_{ik}, AR_{jk} \right ) = \frac {\left | GT_{ik} \bigcap AR_{jk} \right |}{\left | GT_{ik} \bigcup AR_{jk} \right |}

  • Temporal overlap TO(GTi , ARj) is a number indicating the frame span where an overlap between both tracks GTi and ARj occurs:

     TO \left ( GT_{i}, AR_{j} \right ) = \begin{cases} TO_l - TO_f, & TO_l > TO_f \\ 0, & TO_l \le TO_f \end{cases}


    where TOf is the maximum of the first frame indexes of both tracks GTi and ARj, and TOl is the minimum of the last frame indexes.
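
The two overlap ratios can be computed directly from bounding boxes and frame indexes. Below is a minimal sketch in Python (not the official evaluation code); it assumes each track is represented as a dictionary mapping frame index to a bounding box (x, y, w, h), which is an illustrative choice rather than the contest file format.

    def spatial_overlap(box_gt, box_ar):
        # SO(GT_ik, AR_jk): intersection area over union area of two boxes (x, y, w, h).
        x1 = max(box_gt[0], box_ar[0])
        y1 = max(box_gt[1], box_ar[1])
        x2 = min(box_gt[0] + box_gt[2], box_ar[0] + box_ar[2])
        y2 = min(box_gt[1] + box_gt[3], box_ar[1] + box_ar[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        union = box_gt[2] * box_gt[3] + box_ar[2] * box_ar[3] - inter
        return inter / union if union > 0 else 0.0

    def temporal_overlap(gt_track, ar_track):
        # TO(GT_i, AR_j): TO_l - TO_f if positive, otherwise 0, where TO_f is the
        # latest first frame and TO_l the earliest last frame of the two tracks.
        to_f = max(min(gt_track), min(ar_track))
        to_l = min(max(gt_track), max(ar_track))
        return to_l - to_f if to_l > to_f else 0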

Per Camera

  • Correct detected track (CDT) [3, 7]: This is the so-called true positive; it counts GT tracks that have been correctly detected. Following the definition given in [7], a GT track is considered correctly detected if it satisfies both of the following conditions (a computational sketch of CDT, FAT, and TDF is given after this list):
    1. Condition: The temporal overlap between the ground-truth track GTi and the result track ARj is larger than a threshold Thr:

       \frac {Length \left ( GT_i \bigcap AR_j \right )}{Length \left ( GT_i\right )} \ge Thr

    2. Condition: The result track ARj has sufficient spatial overlap with the GTi track:

        \exists j \;\text{such that}\; \frac{\sum_{k=1}^{N} SO \left ( GT_{ik}, AR_{jk} \right ) }{N} \ge Thr

  • False alarm track (FAT) [3, 7]: FAT counts the number of false positive tracks. Following the definition given in [7], a result track is considered a false alarm if it meets any of the following conditions:
    1. Condition: The temporal overlap between the result track ARj and every ground-truth track GTi is smaller than a threshold Thr:

       \frac {Length \left ( GT_{i}\bigcap AR_{j} \right )}{Length \left ( AR_{j}\right )} < Thr

    2. Condition: The result track ARj does not have sufficient spatial overlap with any GTi track, although the temporal overlap with the ground-truth track GTi is large enough:

       \forall i \; \frac{\sum_{k=1}^{N}SO \left ( GT_{ik}, AR_{jk} \right )}{N} < Thr

  • Track detection failure (TDF) [3, 7]: TDF counts the GT tracks that have not been detected. Again, following the definition given in [7], a GT track is considered not detected if it satisfies any of the following conditions:
    1. Condition: The temporal overlap between the ground-truth track GTi and the result track ARj is smaller than a threshold Thr:

       \frac {Length \left ( GT_{i}\bigcap AR_{j} \right )}{Length \left ( GT_{i} \right ) } < Thr

    2. Condition: The GTi track does not have sufficient spatial overlap with any algorithm result track ARj, although the temporal overlap between both tracks is large enough:

       \forall j \; \frac{\sum_{k=1}^{N}SO \left ( GT_{ik}, AR_{jk} \right )}{N} < Thr

  • Track fragmentation (TF, also called IDS in [1]): Given a single ground-truth track GTi, TF measures the lack of continuity in the result tracks covering it, i.e. the number of result tracks ARj that are mapped to GTi:

      TF_{i} = \sum_{j} \left [ AR_{j} \rightarrow GT_{i} \right ]

    where the bracket equals 1 if result track ARj is associated with GTi and 0 otherwise.

  • ID change (IDC) [7]: For each result track ARj, IDC counts the number of ID changes:

     IDC_{j} = \sum ID \left ( AR_{j} \right )

  • Then, the total number of ID changes for a camera is calculated as:

     IDC = \sum_{j} IDC_{j}
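
The per-camera counts CDT, FAT, and TDF follow directly from the two conditions of each definition. The sketch below is one possible reading, not the official evaluation code: it reuses the track representation and the spatial_overlap helper from the previous sketch, uses an illustrative threshold Thr = 0.5, and assumes that both conditions must hold for the same ground-truth/result pair (under that reading TDF is the complement of CDT).

    def mean_spatial_overlap(gt_track, ar_track):
        # Average SO over the N frames that both tracks have in common.
        common = set(gt_track) & set(ar_track)
        if not common:
            return 0.0
        return sum(spatial_overlap(gt_track[k], ar_track[k]) for k in common) / len(common)

    def matches(gt_track, ar_track, thr):
        # A GT/result pair "matches" when the temporal condition (shared frames /
        # GT length >= Thr) and the spatial condition (mean SO >= Thr) both hold.
        shared = len(set(gt_track) & set(ar_track))
        return shared / len(gt_track) >= thr and mean_spatial_overlap(gt_track, ar_track) >= thr

    def count_cdt_fat_tdf(gt_tracks, ar_tracks, thr=0.5):
        cdt = sum(1 for gt in gt_tracks if any(matches(gt, ar, thr) for ar in ar_tracks))
        tdf = len(gt_tracks) - cdt          # GT tracks that were never matched
        # FAT normalises the temporal condition by the result-track length.
        fat = sum(1 for ar in ar_tracks
                  if not any(len(set(gt) & set(ar)) / len(ar) >= thr
                             and mean_spatial_overlap(gt, ar) >= thr
                             for gt in gt_tracks))
        return cdt, fat, tdf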

Across Cameras

Given two cameras c1 and c2, and assuming we have GT tracks GTc1,i and GTc2,j and result tracks ARc1,p and ARc2,q, we define:

  • Crossing fragments (XFrag) [2]: Crossing fragments is the total number of times that there is a link between two ground-truth trajectories in two different cameras but the link is missing in the tracking result. Basically, XFrag reflects the number of false negatives across different cameras:

      XFrag = \left \lfloor \left ( GT_{c1, i} \rightarrow GT_{c2, j} \right ) \;\text{and}\; \left ( AR_{c1, p} \nrightarrow AR_{c2, q} \right )\right \rfloor

  • Crossing ID-switches (XIDS) [2]: Crossing ID-switches is the total number of times that there is no link between two ground-truth trajectories in two different cameras but the link exists in the tracking result. XIDS is the total number of false positives across different cameras:

      XIDS = \left \lfloor \left ( GT_{c1, i} \nrightarrow GT_{c2, j} \right ) \;\text{and}\; \left ( AR_{c1, p} \rightarrow AR_{c2, q} \right )\right \rfloor

  • Trajectory length (TL): Percentage of the complete trajectory that was correctly tracked (a short sketch of the cross-camera measures follows this list):

     TL = \frac {\sum_k Length_k \left ( GT_i \bigcap AR_j \right )}{\sum_k Length_k \left( GT_i \right )} \cdot 100.00


    where the index k runs across all cameras.
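
The cross-camera measures can be scripted in a similar spirit. The sketch below is illustrative only: it assumes the cross-camera associations (links) are available as sets of track-id pairs, and that per-camera tracks use the frame-to-box dictionaries of the earlier sketches; neither assumption reflects the official submission format.

    def cross_camera_errors(gt_links, ar_links):
        # Links are pairs ((camera1, track_id1), (camera2, track_id2)).
        xfrag = len(gt_links - ar_links)   # GT links missing from the result
        xids = len(ar_links - gt_links)    # result links absent from the GT
        return xfrag, xids

    def trajectory_length(gt_by_cam, ar_by_cam):
        # TL: percentage of the ground-truth trajectory, summed over all
        # cameras k, that is covered by the associated result tracks.
        matched = sum(len(set(gt_by_cam[cam]) & set(ar_by_cam.get(cam, {})))
                      for cam in gt_by_cam)
        total = sum(len(track) for track in gt_by_cam.values())
        return 100.0 * matched / total if total else 0.0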

Computational Complexity

  • CPU times per frame (CPUTF): Average computing time needed to process each frame (a short sketch of the four timing metrics follows this list):

     CPUTF = \frac {Total \; computing \; time}{\# \; frames}

  • Variance of CPU times per frame (VCPUTF):

     VCPUTF = Variance \left ( CPUTF \right )

  • CPU times per object per frame (CPUTOF): Average computing time necessary to process each object of each frame:

     CPUTOF = \frac {Total \; computing \; time}{\sum_f \# \; Objects \; per \; frame \; f}

  • Variance of CPU times per object per frame (VCPUTOF):

     VCPUTOF = Variance \left ( CPUTOF \right )
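
As a rough illustration of the timing measures, the sketch below computes CPUTF, CPUTOF and their variances from lists of per-frame computing times and per-frame object counts. It reads the two variance metrics as the variance of the per-frame (respectively per-object) computing times, which is one interpretation of the definitions above.

    from statistics import mean, pvariance

    def timing_metrics(frame_times, objects_per_frame):
        # frame_times[f]: computing time for frame f (seconds)
        # objects_per_frame[f]: number of objects processed in frame f
        cputf = mean(frame_times)                            # CPUTF
        vcputf = pvariance(frame_times)                      # VCPUTF
        cputof = sum(frame_times) / sum(objects_per_frame)   # CPUTOF
        per_object = [t / n for t, n in zip(frame_times, objects_per_frame) if n > 0]
        vcputof = pvariance(per_object)                      # VCPUTOF
        return cputf, vcputf, cputof, vcputof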

Topology

  • Percentage of correct topology (TOPO): Percentage of the camera network topology that is detected correctly. This measure is still to be defined, based on concepts from graph theory and networks [4, 5].

References

[1] Baumann A., Boltz M., Ebling J., Koenig M., Loos H., Merkel M., Niem W., Warzelhan J., Yu J. “A review and comparison of measures for automatic Video Surveillance Systems”. EURASIP Journal on Image and Video Processing, Volume 2008, Article ID 824726, June 2008.
[2] Kuo C. H., Huang C., Nevatia R. “Inter‐camera Association of Multi‐target Tracks by On‐Line Learned Appearance Affinity Models”. ECCV 2010, pp. 383–396, 2010.
[3] Porikli F., Bashir F. “A complete performance evaluation platform including matrix‐based measures for joint object detector and tracker systems”. IEEE PETS Workshop 2006, New York, USA, 2006.
[4] Strogatz S. “Exploring complex networks”. Nature, 410:268‐276, 2001.
[5] West D. “Introduction to Graph Theory”. Prentice Hall Upper Saddle River, NJ 2001.
[6] Wu B., Nevatia R. “Detection and tracking of multiple, partially occluded humans by bayesian combination of edgelet based part detectors”. IJCV, pp. 247–266, November 2007.
[7] Yin F., Makris D., Velastin S. “Performance Evaluation of Object Tracking Algorithms”. 10th IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS2007), Rio de Janeiro, Brazil, October 2007.
[8] Yin F., Makris D., Velastin S., Orwell J. “Quantitative evaluation of different aspects of motion trackers under various challenges”. Annals of the BMVA, 2010(5), The British Machine Vision Association and Society for Pattern Recognition, pp. 1–11, 2010. Available at http://www.bmva.org/annals/2010/2010-0005.pdf