ABSTRACT
We use video metadata to perform activity detection on videos in the wild, particularly the TRECVID dataset. Unlike previous activity datasets (KTH, Weizmann, UCF sports, etc.), this test set is assembled from videos captured with a wide range of cameras, resulting in videos with differing frame rates, audio/video bitrates, and resolutions. Because these measures correlate with the quality of the camera, and because different camera hardware may be used to capture different events (e.g., people likely bring nicer cameras to weddings than on fishing trips), we expect that usable correlations exist between metadata and events. Using SVM-based classification of a feature vector of metadata features, we demonstrate that such correlations do exist. While this method performs worse than traditional visual features, we demonstrate that metadata features complement such approaches via score fusion.
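The pipeline described above can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's actual experimental setup): the metadata feature names follow the abstract (frame rate, bitrates, resolution), but the data is synthetic, the visual-feature scores are placeholders, and the fusion weight is an arbitrary assumption.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic metadata feature vectors, one row per clip:
# [frame_rate, video_bitrate, audio_bitrate, width, height] (standardized).
n = 200
X_meta = rng.normal(size=(n, 5))
# Hypothetical binary event label correlated with camera quality.
y = (X_meta[:, 0] + X_meta[:, 1] > 0).astype(int)

# SVM classifier over the metadata features; probability=True enables
# calibrated scores usable for score-level fusion.
clf = SVC(probability=True).fit(X_meta[:150], y[:150])
meta_scores = clf.predict_proba(X_meta[150:])[:, 1]

# Placeholder scores from a separate visual-feature classifier
# for the same test clips (random here purely for illustration).
visual_scores = rng.uniform(size=50)

# Late (score-level) fusion: weighted average of the two score streams.
# The weight w = 0.3 is an assumed value, not from the paper.
w = 0.3
fused_scores = w * meta_scores + (1 - w) * visual_scores
```

Weighted-sum score fusion is only one possible combination rule; the point is that the metadata scores enter as an additional stream alongside visual-feature scores rather than replacing them.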