These are example dialogue video segments extracted from a movie file in the corpus.
The full original movie is "Deep Red" (1975).
Please make sure the file contains 1722 rows.
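A quick way to check the row count is sketched below. It assumes the file in question is a comma-separated list such as movlist_part1.csv, that blank lines do not count as rows, and that you decide yourself whether a header line should be excluded. The helper name count_rows is hypothetical and not part of the distributed scripts.

```python
import csv


def count_rows(path, has_header=False, encoding="utf-8"):
    """Count non-empty CSV records in a file (hypothetical helper).

    Assumptions: the file is comma-separated, blank lines are not rows,
    and one header line is excluded only when has_header=True.
    """
    with open(path, newline="", encoding=encoding) as f:
        # csv.reader yields one list per record; blank lines yield [].
        n = sum(1 for record in csv.reader(f) if record)
    return max(n - 1, 0) if has_header else n
```

For example, `count_rows("movlist_part1.csv") == 1722` should be true if the file is complete. Using csv.reader rather than counting physical lines means a quoted field that contains a line break is still counted as a single row.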
You need Python and wget installed, as well as the Python module BeautifulSoup4.
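Before starting a long download, you might confirm that these prerequisites are available. The sketch below is an illustration, not part of the distributed scripts: it looks for wget on the PATH and for the bs4 package (the import name of BeautifulSoup4). The function name missing_prerequisites is hypothetical.

```python
import importlib.util
import shutil


def missing_prerequisites():
    """Return the names of required tools that could not be found (hypothetical helper)."""
    missing = []
    # wget must be callable from the command line.
    if shutil.which("wget") is None:
        missing.append("wget")
    # BeautifulSoup4 is installed as the package "bs4".
    if importlib.util.find_spec("bs4") is None:
        missing.append("beautifulsoup4")
    return missing
```

An empty list means both wget and BeautifulSoup4 were found by the Python interpreter you ran it with.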
After downloading the files above, create a download directory (named Dir, for example) and run the following from the command line:
download_MovieDS.py movlist_part1.csv Dir
Note that the original video data is large (764.1 GB) and the download takes time (three days in our case). Please do not run the download script over a mobile network or a shared network, such as those at conference venues or hotels. Movie files are in MP4 format; please check that none of the downloaded files has a size of zero. The program creates a "miss_downloads.log" file in the current directory. If the download completes without any problems, this file should be empty. The script above uses wget with the "--no-check-certificate" option, which may be removed depending on your environment.
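The two post-download checks described above (no zero-size MP4 files, and an empty miss_downloads.log) can be automated roughly as follows. This is a sketch under assumptions: the downloaded files end in ".mp4" and may sit in subdirectories of the download directory, and the log holds one entry per line. The helper names find_empty_videos and logged_failures are hypothetical.

```python
from pathlib import Path


def find_empty_videos(download_dir):
    """Return MP4 files under download_dir (searched recursively) that are zero bytes."""
    return sorted(p for p in Path(download_dir).rglob("*.mp4") if p.stat().st_size == 0)


def logged_failures(log_path="miss_downloads.log"):
    """Return the non-blank lines of the missed-downloads log.

    Assumes one entry per line; an empty result means no failures were logged.
    Raises FileNotFoundError if the log does not exist (e.g. the script was not run here).
    """
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, after `download_MovieDS.py movlist_part1.csv Dir` finishes, both `find_empty_videos("Dir")` and `logged_failures()` (run from the same working directory) should return empty lists. Any files listed by either check are candidates for re-downloading.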
The segmentation was done automatically, so the data contains some errors. Overall accuracy is about 90%, but accuracy is lower for some movie genres (music and musical). Please see the poster files below, which were presented at ICMI. The first page contains a table of error rates estimated by sampling.
Inquiries and comments are welcome. Please email me (m.inoue-at-acm.org).