Mobility datasets
Accio does not currently provide integrated dataset support. For convenience, we list here well-known mobility datasets that have been used in numerous papers. Several initiatives have publicly released datasets coming from real-life data collections; the table below summarizes them.
Dataset | Location | Time span | #users | #events |
---|---|---|---|---|
Cabspotting | San Francisco, USA | 1 month | 536 | 11 million |
Geolife | Beijing, China | 5.5 years | 178 | 25 million |
MDC | Geneva region, Switzerland | 3 years | 185 | 11 million |
T-Drive | Beijing, China | 1 week | 10,357 | 15 million |
Brightkite | World | 1.5 years | 58,228 | 4 million |
Gowalla | World | 1.5 years | 196,591 | 6 million |
Cabspotting
Download the dataset (free registration required).
The Cabspotting dataset contains GPS traces of taxi cabs in San Francisco (USA), collected in May 2008.
Geolife
Download the dataset and its user guide.
The Geolife dataset, collected by Microsoft Research, gathers GPS trajectories recorded from April 2007 to August 2012 in Beijing (China). The large majority of traces were collected at a high sampling rate, around one event every 1 to 5 seconds.
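To check a sampling rate like the one quoted above on your own traces, you can compute the median gap between consecutive events. The following is a minimal sketch: the function name `median_sampling_interval` and the sample timestamps are made up for illustration, and it assumes you have already parsed a trace into a list of `datetime` objects (it does not read the Geolife files themselves).

```python
from datetime import datetime
from statistics import median


def median_sampling_interval(timestamps):
    """Return the median gap, in seconds, between consecutive timestamps."""
    ordered = sorted(timestamps)
    # Pair each timestamp with its successor and measure the gap.
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return median(gaps)


# Illustrative timestamps only, not taken from the dataset.
trace = [
    datetime(2008, 10, 23, 2, 53, 4),
    datetime(2008, 10, 23, 2, 53, 10),
    datetime(2008, 10, 23, 2, 53, 15),
    datetime(2008, 10, 23, 2, 53, 20),
]
print(median_sampling_interval(trace))  # 5.0
```

The median is used rather than the mean so that long pauses between trips do not dominate the estimate.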
Mobile Data Challenge (MDC)
Register to get access to the dataset (universities and non-profits are eligible).
The MDC dataset involves 182 volunteers equipped with smartphones running data collection software in the Lake Geneva region (Switzerland); the data was collected between 2009 and 2011. It includes not only locations coming from the GPS sensor, but also data from various other sensors (e.g., accelerometer, battery). A privacy protection scheme based on k-anonymity was applied to the raw data before the MDC dataset was released. This privacy-preserving step involved many manual operations, which affect the outcome of LPPMs in ways that are difficult to fully understand.
T-Drive
Download a sample of the dataset (part 6, part 7, part 8, part 9, part 10, part 11, part 12, part 13, part 14) and its user guide.
T-Drive is another dataset collected in Beijing, featuring taxi drivers. It covers a large number of users (more than 10,000) over a very short period of time (one week). The dataset was collected by Microsoft Research, and only a sample of it has been released.
Brightkite
Download the dataset.
Brightkite is a dataset of “check-ins” left by users of the social network of the same name. Such a dataset is sparser than full mobility datasets, because it only contains the places at which users deliberately checked in. In addition to check-in locations, it also comes with the friendship relationships between users.
Gowalla
Download the dataset.
Gowalla is a dataset of “check-ins” left by users of the social network of the same name. Like Brightkite, in addition to check-in locations, it also comes with the friendship relationships between users.
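Because Accio has no integrated dataset support, check-in files such as Brightkite and Gowalla have to be turned into per-user traces by hand. The sketch below shows one way to do it. It assumes each line holds five tab-separated fields (user, UTC check-in time in ISO 8601, latitude, longitude, location id); verify this against the files you download. The names `CheckIn`, `parse_checkin` and `group_by_user` are hypothetical, and the sample lines are invented.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import NamedTuple


class CheckIn(NamedTuple):
    user: str
    time: datetime
    lat: float
    lng: float
    place: str


def parse_checkin(line):
    """Parse one line, assuming: user, time, latitude, longitude, place id."""
    user, time, lat, lng, place = line.rstrip("\n").split("\t")
    when = datetime.strptime(time, "%Y-%m-%dT%H:%M:%SZ")
    return CheckIn(user, when.replace(tzinfo=timezone.utc),
                   float(lat), float(lng), place)


def group_by_user(lines):
    """Build one chronologically sorted list of check-ins per user."""
    traces = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            checkin = parse_checkin(line)
            traces[checkin.user].append(checkin)
    for checkins in traces.values():
        checkins.sort(key=lambda c: c.time)
    return dict(traces)


# Invented sample lines following the assumed layout.
sample = [
    "1\t2010-05-02T10:00:00Z\t48.8566\t2.3522\tplace-a\n",
    "1\t2010-05-01T09:30:00Z\t45.7640\t4.8357\tplace-b\n",
    "2\t2010-05-03T18:15:00Z\t46.2044\t6.1432\tplace-c\n",
]
traces = group_by_user(sample)
print({user: len(checkins) for user, checkins in traces.items()})
```

In practice you would pass an open file object instead of `sample`, since `group_by_user` only needs an iterable of lines.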