Mobility datasets

Accio does not yet offer integrated dataset support. For convenience, we list here some well-known mobility datasets that have been used in numerous papers. Several initiatives have publicly released datasets from real-life data collection campaigns; they are summarized in the table below.

Dataset      Location                     Time span  #users   #events
Cabspotting  San Francisco, USA           1 month    536      11 million
Geolife      Beijing, China               5.5 years  178      25 million
MDC          Geneva region, Switzerland   3 years    185      11 million
T-Drive      Beijing, China               1 week     10,357   15 million
Brightkite   World                        1.5 years  58,228   4 million
Gowalla      World                        1.5 years  196,591  6 million
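
The figures in the table can be combined to compare how dense each dataset is. The following is a minimal sketch that divides the number of events by the number of users; the numbers are copied from the table (event counts are rounded to the million, so the results are only indicative), and the names `DATASETS` and `events_per_user` are illustrative, not part of Accio.

```python
# (users, events) pairs copied from the summary table.
# Event counts are rounded to the million, so results are approximate.
DATASETS = {
    "Cabspotting": (536, 11_000_000),
    "Geolife": (178, 25_000_000),
    "MDC": (185, 11_000_000),
    "T-Drive": (10_357, 15_000_000),
    "Brightkite": (58_228, 4_000_000),
    "Gowalla": (196_591, 6_000_000),
}


def events_per_user(users: int, events: int) -> float:
    """Average number of events recorded per user."""
    return events / users


density = {name: events_per_user(u, e) for name, (u, e) in DATASETS.items()}

# Print datasets from the densest to the sparsest.
for name, value in sorted(density.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {value:>10.0f} events per user")
```

With these figures, the GPS-based datasets average thousands of events per user or more, while the two check-in datasets stay below a hundred.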

Cabspotting

Download the dataset (free registration required).

The Cabspotting dataset contains GPS traces of taxi cabs in San Francisco (USA), collected in May 2008.

Geolife

Download the dataset and its user guide.

The Geolife dataset gathers GPS trajectories collected by Microsoft Research from April 2007 to August 2012 in Beijing (China). The large majority of traces were recorded at a high sampling rate, around one event every 1 to 5 seconds.
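
To check the sampling rate of a given trajectory, one can compute the median time gap between consecutive events. The sketch below is not part of Accio and assumes a trajectory file layout that should be verified against the user guide: six header lines, followed by comma-separated records whose first two fields are latitude and longitude and whose last two fields are a date (`YYYY-MM-DD`) and a time (`HH:MM:SS`). The function names and the sample records are made up for illustration.

```python
from datetime import datetime
from statistics import median

HEADER_LINES = 6  # assumed number of header lines, check the user guide


def parse_trajectory(text: str) -> list[tuple[datetime, float, float]]:
    """Return (timestamp, latitude, longitude) tuples from a trajectory file's content."""
    events = []
    for line in text.splitlines()[HEADER_LINES:]:
        fields = line.strip().split(",")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        lat, lng = float(fields[0]), float(fields[1])
        # Assumed layout: the 6th and 7th fields hold the date and the time.
        timestamp = datetime.strptime(f"{fields[5]} {fields[6]}", "%Y-%m-%d %H:%M:%S")
        events.append((timestamp, lat, lng))
    return events


def median_interval_seconds(events) -> float:
    """Median gap, in seconds, between consecutive events of one trajectory."""
    times = sorted(t for t, _, _ in events)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    if not gaps:
        raise ValueError("at least two events are needed to compute an interval")
    return median(gaps)


# Made-up sample: six placeholder header lines, then three records.
SAMPLE = "\n".join(
    ["header"] * HEADER_LINES
    + [
        "39.9066,116.3855,0,492,40097.5864,2009-10-11,14:04:30",
        "39.9067,116.3856,0,492,40097.5865,2009-10-11,14:04:32",
        "39.9068,116.3857,0,492,40097.5865,2009-10-11,14:04:37",
    ]
)

sample_events = parse_trajectory(SAMPLE)
sample_interval = median_interval_seconds(sample_events)
print(sample_interval)
```

The median is used rather than the mean so that a few long pauses in a recording do not dominate the estimate.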

Mobile Data Challenge (MDC)

Register to get access to the dataset (universities and non-profits are eligible).

The MDC dataset involves 182 volunteers in the Lake Geneva region (Switzerland), equipped with smartphones running data collection software between 2009 and 2011. It includes not only locations from the GPS sensor, but also data from various other sensors (e.g., accelerometer, battery). A privacy protection scheme based on k-anonymity was applied to the raw data before the dataset was released. This privacy-preserving step involved many manual operations, which affect the outcome of location privacy protection mechanisms (LPPMs) in ways that are difficult to fully understand.

T-Drive

Download a sample of the dataset (part 6, part 7, part 8, part 9, part 10, part 11, part 12, part 13, part 14) and its user guide.

T-Drive is another dataset collected in Beijing, featuring taxi drivers. It covers a large number of users (more than 10,000) over a very short period (one week). The dataset was collected by Microsoft Research, and only a sample of it has been released.

Brightkite

Download the dataset.

Brightkite is a dataset of “check-ins” left by users of the social network of the same name. Such a dataset is sparser than full mobility traces, because it only contains the places at which users deliberately checked in. In addition to check-in locations, it also includes friendship relationships between users.

Gowalla

Download the dataset.

Gowalla is a dataset of “check-ins” left by users of the social network of the same name. Like Brightkite, in addition to check-in locations, it also includes friendship relationships between users.
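
As an illustration of how check-in data can be loaded, here is a minimal sketch that parses check-in records and counts them per user. It is not part of Accio and rests on an assumption about the file layout that should be checked against the downloaded files: one check-in per line, tab-separated, with the fields user, UTC time (`YYYY-MM-DDTHH:MM:SSZ`), latitude, longitude and location identifier. The `CheckIn` type, the function names and the sample records are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import NamedTuple


class CheckIn(NamedTuple):
    user: str
    time: datetime
    latitude: float
    longitude: float
    location_id: str


def parse_checkins(text: str) -> list[CheckIn]:
    """Parse tab-separated check-in lines (assumed layout, see above)."""
    checkins = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        user, raw_time, lat, lng, location_id = fields
        time = datetime.strptime(raw_time, "%Y-%m-%dT%H:%M:%SZ").replace(
            tzinfo=timezone.utc
        )
        checkins.append(CheckIn(user, time, float(lat), float(lng), location_id))
    return checkins


def checkins_per_user(checkins: list[CheckIn]) -> Counter:
    """Number of check-ins left by each user."""
    return Counter(c.user for c in checkins)


# Made-up sample records, for illustration only.
SAMPLE = (
    "0\t2010-10-17T01:48:53Z\t39.7477\t-104.9926\tplace-a\n"
    "0\t2010-10-16T06:02:04Z\t39.8911\t-105.0708\tplace-b\n"
    "1\t2010-10-12T23:58:03Z\t37.6302\t-122.4114\tplace-c\n"
)

sample_checkins = parse_checkins(SAMPLE)
sample_counts = checkins_per_user(sample_checkins)
print(sample_counts.most_common())
```

The location identifier is kept as a string so that the same parser works whether identifiers are numeric or not.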