Activity knowledge discovery: Modelling and unfolding activity space and types of social media and mobile users

Dec 6, 2022 | By Qunying Huang

Digital footprints, or trajectory datasets collected from social media platforms, are typically recorded as sequences of timestamped locations that represent individual trajectories. Using them to study human movement patterns presents two major challenges: (1) digital footprints record not only regular activities at sparse and irregular time intervals, but also random movements over space and time, so individual travel patterns are not immediately visible or identifiable; and (2) digital footprints carry no semantic information describing people’s activities, such as the purpose of a visit to a location (e.g., going to work) or the context of the location (e.g., office), which is paramount for interpreting these data. Our lab has therefore developed various methods and models to address these two challenges.

RA#1. Modeling and visualizing regular human mobility patterns with uncertainty: An example using Twitter data

Traditional space–time (S-T) paths show the spatiotemporal trajectories of individuals over one to several days. Based on data for such short periods, these paths might not show regular activity patterns, which are pertinent to various types of planning and policy analysis. Travel data gathered over longer periods from social media might capture regular activity patterns, but the footprints in these data also include irregular activities, which introduce noise and uncertainty. We therefore introduced a methodology to construct individual representative space–time paths using various aggregation and spatiotemporal clustering techniques. To depict and visualize spatiotemporal trajectories with uncertain information, we proposed space–time cones of variable sizes that reflect the spatial precision of the paths, with colors on the cones representing the confidence level.
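As a rough illustration of the aggregation idea, the sketch below bins an individual's footprints by hour of day and summarizes each bin as one representative point with a spread and a support value. It is a simplification under stated assumptions, not the method from the paper: the function name `representative_path`, the use of a plain centroid instead of spatiotemporal clustering, the mean distance to the centroid as the cone radius, and the share of footprints per window as the confidence are all hypothetical choices made for this example.

```python
from collections import defaultdict
from math import hypot
from statistics import mean


def representative_path(footprints, window_hours=1):
    """Aggregate (hour_of_day, x, y) footprints into one point per time window.

    Returns {window_start_hour: (x, y, radius, confidence)}.
    All four summary values are illustrative assumptions, not the paper's definitions.
    """
    bins = defaultdict(list)
    for hour, x, y in footprints:
        start = int(hour // window_hours) * window_hours
        bins[start].append((x, y))

    total = len(footprints)
    path = {}
    for window, pts in sorted(bins.items()):
        # Representative location: centroid of all footprints in the window.
        cx = mean(p[0] for p in pts)
        cy = mean(p[1] for p in pts)
        # Cone radius (assumed): mean distance of footprints from the centroid.
        radius = mean(hypot(px - cx, py - cy) for px, py in pts)
        # Confidence (assumed): share of all footprints that fall in this window.
        confidence = len(pts) / total
        path[window] = (cx, cy, radius, confidence)
    return path
```

In a visualization, a larger radius would translate into a wider cone at that time window, and the confidence value would drive the cone color.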

Figure 1. Comparison of traditional S-T line paths and the proposed S-T cone paths with a one-hour time window interval

For details about this work, please refer to our paper:

Huang Q., Wong D., 2015. Modeling and Visualizing Regular Human Mobility Patterns with Uncertainty: An Example Using Twitter Data. Annals of the Association of American Geographers, 105(6): 1179-1197.

RA#2. Detecting collective and individual activities with digital footprints and open source geographic data

Digital footprints collected from social media platforms are often clustered using methods such as density-based spatial clustering of applications with noise (DBSCAN) and its variants to identify daily travel activities (e.g., dwelling, working, entertainment, and eating). However, these clustering methods mostly consider only the spatial distribution of travel activity points and ignore their geographic context, so digital footprints representing different activity types are aggregated into one cluster. In addition, existing studies examine people’s travel activities at only one level, either collective (i.e., macro) or individual (i.e., micro). To address both limitations, this study utilizes geographic context information and develops a novel activity knowledge discovery framework to better detect frequent travel activities at both levels. First, we develop a multi-level spatial clustering method that aggregates the digital footprints of a group of users into collective clusters (i.e., activity zones) by inferring and integrating the underlying activities performed at each zone with OpenStreetMap (OSM) datasets, which provide the geographic context of the activity zones. Next, we introduce a location-aware clustering method that detects activity zones and associates activity types at the individual level by aggregating individual footprints based on the collective results.
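The problem of purely spatial clustering can be seen in a small sketch. The code below first runs a plain DBSCAN on footprint coordinates and then splits each spatial cluster by the activity type of the nearest point of interest, so that footprints in one dense area are separated into different activity zones. This is only a toy version under stated assumptions: it is not the multi-level or location-aware method from the paper, the helper names (`dbscan`, `nearest_poi_type`, `activity_zones`) are hypothetical, and the POI list stands in for context that would in practice be derived from OSM data.

```python
from collections import defaultdict
from math import hypot


def dbscan(points, eps, min_pts):
    """Plain DBSCAN on (x, y) tuples; returns one label per point, -1 = noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points)
                if hypot(points[i][0] - q[0], points[i][1] - q[1]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(nbrs)
    return labels


def nearest_poi_type(point, pois):
    """pois: list of (x, y, activity_type); returns the type of the closest POI."""
    return min(pois, key=lambda p: hypot(point[0] - p[0], point[1] - p[1]))[2]


def activity_zones(points, pois, eps, min_pts):
    """Group footprints by (spatial cluster, nearest POI activity type)."""
    zones = defaultdict(list)
    for pt, label in zip(points, dbscan(points, eps, min_pts)):
        if label != -1:
            zones[(label, nearest_poi_type(pt, pois))].append(pt)
    return dict(zones)
```

With footprints spread along a street that has an office at one end and a restaurant at the other, DBSCAN alone would return a single cluster, while `activity_zones` would return one "work" zone and one "eating" zone.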

Figure 2. Visualization of semantic collective activity zones within the UW-Madison campus

For details about this work, please refer to our paper:

Liu X., Huang Q., Gao S. and Xia J., 2020. Activity knowledge discovery: Detecting collective and individual activities with digital footprints and open source geographic data. Computers, Environment and Urban Systems, 85, p.101551. DOI: 10.1016/j.compenvurbsys.2020.101551.

RA#3. Inferring individual travel activities with graph neural networks

Individual daily travel activities (e.g., work, eating) are commonly identified with various machine learning models (e.g., Bayesian Network, Random Forest) to understand people’s frequent travel purposes. However, labor-intensive engineering work is often required to extract effective features. Additionally, features and models are mostly calibrated for individual trajectories with regular daily travel routines and patterns, and therefore generalize poorly to new trajectories with more irregular patterns. Moreover, most existing models cannot extract features that explicitly represent regular travel activity sequences. We therefore propose a graph-based representation of spatiotemporal trajectories and point-of-interest (POI) data for travel activity type identification, termed Gstp2Vec. Specifically, a weighted directed graph is constructed in which the nodes are regular activity areas (i.e., zones) detected by clustering individual daily travel trajectories, and the edges denote trips between pairs of zones. Statistics of trajectories (e.g., visit frequency, activity duration) and POI distributions (e.g., percentage of restaurants) at each activity zone are encoded as node features. Trip frequency, average trip duration, and average trip distance are encoded as edge weights. A series of feedforward neural networks is then trained to generate low-dimensional embeddings for activity nodes by sampling and aggregating spatiotemporal and POI features from their multihop neighborhoods. Activity type labels collected via travel surveys are used as ground truth for backpropagation.
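The graph construction and the neighborhood aggregation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the Gstp2Vec implementation: the function names `build_zone_graph` and `aggregate` are hypothetical, only one hop is aggregated, neighbors are taken to be the zones reached by outgoing trips, and there are no learned weights or training. In the actual model, the concatenated vector would be passed through trained feedforward layers.

```python
from collections import defaultdict


def build_zone_graph(trips):
    """trips: list of (origin_zone, dest_zone, duration_min, distance_km).

    Returns directed edges as
    {(origin, dest): (trip_count, mean_duration, mean_distance)}.
    """
    acc = defaultdict(lambda: [0, 0.0, 0.0])
    for origin, dest, duration, distance in trips:
        edge = acc[(origin, dest)]
        edge[0] += 1
        edge[1] += duration
        edge[2] += distance
    return {k: (n, dur / n, dist / n) for k, (n, dur, dist) in acc.items()}


def aggregate(node_features, edges):
    """One untrained aggregation step over outgoing neighbors.

    For each zone, concatenates its own features with the trip-count-weighted
    mean of the features of the zones it has trips to (assumed neighbor rule).
    """
    out = {}
    for node, feats in node_features.items():
        nbrs = [(dst, w[0]) for (src, dst), w in edges.items() if src == node]
        total = sum(count for _, count in nbrs)
        if total:
            pooled = [
                sum(node_features[dst][k] * count for dst, count in nbrs) / total
                for k in range(len(feats))
            ]
        else:
            pooled = [0.0] * len(feats)  # isolated zone: nothing to aggregate
        out[node] = list(feats) + pooled
    return out
```

Here each node feature vector might hold, for example, a visit-frequency share and a restaurant POI share for the zone; repeating the aggregation step would bring in information from zones two or more trips away.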

Experimental results with real-world GPS trajectories show that Gstp2Vec significantly reduces feature engineering effort by automatically learning feature embeddings from raw trajectories with minimal preprocessing. It not only improves model generalizability, achieving higher identification accuracy on test trajectories of individuals with diverse travel patterns, but also offers better efficiency and robustness. In particular, our identification of the most common daily travel activities (e.g., Dwelling and Work) for people with diverse travel patterns outperforms state-of-the-art classification models.

Figure 3. Graph-based representation learning for individual travel activity type identification

For details about this work, please refer to our paper:

Liu X., Wu M., Peng B., and Huang Q., 2022. Graph-based representation for identifying individual travel activities with spatiotemporal trajectories and POI data. Scientific Reports. DOI: 10.1038/s41598-022-19441-9.