Activity Knowledge Discovery: Detecting Collective and Individual Activities with Digital Footprints and Open Source Geographic Data

Digital footprints collected from social media platforms are often clustered using methods such as density-based spatial clustering of applications with noise (DBSCAN) and its variants to identify daily travel activities (e.g., dwelling, working, entertainment, and eating). However, these clustering methods mostly consider only the spatial distribution of travel activity points while ignoring their geographic context, which results in digital footprints representing different activity types being aggregated into a single cluster. In addition, existing works examine people’s travel activities at only one level, either the collective (i.e., macro) or the individual (i.e., micro) level. To this end, this study utilizes geographic context information and develops a novel activity knowledge discovery framework to better detect frequent travel activities at both levels. First, we develop a multi-level spatial clustering method that aggregates the digital footprints of a group of users into collective clusters (i.e., activity zones) by inferring and integrating the underlying activities performed at each zone with OpenStreetMap (OSM) datasets, which inform the geographic context of the activity zones. Next, we introduce a location-aware clustering method that detects activity zones and associates activity types at the individual level by aggregating individual footprints based on the collective results. As a case study, digital footprints from 49 selected users are analyzed to evaluate the proposed framework. The results reveal that: (1) the multi-level spatial clustering method can often detect significant collective activity zones; and (2) the location-aware clustering method can aggregate individual digital footprints into activity zones more effectively than existing density-based spatial clustering methods (e.g., DBSCAN and multi-scaled DBSCAN).
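
The motivation above (purely spatial density clustering can merge footprints of different activity types into one cluster) can be illustrated with a toy sketch. It is not the authors’ multi-level or location-aware method, whose details are not given here; it simply assumes that each footprint inherits the category of its nearest OSM feature and then runs DBSCAN separately within each category. The function names (`dbscan`, `nearest_category`, `context_aware_zones`), the nearest-feature labelling rule, the projected coordinates in meters, and the `eps`/`minpts` values are all hypothetical.

```python
import math
from collections import defaultdict


def dbscan(points, eps, minpts):
    """Plain DBSCAN on planar (x, y) points in meters; returns one label per point, -1 = noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbors(i):
        # Brute-force range query; a point is included in its own neighborhood.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < minpts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= minpts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels


def nearest_category(point, osm_features):
    """Hypothetical labelling rule: take the category of the closest OSM feature."""
    return min(osm_features, key=lambda f: math.dist(point, f[0]))[1]


def context_aware_zones(footprints, osm_features, eps=100.0, minpts=3):
    """Group footprints by inferred activity category, then cluster within each group."""
    groups = defaultdict(list)
    for p in footprints:
        groups[nearest_category(p, osm_features)].append(p)
    zones = []
    for category, pts in groups.items():
        labels = dbscan(pts, eps, minpts)
        for cid in sorted(set(labels) - {-1}):
            zones.append((category, [p for p, lab in zip(pts, labels) if lab == cid]))
    return zones


# Toy data: four footprints around a home and four around a restaurant 60 m away.
sample_osm = [((0, 0), "residential"), ((60, 0), "eating")]
sample_footprints = [(0, 5), (5, 0), (-5, 0), (0, -5), (55, 0), (60, 5), (65, 0), (60, -5)]
print(dbscan(sample_footprints, 100, 3))  # geographic context ignored
print(context_aware_zones(sample_footprints, sample_osm))  # geographic context used
```

On this toy data, plain DBSCAN with eps = 100 m puts all eight footprints into a single cluster, whereas the category-aware version returns one residential zone and one eating zone with four footprints each.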

File: 2020_CEUS_Activity-Knowledge-Discovery_Detecting-Collective_and_Individual_Activities_with_Digital_Footprints_and_Open_Source_Geographic_Data.docx

Activity patterns, socioeconomic status and urban spatial structure: what can social media data tell us?

Individual activity patterns are influenced by a wide variety of factors, the more important of which include socioeconomic status (SES) and urban spatial structure. While most previous studies relied heavily on expensive travel-diary-type data, the feasibility of using social media data to support activity pattern analysis has not been evaluated. Despite the various appealing aspects of social media data, including low acquisition cost and relatively wide geographical and international coverage, these data also have many limitations, including the lack of background information about users, such as their home locations and SES. A major objective of this study is to explore the extent to which Twitter data can be used to support activity pattern analysis. We introduce an approach to determine users’ home and work locations in order to examine the activity patterns of individuals. To infer the SES of individuals, we incorporate American Community Survey (ACS) data. Using Twitter data for Washington, DC, we analyzed the activity patterns of Twitter users with different SESs. The study clearly demonstrates that while SES is highly important, urban spatial structure, particularly where jobs are mainly located and the geographical layout of the region, plays a critical role in shaping the variation in activity patterns between users from different communities.
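
The abstract does not spell out how home and work locations are determined. A common heuristic, shown here purely as a hypothetical sketch and not as the paper’s approach, assigns home to the most frequent location during night hours and work to the most frequent weekday daytime location that differs from home. The 500 m grid, the time windows, and the names `grid_cell` and `infer_home_work` are assumptions; linking the inferred home to ACS data to estimate SES is not shown.

```python
from collections import Counter
from datetime import datetime


def grid_cell(x, y, size=500.0):
    """Snap projected coordinates (meters) to a square grid cell index."""
    return (int(x // size), int(y // size))


def infer_home_work(tweets, cell_size=500.0):
    """Hypothetical heuristic: home = most frequent night-time cell,
    work = most frequent weekday 09:00-17:00 cell that differs from home.
    `tweets` is a list of (datetime, x, y) tuples in local time."""
    night, daytime = Counter(), Counter()
    for t, x, y in tweets:
        cell = grid_cell(x, y, cell_size)
        if t.hour >= 20 or t.hour < 7:
            night[cell] += 1
        elif t.weekday() < 5 and 9 <= t.hour < 17:  # Monday=0 ... Friday=4
            daytime[cell] += 1
    home = night.most_common(1)[0][0] if night else None
    work = next((c for c, _ in daytime.most_common() if c != home), None)
    return home, work


sample_tweets = [
    (datetime(2020, 3, 2, 22), 100, 100),    # Monday night
    (datetime(2020, 3, 3, 6), 120, 90),      # Tuesday early morning
    (datetime(2020, 3, 3, 10), 2600, 1100),  # Tuesday working hours
    (datetime(2020, 3, 4, 14), 2700, 1200),  # Wednesday working hours
    (datetime(2020, 3, 7, 12), 5000, 5000),  # Saturday noon: matches neither rule
]
print(infer_home_work(sample_tweets))
```

For these sample tweets the function returns cell (0, 0) as home and cell (5, 2) as work; the Saturday tweet falls outside both time windows and is ignored.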

File: Activity-patterns-socioeconomic-status-and-urban-spatial-structure-what-can-social-media-data-tell-us.docx

Exploring the Uncertainty of Activity Zone Detection Using Digital Footprints with Multi-Scaled DBSCAN

When mobility patterns are explored using digital footprints captured from social networks, the density-based spatial clustering of applications with noise (DBSCAN) method is often used to identify the activity zones that an individual regularly visits. However, DBSCAN is sensitive to its two parameters: the search radius of a cluster (eps) and the minimum number of points (minpts). This paper first discusses the uncertainty in detecting an individual’s activity zones from digital footprints. An improved density-based clustering algorithm for mobility analysis, known as Multi-Scaled DBSCAN (M-DBSCAN), is then presented to mitigate the detection uncertainty of clusters produced by DBSCAN at different scales of density and cluster size. Next, we demonstrate that M-DBSCAN iteratively calibrates suitable local eps and minpts values, instead of using one global parameter setting as DBSCAN does, to detect clusters of varying densities, and that it is very effective at detecting potential activity zones (clusters) from the historic geo-tagged tweets of selected users. In addition, M-DBSCAN can significantly reduce the noise ratio (the proportion of trajectory points not included in any cluster) by identifying all points capturing the activities performed in each zone. Using the historic geo-tagged tweets of a large number of users in Madison, Wisconsin, and Washington, D.C., the results of M-DBSCAN and of DBSCAN with a minpts value of 4 and varying eps values reveal that: 1) M-DBSCAN can capture dispersed clusters with a low density of points, thereby detecting more activity zones for each user and yielding a lower noise ratio; 2) an eps value of 40 m or higher should be used to reduce the possibility of collapsing distinctive activity zones and to ensure a relatively low noise ratio during the clustering process; and 3) an eps value between 200 m and 300 m is recommended when using DBSCAN to detect activity zones.
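
The parameter sensitivity and the noise ratio described above can be made concrete with plain DBSCAN. The sketch below does not implement M-DBSCAN, whose iterative calibration of local eps and minpts values is not specified in this abstract; it only shows how a single global eps with minpts = 4 treats a dispersed zone as noise at a small radius and recovers it at a larger one. The toy coordinates and the helper names `dbscan` and `noise_ratio` are hypothetical.

```python
import math


def dbscan(points, eps, minpts):
    """Plain DBSCAN with one global eps/minpts; returns labels, -1 = noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]
        if len(seeds) < minpts:
            labels[i] = -1  # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = seeds
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = [k for k, q in enumerate(points) if math.dist(points[j], q) <= eps]
            if len(nbrs) >= minpts:
                queue.extend(nbrs)  # expand only from core points
    return labels


def noise_ratio(labels):
    """Proportion of points not included in any cluster."""
    return sum(1 for lab in labels if lab == -1) / len(labels)


# Toy footprints (meters): a dense zone, a dispersed zone with 60 m spacing, one outlier.
points = [(0, 0), (5, 0), (0, 5), (5, 5), (3, 3),
          (1000, 0), (1060, 0), (1000, 60), (1060, 60),
          (5000, 5000)]
for eps in (20, 100):
    labels = dbscan(points, eps, minpts=4)
    n_clusters = len(set(labels) - {-1})
    print(f"eps={eps} m: {n_clusters} cluster(s), noise ratio {noise_ratio(labels):.1f}")
```

With these toy points, eps = 20 m yields one cluster and a noise ratio of 0.5, while eps = 100 m yields two clusters and a noise ratio of 0.1. The thresholds recommended in the abstract (40 m, and 200 m to 300 m) come from real tweet data and are not reproduced by this toy example.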

File: manuscript-final.docx

Mining Online Footprints to Predict User’s Next Location

Social media applications are widely deployed on mobile platforms equipped with built-in GPS tracking devices, and these devices have led to an unprecedented collection of geolocated data (geo-tags). Geo-tags, along with place names, offer new opportunities to explore the trajectories and mobility patterns of social media users. However, trajectory data captured by social media are sparse and irregularly spaced and therefore have varying degrees of resolution in both space and time. Previous studies on next location prediction are mostly applicable to predicting the upcoming location of a moving object from dense GPS trajectories, in which locations are recorded at regular time intervals (e.g., every minute). Additionally, point features are commonly used to represent visited locations, but point features cannot capture the variability of human mobility. This paper introduces a new methodology that predicts an individual’s next location from sparse footprints accumulated over a long time period on social networks and uses polygons to represent locations, corresponding to the physical activity areas of individuals. First, the DBSCAN clustering algorithm is employed to discover the most representative activity zones that an individual visits frequently on a daily basis, and a polygon-based region is then derived for each representative activity zone. A Sparse Mobility Markov Chain Model (SMMC), which considers both the movements and the online behaviors of the social media user, is trained and used to predict the user’s next location. Initial experiments with a group of Washington, DC Twitter users demonstrate that the proposed methodology successfully discovers the activity regions and predicts the user’s next location with an accuracy of up to 78.94%.
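
The prediction step rests on a Markov chain over activity zones. As a hedged illustration only, the sketch below fits a plain first-order Markov chain to daily sequences of zone IDs and predicts the most likely next zone. It omits the online-behavior component that distinguishes the SMMC, as well as the polygon derivation, and all function names and data are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(day_sequences):
    """Count zone-to-zone transitions within each day's ordered list of activity zones."""
    counts = defaultdict(Counter)
    for seq in day_sequences:
        for here, nxt in zip(seq, seq[1:]):
            counts[here][nxt] += 1
    return counts


def transition_prob(counts, here, nxt):
    """Maximum-likelihood estimate of P(next zone = nxt | current zone = here)."""
    row = counts.get(here)
    if not row:
        return 0.0
    return row[nxt] / sum(row.values())


def predict_next(counts, here):
    """Return the most frequent successor of `here`, or None if it was never observed."""
    row = counts.get(here)
    return row.most_common(1)[0][0] if row else None


# Hypothetical daily sequences of activity-zone IDs (e.g., DBSCAN cluster labels).
days = [
    ["home", "work", "gym", "home"],
    ["home", "work", "home"],
    ["home", "work", "restaurant", "home"],
    ["home", "cafe", "work", "home"],
]
model = fit_transitions(days)
print(predict_next(model, "home"), transition_prob(model, "home", "work"))
```

Here "home" is followed by "work" on three of four days, so the sketch predicts "work" with an estimated probability of 0.75; a zone never seen as a starting point yields no prediction.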

File: Mining-Online-Footprints-to-Predict-Users-Next-Location.doc

Modeling and Visualizing Regular Human Mobility Patterns with Uncertainty: An Example Using Twitter Data

Traditional space-time paths show the spatiotemporal trajectories of individuals over one to several days. Because they are based on data for such short periods, these space-time paths may not be able to show regular activity patterns, which are pertinent to various types of planning and policy analysis. Travel data gathered over longer periods may capture regular activity patterns, but the footprints captured by these data also include irregular activities, introducing noise or uncertainty. Our objective is to determine the representative spatiotemporal trajectories of individuals from activity data of longer duration while accounting for stochastic disturbances and spatiotemporal variability. To this end, we explore using Twitter data, which have relatively low and irregular spatial and temporal resolutions. This article introduces a methodology to construct individual representative space-time paths using various aggregation and spatiotemporal clustering techniques. To depict and visualize spatiotemporal trajectories with uncertain information, we propose "space-time cones" of variable sizes to reflect the spatial precision of the paths and use colors on the cones to represent the confidence level. To illustrate the proposed methodology, we use geo-tagged tweets collected over an extended period. Our analysis indicates that the representative space-time path reasonably describes an individual’s regular activity patterns. As visual elements, cones and cone colors effectively show, respectively, the varying geographical precision along the path and the changing certainty levels across different path segments.
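
The cone idea can be sketched with simple per-hour aggregation. This is an assumption-laden illustration rather than the article’s method: it bins footprints by hour of day, takes the centroid as the path vertex, uses the standard distance as the cone radius (spatial precision), and uses the share of days with any footprint in that hour as the confidence value that a cone color would encode. The hourly bins, both measures, and the function name `representative_path` are hypothetical.

```python
import math
from collections import defaultdict


def representative_path(footprints, n_days):
    """footprints: (day_index, hour, x, y) tuples with x, y in meters.
    Returns (hour, centroid, cone_radius, confidence) for each hour with data."""
    by_hour = defaultdict(list)
    for day, hour, x, y in footprints:
        by_hour[hour].append((day, x, y))
    path = []
    for hour in sorted(by_hour):
        pts = by_hour[hour]
        cx = sum(x for _, x, _ in pts) / len(pts)
        cy = sum(y for _, _, y in pts) / len(pts)
        # Cone radius: standard distance (root-mean-square distance to the centroid).
        radius = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for _, x, y in pts) / len(pts))
        # Confidence: share of observed days with at least one footprint in this hour.
        confidence = len({day for day, _, _ in pts}) / n_days
        path.append((hour, (cx, cy), radius, confidence))
    return path


sample = [
    (0, 8, -3, 0), (1, 8, 3, 0), (2, 8, 0, 4), (3, 8, 0, -4),  # spread out, every day
    (0, 13, 100, 100), (1, 13, 100, 100),                       # tight, half of the days
]
for hour, centre, radius, conf in representative_path(sample, n_days=4):
    print(hour, centre, round(radius, 2), conf)
```

In the toy data, the 08:00 cone is wide (radius about 3.54 m) but fully confident, whereas the 13:00 cone has zero radius but a confidence of only 0.5, mirroring the separate roles the abstract gives to cone size and cone color.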

File: Modeling-and-Visualizing-Regular-Human-Mobility-Patterns-with-Uncertainty.docx