Reference: Scheele C., Yu M. and Huang Q., 2021. Geographic context-aware text mining: enhance social media message classification for situational awareness by integrating spatial and temporal features. International Journal of Digital Earth, pp.1-23. DOI: https://doi.org/10.1080/17538947.2021.1968048.
Social media data are widely used in disaster management for event detection, response, and recovery. To find disaster-relevant social media messages and automatically categorize them into different classes (e.g., damage or donation), current approaches use natural language processing methods based on keywords or machine learning algorithms that rely on text alone. However, these classification approaches remain imperfect because the language used on social media is variable and uncertain, and because they ignore the geographic context in which messages are posted. Meanwhile, the relevance of a social media message to a disaster is highly sensitive to where and when it was posted. Additional features related to space and time could therefore help distinguish relevant posts by capturing their geographic context, improving on purely text-based approaches. However, few studies have explored which spatial features are useful, or the extent to which temporal and especially spatial features can aid text classification. To fill this research gap, this paper proposes a context-aware text mining method that incorporates spatial and temporal information derived from social media and authoritative datasets (e.g., Earth observations, physical model output, official reports), along with the text information, to classify disaster-relevant social media posts. With the 2012 Hurricane Sandy as a case study, we designed and demonstrated how diverse types of spatial features, such as wind, flooding, and proximity, and temporal features can be derived from spatial data and then used to enhance text mining. A deep-learning-based method, convolutional neural networks (CNNs), and commonly used machine learning algorithms (e.g., support vector machine) were used to assess the accuracy of the enhanced text mining method.
The performance of classification models built from various combinations of textual, spatial, and temporal features indicates that the additional spatial and temporal features improve overall classification accuracy by 4 percentage points on average. This study demonstrates the need for, and provides guidance on, incorporating geographic data sources to improve data retrieval when leveraging social media for disaster applications.
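As a rough illustration of how a proximity feature and a temporal feature might be derived for a single geotagged post, the sketch below uses only the Python standard library. It is not the authors' implementation. The track points, the reference time, and all function names are hypothetical placeholders chosen for the example, and proximity is measured to the nearest track vertex rather than to the track line itself.

```python
import math
from datetime import datetime, timezone

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Illustrative placeholder values, NOT taken from the paper.
TRACK_POINTS = [(35.0, -71.0), (38.0, -72.5), (39.4, -74.4)]
REFERENCE_TIME = datetime(2012, 10, 29, 23, 30, tzinfo=timezone.utc)


def proximity_km(lat, lon, track=TRACK_POINTS):
    """Distance from a post to the nearest track vertex (a simplification)."""
    return min(haversine_km(lat, lon, t_lat, t_lon) for t_lat, t_lon in track)


def hours_from_reference(posted_at, reference=REFERENCE_TIME):
    """Signed offset in hours; negative means posted before the reference time."""
    return (posted_at - reference).total_seconds() / 3600.0


def context_features(lat, lon, posted_at):
    """Bundle the spatial and temporal features for one post."""
    return {
        "proximity_km": proximity_km(lat, lon),
        "hours_from_reference": hours_from_reference(posted_at),
    }
```

Features such as wind or flooding would instead require sampling authoritative raster or model output at the post location, which is beyond this sketch.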
- First, the paper introduces a methodology for integrating geographic context into the classification of disaster-relevant social media datasets by fusing spatial data with social media. The method addresses the shortcomings of using only text to identify and extract disaster-relevant social media data when geographic context must be considered.
- Second, this paper demonstrates how best to process spatial and temporal data to derive associated features for classifying and identifying disaster-relevant information.
- Third, this paper assesses the types of spatial and temporal features necessary for the classification of disaster-relevant social media data. Both domain-specific and non-domain-specific features are included in the assessment.
- Finally, this work evaluates both traditional machine learning algorithms (e.g., SVM) and the state-of-the-art deep-learning-based method, CNNs, on situational awareness (SA) information classification with spatial, temporal, and textual features.
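One simple way to combine textual, spatial, and temporal features before handing them to a classifier such as an SVM is to concatenate them into a single vector per post. The sketch below shows that idea with a bag-of-words text representation and min-max scaled context columns, using only the Python standard library. It is an assumption-laden illustration, not the fusion scheme or feature set used in the paper, and the example posts, context values, and function names are all hypothetical.

```python
from collections import Counter


def build_vocabulary(texts):
    """Map each distinct lowercase token to a column index (sorted order)."""
    vocab = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}


def text_vector(text, vocab):
    """Bag-of-words counts, one entry per vocabulary token."""
    counts = Counter(tok for tok in text.lower().split() if tok in vocab)
    # dicts preserve insertion order, so iteration follows the column index
    return [float(counts.get(tok, 0)) for tok in vocab]


def min_max_scale(column):
    """Rescale one numeric column to [0, 1]; constant columns become 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def fuse(texts, context_rows):
    """Concatenate text counts with scaled spatial/temporal features per post."""
    vocab = build_vocabulary(texts)
    scaled_cols = [min_max_scale(list(col)) for col in zip(*context_rows)]
    scaled_rows = list(zip(*scaled_cols))
    return [text_vector(t, vocab) + list(ctx)
            for t, ctx in zip(texts, scaled_rows)]


# Hypothetical posts and (proximity_km, hours_from_reference) values.
posts = ["power out after flooding", "donate to relief fund"]
context = [(12.0, 3.5), (480.0, 40.0)]
X = fuse(posts, context)  # one fused feature vector per post
```

Each row of `X` can then be passed to any classifier that accepts dense numeric vectors. Scaling the context columns keeps large raw values such as distances from dominating the token counts.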
Read the full article at: https://www.tandfonline.com/doi/full/10.1080/17538947.2021.1968048