Sensing the city with Instagram: Clustering geolocated data for outlier detection. Expert Systems with Applications
Expert Systems With Applications 78 (2017) 319–333

Sensing the city with Instagram: Clustering geolocated data for outlier detection
Daniel Rodríguez Domínguez, Rebeca P. Díaz Redondo, Ana Fernández Vilas, Mohamed Ben Khalifa
Information […]
Corresponding author. E-mail addresses: daniel@det.uvigo.es (D.R. Domínguez), rebeca@det.uvigo.es (R.P. Díaz Redondo), avilas@det.uvigo.es (A.F. Vilas), mbk@det.uvigo.es (M.B. Khalifa).

[…] (Zheng, Xie, …) online relationships; trajectories and mobility patterns (Cheng, Caverlee, Lee, …); friend-based location recommendations (Ye, Yin, …); or crowd detection and analysis (Sprake […]

[…] (ii) does not assume a specific number of crowds or an initial position of crowds in the area; and (ii) does not focus only on detecting unusually large crowds, but on detecting any kind of abnormal behavior (small crowds, or the absence or presence of crowds in areas where the normal behavior is the opposite). Finally, our approach was validated, with good results, using a dataset of geo-tagged Instagram posts collected in New York City over almost six months. Not only were all the previously known events detected, but other, previously unknown events were also discovered during the experiment.

The remainder of this paper is organized as follows. Section 2 provides an overview of other proposals in the crowd and event detection field, especially those based on social
media data. After the main techniques used in our approach (clustering and outlier detection) are summarized in Section 3, our methodology is detailed in Section 4. The criteria that led us to use Instagram as our data source are explained in Section 5, whereas the experiment is detailed in Section 6. The results obtained are discussed in Section 7 and, finally, Section 8 is devoted to conclusions and future work.

2. Related work

In the crowd analysis field, multiple crowd detection systems have been proposed for different applications, among which those for smart cities clearly stand out. Some
works rely on video: low-quality infrared video is used in Nanda and Davis (2002) to detect pedestrians; visible-light video is used in Reisman, Mano, Avidan, and Shashua (2004) to detect groups of people from moving vehicles; and image sequences are processed with optical flow and density-based clustering in Santoro, Pedro, Tan, and Moeslund (2010) to detect crowds, their movement, and their evolution. Also working on image sequences, Andrade, Blunsden, and Fisher (2006) propose applying spectral clustering to find the optimal number of models to represent normal motion patterns, whereas similar techniques are applied in Hamid et al. (2005) and Zhang, Gatica-Perez, Bengio, and McCowan (2005) to detect unusual events. Also focused on rare-event detection, Xu, Denman, Fookes, and Sridharan (2016) apply a weakly supervised approach based on the Kullback–Leibler (KL) divergence.

However, the analysis of data gathered from location-based social networks (LBSNs) has become an interesting option. Proactive LBSN users can be seen as sensors, constantly providing a high volume of data of different kinds (text, images, temperature, location, etc.) from the same device
(mobile devices). Humans as sensors constitute an alternative or supplement to the costly sensing systems traditionally deployed across urban areas. Additionally, thanks to the ubiquity of social media, these applications can easily be used in new areas without the installation and maintenance costs of dedicated sensor networks. These advantages have led to the use of this new data source for crowd analysis as well. In Adedoyin-Olowe, Gaber, Dancausa, Stahl, and Gomes (2016), for instance, tweets are analyzed to extract newsworthy content in sports and politics. Other approaches try to detect natural disasters, like earthquakes (Sakaki, Okazaki, …) […]

[…] to handle noise, as points in sparse regions should not be considered to belong to any cluster; and to work without knowing the number of clusters in advance, because the number of crowds can vary depending on several external variables. Taking these considerations into account, density-based algorithms have the best characteristics for our purposes. First, they can discover clusters of arbitrary shape, whereas distance-based methods are limited to spherical clusters. Second, the number of clusters is not needed as a parameter of the algorithm. Finally, density-based methods treat points in sparse regions as noise. Therefore, all the required characteristics are fulfilled. A review of different density-based approaches is given in NafeesAhmed and Abdul Razak (2014). We mainly considered two
of these algorithms: (i) DBSCAN (Ester, Kriegel, Sander, …) and (ii) OPTICS (Ester, Kriegel, Sander, …).

[…] in univariate or multivariate techniques, according to the number of variables used. In our case, an outlier is a cluster whose number of points differs from the number of points of other clusters found in a similar location, day, and hour. Therefore, we need a technique that is univariate, since the number of points is the only variable we consider, and non-parametric, since we do not know the distribution of points. There are two main options in these cases (Acuna …) […]

[…] (ben Khalifa et al., 2016
; Kumar et al., 2011; Sakaki et al., 2010). The Twitter Search API provides endpoints to retrieve tweets published in the previous two weeks, with the possibility of filtering by several criteria, including location. The Twitter Streaming API, on the other hand, returns 1% of the tweets that match some search parameters in real time. Finally, the Twitter Firehose provides access to 100% of the tweets, but it is not a free-access API. Furthermore, according to Morstatter, Pfeffer, Liu, and Carley (2013), the geo-located tweets returned by the Streaming API cover up to 90% of the geo-located tweets extracted from the Firehose API. However, this study also reveals that geo-located tweets are scarce: they make up only 1.45% of the tweets obtained from the Firehose API and 3.17% of those obtained from the Streaming API. Therefore, although the facilities provided by its APIs and its popularity make Twitter a good candidate for our data source, we considered it necessary to analyse other OSNs to check whether any of them provides a larger amount of geo-tagged data that can be easily extracted through an Application Programming Interface (API). Foursquare and Instagram were our main options, since some studies have been conducted using data from these two platforms. In the case of Instagram, user behavior and the content of posts have been analysed in Hu, Manikonda, Kambhampati et al. (2014) and Hochman and Manovich (2013). The latter also includes an analysis of different publication patterns, taking into account the location and time of the posts. The studies conducted on Foursquare follow a similar line: analysing the emergen […]

Fig. 4. Comparison of the number of posts in the area of NYC during October 2015.
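The excerpt motivates density-based clustering (DBSCAN or OPTICS) for grouping geolocated posts, followed by a univariate, non-parametric test on cluster sizes to flag outliers, but the text is cut before the authors' exact parameters and test are given. The following is a minimal sketch under stated assumptions, not the authors' implementation: a textbook DBSCAN over (latitude, longitude) pairs using a haversine distance, and Tukey's interquartile-range fences as one common non-parametric outlier rule (not necessarily either of the "two main options" the paper refers to). The function names, the `eps_m`/`min_pts` parameters, and the assumption that `history` holds past sizes of clusters found at a similar location, day, and hour are all hypothetical.

```python
import math
from collections import Counter
from statistics import quantiles

NOISE = -1  # label for points that belong to no cluster

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(a))

def dbscan(points, eps_m, min_pts):
    """Textbook DBSCAN; returns one label per point (cluster id >= 0 or NOISE)."""
    labels = [None] * len(points)  # None = not visited yet

    def neighbours(i):
        # Linear scan: every point within eps_m metres, including i itself.
        return [j for j in range(len(points))
                if haversine_m(points[i], points[j]) <= eps_m]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:  # not a core point (may become a border point later)
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:  # former noise reached from a core point: border point
                labels[j] = cluster
                continue
            if labels[j] is not None:  # already assigned (this or an earlier cluster)
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # j is a core point too, so keep expanding
                queue.extend(more)
        cluster += 1
    return labels

def cluster_sizes(labels):
    """Number of points per cluster, ignoring noise."""
    return Counter(label for label in labels if label != NOISE)

def iqr_fences(history, k=1.5):
    """Tukey's fences over past cluster sizes: a distribution-free, univariate rule."""
    q1, _, q3 = quantiles(history, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(size, history, k=1.5):
    """Flag sizes that are unusually small or unusually large versus history."""
    low, high = iqr_fences(history, k)
    return size < low or size > high
```

For example, with `eps_m=100` and `min_pts=3`, five posts a few tens of metres apart form one cluster while an isolated post is labelled `NOISE`; with a hypothetical history of `[20, 22, 19, 21, 23, 20, 22]` points, the fences are (17.0, 25.0), so a cluster of 40 points (or of only 5) would be flagged. Checking both fences mirrors the excerpt's point that unusually small crowds matter as much as unusually large ones. The neighbour search is a linear scan, so this sketch is O(n²); a city-scale dataset would need a spatial index such as a k-d tree or a grid.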