Query-aware sparse coding for web multi-video summarization
Information Sciences 478 (2019) 152-166
Contents lists available at ScienceDirect

Zhong Ji (a), Yaru Ma (a), Yanwei Pang (a), Xuelong Li (b)
(a) School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
(b) Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China

Article history: Received 28 March 2018; Revised 20 September 2018; Accepted 23 September 2018; Available online 8 November 2018.

Keywords: Video summarization; Sparse coding; Query-aware; Multi-video

Abstract

Given the explosive growth of online videos, it is becoming increasingly important to relieve the tedious work of browsing and managing the video content of interest. Video summarization aims to provide such a technique by transforming one or multiple videos into a compact one. However, conventional multi-video summarization methods often fail to produce satisfying results because they ignore the user's search intent. To this end, this paper proposes a novel query-aware approach that formulates multi-video summarization in a sparse coding framework, where the web images searched by a query are taken as important preference information to reveal the query intent. To provide a user-friendly summarization, this paper also develops an event-keyframe presentation structure to present keyframes in groups of specific events related to the query, using an unsupervised multi-graph fusion method. Moreover, we release a new public dataset named MVS1K, which contains about 1000 videos from 10 queries and their video tags, manual annotations, and associated web images. Extensive experiments on the MVS1K and TVSum datasets demonstrate that our approaches produce competitive objective and subjective results.

© 2018 Published by Elsevier Inc.

1. Introduction

The rapid growth of video data has steadily occupied the vast majority of network traffic. For example, YouTube, one of the primary online video sharing websites, serves over 300 h of video uploads per minute as of April 2018. This massive amount of video has increased the demand for efficient ways to browse and manage desired video content [17,24,29,30,37]. However, given an event query, search engines usually return thousands or even more videos, which
are quite noisy, redundant, and even irrelevant. This makes it difficult for users to grasp the focus of the whole event, forcing them to spend considerable time and effort exploring the main content of the returned videos. Multi-Video Summarization (MVS) is an effective way to tackle this problem: it extracts the essential information of multi-video frames as keyframes to produce a condensed and informative version. In other words, its goal is to generate a single summary that describes a large number of retrieved videos, empowering users to quickly browse and comprehend
a large amount of video content.

One key challenge of MVS is to accurately capture the user's search intent, that is, to generate query-aware summarization. Consequently, a surge of effort has been devoted to this thread. These efforts can be divided into three categories: searching-based approaches [1,13,45], learning-based approaches [16,29,30,37], and fusion-based approaches [14,20,34].

∗ Corresponding author. E-mail addresses: (Z. Ji), (Y. Ma), (Y. Pang), xuelong_ (X. Li).
https://doi.org/10.1016/j.ins.2018.09.050
0020-0255/© 2018 Published by Elsevier Inc.

Fig. 1. The MVS pipeline of the proposed QUASC and MGF approaches.

Specifically, the searching-based approach prefers to select, as the keyframes of the summarization, those video frames with high similarity to the searched web images [1,13,45]. The idea behind it is that the web images returned by a search engine generally reflect the search intent for a specific query, so the generated MVS is query-aware. However, this type of approach tends to produce redundant keyframes in a summarization, since there are always some frames having
high similarity in multiple videos. The learning-based approach selects the keyframes by building a learning model [16,29,30,37]. For example, Besiris et al. [2] apply a multiple-instance learning model to localize tags into video shots and select the query-aware keyframes in accordance with those tags. It achieves satisfactory performance on a query-video dataset. However, such N-way discrete classifiers are hard to scale beyond a limited number of discrete query categories [20]. Recently, there has been considerable interest in fusing the ideas of the above two types of approaches to overcome
their respective drawbacks. Some pioneering fusion-based approaches formulate the MVS problem with a graph model [14], a concept learning model [34], and a multi-task learning model [20], respectively.

On the other hand, the sparse coding technique is effective and widely used in Single Video Summarization (SVS) [6,21]. It formulates the keyframe selection problem as a coefficient selection one, which guarantees the general properties of SVS, such as conciseness and representativeness. However, directly applying sparse coding to MVS is inappropriate, because multiple videos contain plenty of content that is irrelevant or only weakly relevant to the query; the resulting summarization would then contain noisy or unimportant keyframes, which weakens its conciseness and representativeness. A natural idea is to take advantage of the searched web images to emphasize the important content in the sparse coding
framework. However, this is still an unsolved and challenging problem.

To deal with this challenge, we present a QUery-Aware Sparse Coding (QUASC) method that generates the query-dependent MVS by fusing the ideas of the sparse coding technique and the searching-based MVS approach. Moreover, to present the summarization in a friendly manner, we also develop a novel Event-Keyframe Presentation (EKP) structure with a novel Multi-Graph Fusion (MGF) approach, presenting keyframes in groups of specific events related to the query. The MVS framework of the proposed QUASC and MGF approaches is illustrated in Fig. 1. It is
worthwhile to highlight several aspects of the proposed methods:

(1) A novel QUery-Aware Sparse Coding (QUASC) method for multi-video summarization is proposed. It formulates multi-video summarization in a sparse coding framework, where the web images searched by the query are taken as important preference information to reveal the query intent.

(2) A user-friendly summarization representation structure is developed, which presents the keyframes in groups of specific events related to the query.

(3) A new public dataset named MVS1K is released. It contains about 1000 videos from 10 queries, together with their video tags, manual annotations, and associated web images. To the best of our knowledge, it is the largest public multi-video summarization dataset. Both our data and code will be made available.
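The sparse coding view above treats keyframe selection as coefficient selection: frames whose coefficient rows carry energy in reconstructing all frames become keyframes, and query awareness can be injected by penalizing each frame's coefficients in inverse proportion to its similarity with the searched web images. The sketch below is not the paper's QUASC objective; it is a minimal toy illustration (the function name, feature vectors, and parameter values are all made up) that solves a per-frame weighted lasso with plain iterative soft-thresholding (ISTA), dividing each frame's l1 penalty by its web-image similarity so that query-relevant frames are cheaper to keep.

```python
# Toy sketch of query-aware sparse coding for keyframe selection.
# NOT the paper's exact QUASC formulation: a simplified weighted lasso
# solved with ISTA. All features and parameters are made-up toy data.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft(x, t):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def query_aware_keyframes(frames, web_images, lam=0.1, eta=0.05, steps=500):
    """Reconstruct every frame as a sparse combination of the frames
    themselves; the l1 penalty on frame i's coefficients is divided by
    its similarity to the searched web images, so query-relevant frames
    survive the shrinkage. Frames whose coefficient rows retain most of
    the energy are returned as keyframes."""
    n, d = len(frames), len(frames[0])
    # Query relevance of each frame: best match against any web image.
    w = [max(dot(f, q) for q in web_images) + 1e-3 for f in frames]
    # C[i][j] = weight of frame i in the reconstruction of frame j.
    C = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        for j in range(n):  # one ISTA step per target frame j
            recon = [sum(C[i][j] * frames[i][k] for i in range(n))
                     for k in range(d)]
            resid = [recon[k] - frames[j][k] for k in range(d)]
            for i in range(n):  # gradient step + query-weighted shrinkage
                g = dot(frames[i], resid)
                C[i][j] = soft(C[i][j] - eta * g, eta * lam / w[i])
    score = [sum(abs(c) for c in row) for row in C]
    return [i for i, s in enumerate(score) if s > 0.5 * max(score)]

# Toy data: frames 0 and 1 match the (single) searched web image,
# frames 2 and 3 are off-query content and noise.
frames = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
web_images = [[1, 0, 0]]
print(query_aware_keyframes(frames, web_images))  # → [0, 1]
```

A real system would use high-dimensional deep features and a jointly regularized objective over all videos; the point here is only how the per-frame weight w turns plain sparse coding into a query-aware selector.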