Hive in Practice (hive实战.pdf)
1. Case study
1.1. Big-data implementation of zebra
1.1.1. Big-data implementation of zebra
Use flume to collect the data -> land it in HDFS -> create a hive external table to manage the logs collected in HDFS -> use HQL to implement the zebra business logic -> use sqoop to export the processed data from HDFS to MySQL.
1.2. Processing the zebra business with flume, hive, hadoop and sqoop
1.2.1. Processing the zebra business with flume, hive, hadoop and sqoop
hadoop02, hadoop03:
# Name the components of Agent a1
a1.sources=r1
a1.sinks=k1
a1.channels=c1
# Describe/configure the Source
a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/root/work/zebradata
a1.sources.r1.interceptors=i1
a1.sources.r1.interceptors.i1.type=timestamp
# Describe the Sink
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=192.168.242.201
a1.sinks.k1.port=41414
# Describe the memory Channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=100000
a1.channels.c1.transactionCapacity=100
# Bind the Source and Sink to the Channel
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
hadoop01:
# Name the components of Agent a1
a1.sources=r1
a1.sinks=k1
a1.channels=c1
# Describe/configure the Source
a1.sources.r1.type=avro
a1.sources.r1.bind=0.0.0.0
a1.sources.r1.port=41414
# Describe the Sink
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://CentOS01:9000/zebra/reportTime=%Y-%m-%d-%H-00-00
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.fileSuffix=.data
a1.sinks.k1.hdfs.rollInterval=10
a1.sinks.k1.hdfs.rollSize=0
a1.sinks.k1.hdfs.rollCount=0
# Describe the memory Channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
# Bind the Source and Sink to the Channel
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
Start the agent:
./flume-ng agent --conf ./conf -Xms128m -Xmx256m --conf-file ./conf/zebra.conf --name a1 -Dflume.root.logger=INFO,console
1.3. zebra business processing steps
1.3.1. zebra business processing steps
77 fields -> remove the redundant fields -> apply the business rules to derive the business fields -> process these according to the final business requirements to get the result
create database zebra;
use zebra;
1.4. Import the raw data
1.4.1. Import the raw data
create EXTERNAL table zebra(a1 string,a2 string,a3 string,a4 string,a5 string,a6 string,a7 string,a8 string,a9 string
,a10 string,a11 string,a12 string,a13 string,a14 string,a15 string,a16 string,a17 string,a18 string,a19 string,
a20 string,a21 string,a22 string,a23 string,a24 string,a25 string,a26 string,a27 string,a28 string,a29 string,
a30 string,a31 string,a32 string,a33 string,a34 string,a35 string,a36 string,a37 string,a38 string,a39 string,
a40 string,a41 string,a42 string,a43 string,a44 string,a45 string,a46 string,a47 string,a48 string,a49 string,
a50 string,a51 string,a52 string,a53 string,a54 string,a55 string,a56 string,a57 string,a58 string,a59 string,
a60 string,a61 string,a62 string,a63 string,a64 string,a65 string,a66 string,a67 string,a68 string,a69 string,
a70 string,a71 string,a72 string,a73 string,a74 string,a75 string,a76 string,a77 string)
partitioned by (reportTime string)
row format delimited fields terminated by '|'
stored as textfile
location '/zebra';
ALTER TABLE zebra add PARTITION (reportTime='2016-12-18') location '/zebra/reportTime=2016-12-18';
1.5. Clean the data
1.5.1. Clean the data
Reduce the original 77 fields to the 18 fields of the dataclear table.
create table dataclear(reporttime string,appType bigint,appSubtype bigint,userIp string,userPort bigint,appServerIP string,appServerPort bigint,host string,cellid string,appTypeCode bigint,interruptType String,
transStatus bigint,trafficUL bigint,trafficDL bigint,retranUL bigint,retranDL bigint,procdureStartTime bigint,procdureEndTime bigint)
row format delimited fields terminated by '|';
insert overwrite table dataclear select reportTime,a23,a24,a27,a29,a31,a33,a59,a17,a19,a68,a55,a34,a35,a40,a41,a20,a21 from zebra;
1.6. Implement the business logic
1.6.1. Implement the business logic
This step produces the dataproc table.
create table dataproc(reporttime string,appType bigint,appSubtype bigint,userIp string,userPort bigint,appServerIP string,appServerPort bigint,host string,cellid string,attempts bigint,accepts bigint,trafficUL bigint,trafficDL bigint,retranUL bigint,retranDL bigint,failCount bigint,transDelay bigint)
row format delimited fields terminated by '|';
insert overwrite table dataproc SELECT reporttime,appType,appSubtype,userIp,userPo
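The column projection performed by the cleaning step in section 1.5 can be checked locally with a small sketch. It is only an illustration in Python, not part of the Hive pipeline: the names KEPT_COLUMNS, DATACLEAR_NAMES and clean_record are hypothetical, the column positions are copied from the SELECT list of the cleaning query, and values are left as strings rather than cast to bigint as Hive would do on insert.

```python
# Hypothetical helper names; positions copied from the SELECT list in section 1.5.
# 1-based positions of the raw columns a1..a77 that the cleaning step keeps.
KEPT_COLUMNS = [23, 24, 27, 29, 31, 33, 59, 17, 19, 68, 55, 34, 35, 40, 41, 20, 21]

# Column names of the dataclear table that follow reporttime, in the same order.
DATACLEAR_NAMES = [
    "appType", "appSubtype", "userIp", "userPort", "appServerIP",
    "appServerPort", "host", "cellid", "appTypeCode", "interruptType",
    "transStatus", "trafficUL", "trafficDL", "retranUL", "retranDL",
    "procdureStartTime", "procdureEndTime",
]


def clean_record(line, report_time, delimiter="|"):
    """Project one '|'-delimited raw record onto the dataclear columns.

    Missing trailing fields become None, loosely mirroring the NULLs
    Hive returns for short rows. Values stay strings (no bigint cast).
    """
    fields = line.rstrip("\n").split(delimiter)
    row = {"reporttime": report_time}
    for name, pos in zip(DATACLEAR_NAMES, KEPT_COLUMNS):
        row[name] = fields[pos - 1] if pos <= len(fields) else None
    return row


if __name__ == "__main__":
    # Synthetic record: field i holds the text "v<i>", so the mapping is visible.
    sample = "|".join(f"v{i}" for i in range(1, 78))
    cleaned = clean_record(sample, "2016-12-18")
    for key, value in cleaned.items():
        print(key, value)
```

Running it on the synthetic record makes it easy to confirm, for example, that host is taken from a59 and cellid from a17 before the same mapping is trusted in the HQL.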