博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
面试笔试题之hql
阅读量:3962 次
发布时间:2019-05-24

本文共 6438 字,大约阅读时间需要 21 分钟。

1.现有这么一批数据,现要求出:每个用户截止到每月为止的最大单月访问次数和累计到该月的总访问次数

三个字段的意思:用户名uid,月份month,访问次数countA,2015-01,5A,2015-01,15B,2015-01,5A,2015-01,8B,2015-01,25A,2015-01,5A,2015-02,4A,2015-02,6B,2015-02,10B,2015-02,5A,2015-03,16A,2015-03,22B,2015-03,23B,2015-03,10B,2015-03,11

最后结果展示:

用户    月份        最大访问次数    总访问次数        当月访问次数A    2015-01        33            33          33A    2015-02        33            43          10A    2015-03        38            81          38B    2015-01        30            30          30B    2015-02        30            45                15B    2015-03        44            89          44

建表与导入数据

create table if not exists interview01(userId string,month string,count int) row format delimited fields terminated by ","; load data local inpath "/home/hadoop/hive_data/interview01" into table interview01;

辅助SQL:

先按userid和month分组,求出总次数counts

select userid,month,sum(count)counts from interview01 group by userid,month

最终SQL:

使用窗口函数求截止到当月的最大访问次数,直接从上表里面查

select userid,month,max(counts) over(distribute by userid sort by month)maxcount,sum(counts) over(distribute by userid sort by month)sumcount,countsfrom (select userid,month,sum(count)counts from interview01 group by userid,month)tmp;

2.求:所有数学课程成绩 大于 语文课程成绩的学生的学号

建表语句:

CREATE TABLE `course` (  `id` int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,  `sid` int(11) DEFAULT NULL,  `course` varchar(255) DEFAULT NULL,  `score` int(11) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

插入数据

字段解释:id, 学号, 课程, 成绩

INSERT INTO `course` VALUES (1, 1, 'yuwen', 43);INSERT INTO `course` VALUES (2, 1, 'shuxue', 55);INSERT INTO `course` VALUES (3, 2, 'yuwen', 77);INSERT INTO `course` VALUES (4, 2, 'shuxue', 88);INSERT INTO `course` VALUES (5, 3, 'yuwen', 98);INSERT INTO `course` VALUES (6, 3, 'shuxue', 65);

辅助SQL:

这题可以看做为一到mysql题,因为每个字段都只有一个成绩,所以这里我们要对没有的字段进行补0操作

selectsid,case coursewhen "yuwen" then score else 0 end yuwen,case course when "shuxue" then score else 0 end shuxuefrom course

然后我们从上表中操作语文和数学的最大值,然后作为一个表,直接进行查询

selectsid,shuxue,yuwenfrom(selectsid,sum(yuwen) yuwen,max(shuxue) shuxuefrom(selectsid,case coursewhen "yuwen" then score else 0 end yuwen,case course when "shuxue" then score else 0 end shuxuefrom course) agroup by sid) b where b.shuxue>b.yuwen;

结果

+------+--------+-------+| sid  | shuxue | yuwen |+------+--------+-------+|    1 |     55 |    43 ||    2 |     88 |    77 |+------+--------+-------+

3.比如:2010012325表示在2010年01月23日的气温为25度。现在要求使用hive, 计算每一年出现过的最大气温的日期+温度。

2014010114201401021620140103172014010410201401050620120106092012010732201201081220120109192012011023200101011620010102122001010310200101041120010105292013010619201301072220130108122013010929201301102320080101052008010216200801033720080104142008010516200701061920070107122007010812200701099920070110232010010114201001021620100103172010010410201001050620150106492015010722201501081220150109992015011023

建表与导入数据

create table if not exists interview03(data string);load data local inpath "/home/hadoop/hive_data/interview03" into table interview03;

辅助SQL:

先求每一年的最大温度

selectsubstr(data,1,4) year,max(cast(substr(data,-2,2) as int))max_tempfrom interview03group by substr(data,1,4)

然后将上表与原表连接,连接条件是年份与气温符合的,然后再找出那天是哪一天

selecta.year year,a.max_temp max_temp,substr(b.data,1,8) dayfrom(selectsubstr(data,1,4) year,max(cast(substr(data,-2,2) as int))max_tempfrom interview03group by substr(data,1,4))a joininterview03 b ona.year=substr(b.data,1,4) and a.max_temp=cast(substr(b.data,-2,2) as int);

结果

year    max_temp        day2001    29      200101052007    99      200701092008    37      200801032010    17      201001032012    32      201201072013    29      201301092014    17      201401032015    99      20150109

4.现有一份以下格式的数据:表示有id为1,2,3的学生选修了课程a,b,c,d,e,f中其中几门

id course 1,a 1,b 1,c 1,e 2,a 2,c 2,d 2,f 3,a 3,b 3,c 3,e

编写Hive的HQL语句来实现以下结果:

表中的1表示选修,表中的0表示未选修

id    a    b    c    d    e    f1    1    1    1    0    1    02    1    0    1    1    0    13    1    1    1    0    1    0

建表与导入数据

create table if not exists interview04(id int,course string) row format delimited fields terminated by ","; load data local inpath "/home/hadoop/hive_data/interview04" into table interview04;

首先得到所有备选课程

select collect_set(trim(course))courses from interview04;

然后求每一个学员选修的课程

selectid,collect_set(trim(course)) id_course    from interview04group by id;

这里要使用笛卡尔积关联(a表中的每一天数据关联b表所有数据),从原表中查询course字段放入set集合中,然后与上表关联,得到已选课程与备选课程

首先修改笛卡尔积参数

set hive.strict.checks.cartesian.product=false;set hive.mapred.mode=nonstrict;辅助SQL:
selectb.id id,b.id_course id_course,a.courses coursesfrom(select collect_set(trim(course))courses from interview04) a join(selectid,collect_set(trim(course)) id_course    from interview04group by id) b;
id      id_course       courses1       ["a","b","c","e"]       ["a","b","c","e","d","f"]2       ["a","c","d","f"]       ["a","b","c","e","d","f"]3       ["a","b","c","e"]       ["a","b","c","e","d","f"]

最终SQL:

判断a表与b表中元素的包含关系即可

selectid,if(array_contains(id_course,courses[0]),1,0) a,if(array_contains(id_course,courses[1]),1,0) b,if(array_contains(id_course,courses[2]),1,0) c,if(array_contains(id_course,courses[3]),1,0) e,if(array_contains(id_course,courses[4]),1,0) d,if(array_contains(id_course,courses[5]),1,0) ffrom(selectb.id id,b.id_course id_course,a.courses coursesfrom(select collect_set(trim(course))courses from interview04) a join(selectid,collect_set(trim(course)) id_course    from interview04group by id) b) c;

最终结果

id      a       b       c       e       d       f1       1       1       1       1       0       02       1       0       1       0       1       13       1       1       1       1       0       0

5.现有如下格式的一份数据,需求:编写Hive的HQL语句求出每个店铺的当月销售额和累计到当月的总销售额

uid month salea,01,150a,01,200b,01,1000b,01,800c,01,250c,01,220b,01,6000a,02,2000a,02,3000b,02,1000b,02,1500c,02,350c,02,280a,03,350a,03,250

建表与导入数据

create table if not exists interview05(uid string,month string,sale int) row format delimited fields terminated by ",";load data local inpath "/home/hadoop/hive_data/interview05" into table interview05;

最终SQL:

与第一题类似,先求总销售额,再进行开窗算出累计到当月的总销售额

select uid,month,sum(sales) over(distribute by uid sort by month)sumsales,salesfrom (select uid,month,sum(sale)sales from interview05 group by uid,month)tmp;

结果:

uid     month   sumsales        salesa       01      350     350a       02      5350    5000a       03      5950    600b       01      7800    7800b       02      10300   2500c       01      470     470c       02      1100    630

转载地址:http://owgzi.baihongyu.com/

你可能感兴趣的文章
用Gradle 构建你的android程序
查看>>
Android监听应用程序安装和卸载实现程序
查看>>
Android 监听apk安装替换卸载广播的实现代码
查看>>
Android 使用android-support-multidex解决Dex超出方法数的限制问题,让你的应用不再爆棚
查看>>
Android下拉刷新上拉加载控件,对所有View通用!
查看>>
Android自定义控件实战——仿多看阅读平移翻页
查看>>
Android自定义控件实战——仿淘宝商品浏览界面
查看>>
Android自定义控件实战——水流波动效果的实现WaveView
查看>>
Android自定义控件实战——水波纹标签云TagCloud
查看>>
Android自定义控件实战——滚动选择器PickerView
查看>>
Android自定义控件实战——下拉刷新控件终结者:PullToRefreshLayout
查看>>
Android事件分发、View事件Listener全解析
查看>>
Eclipse下使用Ant多渠道批量打包
查看>>
Eclipse下Ant自动打包,混淆和签名
查看>>
android 集成第三方静态库的编译方法
查看>>
linux环境下编译不成功
查看>>
Android系统时间制式的获取(24钟头制式/12小时制式)及UTC与本地时间的转换
查看>>
Android WebView Long Press长按保存图片到手机
查看>>
How To Install Java on Ubuntu with Apt-Get
查看>>
Setting up a Linux build environment
查看>>