Three Ways to Export Hive Data to CSV

Author: zmx
Published: September 14, 2021
Category: Python, Hadoop ecosystem

# Introduction

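A recent project required exporting data from our big data platform to CSV files, and I currently have three ways of doing it. The Python route has to read the query result into memory, build the CSV, and then write it to disk. The hive command instead streams its output straight to the file, which in my case was faster than Python, though this will vary with the environment and the data volume.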

# pandas

```python
import os

query_sql = f"select * from db.table where condition= '{condition}'"
# self._dbh is my own wrapper class; internally it calls
# pandas.read_sql_query(query_sql, conn), where conn is the Hive connection object
res = self._dbh.query(query_sql)
os.makedirs(file_path, exist_ok=True)  # create the output directory if it is missing
csv_file = os.path.join(file_path, file_name)
res.to_csv(csv_file, encoding='utf-8', header=True, index=False)  # export to CSV
commonUnit.saveMD5ValueByFile(csv_file)  # commonUnit is my own helper module
```
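
Because the introduction notes that the Python route loads the whole result into memory, here is a minimal sketch of a chunked export. It relies on the `chunksize` parameter of `pandas.read_sql_query`, which makes pandas yield DataFrames one batch at a time. The function name `export_query_to_csv` and the default batch size are my own assumptions, not part of the wrapper class above, and whether memory use really stays bounded also depends on how the database driver fetches rows.

```python
import pandas as pd


def export_query_to_csv(sql, conn, csv_path, chunksize=50_000):
    """Write a query result to csv_path batch by batch; return the row count.

    Hypothetical helper: conn is any connection pandas can read from.
    """
    total = 0
    # With chunksize set, read_sql_query returns an iterator of DataFrames
    # instead of one large DataFrame.
    for i, chunk in enumerate(pd.read_sql_query(sql, conn, chunksize=chunksize)):
        chunk.to_csv(
            csv_path,
            mode="w" if i == 0 else "a",  # overwrite on the first batch, then append
            header=(i == 0),              # write the header row only once
            index=False,
            encoding="utf-8",
        )
        total += len(chunk)
    return total
```

Called as `export_query_to_csv(query_sql, conn, csv_file)`, it produces the same file layout as the single `to_csv` call, one batch at a time.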

# Using the hive command

Exporting CSV with hive

```python
sh = f"""hive -S -e "set hive.cli.print.header=true;set hive.cli.print.current.db=false;set hive.resultset.use.unique.column.names=false; select * from db.table where condition= '{condition}'" |sed 's/[\t]/,/g' | grep -vE '(INFO|WARNING)' """
os.system(sh + " > " + file_path+file_name)
```

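- `-S`: silent mode, which suppresses most of the log chatter
- `set hive.cli.print.header=true;`: print the column names as a header row
- `set hive.cli.print.current.db=false;`: do not show the current database name
- `set hive.resultset.use.unique.column.names=false;`: do not prefix column names with the table name
- `select * from db.table where condition= '{condition}'`: the HQL statement
- `| sed 's/[\t]/,/g'`: replace every tab with a comma
- `| grep -vE '(INFO|WARNING)'`: drop any line containing INFO or WARNING (log messages)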
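
One limitation of the `sed` step is that a field which itself contains a comma is not quoted, so the resulting file can have misaligned columns. A minimal sketch of the same pipeline in Python, using the standard `csv` module for quoting, is shown here. The helper names `tsv_lines_to_csv` and `hive_to_csv` are hypothetical, and the log filter deliberately mirrors the `grep -vE` pattern, including its habit of dropping any data row that happens to contain those words.

```python
import csv
import re
import subprocess

# Same pattern as the grep -vE filter in the shell pipeline.
LOG_LINE = re.compile(r"INFO|WARNING")


def tsv_lines_to_csv(lines, out):
    """Write tab-separated lines to out as CSV, quoting fields where needed."""
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        line = line.rstrip("\n")
        if not line or LOG_LINE.search(line):
            continue  # skip blank lines and log noise
        writer.writerow(line.split("\t"))


def hive_to_csv(hql, csv_path):
    """Run hive -S -e and stream its stdout into csv_path (hypothetical helper)."""
    cmd = ["hive", "-S", "-e", "set hive.cli.print.header=true; " + hql]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc, \
            open(csv_path, "w", newline="", encoding="utf-8") as out:
        tsv_lines_to_csv(proc.stdout, out)
    if proc.returncode != 0:
        raise RuntimeError(f"hive exited with status {proc.returncode}")
```

Rows are still streamed one at a time, so the memory advantage of the shell version is kept, and passing the command as a list avoids shell quoting problems around the HQL string.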


# impala

```shell
impala-shell -q "SQL statement" -B --output_delimiter "," -o output_path
```
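
To call this from Python like the hive example, a minimal sketch could build the argument list and hand it to `subprocess.run`. The function names are hypothetical. The only flag added beyond the command above is `--print_header`, which asks impala-shell to emit the column names as the first line; drop it if your version behaves differently.

```python
import subprocess


def build_impala_export_cmd(sql, csv_path, delimiter=",", header=True):
    """Return the impala-shell argument list for a delimited export."""
    cmd = [
        "impala-shell",
        "-q", sql,                            # the query to run
        "-B",                                 # delimited (non-pretty-printed) output
        f"--output_delimiter={delimiter}",
        "-o", csv_path,                       # file that receives the result
    ]
    if header:
        cmd.append("--print_header")          # include column names as the first line
    return cmd


def impala_to_csv(sql, csv_path):
    """Run the export and raise CalledProcessError if impala-shell fails."""
    subprocess.run(build_impala_export_cmd(sql, csv_path), check=True)
```

As with the `sed` approach, this output does not quote fields that contain the delimiter, so pick a delimiter that cannot appear in the data.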