Keywords:
sqoop1
1. Import MySQL data into HDFS with Sqoop
# sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --target-dir '/sqoop/user'
-m sets the number of map tasks; --target-dir sets the HDFS target directory.
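The -m flag controls parallelism: Sqoop runs a boundary query (MIN and MAX over the split column, by default the table's primary key) and divides that range evenly among the map tasks. A minimal Python sketch of that arithmetic, with hypothetical uid boundaries and mapper count; this is an illustration, not Sqoop's actual implementation:

```python
# Sketch of how Sqoop1 partitions a numeric split column among -m mappers.
# The boundaries (1, 100) stand in for the result of Sqoop's boundary query
# (SELECT MIN(uid), MAX(uid) FROM user); the values are hypothetical.

def split_ranges(lo, hi, num_mappers):
    """Return half-open (start, end) ranges that together cover [lo, hi]."""
    step = (hi - lo + 1) / num_mappers
    ranges = []
    for i in range(num_mappers):
        start = lo + round(i * step)
        end = lo + round((i + 1) * step)
        ranges.append((start, end))
    return ranges

# With -m 4 and uid spanning 1..100, each mapper gets roughly a quarter:
for start, end in split_ranges(1, 100, 4):
    print(f"WHERE uid >= {start} AND uid < {end}")
```

With -m 1, as in the command above, there is a single range and no parallelism, which avoids needing a split column at all.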
2. Import MySQL data into Hive with Sqoop
# sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1
3. Import MySQL data into Hive, specifying the Hive table name
# sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --hive-table user1
If the table does not exist in Hive, it is created to hold the imported data.
4. Import MySQL data into Hive with a WHERE condition
# sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --hive-table user2 --where 'uid=10'
5. Import MySQL data into Hive using a free-form query
# sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root -m 1 --hive-table user6 --query 'select * from user where uid<10 and $CONDITIONS' --target-dir /sqoop/user5
$CONDITIONS must appear in the --query statement; omitting it causes an error.
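The reason $CONDITIONS is mandatory: Sqoop rewrites the free-form query once per mapper, substituting each mapper's slice of the split range for the placeholder (and a never-true predicate when it only needs the result metadata). A rough Python sketch of that substitution; the predicates shown are illustrative, not Sqoop's exact output:

```python
# Sketch: Sqoop replaces the $CONDITIONS token in a --query import with
# a per-mapper predicate. The example predicates are illustrative.

query = "select * from user where uid < 10 and $CONDITIONS"

def mapper_query(template, predicate):
    if "$CONDITIONS" not in template:
        raise ValueError("--query must contain $CONDITIONS")
    return template.replace("$CONDITIONS", predicate)

print(mapper_query(query, "( 1 = 0 )"))                     # metadata-only probe
print(mapper_query(query, "( uid >= 0 ) AND ( uid < 5 )"))  # one mapper's slice
```

Note also that the command quotes the query with single quotes so the shell does not try to expand $CONDITIONS as a variable.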
6. Export Hive data to MySQL with Sqoop
# sqoop export --connect jdbc:mysql://localhost:3306/test --username root --password root -m 1 --table user5 --export-dir /sqoop/user5
The two tables must have the same number of columns with matching types.
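sqoop export works roughly in reverse of an import: each mapper parses the delimited records under --export-dir and binds the fields into batched INSERT statements against the target table, which is why the column counts and types must line up. A simplified sketch of that per-record step; real Sqoop additionally handles custom delimiters, null markers, escaping, and batching:

```python
# Sketch of what sqoop export does with each HDFS record: split the
# delimited line and bind the fields to an INSERT for the target table.
# Table and column names are taken from the example above.

def line_to_insert(table, columns, line, sep=","):
    values = line.rstrip("\n").split(sep)
    if len(values) != len(columns):
        raise ValueError("column count mismatch between HDFS record and table")
    placeholders = ", ".join(["?"] * len(columns))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, values

sql, vals = line_to_insert("user5", ["uid", "uname"], "10,alice")
print(sql)   # INSERT INTO user5 (uid, uname) VALUES (?, ?)
print(vals)  # ['10', 'alice']
```

A record with the wrong number of fields fails the whole map task in real Sqoop; here it simply raises.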
sqoop2
sqoop-shell
Start the sqoop2 shell
jjzhu:bin didi$ sqoop2-shell
Setting conf dir: /opt/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/sqoop-1.99.7
Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000> set server --host localhost --port 12000 --webapp sqoop
Server is set successfully
sqoop:000> show version --all
client version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
server version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
  [v1]
sqoop:000>
Configure the sqoop server
sqoop:000> set server --host localhost --port 12000 --webapp sqoop
Server is set successfully
Verify that the server connection works
sqoop:000> show version --all
client version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
server version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
  [v1]
sqoop:000>
Create links
List the connectors available on the sqoop server
sqoop:000> show connector
+------------------------+---------+------------------------------------------------------------+----------------------+
| Name                   | Version | Class                                                      | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+
sqoop:000>
- generic-jdbc-connector: a connector built on Java JDBC; it can act as either the source or the target of a transfer
- hdfs-connector: a connector that uses HDFS as the source or the target
Create a generic-jdbc-connector link with the following command:
sqoop:002> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysql_weibouser_link

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://127.0.0.1:3306/spider
Username: root
Password: ****
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry#

SQL Dialect

Identifier enclose:     (Note: do not just press Enter here; type a single space! If you don't, queries against the MySQL table get the table name wrapped in double quotes and fail.)
New link was successfully created with validation status OK and name mysql_weibouser_link
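The "Identifier enclose" pitfall deserves a closer look: the character entered at that prompt is what the generic-jdbc-connector wraps around schema and table names in the SQL it generates. MySQL, unless the ANSI_QUOTES mode is enabled, only accepts backticks around identifiers, so a double quote there produces invalid statements; a single space effectively disables enclosing. A small sketch of this quoting model (an assumption about the connector's behavior, simplified):

```python
# Sketch of identifier enclosing as done by the generic-jdbc-connector.
# A space as the enclose character effectively means "no enclosing",
# which is why the walkthrough above types a space at the prompt.

def enclose_identifier(identifier, enclose_char):
    enclose_char = enclose_char.strip()  # a blank answer disables enclosing
    return f"{enclose_char}{identifier}{enclose_char}"

dq = '"'
# MySQL without ANSI_QUOTES rejects "spiders_weibouser" as an identifier:
print("SELECT * FROM " + enclose_identifier("spiders_weibouser", dq))
# With a space, the bare table name is emitted and the query succeeds:
print("SELECT * FROM " + enclose_identifier("spiders_weibouser", " "))
```

Backticks would also be a valid choice for MySQL specifically.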
Create the HDFS link
sqoop:002> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs_weibouser_link

HDFS cluster

URI: hdfs://localhost:9000
Conf directory: /opt/hadoop-2.7.3/etc/hadoop
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name hdfs_weibouser_link
List the links
sqoop:002> show link
+----------------------+------------------------+---------+
| Name                 | Connector Name         | Enabled |
+----------------------+------------------------+---------+
| mysql_weibouser      | generic-jdbc-connector | true    |
| mysql_weibouser_link | generic-jdbc-connector | true    |
| hdfs_link            | hdfs-connector         | true    |
| hdfs_link2           | hdfs-connector         | true    |
| hdfs_weibouser_link  | hdfs-connector         | true    |
+----------------------+------------------------+---------+
Create a job
sqoop:002> create job -f "mysql_weibouser_link" -t "hdfs_weibouser_link"
Creating job for links with from name mysql_weibouser_link and to name hdfs_weibouser_link
Please fill following values to create new job object
Name: job_weibouser

Database source

Schema name: spider
Table name: spiders_weibouser
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:

Incremental read

Check column:
Last value:

Target configuration

Override null value:
Null value:
File format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom codec:
Output directory: hdfs://localhost:9000/usr/jjzhu/spider/spiders_weibouser
Append mode:

Throttling resources

Extractors: 2
Loaders: 2

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name job_weibouser
What each parameter means:
Name: an identifier; pick anything you like.
Schema Name: the Database or Schema name. In MySQL, Schema is much the same as Database; I have not dug into the exact difference, but the official docs describe them almost identically at creation time.
Table Name: the table to transfer.
SQL Statement: a SQL query. The docs say a $condition must be specified (apparently a condition clause), but I never managed to create a job with one.
After these items the element# prompt appears, asking for some map entries; just press Enter past it, and likewise for:
Partition column:
Partition column nullable:
Boundary query:
Last value:
Next come the settings for the data destination:
Null value: apparently the string used in place of null values.
File format: the format of the data files written to HDFS; TEXT_FILE, the simplest plain-text format, is used here.
Compression codec: the algorithm used to compress the exported data files; NONE here. CUSTOM lets you plug in your own codec by implementing the corresponding Java interface.
Custom codec: the custom codec itself; NONE was chosen above, so press Enter past it.
Output directory: the destination path in HDFS. It is best to give a path that does not yet exist, or exists but is empty; apparently only then does the job succeed.
Append mode: whether new data is appended when exported files already exist.
Extractors: 2
Loaders: 2
Finally, the element# prompt appears once more for the extra mapper jars property; you can leave it blank and just press Enter.

If "successfully created" is printed at the end, the job was created.
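The "Incremental read" pair (Check column / Last value), left blank in the job above, can be sketched as a high-water-mark filter: only rows whose check column exceeds the stored last value are transferred, and the mark advances after the run. A minimal Python illustration with hypothetical rows, not Sqoop internals:

```python
# Sketch of incremental reads: pull only rows past the stored high-water
# mark on the check column, then advance the mark. Rows are hypothetical.

def incremental_rows(rows, check_column, last_value):
    new_rows = [r for r in rows if r[check_column] > last_value]
    new_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_last

rows = [{"uid": 1}, {"uid": 7}, {"uid": 12}]
new_rows, new_last = incremental_rows(rows, "uid", 5)
print(len(new_rows), new_last)  # 2 12
```

On the next run, passing new_last (12) as the last value would transfer nothing until rows with a larger uid appear.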
View the created jobs
sqoop:002> show job
+----+---------------+-----------------------------------------------+--------------------------------------+---------+
| Id | Name          | From Connector                                | To Connector                         | Enabled |
+----+---------------+-----------------------------------------------+--------------------------------------+---------+
| 1  | spider_job    | mysql_weibouser (generic-jdbc-connector)      | hdfs_link (hdfs-connector)           | true    |
| 2  | job_weibouser | mysql_weibouser_link (generic-jdbc-connector) | hdfs_weibouser_link (hdfs-connector) | true    |
+----+---------------+-----------------------------------------------+--------------------------------------+---------+
sqoop:002>
Start the job
sqoop:002> start job -n job_weibouser
Submission details
Job Name: job_weibouser
Server URL: http://localhost:12000/sqoop/
Created by: didi
Creation date: 2017-04-11 14:37:46 CST
Lastly updated by: didi
External ID: job_1491888730134_0003
  http://jjzhu:8088/proxy/application_1491888730134_0003/
2017-04-11 14:37:46 CST: BOOTING - Progress is not available
Check the job status
sqoop:002> status job -n job_weibouser
Submission details
Job Name: job_weibouser
Server URL: http://localhost:12000/sqoop/
Created by: didi
Creation date: 2017-04-11 14:37:46 CST
Lastly updated by: didi
External ID: job_1491888730134_0003
  http://jjzhu:8088/proxy/application_1491888730134_0003/
2017-04-11 14:38:41 CST: SUCCEEDED
Counters:
  org.apache.hadoop.mapreduce.FileSystemCounter
    FILE_LARGE_READ_OPS: 0
    FILE_WRITE_OPS: 0
    HDFS_READ_OPS: 2
    HDFS_BYTES_READ: 290
    HDFS_LARGE_READ_OPS: 0
    FILE_READ_OPS: 0
    FILE_BYTES_WRITTEN: 51361466
    FILE_BYTES_READ: 25115854
    HDFS_WRITE_OPS: 2
    HDFS_BYTES_WRITTEN: 24652721
  org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
    BYTES_WRITTEN: 0
  org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
    BYTES_READ: 0
  org.apache.hadoop.mapreduce.JobCounter
    TOTAL_LAUNCHED_MAPS: 2
    VCORES_MILLIS_REDUCES: 20225
    MB_MILLIS_MAPS: 27120640
    TOTAL_LAUNCHED_REDUCES: 2
    SLOTS_MILLIS_REDUCES: 20225
    VCORES_MILLIS_MAPS: 26485
    MB_MILLIS_REDUCES: 20710400
    SLOTS_MILLIS_MAPS: 26485
    MILLIS_REDUCES: 20225
    OTHER_LOCAL_MAPS: 2
    MILLIS_MAPS: 26485
  org.apache.sqoop.submission.counter.SqoopCounters
    ROWS_READ: 109408
    ROWS_WRITTEN: 109408
  org.apache.hadoop.mapreduce.TaskCounter
    MAP_OUTPUT_MATERIALIZED_BYTES: 25115866
    REDUCE_INPUT_RECORDS: 109408
    SPILLED_RECORDS: 218816
    MERGED_MAP_OUTPUTS: 4
    VIRTUAL_MEMORY_BYTES: 0
    MAP_INPUT_RECORDS: 0
    SPLIT_RAW_BYTES: 290
    FAILED_SHUFFLE: 0
    MAP_OUTPUT_BYTES: 24762129
    REDUCE_SHUFFLE_BYTES: 25115866
    PHYSICAL_MEMORY_BYTES: 0
    GC_TIME_MILLIS: 1648
    REDUCE_INPUT_GROUPS: 109408
    COMBINE_OUTPUT_RECORDS: 0
    SHUFFLED_MAPS: 4
    REDUCE_OUTPUT_RECORDS: 109408
    MAP_OUTPUT_RECORDS: 109408
    COMBINE_INPUT_RECORDS: 0
    CPU_MILLISECONDS: 0
    COMMITTED_HEAP_BYTES: 1951399936
  Shuffle Errors
    CONNECTION: 0
    WRONG_LENGTH: 0
    BAD_ID: 0
    WRONG_MAP: 0
    WRONG_REDUCE: 0
    IO_ERROR: 0
Job executed successfully
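One consistency check worth automating on output like the above: a clean run should report ROWS_READ equal to ROWS_WRITTEN in the SqoopCounters group. A small parsing sketch over a fragment of the status output (the fragment is copied from the run above; the parsing is a simplification):

```python
# Parse "NAME: value" counter lines from a status dump and compare the
# Sqoop row counters.

status_fragment = """
ROWS_READ: 109408
ROWS_WRITTEN: 109408
"""

def parse_counters(text):
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters

counters = parse_counters(status_fragment)
print(counters["ROWS_READ"] == counters["ROWS_WRITTEN"])  # True
```

A mismatch between the two counters would suggest dropped or malformed records worth investigating before trusting the output files.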
Check the relevant HDFS path to confirm the output files exist
jjzhu:~ didi$ hdfs dfs -ls /usr/jjzhu/spider
Found 4 items
drwxr-xr-x   - didi supergroup          0 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser
drwxr-xr-x   - 777  supergroup          0 2017-04-11 10:58 /usr/jjzhu/spider/weibouser
drwxr-xr-x   - 777  supergroup          0 2017-04-11 13:33 /usr/jjzhu/spider/weobouser
drwxr-xr-x   - didi supergroup          0 2017-04-11 13:39 /usr/jjzhu/spider/weobouser2
jjzhu:~ didi$ hdfs dfs -ls /usr/jjzhu/spider/spiders_weibouser
Found 2 items
-rw-r--r--   1 didi supergroup   12262783 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser/33b56441-b638-48cc-8d0d-37a808f25653.txt
-rw-r--r--   1 didi supergroup   12389938 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser/73b20d50-de72-4aea-8c8c-d97cdc48e667.txt
Source: https://yq.aliyun.com/articles/73582