Sqoop is an open-source tool for transferring data between Hadoop and traditional relational databases. The Sqoop user guide describes it as follows:
Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.
Here I will mainly walk through the installation process.
1. Download the required software
The Hadoop version I used was the official Apache release 0.20.2, but Sqoop later threw errors against it; after reading a few articles I found that Sqoop does not support this version, and CDH3 is generally recommended instead. That said, after copying the required jar into sqoop-1.2.0-CDH3B4/lib, the Apache release still worked. Of course, you can also choose to use CDH3 directly.
Download links for CDH3 and Sqoop 1.2.0:
http://archive.cloudera.com/cdh/3/hadoop-0.20.2-CDH3B4.tar.gz
http://archive.cloudera.com/cdh/3/sqoop-1.2.0-CDH3B4.tar.gz
sqoop-1.2.0-CDH3B4 depends on hadoop-core-0.20.2-CDH3B4.jar, so you need to download hadoop-0.20.2-CDH3B4.tar.gz, extract it, and copy hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar into sqoop-1.2.0-CDH3B4/lib.
In addition, importing MySQL data depends on mysql-connector-java-*.jar at runtime, so download that jar as well and copy it into sqoop-1.2.0-CDH3B4/lib.
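The two copy steps above can be sketched as shell commands. The directory names mirror the article, but here everything is created under a temp directory purely for illustration (and the connector version is an assumption), so the sketch is safe to run anywhere:

```shell
# Sketch of step 1's jar copies; adjust paths to wherever you unpacked the tarballs.
base=$(mktemp -d)
mkdir -p "$base/hadoop-0.20.2-CDH3B4" "$base/sqoop-1.2.0-CDH3B4/lib"
# Stand-ins for the real downloaded jars:
touch "$base/hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar"
touch "$base/mysql-connector-java-5.1.17.jar"   # connector version is an assumption

# Copy the Hadoop core jar and the MySQL connector into Sqoop's lib dir
cp "$base/hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar" \
   "$base/sqoop-1.2.0-CDH3B4/lib/"
cp "$base"/mysql-connector-java-*.jar "$base/sqoop-1.2.0-CDH3B4/lib/"

copied=$(ls "$base/sqoop-1.2.0-CDH3B4/lib" | wc -l)
echo "jars in sqoop lib: $copied"
rm -rf "$base"
```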
2. Edit Sqoop's configure-sqoop script and comment out the HBase and ZooKeeper checks (unless you plan to use HBase or other components on top of Hadoop):
#if [ ! -d "${HBASE_HOME}" ]; then
#  echo "Error: $HBASE_HOME does not exist!"
#  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#  exit 1
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Error: $ZOOKEEPER_HOME does not exist!"
#  echo 'Please set $ZOOKEEPER_HOME to the root of your ZooKeeper installation.'
#  exit 1
#fi
3. Start Hadoop and set the relevant environment variables (e.g. $HADOOP_HOME); Sqoop is then ready to use.
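A minimal environment setup might look like the following; the paths are assumptions based on the directories that appear later in this article, so substitute your own install locations:

```shell
# Assumed install locations; adjust to your own layout.
export HADOOP_HOME=/home/wanghai01/hadoop/hadoop-0.20.2
export SQOOP_HOME=/home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4
# Put both tools on the PATH so `sqoop` and `hadoop` resolve directly
export PATH="$SQOOP_HOME/bin:$HADOOP_HOME/bin:$PATH"
echo "HADOOP_HOME=$HADOOP_HOME"
```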
Below is an example that moves a table's data from the database into a file on HDFS (a Sqoop import):
[wanghai01@tc-crm-rd01.tc sqoop-1.2.0-CDH3B4]$ bin/sqoop import --connect jdbc:mysql://XXXX:XX/crm --username crm --password 123456 --table company -m 1
11/09/21 15:45:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
11/09/21 15:45:26 INFO tool.CodeGenTool: Beginning code generation
11/09/21 15:45:26 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `company` AS t LIMIT 1
11/09/21 15:45:26 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `company` AS t LIMIT 1
11/09/21 15:45:26 INFO orm.CompilationManager: HADOOP_HOME is /home/wanghai01/hadoop/hadoop-0.20.2/bin/..
11/09/21 15:45:26 INFO orm.CompilationManager: Found hadoop core jar at: /home/wanghai01/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
11/09/21 15:45:26 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-wanghai01/compile/2bd70cf2b712a9c7cdb0860722ea7c18/company.java to /home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/./company.java
11/09/21 15:45:26 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wanghai01/compile/2bd70cf2b712a9c7cdb0860722ea7c18/company.jar
11/09/21 15:45:26 WARN manager.MySQLManager: It looks like you are importing from mysql.
11/09/21 15:45:26 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
11/09/21 15:45:26 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
11/09/21 15:45:26 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
11/09/21 15:45:26 INFO mapreduce.ImportJobBase: Beginning import of company
11/09/21 15:45:27 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `company` AS t LIMIT 1
11/09/21 15:45:28 INFO mapred.JobClient: Running job: job_201109211521_0001
11/09/21 15:45:29 INFO mapred.JobClient: map 0% reduce 0%
11/09/21 15:45:40 INFO mapred.JobClient: map 100% reduce 0%
11/09/21 15:45:42 INFO mapred.JobClient: Job complete: job_201109211521_0001
11/09/21 15:45:42 INFO mapred.JobClient: Counters: 5
11/09/21 15:45:42 INFO mapred.JobClient: Job Counters
11/09/21 15:45:42 INFO mapred.JobClient: Launched map tasks=1
11/09/21 15:45:42 INFO mapred.JobClient: FileSystemCounters
11/09/21 15:45:42 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=44
11/09/21 15:45:42 INFO mapred.JobClient: Map-Reduce Framework
11/09/21 15:45:42 INFO mapred.JobClient: Map input records=8
11/09/21 15:45:42 INFO mapred.JobClient: Spilled Records=0
11/09/21 15:45:42 INFO mapred.JobClient: Map output records=8
11/09/21 15:45:42 INFO mapreduce.ImportJobBase: Transferred 44 bytes in 15.0061 seconds (2.9321 bytes/sec)
11/09/21 15:45:42 INFO mapreduce.ImportJobBase: Retrieved 8 records.
Check the data:
[wanghai01@tc-crm-rd01.tc sqoop-1.2.0-CDH3B4]$ hadoop fs -cat /user/wanghai01/company/part-m-00000
1,xx
2,eee
1,xx
2,eee
1,xx
2,eee
1,xx
2,eee
Query the database to verify:
mysql> select * from company;
+------+------+
| id | name |
+------+------+
| 1 | xx |
| 2 | eee |
| 1 | xx |
| 2 | eee |
| 1 | xx |
| 2 | eee |
| 1 | xx |
| 2 | eee |
+------+------+
8 rows in set (0.00 sec)
OK, everything matches. If you look closely at the output of the command, you will notice an ERROR line; it appears because an earlier run of this command failed and its temporary data was not cleaned up before re-running.
This article originally appeared on Linux公社 (www.linuxidc.com). Original link: http://www.linuxidc.com/Linux/2011-10/45081.htm
===================================================
Sqoop is software open-sourced by Cloudera for moving data between HDFS and databases. Internally it connects Hadoop to the database over JDBC, so in principle Sqoop can work with any JDBC-compliant database. Besides importing data into HDFS as files, Sqoop can also load data directly into HBase or Hive.
Below are some performance test figures, for reference only:
Table: tb_keywords
Rows: 11,628,209
Data file size: 1.4 GB
+----------------------+------------+------------+
| Method               | HDFS -> DB | DB -> HDFS |
+----------------------+------------+------------+
| Sqoop                | 428 s      | 166 s      |
| HDFS <-> FILE <-> DB | 209 s      | 105 s      |
+----------------------+------------+------------+
The results show that using a flat file as the intermediate step outperforms Sqoop, for two reasons:
1. Sqoop is ultimately JDBC-based, so it cannot be more efficient than MySQL's own import/export tools.
2. Taking an import into the DB as an example, Sqoop commits in stages: if a table has 1K rows, it reads 100 rows (the default), inserts and commits them, then reads the next 100, and so on.
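That staged-commit loop can be sketched in the shell. The row count and file names below are invented, and each chunk stands in for one INSERT-plus-COMMIT round trip to the database; the batch size of 100 mirrors the default mentioned above:

```shell
# Sketch: Sqoop-style staged commits, simulated with a flat file of fake rows.
tmp=$(mktemp -d)
seq 1 250 > "$tmp/rows.txt"          # 250 hypothetical rows
split -l 100 "$tmp/rows.txt" "$tmp/batch_"   # chunks of 100, like Sqoop's default
batches=0
for b in "$tmp"/batch_*; do
  # In Sqoop each chunk would be inserted and then committed before
  # reading the next one.
  batches=$((batches + 1))
done
echo "committed $batches batches"
rm -rf "$tmp"
```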
Even so, Sqoop has its advantages, such as ease of use and job-level fault tolerance. It is worth considering as a tool in test environments where it fits.
Below are some session logs.
[wanghai01@tc-crm-rd01.tc.baidu.com bin]$ sh export.sh
Fri Sep 23 20:15:47 CST 2011
11/09/23 20:15:48 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
11/09/23 20:15:48 INFO tool.CodeGenTool: Beginning code generation
11/09/23 20:15:48 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:15:48 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:15:48 INFO orm.CompilationManager: HADOOP_HOME is /home/wanghai01/hadoop/hadoop-0.20.2/bin/..
11/09/23 20:15:48 INFO orm.CompilationManager: Found hadoop core jar at: /home/wanghai01/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
11/09/23 20:15:49 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-wanghai01/compile/eb16aae87a119b93acb3bc6ea74b5e97/tb_keyword_data_201104.java to /home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/./tb_keyword_data_201104.java
11/09/23 20:15:49 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wanghai01/compile/eb16aae87a119b93acb3bc6ea74b5e97/tb_keyword_data_201104.jar
11/09/23 20:15:49 INFO mapreduce.ExportJobBase: Beginning export of tb_keyword_data_201104
11/09/23 20:15:49 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:15:49 INFO input.FileInputFormat: Total input paths to process : 1
11/09/23 20:15:49 INFO input.FileInputFormat: Total input paths to process : 1
11/09/23 20:15:49 INFO mapred.JobClient: Running job: job_201109211521_0012
11/09/23 20:15:50 INFO mapred.JobClient: map 0% reduce 0%
11/09/23 20:16:04 INFO mapred.JobClient: map 1% reduce 0%
11/09/23 20:16:10 INFO mapred.JobClient: map 2% reduce 0%
11/09/23 20:16:13 INFO mapred.JobClient: map 3% reduce 0%
11/09/23 20:16:19 INFO mapred.JobClient: map 4% reduce 0%
11/09/23 20:16:22 INFO mapred.JobClient: map 5% reduce 0%
11/09/23 20:16:25 INFO mapred.JobClient: map 6% reduce 0%
11/09/23 20:16:31 INFO mapred.JobClient: map 7% reduce 0%
11/09/23 20:16:34 INFO mapred.JobClient: map 8% reduce 0%
11/09/23 20:16:41 INFO mapred.JobClient: map 9% reduce 0%
11/09/23 20:16:44 INFO mapred.JobClient: map 10% reduce 0%
11/09/23 20:16:50 INFO mapred.JobClient: map 11% reduce 0%
11/09/23 20:16:53 INFO mapred.JobClient: map 12% reduce 0%
11/09/23 20:16:56 INFO mapred.JobClient: map 13% reduce 0%
11/09/23 20:17:02 INFO mapred.JobClient: map 14% reduce 0%
11/09/23 20:17:05 INFO mapred.JobClient: map 15% reduce 0%
11/09/23 20:17:11 INFO mapred.JobClient: map 16% reduce 0%
11/09/23 20:17:14 INFO mapred.JobClient: map 17% reduce 0%
11/09/23 20:17:17 INFO mapred.JobClient: map 18% reduce 0%
11/09/23 20:17:23 INFO mapred.JobClient: map 19% reduce 0%
11/09/23 20:17:25 INFO mapred.JobClient: map 20% reduce 0%
11/09/23 20:17:28 INFO mapred.JobClient: map 21% reduce 0%
11/09/23 20:17:34 INFO mapred.JobClient: map 22% reduce 0%
11/09/23 20:17:37 INFO mapred.JobClient: map 23% reduce 0%
11/09/23 20:17:43 INFO mapred.JobClient: map 24% reduce 0%
11/09/23 20:17:46 INFO mapred.JobClient: map 25% reduce 0%
11/09/23 20:17:49 INFO mapred.JobClient: map 26% reduce 0%
11/09/23 20:17:55 INFO mapred.JobClient: map 27% reduce 0%
11/09/23 20:17:58 INFO mapred.JobClient: map 28% reduce 0%
11/09/23 20:18:04 INFO mapred.JobClient: map 29% reduce 0%
11/09/23 20:18:07 INFO mapred.JobClient: map 30% reduce 0%
11/09/23 20:18:10 INFO mapred.JobClient: map 31% reduce 0%
11/09/23 20:18:16 INFO mapred.JobClient: map 32% reduce 0%
11/09/23 20:18:19 INFO mapred.JobClient: map 33% reduce 0%
11/09/23 20:18:25 INFO mapred.JobClient: map 34% reduce 0%
11/09/23 20:18:28 INFO mapred.JobClient: map 35% reduce 0%
11/09/23 20:18:31 INFO mapred.JobClient: map 36% reduce 0%
11/09/23 20:18:37 INFO mapred.JobClient: map 37% reduce 0%
11/09/23 20:18:40 INFO mapred.JobClient: map 38% reduce 0%
11/09/23 20:18:46 INFO mapred.JobClient: map 39% reduce 0%
11/09/23 20:18:49 INFO mapred.JobClient: map 40% reduce 0%
11/09/23 20:18:52 INFO mapred.JobClient: map 41% reduce 0%
11/09/23 20:18:58 INFO mapred.JobClient: map 42% reduce 0%
11/09/23 20:19:01 INFO mapred.JobClient: map 43% reduce 0%
11/09/23 20:19:04 INFO mapred.JobClient: map 44% reduce 0%
11/09/23 20:19:10 INFO mapred.JobClient: map 45% reduce 0%
11/09/23 20:19:13 INFO mapred.JobClient: map 46% reduce 0%
11/09/23 20:19:19 INFO mapred.JobClient: map 47% reduce 0%
11/09/23 20:19:22 INFO mapred.JobClient: map 48% reduce 0%
11/09/23 20:19:25 INFO mapred.JobClient: map 49% reduce 0%
11/09/23 20:19:34 INFO mapred.JobClient: map 50% reduce 0%
11/09/23 20:19:37 INFO mapred.JobClient: map 52% reduce 0%
11/09/23 20:19:40 INFO mapred.JobClient: map 53% reduce 0%
11/09/23 20:19:43 INFO mapred.JobClient: map 54% reduce 0%
11/09/23 20:19:46 INFO mapred.JobClient: map 55% reduce 0%
11/09/23 20:19:49 INFO mapred.JobClient: map 56% reduce 0%
11/09/23 20:19:52 INFO mapred.JobClient: map 57% reduce 0%
11/09/23 20:19:55 INFO mapred.JobClient: map 58% reduce 0%
11/09/23 20:19:58 INFO mapred.JobClient: map 59% reduce 0%
11/09/23 20:20:01 INFO mapred.JobClient: map 60% reduce 0%
11/09/23 20:20:04 INFO mapred.JobClient: map 62% reduce 0%
11/09/23 20:20:07 INFO mapred.JobClient: map 63% reduce 0%
11/09/23 20:20:10 INFO mapred.JobClient: map 64% reduce 0%
11/09/23 20:20:13 INFO mapred.JobClient: map 65% reduce 0%
11/09/23 20:20:16 INFO mapred.JobClient: map 66% reduce 0%
11/09/23 20:20:19 INFO mapred.JobClient: map 67% reduce 0%
11/09/23 20:20:22 INFO mapred.JobClient: map 68% reduce 0%
11/09/23 20:20:25 INFO mapred.JobClient: map 69% reduce 0%
11/09/23 20:20:28 INFO mapred.JobClient: map 70% reduce 0%
11/09/23 20:20:31 INFO mapred.JobClient: map 72% reduce 0%
11/09/23 20:20:34 INFO mapred.JobClient: map 73% reduce 0%
11/09/23 20:20:37 INFO mapred.JobClient: map 74% reduce 0%
11/09/23 20:20:40 INFO mapred.JobClient: map 75% reduce 0%
11/09/23 20:20:43 INFO mapred.JobClient: map 76% reduce 0%
11/09/23 20:20:46 INFO mapred.JobClient: map 77% reduce 0%
11/09/23 20:20:49 INFO mapred.JobClient: map 78% reduce 0%
11/09/23 20:20:52 INFO mapred.JobClient: map 80% reduce 0%
11/09/23 20:20:55 INFO mapred.JobClient: map 81% reduce 0%
11/09/23 20:20:58 INFO mapred.JobClient: map 82% reduce 0%
11/09/23 20:21:01 INFO mapred.JobClient: map 83% reduce 0%
11/09/23 20:21:04 INFO mapred.JobClient: map 84% reduce 0%
11/09/23 20:21:07 INFO mapred.JobClient: map 85% reduce 0%
11/09/23 20:21:10 INFO mapred.JobClient: map 86% reduce 0%
11/09/23 20:21:13 INFO mapred.JobClient: map 87% reduce 0%
11/09/23 20:21:22 INFO mapred.JobClient: map 88% reduce 0%
11/09/23 20:21:28 INFO mapred.JobClient: map 89% reduce 0%
11/09/23 20:21:37 INFO mapred.JobClient: map 90% reduce 0%
11/09/23 20:21:47 INFO mapred.JobClient: map 91% reduce 0%
11/09/23 20:21:53 INFO mapred.JobClient: map 92% reduce 0%
11/09/23 20:22:02 INFO mapred.JobClient: map 93% reduce 0%
11/09/23 20:22:11 INFO mapred.JobClient: map 94% reduce 0%
11/09/23 20:22:17 INFO mapred.JobClient: map 95% reduce 0%
11/09/23 20:22:26 INFO mapred.JobClient: map 96% reduce 0%
11/09/23 20:22:32 INFO mapred.JobClient: map 97% reduce 0%
11/09/23 20:22:41 INFO mapred.JobClient: map 98% reduce 0%
11/09/23 20:22:47 INFO mapred.JobClient: map 99% reduce 0%
11/09/23 20:22:53 INFO mapred.JobClient: map 100% reduce 0%
11/09/23 20:22:55 INFO mapred.JobClient: Job complete: job_201109211521_0012
11/09/23 20:22:55 INFO mapred.JobClient: Counters: 6
11/09/23 20:22:55 INFO mapred.JobClient: Job Counters
11/09/23 20:22:55 INFO mapred.JobClient: Launched map tasks=4
11/09/23 20:22:55 INFO mapred.JobClient: Data-local map tasks=4
11/09/23 20:22:55 INFO mapred.JobClient: FileSystemCounters
11/09/23 20:22:55 INFO mapred.JobClient: HDFS_BYTES_READ=1392402240
11/09/23 20:22:55 INFO mapred.JobClient: Map-Reduce Framework
11/09/23 20:22:55 INFO mapred.JobClient: Map input records=11628209
11/09/23 20:22:55 INFO mapred.JobClient: Spilled Records=0
11/09/23 20:22:55 INFO mapred.JobClient: Map output records=11628209
11/09/23 20:22:55 INFO mapreduce.ExportJobBase: Transferred 1.2968 GB in 425.642 seconds (3.1198 MB/sec)
11/09/23 20:22:55 INFO mapreduce.ExportJobBase: Exported 11628209 records.
Fri Sep 23 20:22:55 CST 2011
###############
[wanghai01@tc-crm-rd01.tc.baidu.com bin]$ sh import.sh
Fri Sep 23 20:40:33 CST 2011
11/09/23 20:40:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
11/09/23 20:40:33 INFO tool.CodeGenTool: Beginning code generation
11/09/23 20:40:33 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:40:33 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:40:33 INFO orm.CompilationManager: HADOOP_HOME is /home/wanghai01/hadoop/hadoop-0.20.2/bin/..
11/09/23 20:40:33 INFO orm.CompilationManager: Found hadoop core jar at: /home/wanghai01/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
11/09/23 20:40:34 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-wanghai01/compile/a913cede5621df95376a26c1af737ee2/tb_keyword_data_201104.java to /home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/./tb_keyword_data_201104.java
11/09/23 20:40:34 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wanghai01/compile/a913cede5621df95376a26c1af737ee2/tb_keyword_data_201104.jar
11/09/23 20:40:34 WARN manager.MySQLManager: It looks like you are importing from mysql.
11/09/23 20:40:34 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
11/09/23 20:40:34 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
11/09/23 20:40:34 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
11/09/23 20:40:34 INFO mapreduce.ImportJobBase: Beginning import of tb_keyword_data_201104
11/09/23 20:40:34 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:40:40 INFO mapred.JobClient: Running job: job_201109211521_0014
11/09/23 20:40:41 INFO mapred.JobClient: map 0% reduce 0%
11/09/23 20:40:54 INFO mapred.JobClient: map 25% reduce 0%
11/09/23 20:40:57 INFO mapred.JobClient: map 50% reduce 0%
11/09/23 20:41:36 INFO mapred.JobClient: map 75% reduce 0%
11/09/23 20:42:00 INFO mapred.JobClient: map 100% reduce 0%
11/09/23 20:43:19 INFO mapred.JobClient: Job complete: job_201109211521_0014
11/09/23 20:43:19 INFO mapred.JobClient: Counters: 5
11/09/23 20:43:19 INFO mapred.JobClient: Job Counters
11/09/23 20:43:19 INFO mapred.JobClient: Launched map tasks=4
11/09/23 20:43:19 INFO mapred.JobClient: FileSystemCounters
11/09/23 20:43:19 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1601269219
11/09/23 20:43:19 INFO mapred.JobClient: Map-Reduce Framework
11/09/23 20:43:19 INFO mapred.JobClient: Map input records=11628209
11/09/23 20:43:19 INFO mapred.JobClient: Spilled Records=0
11/09/23 20:43:19 INFO mapred.JobClient: Map output records=11628209
11/09/23 20:43:19 INFO mapreduce.ImportJobBase: Transferred 1.4913 GB in 165.0126 seconds (9.2544 MB/sec)
11/09/23 20:43:19 INFO mapreduce.ImportJobBase: Retrieved 11628209 records.
Fri Sep 23 20:43:19 CST 2011
The main commands in import.sh and export.sh are as follows:
/home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/sqoop import --connect jdbc:mysql://XXXX/crm --username XX --password XX --table tb_keyword_data_201104 --split-by winfo_id --target-dir /user/wanghai01/data/ --fields-terminated-by '\t' --lines-terminated-by '\n' --input-null-string '' --input-null-non-string ''
/home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/sqoop export --connect jdbc:mysql://XXXX/crm --username XX --password XX --table tb_keyword_data_201104 --export-dir /user/wanghai01/data/ --fields-terminated-by '\t' --lines-terminated-by '\n' --input-null-string '' --input-null-non-string ''
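The --fields-terminated-by and --lines-terminated-by options in those commands control the flat-file layout on HDFS. A quick way to see what such a record looks like; the column values below are invented, not data from the article:

```shell
# Three hypothetical tab-separated columns per row, newline-terminated,
# matching --fields-terminated-by '\t' --lines-terminated-by '\n'.
sample=$(mktemp)
printf '1001\tkeyword_a\t42\n1002\tkeyword_b\t17\n' > "$sample"
# Count the fields in the first record using tab as the separator
fields=$(awk -F'\t' 'NR==1{print NF}' "$sample")
echo "fields per record: $fields"
rm -f "$sample"
```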
This article originally appeared on Linux公社 (www.linuxidc.com). Original link: http://www.linuxidc.com/Linux/2011-10/45080.htm
=============================================
Sqoop is a tool for moving data between Hadoop and relational databases in either direction: it can import data from a relational database (e.g. MySQL, Oracle, Postgres) into HDFS, and export data from HDFS back into a relational database.
Sqoop的User Guide地址:http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_introduction
1:tar zxvf sqoop-1.1.0.tar.gz
2: Edit the configuration file /home/hadoopuser/sqoop-1.1.0/conf/sqoop-site.xml.
Usually only the following properties need to be changed:
sqoop.metastore.client.enable.autoconnect
sqoop.metastore.client.autoconnect.url
sqoop.metastore.client.autoconnect.username
sqoop.metastore.client.autoconnect.password
sqoop.metastore.server.location
sqoop.metastore.server.port
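A sketch of what those entries look like inside sqoop-site.xml; the values (URL, port, credentials, path) are placeholders, not settings taken from this article, so substitute your own metastore location:

```xml
<!-- Illustrative values only. -->
<property>
  <name>sqoop.metastore.client.enable.autoconnect</name>
  <value>true</value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>jdbc:hsqldb:hsql://localhost:16000/sqoop</value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.username</name>
  <value>SA</value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.password</name>
  <value></value>
</property>
<property>
  <name>sqoop.metastore.server.location</name>
  <value>/home/hadoopuser/sqoop-1.1.0/metastore.db</value>
</property>
<property>
  <name>sqoop.metastore.server.port</name>
  <value>16000</value>
</property>
```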
3:
bin/sqoop help
bin/sqoop help import
4:
[hadoopuser@master sqoop-1.1.0]$ bin/sqoop import --connect jdbc:mysql://localhost/ppc --table data_ip --username kwps -P
Enter password:
11/02/18 10:51:58 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not find appropriate Hadoop shim for 0.20.2
java.lang.RuntimeException: Could not find appropriate Hadoop shim for 0.20.2
at com.cloudera.sqoop.shims.ShimLoader.loadShim(ShimLoader.java:190)
at com.cloudera.sqoop.shims.ShimLoader.getHadoopShim(ShimLoader.java:109)
at com.cloudera.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:173)
at com.cloudera.sqoop.tool.ImportTool.init(ImportTool.java:81)
at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:411)
at com.cloudera.sqoop.Sqoop.run(Sqoop.java:134)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:170)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:196)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:205)
Solution:
By default, ./hadoop-0.20.2/conf/hadoop-env.sh contains:
# Extra Java runtime options. Empty by default.
# export HADOOP_OPTS=-server
Change it to:
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsqoop.shim.jar.dir=/home/hadoopuser/sqoop-1.1.0/shims"
Note in particular:
Sqoop does not currently run on the Apache release of Hadoop 0.20.2. The only supported platform is CDH3 beta 2, so to use Sqoop you will need to upgrade to CDH3 beta 2.
“Sqoop does not run with Apache Hadoop 0.20.2. The only supported platform is CDH 3 beta 2. It requires features of MapReduce not available in the Apache 0.20.2 release of Hadoop. You should upgrade to CDH 3 beta 2 if you want to run Sqoop 1.0.0.”
Cloudera has marked this issue as a Major bug; hopefully it will be fixed soon.
This article originally appeared on Linux公社 (www.linuxidc.com). Original link: http://www.linuxidc.com/Linux/2011-08/41442.htm