This is one of the HBase examples, but the stock example has a couple of problems and needs some changes.
HBase is essentially a BigTable implementation, and conceptually it really does feel like a spreadsheet (xls). Enough talk; code is king.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.log4j.Logger;

/**
 * Sample Uploader MapReduce
 * <p>
 * This is EXAMPLE code. You will need to change it to work for your context.
 * <p>
 * Uses {@link TableReducer} to put the data into HBase. Change the InputFormat
 * to suit your data. In this example, we are importing a CSV file.
 * <p>
 * <pre>row,family,qualifier,value</pre>
 * <p>
 * The table and column family we're inserting into must already exist.
 * <p>
 * There is no reducer in this example as it is not necessary and adds
 * significant overhead. If you need to do any massaging of data before
 * inserting into HBase, you can do this in the map as well.
 * <p>Do the following to start the MR job:
 * <pre>
 * ./bin/hadoop org.apache.hadoop.hbase.mapreduce.SampleUploader /tmp/input.csv TABLE_NAME
 * </pre>
 * <p>
 * This code was written against HBase 0.21 trunk.
 */
public class SampleUploader {

  // The original post used a custom Wloger.loger helper; a plain log4j logger works just as well.
  public static final Logger logger = Logger.getLogger(SampleUploader.class);

  private static final String NAME = "SampleUploader";

  static class Uploader
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

    private long checkpoint = 100;
    private long count = 0;

    @Override
    public void map(LongWritable key, Text line, Context context)
        throws IOException {
      // Input is a CSV file.
      // Each map() call handles a single line, keyed by its offset in the file.
      // Each line is comma-delimited: row,family,qualifier,value

      // Split the CSV line
      String[] values = line.toString().split(",");
      if (values.length != 4) {
        return;
      }

      // Extract each value
      byte[] row = Bytes.toBytes(values[0]);
      byte[] family = Bytes.toBytes(values[1]);
      byte[] qualifier = Bytes.toBytes(values[2]);
      byte[] value = Bytes.toBytes(values[3]);
      logger.info(values[0] + ":" + values[1] + ":" + values[2] + ":" + values[3]);

      // Create the Put
      Put put = new Put(row);
      put.add(family, qualifier, value);

      // Uncomment below to disable the WAL. This will improve performance but means
      // you will experience data loss in the case of a RegionServer crash.
      // put.setWriteToWAL(false);

      try {
        context.write(new ImmutableBytesWritable(row), put);
      } catch (InterruptedException e) {
        logger.error("Failed to write Put to HBase:", e);
      }

      // Set status every checkpoint lines
      if (++count % checkpoint == 0) {
        context.setStatus("Emitting Put " + count);
      }
    }
  }

  /**
   * Job configuration.
   */
  public static Job configureJob(Configuration conf, String[] args)
      throws IOException {
    Path inputPath = new Path(args[0]);
    String tableName = args[1];
    Job job = new Job(conf, NAME + "_" + tableName);
    job.setJarByClass(Uploader.class);
    FileInputFormat.setInputPaths(job, inputPath);
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(Uploader.class);
    // No reducers. Just write straight to the table. Call initTableReducerJob
    // because it sets up the TableOutputFormat.
    logger.info("TableName: " + tableName);
    TableMapReduceUtil.initTableReducerJob(tableName, null, job);
    job.setNumReduceTasks(0);
    return job;
  }

  /**
   * Main entry point.
   *
   * @param args The command line parameters.
   * @throws Exception When running the job fails.
   */
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Wrong number of arguments: " + otherArgs.length);
      System.err.println("Usage: " + NAME + " <input> <tablename>");
      System.exit(-1);
    }
    Job job = configureJob(conf, otherArgs);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
I won't go over MapReduce input/output here; if you are not familiar with it, see the Hadoop column.
[Submitting this job is a little different from the previous IndexBuilder example; refer to that post for details. What they have in common: both are map-only jobs.]
The spreadsheet content is as follows:
key3,family1,column1,xls1
key3,family1,column2,xls11
key4,family1,column1,xls2
key4,family1,column2,xls12
This is CSV format; an xls file can be exported to CSV (the sketch below shows one way to do it, or just google for a converter).
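As a minimal sketch only: one way to dump the first sheet of an .xls file into the row,family,qualifier,value CSV shown above, assuming Apache POI (HSSF) is on the classpath. The file names and the assumption that every row already holds exactly those four columns are illustrative, not part of the original post.

import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.PrintWriter;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

public class XlsToCsv {
    public static void main(String[] args) throws Exception {
        FileInputStream in = new FileInputStream("input.xls");   // hypothetical input file
        Workbook wb = new HSSFWorkbook(in);
        PrintWriter out = new PrintWriter(new FileWriter("input.csv"));
        Sheet sheet = wb.getSheetAt(0);           // first sheet only
        for (Row row : sheet) {
            StringBuilder line = new StringBuilder();
            for (Cell cell : row) {
                if (line.length() > 0) {
                    line.append(',');
                }
                // naive conversion: no quoting/escaping, so cell values must not contain commas
                line.append(cell.toString());
            }
            out.println(line);
        }
        out.close();
        in.close();
    }
}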
Run the job with the following command:
bin/hadoop jar SampleUploader.jar SampleUploader /tmp/input.csv 'table1'
'table1' here is the table created in the previous IndexBuilder post; I'm simply reusing it (lazy).
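If you don't have that table, it can be created in the HBase shell first; the column family name below just matches the family1 used in the sample CSV:
create 'table1', 'family1'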
Note that the input file must be uploaded to HDFS first, otherwise the job will complain that it cannot find the file, because MapReduce reads from the HDFS file system.
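One way to upload it is with the hadoop fs command; the paths here simply assume the local file is /tmp/input.csv and is copied to the same path in HDFS:
bin/hadoop fs -put /tmp/input.csv /tmp/input.csv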
http://www.iteye.com/topic/1117572