Integrating Flink with HBase

Latest recommended article published on 2024-11-21 07:42:55

This content is licensed under the CC 4.0 BY-SA license.

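This article takes a close look at the architecture and characteristics of the HBase database, covering its data model, storage mechanism, shell commands, and optimization strategies. It also shows how to read data from HBase with Flink for stream processing, including connection configuration, data reading, and an integration example.

HBase is a distributed, column-oriented open-source database and a subproject of the Hadoop project. Unlike a typical database, it is suited to storing unstructured data and is organized by column rather than by row. Within the Hadoop ecosystem it serves as real-time, distributed storage for high-dimensional data: a highly reliable, high-performance, column-oriented, scalable database with real-time reads and writes. In HBase, the entire table in the following figure is just one row of data.

Figure: HBase data structure

Row key: the unique identifier of a row. Rows are sorted by row key in lexicographic (byte) order, and a row key can hold at most 64 KB of data.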
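Because row keys sort lexicographically rather than numerically, the key "10" comes before "2". The sketch below illustrates this with plain Java strings and no HBase dependency; `RowKeyOrderDemo` and its helpers are hypothetical names used only for this illustration, and the keys are assumed to be ASCII so that string order matches byte order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class RowKeyOrderDemo {
    // Returns the keys in lexicographic order, the way a sorted key space would store them.
    static List<String> sortedKeys(List<String> keys) {
        return new ArrayList<>(new TreeSet<>(keys));
    }

    // Left-pads a numeric id with zeros so that lexicographic order equals numeric order.
    static String padded(long id, int width) {
        return String.format("%0" + width + "d", id);
    }

    public static void main(String[] args) {
        // prints [1, 10, 2]
        System.out.println(sortedKeys(Arrays.asList("1", "2", "10")));
        // prints [0001, 0002, 0010]
        System.out.println(sortedKeys(Arrays.asList(padded(1, 4), padded(2, 4), padded(10, 4))));
    }
}
```

Zero-padding numeric ids to a fixed width is one common way to make scan order match numeric order.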

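Column family & qualifier

Every column in HBase belongs to a column family. Column families must be declared up front as part of the table schema, and each column name is prefixed with its family. A family can contain many columns, and new columns can be added dynamically as needed. You can think of HBase columns as two-level: the family is the first level and the qualifier is the second, in a parent-child relationship. Access control, storage, and tuning all happen at the column-family level. HBase stores all the data of one column family under the same directory, in several files.

Timestamp: each cell in HBase can hold multiple versions of the same data, and the versions are distinguished by a unique timestamp. Versions are sorted by timestamp in descending order. The timestamp can be assigned automatically at write time or set explicitly by the client.

Cell: a cell is determined by the intersection of row and column coordinates; its content is an uninterpreted byte array.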
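One way to picture the data model described above is as a nested sorted map: row key, then column (family plus qualifier), then timestamp in descending order, then value. The following is a minimal in-memory sketch of that idea, not HBase's actual implementation; `CellModelDemo` and its methods are hypothetical names, and values are kept as strings instead of byte arrays for readability.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class CellModelDemo {
    // row key -> "family:qualifier" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows = new TreeMap<>();

    void put(String row, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family + ":" + qualifier, c -> new TreeMap<>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Returns all versions of one cell, newest first (empty list if the cell does not exist).
    List<String> versions(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) return Collections.emptyList();
        NavigableMap<Long, String> versions = columns.get(family + ":" + qualifier);
        return versions == null ? Collections.<String>emptyList() : new ArrayList<>(versions.values());
    }

    // Returns the newest version of a cell, or null if the cell does not exist.
    String get(String row, String family, String qualifier) {
        List<String> all = versions(row, family, qualifier);
        return all.isEmpty() ? null : all.get(0);
    }

    public static void main(String[] args) {
        CellModelDemo table = new CellModelDemo();
        table.put("1", "personal", "name", 100L, "zhangsan");
        table.put("1", "personal", "name", 200L, "lisi");
        System.out.println(table.get("1", "personal", "name"));
        System.out.println(table.versions("1", "personal", "name"));
    }
}
```

Writing the same cell twice does not overwrite the old value here; it adds a version, and a plain read returns the one with the largest timestamp.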

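Architecture

Client

Provides the interfaces for accessing HBase and maintains a cache to speed up access

ZooKeeper

Ensures that the cluster has only one master at any time

Stores the addressing entry point for all regions.

Monitors region servers coming online and going offline in real time, and notifies the Master

Stores the HBase schema and table metadata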

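Master

Assigns regions to region servers

Balances load across region servers

Detects failed region servers and reassigns their regions

Handles user requests to create, delete, and alter tables

RegionServer

A region server maintains its regions and handles I/O requests to them

A region server splits regions that grow too large at runtime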

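HLog (WAL log):

An HLog file is an ordinary Hadoop SequenceFile. The SequenceFile key is an HLogKey object, which records where the written data belongs: besides the table and region names, it also holds a sequence number and a timestamp. The timestamp is the write time, and the sequence number starts at 0 or at the sequence number most recently persisted to the file system. The SequenceFile value is an HBase KeyValue object, which corresponds to the KeyValue in an HFile.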
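The role of such a log can be sketched as follows: each edit is appended to the log with its table, region, sequence number, and write time before it is applied in memory, so edits that were not yet persisted can be replayed after a crash. This is a simplified in-memory illustration, not HBase code; `WalDemo`, `LogEntry`, and the recovery rule (replay every entry newer than the last flushed sequence number) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class WalDemo {
    // One log record: the key part (table, region, sequence number, write time) plus the edit itself.
    static final class LogEntry {
        final String table, region, row, value;
        final long sequenceNumber, writeTime;
        LogEntry(String table, String region, long sequenceNumber, long writeTime, String row, String value) {
            this.table = table; this.region = region;
            this.sequenceNumber = sequenceNumber; this.writeTime = writeTime;
            this.row = row; this.value = value;
        }
    }

    private final List<LogEntry> log = new ArrayList<>();
    private TreeMap<String, String> memstore = new TreeMap<>();
    private long nextSequence = 0;          // sequence numbers start at 0
    private long lastFlushedSequence = -1;  // nothing persisted yet

    void write(String table, String region, String row, String value) {
        log.add(new LogEntry(table, region, nextSequence++, System.currentTimeMillis(), row, value)); // 1. log first
        memstore.put(row, value);                                                                     // 2. then apply
    }

    // Pretend the memstore was written to disk: remember how much of the log is now covered.
    void flush() {
        lastFlushedSequence = nextSequence - 1;
        memstore = new TreeMap<>();
    }

    // Simulate a crash that loses the memstore, then rebuild it from the unflushed log entries.
    TreeMap<String, String> crashAndRecover() {
        memstore = new TreeMap<>();
        for (LogEntry e : log) {
            if (e.sequenceNumber > lastFlushedSequence) memstore.put(e.row, e.value);
        }
        return memstore;
    }

    public static void main(String[] args) {
        WalDemo wal = new WalDemo();
        wal.write("emp", "region-1", "r1", "a");
        wal.flush();
        wal.write("emp", "region-1", "r2", "b");
        System.out.println(wal.crashAndRecover());
    }
}
```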

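Region

HBase automatically partitions a table horizontally into regions, each holding a contiguous range of the table's rows. A table starts with a single region. As data is inserted the region grows, and once it reaches a threshold it splits into two new regions of roughly equal size. As the number of rows in the table keeps growing, so does the number of regions, and a complete table ends up stored across multiple region servers.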
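The splitting behaviour can be illustrated with a small simulation in which each region is identified by its start key and splits at its middle key once it holds too many rows. This is only a sketch under simplifying assumptions: the threshold is a row count rather than a data size, and `RegionSplitDemo` and its methods are hypothetical names, not HBase APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class RegionSplitDemo {
    private final int maxRowsPerRegion;
    // start key of each region -> rows stored in it; "" is the start key of the first region
    private final TreeMap<String, TreeMap<String, String>> regions = new TreeMap<>();

    RegionSplitDemo(int maxRowsPerRegion) {
        this.maxRowsPerRegion = maxRowsPerRegion;
        regions.put("", new TreeMap<String, String>()); // a table starts with a single region
    }

    void put(String row, String value) {
        // The owning region is the one with the greatest start key <= row.
        TreeMap<String, String> region = regions.floorEntry(row).getValue();
        region.put(row, value);
        if (region.size() > maxRowsPerRegion) split(region);
    }

    // Moves the upper half of the region's rows into a new region starting at the middle key.
    private void split(TreeMap<String, String> region) {
        String midKey = new ArrayList<>(region.keySet()).get(region.size() / 2);
        TreeMap<String, String> upperHalf = new TreeMap<>(region.tailMap(midKey, true));
        region.tailMap(midKey, true).clear();
        regions.put(midKey, upperHalf);
    }

    List<String> regionStartKeys() {
        return new ArrayList<>(regions.keySet());
    }

    int rowsInRegion(String startKey) {
        return regions.get(startKey).size();
    }

    public static void main(String[] args) {
        RegionSplitDemo table = new RegionSplitDemo(4);
        for (int i = 1; i <= 5; i++) table.put("r" + i, "v" + i);
        System.out.println(table.regionStartKeys());
    }
}
```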

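Memstore and StoreFile

A region consists of several stores, and each store corresponds to one column family (CF). A store comprises an in-memory memstore and on-disk StoreFiles. Writes go to the memstore first; when the data in the memstore reaches a threshold, the HRegionServer starts a flush (flushcache) process that writes it out to a StoreFile, and each flush produces a separate StoreFile.

When the number of StoreFiles grows past a threshold, the system compacts them (minor and major compaction). During compaction, version merging and deletion are performed (in major compaction), producing larger StoreFiles.

When the total size of all StoreFiles in a region exceeds a threshold, the region is split in two, and the HMaster assigns the new regions to region servers to balance load. When a client retrieves data, it looks in the memstore first and falls back to the StoreFiles if the data is not found there. An HRegion is the smallest unit of distributed storage and load balancing in HBase, meaning that different HRegions can live on different HRegion servers.

An HRegion consists of one or more Stores, and each Store holds one column family.

Each Store in turn consists of one memStore and zero or more StoreFiles.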
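The write and read paths of a single store can be sketched in a few lines: writes land in a sorted in-memory map, a full memstore is flushed as one immutable file, reads check the memstore first and then the files from newest to oldest, and compaction merges the files into one. This is a toy model and not HBase's implementation; `StoreDemo`, the row-count flush threshold, and the single-value-per-row layout are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class StoreDemo {
    private final int flushThreshold;
    private TreeMap<String, String> memstore = new TreeMap<>();
    private final List<TreeMap<String, String>> storeFiles = new ArrayList<>(); // oldest first

    StoreDemo(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Writes go to the memstore; reaching the threshold triggers a flush.
    void put(String row, String value) {
        memstore.put(row, value);
        if (memstore.size() >= flushThreshold) flush();
    }

    // Each flush turns the current memstore into one new store file.
    void flush() {
        if (memstore.isEmpty()) return;
        storeFiles.add(memstore);
        memstore = new TreeMap<>();
    }

    // Reads check the memstore first, then the store files from newest to oldest.
    String get(String row) {
        if (memstore.containsKey(row)) return memstore.get(row);
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            String value = storeFiles.get(i).get(row);
            if (value != null) return value;
        }
        return null;
    }

    // Merges all store files into one; newer files overwrite older values for the same row.
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : storeFiles) merged.putAll(file);
        storeFiles.clear();
        if (!merged.isEmpty()) storeFiles.add(merged);
    }

    int storeFileCount() {
        return storeFiles.size();
    }

    public static void main(String[] args) {
        StoreDemo store = new StoreDemo(2);
        store.put("a", "1");
        store.put("b", "2");
        store.put("a", "3");
        store.put("c", "4");
        System.out.println(store.storeFileCount() + " files, a=" + store.get("a"));
        store.compact();
        System.out.println(store.storeFileCount() + " file, a=" + store.get("a"));
    }
}
```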

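hbase shell general commands

status: shows the status of HBase

version: shows the HBase version in use

table_help: provides help for table-reference commands

whoami: shows information about the current user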

Data definition language:

These commands operate on HBase tables.

create: creates a table.

create 'test', 'cf1 name','cf1 age'

list: lists all tables in HBase.

list

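disable: disables a table.

disable 'test'

is_disabled: checks whether a table is disabled.

enable: enables a table.

enable 'test'

is_enabled: checks whether a table is enabled.

is_enabled 'test'

describe: shows the description of a table.

alter: modifies a table.

Modify:

Syntax: alter 'tablename', NAME=>'column family', VERSIONS=>5

alter 'emp', NAME=>'personal age',VERSIONS=>5

With alter you can set and remove table-scope operators such as MAX_FILESIZE, READONLY, MEMSTORE_FLUSHSIZE, and DEFERRED_LOG_FLUSH.

alter 'emp',READONLY    (sets the emp table to read-only)

Removing a table-scope operator

alter 'tablename',METHOD=>'table_att_unset',NAME=>'MAX_FILESIZE'

Deleting a column family

alter 'tablename','delete'=>'column family'

Example: alter 'emp','delete'=>'professional'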

A table must be disabled before it can be dropped

disable 'table name'

drop 'table name'

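Deleting tables in bulk

disable_all 't.*'

exists: checks whether a table exists.

exists 'test'

drop: drops a table from HBase.

drop 'test'

drop_all: drops the tables matching the regex given in the command.

#Add data

##View table data

scan 'test'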

Data manipulation language

put: writes a cell value at a specified column in a specified row of a particular table.

put 'test','1','cf1 name','zhangsan'

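get: fetches the contents of a row or cell.

get 'test','1'

delete: deletes a cell value in a table.

delete 'emp','1','personal data:city',<timestamp>

deleteall: deletes all cells in a given row.

deleteall 'emp','1'

scan: scans a table and returns its data.

count: counts and returns the number of rows in a table.

count 'emp'

truncate: disables, drops, and recreates a table.

truncate 'emp'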

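grant: grants specific permissions, such as read, write, execute, and admin, on a table to a particular user.

R (read), W (write), X (execute), C (create), A (admin)

grant 'Tutorialspoint','RWXCA'

revoke: revokes a user's access permissions on a table.

revoke 'Tutorialspoint'

user_permission: lists all user permissions for a particular table.

user_permission 'emp'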

###Testing HBase

##Add the Maven dependency

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-hbase_2.12</artifactId>
    <version>${flink.version}</version>
</dependency>

package xpu.cheng.FlinkCase.Hbase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class HbaseTest {
/**
   * Connect to HBase.
   * The configuration, connection, and admin handles are declared static.
   */
static Configuration conf=null;
static Connection conn=null;
static Admin admin=null;
static {
    conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "master,slave2,slave3");
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    try{
      conn=ConnectionFactory.createConnection(conf);
      admin=conn.getAdmin();
    } catch (IOException e) {
      e.printStackTrace();
    }
}

/**
   * Create the emp table with the personal and professional column families.
   * HBase tables are created through Admin;
   * the table and its column families are described here with HTableDescriptor and HColumnDescriptor.
   * @throws IOException
   */
public static void createTable() throws IOException {
    // create the emp table only if it does not exist yet
    if(!admin.tableExists(TableName.valueOf("emp"))){
      TableName tableName=TableName.valueOf("emp");
      // table descriptor
      HTableDescriptor tdb=new HTableDescriptor(tableName);
      // column family descriptors
      HColumnDescriptor hcd=new HColumnDescriptor("personal");
      HColumnDescriptor hcd1=new HColumnDescriptor("professional");
      // add the column families to the table
      tdb.addFamily(hcd);
      tdb.addFamily(hcd1);
      admin.createTable(tdb);
      System.out.println("Table created");
    }
}

/**
   * List all tables.
   * @throws IOException
   */
public static void list() throws IOException {
    HTableDescriptor[] tableDescriptors=admin.listTables();
      for(int i=0;i<tableDescriptors.length;i++){
        System.out.println(tableDescriptors[i].getNameAsString());
      }
}
/**
   * Scan a table and print every cell.
   * @throws IOException
   */
public static void scantable(TableName tb) throws IOException {
    // try-with-resources closes the scanner and the table once the scan is done
    try (Table table = conn.getTable(tb);
         ResultScanner rs = table.getScanner(new Scan())) {
      for (Result r : rs) {
        for (Cell cell : r.rawCells()) {
          System.out.print("row : " + Bytes.toString(CellUtil.cloneRow(cell)) + " ");
          System.out.print("family : " + Bytes.toString(CellUtil.cloneFamily(cell)) + " ");
          System.out.print("qualifier : " + Bytes.toString(CellUtil.cloneQualifier(cell)) + " ");
          System.out.print("Timestamp : " + cell.getTimestamp() + " ");
          System.out.println("value : " + Bytes.toString(CellUtil.cloneValue(cell)) + " ");
        }
      }
    }
}
/**
   * Read one row and print each cell as qualifier::value.
   * @throws IOException
   */
public static void getResult(TableName tableName,String rowkey) throws IOException {
    Table table=conn.getTable(tableName);
    // fetch a single row
    Get get=new Get(Bytes.toBytes(rowkey));
    Result set=table.get(get);
    Cell[] cells=set.rawCells();
    for(Cell cell:cells){
      // slice the qualifier and the value out of the backing arrays with their own offsets and lengths
      System.out.println(Bytes.toString(cell.getQualifierArray(),cell.getQualifierOffset(),cell.getQualifierLength())
       +"::"+
      Bytes.toString(cell.getValueArray(),cell.getValueOffset(),cell.getValueLength()));
    }
    table.close();
}
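/**
   * Delete a whole row, one column family of a row, or one column of a row,
   * depending on which arguments are supplied.
   * @param tableName table to delete from
   * @param rowKey row to delete from
   * @param columnFamily column family to delete, or null to delete the whole row
   * @param columnName column to delete, or null to delete the whole column family
   * @throws Exception
   */
public static void deleteData(TableName tableName, String rowKey, String columnFamily, String columnName) throws Exception{
    Table table = conn.getTable(tableName);
    Delete delete = new Delete(Bytes.toBytes(rowKey));
    if (columnFamily != null && columnName != null) {
      // (3) delete the latest version of one column in the row
      delete.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(columnName));
    } else if (columnFamily != null) {
      // (2) delete one column family of the row
      delete.addFamily(Bytes.toBytes(columnFamily));
    }
    // (1) if neither a family nor a column was added, the whole row is deleted
    table.delete(delete);
    table.close();
}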
/**
   * Update a value identified by row key, column family, and column name.
   * @param tableName
   * @param rowKey
   * @param columnFamily
   * @param columnName
   * @param columnValue
   * @throws Exception
   */
public static void updateData(TableName tableName, String rowKey, String columnFamily, String columnName, String columnValue) throws Exception{
    Table table = conn.getTable(tableName);
    Put put1 = new Put(Bytes.toBytes(rowKey));
    put1.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(columnName), Bytes.toBytes(columnValue));
    table.put(put1);
    table.close();
}
/**
   * Insert data (multiple row keys, multiple column families).
   * @throws Exception
   */
public static void insertMany() throws Exception{
    Table table = conn.getTable(TableName.valueOf("emp"));
    List<Put> puts = new ArrayList<Put>();
    Put put1 = new Put(Bytes.toBytes("rowKey1"));
    put1.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("wd"));
    put1.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("age"), Bytes.toBytes("23"));
    put1.addColumn(Bytes.toBytes("professional"), Bytes.toBytes("name"), Bytes.toBytes("jingli"));

    Put put2 = new Put(Bytes.toBytes("rowKey2"));
    put2.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("wh"));
    put2.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("age"), Bytes.toBytes("25"));
    put2.addColumn(Bytes.toBytes("professional"), Bytes.toBytes("name"), Bytes.toBytes("jingli"));

    Put put3 = new Put(Bytes.toBytes("rowKey3"));
    put3.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("wd"));
    put3.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("age"), Bytes.toBytes("33"));
    put3.addColumn(Bytes.toBytes("professional"), Bytes.toBytes("name"), Bytes.toBytes("jingli"));

    Put put4 = new Put(Bytes.toBytes("rowKey4"));
    put4.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("wd"));
    put4.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("age"), Bytes.toBytes("88"));
    put4.addColumn(Bytes.toBytes("professional"), Bytes.toBytes("name"), Bytes.toBytes("jingli"));
    puts.add(put1);
    puts.add(put2);
    puts.add(put3);
    puts.add(put4);
    table.put(puts);
    table.close();
}

public static void main(String[] args) throws Exception {
    TableName tb=TableName.valueOf("emp");
    String row="rowKey4";

   /* getResult(tb,row);
    list();
    createTable();
    */
    insertMany();
    scantable(tb);
}
}

###Reading data from HBase with Flink

##Create the table

create 'test', 'cf1 name','cf1 age'

##Insert data

put 'test','1','cf1 name','zhangsan'

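import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Uses HBase as a data source: reads rows from HBase and emits them as a stream.
 * Extends RichSourceFunction and overrides its methods.
 */
public class HBaseReader extends RichSourceFunction<String> {

  private Connection conn=null;
  private Table table=null;
  private Scan scan=null;
  // set to false by cancel() so that run() stops scanning
  private volatile boolean running=true;

  /**
   * Open the HBase client connection in open().
   * @param parameters Flink configuration
   * @throws Exception
   */
  @Override
  public void open(Configuration parameters) throws Exception {
    super.open(parameters);
    org.apache.hadoop.conf.Configuration config=HBaseConfiguration.create();

    config.set(HConstants.ZOOKEEPER_QUORUM,"master");
    config.set(HConstants.ZOOKEEPER_CLIENT_PORT,"2181");

    config.setInt(HConstants.HBASE_CLIENT_OPERATION_TIMEOUT,3000);
    config.setInt(HConstants.HBASE_CLIENT_SCANNER_TIMEOUT_PERIOD,3000);

    TableName tableName= TableName.valueOf("test");

    conn=ConnectionFactory.createConnection(config);
    table=conn.getTable(tableName);
    scan=new Scan();
  }

  /**
   * run() comes from the SourceFunction interface.
   * @param sourceContext context used to emit records
   * @throws Exception
   */
  @Override
  public void run(SourceContext<String> sourceContext) throws Exception {
    // try-with-resources closes the scanner when the scan ends or is cancelled
    try (ResultScanner scanner = table.getScanner(scan)) {
      for (Result next : scanner) {
        if (!running) {
          break;
        }
        String rowKey = Bytes.toString(next.getRow());
        StringBuilder sb=new StringBuilder();
        for(Cell cell:next.listCells()){
          // label each value with its column (family:qualifier) and join the pairs with "_"
          String column = Bytes.toString(CellUtil.cloneFamily(cell)) + ":" + Bytes.toString(CellUtil.cloneQualifier(cell));
          String value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
          if (sb.length() > 0) {
            sb.append("_");
          }
          sb.append(column).append(" ").append(value);
        }
        Tuple2<String,String> tuple2=new Tuple2<>(rowKey,sb.toString());
        sourceContext.collect(tuple2.toString());
      }
    }
  }

  @Override
  public void cancel() {
    running=false;
  }

  @Override
  public void close() throws Exception {
    super.close();
    if(table!=null){
      table.close();
    }
    if(conn!=null){
      conn.close();
    }
  }
}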
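A Flink source's run() loop is expected to stop once cancel() has been called, which is usually done with a volatile flag checked on every iteration. The sketch below shows that pattern in plain Java without any Flink or HBase dependency; `CancellableScanDemo` is a hypothetical stand-in for a source class, and the iterator stands in for an HBase scanner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class CancellableScanDemo {
    // volatile so that a cancel() issued from another thread is seen by the loop in run()
    private volatile boolean running = true;

    // Emits rows until the iterator is exhausted or cancel() is called; returns the number emitted.
    int run(Iterator<String> rows, Consumer<String> out) {
        int emitted = 0;
        while (running && rows.hasNext()) {
            out.accept(rows.next());
            emitted++;
        }
        return emitted;
    }

    void cancel() {
        running = false;
    }

    public static void main(String[] args) {
        final CancellableScanDemo source = new CancellableScanDemo();
        final List<String> collected = new ArrayList<>();
        // Cancel the source as soon as "row2" has been emitted.
        int emitted = source.run(Arrays.asList("row1", "row2", "row3").iterator(), row -> {
            collected.add(row);
            if (row.equals("row2")) source.cancel();
        });
        System.out.println(emitted + " rows emitted: " + collected);
    }
}
```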

###Flink job that reads from HBase

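import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FlinkReadHbase {

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env=StreamExecutionEnvironment.getExecutionEnvironment();
    // checkpoint every 5 seconds with exactly-once semantics
    env.enableCheckpointing(5000);
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
    env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
    // use the custom HBase source defined in HBaseReader
    DataStream<String> dataStream=env.addSource(new HBaseReader());
    dataStream.print();
    env.execute("flink read hbase");
  }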