GoldenGate實時投遞資料到大資料平臺(2)- Cassandra

margiex發表於2017-12-27

簡介

GoldenGate是一款可以實時投遞資料到大資料平臺的軟體,針對apache cassandra,經過簡單配置,即可實現從關係型資料將增量資料實時投遞到Cassandra,以下介紹配置過程。

Cassandra安裝

解壓apache-cassandra-3.11.1-bin.tar.gz到 /opt/cassandra

sudo mkdir /var/lib/cassandra

sudo mkdir /var/log/cassandra

sudo chown hadoop /var/log/cassandra

sudo chown hadoop /var/lib/cassandra

啟動 cassandra

/opt/cassandara/bin/Cassandra

檢視狀態

$ ./nodetool status

Datacenter: datacenter1

=======================

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

-- Address Load Tokens Owns (effective) Host ID Rack

UN 127.0.0.1 92.1 KiB 256 100.0% e4f1431e-85ec-483a-a1b4-fe7bc8f7c9d2 rack1

進入Cassandra的shell

/opt/cassandara/bin/cqlsh

Connected to Test Cluster at 127.0.0.1:9042.

[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]

Use HELP for help.

cqlsh>  SELECT cluster_name, listen_address FROM system.local;
 cluster_name | listen_address
--------------+----------------
 Test Cluster |      127.0.0.1
 
(1 rows)

檢視系統空間

cqlsh> desc system.local;
 
CREATE TABLE system.local (
    key text PRIMARY KEY,
    bootstrapped text,
    broadcast_address inet,
    cluster_name text,
    cql_version text,
    data_center text,
    gossip_generation int,
    host_id uuid,
    listen_address inet,
    native_protocol_version text,
    partitioner text,
    rack text,
    release_version text,
    rpc_address inet,
    schema_version uuid,
    thrift_version text,
    tokens set<text>,
    truncated_at map<uuid, blob>
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = 'information about the local node'
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 0
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 3600000
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

檢視有哪些keyspaces

cqlsh>describe keyspaces;

system_schema system_auth system system_distributed system_traces

建立後面OGG投遞時schema對應的keyspaces

cqlsh>CREATE KEYSPACE QASOURCE WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};

建立測試表,驗證一下cassandra的功能

Cqlsh>use qasource;

Cqlsh>create table test

(

id int ,

name varchar,

primary key(id)

);

插入幾條記錄

cqlsh:qasource> insert into test(id,name) values(1,'bck');

cqlsh:qasource> insert into test(id,name) values(2,'test');

cqlsh:qasource> select * from test;

id | name

----+------

1 | bck

2 | test

根據名稱查詢,由於無索引,會報錯

cqlsh:qasource> select * from test where name='test';

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

基於主鍵可正常查詢

cqlsh:qasource> select * from test where id=2;

id | name

----+------

2 | test

(1 rows)

建立索引後可正常查詢

cqlsh:qasource> create index idx_test on test(name);

cqlsh:qasource> select * from test where name='test';

id | name

----+------

2 | test

(1 rows)

如果更新,則需要指定key對應的WHERE條件

cqlsh:qasource> select * from test where id=2;

id | name

----+-------

2 | newly

(1 rows)

刪除記錄

cqlsh:qasource> delete from test where id=2;

cqlsh:qasource> select * from test ;

id | name

----+------

1 | bck

(1 rows)


OGG配置

解壓cassandra3.1.2 java driver到/u01/drivers/cassandra-java-driver-3.1.2

設定OGG for bigdata需要的環境變數

export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server

配置replicat程式

REPLICAT rcass

-- Trail file for this example is located in "AdapterExamples/trail" directory

-- Command to add REPLICAT

-- add replicat rcass, exttrail AdapterExamples/trail/tr

TARGETDB LIBFILE libggjava.so SET property=dirprm/cass.props

REPORTCOUNT EVERY 1 MINUTES, RATE

GROUPTRANSOPS 1000

MAP QASOURCE.*, TARGET QASOURCE.*; 


新增投遞程式,使用OGG軟體中自帶的佇列檔案

GGSCI>add replicat rcass, exttrail AdapterExamples/trail/tr

投遞到cassandra需要的屬性檔案 cass.props

gg.handlerlist=cassandra

#The handler properties

gg.handler.cassandra.type=cassandra

gg.handler.cassandra.mode=op

gg.handler.cassandra.contactPoints=localhost

gg.handler.cassandra.ddlHandling=CREATE,ADD,DROP

gg.handler.cassandra.compressedUpdates=true

gg.handler.cassandra.cassandraMode=async

gg.handler.cassandra.consistencyLevel=ONE

goldengate.userexit.timestamp=utc

goldengate.userexit.writers=javawriter

javawriter.stats.display=TRUE

javawriter.stats.full=TRUE

gg.log=log4j

gg.log.level=INFO

gg.report.time=30sec

#Set the classpath here to the Datastax Cassandra Java Driver (3.1 latest)

#Link to the Cassandra drivers website

#http://cassandra.apache.org/doc/latest/getting_started/drivers.html#java

#Link to the Datastax Cassandra Java Driver

#https://github.com/datastax/java-driver

gg.classpath=/opt/cassandra-java-driver-3.1.2/*:/opt/cassandra-java-driver-3.1.2/lib/*

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar:./dirprm


測試

啟動ogg replicat 程式

GGSCI (ol73) 12> start rcass

Sending START request to MANAGER ...

REPLICAT RCASS starting

檢視狀態

GGSCI (ol73) 13> info rcass

REPLICAT RCASS Last Started 2017-12-25 11:29 Status STARTING

Checkpoint Lag 00:00:00 (updated 02:56:41 ago)

Process ID 100794

Log Read Checkpoint File /u01/ogg4bd_12.3/AdapterExamples/trail/tr000000000

First Record RBA 0

GGSCI (ol73) 14> info rcass

REPLICAT RCASS Last Started 2017-12-25 11:30 Status RUNNING

Checkpoint Lag 00:00:00 (updated 00:00:00 ago)

Process ID 100794

Log Read Checkpoint File /u01/ogg4bd_12.3/AdapterExamples/trail/tr000000000

2015-11-06 02:45:39.000000 RBA 5660

已經正常啟動,並寫入資料。

統計寫入的資料

GGSCI (ol73) 15> stats rcass, total

Sending STATS request to REPLICAT RCASS ...

Start of Statistics at 2017-12-25 11:31:12.

Replicating from QASOURCE.TCUSTMER to QASOURCE.TCUSTMER:

*** Total statistics since 2017-12-25 11:30:43 ***

Total inserts 5.00

Total updates 1.00

Total deletes 0.00

Total discards 0.00

Total operations 6.00

Replicating from QASOURCE.TCUSTORD to QASOURCE.TCUSTORD:

*** Total statistics since 2017-12-25 11:30:43 ***

Total inserts 5.00

Total updates 3.00

Total deletes 2.00

Total discards 0.00

Total operations 10.00

End of Statistics. 


同步成功。

進入cassandra shell進行驗證

./cqlsh

cqlsh> desc qasource;

CREATE KEYSPACE qasource WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;

CREATE TABLE qasource.tcustord (

cust_code text,

order_date text,

product_code text,

order_id double,

product_amount bigint,

product_price double,

transaction_id double,

PRIMARY KEY (cust_code, order_date, product_code, order_id)

) WITH CLUSTERING ORDER BY (order_date ASC, product_code ASC, order_id ASC)

AND bloom_filter_fp_chance = 0.01

AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

AND comment = ''

AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}

AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';

CREATE TABLE qasource.tcustmer (

cust_code text PRIMARY KEY,

city text,

name text,

state text

) WITH bloom_filter_fp_chance = 0.01

AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

AND comment = ''

AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}

AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';


可以看到,已經增加了2張表到qasource keyspace中。

在cassandra中查詢OGG寫入的資料

cqlsh>use QASOURCE

cqlsh>desc keyspace QASOURCE;

檢視錶定義

cqlsh>desc table qasource.tcustmer;

cqlsh:qasource> desc table qasource.tcustmer;

CREATE TABLE qasource.tcustmer (

cust_code text PRIMARY KEY,

city text,

name text,

state text

) WITH bloom_filter_fp_chance = 0.01

AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

AND comment = ''

AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}

AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE'; 

查詢資料

cqlsh>select * from qasource.tcustmer ;

cust_code | city | name | state

-----------+-------------+--------------------+-------

WILL | SEATTLE | BG SOFTWARE CO. | WA

JANE | DENVER | ROCKY FLYER INC. | CO

ANN | NEW YORK | ANN'S BOATS | NY

DAVE | TALLAHASSEE | DAVE'S PLANES INC. | FL

BILL | DENVER | BILL'S USED CARS | CO

(5 rows)


cqlsh:qasource> select * from qasource.tcustmer where cust_code='WILL';

cust_code | city | name | state

-----------+---------+-----------------+-------

WILL | SEATTLE | BG SOFTWARE CO. | WA

(1 rows)

cqlsh:qasource> select * from qasource.tcustord;

cust_code | order_date | product_code | order_id | product_amount | product_price | transaction_id

-----------+---------------------+--------------+----------+----------------+---------------+----------------

WILL | 1994-09-30 15:33:00 | CAR | 144 | 3 | 16520 | 100

BILL | 1995-12-31 15:00:00 | CAR | 765 | 3 | 14000 | 100

BILL | 1996-01-01 00:00:00 | TRUCK | 333 | 15 | 25000 | 100

(3 rows)

可以正常訪問資料,測試完成。

相關文章