Linux (vi/vim)

Normal mode

Insert mode

Command mode

Compression and decompression

gzip/gunzip compression

(1) Compresses files only, not directories

(2) Does not keep the original file

Compress with gzip: gzip hello.txt

Decompress with gunzip: gunzip hello.txt.gz
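The two properties of gzip listed here can be checked with a short round trip. This is a minimal sketch, assuming gzip and mktemp are available; it works in a throwaway directory, and the variable names (workdir, after_gzip, after_gunzip, restored) are illustrative only.

```shell
#!/bin/bash
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "hello" > hello.txt

# gzip replaces hello.txt with hello.txt.gz (the original is not kept).
gzip hello.txt
after_gzip=$(ls)

# gunzip restores hello.txt and removes hello.txt.gz.
gunzip hello.txt.gz
after_gunzip=$(ls)
restored=$(cat hello.txt)

echo "after gzip:   $after_gzip"
echo "after gunzip: $after_gunzip"
```

If the original file has to survive, recent gzip versions offer -k (keep); whether it is available depends on the installed version.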

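zip/unzip compression

Can compress directories (with -r) and keeps the original files

Compress with zip (packs hello.txt and world.txt into hello.zip): zip hello.zip hello.txt world.txt

Extract with unzip: unzip hello.zip

Extract to a specified directory: unzip hello.zip -d /opt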

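tar archiving

Compress multiple files with tar: tar -zcvf hello.tar.gz hello.txt world.txt

Compress a directory with tar: tar -zcvf hello.tar.gz opt/

Extract into the current directory: tar -zxvf hello.tar.gz

Extract into a specified directory: tar -zxvf hello.tar.gz -C /opt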
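The tar commands can be tried end to end without touching /opt. The sketch below assumes a tar that supports -z (GNU tar or bsdtar) and uses a temporary directory; the src and dest directory names are made up for the illustration.

```shell
#!/bin/bash
# Build a small sandbox with two files to pack.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir src dest
echo "hello" > src/hello.txt
echo "world" > src/world.txt

# -z gzip, -c create, -f archive name. The archive name comes first,
# then the files to pack. -C src changes directory before packing.
tar -zcf hello.tar.gz -C src hello.txt world.txt

# -t lists the archive contents without extracting anything.
listing=$(tar -ztf hello.tar.gz)

# -x extracts; -C dest sends the files to a specific directory.
tar -zxf hello.tar.gz -C dest
extracted=$(cat dest/hello.txt dest/world.txt)

printf '%s\n' "$listing"
```

Note that the first argument after -f is always the archive name. If it is left out, tar treats the first file name as the archive and overwrites that file.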

RPM

RPM query command: rpm -qa | grep firefox

RPM uninstall commands:

rpm -e xxxxxx

rpm -e --nodeps xxxxxx (skips the dependency check)

RPM install commands:

rpm -ivh xxxxxx.rpm

rpm -ivh --nodeps xxxxxx.rpm (--nodeps skips the dependency check)

Shell

Input/output redirection

Script editing

Hadoop

Startup commands

hadoop fs / hdfs dfs commands

yarn commands

Zookeeper

Startup commands

Basic operations

Four-letter commands

Kafka

Note: only one host is written in each command here. You can also run the commands as ./bin/xx.sh (for example, ./bin/kafka-topics.sh).

List all topics on the current server

kafka-topics --zookeeper xxxxxx:2181 --list --exclude-internal 
Notes:
--exclude-internal: excludes Kafka's internal topics
For example: --exclude-internal --topic "test_.*"

Create a topic

kafka-topics --zookeeper xxxxxx:2181 --create \
--replication-factor <replication_factor> \
--partitions 1 \
--topic topic_name
Notes:
--topic: the topic name
--replication-factor: the number of replicas
--partitions: the number of partitions

Delete a topic

Note: delete.topic.enable=true must be set in server.properties; otherwise the topic is only marked for deletion.

kafka-topics --zookeeper xxxxxx:2181 --delete --topic topic_name

Producer

kafka-console-producer --broker-list xxxxxx:9092 --topic topic_name
Optional: --property parse.key=true (for messages that carry a key)

Consumer

kafka-console-consumer --bootstrap-server xxxxxx:9092 --topic topic_name
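Optional flags:
--from-beginning: reads all existing data in the topic from the start
--whitelist '.*': consumes all topics (used in place of --topic)
--property print.key=true: prints the key of each consumed message
--partition 0: consumes from the specified partition
--offset: consumes starting from the specified offset

View the details of a topic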

kafka-topics --zookeeper xxxxxx:2181 --describe --topic topic_name

Change the number of partitions

kafka-topics --zookeeper xxxxxx:2181 --alter --topic topic_name --partitions 6

View information about a consumer group

kafka-consumer-groups --bootstrap-server  xxxxxx:9092  --describe --group group_name

Delete a consumer group

kafka-consumer-groups --bootstrap-server xxxxxx:9092 --delete --group group_name

Reset offsets

kafka-consumer-groups --bootstrap-server xxxxxx:9092 --group group_name \
--reset-offsets --all-topics --to-latest --execute

Leader re-election

Re-elect the leader of a specific partition of a specific topic using the PREFERRED (preferred replica) strategy

kafka-leader-election --bootstrap-server xxxxxx:9092 \
--topic topic_name --election-type PREFERRED --partition 0

Re-elect the leaders of all partitions of all topics using the PREFERRED (preferred replica) strategy

kafka-leader-election --bootstrap-server xxxxxx:9092 \
--election-type PREFERRED --all-topic-partitions

Query Kafka version information

kafka-configs --bootstrap-server xxxxxx:9092 \
--describe --version

Add, delete, and modify configuration

Add or modify dynamic topic configuration

kafka-configs --bootstrap-server xxxxxx:9092 \
--alter --entity-type topics --entity-name topic_name \
--add-config file.delete.delay.ms=222222,retention.ms=999999

Delete dynamic topic configuration

kafka-configs --bootstrap-server xxxxxx:9092 \
--alter --entity-type topics --entity-name topic_name \
--delete-config file.delete.delay.ms,retention.ms

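Continuously pull messages in batches

Consume at most 10 messages in a single run (without --max-messages, consumption continues indefinitely)

kafka-verifiable-consumer --bootstrap-server xxxxxx:9092 \
--group group_name \
--topic topic_name --max-messages 10

Delete messages from a specified partition

Delete the messages in one partition of the specified topic before offset 1024

JSON file offset-json-file.json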

{
    "partitions": [
        {
            "topic": "topic_name",
            "partition": 0,
            "offset": 1024
        }
    ],
    "version": 1
}

kafka-delete-records --bootstrap-server xxxxxx:9092 \
--offset-json-file offset-json-file.json

View broker disk information

Query disk information for specified topics

kafka-log-dirs --bootstrap-server xxxxxx:9090 \
--describe --topic-list topic1,topic2

Query disk information for a specified broker

kafka-log-dirs --bootstrap-server xxxxxx:9090 \
--describe --topic-list topic1 --broker-list 0

Hive

Startup

Script to start the Hive metadata services (metastore and hiveserver2) and shut them down gracefully

Start: hive.sh start
Stop: hive.sh stop
Restart: hive.sh restart
Status: hive.sh status

The script is as follows

#!/bin/bash
HIVE_LOG_DIR=$HIVE_HOME/logs
mkdir -p $HIVE_LOG_DIR
# Check whether a process is running normally; argument 1 is the process name, argument 2 is the process port
function check_process()
{
    pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
    ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
    echo $pid
    [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}
function hive_start()
{
    metapid=$(check_process HiveMetastore 9083)
    cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
    cmd=$cmd" sleep 4; hdfs dfsadmin -safemode wait >/dev/null 2>&1"
    [ -z "$metapid" ] && eval $cmd || echo "Metastore service is already running"
    server2pid=$(check_process HiveServer2 10000)
    cmd="nohup hive --service hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
    [ -z "$server2pid" ] && eval $cmd || echo "HiveServer2 service is already running"
}
function hive_stop()
{
    metapid=$(check_process HiveMetastore 9083)
    [ "$metapid" ] && kill $metapid || echo "Metastore service is not running"
    server2pid=$(check_process HiveServer2 10000)
    [ "$server2pid" ] && kill $server2pid || echo "HiveServer2 service is not running"
}
case $1 in
"start")
    hive_start
    ;;
"stop")
    hive_stop
    ;;
"restart")
    hive_stop
    sleep 2
    hive_start
    ;;
"status")
    check_process HiveMetastore 9083 >/dev/null && echo "Metastore service is running normally" || echo "Metastore service is not running normally"
    check_process HiveServer2 10000 >/dev/null && echo "HiveServer2 service is running normally" || echo "HiveServer2 service is not running normally"
    ;;
*)
    echo Invalid Args!
    echo 'Usage: '$(basename $0)' start|stop|restart|status'
    ;;
esac

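Common interactive commands

SQL (special cases)

Built-in functions

(1) NVL

Assigns a value to data that is NULL. The format is NVL(value, default_value): if value is NULL, NVL returns default_value; otherwise it returns value. If both arguments are NULL, it returns NULL.

select nvl(column, 0) from xxx;

(2) Rows to columns

(3) Columns to rows (one column into multiple rows)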

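SPLIT(str, separator): splits the string on the given separator and returns an array of strings.

EXPLODE(col): expands a complex array or map in a Hive column into multiple rows.

LATERAL VIEW

Usage:
LATERAL VIEW udtf(expression) tableAlias AS columnAlias

Explanation: lateral view is used together with UDTFs such as explode (often fed by split). It expands one row of data into multiple rows, and the expanded rows can then be aggregated.

Lateral view first calls the UDTF for each row of the source table; the UDTF expands that row into one or more rows, and lateral view then combines the results into a virtual table that supports an alias.

Prepare the test data

SQL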

SELECT movie,category_name 
FROM movie_info 
lateral VIEW
explode(split(category,",")) movie_info_tmp  AS category_name ;

Test result

《功勳》      記錄
《功勳》      劇情
《戰狼2》     戰爭
《戰狼2》     動作
《戰狼2》     災難
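The row expansion that lateral view with explode(split(...)) performs can be imitated outside Hive to see its shape. The following is only a sketch in awk, not how Hive executes the query; the input rows (movie_a, movie_b and their categories) are hypothetical and are not the original test data.

```shell
#!/bin/bash
# Hypothetical input: one row per movie, with the categories packed
# into a single comma-separated field (tab-separated columns).
input=$(printf 'movie_a\twar,action\nmovie_b\tdrama\n')

# For every input row, split the second field on "," (the role of split)
# and emit one output row per element (the role of explode under lateral view).
exploded=$(printf '%s\n' "$input" | awk -F'\t' '{
    n = split($2, parts, ",")
    for (i = 1; i <= n; i++) print $1 "\t" parts[i]
}')

printf '%s\n' "$exploded"
```

Each movie name is repeated once per category, which is the same shape as the test result above.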

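Window functions

(1) OVER()

Specifies the size of the data window that the analytic function works on; the window may change as the current row changes.

(2) CURRENT ROW (the current row)

n PRECEDING: the n rows before the current row
n FOLLOWING: the n rows after the current row

(3) UNBOUNDED (no boundary)

UNBOUNDED PRECEDING: no boundary in front, meaning the window starts at the first row of the partition
UNBOUNDED FOLLOWING: no boundary behind, meaning the window ends at the last row of the partition

SQL example: aggregate from the start to the current row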

select 
    sum(money) over(partition by user_id order by pay_time rows between UNBOUNDED PRECEDING and current row) 
from or_order;

SQL example: aggregate the current row with the previous row

select 
    sum(money) over(partition by user_id order by pay_time rows between 1 PRECEDING and current row) 
from or_order;

SQL example: aggregate the current row with the previous row and the next row

select 
    sum(money) over(partition by user_id order by pay_time rows between 1 PRECEDING AND 1 FOLLOWING )
from or_order;

SQL example: the current row and all rows after it

select 
    sum(money) over(partition by user_id order by pay_time rows between current row and UNBOUNDED FOLLOWING  )
from or_order;
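To make the frame clauses concrete, the first two cases can be imitated on a few rows with sort and awk. This is an illustration of the semantics only, under the assumption of three space-separated columns (user_id, pay_time, money); the sample rows are hypothetical.

```shell
#!/bin/bash
# Hypothetical rows: user_id pay_time money (deliberately out of order).
orders=$(printf '%s\n' 'u1 2021-01-02 20' 'u1 2021-01-01 10' 'u2 2021-01-01 5' 'u1 2021-01-03 30')

# "partition by user_id order by pay_time" -> sort on columns 1 and 2.
sorted=$(printf '%s\n' "$orders" | sort -k1,1 -k2,2)

# rows between UNBOUNDED PRECEDING and current row: running total per user.
running=$(printf '%s\n' "$sorted" | awk '{ total[$1] += $3; print $1, $2, total[$1] }')

# rows between 1 PRECEDING and current row: current row plus the previous
# row of the same user; the first row of a partition has no previous row.
pairwise=$(printf '%s\n' "$sorted" | awk '
    $1 != user { user = $1; prev = 0 }
    { print $1, $2, $3 + prev; prev = $3 }')

printf '%s\n' "$running"
printf '%s\n' "$pairwise"
```

For user u1 the running totals are 10, 30, 60, while the one-row-back sums are 10, 30, 50; user u2 starts again from its own first row in both cases.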

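(4) LAG(col,n,default_val)

The value from the n-th row before the current row; default_val if there is no such row

(5) LEAD(col,n, default_val)

The value from the n-th row after the current row; default_val if there is no such row

SQL example: query each user's purchase details together with the previous and next purchase times

select 
 user_id,pay_time,money,
 
 lag(pay_time,1,'1970-01-01') over(PARTITION by user_id order by pay_time) prev_time,
 
 lead(pay_time,1,'1970-01-01') over(PARTITION by user_id order by pay_time) next_time
from or_order;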

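(6) FIRST_VALUE(col,true/false)

The first value in the current window; if the second argument is true, NULL values are skipped.

(7) LAST_VALUE(col,true/false)

The last value in the current window; if the second argument is true, NULL values are skipped.

SQL example: query each user's first and last purchase time in each month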

select
 FIRST_VALUE(pay_time) 
     over(
         partition by user_id,month(pay_time) order by pay_time 
         rows between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING
         ) first_time,
 
 LAST_VALUE(pay_time) 
     over(partition by user_id,month(pay_time) order by pay_time rows between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING
     ) last_time
from or_order;

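(8) NTILE(n)

Distributes the rows of an ordered window into the specified number of groups. The groups are numbered starting from 1, and for each row NTILE returns the number of the group that row belongs to. (It is used to cut ordered data into n slices and return the slice number of the current row.)

SQL example: query the orders in the earliest 25% by time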

select * from (
    select User_id,pay_time,money,
    
    ntile(4) over(order by pay_time) sorted
    
    from or_order
) t
where sorted = 1;
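How NTILE assigns bucket numbers when the row count does not divide evenly can be sketched in awk. The ntile helper below is a hypothetical function written for this illustration, and the row labels t1 to t6 stand in for orders already sorted by pay_time; it follows the usual rule that the earlier buckets receive the extra rows.

```shell
#!/bin/bash
# ntile N: reads already-sorted rows on stdin and appends a bucket number.
ntile() {
    awk -v n="$1" '
        { rows[NR] = $0 }
        END {
            base = int(NR / n)      # minimum rows per bucket
            extra = NR % n          # the first "extra" buckets get one more row
            row = 1
            for (b = 1; b <= n; b++) {
                size = base + (b <= extra ? 1 : 0)
                for (k = 0; k < size; k++) { print rows[row], b; row++ }
            }
        }'
}

# Six hypothetical orders, already sorted by time, cut into 4 buckets.
tiles=$(printf '%s\n' t1 t2 t3 t4 t5 t6 | ntile 4)

# "where sorted = 1" keeps only the first bucket.
first_quarter=$(printf '%s\n' "$tiles" | awk '$2 == 1')

printf '%s\n' "$tiles"
```

With six rows and four buckets the sizes come out as 2, 2, 1, 1, so the "first 25%" here contains two rows rather than exactly a quarter of the data.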

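The four "By" clauses

(1) Order By

Global sort; only one reducer is used.

(2) Sort By

Sorted within each partition.

(3) Distribute By

Similar to Partition in MR: it partitions the data and is used together with Sort By.

(4) Cluster By

When Distribute By and Sort By use the same columns, Cluster By can be used instead. Cluster By has the function of Distribute By and also the function of Sort By. However, the sort is ascending only; you cannot specify ASC or DESC.

In production, Order By is used rarely because it easily causes OOM.

In production, Sort By + Distribute By is used often.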

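Ranking functions

(1) RANK()

Ties receive the same rank and the total count stays the same, so gaps follow the ties (1, 1, 3)

(2) DENSE_RANK()

Ties receive the same rank and the total count shrinks, so there are no gaps (1, 1, 2)

(3) ROW_NUMBER()

Numbers the rows sequentially in sort order, ignoring ties (1, 2, 3)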
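The difference between the three ranking functions shows up only when there are ties. As a rough illustration, the awk sketch below computes all three over a hypothetical list of scores sorted in descending order; the scores and the output column order (score, rank, dense_rank, row_number) are assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical scores containing two ties (90 and 70).
scores=$(printf '%s\n' 70 90 80 90 60 70)

# Sort descending, then number the rows three ways.
ranked=$(printf '%s\n' "$scores" | sort -nr | awk '
    {
        row_number = NR                      # always increases by 1
        if (NR == 1 || $1 != prev) {
            rank = NR                        # jumps to the row position after ties
            dense_rank++                     # increases by 1 per distinct value
        }
        print $1, rank, dense_rank, row_number
        prev = $1
    }')

printf '%s\n' "$ranked"
```

Reading the columns downward gives rank 1, 1, 3, 4, 4, 6, dense_rank 1, 1, 2, 3, 3, 4, and row_number 1 to 6.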

Date functions

datediff: returns the number of days obtained by subtracting the start date from the end date

datediff(string enddate, string startdate) 
select datediff('2021-11-20','2021-11-22')

date_add: returns the date that is days days after startdate

date_add(string startdate, int days) 
select date_add('2021-11-20',3)

date_sub: returns the date that is days days before startdate

date_sub (string startdate, int days) 
select date_sub('2021-11-22',3)
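The same arithmetic can be cross-checked from the shell. The sketch below assumes GNU date (its -d option is not available in the BSD/macOS date); the function names simply mirror the Hive functions and are not part of any tool.

```shell
#!/bin/bash
# Assumes GNU date; -u keeps every day exactly 86400 seconds long.

# datediff END START -> whole days from START to END (END minus START)
datediff() {
    local end_s start_s
    end_s=$(date -u -d "$1" +%s)
    start_s=$(date -u -d "$2" +%s)
    echo $(( (end_s - start_s) / 86400 ))
}

# date_add START DAYS / date_sub START DAYS -> shifted date as YYYY-MM-DD
date_add() { date -u -d "$1 + $2 days" +%F; }
date_sub() { date -u -d "$1 - $2 days" +%F; }

diff_days=$(datediff 2021-11-20 2021-11-22)
added=$(date_add 2021-11-20 3)
subbed=$(date_sub 2021-11-22 3)

echo "$diff_days $added $subbed"
```

Because the end date comes first, the datediff example above is negative: 2021-11-20 minus 2021-11-22 is -2 days.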

Redis

Startup

key

String

List

Set

Hash

zset(Sorted set)

Flink

Startup

./start-cluster.sh

run

./bin/flink run [OPTIONS]
./bin/flink run -m yarn-cluster -c com.wang.flink.WordCount /opt/app/WordCount.jar

info

./bin/flink info [OPTIONS]

list

./bin/flink list [OPTIONS]

stop

./bin/flink stop  [OPTIONS] <Job ID>

cancel (de-emphasized)

./bin/flink cancel  [OPTIONS] <Job ID>

savepoint

./bin/flink savepoint  [OPTIONS] <Job ID>

Original author: 王了個博

A Complete Collection of Common Commands for Big Data Development