Spark distinct
Spark Core transformation operators
RDD operators fall into three broad groups: Value type, double-Value type, and Key-Value type
Value type
distinct
package com.xcu.bigdata.spark.core.pg02_rdd.pg022_rdd_transform

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

/**
 * @Package : com.xcu.bigdata.spark.core.pg02_rdd.pg022_rdd_transform
 * @Author :
 * @Date : November 2020, Friday
 * @Desc : Global deduplication
 */
object Spark08_Distinct {
  def main(args: Array[String]): Unit = {
    // Create the configuration
    val conf: SparkConf = new SparkConf().setAppName("Spark08_Distinct").setMaster("local[*]")
    // Create the SparkContext, the entry point for submitting the job
    val sc = new SparkContext(conf)
    // Create an RDD with 3 partitions
    val rdd: RDD[Int] = sc.makeRDD(List(1, 2, 3, 4, 5, 5, 4, 3, 3), 3)
    // mapPartitionsWithIndex prints the contents of each partition, to show that
    // distinct deduplicates globally (across partitions) and involves a shuffle
    rdd.mapPartitionsWithIndex {
      (index, datas) => {
        // Materialize first: mkString would otherwise exhaust the iterator
        val items = datas.toList
        println(index + ":" + items.mkString(","))
        items.iterator
      }
    }.collect()
    println("***********************************")
    val newRDD: RDD[Int] = rdd.distinct()
    newRDD.mapPartitionsWithIndex {
      (index, datas) => {
        val items = datas.toList
        println(index + ":" + items.mkString(","))
        items.iterator
      }
    }.collect()
    // Release resources
    sc.stop()
  }
}
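To make the "global deduplication plus shuffle" behaviour easier to picture, here is a minimal sketch that imitates it with plain Scala collections instead of Spark. It is not Spark's implementation: `DistinctSketch`, `targetPartition` and `simulateDistinct` are hypothetical names introduced only for this illustration. It also assumes that the nine input elements are split evenly into three partitions and that values are routed by hash code modulo the partition count.

```scala
object DistinctSketch {
  // Hypothetical helper: picks a target partition from the value's hash code,
  // keeping the result non-negative.
  def targetPartition(value: Int, numPartitions: Int): Int = {
    val mod = value.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  // Imitates a distributed distinct over already-partitioned data.
  def simulateDistinct(partitions: Seq[Seq[Int]], numPartitions: Int): Seq[Seq[Int]] = {
    // Step 1 (before the shuffle): drop duplicates inside each input partition.
    val locallyDeduped = partitions.map(_.distinct)
    // Step 2 (shuffle): send every value to the partition chosen by its hash,
    // so equal values from different input partitions end up together.
    val shuffled = locallyDeduped.flatten.groupBy(v => targetPartition(v, numPartitions))
    // Step 3 (after the shuffle): drop duplicates that came from different partitions.
    (0 until numPartitions).map(i => shuffled.getOrElse(i, Seq.empty[Int]).distinct)
  }

  def main(args: Array[String]): Unit = {
    // Assumed even split of List(1, 2, 3, 4, 5, 5, 4, 3, 3) into 3 partitions.
    val input = Seq(Seq(1, 2, 3), Seq(4, 5, 5), Seq(4, 3, 3))
    simulateDistinct(input, 3).zipWithIndex.foreach {
      case (datas, index) => println(index + ":" + datas.mkString(","))
    }
  }
}
```

The point of the sketch is that a value such as 3, which starts out in two different input partitions, can only be deduplicated after all of its copies have been moved to the same partition. That movement is why the partition contents printed after `distinct()` in the program above differ from those printed before it.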