RowMatrix — row matrix (行矩陣): a distributed matrix stored as an RDD of rows without meaningful row indices.
// RowMatrix demo: build a distributed RowMatrix from a DataFrame and convert it back.
//
// NOTE(review): this was originally a spark-shell transcript. In spark-shell the
// SparkSession `spark` and its implicits (needed for `Seq(...).toDF`) are
// pre-imported; in a standalone script you must create `spark` first and add:
//   import spark.implicits._
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Source DataFrame: three rows, three double columns.
val df1 = Seq(
  (1.0, 2.0, 3.0),
  (1.1, 2.1, 3.1),
  (1.2, 2.2, 3.2)).toDF("c1", "c2", "c3")
df1.show()
// +---+---+---+
// | c1| c2| c3|
// +---+---+---+
// |1.0|2.0|3.0|
// |1.1|2.1|3.1|
// |1.2|2.2|3.2|
// +---+---+---+

// Convert the DataFrame to an RDD[Vector]: one dense vector per row.
// Row fields come back as Any, so go through toString before parsing to Double.
val rowsVector = df1.rdd.map { row =>
  Vectors.dense(
    row(0).toString.toDouble,
    row(1).toString.toDouble,
    row(2).toString.toDouble)
}

// Create a RowMatrix from the RDD[Vector].
val mat1: RowMatrix = new RowMatrix(rowsVector)

// Dimensions of the matrix (both 3 here).
val m = mat1.numRows() // 3
val n = mat1.numCols() // 3

// Convert the RowMatrix back into a DataFrame.
// Vector.apply(i) already returns Double, so no extra conversion is needed.
val resDF = mat1.rows.map { v =>
  (v(0), v(1), v(2))
}.toDF("c1", "c2", "c3")
resDF.show()
// Same table as df1 above.

// Inspect the raw rows.
mat1.rows.collect().take(10)
// Array([1.0,2.0,3.0], [1.1,2.1,3.1], [1.2,2.2,3.2])
CoordinateMatrix — coordinate matrix (座標矩陣): a distributed matrix stored as an RDD of (row, col, value) entries, suited to very sparse data.
// CoordinateMatrix demo: build a distributed matrix from (row, col, value)
// entries, then convert to IndexedRowMatrix / RowMatrix and back to DataFrames.
//
// NOTE(review): originally a spark-shell transcript; `toDF` relies on
// `spark.implicits._`, which spark-shell pre-imports. Add it explicitly in a
// standalone script.
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// First column: row index; second: column index; third: the matrix element.
val df = Seq(
  (0, 0, 1.1), (0, 1, 1.2), (0, 2, 1.3),
  (1, 0, 2.1), (1, 1, 2.2), (1, 2, 2.3),
  (2, 0, 3.1), (2, 1, 3.2), (2, 2, 3.3),
  (3, 0, 4.1), (3, 1, 4.2), (3, 2, 4.3)).toDF("row", "col", "value")
df.show()

// Build the RDD[MatrixEntry]. Row fields are Any, so parse via toString.
val entr = df.rdd.map { row =>
  val i = row(0).toString.toLong
  val j = row(1).toString.toLong
  val v = row(2).toString.toDouble
  MatrixEntry(i, j, v)
}

// Build the coordinate matrix: 4 rows x 3 columns.
val mat: CoordinateMatrix = new CoordinateMatrix(entr)
mat.numRows() // 4
mat.numCols() // 3
mat.entries.collect().take(10)
// Array(MatrixEntry(0,0,1.1), MatrixEntry(0,1,1.2), ..., MatrixEntry(3,0,4.1))

// CoordinateMatrix -> IndexedRowMatrix -> DataFrame. The index column (_1)
// carries the original row coordinate, so rows stay identifiable.
val t = mat.toIndexedRowMatrix().rows.map { r =>
  val v = r.vector
  (r.index, v(0), v(1), v(2))
}
t.toDF().show()
// +---+---+---+---+
// | _1| _2| _3| _4|
// +---+---+---+---+
// |  0|1.1|1.2|1.3|
// |  1|2.1|2.2|2.3|
// |  2|3.1|3.2|3.3|
// |  3|4.1|4.2|4.3|
// +---+---+---+---+

// CoordinateMatrix -> RowMatrix -> DataFrame. RowMatrix discards row indices,
// so the output row ORDER is not guaranteed — the original run printed the
// rows as 1.x, 3.x, 2.x, 4.x.
val t1 = mat.toRowMatrix().rows.map { v =>
  (v(0), v(1), v(2))
}
t1.toDF().show()