The Basic Problems of Clustering and Two Common Algorithms

Posted by weixin_33895657 on 2019-01-25

Original URL: https://blog.csdn.net/weixin_33895657/article/details/87094010

Clustering Algorithms

I. The definition of clustering and its two basic problems

Data clustering is the task of partitioning a set of objects into groups such that the similarity of objects within each group is higher than that of objects across groups.

To cluster data, we need:

A distance measure (to quantify how similar or dissimilar two objects are)

An algorithm for clustering the data based on the distance measure

1. Distance measure

Point-to-point distance

Point-to-cluster distance: equivalent to the distance between the point and the cluster's center point

Cluster-to-cluster distance: equivalent to the distance between the two clusters' center points
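
The three distances above can be sketched in a few lines of Python. This is only an illustration under assumptions the text does not state: points are 2-D (x, y) tuples, the point-to-point distance is Euclidean, and a cluster's center is the coordinate-wise mean of its points. The function names are hypothetical.

```python
import math

def point_distance(p, q):
    """Euclidean distance between two 2-D points (assumed metric)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def cluster_center(cluster):
    """Center of a cluster, assumed here to be the mean of its points."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

def point_cluster_distance(p, cluster):
    """Point-to-cluster distance: distance from p to the cluster center."""
    return point_distance(p, cluster_center(cluster))

def cluster_distance(c1, c2):
    """Cluster-to-cluster distance: distance between the two centers."""
    return point_distance(cluster_center(c1), cluster_center(c2))
```

Reducing every cluster to its center keeps both cluster distances as cheap as a single point-to-point computation once the centers are known.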

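2. The closest-pair problem

Given a set of points P, find the two points in P that are closest to each other:

(Figure: the closest-pair problem)

(1) Brute-force algorithm: time complexity O(n^2), because it compares all n(n-1)/2 pairs of points

(Figure: SlowClosestPair)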
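
A minimal sketch of the brute-force approach, assuming 2-D points and Euclidean distance. The name slow_closest_pair and the (distance, i, j) return shape are choices made for this example, not taken from the original figure.

```python
import math
from itertools import combinations

def slow_closest_pair(points):
    """Return (distance, i, j) for the closest pair of points.

    Checks every pair of indices i < j, so it does O(n^2) distance
    computations. Returns (inf, -1, -1) for fewer than two points.
    """
    best = (math.inf, -1, -1)
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if d < best[0]:
            best = (d, i, j)
    return best
```

For example, slow_closest_pair([(0, 0), (5, 5), (1, 0), (9, 9)]) returns (1.0, 0, 2).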

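(2) Divide-and-conquer algorithm: time complexity O(n (log n)^2)

The recurrence for FastClosestPair:

T(n) = 2T(n/2) + f(n), where f(n) = O(n log n) is the running time of ClosestPairStrip

T(2) = O(1)

The recursion has O(log n) levels and each level does at most O(n log n) work in total, which gives the O(n (log n)^2) bound.

(Figure: FastClosestPair)

Time complexity of ClosestPairStrip: O(n log n)

(Figure: ClosestPairStrip)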
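
The divide-and-conquer scheme can be sketched as follows. This is a reconstruction under assumptions, not the original figures: points are 2-D tuples already sorted by x-coordinate, distances are Euclidean, and the strip compares each point with its next 7 neighbours in y-order (the classical safe bound). All function names are hypothetical.

```python
import math

def brute_force(points):
    """Closest pair by checking all pairs; used for the base case."""
    best = (math.inf, None, None)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d < best[0]:
                best = (d, points[i], points[j])
    return best

def closest_pair_strip(points, mid_x, width):
    """Closest pair among points closer than `width` to the line x = mid_x."""
    strip = [p for p in points if abs(p[0] - mid_x) < width]
    strip.sort(key=lambda p: p[1])  # the O(n log n) step
    best = (math.inf, None, None)
    for i in range(len(strip)):
        # Only a constant number of successors in y-order can be closer
        # than `width`; 7 is the classical bound.
        for j in range(i + 1, min(i + 8, len(strip))):
            d = math.dist(strip[i], strip[j])
            if d < best[0]:
                best = (d, strip[i], strip[j])
    return best

def fast_closest_pair(points):
    """Closest pair of `points`, which must be sorted by x-coordinate."""
    n = len(points)
    if n <= 3:
        return brute_force(points)
    mid = n // 2
    mid_x = (points[mid - 1][0] + points[mid][0]) / 2
    left = fast_closest_pair(points[:mid])
    right = fast_closest_pair(points[mid:])
    best = min(left, right, key=lambda t: t[0])
    # A closer pair can only straddle the dividing line inside the strip.
    strip_best = closest_pair_strip(points, mid_x, best[0])
    return min(best, strip_best, key=lambda t: t[0])
```

Callers would sort once up front, for example fast_closest_pair(sorted(points)). Re-sorting the strip by y at every level is what makes f(n) = O(n log n) in the recurrence.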

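II. Two common clustering algorithms

1. Hierarchical clustering

Idea: given the data and a target number of clusters k

Step 1: start by treating every point as its own cluster

Step 2: find the two closest clusters and merge them into one cluster

Step 3: repeat step 2 until only k clusters remain

(Figure: hierarchical clustering)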
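
The three steps above can be sketched as below. The sketch assumes 2-D points, Euclidean distance, and cluster distance measured between cluster centers (means). It finds the closest pair of clusters by brute force for brevity; a faster closest-pair routine could be swapped in. The names are hypothetical.

```python
import math

def center(cluster):
    """Mean of the points in a cluster (assumed definition of the center)."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

def hierarchical_clustering(points, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # step 1: one cluster per point
    while len(clusters) > k:          # step 3: repeat until k remain
        centers = [center(c) for c in clusters]
        # Step 2: brute-force search for the two closest cluster centers.
        _, i, j = min(
            (math.dist(centers[a], centers[b]), a, b)
            for a in range(len(centers))
            for b in range(a + 1, len(centers))
        )
        # j > i, so removing cluster j leaves index i valid.
        clusters[i].extend(clusters.pop(j))
    return clusters
```

With this brute-force search, each merge costs O(m^2) for m current clusters, so the sketch is only meant for small inputs.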

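2. K-means clustering

Idea: given the data, a target number of clusters k, and a number of iterations q

Step 1: initialize k centers (how should they be initialized?)

Step 2: assign every point to the center closest to it

Step 3: the points assigned to the same center form a cluster

Step 4: recompute the center of each cluster

Step 5: repeat steps 2-4 q times

Time complexity: O(qkn), since each of the q iterations compares all n points against the k centers

(Figure: K-means clustering)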
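
A minimal sketch of the five steps, assuming 2-D points and Euclidean distance. Because the text leaves initialization open, this example simply takes the initial centers as a parameter; that choice, the handling of empty clusters, and the function name are assumptions of the sketch.

```python
import math

def kmeans(points, initial_centers, q):
    """Run q iterations of k-means; return (clusters, centers)."""
    centers = list(initial_centers)  # step 1: caller supplies k centers
    clusters = []
    for _ in range(q):               # step 5: repeat q times
        clusters = [[] for _ in centers]
        for p in points:             # steps 2-3: assign to nearest center
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        for i, c in enumerate(clusters):  # step 4: recompute centers
            if c:  # an empty cluster keeps its previous center
                centers[i] = (sum(p[0] for p in c) / len(c),
                              sum(p[1] for p in c) / len(c))
    return clusters, centers
```

A possible call is kmeans(points, points[:k], q), which uses the first k points as starting centers; other initializations, such as a random sample, plug in the same way.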

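3. How do we choose a suitable k?

In practice we usually do not know how many clusters the data should form, so we run the clustering with several values of k and compare the quality of the resulting clusters. Cluster quality is measured by the error of a cluster:

(Figure: clustering error)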
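
The original formula was an image and is not recoverable here. One common definition, assumed for this sketch, takes the error of a cluster to be the sum of squared Euclidean distances from each of its points to the cluster center, and the error of a clustering to be the sum over its clusters. The function names are hypothetical.

```python
def cluster_error(cluster):
    """Sum of squared distances from each point to the cluster mean.

    This definition is an assumption, not taken from the source.
    """
    n = len(cluster)
    cx = sum(p[0] for p in cluster) / n
    cy = sum(p[1] for p in cluster) / n
    return sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in cluster)

def clustering_error(clusters):
    """Total error of a clustering: sum of the errors of its non-empty clusters."""
    return sum(cluster_error(c) for c in clusters if c)
```

Under this definition the total error never increases as k grows with optimal clusterings, so one would typically look for the k beyond which the error stops dropping sharply rather than the k with the smallest error.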

Reference: Coursera, Algorithmic Thinking, Rice University.
