AI Study Group · Beginners Track | Issue 7: Unsupervised Learning Without a Teacher

Posted by user d8a171 on 2017-06-16

In recent years, artificial intelligence has made breakthrough advances, sparking strong interest among students and practitioners across many technical fields. For anyone entering AI, The Master Algorithm can fairly be called required reading; its importance needs little elaboration. The author, Pedro Domingos, appears to give only a broad tour of the main schools of thought in machine learning, yet he touches on nearly every important application, whether already realized or still to come. The book both lets beginners survey machine learning from a macro perspective and plants countless leads for motivated readers to study specific technical problems in depth, making it a rare guiding introduction. Its witty, humorous style also makes it a pleasure to read.

With this book as its text, the 機器之心 "AI Study Group · Beginners Track" will officially open soon!

How to Join

We invite all beginners interested in artificial intelligence and machine learning to join us. Through reading and discussing The Master Algorithm, you will gain a broad and comprehensive understanding of the history and technical principles of artificial intelligence.

Chapter #7 Review

【Chapter Summary】

This chapter discusses three algorithms that use analogical reasoning: nearest-neighbor, k-nearest-neighbor (KNN), and the support vector machine (SVM). KNN is an upgraded version of nearest-neighbor (the special case k = 1): a simple algorithm that stores all available cases and classifies a new case by a similarity measure over its k closest stored cases. KNN was already in use in statistical estimation and pattern recognition as a non-parametric technique by the early 1970s.
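To make the mechanism concrete, here is a minimal k-nearest-neighbor sketch in plain Python. The toy dataset and the choice of Euclidean distance as the similarity measure are illustrative assumptions, not examples from the book.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance stands in for the similarity measure.
    dists = sorted((math.dist(x, query), label) for x, label in train)
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Toy 2-D dataset: two loose clusters labeled "a" and "b".
train = [((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((0.1, 0.3), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.2), "b"), ((1.1, 0.8), "b")]

print(knn_predict(train, (0.15, 0.2)))  # -> "a"
print(knn_predict(train, (0.95, 1.0)))  # -> "b"
```

Setting k = 1 recovers plain nearest-neighbor; a larger k smooths out the influence of any single noisy neighbor.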

The support vector machine (SVM) framework is currently the most popular approach for "off-the-shelf" supervised learning: if you do not have any specialized prior knowledge of a domain, then the SVM is an excellent method. It was invented by the Soviet statistician Vladimir Vapnik, who developed the initial idea while working on his PhD thesis in the 1960s. In 1993, Bell Labs was interested in handwritten character recognition (HCR) using artificial neural networks (ANNs); Vapnik bet that an SVM would perform better than an ANN on the handwriting task.

Three properties make SVMs attractive:

  • 1) They construct a maximum-margin separator, a decision boundary with the largest possible distance to the example points, which helps produce good generalization;
  • 2) SVMs create a linear separating hyperplane, but they can also embed the data into a higher-dimensional space via kernel functions. A linear separator in that high-dimensional space is generally nonlinear in the original space, so the hypothesis space is greatly expanded over methods that use strictly linear representations;
  • 3) SVMs are a nonparametric method: they retain the training examples and potentially need to store them all. In practice, however, they often end up retaining only a small fraction of the original examples, sometimes as few as a small constant times the number of dimensions. Thus SVMs combine the advantages of nonparametric and parametric models: they are flexible enough to represent complex functions, yet resistant to overfitting.
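To illustrate points 1) and 2), the sketch below uses scikit-learn's SVC (assuming scikit-learn is installed) on data that is not linearly separable in the original space; the concentric-circles dataset is an illustrative choice, not the book's handwriting task.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line in 2-D can separate them.
X, y = make_circles(n_samples=200, factor=0.4, noise=0.05, random_state=0)

# The RBF kernel implicitly embeds the data in a higher-dimensional space
# where a linear maximum-margin separator exists.
linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

print("linear accuracy:", linear_svm.score(X, y))      # near chance (~0.5)
print("rbf accuracy:", rbf_svm.score(X, y))            # near 1.0
print("support vectors kept:", len(rbf_svm.support_))  # a fraction of the 200 points
```

The last line echoes point 3): only the retained support vectors, not the full training set, define the decision boundary.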

Week 7 Q & A Collection

  1. Q: How does k-nearest-neighbor improve on nearest-neighbor?
     A: By voting among k neighbors, it reduces the bias potentially caused by relying on a single nearest neighbor.
  2. Q: Why does rising dimensionality create problems for nearest-neighbor algorithms?
     A: Rising dimensionality means more features are involved in the distance computation, which increases the cost dramatically; worse, many of those features are actually irrelevant, and nearest-neighbor is not good at dealing with irrelevant features.
  3. Q: Explain the mechanism of an SVM.
     A: An SVM is a binary classification model that seeks the line or hyperplane that separates the data with the maximum margin.

Chapter #8 Preview

【Chapter Summary】

Chapter 8, "Learning Without a Teacher", presents the difficulties currently facing an audacious inquiry: what if a "robot baby" had to be taught the way a human being is? Through everyday examples of a person's life and the decisions he or she makes, the chapter asks which "type" of algorithm the brain uses to perform such actions, drawing analogies to the algorithms presented in the previous chapters. Each tribe's algorithm shows some traces in the brain; therefore the brain is the Master Algorithm, which has simply not been found yet.


【Important Sections】

  • Intro:
  •    Researchers develop machine learning algorithms based on theories of child learning from cognitive science.
  • Putting together birds of a feather:
  •    The author introduces clustering, a family of algorithms that group similar objects together, such as the expectation-maximization (EM) algorithm and k-means (a minimal k-means sketch follows this list).
  • Discovering the shape of the data:
  •    PCA and Isomap are two algorithms for dimensionality reduction (a PCA sketch also follows this list).
  • The hedonistic robot:
  •    The author discusses reinforcement learning, a technique in which the learner adjusts its actions according to the responses, or rewards, the environment returns (a toy Q-learning sketch follows this list).
  • Practice makes perfect:
  •    Chunking is a preeminent learning algorithm inspired by psychology.
  • Learning to relate:
  •    The chapter ends with the author explaining another potential killer technique: relational learning.
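As referenced under "Putting together birds of a feather", here is a minimal k-means sketch in Python/NumPy; the two-blob toy data and the fixed iteration count are illustrative assumptions. EM for mixture models alternates in the same way, but with soft cluster assignments instead of hard ones.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: alternately assign points to the nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: index of the closest centroid for every point.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: recompute each centroid as its cluster's mean.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Two obvious blobs, centered near (0, 0) and (5, 5).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centroids, _ = kmeans(X, k=2)
print(np.round(centroids, 1))  # roughly [[0. 0.], [5. 5.]] (order may vary)
```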
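For "Discovering the shape of the data", a correspondingly minimal PCA sketch; the SVD-based formulation is a standard one rather than anything specific to the book, and the near-linear toy data is invented for illustration. Isomap differs in that it preserves geodesic distances along a curved manifold rather than straight-line variance.

```python
import numpy as np

def pca(X, n_components=1):
    """Project X onto its top principal components via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthogonal directions, ordered by the variance they capture.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# 2-D points lying near the line y = 2x: one direction carries almost all variance.
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, 2 * t + rng.normal(scale=0.1, size=100)])

Z = pca(X, n_components=1)
print(Z.shape)  # (100, 1): the data reduced to its dominant direction
```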
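And for "The hedonistic robot", a toy tabular Q-learning sketch; the five-cell corridor environment and all hyperparameters are invented for illustration. The update rule is one way to operationalize the law of effect: actions followed by reward become more valued.

```python
import random

# A robot on a 1-D corridor of 5 cells; reward 1 only at the right end.
N_STATES, ACTIONS = 5, (-1, +1)           # actions: step left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1     # learning rate, discount, exploration

random.seed(0)
for _ in range(200):                      # episodes, each starting at the left end
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Nudge Q toward the reward plus the discounted best future value.
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)  # -> [1, 1, 1, 1]: the learned policy is "always move right"
```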


【Key Concepts】

  • Clustering
  •    k-means algorithm
  •    Naïve Bayes
  •    Expectation-maximization (EM) algorithm
  • Dimensionality reduction
  •    Principal component analysis (PCA)
  •    Isomap (isometric feature mapping)
  • Reinforcement learning
  •    The law of effect
  • Chunking
  •    A power law
  •    Soar
  • Relational learning
  •    Markov network


【Quiz】

  1. Give two applications each of Markov models and the k-means algorithm.
  2. Describe the situations in which the Isomap algorithm performs best for dimensionality reduction.
  3. Why does reinforcement learning with generalization often fail to settle on a stable solution?
  4. What is the difference between clustering and chunking?
