Ex 1: The digits dataset (handwritten digit recognition)


Last updated 6 years ago

Machine Learning Datasets / Example 1: The digits dataset

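This example shows how to work with one of the sample datasets bundled with scikit-learn. It is especially suitable for beginners and for classroom use.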

(1) Import the libraries and the built-in handwritten digits dataset

# This line is only needed in an IPython/Jupyter notebook; remove it in other environments
%matplotlib inline
from sklearn import datasets

import matplotlib.pyplot as plt

# Load the digits dataset
digits = datasets.load_digits()

# Plot the last image in the dataset (index -1)
plt.figure(1, figsize=(3, 3))
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()
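Outside a notebook, the same figure can be written to an image file instead of being shown inline. The following is a minimal sketch under stated assumptions: the `Agg` backend choice and the output file name `last_digit.png` are illustrative and not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window needed
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

# digits.images[-1] is the last image; digits.target[-1] is its label
fig = plt.figure(1, figsize=(3, 3))
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation="nearest")
plt.title("Label: %i" % digits.target[-1])
fig.savefig("last_digit.png")  # hypothetical output file name
plt.close(fig)
```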

(2) About the dataset

digits = datasets.load_digits() stores a dict-like object in digits. The following code lists its contents:

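for key, value in digits.items():
    try:
        # NumPy arrays report their dimensions through .shape
        print(key, value.shape)
    except AttributeError:
        # Entries without a shape (such as the DESCR string) are listed by name only
        print(key)

Output (recorded in a Python 2 session, where each pair prints as a tuple and array dimensions carry an L suffix):

('images', (1797L, 8L, 8L))
('data', (1797L, 64L))
('target_names', (10L,))
DESCR
('target', (1797L,))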

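| Output | Description |
| --- | --- |
| ('images', (1797L, 8L, 8L)) | 1797 images in total, each 8x8 pixels |
| ('data', (1797L, 64L)) | Each 8x8 image matrix flattened into a one-dimensional vector of 64 elements |
| ('target_names', (10L,)) | The labels of the 10 classes: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] |
| DESCR | A text description of the dataset |
| ('target', (1797L,)) | The digit that each of the 1797 images represents |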
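The relationship between images, data, and target described above can be checked directly. This is a small sketch rather than part of the original example; the variable names `flattened` and `counts` are illustrative.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Flattening each 8x8 image row by row reproduces the matching row of `data`
flattened = digits.images.reshape(len(digits.images), -1)
print(flattened.shape)                         # (1797, 64)
print(np.array_equal(flattened, digits.data))  # True

# Count how many images belong to each digit class (one entry per class)
counts = np.bincount(digits.target)
print(counts)
print(counts.sum())                            # 1797
```

Entries can be read either dict-style (digits['data']) or as attributes (digits.data); both forms return the same array.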

Next, we use the code below to inspect the data. The actual digit that each image represents is stored in digits.target.

images_and_labels = list(zip(digits.images, digits.target))
for index, (image, label) in enumerate(images_and_labels[:4]):
    plt.subplot(2, 4, index + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Training: %i' % label)

# Next, print the description (DESCR) that ships with this dataset
print(digits['DESCR'])
Optical Recognition of Handwritten Digits Data Set
===================================================

Notes
-----
Data Set Characteristics:
    :Number of Instances: 5620
    :Number of Attributes: 64
    :Attribute Information: 8x8 image of integer pixels in the range 0..16.
    :Missing Attribute Values: None
    :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
    :Date: July; 1998

This is a copy of the test set of the UCI ML hand-written digits datasets
http://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits

The data set contains images of hand-written digits: 10 classes where
each class refers to a digit.

Preprocessing programs made available by NIST were used to extract
normalized bitmaps of handwritten digits from a preprinted form. From a
total of 43 people, 30 contributed to the training set and different 13
to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of
4x4 and the number of on pixels are counted in each block. This generates
an input matrix of 8x8 where each element is an integer in the range
0..16. This reduces dimensionality and gives invariance to small
distortions.

For info on NIST preprocessing routines, see M. D. Garris, J. L. Blue, G.
T. Candela, D. L. Dimmick, J. Geist, P. J. Grother, S. A. Janet, and C.
L. Wilson, NIST Form-Based Handprint Recognition System, NISTIR 5469,
1994.

References
----------
  - C. Kaynak (1995) Methods of Combining Multiple Classifiers and Their
    Applications to Handwritten Digit Recognition, MSc Thesis, Institute of
    Graduate Studies in Science and Engineering, Bogazici University.
  - E. Alpaydin, C. Kaynak (1998) Cascading Classifiers, Kybernetika.
  - Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin.
    Linear dimensionalityreduction using relevance weighted LDA. School of
    Electrical and Electronic Engineering Nanyang Technological University.
    2005.
  - Claudio Gentile. A New Approximate Maximal Margin Classification
    Algorithm. NIPS. 2000.

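This description states that the dataset was created in 1998 by E. Alpaydin and C. Kaynak of the Department of Computer Engineering, Bogazici University, Istanbul, Turkey. The handwriting samples come from 43 people. Each digit was first captured as a 32x32 bitmap and then reduced to an 8x8 image by counting the "on" pixels in each nonoverlapping 4x4 block, so every pixel value is an integer from 0 to 16.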
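The block-counting step from the description can be sketched with NumPy. This is an illustration only, not the NIST preprocessing code: the input here is a random binary bitmap standing in for a scanned digit, and the names `bitmap`, `blocks`, and `counts` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical input: a random 32x32 binary bitmap standing in for a scanned digit
bitmap = rng.integers(0, 2, size=(32, 32))

# View the bitmap as an 8x8 grid of 4x4 blocks: blocks[i, a, j, b] == bitmap[4*i + a, 4*j + b]
blocks = bitmap.reshape(8, 4, 8, 4)

# Count the "on" pixels inside each block; each count lies between 0 and 16
counts = blocks.sum(axis=(1, 3))

print(counts.shape)  # (8, 8)
print(counts)
```

Each 4x4 block holds 16 pixels, which is why the resulting 8x8 image has 17 possible gray levels (0 through 16).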

(3) Related examples

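Several of the scikit-learn examples use this handwritten digits dataset. It is well suited to helping machine learning beginners understand the principles of classification and its more advanced applications.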

http://scikit-learn.org/stable/auto_examples/datasets/plot_digits_last_image.html
Classification
  • Ex 1: Recognizing hand-written digits
Feature Selection
  • Ex 2: Recursive Feature Elimination
  • Ex 3: Recursive Feature Elimination with Cross-Validation
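To give a feel for how the dataset is used in the classification examples listed above, here is a minimal sketch that trains a support vector classifier on the first half of the images and evaluates it on the second half. The classifier choice, the `gamma=0.001` setting, and the 50/50 unshuffled split are assumptions made for illustration; see the linked examples for the full walkthroughs.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# digits.data already holds the flattened 64-element feature vectors
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.5, shuffle=False
)

# Assumed settings: an RBF-kernel SVC with a small gamma
clf = svm.SVC(gamma=0.001)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print("Test accuracy: %.3f" % accuracy)
```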