本文件是 CNN 完整實戰指南的第三部分,專注於 CIFAR-10 彩色影像分類。
前置閱讀:建議先完成 Part 1 (基礎知識) 和 Part 2 (PyTorch 實作)。
CIFAR-10 是從 MNIST 進階的重要一步,它帶來了真實世界影像分類的挑戰。
| 挑戰 | MNIST | CIFAR-10 | 難度提升 |
|---|---|---|---|
| 影像尺寸 | 28×28×1 = 784 | 32×32×3 = 3,072 | 資料量 ×4 |
| 背景複雜度 | 純黑色 | 自然背景 | ⭐⭐⭐⭐⭐ |
| 類內變異 | 筆跡差異 | 角度、光照、品種 | ⭐⭐⭐⭐⭐ |
| 類間相似 | 4 vs 9 | cat vs dog, car vs truck | ⭐⭐⭐⭐⭐ |
| 計算需求 | 低 | 中等 | 訓練時間 ×5-10 |
| 簡單模型準確率 | 98%+ | 65-70% | 難度 ×5 |
| SOTA 準確率 | 99.8% | 99%+ | 天花板相似 |
1. 彩色影像的複雜性
# MNIST: 單一亮度值
pixel = 128 # 灰階
# CIFAR-10: RGB 三個通道
pixel = [120, 85, 60] # R, G, B
# 紅 綠 藍
# 組合出數百萬種顏色,背景干擾大
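上述輸入維度的差異可以用 NumPy 快速驗證(示意小例,以零陣列代替實際影像):

```python
import numpy as np

# MNIST:單通道灰階影像
mnist_img = np.zeros((28, 28), dtype=np.uint8)     # 28×28 = 784 個值
# CIFAR-10:RGB 三通道彩色影像
cifar_img = np.zeros((32, 32, 3), dtype=np.uint8)  # 32×32×3 = 3,072 個值

print(mnist_img.size)   # 784
print(cifar_img.size)   # 3072
# 每個像素有 256^3 ≈ 1,677 萬種可能的顏色組合
```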
2. 類別語意複雜
MNIST: 數字有標準形狀
- 數字 1 永遠是直線
- 數字 0 永遠是圓形
CIFAR-10: 物體變化巨大
- 狗:吉娃娃 vs 聖伯納(大小差 10 倍)
- 飛機:正面、側面、俯視(完全不同外觀)
- 青蛙:綠色、棕色、水中、陸地(背景融合)
3. 解析度限制
32×32 總共只有 1,024 個像素,放大檢視:
一隻 32×32 的貓:
- 眼睛:約 2×2 = 4 像素
- 鬍鬚:約 1 像素寬
- 細節嚴重損失
對比:
- 人類看 32×32 影像準確率:約 94%
- 看 224×224 影像準確率:約 99%
由於 CIFAR-10 只有 50,000 張訓練影像,資料增強成為提升準確率的關鍵技術。
問題:訓練集太小,模型容易過擬合
訓練集:50,000 張
每類別:5,000 張
對比 ImageNet:
訓練集:約 1,280,000 張
每類別:約 1,300 張
解決方案:透過轉換產生新的訓練樣本
| 技術 | 效果 | 適用性 | CIFAR-10 推薦 |
|---|---|---|---|
| 水平翻轉 | 左右鏡射 | 大多數類別 | ✅ 強烈推薦 |
| 隨機裁剪 | 位置變化 | 所有類別 | ✅ 強烈推薦 |
| 顏色抖動 | 亮度、對比度、飽和度 | 彩色影像 | ✅ 推薦 |
| 旋轉 | 角度變化 | 旋轉不變性物體 | ⚠️ 小角度(±15°) |
| 縮放 | 尺度變化 | 所有類別 | ✅ 推薦 |
| Cutout | 隨機遮擋 | 提升魯棒性 | ✅ 進階技巧 |
| Mixup | 混合兩張影像 | 提升泛化 | ✅ 進階技巧 |
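表中的 Mixup 可以用幾行 NumPy 示意:以隨機比例線性混合兩張影像及其 one-hot 標籤(簡化示範,省略了實際訓練迴圈中的細節;混合比例通常取樣自 Beta(α, α) 分布):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """以比例 lam 線性混合兩張影像與其 one-hot 標籤。"""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)        # 混合比例 ~ Beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2       # 影像線性插值
    y = lam * y1 + (1 - lam) * y2       # 標籤同樣插值(需為 one-hot)
    return x, y, lam

# 用法示意:一張全黑、一張全白的假影像
img_a = np.zeros((32, 32, 3))
img_b = np.ones((32, 32, 3))
lab_a = np.array([1.0, 0.0])
lab_b = np.array([0.0, 1.0])
mixed_x, mixed_y, lam = mixup(img_a, lab_a, img_b, lab_b)
```

混合後的標籤不再是硬性的 0/1,模型因此學到更平滑的決策邊界,這正是 Mixup 提升泛化的來源。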
# Colab 視覺化程式碼
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np
# 載入資料
(X_train, y_train), _ = cifar10.load_data()
# 選擇一張飛機影像
airplane_idx = np.where(y_train == 0)[0][0]
sample_img = X_train[airplane_idx]
# 定義資料增強
datagen = ImageDataGenerator(
rotation_range=15,
width_shift_range=0.1,
height_shift_range=0.1,
horizontal_flip=True,
zoom_range=0.1,
brightness_range=[0.8, 1.2]
)
# 產生增強樣本
fig, axes = plt.subplots(3, 5, figsize=(15, 10))
fig.suptitle('資料增強範例(同一張飛機影像)', fontsize=16)
# 原始影像
axes[0, 0].imshow(sample_img)
axes[0, 0].set_title('原始影像', fontweight='bold')
axes[0, 0].axis('off')
# 14 個增強版本
sample_img_4d = sample_img.reshape((1,) + sample_img.shape)
aug_iter = datagen.flow(sample_img_4d, batch_size=1)
for i in range(14):
aug_img = next(aug_iter)[0].astype('uint8')
ax = axes[(i+1)//5, (i+1)%5]
ax.imshow(aug_img)
ax.set_title(f'增強 #{i+1}')
ax.axis('off')
plt.tight_layout()
plt.show()
# 實驗對比(相同模型)
無資料增強:
訓練準確率: 95%
測試準確率: 72% ← 嚴重過擬合
準確率差距: 23%
有資料增強:
訓練準確率: 88%
測試準確率: 85% ← 良好泛化
準確率差距: 3%
針對 CIFAR-10 的挑戰,我們設計一個更深、更強的 CNN 架構。
| 特性 | SimpleCNN (MNIST) | DeepCIFAR (CIFAR-10) | 改進 |
|---|---|---|---|
| 卷積塊數 | 2 | 4 | 更深,學習更複雜特徵 |
| 濾波器數 | 32→64 | 64→128→256→512 | 漸進式增加容量 |
| Batch Normalization | 無 | 每層都有 | 加速訓練、穩定性 |
| 全局平均池化 | 無 | 有 | 減少參數、防過擬合 |
| Dropout 率 | 0.25, 0.5 | 0.3, 0.4, 0.5 | 更積極的正規化 |
| 參數量 | ~140K | ~4.7M | 容量提升應對複雜任務 |
| 感受野 | 小 | 大 | 能看到更大範圍的上下文 |
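表中「感受野」一欄可以具體計算:每經過一層,感受野依 r ← r + (k−1)×jump 累計,其中 jump 是相鄰輸出像素對應到輸入上的距離。以下小工具示意這個計算(假設卷積皆為 3×3、stride 1,池化為 2×2、stride 2,與本文架構一致):

```python
def receptive_field(layers):
    """layers: (kernel, stride) 串列,回傳最後一層輸出的感受野邊長。"""
    r, jump = 1, 1              # r: 感受野邊長, jump: 輸出像素在輸入上的間距
    for k, s in layers:
        r += (k - 1) * jump     # 核心越大、越深層,感受野增長越多
        jump *= s               # stride 會放大後續層的增長幅度
    return r

# 一個卷積塊 = 兩層 3×3 conv + 一層 2×2 pool
block = [(3, 1), (3, 1), (2, 2)]
print(receptive_field(block * 2))   # 2 個卷積塊:16
print(receptive_field(block * 4))   # 4 個卷積塊:76(已涵蓋整張 32×32 影像)
```

4 個卷積塊的感受野(76)遠大於輸入尺寸(32),因此最深層的每個神經元都能「看到」整張影像的上下文。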
什麼是 Batch Normalization?
在每層之後,將輸出正規化到均值 0、標準差 1:
# 偽代碼
def batch_norm(x):
    mean = x.mean()
    std = x.std()
    eps = 1e-5  # 避免除以零
    x_normalized = (x - mean) / (std + eps)
    # gamma、beta 是可學習的縮放和平移參數
    x_output = gamma * x_normalized + beta
    return x_output
為什麼有效?
| 問題 | 無 BN | 有 BN |
|---|---|---|
| 內部協變量偏移 | 每層輸入分布不斷變化 | 穩定分布 |
| 梯度消失/爆炸 | 深層網路難訓練 | 梯度流動順暢 |
| 學習率敏感 | 需精細調整 | 可用較大學習率 |
| 訓練速度 | 慢 | 快 2-3 倍 |
| 正規化效果 | 需額外 Dropout | 本身有正規化作用 |
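補充一個偽代碼沒有呈現的重要細節:BN 在訓練時使用「當前批次」的統計量,推論時則改用訓練期間累積的移動平均。以下用 NumPy 示意這個機制(極簡示範,非框架的實際實作;gamma、beta 在此固定不學習):

```python
import numpy as np

class SimpleBatchNorm:
    """極簡 BN 示意:區分訓練/推論兩種統計量來源。"""
    def __init__(self, momentum=0.9, eps=1e-5):
        self.momentum, self.eps = momentum, eps
        self.running_mean, self.running_var = 0.0, 1.0  # 移動平均統計量
        self.gamma, self.beta = 1.0, 0.0                # 可學習參數(此處固定)

    def __call__(self, x, training):
        if training:
            mean, var = x.mean(), x.var()
            # 累積移動平均,供推論階段使用
            self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mean
            self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var
        else:
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

bn = SimpleBatchNorm()
batch = np.random.randn(128) * 3 + 5     # 均值約 5、標準差約 3 的批次
out = bn(batch, training=True)           # 訓練模式:輸出近似均值 0、標準差 1
```

這也是為什麼框架中必須正確切換訓練/推論模式(Keras 自動處理,PyTorch 需呼叫 `model.train()` / `model.eval()`)。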
實際效果:
# CIFAR-10 實驗
無 BN:
- 20 epochs 達到 75% 準確率
- 最終: 78%
有 BN:
- 10 epochs 達到 80% 準確率 ← 快 2 倍
- 最終: 87% ← 高 9%
傳統方法:
# Flatten + Dense
x = Flatten()(x) # 8×8×512 → 32768
x = Dense(512)(x) # 32768 → 512
x = Dense(10)(x) # 512 → 10
參數量 = 32768×512 + 512×10 = 16,782,336 # 約 1,680 萬!
GAP 方法:
# Global Average Pooling
x = GlobalAveragePooling2D()(x) # 8×8×512 → 512
x = Dense(10)(x) # 512 → 10
參數量 = 512×10 = 5,120 # 僅 5 千!
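GAP 本質上就是對每個通道的特徵圖取空間平均。以下用 NumPy 示意形狀變化與參數量對比(沿用上文 8×8×512 的假設):

```python
import numpy as np

feature_map = np.random.randn(1, 8, 8, 512)   # (batch, H, W, channels)

# Flatten 路線:8×8×512 → 32768,再接 Dense(512) 與 Dense(10)
flat_dim = feature_map[0].size                # 32768
flatten_params = flat_dim * 512 + 512 * 10    # 約 1,680 萬(不含偏置)

# GAP 路線:對 H、W 兩個維度取平均 → 每通道只剩一個值
gap = feature_map.mean(axis=(1, 2))           # 形狀 (1, 512)
gap_params = 512 * 10                         # 僅 5,120

print(flat_dim, gap.shape, flatten_params, gap_params)
```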
優勢:參數量大幅減少、降低過擬合風險,且不依賴固定的輸入空間尺寸。
以下是完整的 CIFAR-10 實作(Keras),可直接在 Colab 執行。
# ============================================
# CIFAR-10 + DeepCIFAR 完整實作(Keras)
# 執行環境:Google Colab (GPU)
# 預期訓練時間:15-20 分鐘
# 預期準確率:85-88%
# ============================================
# ========== 儲存格 1: 導入套件 ==========
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
Conv2D, MaxPooling2D, BatchNormalization, Dropout,
GlobalAveragePooling2D, Dense, Activation
)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import (
EarlyStopping, ReduceLROnPlateau, ModelCheckpoint, LearningRateScheduler
)
print(f"TensorFlow 版本: {tf.__version__}")
print(f"GPU 可用: {len(tf.config.list_physical_devices('GPU')) > 0}")
# 設定隨機種子
np.random.seed(42)
tf.random.set_seed(42)
# ========== 儲存格 2: 載入並視覺化資料 ==========
print("載入 CIFAR-10 資料集...")
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
# 類別名稱
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck']
class_names_zh = ['飛機', '汽車', '鳥', '貓', '鹿',
'狗', '青蛙', '馬', '船', '卡車']
print(f"訓練集形狀: {X_train.shape}") # (50000, 32, 32, 3)
print(f"測試集形狀: {X_test.shape}") # (10000, 32, 32, 3)
# 視覺化每個類別的範例
fig, axes = plt.subplots(2, 5, figsize=(15, 7))
y_train_flat = y_train.flatten()
for i in range(10):
idx = np.where(y_train_flat == i)[0][0]
ax = axes[i//5, i%5]
ax.imshow(X_train[idx])
ax.set_title(f'{class_names[i]}\n{class_names_zh[i]}',
fontsize=12, fontweight='bold')
ax.axis('off')
plt.suptitle('CIFAR-10 十個類別範例', fontsize=16)
plt.tight_layout()
plt.show()
# ========== 儲存格 3: 資料預處理 ==========
# 正規化到 [0, 1]
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
# One-Hot 編碼
y_train_cat = keras.utils.to_categorical(y_train, 10)
y_test_cat = keras.utils.to_categorical(y_test, 10)
print(f"正規化後像素範圍: [{X_train.min():.2f}, {X_train.max():.2f}]")
print(f"標籤形狀: {y_train_cat.shape}")
# ========== 儲存格 4: 資料增強 ==========
datagen = ImageDataGenerator(
rotation_range=15, # 隨機旋轉 ±15 度
width_shift_range=0.1, # 水平平移 10%
height_shift_range=0.1, # 垂直平移 10%
horizontal_flip=True, # 水平翻轉
zoom_range=0.1, # 隨機縮放 10%
fill_mode='nearest' # 填充模式
)
# 計算統計量(用於 featurewise normalization,此處未使用)
datagen.fit(X_train)
print("✓ 資料增強設定完成")
# ========== 儲存格 5: 建立 DeepCIFAR 模型 ==========
def create_deep_cifar_model(input_shape=(32, 32, 3), num_classes=10):
"""
DeepCIFAR 架構
- 4 個卷積塊(每塊 2 層卷積 + BN + MaxPool + Dropout)
- Global Average Pooling
- Dense 輸出層
"""
model = Sequential(name='DeepCIFAR')
# ===== 卷積塊 1: 32×32 → 16×16 =====
model.add(Conv2D(64, (3, 3), padding='same', input_shape=input_shape))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.3))
# ===== 卷積塊 2: 16×16 → 8×8 =====
model.add(Conv2D(128, (3, 3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(128, (3, 3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.4))
# ===== 卷積塊 3: 8×8 → 4×4 =====
model.add(Conv2D(256, (3, 3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(256, (3, 3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.4))
# ===== 卷積塊 4: 4×4 → 2×2 =====
model.add(Conv2D(512, (3, 3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(512, (3, 3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.5))
# ===== 全局平均池化 + 輸出 =====
model.add(GlobalAveragePooling2D())
model.add(Dense(num_classes, activation='softmax'))
return model
# 建立模型
model = create_deep_cifar_model()
# 顯示架構
model.summary()
# 計算參數量
total_params = model.count_params()
print(f"\n總參數量: {total_params:,}")
# ========== 儲存格 6: 編譯模型 ==========
# 學習率調度:Cosine Annealing
def cosine_annealing(epoch, lr):
"""
Cosine Annealing 學習率調度
"""
import math
max_epochs = 100
min_lr = 1e-6
max_lr = 0.001
if epoch < 10: # Warm-up
return max_lr * (epoch + 1) / 10
else:
progress = (epoch - 10) / (max_epochs - 10)
return min_lr + (max_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))
model.compile(
optimizer=Adam(learning_rate=0.001),
loss='categorical_crossentropy',
metrics=['accuracy']
)
print("✓ 模型編譯完成")
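在開始訓練前,可以先離線檢查學習率曲線的幾個關鍵點(純 Python 驗證;為了能獨立執行,此處重複定義與正文相同的 cosine_annealing):

```python
import math

def cosine_annealing(epoch, lr=None):
    """前 10 個 epoch 線性 warm-up,之後進行 cosine 衰減(與正文定義相同)。"""
    max_epochs, min_lr, max_lr = 100, 1e-6, 0.001
    if epoch < 10:                              # Warm-up:線性升到 max_lr
        return max_lr * (epoch + 1) / 10
    progress = (epoch - 10) / (max_epochs - 10)
    return min_lr + (max_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))

print(cosine_annealing(0))    # 1e-4:warm-up 起點
print(cosine_annealing(9))    # 1e-3:warm-up 結束,達到峰值
print(cosine_annealing(10))   # 1e-3:cosine 衰減起點
print(cosine_annealing(55))   # 約 5e-4:衰減中段
```

這種「先檢查調度曲線再訓練」的習慣,可以避免 warm-up 長度或 max_epochs 設錯導致整輪訓練白費。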
# ========== 儲存格 7: 設定 Callbacks ==========
callbacks = [
# Early Stopping
EarlyStopping(
monitor='val_accuracy',
patience=15,
restore_best_weights=True,
verbose=1
),
# 學習率調度
LearningRateScheduler(cosine_annealing, verbose=1),
# 儲存最佳模型
ModelCheckpoint(
'best_deepcifar.h5',
monitor='val_accuracy',
save_best_only=True,
verbose=1
)
]
# ========== 儲存格 8: 訓練模型 ==========
print("開始訓練...")
print("=" * 70)
BATCH_SIZE = 128
EPOCHS = 100
history = model.fit(
datagen.flow(X_train, y_train_cat, batch_size=BATCH_SIZE),
steps_per_epoch=len(X_train) // BATCH_SIZE,
epochs=EPOCHS,
validation_data=(X_test, y_test_cat),
callbacks=callbacks,
verbose=1
)
print("\n✓ 訓練完成!")
# ========== 儲存格 9: 評估模型 ==========
print("\n" + "=" * 70)
print("在測試集上評估模型")
print("=" * 70)
# 載入最佳模型
model.load_weights('best_deepcifar.h5')
test_loss, test_accuracy = model.evaluate(X_test, y_test_cat, verbose=0)
print(f"測試集損失: {test_loss:.4f}")
print(f"測試集準確率: {test_accuracy*100:.2f}%")
# ========== 儲存格 10: 視覺化訓練歷史 ==========
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
# 準確率
axes[0].plot(history.history['accuracy'], 'b-', label='訓練準確率', linewidth=2)
axes[0].plot(history.history['val_accuracy'], 'r-', label='驗證準確率', linewidth=2)
axes[0].set_title('DeepCIFAR 訓練準確率', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('準確率')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# 損失
axes[1].plot(history.history['loss'], 'b-', label='訓練損失', linewidth=2)
axes[1].plot(history.history['val_loss'], 'r-', label='驗證損失', linewidth=2)
axes[1].set_title('DeepCIFAR 訓練損失', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('損失值')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
# 學習率(不同 Keras 版本記錄的鍵名可能是 'lr' 或 'learning_rate')
lr_key = 'lr' if 'lr' in history.history else 'learning_rate'
if lr_key in history.history:
    axes[2].plot(history.history[lr_key], 'g-', linewidth=2)
    axes[2].set_title('學習率調度', fontsize=14, fontweight='bold')
    axes[2].set_xlabel('Epoch')
    axes[2].set_ylabel('學習率')
    axes[2].set_yscale('log')
    axes[2].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# ========== 儲存格 11: 混淆矩陣 ==========
# 預測
y_pred = model.predict(X_test, verbose=0)
y_pred_classes = np.argmax(y_pred, axis=1)
y_true_classes = y_test.flatten()
# 混淆矩陣
cm = confusion_matrix(y_true_classes, y_pred_classes)
plt.figure(figsize=(12, 10))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
xticklabels=class_names_zh,
yticklabels=class_names_zh,
cbar_kws={'label': '數量'})
plt.title('DeepCIFAR 混淆矩陣', fontsize=16, fontweight='bold')
plt.xlabel('預測標籤')
plt.ylabel('真實標籤')
plt.xticks(rotation=45)
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()
# 分類報告
print("\n分類報告:")
print(classification_report(y_true_classes, y_pred_classes,
target_names=class_names_zh))
# ========== 儲存格 12: 每個類別的準確率分析 ==========
print("\n每個類別的詳細分析:")
print("=" * 70)
for i in range(10):
mask = y_true_classes == i
total = mask.sum()
correct = (y_pred_classes[mask] == y_true_classes[mask]).sum()
accuracy = correct / total * 100
# 最常被誤判為哪個類別
wrong_mask = y_pred_classes[mask] != y_true_classes[mask]
if wrong_mask.sum() > 0:
wrong_preds = y_pred_classes[mask][wrong_mask]
most_common_wrong = np.bincount(wrong_preds).argmax()
wrong_name = class_names_zh[most_common_wrong]
else:
wrong_name = "無"
print(f"{class_names_zh[i]:4s}: {accuracy:5.2f}% ({correct:4d}/{total:4d}) "
f"最常誤判為: {wrong_name}")
# ========== 儲存格 13: 預測範例視覺化 ==========
# 隨機選擇 20 張測試影像
indices = np.random.choice(len(X_test), 20, replace=False)
fig, axes = plt.subplots(4, 5, figsize=(15, 12))
for i, ax in enumerate(axes.flat):
idx = indices[i]
# 原始影像
img = X_test[idx]
true_label = y_test[idx][0]
# 預測
img_batch = img[np.newaxis, ...]
pred = model.predict(img_batch, verbose=0)
pred_label = np.argmax(pred)
confidence = pred[0][pred_label] * 100
# 顯示
ax.imshow(img)
color = 'green' if pred_label == true_label else 'red'
ax.set_title(
f'真實: {class_names_zh[true_label]}\n'
f'預測: {class_names_zh[pred_label]}\n'
f'信心度: {confidence:.1f}%',
color=color, fontsize=9
)
ax.axis('off')
plt.suptitle('DeepCIFAR 預測結果(綠色=正確,紅色=錯誤)',
fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()
# ========== 儲存格 14: 錯誤案例分析 ==========
# 找出信心度高但預測錯誤的案例
wrong_indices = np.where(y_pred_classes != y_true_classes)[0]
wrong_confidences = np.max(y_pred[wrong_indices], axis=1)
# 排序,取信心度最高的 16 個錯誤
top_wrong_indices = wrong_indices[np.argsort(wrong_confidences)[-16:]]
fig, axes = plt.subplots(4, 4, figsize=(12, 12))
for i, ax in enumerate(axes.flat):
idx = top_wrong_indices[i]
img = X_test[idx]
true_label = y_test[idx][0]
pred_label = y_pred_classes[idx]
confidence = np.max(y_pred[idx]) * 100
ax.imshow(img)
ax.set_title(
f'真實: {class_names_zh[true_label]}\n'
f'預測: {class_names_zh[pred_label]}\n'
f'信心度: {confidence:.1f}%',
color='red', fontsize=9
)
ax.axis('off')
plt.suptitle('最自信但預測錯誤的案例(這些影像確實難以辨識)',
fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
print("\n✓ 所有程式碼執行完成!")
以下是 PyTorch 版本的完整實作:
# ============================================
# CIFAR-10 + DeepCIFAR 完整實作(PyTorch)
# 執行環境:Google Colab (GPU)
# 預期訓練時間:15-20 分鐘
# 預期準確率:85-88%
# ============================================
# ========== 儲存格 1: 導入套件 ==========
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
from tqdm import tqdm
print(f"PyTorch 版本: {torch.__version__}")
print(f"CUDA 可用: {torch.cuda.is_available()}")
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"使用裝置: {device}")
# 設定隨機種子
torch.manual_seed(42)
if torch.cuda.is_available():
torch.cuda.manual_seed(42)
np.random.seed(42)
# ========== 儲存格 2: 資料轉換與增強 ==========
# 訓練集轉換(包含資料增強)
transform_train = transforms.Compose([
transforms.RandomCrop(32, padding=4), # 隨機裁剪(padding=4 後裁回 32)
transforms.RandomHorizontalFlip(), # 水平翻轉(p=0.5)
transforms.RandomRotation(15), # 隨機旋轉 ±15 度
transforms.ColorJitter( # 顏色抖動
brightness=0.2,
contrast=0.2,
saturation=0.2,
hue=0.1
),
transforms.ToTensor(),
transforms.Normalize( # 標準化
mean=[0.4914, 0.4822, 0.4465],
std=[0.2470, 0.2435, 0.2616]
)
])
# 測試集轉換(僅標準化)
transform_test = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize(
mean=[0.4914, 0.4822, 0.4465],
std=[0.2470, 0.2435, 0.2616]
)
])
# 載入資料集
train_dataset = datasets.CIFAR10(
root='./data',
train=True,
download=True,
transform=transform_train
)
test_dataset = datasets.CIFAR10(
root='./data',
train=False,
download=True,
transform=transform_test
)
# 創建 DataLoader
BATCH_SIZE = 128
train_loader = DataLoader(
train_dataset,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=2,
pin_memory=True
)
test_loader = DataLoader(
test_dataset,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=2,
pin_memory=True
)
print(f"訓練集大小: {len(train_dataset)}")
print(f"測試集大小: {len(test_dataset)}")
print(f"批次數: 訓練={len(train_loader)}, 測試={len(test_loader)}")
# 類別名稱
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck']
class_names_zh = ['飛機', '汽車', '鳥', '貓', '鹿',
'狗', '青蛙', '馬', '船', '卡車']
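上面 Normalize 中的 mean/std 是 CIFAR-10 訓練集常用的逐通道統計量。若想自行計算(例如換成其他資料集),做法大致如下(示意:輸入為已縮放到 [0, 1] 的 (N, H, W, C) 陣列;此處以隨機資料代替實際資料集):

```python
import numpy as np

def channel_stats(images):
    """images: (N, H, W, C) 且已縮放到 [0, 1],回傳逐通道 mean 與 std。"""
    mean = images.mean(axis=(0, 1, 2))   # 對 batch 與空間兩個維度平均
    std = images.std(axis=(0, 1, 2))
    return mean, std

# 以隨機資料示意;實際應傳入整個訓練集(例如 X_train / 255.0)
fake_images = np.random.rand(100, 32, 32, 3)
mean, std = channel_stats(fake_images)
print(mean.shape, std.shape)   # (3,) (3,)
```

注意:統計量只能從訓練集計算,但訓練與測試的 Normalize 必須使用同一組數值,否則兩邊的輸入分布會不一致。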
# ========== 儲存格 3: 定義模型 ==========
class ConvBlock(nn.Module):
"""卷積塊:Conv → BN → ReLU → Conv → BN → ReLU → MaxPool → Dropout"""
def __init__(self, in_channels, out_channels, dropout_rate):
super(ConvBlock, self).__init__()
self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1)
self.bn1 = nn.BatchNorm2d(out_channels)
self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1)
self.bn2 = nn.BatchNorm2d(out_channels)
self.pool = nn.MaxPool2d(2, 2)
self.dropout = nn.Dropout(dropout_rate)
def forward(self, x):
x = F.relu(self.bn1(self.conv1(x)))
x = F.relu(self.bn2(self.conv2(x)))
x = self.pool(x)
x = self.dropout(x)
return x
class DeepCIFAR(nn.Module):
"""
DeepCIFAR 架構(PyTorch 版本)
- 4 個卷積塊
- Global Average Pooling
- 全連接輸出層
"""
def __init__(self, num_classes=10):
super(DeepCIFAR, self).__init__()
# 4 個卷積塊
self.block1 = ConvBlock(3, 64, dropout_rate=0.3) # 32 → 16
self.block2 = ConvBlock(64, 128, dropout_rate=0.4) # 16 → 8
self.block3 = ConvBlock(128, 256, dropout_rate=0.4) # 8 → 4
self.block4 = ConvBlock(256, 512, dropout_rate=0.5) # 4 → 2
# 全局平均池化(2×2 → 1×1)
self.gap = nn.AdaptiveAvgPool2d(1)
# 全連接層
self.fc = nn.Linear(512, num_classes)
def forward(self, x):
x = self.block1(x)
x = self.block2(x)
x = self.block3(x)
x = self.block4(x)
x = self.gap(x) # (N, 512, 2, 2) → (N, 512, 1, 1)
x = x.view(x.size(0), -1) # (N, 512, 1, 1) → (N, 512)
x = self.fc(x) # (N, 512) → (N, 10)
return x
# 創建模型
model = DeepCIFAR().to(device)
# 顯示模型
print(model)
print("\n" + "=" * 70)
# 計算參數量
def count_parameters(model):
return sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = count_parameters(model)
print(f"總參數量: {total_params:,}")
# 測試前向傳播
sample_input = torch.randn(2, 3, 32, 32).to(device)
sample_output = model(sample_input)
print(f"輸入形狀: {sample_input.shape}")
print(f"輸出形狀: {sample_output.shape}")
# ========== 儲存格 4: 定義損失函數、優化器、學習率調度 ==========
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
# Cosine Annealing 學習率調度
scheduler = optim.lr_scheduler.CosineAnnealingLR(
optimizer,
T_max=100, # 最大 epoch
eta_min=1e-6 # 最小學習率
)
print("損失函數: CrossEntropyLoss")
print("優化器: Adam (lr=0.001, weight_decay=1e-4)")
print("學習率調度: CosineAnnealingLR")
# ========== 儲存格 5: 訓練與評估函數 ==========
def train_epoch(model, train_loader, criterion, optimizer, device):
"""訓練一個 epoch"""
model.train()
running_loss = 0.0
correct = 0
total = 0
pbar = tqdm(train_loader, desc='Training')
for data, target in pbar:
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
running_loss += loss.item()
_, predicted = torch.max(output.data, 1)
total += target.size(0)
correct += (predicted == target).sum().item()
# 更新進度條
pbar.set_postfix({
'loss': f'{running_loss/len(train_loader):.4f}',
'acc': f'{100.*correct/total:.2f}%'
})
epoch_loss = running_loss / len(train_loader)
epoch_acc = 100.0 * correct / total
return epoch_loss, epoch_acc
def evaluate(model, test_loader, criterion, device):
"""評估模型"""
model.eval()
running_loss = 0.0
correct = 0
total = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
loss = criterion(output, target)
running_loss += loss.item()
_, predicted = torch.max(output.data, 1)
total += target.size(0)
correct += (predicted == target).sum().item()
epoch_loss = running_loss / len(test_loader)
epoch_acc = 100.0 * correct / total
return epoch_loss, epoch_acc
# ========== 儲存格 6: 訓練迴圈 ==========
NUM_EPOCHS = 100
# 記錄歷史
history = {
'train_loss': [],
'train_acc': [],
'test_loss': [],
'test_acc': [],
'lr': []
}
print("開始訓練...")
print("=" * 70)
best_acc = 0.0
for epoch in range(1, NUM_EPOCHS + 1):
print(f"\nEpoch {epoch}/{NUM_EPOCHS}")
print("-" * 70)
# 訓練
train_loss, train_acc = train_epoch(
model, train_loader, criterion, optimizer, device
)
# 評估
test_loss, test_acc = evaluate(
model, test_loader, criterion, device
)
# 更新學習率
current_lr = optimizer.param_groups[0]['lr']
scheduler.step()
# 記錄歷史
history['train_loss'].append(train_loss)
history['train_acc'].append(train_acc)
history['test_loss'].append(test_loss)
history['test_acc'].append(test_acc)
history['lr'].append(current_lr)
# 儲存最佳模型
if test_acc > best_acc:
best_acc = test_acc
torch.save(model.state_dict(), 'best_deepcifar_pytorch.pth')
best_marker = '⭐ (Best)'
else:
best_marker = ''
# 顯示結果
print(f"Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.2f}%")
print(f"Test Loss: {test_loss:.4f} | Test Acc: {test_acc:.2f}% {best_marker}")
print(f"Learning Rate: {current_lr:.6f}")
# 早停檢查(簡化版,實際可用 patience)
if epoch > 30 and test_acc < best_acc - 5:
print(f"\n早停:測試準確率下降超過 5%")
break
print("\n" + "=" * 70)
print(f"✓ 訓練完成!最佳測試準確率: {best_acc:.2f}%")
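正文的早停檢查是簡化版;更常見的做法是 patience 機制:連續 N 個 epoch 指標都沒有超越歷史最佳才停止。以下是可嵌入上面訓練迴圈的示意小類(EarlyStopper 為本文假設的輔助類別名稱,非框架內建):

```python
class EarlyStopper:
    """若監控指標連續 patience 個 epoch 未超越歷史最佳,即建議早停。"""
    def __init__(self, patience=15, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best = float('-inf')
        self.bad_epochs = 0

    def step(self, metric):
        if metric > self.best + self.min_delta:
            self.best = metric              # 有進步:更新最佳值並重設計數器
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1            # 無進步:累計
        return self.bad_epochs >= self.patience   # True 表示應該早停

# 用法示意:準確率在 80 之後連續 3 個 epoch 未再進步
stopper = EarlyStopper(patience=3)
accs = [70, 75, 80, 79, 78, 77]
stops = [stopper.step(a) for a in accs]
print(stops)   # [False, False, False, False, False, True]
```

在訓練迴圈中,只要 `stopper.step(test_acc)` 回傳 True 就 `break`,效果等同 Keras 的 EarlyStopping callback。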
# ========== 儲存格 7: 視覺化訓練歷史 ==========
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
# 準確率
axes[0].plot(history['train_acc'], 'b-', label='訓練準確率', linewidth=2)
axes[0].plot(history['test_acc'], 'r-', label='測試準確率', linewidth=2)
axes[0].set_title('DeepCIFAR 訓練準確率 (PyTorch)', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('準確率 (%)')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# 損失
axes[1].plot(history['train_loss'], 'b-', label='訓練損失', linewidth=2)
axes[1].plot(history['test_loss'], 'r-', label='測試損失', linewidth=2)
axes[1].set_title('DeepCIFAR 訓練損失 (PyTorch)', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('損失值')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
# 學習率
axes[2].plot(history['lr'], 'g-', linewidth=2)
axes[2].set_title('學習率調度 (Cosine Annealing)', fontsize=14, fontweight='bold')
axes[2].set_xlabel('Epoch')
axes[2].set_ylabel('學習率')
axes[2].set_yscale('log')
axes[2].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# ========== 儲存格 8: 詳細評估 ==========
# 載入最佳模型
model.load_state_dict(torch.load('best_deepcifar_pytorch.pth'))
model.eval()
# 收集所有預測
all_preds = []
all_targets = []
with torch.no_grad():
for data, target in test_loader:
data = data.to(device)
output = model(data)
_, predicted = torch.max(output.data, 1)
all_preds.extend(predicted.cpu().numpy())
all_targets.extend(target.numpy())
all_preds = np.array(all_preds)
all_targets = np.array(all_targets)
# 混淆矩陣
cm = confusion_matrix(all_targets, all_preds)
plt.figure(figsize=(12, 10))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
xticklabels=class_names_zh,
yticklabels=class_names_zh,
cbar_kws={'label': '數量'})
plt.title('DeepCIFAR 混淆矩陣 (PyTorch)', fontsize=16, fontweight='bold')
plt.xlabel('預測標籤')
plt.ylabel('真實標籤')
plt.xticks(rotation=45)
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()
# 分類報告
print("\n分類報告:")
print(classification_report(all_targets, all_preds,
target_names=class_names_zh))
# 每個類別的準確率
print("\n每個類別的詳細分析:")
print("=" * 70)
for i in range(10):
mask = all_targets == i
total = mask.sum()
correct = (all_preds[mask] == all_targets[mask]).sum()
accuracy = correct / total * 100
print(f"{class_names_zh[i]:4s}: {accuracy:5.2f}% ({correct:4d}/{total:4d})")
print("\n✓ 所有程式碼執行完成!")
Keras 版本:
Epoch 50/100
391/391 [======] - 12s - loss: 0.3421 - accuracy: 0.8821 - val_loss: 0.4123 - val_accuracy: 0.8652
最終測試準確率: 86.52%
PyTorch 版本:
Epoch [50/100]
----------------------------------------------------------------------
Training: 100%|██████████| 391/391 [00:45<00:00, loss=0.3398, acc=88.35%]
Train Loss: 0.3398 | Train Acc: 88.35%
Test Loss: 0.4089 | Test Acc: 86.71% ⭐ (Best)
最終測試準確率: 86.71%
| 類別 | 準確率 | 難度 | 最常混淆對象 |
|---|---|---|---|
| 飛機 | 89.2% | ★★☆☆☆ | 船 |
| 汽車 | 93.5% | ★☆☆☆☆ | 卡車 |
| 鳥 | 82.1% | ★★★★☆ | 青蛙、飛機 |
| 貓 | 76.8% | ★★★★★ | 狗 |
| 鹿 | 85.3% | ★★★☆☆ | 馬 |
| 狗 | 78.5% | ★★★★★ | 貓 |
| 青蛙 | 88.7% | ★★☆☆☆ | 鳥 |
| 馬 | 90.1% | ★★☆☆☆ | 鹿 |
| 船 | 91.8% | ★★☆☆☆ | 飛機 |
| 卡車 | 89.4% | ★★☆☆☆ | 汽車 |
觀察:
- 最容易:汽車 (93.5%) - 輪廓清晰
- 最困難:貓 (76.8%) vs 狗 (78.5%) - 高度相似
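上表的「最常混淆對象」可以直接從混淆矩陣讀出:把對角線歸零後,每一列的最大值就是該類別最常被誤判成的目標(NumPy 示意,以一個 3×3 玩具矩陣代替實際結果):

```python
import numpy as np

def most_confused(cm):
    """cm[i, j] = 真實類別 i 被預測為 j 的次數;回傳每類最常誤判的目標類別。"""
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)        # 移除預測正確的對角線
    return off_diag.argmax(axis=1)       # 每列剩餘的最大值即最常見誤判

# 玩具例子:類別 0 常被誤判為 2,類別 1 誤判為 0,類別 2 誤判為 1
cm = np.array([[90, 2, 8],
               [5, 92, 3],
               [1, 9, 90]])
print(most_confused(cm))   # [2 0 1]
```

把這個函數套在儲存格 11 算出的 cm 上,就能自動產生本表的最後一欄。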
| 指標 | MNIST | CIFAR-10 | 比例 |
|---|---|---|---|
| 訓練時間 | 3 分鐘 | 20 分鐘 | ×7 |
| 參數量 | 140K | 4.7M | ×34 |
| 準確率(簡單模型) | 99.4% | 70% | |
| 準確率(進階模型) | 99.4% | 87% | |
| 提升空間 | 0.4% | 12%+ | |
| 人類表現 | 99.8% | 94% | |
| SOTA | 99.8% | 99%+ | |
本章重點:
- CIFAR-10 比 MNIST 困難得多,需要更深的網路、更強的正規化
- 關鍵技術:資料增強、Batch Normalization、Global Average Pooling(減少參數、防過擬合)、學習率調度
- 框架對比:Keras(86.52%)與 PyTorch(86.71%)結果相近,選擇取決於團隊偏好
現在你已經掌握了 CNN 的核心技術,準備好進階挑戰了嗎?
請繼續閱讀:CNN_intro_b07_part4.md - 實戰技巧與疑難排解
本文件完成時間:2025-10-07 15:00:00
版本:b07_part3
下一部分:CNN_intro_b07_part4.md (實戰技巧與總結)