This repository requires tensorflow-gpu-1.5.0 or another compatible version of tensorflow.
You can simply run it by typing:
python cifar10.py
In typical convolutional neural networks, downsampling is almost ubiquitous; it used to be done with max pooling, and nowadays it is often done with strided convolutions. Take the VGG network as an example, which uses a great many max-pooling layers, as shown below:
You can see that many 2x2 poolings are used in the network. Conversely, when doing semantic segmentation or object detection, we also use quite a lot of upsampling, or transposed convolution, as in the typical FCN structure (note the deconvolutions, marked in red).

Classification networks used to end in a few fully connected (fc) layers. It was later found that fc layers carry too many parameters and generalize poorly, so they were replaced with global average pooling (GAP), which first appeared in Network in Network. GAP aggregates the spatial features directly into one scalar per channel. Since then, classification networks have followed this paradigm (here ReLU is folded into Conv and Deconv, and shortcuts are ignored):

Input-->Conv-->DownSample_x_2-->Conv-->DownSample_x_2-->Conv-->DownSample_x_2-->GAP-->Conv1x1-->Softmax-->Output
And semantic segmentation networks follow this paradigm:
Input-->Conv-->DownSample_x_2-->Conv-->DownSample_x_2-->Conv-->DownSample_x_2-->Deconv_x_2-->Deconv_x_2-->Deconv_x_2-->Softmax-->Output
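As a concrete illustration, here is a minimal, hypothetical tf.keras sketch of this downsample-then-upsample paradigm; the layer widths and class count are placeholders, not taken from any specific network discussed here:

```python
from tensorflow import keras
from tensorflow.keras import layers

def fcn_paradigm(input_shape=(32, 32, 3), num_classes=10):
    """Conventional segmentation paradigm: stride-2 downsampling stages
    followed by stride-2 transposed convolutions (Deconv_x_2)."""
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for filters in (16, 32, 64):                       # three DownSample_x_2 stages
        x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
        x = layers.Conv2D(filters, 3, strides=2, padding='same',
                          activation='relu')(x)        # 32 -> 16 -> 8 -> 4
    for filters in (64, 32, 16):                       # three Deconv_x_2 stages
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding='same',
                                   activation='relu')(x)  # 4 -> 8 -> 16 -> 32
    outputs = layers.Conv2D(num_classes, 1, activation='softmax')(x)
    return keras.Model(inputs, outputs)
```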
But we should stop and ask: are downsampling and upsampling really necessary? Can't they simply be removed?
On the cifar10 classification task, I tried removing all downsampling and replacing the plain convolutions with dilated convolutions, with the dilation rate increasing layer by layer. The model structure is shown below.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 32, 32, 16) 448
_________________________________________________________________
batch_normalization (BatchNo (None, 32, 32, 16) 64
_________________________________________________________________
activation (Activation) (None, 32, 32, 16) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 32, 32, 24) 3480
_________________________________________________________________
batch_normalization_1 (Batch (None, 32, 32, 24) 96
_________________________________________________________________
activation_1 (Activation) (None, 32, 32, 24) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 32, 32, 32) 6944
_________________________________________________________________
batch_normalization_2 (Batch (None, 32, 32, 32) 128
_________________________________________________________________
activation_2 (Activation) (None, 32, 32, 32) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 32, 32, 48) 13872
_________________________________________________________________
batch_normalization_3 (Batch (None, 32, 32, 48) 192
_________________________________________________________________
activation_3 (Activation) (None, 32, 32, 48) 0
_________________________________________________________________
global_average_pooling2d (Gl (None, 48) 0
_________________________________________________________________
dense (Dense) (None, 10) 490
_________________________________________________________________
activation_4 (Activation) (None, 10) 0
=================================================================
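For reference, here is a minimal tf.keras sketch that reproduces the layer shapes and parameter counts in the summary above. The dilation rates 1, 2, 4, 8 are an assumption based on the description ("the dilation rate increasing layer by layer"); the actual cifar10.py may differ, and the code targets a recent tf.keras rather than TF 1.5:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_dilated_classifier(input_shape=(32, 32, 3), num_classes=10):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    # (filters, dilation_rate) per stage; the parameter counts match the
    # summary (e.g. 3*3*3*16 + 16 = 448 for the first conv).
    for filters, rate in [(16, 1), (24, 2), (32, 4), (48, 8)]:
        x = layers.Conv2D(filters, 3, padding='same', dilation_rate=rate)(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)
    x = layers.GlobalAveragePooling2D()(x)   # (None, 32, 32, 48) -> (None, 48)
    x = layers.Dense(num_classes)(x)         # 48*10 + 10 = 490 parameters
    outputs = layers.Activation('softmax')(x)
    return keras.Model(inputs, outputs)
```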
After training for 80 epochs, we obtained the following classification results:
Epoch | loss | val_accuracy |
---|---|---|
10 | 0.9200 | 0.6346 |
20 | 0.7925 | 0.6769 |
30 | 0.7293 | 0.7193 |
40 | 0.6737 | 0.7479 |
50 | 0.6516 | 0.7470 |
60 | 0.6311 | 0.7678 |
70 | 0.6085 | 0.7478 |
80 | 0.5865 | 0.7665 |
The accuracy curve on the validation dataset is shown below.
The final accuracy reached 76%; a convolutional network with a VGG-style structure and the same number of parameters reaches essentially the same accuracy. For comparison, a ResNet-style structure that likewise uses no downsampling is shown below:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 32, 32, 3)] 0
__________________________________________________________________________________________________
conv2d (Conv2D) (None, 32, 32, 16) 448 input_1[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 32, 32, 16) 64 conv2d[0][0]
__________________________________________________________________________________________________
activation (Activation) (None, 32, 32, 16) 0 batch_normalization[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 32, 32, 16) 2320 activation[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 32, 32, 16) 64 conv2d_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 32, 32, 16) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 32, 32, 16) 2320 activation_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 32, 32, 16) 64 conv2d_2[0][0]
__________________________________________________________________________________________________
add (Add) (None, 32, 32, 16) 0 activation[0][0]
batch_normalization_2[0][0]
__________________________________________________________________________________________________
activation_2 (Activation) (None, 32, 32, 16) 0 add[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 32, 32, 16) 2320 activation_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 32, 32, 16) 64 conv2d_3[0][0]
__________________________________________________________________________________________________
activation_3 (Activation) (None, 32, 32, 16) 0 batch_normalization_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 32, 32, 16) 2320 activation_3[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 32, 32, 16) 64 conv2d_4[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 32, 32, 16) 0 activation_2[0][0]
batch_normalization_4[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 32, 32, 16) 0 add_1[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 32, 32, 16) 2320 activation_4[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 32, 32, 16) 64 conv2d_5[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, 32, 32, 16) 0 batch_normalization_5[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D) (None, 32, 32, 16) 2320 activation_5[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 32, 32, 16) 64 conv2d_6[0][0]
__________________________________________________________________________________________________
add_2 (Add) (None, 32, 32, 16) 0 activation_4[0][0]
batch_normalization_6[0][0]
__________________________________________________________________________________________________
activation_6 (Activation) (None, 32, 32, 16) 0 add_2[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D) (None, 32, 32, 16) 2320 activation_6[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 32, 32, 16) 64 conv2d_7[0][0]
__________________________________________________________________________________________________
activation_7 (Activation) (None, 32, 32, 16) 0 batch_normalization_7[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D) (None, 32, 32, 16) 2320 activation_7[0][0]
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 32, 32, 16) 64 conv2d_8[0][0]
__________________________________________________________________________________________________
add_3 (Add) (None, 32, 32, 16) 0 activation_6[0][0]
batch_normalization_8[0][0]
__________________________________________________________________________________________________
activation_8 (Activation) (None, 32, 32, 16) 0 add_3[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, 32, 32, 16) 2320 activation_8[0][0]
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 32, 32, 16) 64 conv2d_9[0][0]
__________________________________________________________________________________________________
activation_9 (Activation) (None, 32, 32, 16) 0 batch_normalization_9[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, 32, 32, 16) 2320 activation_9[0][0]
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 32, 32, 16) 64 conv2d_10[0][0]
__________________________________________________________________________________________________
add_4 (Add) (None, 32, 32, 16) 0 activation_8[0][0]
batch_normalization_10[0][0]
__________________________________________________________________________________________________
activation_10 (Activation) (None, 32, 32, 16) 0 add_4[0][0]
__________________________________________________________________________________________________
conv2d_11 (Conv2D) (None, 32, 32, 16) 2320 activation_10[0][0]
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, 32, 32, 16) 64 conv2d_11[0][0]
__________________________________________________________________________________________________
activation_11 (Activation) (None, 32, 32, 16) 0 batch_normalization_11[0][0]
__________________________________________________________________________________________________
conv2d_12 (Conv2D) (None, 32, 32, 16) 2320 activation_11[0][0]
__________________________________________________________________________________________________
batch_normalization_12 (BatchNo (None, 32, 32, 16) 64 conv2d_12[0][0]
__________________________________________________________________________________________________
add_5 (Add) (None, 32, 32, 16) 0 activation_10[0][0]
batch_normalization_12[0][0]
__________________________________________________________________________________________________
activation_12 (Activation) (None, 32, 32, 16) 0 add_5[0][0]
__________________________________________________________________________________________________
conv2d_13 (Conv2D) (None, 32, 32, 32) 4640 activation_12[0][0]
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, 32, 32, 32) 128 conv2d_13[0][0]
__________________________________________________________________________________________________
activation_13 (Activation) (None, 32, 32, 32) 0 batch_normalization_13[0][0]
__________________________________________________________________________________________________
conv2d_14 (Conv2D) (None, 32, 32, 32) 9248 activation_13[0][0]
__________________________________________________________________________________________________
conv2d_15 (Conv2D) (None, 32, 32, 32) 4640 activation_12[0][0]
__________________________________________________________________________________________________
batch_normalization_14 (BatchNo (None, 32, 32, 32) 128 conv2d_14[0][0]
__________________________________________________________________________________________________
add_6 (Add) (None, 32, 32, 32) 0 conv2d_15[0][0]
batch_normalization_14[0][0]
__________________________________________________________________________________________________
activation_14 (Activation) (None, 32, 32, 32) 0 add_6[0][0]
__________________________________________________________________________________________________
conv2d_16 (Conv2D) (None, 32, 32, 32) 9248 activation_14[0][0]
__________________________________________________________________________________________________
batch_normalization_15 (BatchNo (None, 32, 32, 32) 128 conv2d_16[0][0]
__________________________________________________________________________________________________
activation_15 (Activation) (None, 32, 32, 32) 0 batch_normalization_15[0][0]
__________________________________________________________________________________________________
conv2d_17 (Conv2D) (None, 32, 32, 32) 9248 activation_15[0][0]
__________________________________________________________________________________________________
batch_normalization_16 (BatchNo (None, 32, 32, 32) 128 conv2d_17[0][0]
__________________________________________________________________________________________________
add_7 (Add) (None, 32, 32, 32) 0 activation_14[0][0]
batch_normalization_16[0][0]
__________________________________________________________________________________________________
activation_16 (Activation) (None, 32, 32, 32) 0 add_7[0][0]
__________________________________________________________________________________________________
conv2d_18 (Conv2D) (None, 32, 32, 32) 9248 activation_16[0][0]
__________________________________________________________________________________________________
batch_normalization_17 (BatchNo (None, 32, 32, 32) 128 conv2d_18[0][0]
__________________________________________________________________________________________________
activation_17 (Activation) (None, 32, 32, 32) 0 batch_normalization_17[0][0]
__________________________________________________________________________________________________
conv2d_19 (Conv2D) (None, 32, 32, 32) 9248 activation_17[0][0]
__________________________________________________________________________________________________
batch_normalization_18 (BatchNo (None, 32, 32, 32) 128 conv2d_19[0][0]
__________________________________________________________________________________________________
add_8 (Add) (None, 32, 32, 32) 0 activation_16[0][0]
batch_normalization_18[0][0]
__________________________________________________________________________________________________
activation_18 (Activation) (None, 32, 32, 32) 0 add_8[0][0]
__________________________________________________________________________________________________
conv2d_20 (Conv2D) (None, 32, 32, 32) 9248 activation_18[0][0]
__________________________________________________________________________________________________
batch_normalization_19 (BatchNo (None, 32, 32, 32) 128 conv2d_20[0][0]
__________________________________________________________________________________________________
activation_19 (Activation) (None, 32, 32, 32) 0 batch_normalization_19[0][0]
__________________________________________________________________________________________________
conv2d_21 (Conv2D) (None, 32, 32, 32) 9248 activation_19[0][0]
__________________________________________________________________________________________________
batch_normalization_20 (BatchNo (None, 32, 32, 32) 128 conv2d_21[0][0]
__________________________________________________________________________________________________
add_9 (Add) (None, 32, 32, 32) 0 activation_18[0][0]
batch_normalization_20[0][0]
__________________________________________________________________________________________________
activation_20 (Activation) (None, 32, 32, 32) 0 add_9[0][0]
__________________________________________________________________________________________________
conv2d_22 (Conv2D) (None, 32, 32, 32) 9248 activation_20[0][0]
__________________________________________________________________________________________________
batch_normalization_21 (BatchNo (None, 32, 32, 32) 128 conv2d_22[0][0]
__________________________________________________________________________________________________
activation_21 (Activation) (None, 32, 32, 32) 0 batch_normalization_21[0][0]
__________________________________________________________________________________________________
conv2d_23 (Conv2D) (None, 32, 32, 32) 9248 activation_21[0][0]
__________________________________________________________________________________________________
batch_normalization_22 (BatchNo (None, 32, 32, 32) 128 conv2d_23[0][0]
__________________________________________________________________________________________________
add_10 (Add) (None, 32, 32, 32) 0 activation_20[0][0]
batch_normalization_22[0][0]
__________________________________________________________________________________________________
activation_22 (Activation) (None, 32, 32, 32) 0 add_10[0][0]
__________________________________________________________________________________________________
conv2d_24 (Conv2D) (None, 32, 32, 32) 9248 activation_22[0][0]
__________________________________________________________________________________________________
batch_normalization_23 (BatchNo (None, 32, 32, 32) 128 conv2d_24[0][0]
__________________________________________________________________________________________________
activation_23 (Activation) (None, 32, 32, 32) 0 batch_normalization_23[0][0]
__________________________________________________________________________________________________
conv2d_25 (Conv2D) (None, 32, 32, 32) 9248 activation_23[0][0]
__________________________________________________________________________________________________
batch_normalization_24 (BatchNo (None, 32, 32, 32) 128 conv2d_25[0][0]
__________________________________________________________________________________________________
add_11 (Add) (None, 32, 32, 32) 0 activation_22[0][0]
batch_normalization_24[0][0]
__________________________________________________________________________________________________
activation_24 (Activation) (None, 32, 32, 32) 0 add_11[0][0]
__________________________________________________________________________________________________
conv2d_26 (Conv2D) (None, 32, 32, 64) 18496 activation_24[0][0]
__________________________________________________________________________________________________
batch_normalization_25 (BatchNo (None, 32, 32, 64) 256 conv2d_26[0][0]
__________________________________________________________________________________________________
activation_25 (Activation) (None, 32, 32, 64) 0 batch_normalization_25[0][0]
__________________________________________________________________________________________________
conv2d_27 (Conv2D) (None, 32, 32, 64) 36928 activation_25[0][0]
__________________________________________________________________________________________________
conv2d_28 (Conv2D) (None, 32, 32, 64) 18496 activation_24[0][0]
__________________________________________________________________________________________________
batch_normalization_26 (BatchNo (None, 32, 32, 64) 256 conv2d_27[0][0]
__________________________________________________________________________________________________
add_12 (Add) (None, 32, 32, 64) 0 conv2d_28[0][0]
batch_normalization_26[0][0]
__________________________________________________________________________________________________
activation_26 (Activation) (None, 32, 32, 64) 0 add_12[0][0]
__________________________________________________________________________________________________
conv2d_29 (Conv2D) (None, 32, 32, 64) 36928 activation_26[0][0]
__________________________________________________________________________________________________
batch_normalization_27 (BatchNo (None, 32, 32, 64) 256 conv2d_29[0][0]
__________________________________________________________________________________________________
activation_27 (Activation) (None, 32, 32, 64) 0 batch_normalization_27[0][0]
__________________________________________________________________________________________________
conv2d_30 (Conv2D) (None, 32, 32, 64) 36928 activation_27[0][0]
__________________________________________________________________________________________________
batch_normalization_28 (BatchNo (None, 32, 32, 64) 256 conv2d_30[0][0]
__________________________________________________________________________________________________
add_13 (Add) (None, 32, 32, 64) 0 activation_26[0][0]
batch_normalization_28[0][0]
__________________________________________________________________________________________________
activation_28 (Activation) (None, 32, 32, 64) 0 add_13[0][0]
__________________________________________________________________________________________________
conv2d_31 (Conv2D) (None, 32, 32, 64) 36928 activation_28[0][0]
__________________________________________________________________________________________________
batch_normalization_29 (BatchNo (None, 32, 32, 64) 256 conv2d_31[0][0]
__________________________________________________________________________________________________
activation_29 (Activation) (None, 32, 32, 64) 0 batch_normalization_29[0][0]
__________________________________________________________________________________________________
conv2d_32 (Conv2D) (None, 32, 32, 64) 36928 activation_29[0][0]
__________________________________________________________________________________________________
batch_normalization_30 (BatchNo (None, 32, 32, 64) 256 conv2d_32[0][0]
__________________________________________________________________________________________________
add_14 (Add) (None, 32, 32, 64) 0 activation_28[0][0]
batch_normalization_30[0][0]
__________________________________________________________________________________________________
activation_30 (Activation) (None, 32, 32, 64) 0 add_14[0][0]
__________________________________________________________________________________________________
conv2d_33 (Conv2D) (None, 32, 32, 64) 36928 activation_30[0][0]
__________________________________________________________________________________________________
batch_normalization_31 (BatchNo (None, 32, 32, 64) 256 conv2d_33[0][0]
__________________________________________________________________________________________________
activation_31 (Activation) (None, 32, 32, 64) 0 batch_normalization_31[0][0]
__________________________________________________________________________________________________
conv2d_34 (Conv2D) (None, 32, 32, 64) 36928 activation_31[0][0]
__________________________________________________________________________________________________
batch_normalization_32 (BatchNo (None, 32, 32, 64) 256 conv2d_34[0][0]
__________________________________________________________________________________________________
add_15 (Add) (None, 32, 32, 64) 0 activation_30[0][0]
batch_normalization_32[0][0]
__________________________________________________________________________________________________
activation_32 (Activation) (None, 32, 32, 64) 0 add_15[0][0]
__________________________________________________________________________________________________
conv2d_35 (Conv2D) (None, 32, 32, 64) 36928 activation_32[0][0]
__________________________________________________________________________________________________
batch_normalization_33 (BatchNo (None, 32, 32, 64) 256 conv2d_35[0][0]
__________________________________________________________________________________________________
activation_33 (Activation) (None, 32, 32, 64) 0 batch_normalization_33[0][0]
__________________________________________________________________________________________________
conv2d_36 (Conv2D) (None, 32, 32, 64) 36928 activation_33[0][0]
__________________________________________________________________________________________________
batch_normalization_34 (BatchNo (None, 32, 32, 64) 256 conv2d_36[0][0]
__________________________________________________________________________________________________
add_16 (Add) (None, 32, 32, 64) 0 activation_32[0][0]
batch_normalization_34[0][0]
__________________________________________________________________________________________________
activation_34 (Activation) (None, 32, 32, 64) 0 add_16[0][0]
__________________________________________________________________________________________________
conv2d_37 (Conv2D) (None, 32, 32, 64) 36928 activation_34[0][0]
__________________________________________________________________________________________________
batch_normalization_35 (BatchNo (None, 32, 32, 64) 256 conv2d_37[0][0]
__________________________________________________________________________________________________
activation_35 (Activation) (None, 32, 32, 64) 0 batch_normalization_35[0][0]
__________________________________________________________________________________________________
conv2d_38 (Conv2D) (None, 32, 32, 64) 36928 activation_35[0][0]
__________________________________________________________________________________________________
batch_normalization_36 (BatchNo (None, 32, 32, 64) 256 conv2d_38[0][0]
__________________________________________________________________________________________________
add_17 (Add) (None, 32, 32, 64) 0 activation_34[0][0]
batch_normalization_36[0][0]
__________________________________________________________________________________________________
activation_36 (Activation) (None, 32, 32, 64) 0 add_17[0][0]
__________________________________________________________________________________________________
global_average_pooling2d (Globa (None, 64) 0 activation_36[0][0]
__________________________________________________________________________________________________
flatten (Flatten) (None, 64) 0 global_average_pooling2d[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, 10) 650 flatten[0][0]
==================================================================================================
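The repeating pattern in this summary is a standard residual block; a sketch of it is below. The summary does not show kernel sizes or dilation rates, so the 3x3 kernels (including the 3x3 projection on the shortcut, implied by the 4640 and 18496 parameter counts of conv2d_15 and conv2d_28) are inferred, and any dilation is omitted:

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    """Conv-BN-ReLU-Conv-BN, add the shortcut, then ReLU -- the
    Conv2D/BatchNormalization/Add/Activation pattern in the summary.
    padding='same' everywhere, so the 32x32 resolution never shrinks."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    if shortcut.shape[-1] != filters:
        # channel change (16 -> 32 -> 64): project the shortcut with a
        # 3x3 conv, matching conv2d_15 / conv2d_28 in the summary
        shortcut = layers.Conv2D(filters, 3, padding='same')(shortcut)
    y = layers.Add()([shortcut, y])
    return layers.Activation('relu')(y)
```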
This makes us ask again: is downsampling really necessary? Of course, from an engineering standpoint downsampling greatly shrinks the feature maps and therefore greatly reduces the amount of computation. In this experiment, however, downsampling did not help the convolutional network's performance. Max pooling is useful because it suppresses noise, but max pooling can also be implemented without downsampling, much like a traditional median filter, for example as shown below.
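A pooling window slid with stride 1 keeps the noise-suppressing effect of max pooling while preserving the spatial resolution (a sketch, not something this repo's code actually does):

```python
from tensorflow.keras import layers

# Max pooling without downsampling: pool over a 3x3 window but slide it
# with stride 1 and 'same' padding, so a (None, 32, 32, 16) feature map
# stays (None, 32, 32, 16) -- analogous to running a median filter.
pool = layers.MaxPooling2D(pool_size=3, strides=1, padding='same')
```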
This also suggests that each convolutional layer encodes spatial correlations: shallow layers encode short-range correlations, and deeper convolutional layers encode longer-range spatial correlations. At some level, depending on the size of the meaningful objects in the image, there is no longer any spatial correlation in the statistical sense; at that layer, GAP can be used to aggregate the spatial features.
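Concretely, GAP is just a mean over the two spatial axes (shown here with a recent TensorFlow API):

```python
import tensorflow as tf

# Global average pooling collapses each channel's feature map to a scalar:
# (batch, H, W, C) -> (batch, C).
x = tf.random.normal([8, 32, 32, 48])
gap = tf.reduce_mean(x, axis=[1, 2])   # shape: (8, 48)
```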
Without any sampling layers, the classification network paradigm becomes:
Input-->Conv(dilate_rate=1)-->Conv(dilate_rate=2)-->Conv(dilate_rate=4)-->Conv(dilate_rate=8)-->GAP-->Conv1x1-->Softmax-->Output
and the semantic segmentation network paradigm becomes:
Input-->Conv(dilate_rate=1)-->Conv(dilate_rate=2)-->Conv(dilate_rate=4)-->Conv(dilate_rate=8)-->Conv(dilate_rate=4)-->Conv(dilate_rate=2)-->Conv(dilate_rate=1)-->Softmax-->Output
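Here is a minimal, hypothetical sketch of this sampling-free segmentation paradigm; the width of 64 filters and the class count are placeholders:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_dilated_segmenter(input_shape=(32, 32, 3), num_classes=10):
    # Dilation rates grow and then shrink while the spatial size never
    # changes, so no downsampling or Deconv stage is needed before the
    # per-pixel softmax.
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for rate in [1, 2, 4, 8, 4, 2, 1]:
        x = layers.Conv2D(64, 3, padding='same', dilation_rate=rate,
                          activation='relu')(x)
    outputs = layers.Conv2D(num_classes, 1, activation='softmax')(x)
    return keras.Model(inputs, outputs)
```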
As far as I know, this is the first use of dilated convolutions combined with global average pooling for image classification and segmentation, even if it brings no performance gain (though it basically costs nothing either). A similar idea first appeared in the deeplab paper [Rethinking Atrous Convolution for Semantic Image Segmentation]: https://arxiv.org/abs/1706.05587 . Note that dilated convolution itself is not mandatory: a convolution with a larger kernel can replace it, but that inevitably introduces more parameters and may lead to overfitting, as the comparison below illustrates.
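A 3x3 kernel with dilation rate r covers the receptive field of a dense (2r+1)x(2r+1) kernel while keeping only 9 weights per input/output channel pair:

```python
def receptive_field(kernel_size, dilation_rate):
    # Effective receptive field of a single dilated convolution.
    return dilation_rate * (kernel_size - 1) + 1

# A dilated 3x3 (rate 2) sees the same 5x5 area as a dense 5x5 kernel,
# but with 9 weights instead of 25 per channel pair.
assert receptive_field(3, 2) == receptive_field(5, 1) == 5
```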
In the deeplab paper, atrous convolution is mainly used to extract denser features without adding new learnable parameters, by removing the downsampling operations from the last few layers of the network and upsampling the corresponding filter kernels.
Their serially designed atrous convolution module duplicates the last block of ResNet (e.g., block4) and cascades the duplicated blocks in series.