TensorFlow Course 05: Creating an Image Dataset with Generators

When we worked with the Fashion-MNIST dataset, all of the images came pre-processed to the same standard size. This time we will load raw image files from disk and use a generator to prepare them for the network.

Loading the Data

Download the horse-or-human dataset:

wget --no-check-certificate \
https://storage.googleapis.com/laurencemoroney-blog.appspot.com/horse-or-human.zip \
-O /tmp/horse-or-human.zip

Unzip the archive:

import os
import zipfile

local_zip = '/tmp/horse-or-human.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('/tmp/horse-or-human')
zip_ref.close()

Define paths to the corresponding image directories:

# Directory with our training horse pictures
train_horse_dir = os.path.join('/tmp/horse-or-human/horses')

# Directory with our training human pictures
train_human_dir = os.path.join('/tmp/horse-or-human/humans')

List the image filenames in each directory:

train_horse_names = os.listdir(train_horse_dir)
print(train_horse_names[:10])

train_human_names = os.listdir(train_human_dir)
print(train_human_names[:10])

Let's find out the total number of horse and human images in the directories:

print('total training horse images:', len(os.listdir(train_horse_dir)))
print('total training human images:', len(os.listdir(train_human_dir)))

total training horse images: 500

total training human images: 527

Now let's take a look at a few pictures to get a better sense of what they look like. First, configure the matplotlib parameters:

%matplotlib inline

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Parameters for our graph; we'll output images in a 4x4 configuration
nrows = 4
ncols = 4

# Index for iterating over images
pic_index = 0

Now display a batch of 8 horse pictures and 8 human pictures. You can rerun the cell to see a fresh batch each time:

# Set up matplotlib fig, and size it to fit 4x4 pics
fig = plt.gcf()
fig.set_size_inches(ncols * 4, nrows * 4)

pic_index += 8
next_horse_pix = [os.path.join(train_horse_dir, fname)
                  for fname in train_horse_names[pic_index-8:pic_index]]
next_human_pix = [os.path.join(train_human_dir, fname)
                  for fname in train_human_names[pic_index-8:pic_index]]

for i, img_path in enumerate(next_horse_pix + next_human_pix):
  # Set up subplot; subplot indices start at 1
  sp = plt.subplot(nrows, ncols, i + 1)
  sp.axis('Off')  # Don't show axes (or gridlines)

  img = mpimg.imread(img_path)
  plt.imshow(img)

plt.show()

Building the Model

import tensorflow as tf

# Define our model. Note that the final layer is a binary classifier with a
# sigmoid activation.
model = tf.keras.models.Sequential([
    # Note the input shape is the desired size of the image 300x300 with 3 bytes color
    # This is the first convolution
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(300, 300, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The second convolution
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The third convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The fourth convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The fifth convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    # Only 1 output neuron. It will contain a value from 0-1, where 0 is for
    # one class ('horses') and 1 is for the other ('humans')
    tf.keras.layers.Dense(1, activation='sigmoid')
])

View the model architecture with the model.summary() method:

model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 298, 298, 16) 448
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 149, 149, 16) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 147, 147, 32) 4640
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 73, 73, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 71, 71, 64) 18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 35, 35, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 33, 33, 64) 36928
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 16, 16, 64) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 14, 14, 64) 36928
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 7, 7, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 3136) 0
_________________________________________________________________
dense (Dense) (None, 512) 1606144
_________________________________________________________________
dense_1 (Dense) (None, 1) 513
=================================================================
Total params: 1,704,097
Trainable params: 1,704,097
Non-trainable params: 0
_________________________________________________________________
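The shrinking output shapes in this summary follow directly from the layer arithmetic: each 3x3 convolution with 'valid' padding trims 2 pixels from each spatial dimension, and each 2x2 max-pooling halves what remains (rounding down). A quick loop to verify the progression (illustrative only, not part of the original notebook):

# Illustrative check of the shape progression printed by model.summary().
size = 300
for i in range(5):
  size = size - 2    # 3x3 conv, no padding
  size = size // 2   # 2x2 max pooling, floor rounding
  print('after conv/pool block', i + 1, '->', size)
# Prints 149, 73, 35, 16, 7 -- matching the summary above.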

Next, we'll train our model with an RMSprop optimizer and a learning rate of 0.001.

NOTE: RMSprop-based gradient descent works better here than plain stochastic gradient descent because it automatically adapts the learning rate. (Other optimizers, such as Adam and Adagrad, do this as well.)
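If you want to experiment with one of those alternatives, the optimizer is a one-line swap in the compile call below. A minimal sketch using Adam, shown purely for illustration; the lesson itself sticks with RMSprop:

from tensorflow.keras.optimizers import Adam

# Same compile call as below, but with Adam instead of RMSprop
# (illustrative only).
model.compile(loss='binary_crossentropy',
              optimizer=Adam(lr=0.001),
              metrics=['acc'])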

from tensorflow.keras.optimizers import RMSprop

model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(lr=0.001),
              metrics=['acc'])

Data Preprocessing

Let's set up our data generators. They will read pictures from our source directories, convert them to float32 tensors, and feed them (with their labels) to our network. We'll have one generator for the training images and one for the validation images. Our generators will yield batches of 300x300 images and their (binary) labels.

As you may already know, data that goes into a neural network should usually be normalized in some way to make it easier for the network to process (it is uncommon to feed raw pixel values into a convnet). In our case, we will preprocess the images by normalizing the pixel values to the [0, 1] range (originally all values are in the [0, 255] range).

In Keras this can be done via the rescale parameter of the keras.preprocessing.image.ImageDataGenerator class. The ImageDataGenerator class lets you instantiate generators of image batches via the .flow(data, labels) or .flow_from_directory(directory) methods. These generators can then be used with the Keras model methods that accept data generators as inputs: fit_generator, evaluate_generator, and predict_generator.
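For comparison, here is a minimal sketch of the .flow(data, labels) variant, which batches images that are already in memory as NumPy arrays; the random arrays below are placeholders just to show the call signature:

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Placeholder in-memory data: 32 fake 300x300 RGB images with binary labels.
data = np.random.randint(0, 256, size=(32, 300, 300, 3)).astype('float32')
labels = np.random.randint(0, 2, size=(32,))

datagen = ImageDataGenerator(rescale=1/255)
generator = datagen.flow(data, labels, batch_size=16)

batch_x, batch_y = next(generator)
print(batch_x.shape, batch_x.max())  # (16, 300, 300, 3), values now in [0, 1]
print(batch_y.shape)                 # (16,)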

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1/255)

# Flow training images in batches of 128 using train_datagen generator
train_generator = train_datagen.flow_from_directory(
        '/tmp/horse-or-human/',  # This is the source directory for training images
        target_size=(300, 300),  # All images will be resized to 300x300
        batch_size=128,
        # Since we use binary_crossentropy loss, we need binary labels
        class_mode='binary')
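Before training, it can be worth pulling a single batch from the generator we just built to sanity-check shapes and scaling (a quick check, not part of the original lesson):

# Pull one batch from the generator and verify shapes and scaling.
sample_images, sample_labels = next(train_generator)
print(sample_images.shape)   # (128, 300, 300, 3)
print(sample_labels.shape)   # (128,)
print(sample_images.min(), sample_images.max())  # values rescaled into [0, 1]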

Training

Let's train for 15 epochs; this may take a few minutes to run.

Watch how the values change from epoch to epoch.

The loss and accuracy are a good indication of training progress: the model makes a guess at the classification of each training image, then measures it against the known label and records the result. Accuracy is the fraction of guesses that are correct. With batch_size=128 and steps_per_epoch=8, each epoch covers 8 × 128 = 1024 of the 1027 training images.
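Concretely, the reported accuracy is just the fraction of thresholded predictions that match the known labels. A minimal illustration with made-up numbers:

import numpy as np

# Made-up sigmoid outputs and true labels, just to illustrate the metric.
predictions = np.array([0.9, 0.2, 0.7, 0.4])  # model's sigmoid outputs
labels      = np.array([1,   0,   0,   0  ])  # known ground truth

guesses = (predictions > 0.5).astype(int)  # threshold at 0.5
accuracy = np.mean(guesses == labels)      # fraction of correct guesses
print(accuracy)                            # 0.75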

history = model.fit_generator(
      train_generator,
      steps_per_epoch=8,
      epochs=15,
      verbose=1)

Output:

Epoch 1/15
8/8 [==============================] - 10s 1s/step - loss: 1.3253 - acc: 0.5539
Epoch 2/15
8/8 [==============================] - 6s 750ms/step - loss: 1.1127 - acc: 0.6855
Epoch 3/15
8/8 [==============================] - 5s 639ms/step - loss: 0.5517 - acc: 0.6344
Epoch 4/15
8/8 [==============================] - 6s 728ms/step - loss: 0.8176 - acc: 0.7586
Epoch 5/15
8/8 [==============================] - 7s 828ms/step - loss: 0.3862 - acc: 0.8760
Epoch 6/15
8/8 [==============================] - 5s 630ms/step - loss: 0.3092 - acc: 0.8656
Epoch 7/15
8/8 [==============================] - 7s 845ms/step - loss: 0.2193 - acc: 0.9248
Epoch 8/15
8/8 [==============================] - 6s 730ms/step - loss: 0.2179 - acc: 0.9188
Epoch 9/15
8/8 [==============================] - 6s 724ms/step - loss: 0.0841 - acc: 0.9744
Epoch 10/15
8/8 [==============================] - 6s 739ms/step - loss: 0.3503 - acc: 0.8343
Epoch 11/15
8/8 [==============================] - 6s 744ms/step - loss: 0.2504 - acc: 0.9188
Epoch 12/15
8/8 [==============================] - 6s 745ms/step - loss: 0.3888 - acc: 0.8954
Epoch 13/15
8/8 [==============================] - 7s 831ms/step - loss: 0.1144 - acc: 0.9600
Epoch 14/15
8/8 [==============================] - 5s 633ms/step - loss: 0.0668 - acc: 0.9793
Epoch 15/15
8/8 [==============================] - 7s 835ms/step - loss: 0.0396 - acc: 0.9863

Running the Model

Now let's actually run predictions with the model. This code lets you choose one or more files from your file system, upload them, and run them through the model, which reports whether each image is a horse or a human.

import numpy as np
from google.colab import files
from keras.preprocessing import image

uploaded = files.upload()

for fn in uploaded.keys():

  # predicting images
  path = '/content/' + fn
  img = image.load_img(path, target_size=(300, 300))
  x = image.img_to_array(img)
  x /= 255.0  # rescale to [0, 1] to match the training preprocessing
  x = np.expand_dims(x, axis=0)

  images = np.vstack([x])
  classes = model.predict(images, batch_size=10)
  print(classes[0])
  if classes[0] > 0.5:
    print(fn + " is a human")
  else:
    print(fn + " is a horse")

Visualizing Intermediate Representations

To get a feel for what kind of features our convnet has learned, one interesting exercise is to visualize how an input gets transformed as it passes through the convnet.

Let's pick a random image from the training set and generate a figure where each row is the output of a layer, and each image in the row is a specific filter in that output feature map. Rerun this cell to generate intermediate representations for a variety of training images.

import numpy as np
import random
from tensorflow.keras.preprocessing.image import img_to_array, load_img

# Let's define a new Model that will take an image as input, and will output
# intermediate representations for all layers in the previous model after
# the first.
successive_outputs = [layer.output for layer in model.layers[1:]]
visualization_model = tf.keras.models.Model(inputs=model.input, outputs=successive_outputs)

# Let's prepare a random input image from the training set.
horse_img_files = [os.path.join(train_horse_dir, f) for f in train_horse_names]
human_img_files = [os.path.join(train_human_dir, f) for f in train_human_names]
img_path = random.choice(horse_img_files + human_img_files)

img = load_img(img_path, target_size=(300, 300))  # this is a PIL image
x = img_to_array(img)          # Numpy array with shape (300, 300, 3)
x = x.reshape((1,) + x.shape)  # Numpy array with shape (1, 300, 300, 3)

# Rescale by 1/255
x /= 255

# Let's run our image through our network, thus obtaining all
# intermediate representations for this image.
successive_feature_maps = visualization_model.predict(x)

# These are the names of the layers, so we can have them as part of our plot
# (sliced to match successive_outputs above).
layer_names = [layer.name for layer in model.layers[1:]]

# Now let's display our representations
for layer_name, feature_map in zip(layer_names, successive_feature_maps):
  if len(feature_map.shape) == 4:
    # Just do this for the conv / maxpool layers, not the fully-connected layers
    n_features = feature_map.shape[-1]  # number of features in the feature map
    # The feature map has shape (1, size, size, n_features)
    size = feature_map.shape[1]
    # We will tile our images in this matrix
    display_grid = np.zeros((size, size * n_features))
    for i in range(n_features):
      # Postprocess the feature to make it visually palatable
      x = feature_map[0, :, :, i]
      x -= x.mean()
      x /= x.std()
      x *= 64
      x += 128
      x = np.clip(x, 0, 255).astype('uint8')
      # We'll tile each filter into this big horizontal grid
      display_grid[:, i * size : (i + 1) * size] = x
    # Display the grid
    scale = 20. / n_features
    plt.figure(figsize=(scale * n_features, scale))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')

As you can see, we go from the raw pixels of the image to increasingly abstract and compact representations. The representations downstream start highlighting what the network pays attention to, and they show fewer and fewer features being "activated"; most are set to zero. This is called "sparsity", and representation sparsity is a key feature of deep learning.

These representations carry increasingly less information about the original pixels of the image, but increasingly refined information about the class of the image. You can think of a convnet (or a deep network in general) as an information distillation pipeline.
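One way to make the sparsity claim concrete is to measure the fraction of activations that are exactly zero (suppressed by ReLU) in each layer's output. A small sketch reusing successive_feature_maps and layer_names from the visualization cell above:

# Fraction of activations that are exactly zero in each layer's output
# (reuses successive_feature_maps and layer_names from the cell above).
for layer_name, feature_map in zip(layer_names, successive_feature_maps):
  if len(feature_map.shape) == 4:  # conv / maxpool layers only
    zero_fraction = np.mean(feature_map == 0)
    print('%s: %.1f%% zero activations' % (layer_name, 100 * zero_fraction))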