Lab: Training/Test Dataset, Learning Rate, Normalization

This post summarizes my notes from the course
"Machine Learning and Deep Learning BASIC" (머신러닝과 딥러닝 BASIC) offered on Edwith.

Training and Test data sets

Split the Data Set into a Training Set and a Test Set before proceeding.
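A hold-out split like this can be sketched in plain Python. The helper name `train_test_split`, the 25% test ratio, and the fixed seed are assumptions made for illustration only; the lab itself simply writes the two sets out by hand.

```python
import random


def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle row indices and hold out the last `test_ratio` share as the Test Set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)          # fixed seed keeps the split reproducible
    n_test = max(1, int(len(rows) * test_ratio))  # always hold out at least one row
    train_idx, test_idx = indices[:-n_test], indices[-n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(8))                             # stand-in for 8 data rows
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))            # 6 2
```

The important property is that no row ends up in both sets, so the Test Set stays unseen during training.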

# Training Set
x_data = [
    [1, 2, 1],
    [1, 3, 2],
    [1, 3, 4],
    [1, 5, 5],
    [1, 7, 5],
    [1, 2, 5],
    [1, 6, 6],
    [1, 7, 7],
]
y_data = [
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
]

# Test Set
x_test = [
    [2, 1, 1],
    [3, 1, 2],
    [3, 3, 4],
]
y_test = [
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
]

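Placeholders are useful in this situation.
Because a placeholder accepts whatever values are fed into it,
we feed the Training Set into the placeholders to train the model
and then feed the Test Set into the same placeholders to test it.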

import tensorflow as tf

X = tf.placeholder("float", [None, 3])
Y = tf.placeholder("float", [None, 3])
W = tf.Variable(tf.random_normal([3, 3]))
b = tf.Variable(tf.random_normal([3]))

hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

# Correct prediction Test model
prediction = tf.argmax(hypothesis, 1)
is_correct = tf.equal(prediction, tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# Launch graph
with tf.Session() as sess:
    # Initialize TensorFlow variables
    sess.run(tf.global_variables_initializer())

    for step in range(201):
        cost_val, W_val, _ = sess.run([cost, W, optimizer],
                                      feed_dict={X: x_data, Y: y_data})

        if step % 20 == 0:
            print(step, cost_val, W_val)

    # Predict
    print("Prediction :", sess.run(prediction, feed_dict={X: x_test}))
    # Calculate the accuracy
    print("Accuracy :", sess.run(accuracy, feed_dict={X: x_test, Y: y_test}))
0 4.1074553 [[-1.4293944  -0.7224384   2.848082  ]
 [ 1.1112422  -0.3999512  -0.71154207]
 [-0.15248409  0.17528887  0.5048751 ]]
20 0.7546797 [[-1.5631142  -0.5818348   2.841198  ]
 [ 0.72321707 -0.0059972  -0.71747077]
 [-0.10348553  0.66835874 -0.03719316]]
40 0.6795351 [[-1.6549793  -0.5410513   2.8922794 ]
 [ 0.56363654 -0.01940873 -0.54447865]
 [ 0.08512951  0.65142184 -0.20887122]]
60 0.63409156 [[-1.7386525  -0.5033351   2.9382365 ]
 [ 0.4421414  -0.01824449 -0.42414775]
 [ 0.2345651   0.6264432  -0.33332816]]
80 0.6051625 [[-1.8194332  -0.4668002   2.9824824 ]
 [ 0.35451818 -0.01411752 -0.34065136]
 [ 0.34999305  0.60314    -0.42545283]]
100 0.58531666 [[-1.8991535  -0.43095058  3.0263534 ]
 [ 0.29415295 -0.01108143 -0.28332224]
 [ 0.43837747  0.58400047 -0.49469772]]
120 0.57030904 [[-1.9780612  -0.39590144  3.0702124 ]
 [ 0.25370523 -0.00954942 -0.24440636]
 [ 0.50700086  0.5684458  -0.54776627]]
140 0.55791247 [[-2.0558922  -0.3618978   3.1140394 ]
 [ 0.22686873 -0.00888192 -0.2182373 ]
 [ 0.5619558   0.5552294  -0.58950466]]
160 0.54703254 [[-2.13234    -0.32911745  3.1577072 ]
 [ 0.20899145 -0.00845987 -0.20078206]
 [ 0.6076821   0.5433617  -0.6233629 ]]
180 0.5371375 [[-2.2071946  -0.29763928  3.2010844 ]
 [ 0.19691719 -0.00792996 -0.18923767]
 [ 0.64718145  0.53225446 -0.65175486]]
200 0.5279585 [[-2.2803524  -0.26746872  3.2440712 ]
 [ 0.18859148 -0.0071582  -0.18168372]
 [ 0.6824146   0.5216165  -0.6763497 ]]
Prediction : [2 2 2]
Accuracy : 1.0

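Here, the Accuracy and Prediction values come from a model
that was trained on the Training Set and then asked to predict the Test Set.
They are meaningful results because, from the model's point of view,
the predictions are made on data it has never seen during training.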
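To make the accuracy figure concrete, here is a small plain-Python sketch of what `tf.argmax`, `tf.equal`, and `tf.reduce_mean` compute together in the graph above. The helper names `argmax` and `accuracy` are my own and not part of the lab code.

```python
def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predicted_classes, one_hot_labels):
    """Share of rows whose predicted class matches the one-hot label."""
    true_classes = [argmax(row) for row in one_hot_labels]
    hits = [p == t for p, t in zip(predicted_classes, true_classes)]
    return sum(hits) / len(hits)


y_test = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
prediction = [2, 2, 2]                 # the "Prediction" line printed by the run above
print(accuracy(prediction, y_test))    # 1.0
```

All three test labels are class 2, so a prediction of `[2 2 2]` matches every row.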

Large Learning Rate

When the learning rate is too large

  • Overshooting
import tensorflow as tf

X = tf.placeholder("float", [None, 3])
Y = tf.placeholder("float", [None, 3])
W = tf.Variable(tf.random_normal([3, 3]))
b = tf.Variable(tf.random_normal([3]))

hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.5).minimize(cost)

# Correct prediction Test model
prediction = tf.argmax(hypothesis, 1)
is_correct = tf.equal(prediction, tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# Launch graph
with tf.Session() as sess:
    # Initialize TensorFlow variables
    sess.run(tf.global_variables_initializer())

    for step in range(201):
        cost_val, W_val, _ = sess.run([cost, W, optimizer],
                                      feed_dict={X: x_data, Y: y_data})

        if step % 20 == 0:
            print(step, cost_val, W_val)

    # Predict
    print("Prediction :", sess.run(prediction, feed_dict={X: x_test}))
    # Calculate the accuracy
    print("Accuracy :", sess.run(accuracy, feed_dict={X: x_test, Y: y_test}))
0 5.1369658 [[-0.16136795 -0.052203    1.2202002 ]
 [ 1.7465985  -3.63817     1.3205297 ]
 [ 0.01285887 -4.4370604   0.21666038]]
20 nan [[nan nan nan]
 [nan nan nan]
 [nan nan nan]]
40 nan [[nan nan nan]
 [nan nan nan]
 [nan nan nan]]
60 nan [[nan nan nan]
 [nan nan nan]
 [nan nan nan]]
80 nan [[nan nan nan]
 [nan nan nan]
 [nan nan nan]]
100 nan [[nan nan nan]
 [nan nan nan]
 [nan nan nan]]
120 nan [[nan nan nan]
 [nan nan nan]
 [nan nan nan]]
140 nan [[nan nan nan]
 [nan nan nan]
 [nan nan nan]]
160 nan [[nan nan nan]
 [nan nan nan]
 [nan nan nan]]
180 nan [[nan nan nan]
 [nan nan nan]
 [nan nan nan]]
200 nan [[nan nan nan]
 [nan nan nan]
 [nan nan nan]]
Prediction : [0 0 0]
Accuracy : 0.0

Raising the learning rate to 1.5 caused overshooting: the cost was already nan by step 20,
so the model did not train properly and its predictions were wrong.
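Overshooting is easier to see on a toy problem. The sketch below is not the softmax model from the lab; it assumes the one-dimensional cost f(w) = w**2 (gradient 2*w) purely for illustration, and `gradient_descent` is a hypothetical helper.

```python
def gradient_descent(learning_rate, steps=10, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; the gradient is 2*w."""
    for _ in range(steps):
        w = w - learning_rate * 2 * w
    return w


print(abs(gradient_descent(0.1)))   # shrinks toward the minimum at w = 0
print(abs(gradient_descent(1.5)))   # grows: each step flips the sign and doubles |w|
```

With a rate of 1.5 every update jumps past the minimum and lands twice as far away on the other side. In the softmax model a similar blow-up of the weights most likely ends in nan once `tf.log` receives a probability of 0.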

When the learning rate is too small

  • Many iterations
  • Can get stuck in local minima.
import tensorflow as tf

X = tf.placeholder("float", [None, 3])
Y = tf.placeholder("float", [None, 3])
W = tf.Variable(tf.random_normal([3, 3]))
b = tf.Variable(tf.random_normal([3]))

hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-10).minimize(cost)

# Correct prediction Test model
prediction = tf.argmax(hypothesis, 1)
is_correct = tf.equal(prediction, tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# Launch graph
with tf.Session() as sess:
    # Initialize TensorFlow variables
    sess.run(tf.global_variables_initializer())

    for step in range(201):
        cost_val, W_val, _ = sess.run([cost, W, optimizer],
                                      feed_dict={X: x_data, Y: y_data})

        if step % 20 == 0:
            print(step, cost_val, W_val)

    # Predict
    print("Prediction :", sess.run(prediction, feed_dict={X: x_test}))
    # Calculate the accuracy
    print("Accuracy :", sess.run(accuracy, feed_dict={X: x_test, Y: y_test}))
0 4.4417534 [[ 1.6614894  -2.5569797   1.3669713 ]
 [-0.43707097  1.3632766   1.4774112 ]
 [-0.6081834   0.77286756  0.4197207 ]]
20 4.4417534 [[ 1.6614894  -2.5569797   1.3669713 ]
 [-0.43707097  1.3632766   1.4774112 ]
 [-0.6081834   0.77286756  0.4197207 ]]
40 4.4417534 [[ 1.6614894  -2.5569797   1.3669713 ]
 [-0.43707097  1.3632766   1.4774112 ]
 [-0.6081834   0.77286756  0.4197207 ]]
60 4.4417534 [[ 1.6614894  -2.5569797   1.3669713 ]
 [-0.43707097  1.3632766   1.4774112 ]
 [-0.6081834   0.77286756  0.4197207 ]]
80 4.4417534 [[ 1.6614894  -2.5569797   1.3669713 ]
 [-0.43707097  1.3632766   1.4774112 ]
 [-0.6081834   0.77286756  0.4197207 ]]
100 4.4417534 [[ 1.6614894  -2.5569797   1.3669713 ]
 [-0.43707097  1.3632766   1.4774112 ]
 [-0.6081834   0.77286756  0.4197207 ]]
120 4.4417534 [[ 1.6614894  -2.5569797   1.3669713 ]
 [-0.43707097  1.3632766   1.4774112 ]
 [-0.6081834   0.77286756  0.4197207 ]]
140 4.4417534 [[ 1.6614894  -2.5569797   1.3669713 ]
 [-0.43707097  1.3632766   1.4774112 ]
 [-0.6081834   0.77286756  0.4197207 ]]
160 4.4417534 [[ 1.6614894  -2.5569797   1.3669713 ]
 [-0.43707097  1.3632766   1.4774112 ]
 [-0.6081834   0.77286756  0.4197207 ]]
180 4.4417534 [[ 1.6614894  -2.5569797   1.3669713 ]
 [-0.43707097  1.3632766   1.4774112 ]
 [-0.6081834   0.77286756  0.4197207 ]]
200 4.4417534 [[ 1.6614894  -2.5569797   1.3669713 ]
 [-0.43707097  1.3632766   1.4774112 ]
 [-0.6081834   0.77286756  0.4197207 ]]
Prediction : [0 0 2]
Accuracy : 0.33333334

After lowering the learning rate to 1e-10,
the cost did not decrease at all and no learning took place.
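One likely reason the printed weights are identical at every step is floating-point precision: the variables are 32-bit floats, and an update of about 1e-10 is smaller than the spacing between neighboring float32 values near 1. The sketch below emulates float32 rounding with the standard `struct` module; the gradient value of 1.0 and the helper names are assumptions for illustration.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sgd_step_float32(w, gradient, learning_rate):
    """One gradient descent update with every intermediate rounded to float32."""
    update = to_float32(learning_rate * gradient)
    return to_float32(to_float32(w) - update)


w = to_float32(1.6614894)                       # first weight printed in the run above
print(sgd_step_float32(w, 1.0, 1e-10) == w)     # True: the update is rounded away
print(sgd_step_float32(w, 1.0, 0.1) == w)       # False: the weight actually moves
```

So at this rate the model is not merely slow; under these assumptions each step changes nothing at all.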

Non-normalized inputs

If you use a Data Set whose values differ greatly in scale, as below,
the resulting graph is distorted and skewed in one direction.

x_data (xy[:, 0:-1])                          y_data (xy[:, [-1]])
[828.659973, 833.450012, 908100, 828.349976]  [831.659973]
[823.02002, 828.070007, 1828100, 821.655029]  [828.070007]
[816, 820.958984, 1008100, 815.48999]         [819.23999]
[819.359985, 823, 1188100, 818.469971]        [818.97998]
[819, 823, 1198100, 816]                      [820.450012]
[811.700012, 815.25, 1098100, 809.780029]     [813.669983]
[809.51001, 816.659973, 1398100, 804.539978]  [809.559998]
import numpy as np

xy = np.array([
    [828.659973, 833.450012, 908100, 828.349976, 831.659973],
    [823.02002, 828.070007, 1828100, 821.655029, 828.070007],
    [819.929993, 824.400024, 1438100, 818.97998, 824.159973],
    [816, 820.958984, 1008100, 815.48999, 819.23999],
    [819.359985, 823, 1188100, 818.469971, 818.97998],
    [819, 823, 1198100, 816, 820.450012],
    [811.700012, 815.25, 1098100, 809.780029, 813.669983],
    [809.51001, 816.659973, 1398100, 804.539978, 809.559998]
])
x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]

# Placeholders for tensors that will always be fed.
X = tf.placeholder(tf.float32, shape=[None, 4])
Y = tf.placeholder(tf.float32, shape=[None, 1])
W = tf.Variable(tf.random_normal([4, 1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

hypothesis = tf.matmul(X, W) + b
cost = tf.reduce_mean(tf.square(hypothesis - Y))

# Minimize
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for step in range(201):
    cost_val, hy_val, _ = sess.run(
        [cost, hypothesis, train], feed_dict={X: x_data, Y: y_data}
    )

    if step % 40 == 0:
        print(step, "Cost :", cost_val,
             "\nPrediction\n", hy_val)
0 Cost : 272062040000.0
Prediction
 [[-367724.94]
 [-739115.9 ]
 [-581670.8 ]
 [-408077.16]
 [-480746.12]
 [-484782.1 ]
 [-444402.88]
 [-565508.5 ]]
40 Cost : nan
Prediction
 [[nan]
 [nan]
 [nan]
 [nan]
 [nan]
 [nan]
 [nan]
 [nan]]
80 Cost : nan
Prediction
 [[nan]
 [nan]
 [nan]
 [nan]
 [nan]
 [nan]
 [nan]
 [nan]]
120 Cost : nan
Prediction
 [[nan]
 [nan]
 [nan]
 [nan]
 [nan]
 [nan]
 [nan]
 [nan]]
160 Cost : nan
Prediction
 [[nan]
 [nan]
 [nan]
 [nan]
 [nan]
 [nan]
 [nan]
 [nan]]
200 Cost : nan
Prediction
 [[nan]
 [nan]
 [nan]
 [nan]
 [nan]
 [nan]
 [nan]
 [nan]]

The predictions failed because the data was not normalized.
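A quick way to spot the problem before training is to print the range of each column. This is a plain-Python sketch using the same `xy` rows as above; `column_ranges` is a hypothetical helper, not part of the lab code.

```python
xy = [
    [828.659973, 833.450012, 908100, 828.349976, 831.659973],
    [823.02002, 828.070007, 1828100, 821.655029, 828.070007],
    [819.929993, 824.400024, 1438100, 818.97998, 824.159973],
    [816, 820.958984, 1008100, 815.48999, 819.23999],
    [819.359985, 823, 1188100, 818.469971, 818.97998],
    [819, 823, 1198100, 816, 820.450012],
    [811.700012, 815.25, 1098100, 809.780029, 813.669983],
    [809.51001, 816.659973, 1398100, 804.539978, 809.559998],
]


def column_ranges(rows):
    """Return (min, max) for every column."""
    return [(min(col), max(col)) for col in zip(*rows)]


for index, (low, high) in enumerate(column_ranges(xy)):
    print(index, low, high, high - low)   # column index, min, max, spread
```

The third column is roughly a thousand times larger than the others, which fits the huge first cost (about 2.7e11) and the nan values that follow.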

Normalized inputs (min-max scale)

def min_max_scaler(data):
    numerator = data - np.min(data, 0)
    denominator = np.max(data, 0) - np.min(data, 0)
    # noise term prevents the zero division
    return numerator / (denominator + 1e-7)

xy = min_max_scaler(xy)
print(xy)
[[0.99999999 0.99999999 0.         1.         1.        ]
 [0.70548491 0.70439552 1.         0.71881782 0.83755791]
 [0.54412549 0.50274824 0.57608696 0.606468   0.6606331 ]
 [0.33890353 0.31368023 0.10869565 0.45989134 0.43800918]
 [0.51436    0.42582389 0.30434783 0.58504805 0.42624401]
 [0.49556179 0.42582389 0.31521739 0.48131134 0.49276137]
 [0.11436064 0.         0.20652174 0.22007776 0.18597238]
 [0.         0.07747099 0.5326087  0.         0.        ]]
x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]

# Placeholders for tensors that will always be fed.
X = tf.placeholder(tf.float32, shape=[None, 4])
Y = tf.placeholder(tf.float32, shape=[None, 1])
W = tf.Variable(tf.random_normal([4, 1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

hypothesis = tf.matmul(X, W) + b
cost = tf.reduce_mean(tf.square(hypothesis - Y))

# Minimize
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for step in range(201):
    cost_val, hy_val, _ = sess.run(
        [cost, hypothesis, train], feed_dict={X: x_data, Y: y_data}
    )

    if step % 40 == 0:
        print(step, "Cost :", cost_val,
             "\nPrediction\n", hy_val)
0 Cost : 2.1135879
Prediction
 [[-0.8375394 ]
 [-1.6286453 ]
 [-1.1022344 ]
 [-0.44793546]
 [-0.832747  ]
 [-0.7305834 ]
 [-0.39647448]
 [-0.3694872 ]]
40 Cost : 2.1074905
Prediction
 [[-0.83465135]
 [-1.6258268 ]
 [-1.099904  ]
 [-0.44615546]
 [-0.83062065]
 [-0.72853124]
 [-0.39512596]
 [-0.36815533]]
80 Cost : 2.101407
Prediction
 [[-0.83176506]
 [-1.6230099 ]
 [-1.0975752 ]
 [-0.44437712]
 [-0.8284959 ]
 [-0.7264808 ]
 [-0.39377916]
 [-0.36682522]]
120 Cost : 2.0953507
Prediction
 [[-0.82888675]
 [-1.620203  ]
 [-1.0952536 ]
 [-0.4426031 ]
 [-0.82637715]
 [-0.72443604]
 [-0.39243543]
 [-0.36549872]]
160 Cost : 2.0893104
Prediction
 [[-0.8260109 ]
 [-1.6173992 ]
 [-1.0929348 ]
 [-0.44083124]
 [-0.8242608 ]
 [-0.72239375]
 [-0.39109373]
 [-0.3641746 ]]
200 Cost : 2.0832868
Prediction
 [[-0.8231386 ]
 [-1.6145985 ]
 [-1.0906188 ]
 [-0.4390618 ]
 [-0.82214713]
 [-0.7203541 ]
 [-0.3897541 ]
 [-0.3628522 ]]

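After normalizing the same data with the min-max scaler,
the cost stays finite and the model produces actual predictions instead of nan.
With the learning rate still at 1e-5, the cost falls only slowly (from 2.1136 to 2.0833 over 200 steps).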
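Note that these predictions are in the scaled 0-to-1 range, not in the original units. A minimal sketch, assuming you keep each column's min and max, of how scaled values could be mapped back; `min_max_scale` and `inverse_min_max` are hypothetical helpers that mirror the formula in `min_max_scaler`, not part of the lab code.

```python
def min_max_scale(values):
    """Scale a list to [0, 1]; also return (min, max) so the scaling can be undone."""
    low, high = min(values), max(values)
    scaled = [(v - low) / (high - low + 1e-7) for v in values]
    return scaled, (low, high)


def inverse_min_max(scaled, low, high):
    """Map scaled values back to the original units."""
    return [s * (high - low + 1e-7) + low for s in scaled]


# Last column of xy (the y_data values before scaling)
y_column = [831.659973, 828.070007, 824.159973, 819.23999,
            818.97998, 820.450012, 813.669983, 809.559998]
scaled, (low, high) = min_max_scale(y_column)
restored = inverse_min_max(scaled, low, high)
print(max(abs(a - b) for a, b in zip(y_column, restored)))   # round-trip error, close to 0
```

The same stored min and max would be applied to a model's scaled output to read it in the original scale.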


Written by @Minsu Kim
Software Engineer at KakaoPay Corp.