作業ログ vegetaで負荷テストしてみる 2019/09/08

※この記事に登場するソースコードなどは僕個人が開発しているプロジェクトのもので仕事とは無関係です

負荷テストツールというとJMeterが有名ですが、ググってみたところ最近はVegetaというツールの評判が良いみたいなので試してみようと思います。

executableが公開されているのでこれを使っていきます。

Go製なんですね。

まずはexecutableをダウンロードして解凍します

$ wget https://github.com/tsenart/vegeta/releases/download/v12.7.0/vegeta-12.7.0-linux-amd64.tar.gz

$ tar -xvfz vegeta-12.7.0-linux-amd64.tar.gz

$ ./vegeta --help

# いっぱい出てきます。いろいろコマンドがあるらしいです

ヘルプの一番下にexamplesが出てきたのでまずはこれを試してみます。

examples:
  echo "GET http://localhost/" | vegeta attack -duration=5s | tee results.bin | vegeta report
  vegeta report -type=json results.bin > metrics.json
  cat results.bin | vegeta plot > plot.html
  cat results.bin | vegeta report -type="hist[0,100ms,200ms,300ms]"

いまローカルで192.168.0.106:3000でrailsを動かしてるのでこれでやってみました

$ echo "GET http://192.168.0.106:3000/xxx/xxx" | ./vegeta attack -duration=5s | tee results.bin | ./vegeta report
Requests      [total, rate, throughput]  250, 50.20, 48.30
Duration      [total, attack, wait]      5.1760113s, 4.9800325s, 195.9788ms
Latencies     [mean, 50, 95, 99, max]    49.355204ms, 33.345033ms, 64.9348ms, 348.376ms, 355.9589ms
Bytes In      [total, mean]              675500, 2702.00
Bytes Out     [total, mean]              0, 0.00
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:250
Error Set:

$ ./vegeta report -type=json results.bin > metrics.json
$ cat results.bin | ./vegeta plot > plot.html
$ cat results.bin | ./vegeta report -type="hist[0,100ms,200ms,300ms]"
Bucket           #    %       Histogram
[0s,     100ms]  239  95.60%  #######################################################################
[100ms,  200ms]  0    0.00%
[200ms,  300ms]  0    0.00%
[300ms,  +Inf]   11   4.40%   ###

なるほどわかりやすいですね。

いま試したexampleだと duration=5s で 250 リクエスト投げつけたってことなんでしょう。

最後のレポートでのヒストグラム、300msの部分、いやな予感がするので勘で引数をいじって実行してみます。

$ cat results.bin | ./vegeta report -type="hist[0,100ms,200ms,300ms,400ms,500ms,600ms,700ms,800ms]"
Bucket           #    %       Histogram
[0s,     100ms]  239  95.60%  #######################################################################
[100ms,  200ms]  0    0.00%
[200ms,  300ms]  0    0.00%
[300ms,  400ms]  11   4.40%   ###
[400ms,  500ms]  0    0.00%
[500ms,  600ms]  0    0.00%
[600ms,  700ms]  0    0.00%
[700ms,  800ms]  0    0.00%
[800ms,  +Inf]   0    0.00%

あれ、ほんとに300ms台で返ってたんですね。

次はproductionで試してみます。（まだアクティブユーザーいないので）

ちなみに負荷テスト用にrobots.txtを書いてauthenticationなしでアクセスできるダミーページを作って試しました。

$ echo "GET https://xxx/xxx/xxx" | ./vegeta attack -duration=5s | tee results.bin | ./vegeta report
Requests      [total, rate, throughput]  250, 50.20, 44.70
Duration      [total, attack, wait]      5.1897395s, 4.9801034s, 209.6361ms
Latencies     [mean, 50, 95, 99, max]    1.648695664s, 1.484949505s, 3.1132589s, 3.2826205s, 3.409735s
Bytes In      [total, mean]              712642, 2850.57
Bytes Out     [total, mean]              0, 0.00
Success       [ratio]                    92.80%
Status Codes  [code:count]               200:232  502:18
Error Set:
502 Bad Gateway

あらー、、、さばききれてないみたいですね。

続いてレポートを見てみます。

$ cat results.bin | ./vegeta report -type="hist[0,100ms,200ms,300ms,400ms,500ms,600ms,700ms,800ms,900ms,1000ms,1500ms,2000ms,2500ms,3000ms,3500ms,4000ms,4500ms,5000ms]"
Bucket           #   %       Histogram
[0s,     100ms]  5   2.00%   #
[100ms,  200ms]  1   0.40%
[200ms,  300ms]  5   2.00%   #
[300ms,  400ms]  7   2.80%   ##
[400ms,  500ms]  6   2.40%   #
[500ms,  600ms]  12  4.80%   ###
[600ms,  700ms]  9   3.60%   ##
[700ms,  800ms]  9   3.60%   ##
[800ms,  900ms]  9   3.60%   ##
[900ms,  1s]     11  4.40%   ###
[1s,     1.5s]   52  20.80%  ###############
[1.5s,   2s]     30  12.00%  #########
[2s,     2.5s]   31  12.40%  #########
[2.5s,   3s]     39  15.60%  ###########
[3s,     3.5s]   24  9.60%   #######
[3.5s,   4s]     0   0.00%
[4s,     4.5s]   0   0.00%
[4.5s,   5s]     0   0.00%
[5s,     +Inf]   0   0.00%

ひとりイスコン開催決定じゃーー

せっかくなのでもうちょっといろいろ試してみます。

もっかいヘルプを眺めてみます。

Usage: vegeta [global flags] <command> [command flags]

global flags:
  -cpus int
        Number of CPUs to use (defaults to the number of CPUs you have)
  -profile string
        Enable profiling of [cpu, heap]
  -version
        Print version and exit

attack command:
  -body string
        Requests body file
  -cert string
        TLS client PEM encoded certificate file
  -connections int
        Max open idle connections per target host (default 10000)
  -duration duration
        Duration of the test [0 = forever]
  -format string
        Targets format [http, json] (default "http")
  -h2c
        Send HTTP/2 requests without TLS encryption
  -header value
        Request header
  -http2
        Send HTTP/2 requests when supported by the server (default true)
  -insecure
        Ignore invalid server TLS certificates
  -keepalive
        Use persistent connections (default true)
  -key string
        TLS client PEM encoded private key file
  -laddr value
        Local IP address (default 0.0.0.0)
  -lazy
        Read targets lazily
  -max-body value
        Maximum number of bytes to capture from response bodies. [-1 = no limit] (default -1)
  -max-workers uint
        Maximum number of workers (default 18446744073709551615)
  -name string
        Attack name
  -output string
        Output file (default "stdout")
  -rate value
        Number of requests per time unit [0 = infinity] (default 50/1s)
  -redirects int
        Number of redirects to follow. -1 will not follow but marks as success (default 10)
  -resolvers value
        List of addresses (ip:port) to use for DNS resolution. Disables use of local system DNS. (comma separated list)
  -root-certs value
        TLS root certificate files (comma separated list)
  -targets string
        Targets file (default "stdin")
  -timeout duration
        Requests timeout (default 30s)
  -unix-socket string
        Connect over a unix socket. This overrides the host address in target URLs
  -workers uint
        Initial number of workers (default 10)

encode command:
  -output string
        Output file (default "stdout")
  -to string
        Output encoding [csv, gob, json] (default "json")

plot command:
  -output string
        Output file (default "stdout")
  -threshold int
        Threshold of data points above which series are downsampled. (default 4000)
  -title string
        Title and header of the resulting HTML page (default "Vegeta Plot")

report command:
  -buckets string
        Histogram buckets, e.g.: "[0,1ms,10ms]"
  -every duration
        Report interval
  -output string
        Output file (default "stdout")
  -type string
        Report type to generate [text, json, hist[buckets], hdrplot] (default "text")

examples:
  echo "GET http://localhost/" | vegeta attack -duration=5s | tee results.bin | vegeta report
  vegeta report -type=json results.bin > metrics.json
  cat results.bin | vegeta plot > plot.html
  cat results.bin | vegeta report -type="hist[0,100ms,200ms,300ms]"

rate面白そうですね。

-duration=10s -rate=100 で実行してみます。

$ echo "GET https://www.xxx.net/xxx/xxx" | ./vegeta attack -duration=10s -rate=100 | tee results.bin | ./vegeta report
Requests      [total, rate, throughput]  1000, 100.10, 76.51
Duration      [total, attack, wait]      11.2018003s, 9.9902325s, 1.2115678s
Latencies     [mean, 50, 95, 99, max]    1.180250319s, 1.156802225s, 2.27633799s, 2.5008392s, 2.7556836s
Bytes In      [total, mean]              2742113, 2742.11
Bytes Out     [total, mean]              0, 0.00
Success       [ratio]                    85.70%
Status Codes  [code:count]               200:857  500:13  502:130
Error Set:
502 Bad Gateway
500 Internal Server Error

ちなみに実行中自分でもブラウザでアクセスしてみたら500が返ってきました。

レポートはこんな感じでした。

$ cat results.bin | ./vegeta report -type="hist[0,100ms,200ms,300ms,400ms,500ms,600ms,700ms,800ms,900ms,1000ms,1500ms,2000ms,2500ms,3000ms,3500ms,4000ms]"
Bucket           #    %       Histogram
[0s,     100ms]  81   8.10%   ######
[100ms,  200ms]  10   1.00%
[200ms,  300ms]  1    0.10%
[300ms,  400ms]  7    0.70%
[400ms,  500ms]  18   1.80%   #
[500ms,  600ms]  46   4.60%   ###
[600ms,  700ms]  76   7.60%   #####
[700ms,  800ms]  78   7.80%   #####
[800ms,  900ms]  33   3.30%   ##
[900ms,  1s]     45   4.50%   ###
[1s,     1.5s]   306  30.60%  ######################
[1.5s,   2s]     175  17.50%  #############
[2s,     2.5s]   114  11.40%  ########
[2.5s,   3s]     10   1.00%
[3s,     3.5s]   0    0.00%
[3.5s,   4s]     0    0.00%
[4s,     +Inf]   0    0.00%

500 Internal Server Errorが気になったのでnginxのerror.logを流しながら負荷を掛けてみたところ、負荷後半で以下のようなエラーがドバっと出てきました。

2019/09/08 06:14:09 [crit] 31110#31110: accept4() failed (24: Too many open files)
2019/09/08 06:14:09 [crit] 31110#31110: *32117 open() "/home/xxx/xxx/xxx/public/500.html" failed (24: Too m
any open files), client: xxx.xxx.xxx.xxx, server: www.xxxxx.net, request: "GET /xxx/xxx HTTP/1.1", upst
ream: "http://unix:/home/xxx/xxx/xxx/tmp/sockets/puma.sock/xxx/xxx", host: "www.xxxxx.net"

扱えるファイルディスクリプタの上限を超えて500.htmlを開こうとしているようです。

設定値を変更するのではなく、レスポンスを速くするのが正しい対応な気がするのでこれは手をつけないでおこうと思います。

あと負荷テスト中にtopコマンドで様子を見ていたらCPUが93%くらいまで上がってました(COMMAND: ruby)。

とりあえずIOとか変なところがボトルネックではなさそうで少し安心しました。

ところで負荷テストってなんでこんなに楽しいんでしょうね。
開発がメインだとしたら負荷テストはデザートみたいなものだと思いませんか。

負荷テストはデザート、流行らせていきましょう。