Disappointed by Learning (Tentative)

Undeterred, I tried training with TD(1)-leaf again.

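About 3000 self-play games (not enough)
Iterative deepening at 0.1 seconds per move
Evaluation features:
- Material (piece values)
- KKP
- KPP
Parameters updated every 10 games
L1 regularization (meta-parameter 0.0005)

Those were the conditions I trained under.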
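For reference, here is a rough sketch of what one batch update under these conditions could look like. It is not my actual code. It assumes a linear evaluation, a sigmoid that maps the score to a win probability (the scale of 600 is made up), a learning rate of 1.0, and L1 regularization applied as soft thresholding. Only the 0.0005 and the "every 10 games" batching come from the settings above; the function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tdleaf_direction(w, leaf_features, outcome, lam=1.0, scale=600.0):
    """Update direction for one game (TDLeaf-style, lam=1.0 for TD(1)).

    leaf_features[t] is the feature vector of the principal-variation leaf
    found by the search at move t. outcome is 1.0 / 0.5 / 0.0 for
    win / draw / loss. scale is an assumed constant, not a tuned value.
    """
    # Predicted win probability at each move from the linear evaluation.
    p = [sigmoid(sum(wi * fi for wi, fi in zip(w, f)) / scale)
         for f in leaf_features]
    targets = p + [outcome]  # the game result closes the sequence
    d = [targets[t + 1] - targets[t] for t in range(len(p))]

    direction = [0.0] * len(w)
    running = 0.0
    # Walk backwards so that running = sum_{j>=t} lam^(j-t) * d[j].
    for t in range(len(p) - 1, -1, -1):
        running = d[t] + lam * running
        coeff = running * p[t] * (1.0 - p[t]) / scale  # error * dp_t/dscore
        for i, fi in enumerate(leaf_features[t]):
            direction[i] += coeff * fi
    return direction

def apply_batch(w, directions, lr=1.0, l1=0.0005):
    """Sum the directions of a batch (e.g. 10 games), step, then shrink by L1."""
    new_w = []
    for i, wi in enumerate(w):
        wi = wi + lr * sum(d[i] for d in directions)
        shrunk = max(abs(wi) - lr * l1, 0.0)  # soft threshold (assumed form)
        new_w.append(math.copysign(shrunk, wi))
    return new_w
```

With lam=1.0 the accumulated error at move t collapses to (outcome - p[t]), so every position in the game is pulled directly toward the final result.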

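I then ran a match experiment, and

against the CSA tournament program (material only) the score was

3 wins, 29 losses, 0 draws

which is a disappointing result.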
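To put a number on how bad that is, the score can be converted to an Elo difference with the usual logistic formula. This is only a sketch: the function name is hypothetical, and 32 games is far too few for the estimate to be precise.

```python
import math

def elo_diff(wins, losses, draws=0):
    """Elo difference implied by a match score (logistic model).

    Undefined for a 0% or 100% score, where the formula diverges.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400.0 * math.log10(1.0 / score - 1.0)

diff = elo_diff(3, 29, 0)
print(f"score {3 / 32:.3f}, Elo difference {diff:.0f}")
```

A score of 3/32 comes out at roughly 390 Elo points weaker than the material-only program.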

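The piece values are

Tokin (と) 482
Promoted lance (杏) 650
Promoted knight (圭) 665
Promoted silver (全) 656
Horse (馬) 1230
Dragon (龍) 1368
Gold (金) 652
Pawn (歩) 71
Lance (香) 359
Knight (桂) 423
Silver (銀) 545
Bishop (角) 896
Rook (飛) 1053

which are not such bad values.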

But looking at the learned version's evaluation, a 2-second search of the position gives

<1:search depth 6 result -699
<1:search depth 6 result -697
<1:search depth 7 result -398
<1:search depth 7 result -399
<1:search depth 8 result -622
<1:search depth 9 result -430
<1:search depth 10 result -551
<1:search depth 11 result -477
<1:search depth 12 result -589
<1:search depth 13 result -481

so it apparently thinks it is down by about one knight.
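The "about one knight" figure is just the search score divided by the learned knight value (423). A small sketch of that arithmetic, parsing the log lines above; the regular expression assumes the log format is exactly as shown.

```python
import re

KNIGHT_VALUE = 423  # learned knight value from the piece value list

search_log = [
    "<1:search depth 6 result -699",
    "<1:search depth 6 result -697",
    "<1:search depth 7 result -398",
    "<1:search depth 7 result -399",
    "<1:search depth 8 result -622",
    "<1:search depth 9 result -430",
    "<1:search depth 10 result -551",
    "<1:search depth 11 result -477",
    "<1:search depth 12 result -589",
    "<1:search depth 13 result -481",
]

def parse(line):
    """Return (depth, score) from one search log line."""
    m = re.search(r"depth (\d+) result (-?\d+)", line)
    return int(m.group(1)), int(m.group(2))

results = [parse(line) for line in search_log]
depth, score = max(results)        # result of the deepest iteration
knights = -score / KNIGHT_VALUE    # deficit expressed in knights
print(f"depth {depth}: score {score}, about {knights:.2f} knights down")
```

The score also swings by a couple of hundred points from one iteration to the next, so the exact figure should not be taken too seriously.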

The positional values may be too large.

Lately I find myself tempted by the Bonanza method.

I plan to implement the Bonanza method this weekend.