Stockfish Development Versions are build automatically if there are changes on the master branch in the git repository (https://github.com/official-stockfish/Stockfish). Use it at your own risk.
They are compiled with gcc 11.2/mingw 10 on Ubuntu 22.04.

Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: SFisGOD
Date: Fri Aug 27 07:51:26 2021 +0200
Timestamp: 1630043486

Update default net to nn-33495fe25081.nnue

STC:
LLR: 2.95 (-2.94,2.94) <-0.50,2.50>
Total: 37368 W: 9621 L: 9391 D: 18356 Elo +2.14
Ptnml(0-2): 117, 4287, 9664, 4481, 135
https://tests.stockfishchess.org/tests/view/612768165318138ee1204977

LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 13328 W: 3446 L: 3246 D: 6636 Elo +5.21
Ptnml(0-2): 11, 1383, 3682, 1571, 17
https://tests.stockfishchess.org/tests/view/6127dc8d62d20cf82b5ad196

Closes https://github.com/official-stockfish/Stockfish/pull/3679

Bench: 5179347
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: ppigazzini
Date: Fri Aug 27 07:49:26 2021 +0200
Timestamp: 1630043366

Use "pedantic" flag also for mingw

This will avoid to run in fishtest a test where the linux machines exit from
the building process and only the windows machines run the test.

See:
https://tests.stockfishchess.org/tests/view/61122d732a8a49ac5be79996
https://github.com/SFisGOD/Stockfish/commit/4e422577d6ebd1f6ecf606189190b8f6fb03f6c9#comments

closes https://github.com/official-stockfish/Stockfish/pull/3671

No functional change.
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: Joost VandeVondele
Date: Fri Aug 27 07:48:18 2021 +0200
Timestamp: 1630043298

Fix empty EvalFile option

some GUIs send an empty string for EvalFile, in that case explicitly try the default name

fixes https://github.com/official-stockfish/Stockfish/issues/3675

closes https://github.com/official-stockfish/Stockfish/pull/3678

No functional change.
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: bmc4
Date: Sun Aug 22 09:15:19 2021 +0200
Timestamp: 1629616519

Simplify Declaration on Pawn Move Generation

Removes possible micro-optimization in favor of readability.

STC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 75432 W: 5824 L: 5777 D: 63831 Elo +0.22
Ptnml(0-2): 178, 4648, 28036, 4657, 197
https://tests.stockfishchess.org/tests/view/611fa7f84977aa1525c9cb75

LTC:
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 41200 W: 1156 L: 1106 D: 38938 Elo +0.42
Ptnml(0-2): 13, 981, 18562, 1031, 13
https://tests.stockfishchess.org/tests/view/611fcc694977aa1525c9cb9b

Closes https://github.com/official-stockfish/Stockfish/pull/3669

No functional change
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: SFisGOD
Date: Sun Aug 22 09:09:58 2021 +0200
Timestamp: 1629616198

Update default net to nn-517c4f68b5df.nnue

SPSA: https://tests.stockfishchess.org/tests/view/611cf0da4977aa1525c9ca03
Parameters: 256 net weights and 8 net biases (output layer)
Base net: nn-ac5605a608d6.nnue
New net: nn-517c4f68b5df.nnue

STC:
LLR: 2.93 (-2.94,2.94) <-0.50,2.50>
Total: 11600 W: 998 L: 851 D: 9751 Elo +4.40
Ptnml(0-2): 30, 705, 4186, 846, 33
https://tests.stockfishchess.org/tests/view/611f84524977aa1525c9cb5b

LTC:
LLR: 2.95 (-2.94,2.94) <0.50,3.50>
Total: 9360 W: 338 L: 243 D: 8779 Elo +3.53
Ptnml(0-2): 0, 220, 4151, 303, 6
https://tests.stockfishchess.org/tests/view/611f8c5b4977aa1525c9cb64

closes https://github.com/official-stockfish/Stockfish/pull/3667

Bench: 4844618
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: candirufish
Date: Sun Aug 22 09:05:53 2021 +0200
Timestamp: 1629615953

do more LMR extensions for PV nodes

LMR Pv and depth 6 Extension tweak:

LTC:
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 52488 W: 1542 L: 1394 D: 49552 Elo +0.98
Ptnml(0-2): 18, 1253, 23552, 1405, 16
https://tests.stockfishchess.org/tests/view/611e49c34977aa1525c9caa7

STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 76216 W: 6000 L: 5784 D: 64432 Elo +0.98
Ptnml(0-2): 204, 4745, 28006, 4937, 216
https://tests.stockfishchess.org/tests/view/611e0e254977aa1525c9ca89

closes https://github.com/official-stockfish/Stockfish/pull/3666

Bench: 5046381
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: bmc4
Date: Sun Aug 22 09:00:15 2021 +0200
Timestamp: 1629615615

Simplify Null Move Search Reduction

slightly simpler formula for reduction computation.

first round of tests:
STC:
LLR: 2.97 (-2.94,2.94) <-2.50,0.50>
Total: 15632 W: 1319 L: 1204 D: 13109 Elo +2.56
Ptnml(0-2): 33, 956, 5733, 1051, 43
https://tests.stockfishchess.org/tests/view/60bd03c7457376eb8bcaa600

LTC:
LLR: 3.37 (-2.94,2.94) <-2.50,0.50>
Total: 86296 W: 2814 L: 2779 D: 80703 Elo +0.14
Ptnml(0-2): 33, 2500, 38039, 2551, 25
https://tests.stockfishchess.org/tests/view/60bd1ff0457376eb8bcaa653

recent tests:
STC:
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 23936 W: 1895 L: 1793 D: 20248 Elo +1.48
Ptnml(0-2): 40, 1470, 8869, 1526, 63
https://tests.stockfishchess.org/tests/view/611f9b7d4977aa1525c9cb6b

LTC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 62568 W: 1750 L: 1713 D: 59105 Elo +0.21
Ptnml(0-2): 19, 1560, 28085, 1605, 15
https://tests.stockfishchess.org/tests/view/611fa4814977aa1525c9cb71

functional on high depth

closes https://github.com/official-stockfish/Stockfish/pull/3535

Bench: 5375286
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: Tomasz Sobczyk
Date: Fri Aug 20 08:50:25 2021 +0200
Timestamp: 1629442225

Optimize and tidy up affine transform code.

The new network caused some issues initially due to the very narrow neuron set between the first two FC layers. Necessary changes were hacked together to make it work. This patch is a mature approach to make the affine transform code faster, more readable, and easier to maintain should the layer sizes change again.

The following changes were made:

* ClippedReLU always produces a multiple of 32 outputs. This is about as good of a solution for AffineTransform's SIMD requirements as it can get without a bigger rewrite.

* All self-contained simd helpers are moved to a separate file (simd.h). Inline asm is utilized to work around GCC's issues with code generation and register assignment. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693, https://godbolt.org/z/da76fY1n7

* AffineTransform has 2 specializations. While it's more lines of code due to the boilerplate, the logic in both is significantly reduced, as these two are impossible to nicely combine into one.
1) The first specialization is for cases when there's >=128 inputs. It uses a different approach to perform the affine transform and can make full use of AVX512 without any edge cases. Furthermore, it has higher theoretical throughput because less loads are needed in the hot path, requiring only a fixed amount of instructions for horizontal additions at the end, which are amortized by the large number of inputs.
2) The second specialization is made to handle smaller layers where performance is still necessary but edge cases need to be handled. AVX512 implementation for this was ommited by mistake, a remnant from the temporary implementation for the new... This could be easily reintroduced if needed. A slightly more detailed description of both implementations is in the code.

Overall it should be a minor speedup, as shown on fishtest:

passed STC:
LLR: 2.96 (-2.94,2.94) <-0.50,2.50>
Total: 51520 W: 4074 L: 3888 D: 43558 Elo +1.25
Ptnml(0-2): 111, 3136, 19097, 3288, 128

and various tests shown in the pull request

closes https://github.com/official-stockfish/Stockfish/pull/3663

No functional change
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: Tomasz Sobczyk
Date: Fri Aug 20 07:57:09 2021 +0200
Timestamp: 1629439029

Improve handling of the debug log file.

Fix handling of empty strings in uci options and reassigning of the log file

Fixes https://github.com/official-stockfish/Stockfish/issues/3650

Closes https://github.com/official-stockfish/Stockfish/pull/3655

No functional change
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: Torsten Hellwig
Date: Wed Aug 18 09:17:22 2021 +0200
Timestamp: 1629271042

Update default net to nn-ac5605a608d6.nnue

This net was created with the nnue-pytorch trainer, it used the previous master net as a starting point.

The training data includes all T60 data (https://drive.google.com/drive/folders/1rzZkgIgw7G5vQMLr2hZNiUXOp7z80613), all T74 data (https://drive.google.com/drive/folders/1aFUv3Ih3-A8Vxw9064Kw_FU4sNhMHZU-) and the wrongNNUE_02_d9.binpack (https://drive.google.com/file/d/1seGNOqcVdvK_vPNq98j-zV3XPE5zWAeq). The Leela data were randomly named and then concatenated. All data was merged into one binpack using interleave_binpacks.py.

python3 train.py \
../data/t60_t74_wrong.binpack \
../data/t60_t74_wrong.binpack \
--resume-from-model ../data/nn-e8321e467bf6.pt \
--gpus 1 \
--threads 4 \
--num-workers 1 \
--batch-size 16384 \
--progress_bar_refresh_rate 300 \
--random-fen-skipping 3 \
--features=HalfKAv2_hm^ \
--lambda=1.0 \
--max_epochs=600 \
--seed $RANDOM \
--default_root_dir ../output/exp_24

STC:
LLR: 2.95 (-2.94,2.94) <-0.50,2.50>
Total: 15320 W: 1415 L: 1257 D: 12648 Elo +3.58
Ptnml(0-2): 50, 1002, 5402, 1152, 54
https://tests.stockfishchess.org/tests/view/611c404a4977aa1525c9c97f

LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 9440 W: 345 L: 248 D: 8847 Elo +3.57
Ptnml(0-2): 3, 222, 4175, 315, 5
https://tests.stockfishchess.org/tests/view/611c6c7d4977aa1525c9c996

LTC with UHO_XXL_+0.90_+1.19.epd:
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 6232 W: 1638 L: 1459 D: 3135 Elo +9.98
Ptnml(0-2): 5, 592, 1744, 769, 6
https://tests.stockfishchess.org/tests/view/611c9b214977aa1525c9c9cb

closes https://github.com/official-stockfish/Stockfish/pull/3664

Bench: 5375286
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: Joost VandeVondele
Date: Tue Aug 17 21:08:34 2021 +0200
Timestamp: 1629227314

Regenerate dependencies on code change

fixes https://github.com/official-stockfish/Stockfish/issues/3658

dependencies are now regenerated for each code change, this adds some 1s overhead in compile time, but avoids potential miscompilations or build problems.

closes https://github.com/official-stockfish/Stockfish/pull/3659

No functional change
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: Tomasz Sobczyk
Date: Sun Aug 15 12:05:43 2021 +0200
Timestamp: 1629021943

New NNUE architecture and net

Introduces a new NNUE network architecture and associated network parameters

The summary of the changes:

* Position for each perspective mirrored such that the king is on e..h files. Cuts the feature transformer size in half, while preserving enough knowledge to be good. See https://docs.google.com/document/d/1gTlrr02qSNKiXNZ_SuO4-RjK4MXBiFlLE6jvNqqMkAY/edit#heading=h.b40q4rb1w7on.
* The number of neurons after the feature transformer increased two-fold, to 1024x2. This is possibly mostly due to the now very optimized feature transformer update code.
* The number of neurons after the second layer is reduced from 16 to 8, to reduce the speed impact. This, perhaps surprisingly, doesn't harm the strength much. See https://docs.google.com/document/d/1gTlrr02qSNKiXNZ_SuO4-RjK4MXBiFlLE6jvNqqMkAY/edit#heading=h.6qkocr97fezq

The AffineTransform code did not work out-of-the box with the smaller number of neurons after the second layer, so some temporary changes have been made to add a special case for InputDimensions == 8. Also additional 0 padding is added to the output for some archs that cannot process inputs by <=8 (SSE2, NEON). VNNI uses an implementation that can keep all outputs in the registers while reducing the number of loads by 3 for each 16 inputs, thanks to the reduced number of output neurons. However GCC is particularily bad at optimization here (and perhaps why the current way the affine transform is done even passed sprt) (see https://docs.google.com/document/d/1gTlrr02qSNKiXNZ_SuO4-RjK4MXBiFlLE6jvNqqMkAY/edit# for details) and more work will be done on this in the following days. I expect the current VNNI implementation to be improved and extended to other architectures.

The network was trained with a slightly modified version of the pytorch trainer (https://github.com/glinscott/nnue-pytorch); the changes are in https://github.com/glinscott/nnue-pytorch/pull/143

The training utilized 2 datasets.

dataset A - https://drive.google.com/file/d/1VlhnHL8f-20AXhGkILujnNXHwy9T-MQw/view?usp=sharing
dataset B - as described in https://github.com/official-stockfish/Stockfish/commit/ba01f4b95448bcb324755f4dd2a632a57c6e67bc

The training process was as following:

train on dataset A for 350 epochs, take the best net in terms of elo at 20k nodes per move (it's fine to take anything from later stages of training).
convert the .ckpt to .pt
--resume-from-model from the .pt file, train on dataset B for <600 epochs, take the best net. Lambda=0.8, applied before the loss function.

The first training command:

python3 train.py \
../nnue-pytorch-training/data/large_gensfen_multipvdiff_100_d9.binpack \
../nnue-pytorch-training/data/large_gensfen_multipvdiff_100_d9.binpack \
--gpus "$3," \
--threads 1 \
--num-workers 1 \
--batch-size 16384 \
--progress_bar_refresh_rate 20 \
--smart-fen-skipping \
--random-fen-skipping 3 \
--features=HalfKAv2_hm^ \
--lambda=1.0 \
--max_epochs=600 \
--default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2

The second training command:

python3 serialize.py \
--features=HalfKAv2_hm^ \
../nnue-pytorch-training/experiment_131/run_6/default/version_0/checkpoints/epoch-499.ckpt \
../nnue-pytorch-training/experiment_$1/base/base.pt

python3 train.py \
../nnue-pytorch-training/data/michael_commit_b94a65.binpack \
../nnue-pytorch-training/data/michael_commit_b94a65.binpack \
--gpus "$3," \
--threads 1 \
--num-workers 1 \
--batch-size 16384 \
--progress_bar_refresh_rate 20 \
--smart-fen-skipping \
--random-fen-skipping 3 \
--features=HalfKAv2_hm^ \
--lambda=0.8 \
--max_epochs=600 \
--resume-from-model ../nnue-pytorch-training/experiment_$1/base/base.pt \
--default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2

STC: https://tests.stockfishchess.org/tests/view/611120b32a8a49ac5be798c4

LLR: 2.97 (-2.94,2.94) <-0.50,2.50>
Total: 22480 W: 2434 L: 2251 D: 17795 Elo +2.83
Ptnml(0-2): 101, 1736, 7410, 1865, 128

LTC: https://tests.stockfishchess.org/tests/view/611152b32a8a49ac5be798ea

LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 9776 W: 442 L: 333 D: 9001 Elo +3.87
Ptnml(0-2): 5, 295, 4180, 402, 6

closes https://github.com/official-stockfish/Stockfish/pull/3646

bench: 5189338
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: Joost VandeVondele
Date: Thu Aug 5 16:41:07 2021 +0200
Timestamp: 1628174467

Revert futility pruning patches

reverts 09b6d28391cf582d99897360b225bcbbe38dd1c6 and
dbd7f602d3c7622df294f87d7239b5aaf31f695f that significantly impact mate
finding capabilities. For example on ChestUCI_23102018.epd, at 1M nodes,
the number of mates found is nearly reduced 2x without these depth conditions:

sf6 2091
sf7 2093
sf8 2107
sf9 2062
sf10 2208
sf11 2552
sf12 2563
sf13 2509
sf14 2427
master 1246
patched 2467

(script for testing at https://github.com/official-stockfish/Stockfish/files/6936412/matecheck.zip)

closes https://github.com/official-stockfish/Stockfish/pull/3641

fixes https://github.com/official-stockfish/Stockfish/issues/3627

Bench: 5467570
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: VoyagerOne
Date: Thu Aug 5 16:32:07 2021 +0200
Timestamp: 1628173927

SEE simplification

Simplified SEE formula by removing std::min. Should also be easier to tune.

STC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 22656 W: 1836 L: 1729 D: 19091 Elo +1.64
Ptnml(0-2): 54, 1426, 8267, 1521, 60
https://tests.stockfishchess.org/tests/view/610ae62f2a8a49ac5be79449

LTC:
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 26248 W: 806 L: 744 D: 24698 Elo +0.82
Ptnml(0-2): 6, 668, 11715, 728, 7
https://tests.stockfishchess.org/tests/view/610b17ad2a8a49ac5be79466

closes https://github.com/official-stockfish/Stockfish/pull/3643

bench: 4915145
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: SFisGOD
Date: Thu Aug 5 08:52:07 2021 +0200
Timestamp: 1628146327

Update default net to nn-46832cfbead3.nnue

SPSA 1: https://tests.stockfishchess.org/tests/view/6100e7f096b86d98abf6a832
Parameters: A total of 256 net weights and 8 net biases were tuned (output layer)
Base net: nn-56a5f1c4173a.nnue
New net: nn-ec3c8e029926.nnue

SPSA 2: https://tests.stockfishchess.org/tests/view/610733caafad2da4f4ae3da7
Parameters: A total of 256 net biases were tuned (hidden layer 2)
Base net: nn-ec3c8e029926.nnue
New net: nn-46832cfbead3.nnue

STC:
LLR: 2.98 (-2.94,2.94) <-0.50,2.50>
Total: 50520 W: 3953 L: 3765 D: 42802 Elo +1.29
Ptnml(0-2): 138, 3063, 18678, 3235, 146
https://tests.stockfishchess.org/tests/view/610a79692a8a49ac5be793f4

LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 57256 W: 1723 L: 1566 D: 53967 Elo +0.95
Ptnml(0-2): 12, 1442, 25568, 1589, 17
https://tests.stockfishchess.org/tests/view/610ac5bb2a8a49ac5be79434

Closes https://github.com/official-stockfish/Stockfish/pull/3642

Bench: 5359314
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: Stefan Geschwentner
Date: Thu Aug 5 08:47:33 2021 +0200
Timestamp: 1628146053

Simplify new cmh pruning thresholds by using directly a quadratic formula.

This decouples also the stat bonus updates from the threshold which creates less dependencies for tuning of stat bonus parameters.
Perhaps a further fine tuning of the now separated coefficients for constHist[0] and constHist[1] could give further gains.

STC:
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 78384 W: 6134 L: 6090 D: 66160 Elo +0.20
Ptnml(0-2): 207, 5013, 28705, 5063, 204
https://tests.stockfishchess.org/tests/view/6106d235afad2da4f4ae3d4b

LTC:
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 38176 W: 1149 L: 1095 D: 35932 Elo +0.49
Ptnml(0-2): 6, 1000, 17030, 1038, 14
https://tests.stockfishchess.org/tests/view/6107a080afad2da4f4ae3def

closes https://github.com/official-stockfish/Stockfish/pull/3639

Bench: 5098146
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: VoyagerOne
Date: Thu Aug 5 08:44:38 2021 +0200
Timestamp: 1628145878

Futile pruning simplification

Remove CMH conditions in futile pruning.

STC:
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 93520 W: 7165 L: 7138 D: 79217 Elo +0.10
Ptnml(0-2): 222, 5923, 34427, 5982, 206
https://tests.stockfishchess.org/tests/view/61083104e50a153c346ef8df

LTC:
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 59072 W: 1746 L: 1706 D: 55620 Elo +0.24
Ptnml(0-2): 13, 1562, 26353, 1588, 20
https://tests.stockfishchess.org/tests/view/610894f2e50a153c346ef913

closes https://github.com/official-stockfish/Stockfish/pull/3638

Bench: 5229673
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: VoyagerOne
Date: Sat Jul 31 15:29:19 2021 +0200
Timestamp: 1627738159

CMH Pruning Tweak

replace CounterMovePruneThreshold by a depth dependent threshold

STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 35512 W: 2718 L: 2552 D: 30242 Elo +1.62
Ptnml(0-2): 66, 2138, 13194, 2280, 78
https://tests.stockfishchess.org/tests/view/6104442fafad2da4f4ae3b94

LTC:
LLR: 2.96 (-2.94,2.94) <0.50,3.50>
Total: 36536 W: 1150 L: 1019 D: 34367 Elo +1.25
Ptnml(0-2): 10, 920, 16278, 1049, 11
https://tests.stockfishchess.org/tests/view/6104b033afad2da4f4ae3bbc

closes https://github.com/official-stockfish/Stockfish/pull/3636

Bench: 5848718
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: Tomasz Sobczyk
Date: Fri Jul 30 17:15:52 2021 +0200
Timestamp: 1627658152

Avoid unnecessary stores in the affine transform

This patch improves the codegen in the AffineTransform::forward function for architectures >=SSSE3. Current code works directly on memory and the compiler cannot see that the stores through outptr do not alias the loads through weights and input32. The solution implemented is to perform the affine transform with local variables as accumulators and only store the result to memory at the end. The number of accumulators required is OutputDimensions / OutputSimdWidth, which means that for the 1024->16 affine transform it requires 4 registers with SSSE3, 2 with AVX2, 1 with AVX512. It also cuts the number of stores required by NumRegs * 256 for each node evaluated. The local accumulators are expected to be assigned to registers, but even if this cannot be done in some case due to register pressure it will help the compiler to see that there is no aliasing between the loads and stores and may still result in better codegen.

See https://godbolt.org/z/59aTKbbYc for codegen comparison.

passed STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 140328 W: 10635 L: 10358 D: 119335 Elo +0.69
Ptnml(0-2): 302, 8339, 52636, 8554, 333

closes https://github.com/official-stockfish/Stockfish/pull/3634

No functional change
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: SFisGOD
Date: Thu Jul 29 07:35:13 2021 +0200
Timestamp: 1627536913

Update default net to nn-56a5f1c4173a.nnue

SPSA 1: https://tests.stockfishchess.org/tests/view/60fd24efd8a6b65b2f3a796e
Parameters: A total of 256 net biases were tuned (hidden layer 2)
New best values: Half of the changes from the tuning run
New net: nn-5992d3ba79f3.nnue

SPSA 2: https://tests.stockfishchess.org/tests/view/60fec7d6d8a6b65b2f3a7aa2
Parameters: A total of 128 net biases were tuned (hidden layer 1)
New best values: Half of the changes from the tuning run
New net: nn-56a5f1c4173a.nnue

STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 140392 W: 10863 L: 10578 D: 118951 Elo +0.71
Ptnml(0-2): 347, 8754, 51718, 9021, 356
https://tests.stockfishchess.org/tests/view/610037e396b86d98abf6a79e

LTC:
LLR: 2.95 (-2.94,2.94) <0.50,3.50>
Total: 14216 W: 454 L: 355 D: 13407 Elo +2.42
Ptnml(0-2): 4, 323, 6356, 420, 5
https://tests.stockfishchess.org/tests/view/61019995afad2da4f4ae3a3c

Closes #3633

Bench: 4801359
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: SFisGOD
Date: Mon Jul 26 07:52:59 2021 +0200
Timestamp: 1627278779

Update default net to nn-26abeed38351.nnue

SPSA: https://tests.stockfishchess.org/tests/view/60fba335d8a6b65b2f3a7891

New best values: Half of the changes from the tuning run.
Setting: nodestime=300 with 10+0.1 (approximate real TC is 2.5 seconds)
The rest is the same as described in #3593

The change from nodestime=600 to 300 was suggested by gekkehenker to prevent time losses for some slow workers
SFisGOD@94cd757#commitcomment-53324840

STC:
LLR: 2.96 (-2.94,2.94) <-0.50,2.50>
Total: 67448 W: 5241 L: 5036 D: 57171 Elo +1.06
Ptnml(0-2): 151, 4198, 24827, 4391, 157
https://tests.stockfishchess.org/tests/view/60fd50f2d8a6b65b2f3a798e

LTC:
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 48752 W: 1504 L: 1358 D: 45890 Elo +1.04
Ptnml(0-2): 13, 1226, 21754, 1368, 15
https://tests.stockfishchess.org/tests/view/60fd7bb2d8a6b65b2f3a79a9

Closes https://github.com/official-stockfish/Stockfish/pull/3630

Bench: 5124774
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: Giacomo Lorenzetti
Date: Mon Jul 26 07:48:58 2021 +0200
Timestamp: 1627278538

Simplification in LMR

This commit removes the `!captureOrPromotion` condition from ttCapture reduction and from good/bad history reduction (similar to #3619).

passed STC:
https://tests.stockfishchess.org/tests/view/60fc734ad8a6b65b2f3a7922
LLR: 2.97 (-2.94,2.94) <-2.50,0.50>
Total: 48680 W: 3855 L: 3776 D: 41049 Elo +0.56
Ptnml(0-2): 118, 3145, 17744, 3206, 127

passed LTC:
https://tests.stockfishchess.org/tests/view/60fce7d5d8a6b65b2f3a794c
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 86528 W: 2471 L: 2450 D: 81607 Elo +0.08
Ptnml(0-2): 28, 2203, 38777, 2232, 24

closes https://github.com/official-stockfish/Stockfish/pull/3629

Bench: 4951406
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: MichaelB7
Date: Sat Jul 24 18:04:59 2021 +0200
Timestamp: 1627142699

Update the default net to nn-76a8a7ffb820.nnue.

combined work by Serio Vieri, Michael Byrne, and Jonathan D (aka SFisGod) based on top of previous developments, by restarts from good nets.

Sergio generated the net https://tests.stockfishchess.org/api/nn/nn-d8609abe8caf.nnue:

The initial net nn-d8609abe8caf.nnue is trained by generating around 16B of training data from the last master net nn-9e3c6298299a.nnue, then trained, continuing from the master net, with lambda=0.2 and sampling ratio of 1. Starting with LR=2e-3, dropping LR with a factor of 0.5 until it reaches LR=5e-4. in_scaling is set to 361. No other significant changes made to the pytorch trainer.

Training data gen command (generates in chunks of 200k positions):

generate_training_data min_depth 9 max_depth 11 count 200000 random_move_count 10 random_move_max_ply 80 random_multi_pv 12 random_multi_pv_diff 100 random_multi_pv_depth 8 write_min_ply 10 eval_limit 1500 book noob_3moves.epd output_file_name gendata/$(date +"%Y%m%d-%H%M")_${HOSTNAME}.binpack

PyTorch trainer command (Note that this only trains for 20 epochs, repeatedly train until convergence):

python train.py --features "HalfKAv2^" --max_epochs 20 --smart-fen-skipping --random-fen-skipping 500 --batch-size 8192 --default_root_dir $dir --seed $RANDOM --threads 4 --num-workers 32 --gpus $gpuids --track_grad_norm 2 --gradient_clip_val 0.05 --lambda 0.2 --log_every_n_steps 50 $resumeopt $data $val

See https://github.com/sergiovieri/Stockfish/tree/tools_mod/rl for the scripts used to generate data.

Based on that Michael generated nn-76a8a7ffb820.nnue in the following way:

The net being submitted was trained with the pytorch trainer: https://github.com/glinscott/nnue-pytorch

python train.py i:/bin/all.binpack i:/bin/all.binpack --gpus 1 --threads 4 --num-workers 30 --batch-size 16384 --progress_bar_refresh_rate 30 --smart-fen-skipping --random-fen-skipping 3 --features=HalfKAv2^ --auto_lr_find True --lambda=1.0 --max_epochs=240 --seed %random%%random% --default_root_dir exp/run_109 --resume-from-model ./pt/nn-d8609abe8caf.pt

This run is thus started from Segio Vieri's net nn-d8609abe8caf.nnue

all.binpack equaled 4 parts Wrong_NNUE_2.binpack https://drive.google.com/file/d/1seGNOqcVdvK_vPNq98j-zV3XPE5zWAeq/view?usp=sharing plus two parts of Training_Data.binpack https://drive.google.com/file/d/1RFkQES3DpsiJqsOtUshENtzPfFgUmEff/view?usp=sharing
Each set was concatenated together - making one large Wrong_NNUE 2 binpack and one large Training so the were approximately equal in size. They were then interleaved together. The idea was to give Wrong_NNUE.binpack closer to equal weighting with the Training_Data binpack

model.py modifications:
loss = torch.pow(torch.abs(p - q), 2.6).mean()
LR = 8.0e-5 calculated as follows: 1.5e-3*(.992^360) - the idea here was to take a highly trained net and just use all.binpack as a finishing micro refinement touch for the last 2 Elo or so. This net was discovered on the 59th epoch.
optimizer = ranger.Ranger(train_params, betas=(.90, 0.999), eps=1.0e-7, gc_loc=False, use_gc=False)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.992)
For this micro optimization, I had set the period to "5" in train.py. This changes the checkpoint output so that every 5th checkpoint file is created

The final touches were to adjust the NNUE scale, as was done by Jonathan in tests running at the same time.

passed LTC
https://tests.stockfishchess.org/tests/view/60fa45aed8a6b65b2f3a77a4
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 53040 W: 1732 L: 1575 D: 49733 Elo +1.03
Ptnml(0-2): 14, 1432, 23474, 1583, 17

passed STC
https://tests.stockfishchess.org/tests/view/60f9fee2d8a6b65b2f3a7775
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 37928 W: 3178 L: 3001 D: 31749 Elo +1.62
Ptnml(0-2): 100, 2446, 13695, 2623, 100.

closes https://github.com/official-stockfish/Stockfish/pull/3626

Bench: 5169957
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: Giacomo Lorenzetti
Date: Fri Jul 23 19:02:58 2021 +0200
Timestamp: 1627059778

Apply good/bad history reduction also when inCheck

Main idea is that, in some cases, 'in check' situations are not so different from 'not in check' ones.
Trying to use piece count in order to select only a few 'in check' situations have failed LTC testing.
It could be interesting to apply one of those ideas in other parts of the search function.

passed STC:
https://tests.stockfishchess.org/tests/view/60f1b68dd1189bed71812d40
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 53472 W: 4078 L: 4008 D: 45386 Elo +0.45
Ptnml(0-2): 127, 3297, 19795, 3413, 104

passed LTC:
https://tests.stockfishchess.org/tests/view/60f291e6d1189bed71812de3
LLR: 2.92 (-2.94,2.94) <-2.50,0.50>
Total: 89712 W: 2651 L: 2632 D: 84429 Elo +0.07
Ptnml(0-2): 60, 2261, 40188, 2294, 53

closes https://github.com/official-stockfish/Stockfish/pull/3619

Bench: 5185789
see source
Windows x64 for Haswell CPUs
Windows x64 for modern computers + AVX2
Windows x64 for modern computers
Windows x64 + SSSE3
Windows x64
Windows 32
Linux x64 for Haswell CPUs
Linux x64 for modern computers + AVX2
Linux x64 for modern computers
Linux x64 + SSSE3
Linux x64
Author: pb00067
Date: Fri Jul 23 18:53:03 2021 +0200
Timestamp: 1627059183

Simplify lowply-history scoring logic

STC:
https://tests.stockfishchess.org/tests/view/60eee559d1189bed71812b16
LLR: 2.97 (-2.94,2.94) <-2.50,0.50>
Total: 33976 W: 2523 L: 2431 D: 29022 Elo +0.94
Ptnml(0-2): 66, 2030, 12730, 2070, 92

LTC:
https://tests.stockfishchess.org/tests/view/60eefa12d1189bed71812b24
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 107240 W: 3053 L: 3046 D: 101141 Elo +0.02
Ptnml(0-2): 56, 2668, 48154, 2697, 45

closes https://github.com/official-stockfish/Stockfish/pull/3616

bench: 5199177
see source

< prev page next page >