Help needed: Stockfish for Haswell

It seems that the special SSE4.2 compile is not faster than the compile for modern computers. I decided to replace the SSE4.2 version with a special version for Haswell, which I expect to be measurably faster. Since I don’t have access to a Haswell computer, I need your help to verify my assumptions.
If you have a Haswell computer, start each of the following executables and type “bench”. Then post the results for each version as a reply.

Stockfish Windows x64 for Haswell + profiling
Stockfish Windows x64 for Haswell
Stockfish Windows x64 for modern computers + sse4.2
Stockfish Windows x64 for modern computers

18 thoughts on “Help needed: Stockfish for Haswell”

  1. I think your compiler is not making the Haswell (BMI2) build look very good.

    Here are results of running those four progs on Windows 8 on i7-4771:

    Stockfish Windows x64 for Haswell + profiling:
    bench nps: 2125874, 2153784, 2163453, 2173209, 2154386, 2164060

    Stockfish Windows x64 for Haswell:
    bench nps: 2242090, 2263148, 2263148, 2242742, 2232353, 2232353

    Stockfish Windows x64 for modern computers + sse4.2:
    bench nps: 2173209, 2232353, 2242090, 2232353, 2232353, 2232353

    Stockfish Windows x64 for modern computers:
    bench nps: 2183054, 2202384, 2192988, 2212495, 2192365, 2211861

    I compiled two versions of Stockfish on my machine. The ‘haswell’ build is 5% faster on perft 7 and 3.6% faster on bench:

    mingw ‘haswell’ compile:
    bench nps: 2321879, 2354503, 2343992, 2343294, 2354503, 2332883
    perft 7 nps: 232683062

    mingw ‘x64 for modern computers’ compile:
    bench nps: 2239992, 2269687, 2259917, 2269687, 2249589, 2269687
    perft 7 nps: 221583710

    • Thanks. I had to use profiling information from a non-BMI2 version. That’s probably why the profiling version is relatively slow.
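The 5% and 3.6% figures in the first comment can be reproduced from the numbers posted there. A minimal sketch in Python, assuming the bench gain is the ratio of the mean nps over the six runs per build (the commenter does not say how the runs were averaged); the variable names are illustrative:

```python
from statistics import mean

# bench nps from the two self-compiled builds in the first comment (six runs each)
haswell_bench = [2321879, 2354503, 2343992, 2343294, 2354503, 2332883]
modern_bench = [2239992, 2269687, 2259917, 2269687, 2249589, 2269687]

# perft 7 nps, one run per build
haswell_perft = 232683062
modern_perft = 221583710


def speedup_percent(new: float, old: float) -> float:
    """Relative speedup of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Assumption: compare the mean of the runs, not the best or median run.
bench_gain = speedup_percent(mean(haswell_bench), mean(modern_bench))
perft_gain = speedup_percent(haswell_perft, modern_perft)

print(f"bench: {bench_gain:.1f}% faster")
print(f"perft 7: {perft_gain:.1f}% faster")
```

Averaging the runs rather than picking the best one makes the comparison less sensitive to a single lucky or unlucky run.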

  2. Run via Parallels, since there is no Mac equivalent bench:

    stockfish_14062318_x64_bmi2_prof

    total time : 4753
    nodes searched : 7710548
    nodes / sec : 1622248

    stockfish_14062318_x64_bmi2

    total time : 4482
    nodes searched : 7710548
    nodes / sec : 1720336

    stockfish_14062318_x64_modern_sse42

    total time : 4589
    nodes searched : 7710548
    nodes / sec : 1680224

    stockfish_14062318_x64_modern

    total time : 4563
    nodes searched : 7710548
    nodes / sec : 1689797
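The three bench figures are related: nodes/sec appears to be nodes searched × 1000 ÷ total time in ms, truncated to an integer. This relationship is inferred from the numbers posted here, not taken from the Stockfish source. A small sketch that checks it against the four Parallels runs above (the dictionary keys are shorthand for the executable names):

```python
# (total time in ms, nodes searched, reported nodes/sec) from the Parallels runs
runs = {
    "bmi2_prof": (4753, 7710548, 1622248),
    "bmi2": (4482, 7710548, 1720336),
    "modern_sse42": (4589, 7710548, 1680224),
    "modern": (4563, 7710548, 1689797),
}


def bench_nps(total_ms: int, nodes: int) -> int:
    """Nodes per second, assuming truncating integer division as the output suggests."""
    return nodes * 1000 // total_ms


for name, (total_ms, nodes, reported) in runs.items():
    computed = bench_nps(total_ms, nodes)
    print(f"{name}: computed {computed}, reported {reported}, match={computed == reported}")
```

Because every build searches the same 7710548 nodes, nodes/sec is just a rescaled inverse of the total time, so either column can be used to rank the builds.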

  3. Oops, I should have added this too:

    Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz (Haswell) 8 CORES 3092 MHz

  4. i7-4770K (Haswell)
    I am able to compile my own BMI2 build, but your BMI2 builds did not work:

    Stockfish Windows x64 for Haswell + profiling:
    stockfish_14062318_x64_bmi2_prof
    crashed

    Stockfish Windows x64 for Haswell:
    stockfish_14062318_x64_bmi2
    crashed

    Stockfish Windows x64 for modern computers + sse4.2:
    stockfish_14062318_x64_modern_sse42
    Total time (ms): 3,245
    Nodes searched : 7,710,548
    Nodes/second : 2,376,131

    Stockfish Windows x64 for modern computers:
    stockfish_14062318_x64_modern
    Total time (ms): 3,259
    Nodes searched : 7,710,548
    Nodes/second : 2,365,924

    • I cannot explain why they don’t work for you while they work on other 4770Ks. Strange.

  5. Haswell i5-4670k @ 4.4 GHz

    Haswell + profiling
    total time (ms): 3121
    nodes searched: 7710548
    nodes/second: 2470537

    Haswell
    total time (ms): 2981
    nodes searched: 7710548
    nodes/second: 2586564

    modern + sse4.2
    total time (ms): 3028
    nodes searched: 7710548
    nodes/second: 2546416

    modern
    total time (ms): 3043
    nodes searched: 7710548
    nodes/second: 2533863

  6. Is this from Louis Zulli? If so (or even if it’s not so ;)): I have an i7-4770K. On the TalkChess forum (http://talkchess.com/forum/viewtopic.php?t=52545) we have exchanged a few ideas, and I am grateful for that. My Haswell system is at your disposal; just tell me how I can be of any help. By the way, here are the results of what has been asked for:

    a) haswell + profiling : time 2876
    bench 7710548
    nps 2680997

    b) sf for haswell time 3032
    bench 7710548
    nps 2543056

    c) modern + sse4.2 time 2939
    bench 7710548
    nps 2623527

    d) modern time 2955
    bench 7710548
    nps 2609322

    My own latest 210614 compile (the bench node count is different): the default latest source with the upgain hack (i.e. a reversal of MC’s solution to the upgain hack), without the mate-detecting patch, plus LP and Syzygy; no other change; compiled with GCC 4.7.3.

    time 2532
    bench 7558254
    nps 2985092

    • Thanks. I don’t know Louis Zulli. Strange that haswell+profile is the fastest in your test while haswell w/o profile is the slowest. Based on the other results, I would have expected the opposite. Can you confirm that by repeating the test a few times?

      • Hw (bmi2 No profiling)
        Time : 2860
        Bench : 7710548
        n/s : 2695995

        Hw (bmi2 + profiling)
        Time : 3016
        Bench : 7710548
        n/s : 2556547
        (This is the average of 10 runs.)
        It seems that in my earlier feedback I (inadvertently) swapped the names (profiled vs. non-profiled). Please accept my apologies.

  7. i5-4200M, 8 GB RAM.

    BMI2 ~ 1835k
    BMI2 + Profiling ~ 1740k
    Modern ~ 1795k
    Modern + SSE4.2 ~ 1793k

  8. Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
    8.00 GB DDR3

    Stockfish Windows x64 for Haswell + profiling
    bench 7710548
    nps 1328259

    Stockfish Windows x64 for Haswell
    bench 7710548
    nps 1448534

    Stockfish Windows x64 for modern computers + sse4.2
    bench 7710548
    nps 1421299

    Stockfish Windows x64 for modern computers
    bench 7710548
    nps 1421037

    test13b by Brice Allenbrand
    16 bench runs (mean and std deviation calculated)
    01 stockfish_14062318_x64_bmi2.exe 3954 +/-28 Mnps
    02 stockfish_14062318_x64_modern.exe 3954 +/-33 Mnps
    03 stockfish_14062318_x64_modern_sse42.exe 3932 +/-35 Mnps
    04 stockfish_14062318_x64_bmi2_prof.exe 3808 +/-36 Mnps

    It has been shown on some forums that GCC 4.7.4 compiles are a lot faster than GCC 4.8.x and 4.9.x compiles for most CPU architectures.

    • Where can one get 4.7.4 (MinGW 64-bit; I am on Windows 8.1 Pro)? Yes, 4.7 builds are much speedier. I have 4.7.3, 4.8.3 and 4.9.0, but due to a bug in 4.7.3 I cannot compile the fastest build using the makefile; as a workaround I have to use a gcc script that deletes the .gcda files. Hopefully it’s fixed in 4.7.4 (4.8.2 and above don’t have this problem, but as has been pointed out, they are around 3% slower).
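test13b’s exact method isn’t described here. A minimal sketch of the same kind of summary (mean and sample standard deviation per build, ranked fastest first), applied to the six bench runs per prebuilt binary posted in the first comment. The build labels are shorthand chosen for illustration, and reporting in knps is an assumption:

```python
from statistics import mean, stdev

# bench nps, six runs per prebuilt binary, from the first comment (i7-4771)
results = {
    "bmi2_prof": [2125874, 2153784, 2163453, 2173209, 2154386, 2164060],
    "bmi2": [2242090, 2263148, 2263148, 2242742, 2232353, 2232353],
    "modern_sse42": [2173209, 2232353, 2242090, 2232353, 2232353, 2232353],
    "modern": [2183054, 2202384, 2192988, 2212495, 2192365, 2211861],
}


def summarize(runs: list[int]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of the runs, in knps."""
    knps = [r / 1000 for r in runs]
    return mean(knps), stdev(knps)


# Rank builds from fastest to slowest by mean nps.
ranking = sorted(results, key=lambda name: mean(results[name]), reverse=True)

for i, name in enumerate(ranking, start=1):
    avg, sd = summarize(results[name])
    print(f"{i:02d} {name}: {avg:.0f} +/- {sd:.0f} knps")
```

With only six runs per build, a gap of about one standard deviation or less between two builds is hard to tell apart from run-to-run noise.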

  9. The results showed that the Haswell compile (without profiling) is 2.5% faster than the normal compile, which is not much, but it’s measurable. So I replaced the SSE4.2 compile, which did not show any speedup.

  10. Red748 on July 1, 2014 at 04:44 said:
    Hi Roman
    Have you read the comments page lately?
    Your decision to remove the popular SSE4.2 compile has created a bit of a stir!

    Roman’s request for help at

    http://blog.abrok.eu/help-needed-stockfish-for-haswell/

    mostly shows an improvement with SSE4.2 for SOME of the volunteers.

    At the very least it’s not conclusive enough to drop the SSE4.2 compile.

    Those of us who also saw a benefit from it now must make do with the clearly inferior “modern” (on our systems anyway).

    I think as more people realise it’s gone, they too will mourn its loss.

    Roman, if you read this, please reconsider. If you could have a change of heart and restore the much-missed SSE4.2 compile, it would be warmly welcomed back and much appreciated.
