Apple M1 Macs: Not a Mature Solution for General Usage, e.g., the Java Virtual Machine
re: Java for Apple M1 Macs
re: SHA-512 Hashing Speed in Java: 2019 Mac Pro, 2019 iMac 5K, 2020 iMac 5K, 2021 MacBook Pro M1 Max
Those using email and a web browser and a few apps... all good, so it seems.
But Apple M1 Macs are not a mature solution for some uses.
Take Java, used by IntegrityChecker Java and Zerene Stacker and certain other specialty software. While there is a native Java virtual machine for Apple ARM (M1, M1 Max, M1 Pro) machines, that does not mean it has robust support for optimal performance.
It’s not terrible; IntegrityChecker Java can still do 2.6GB/sec on an 8+2 core Apple MacBook Pro M1 Max. But that compares to 3.3GB/sec on my 2019 iMac 5K 8-core.
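For readers who want a rough number for their own machine, here is a minimal sketch of a SHA-512 throughput measurement using the standard java.security.MessageDigest API. It is not how IntegrityChecker Java measures anything; the class name, method name, buffer size and round count are all made up for illustration, and it hashes an in-memory buffer on a single thread, so its result is not directly comparable to the figures quoted above.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha512Throughput {
    /**
     * Hashes `buffer` `rounds` times on the calling thread and returns
     * the throughput in MB/sec (1 MB = 1,000,000 bytes).
     */
    static double measureMBPerSec(byte[] buffer, int rounds) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 not provided by this JDK", e);
        }
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            md.update(buffer);
        }
        md.digest(); // finish the hash so all of the work is inside the timed region
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        double totalBytes = (double) buffer.length * rounds;
        return (totalBytes / 1e6) / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[64 * 1024 * 1024]; // 64 MB of zeros; contents do not affect speed
        measureMBPerSec(buffer, 2);                 // warm-up pass so the JIT has compiled the hot path
        System.out.printf("SHA-512: %.0f MB/sec on one thread%n", measureMBPerSec(buffer, 16));
    }
}
```

Running the same class on an Intel Mac and an M1 Mac with the same JDK version gives a like-for-like, single-threaded comparison.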
Reader Adam S did some nice research on just how much native-code support there is in the ARM Java for M1 Macs.
Poor Java native-code support for math libraries for M1 Mac
openjdk % find . -name macroAssembler_arm\*.cpp
./jdk/src/hotspot/cpu/arm/macroAssembler_arm.cpp    <=== single source file vs 14 for Intel
openjdk % find . -name macroAssembler_x86\*.cpp
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_log10.cpp
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_arrayCopy_avx3.cpp
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_cos.cpp
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sin.cpp
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_aes.cpp
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_adler.cpp
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_md5.cpp
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_log.cpp
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_exp.cpp
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_tan.cpp
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_pow.cpp
./jdk/src/hotspot/cpu/x86/macroAssembler_x86.cpp
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp
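The same count can be taken for every CPU port in one pass instead of one find per port. The sketch below is only an illustration: the class and method names are invented, and the default path assumes the checkout layout shown in the listing above. It prints one line per directory under src/hotspot/cpu, so a port such as aarch64 (the directory the SHA512H hits further down live in) gets its own count alongside arm and x86.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

public class MacroAssemblerCount {
    /**
     * Maps each directory directly under cpuRoot to the number of
     * macroAssembler_*.cpp files directly inside that directory.
     */
    static Map<String, Long> countPerPort(Path cpuRoot) {
        Map<String, Long> counts = new TreeMap<>();
        try (Stream<Path> ports = Files.list(cpuRoot)) {
            ports.filter(Files::isDirectory).forEach(port -> {
                try (Stream<Path> files = Files.list(port)) {
                    long n = files
                            .map(p -> p.getFileName().toString())
                            .filter(name -> name.startsWith("macroAssembler_") && name.endsWith(".cpp"))
                            .count();
                    counts.put(port.getFileName().toString(), n);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Default path is an assumption based on the checkout layout in the listing above.
        Path cpuRoot = Paths.get(args.length > 0 ? args[0] : "jdk/src/hotspot/cpu");
        countPerPort(cpuRoot).forEach((port, n) -> System.out.println(port + ": " + n));
    }
}
```

A file count is a crude proxy for optimization effort, but it does make the ports easy to compare side by side.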
Poor Java native-code support for SHA-512 for M1 Mac
Intel CPUs have strong support for the SHA-512 hash; see Fast SHA512 Implementations on Intel® Architecture Processors.
Feels like the ARM JDK is just immature.
I grepped the OpenJDK source for the ARM SHA-512 update instruction. There are some hits, but I’m not sure what they mean yet.
openjdk % git clone https://git.openjdk.java.net/jdk/
openjdk % find . -type f -exec grep -iH SHA512H "{}" \;
./jdk/test/hotspot/gtest/aarch64/asmtest.out.h: __ sha512h(v14, __ T2D, v3, v25); // sha512h q14, q3, v25.2D
./jdk/test/hotspot/gtest/aarch64/asmtest.out.h: __ sha512h2(v8, __ T2D, v27, v21); // sha512h2 q8, q27, v21.2D
./jdk/test/hotspot/gtest/aarch64/aarch64-asmtest.py:generate(SHA512SIMDOp, ["sha512h", "sha512h2", "sha512su0", "sha512su1"])
./jdk/src/hotspot/cpu/aarch64/assembler_aarch64.hpp: INSN(sha512h, 0b100000);
./jdk/src/hotspot/cpu/aarch64/assembler_aarch64.hpp: INSN(sha512h2, 0b100001);
./jdk/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp: __ sha512h(v##i3, __ T2D, v6, v7); \
./jdk/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp: __ sha512h2(v##i3, __ T2D, v##i1, v##i0); \
A wild guess is that they’re testing that you can assemble SHA512H (from C?) and that a SHA512H instruction actually comes out. That’s different from calling the instruction in JDK SHA code, obviously. Intel SHA instructions get a lot more love:
adam@Adams-MacBook-Pro openjdk % find . -type f -exec grep -iH sha512_sse4 "{}" \;
adam@Adams-MacBook-Pro openjdk % find . -type f -exec grep -iH sha512_avx "{}" \;
./jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp: __ sha512_AVX2(msg, state0, state1, msgtmp0, msgtmp1, msgtmp2, msgtmp3, msgtmp4,
./jdk/src/hotspot/cpu/x86/stubRoutines_x86.cpp:// used in MacroAssembler::sha512_AVX2
./jdk/src/hotspot/cpu/x86/macroAssembler_x86.hpp: void sha512_AVX2_one_round_compute(Register old_h, Register a, Register b, Register c, Register d,
./jdk/src/hotspot/cpu/x86/macroAssembler_x86.hpp: void sha512_AVX2_one_round_and_schedule(XMMRegister xmm4, XMMRegister xmm5, XMMRegister xmm6, XMMRegister xmm7,
./jdk/src/hotspot/cpu/x86/macroAssembler_x86.hpp: void sha512_AVX2(XMMRegister msg, XMMRegister state0, XMMRegister state1, XMMRegister msgtmp0,
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp:void MacroAssembler::sha512_AVX2_one_round_compute(Register old_h, Register a, Register b, Register c,
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp:void MacroAssembler::sha512_AVX2_one_round_and_schedule(
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp:void MacroAssembler::sha512_AVX2(XMMRegister msg, XMMRegister state0, XMMRegister state1, XMMRegister msgtmp0,
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: //Schedule 64 input dwords, by calling sha512_AVX2_one_round_and_schedule
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_and_schedule(xmm4, xmm5, xmm6, xmm7, a, b, c, d, e, f, g, h, 0);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_and_schedule(xmm4, xmm5, xmm6, xmm7, h, a, b, c, d, e, f, g, 1);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_and_schedule(xmm4, xmm5, xmm6, xmm7, g, h, a, b, c, d, e, f, 2);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_and_schedule(xmm4, xmm5, xmm6, xmm7, f, g, h, a, b, c, d, e, 3);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_and_schedule(xmm5, xmm6, xmm7, xmm4, e, f, g, h, a, b, c, d, 0);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_and_schedule(xmm5, xmm6, xmm7, xmm4, d, e, f, g, h, a, b, c, 1);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_and_schedule(xmm5, xmm6, xmm7, xmm4, c, d, e, f, g, h, a, b, 2);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_and_schedule(xmm5, xmm6, xmm7, xmm4, b, c, d, e, f, g, h, a, 3);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_and_schedule(xmm6, xmm7, xmm4, xmm5, a, b, c, d, e, f, g, h, 0);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_and_schedule(xmm6, xmm7, xmm4, xmm5, h, a, b, c, d, e, f, g, 1);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_and_schedule(xmm6, xmm7, xmm4, xmm5, g, h, a, b, c, d, e, f, 2);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_and_schedule(xmm6, xmm7, xmm4, xmm5, f, g, h, a, b, c, d, e, 3);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_and_schedule(xmm7, xmm4, xmm5, xmm6, e, f, g, h, a, b, c, d, 0);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_and_schedule(xmm7, xmm4, xmm5, xmm6, d, e, f, g, h, a, b, c, 1);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_and_schedule(xmm7, xmm4, xmm5, xmm6, c, d, e, f, g, h, a, b, 2);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_and_schedule(xmm7, xmm4, xmm5, xmm6, b, c, d, e, f, g, h, a, 3);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_compute(a, a, b, c, d, e, f, g, h, 0);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_compute(h, h, a, b, c, d, e, f, g, 1);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_compute(g, g, h, a, b, c, d, e, f, 2);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_compute(f, f, g, h, a, b, c, d, e, 3);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_compute(e, e, f, g, h, a, b, c, d, 0);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_compute(d, d, e, f, g, h, a, b, c, 1);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_compute(c, c, d, e, f, g, h, a, b, 2);
./jdk/src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: sha512_AVX2_one_round_compute(b, b, c, d, e, f, g, h, a, 3);
adam@Adams-MacBook-Pro openjdk % find . -type f -exec grep -iH sha512_avx2_rorx "{}" \;
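Grepping the source shows what code exists, not what a given JVM actually switches on at run time. HotSpot exposes flags named UseSHA, UseSHA1Intrinsics, UseSHA256Intrinsics and UseSHA512Intrinsics, and their effective values can be read from inside a running JVM. The sketch below does that through the HotSpotDiagnosticMXBean API; the class and method names are invented for illustration, and a flag reported as "true" only says the JVM believes it can use an intrinsic, not how fast that intrinsic is.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class ShaIntrinsicFlags {
    /**
     * Returns the effective value of a HotSpot VM flag as a string,
     * or "not available" if this JVM does not know the flag.
     */
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return "not available"; // not a HotSpot-based JVM
        }
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return "not available"; // flag does not exist in this build
        }
    }

    public static void main(String[] args) {
        String[] flags = {
            "UseSHA", "UseSHA1Intrinsics", "UseSHA256Intrinsics", "UseSHA512Intrinsics"
        };
        System.out.println(System.getProperty("os.arch") + " / "
                + System.getProperty("java.vm.version"));
        for (String flag : flags) {
            System.out.println(flag + " = " + flagValue(flag));
        }
    }
}
```

Running it on both the Intel and the M1 machine, with the exact JDK build used for hashing, shows whether the two JVMs even report the same settings.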