SIMD Compile-time vs. run-time
- Compile-time
  - Use intrinsics and/or autovectorization by telling the compiler which extensions are supported on the target host. The executable cannot be used on hosts that do not support the included instructions.
    - Autovectorization will try to use the extensions you tell the compiler you support.
    - For intrinsics, if you want to support multiple SIMD extensions / host machines, you can rely on macro definitions set by the compiler (e.g., `__AVX512F__`) to manually ensure that only the matching implementation is included. E.g., see how Kuffo's PDX does it (a minimal standalone sketch of the same pattern follows at the end of this Compile-time section).
      - In Kuffo's PDX, C++'s preprocessor is used to include/exclude architecture-specific headers. Thus, if you pass `-march=native` on a host with AVX-512, the compiler will set the `__AVX512F__` definition, and an `#ifdef` in the source code then ensures that the desired implementation header is included (which in turn defines an implementation of some function).
      - In this case the `neon_computers.hpp` header directly uses the NEON intrinsics:

        ```cpp
        #ifdef __ARM_NEON
        #include "pdxearch/distance_computers/neon_computers.hpp"
        #endif
        ```
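  - A minimal standalone sketch of the same compile-time pattern (illustrative code, not taken from PDX; the function name is made up). Built with `-march=native` on an AVX-512F host, the compiler defines `__AVX512F__` and the intrinsic path is compiled in; otherwise the scalar fallback is used:

    ```cpp
    #include <cstddef>

    #ifdef __AVX512F__
    #include <immintrin.h>

    // AVX-512 path: 16 floats per iteration, chosen at compile time.
    float sum_f32(const float* data, std::size_t n) {
        __m512 acc = _mm512_setzero_ps();
        std::size_t i = 0;
        for (; i + 16 <= n; i += 16)
            acc = _mm512_add_ps(acc, _mm512_loadu_ps(data + i));
        alignas(64) float lanes[16];
        _mm512_store_ps(lanes, acc);
        float total = 0.0f;
        for (float lane : lanes) total += lane;
        for (; i < n; ++i) total += data[i];  // scalar tail
        return total;
    }
    #else
    // Fallback compiled when AVX-512F is not enabled for this build.
    float sum_f32(const float* data, std::size_t n) {
        float total = 0.0f;
        for (std::size_t i = 0; i < n; ++i) total += data[i];
        return total;
    }
    #endif
    ```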
- Run-time
  - Dynamic CPU dispatching (see the sketches at the end of this block for both approaches)
    - DIY: The source code can perform dynamic dispatch itself, e.g., run `CPUID` and then choose one out of a few "backends". This seems to be what SimSIMD does in its dynamic dispatch mode: you "prove" your capabilities once, after which the dynamic dispatch mechanism is initialized.
    - (Better) Or utilize modern compiler (GCC, Clang) support, e.g., `__attribute__((target_clones("avx2", "sse4.2", "default")))`, to compile multiple versions of a function and have the compiler insert the dispatch logic for you automatically.
      - At process load time it will determine the supported extensions and store the correct function pointer in the procedure linkage table (PLT).
        - This is better than doing it at run time (i.e., the DIY approach above).
        - https://stackoverflow.com/a/61005989
      - This is also known as function multiversioning (FMV) in GCC.
        - LWN - Function multi-versioning in GCC 6.
          - Great overview of `target_clones` and `target`.
      - LLVM Clang
        - https://stackoverflow.com/questions/39958935/does-clang-offer-anything-similar-to-gcc-6-xs-function-multi-versioning-target
          - LLVM 7.0: “…function multiversioning in Clang with the ‘target’ attribute for ELF-based x86/x86_64 targets…”
        - https://lf-rise.atlassian.net/wiki/spaces/HOME/pages/8586554/CT_01_009+-+Target+Attribute+Support+LLVM
          - The `target` attribute is supported in LLVM 18.
          - `target_clones` support is in progress: https://github.com/llvm/llvm-project/pull/85786
      - It seems there is also a `target` attribute where you provide each implementation yourself (also sketched below).
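    - A hedged sketch of the DIY approach (illustrative code, not SimSIMD's actual implementation): query CPU features once via the GCC/Clang builtins (which use `CPUID`-derived data), cache a function pointer to the chosen backend, and call through it afterwards.

      ```cpp
      #include <cstddef>

      // Two illustrative backends. The AVX2 one uses a per-function target
      // override so the compiler may emit AVX2 instructions just for it.
      float dot_scalar(const float* a, const float* b, std::size_t n) {
          float acc = 0.0f;
          for (std::size_t i = 0; i < n; ++i) acc += a[i] * b[i];
          return acc;
      }

      __attribute__((target("avx2")))
      float dot_avx2(const float* a, const float* b, std::size_t n) {
          float acc = 0.0f;
          for (std::size_t i = 0; i < n; ++i) acc += a[i] * b[i];  // autovectorized with AVX2
          return acc;
      }

      using dot_fn = float (*)(const float*, const float*, std::size_t);

      // "Prove" the capabilities once and pick a backend.
      dot_fn resolve_dot() {
          __builtin_cpu_init();                 // populate CPU feature data (GCC/Clang, x86)
          if (__builtin_cpu_supports("avx2"))   // effectively a cached CPUID check
              return dot_avx2;
          return dot_scalar;
      }

      // Public entry point: the resolver runs exactly once, on first call.
      float dot(const float* a, const float* b, std::size_t n) {
          static const dot_fn impl = resolve_dot();
          return impl(a, b, n);
      }
      ```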
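    - A hedged sketch of the compiler-assisted approach with `target_clones` (GCC; recent Clang per the notes above). The compiler emits one clone of the function per listed target plus a resolver; after the loader runs the resolver, the PLT entry points at the best clone for the host.

      ```cpp
      #include <cstddef>

      // One source-level definition; the compiler generates avx2, sse4.2 and
      // default clones plus the dispatch/resolver logic.
      __attribute__((target_clones("avx2", "sse4.2", "default")))
      float dot(const float* a, const float* b, std::size_t n) {
          float acc = 0.0f;
          for (std::size_t i = 0; i < n; ++i)
              acc += a[i] * b[i];   // each clone autovectorizes this loop with its own ISA
          return acc;
      }
      ```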
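    - A hedged sketch of FMV with the `target` attribute (C++ on GCC; Clang per the LLVM 7.0 note above): unlike `target_clones`, you write each version yourself (so the bodies can differ, e.g., use intrinsics), and the compiler still generates the resolver that picks one per host.

      ```cpp
      #include <cstddef>

      __attribute__((target("avx2")))
      float dot(const float* a, const float* b, std::size_t n) {
          // A hand-tuned AVX2 implementation would go here.
          float acc = 0.0f;
          for (std::size_t i = 0; i < n; ++i) acc += a[i] * b[i];
          return acc;
      }

      __attribute__((target("default")))
      float dot(const float* a, const float* b, std::size_t n) {
          float acc = 0.0f;
          for (std::size_t i = 0; i < n; ++i) acc += a[i] * b[i];
          return acc;
      }
      ```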
  - JIT autovectorization can use the widest SIMD instructions available on the host machine if you use specific types (e.g., .NET's `Vector<T>`).