head 1.8; access; symbols pkgsrc-2026Q1:1.8.0.2 pkgsrc-2026Q1-base:1.8 pkgsrc-2025Q4:1.7.0.18 pkgsrc-2025Q4-base:1.7 pkgsrc-2025Q3:1.7.0.16 pkgsrc-2025Q3-base:1.7 pkgsrc-2025Q2:1.7.0.14 pkgsrc-2025Q2-base:1.7 pkgsrc-2025Q1:1.7.0.12 pkgsrc-2025Q1-base:1.7 pkgsrc-2024Q4:1.7.0.10 pkgsrc-2024Q4-base:1.7 pkgsrc-2024Q3:1.7.0.8 pkgsrc-2024Q3-base:1.7 pkgsrc-2024Q2:1.7.0.6 pkgsrc-2024Q2-base:1.7 pkgsrc-2024Q1:1.7.0.4 pkgsrc-2024Q1-base:1.7 pkgsrc-2023Q4:1.7.0.2 pkgsrc-2023Q4-base:1.7 pkgsrc-2023Q3:1.5.0.2 pkgsrc-2023Q3-base:1.5 pkgsrc-2023Q2:1.4.0.18 pkgsrc-2023Q2-base:1.4 pkgsrc-2023Q1:1.4.0.16 pkgsrc-2023Q1-base:1.4 pkgsrc-2022Q4:1.4.0.14 pkgsrc-2022Q4-base:1.4 pkgsrc-2022Q3:1.4.0.12 pkgsrc-2022Q3-base:1.4 pkgsrc-2022Q2:1.4.0.10 pkgsrc-2022Q2-base:1.4 pkgsrc-2022Q1:1.4.0.8 pkgsrc-2022Q1-base:1.4 pkgsrc-2021Q4:1.4.0.6 pkgsrc-2021Q4-base:1.4 pkgsrc-2021Q3:1.4.0.4 pkgsrc-2021Q3-base:1.4 pkgsrc-2021Q2:1.4.0.2 pkgsrc-2021Q2-base:1.4 pkgsrc-2021Q1:1.2.0.2 pkgsrc-2021Q1-base:1.2 pkgsrc-2020Q4:1.1.0.2 pkgsrc-2020Q4-base:1.1; locks; strict; comment @# @; 1.8 date 2026.02.24.23.09.02; author thor; state Exp; branches; next 1.7; commitid LkkOtNI91FnUtGvG; 1.7 date 2023.10.15.22.08.50; author thor; state Exp; branches; next 1.6; commitid pVzGR9g5e1SKSLIE; 1.6 date 2023.10.08.15.41.33; author thor; state Exp; branches; next 1.5; commitid cVpLXSMggp61YPHE; 1.5 date 2023.09.17.08.56.19; author adam; state Exp; branches; next 1.4; commitid t3wNd8lQK754p6FE; 1.4 date 2021.06.07.15.52.04; author adam; state Exp; branches; next 1.3; commitid V3YyjInGz1mZrdWC; 1.3 date 2021.05.29.19.57.21; author thor; state Exp; branches; next 1.2; commitid LGqoDHv3mwFW45VC; 1.2 date 2021.03.25.23.22.35; author thor; state Exp; branches; next 1.1; commitid sfNc1BNbwi9fiKMC; 1.1 date 2020.11.05.16.31.45; author bacon; state Exp; branches; next ; commitid QbQki5HSY1E5yIuC; desc @@ 1.8 log @openblas: update to 0.3.31 OpenBLAS ChangeLog ==================================================================== 
Version 0.3.31 15-Jan-2026 general: - reverted a matrix partitioning optimization from 0.3.30 that could lead to race conditions and subsequent invalid results in GEMM - added the bfloat16 extensions BGEMM and BGEMV - added a BLAS interface for the ?GEMM_BATCH extensions - added the BLAS extensions ?GEMM_BATCH_STRIDED and their CBLAS interface - added the basic infrastructure for half-precision float (FP16) format using SH prefix - reimplemented the LAPACK SLAED3/DLAED3 function using multithreading, thereby improving the performance of the SSYEVD/DSYEVD eigensolver for symmetric matrices on all platforms - limited the number of retries for initial memory allocation to avoid infinite hanging on low-memory systems - fixed a thread lockup situation encountered with python 3.9 or older and numpy - introduced a problem size threshold for multithreading in STRMV/DTRMV - introduced a problem size threshold for multithreading in CHER/CHER2/CHPR/CHPR2 and ZHER/ZHER2/ZHPR/ZHPR2 - improved the problem size thresholds for multithreading in SGER/DGER - improved autodetection of the Fortran compiler - fixed passing of the INTERFACE64=1 option to the flang-new compiler - fixed a potential deadlock in multithreaded code after calling fork() - fixed builds using CMake on FreeBSD - fixed builds using CMake from within Cygwin on Windows - fixed builds using CMake and the NVHPC compiler on ARM64 - fixed CMake build error from misdetecting compiler or OpenMP versions - improved contents of the CMake-generated OpenBLASConfig.cmake file - added support for cross-compilation to RISCV targets via CMake - fixed cross-compilation to x86 targets from non-x86 architectures - fixed failure to install cblas.h if NO_CBLAS=0 was specified - fixed missing user-defined pre- and postfixes on functions in lapack.h,lapacke.h - included fixes from the Reference-LAPACK project: - fix ordering bug in ?LAED/?LASD (Reference-LAPACK PR 1140) - revert changes in ?GEEV from PR 1129 (Reference-LAPACK PR 1142) 
- fix workspace allocation in LAPACKE_?TRSEN (Reference-LAPACK PR 1144) riscv: - added optimized SBGEMM kernels for ZVL128B and ZVL256B targets - added optimized SHGEMM kernels for ZVL128B and ZVL256B targets - added optimized SBGEMV and SHGEMV kernels for ZVL128B/ZVL256B - improved performance of the GEMV kernel for ZVL256B - improved the performance of the CROT and ZROT kernels for ZVL128B and x280 - improved the detection of RVV1.0 capability - improved performance of the matrix packing helper functions for ZVL128B and ZVL256B - improved performance of OMATCOPY for ZVL128B and ZVL256B arm: - fixed spurious executable stack in the getarch utility arm64: - fixed spurious executable stack in the getarch utility - fixed compiler warnings arising from the timer macro RPCC - fixed cache size detection for Qualcomm Oryon under Windows on Arm - fixed argument handling in the default SVE kernel for SDOT/DDOT - building the BFLOAT16 kernels is now enabled by default - improved the overall performance of GEMM,SYMM and HEMM on A64FX - improved the performance of SDOT/DDOT on A64FX - improved the multithreading performance of SDOT/DDOT on A64FX by introduction of a throttling table matching thread count to problem size - improved the performance of SGER/DGER on A64FX and NEOVERSEV1 - improved the multithreading performance of GEMM on A64FX and NEOVERSEV1 - improved the performance of the GEMV kernel for SVE-capable targets - improved the multithreading performance of SGEMM on NEOVERSEV1 and V2 - added optimized SAXPY/DAXPY SVE kernels for A64FX and NEOVERSEV1 - added optimized BGEMM and BGEMV kernels for NEOVERSEV1 - added an optimized BGEMM kernel for NEOVERSEN2 - added support for the NEOVERSEV2 cpu - added dedicated support for the Apple M4 cpu as VORTEXM4 - added optimized SGEMM/SSYMM/STRMM/SSYRK/SSYR2K for SME-capable targets (ARMV9SME and VORTEXM4) - improved the precision of the SNRM2 kernel - added cpu autodetection and compiler settings for Ampere One processors - 
fixed cpu autodetection for Apple M systems running Linux - fixed building on MacOS with AppleClang,gfortran and xcode v16 or newer - fixed several errors in the C code replacements for the complex and double precision complex LAPACK functions that get used (only) when compiling with Microsoft C and NOFORTRAN=1 under MS Windows power: - added initial support for the POWER11 architecture - improved performance of DGEMM and DGEMV on POWER10 - fixed the default compiler flags to use "-O3" instead of the possibly unsafe "-Ofast" - fixed building under MacOS (for old G4 Macs) with CMake - fixed potential miscompilation of DGEMV and other assembly kernels by gcc15.1 - fixed compilation with recent versions of flang loongarch64: - fixed warnings and potential inaccuracies arising from incorrect saving of registers - fixed enumeration of logical cores on big NUMA servers - fixed building with LLVM and the INTERFACE64=1 option x86: - fixed building the GEMM3M kernels for the GENERIC target - fixed several errors in the C code replacements for the complex and double precision complex LAPACK functions that get used (only) when compiling with Microsoft C and NOFORTRAN=1 under MS Windows x86_64: - added cpu autodetection for Intel Lunar Lake (Core Ultra 200V) - changed all ?MIN and ?MAX assembly kernels to use unaligned operations - fixed several errors in the C code replacements for the complex and double precision complex LAPACK functions that get used (only) when compiling with Microsoft C and NOFORTRAN=1 under MS Windows - fixed potential crashes in builds for Cooper Lake, Sapphire Rapids or Zen5 cpus under MS Windows zarch: - added support for building with CMake sparc: - fixed a potential crash in the DNRM2 kernel ==================================================================== Version 0.3.30 19-Jun-2025 general: - fixed an installation problem with the thread safety test in gmake builds - fixed spurious overwriting of an input array in complex GEMMT/GEMMTR - fixed 
naming of GEMMTR in error messages from XERBLA - fixed compilation of SBGEMMT/SBGEMMTR in CMake builds - fixed the implementation of ?NRM2 to handle INCX=0 correctly - removed tests for CSROT and ZDROT that relied on unspecified behavior - fixed a performance regression in multithreaded GEMM that was particularly serious on POWER targets - fixed linking issues when using LLVM's flang-new with gmake - fixed a potential thread safety problem with C11 atomic operations - further improved the workload partitioning in parallel GEMM - fixed omission of LAPACKE interfaces for CGESVDQ,CTRSYL3 and ?GEQPF in CMake builds - fixed mishandling of setting NO_LAPACK to FALSE, and incorrect dependencies for LAPACK function SPMV in CMake builds - added explicit CMake options for building LAPACKE and shared libraries - simplified and improved handling of OpenMP options in CMake builds - reworked Windows DLL generation in CMake builds to ensure correct symbol renaming (pre/postfixing) and optional generation of PDB files for debugging - updated the Perl script version of the gensymbol utility for use with Windows-on-Arm - Fixed building with (Mingw) gmake on Windows to ensure completeness of the LAPACK included in the static library (potential race condition due to the Windows version of the "ln" utility creating snapshot copies rather than links) - fixed unwanted deletion of the lapacke_mangling.h file by "make clean" - fixed potential duplication of a _64 suffix on library names in CMake builds - fixed compilation of the C fallback copies of the LAPACK code with GCC 15 - included fixes from the Reference-LAPACK project: - fixed a truncated error message in the EIG part of the testsuite (Reference-LAPACK PR 1119) - fixed too strict check in LAPACKE_?gesdd_work (PR #1126) - fixed memory corruption when calling ?GEEV with non-finite data (PR #1128) - fixed missing initialization of a variable in C/GEQP3RK (PR #1131) - fixed 2nd dimension chosen in C/ZUNMLQ transposition operation (PR 
#1135) x86_64: - fixed an error in the SBGEMV kernel for Cooper Lake/Sapphire Rapids - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL - improved the compiler identification code for flang-new - fixed a potential build issue in the ZSUM kernel - fixed "argument list too long" errors when building on MacOS - added cpu autodetection support for several new Arrow Lake models - fixed conditional inclusion of the fast path SGEMM kernel in DYNAMIC_ARCH - fixed compilation with the MinGW build of GCC 15 arm64: - fixed cpu type detection of A64FX and some ThunderX models (broken in 0.3.29) - added support for the AmpereOne/1A cpus in DYNAMIC_ARCH builds - added an optimized SBGEMM kernel for NEOVERSEV1 - improved 1xN SBGEMM performance by forwarding to SBGEMV - introduced a stepwise increase of the thread count used for SGEMM and SGEMV on NEOVERSEV1/V2 in relation to problem size - introduced a stepwise increase of the thread count used for DGEMV on NEOVERSEV1 in relation to problem size - introduced a stepwise increase of the thread count used for SDOT and DDOT on NEOVERSEV1 in relation to problem size - worked around assembler limitations in LLVM for Windows-on-Arm - enabled cpu type autodetection from the registry on Windows-on-Arm - improved multithreading threshold for GEMV and GESV on Windows-on-Arm - fixed overoptimization issues with LLVM's flang in Windows-on-Arm - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL - added a fast path SGEMM kernel for small workloads on SME capable targets - improved performance of SGEMM and DGEMM kernels for small workloads - improved performance of SGEMV and DGEMV on SVE-capable targets - improved performance of SGEMV on NEOVERSEN1 and Apple M - added optimized SSYMV and DSYMV kernels for NEOVERSEN1, Apple M and all SVE capable targets - added optimized SBGEMV kernels for NEOVERSEV1/V2/N2 - improved performance of SGEMM through faster NCOPY kernels - added compiler options for the NVIDIA HPC 
Compiler Suite - fixed compilation on OSX with XCode 16.3 and later - fixed cpu core type and cache size detection on Apple M4 - updated GEMM parameter settings for Neoverse cpus in cross-builds with CMake - fixed default compiler options for NEOVERSEN1 and CORTEXX2 in CMake builds - fixed conditional inclusion of the fast path SGEMM kernel in DYNAMIC_ARCH - fixed potential miscompilation of the non-SVE SDOT kernel riscv64: - added optimized SROTM and DROTM kernels for x280 - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL - improved performance of GEMM_TCOPY on RVV1.0 targets with VLEN of 128 or 256 - improved performance of OMATCOPY on targets with VLEN 256 - greatly improved performance of SGEMV/DGEMV - improved performance of CGEMV and ZGEMV on C910V and all RVV targets with VLEN 256 - improved performance of SAXPBY and DAXPBY on C910V and all RVV targets with VLEN 256 - improved performance of AXPY and DOT on C910V and ZVL256B targets by falling back to non-vectorized code for very small N. 
(Thereby fixing poor performance of CHBMV/ZHBMV for very small K) - fixed CMake build failures of the TRMM kernels loongarch64: - improved performance of the LSX versions of SSYMV/DSYMV - made the LASX versions of the DSYMV and SSYMV kernels compatible with hardware changes in LA664 and future targets - fixed inaccuracies in several LASX kernels - improved compatibility of LSX kernels with LA264 targets - fixed handling of deprecated target names in CMake builds - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL power: - fixed building for PPCG4 with CMake - fixed SSCAL/DSCAL on PPC970 running FreeBSD - fixed a potential alignment issue in the POWER8 SGEMV kernel - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL zarch: - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL - fixed unwanted generation of object files with a writable stack x86: - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL - worked around potential miscompilation of CDOT with very old binutils arm: - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL - fixed unwanted generation of object files with a writable stack sparc: - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL alpha: - fixed build failure caused by spurious Windows-only typecasts cell: - fixed probable build issue caused by spurious Windows-only typecasts ==================================================================== Version 0.3.29 12-Jan-2025 general: - fixed a potential NULL pointer dereference in multithreaded builds - added function aliases for GEMMT using its new name GEMMTR adopted by Reference-BLAS - fixed a build failure when building without LAPACK_DEPRECATED functions - the minimum required CMake version for CMake-based builds was raised to 3.16.0 in order to remove many compatibility and deprecation warnings - added more detailed CMake rules for OpenMP builds (mainly to support recent LLVM) - fixed the 
behavior of the recently added CBLAS_?GEMMT functions with row-major data - improved thread scaling of multithreaded SBGEMV - improved thread scaling of multithreaded TRTRI - fixed compilation of the CBLAS testsuite with gcc14 (and no Fortran compiler) - added support for option handling changes in flang-new from LLVM18 onwards - added support for recent calling conventions changes in Cray and NVIDIA compilers - added support for compilation with the NAG Fortran compiler - fixed placement of the -fopenmp flag and libsuffix in the generated pkgconfig file - improved the CMakeConfig file generated by the Makefile build - fixed const-correctness of cblas_?geadd in cblas.h - fixed a potential inaccuracy in multithreaded BLAS3 calls - fixed empty implementations of get/set_affinity that print a warning in OpenMP builds - fixed function signatures for TRTRS in the converted C version of LAPACK - fixed omission of several single-precision LAPACK symbols in the shared library - improved build instructions for the provided "pybench" benchmarks - improved documentation, including added build instructions for WoA and HarmonyOS as well as descriptions of environment variables that affect build and runtime behavior - added a separate "make install_tests" target for use with cross-compilations - integrated improvements and corrections from Reference-LAPACK: - removed a comparison in LAPACKE ?tpmqrt that is always false (LAPACK PR 1062) - fixed the leading dimension for B in tests for GGEV (LAPACK PR 1064) - replaced the ?LARFT functions with a recursive implementation (LAPACK PR 1080) arm: - fixed build with recent versions of the NDK (missing .type declaration of symbols) arm64: - fixed a long-standing bug in the (generic) c/zgemm_beta kernel that could lead to reads and writes outside the array bounds in some circumstances - rewrote cpu autodetection to scan all cores and return the highest performing type - improved the DGEMM performance for SVE targets and small matrix sizes 
- improved dimension criteria for forwarding from GEMM to GEMV kernels - added SVE kernels for ROT and SWAP - improved SVE kernels for SGEMV and DGEMV on A64FX and NEOVERSEV1 - added support for using the "small matrix" kernels with CMake as well - fixed compilation on Windows on Arm - improved compile-time detection of SVE capability - added cpu autodetection and initial support for Apple M4 - added support for compilation on systems running IOS - added support for compilation on NetBSD ("evbarm" architecture) - fixed NRM2 implementations for generic SVE targets and the Neoverse N2 - fixed compilation for SVE-capable targets with the NVIDIA compiler x86_64: - fixed a wrong storage size in the SBGEMV kernel for Cooper Lake - added cpu autodetection for Intel Granite Rapids - added cpu autodetection for AMD Ryzen 5 series - added optimized SOMATCOPY_CT for AVX-capable targets - fixed the fallback implementation of GEMM3M in GENERIC builds - tentatively re-enabled builds with the EXPRECISION option - worked around a miscompilation of tests with mingw32-gfortran14 - added support for compilation with the Intel oneAPI 2025.0 compiler on Windows power: - fixed multithreaded SBGEMM - fixed a CMake build problem on POWER10 - improved the performance of SGEMV - added vectorized implementations of SBGEMV and support for forwarding 1xN SBGEMM to them - fixed illegal instructions and potential memory overflow in SGEMM on PPCG4 - fixed handling of NaN and Inf arguments in SSCAL and DSCAL on PPC440,G4 and 970 - added improved CGEMM and ZGEMM kernels for POWER10 - added Makefile logic to remove all optimization flags in DEBUG builds mips64: - fixed compilation with gcc14 - fixed GEMM parameter selection for the MIPS64_GENERIC target - fixed a potential build failure when compiling with OpenMP loongarch64: - fixed compilation for Loongson3 with recent versions of gmake - fixed a potential loss of precision in Loongson3A GEMM - fixed a potential build failure when compiling with 
OpenMP - added optimized SOMATCOPY for LASX-capable targets - introduced a new cpu naming scheme while retaining compatibility - added support for cross-compiling Loongarch64 targets with CMake - added support for compilation with LLVM riscv64: - removed thread yielding overhead caused by sched_yield - replaced some non-standard intrinsics with their official names - fixed and sped up the implementations of CGEMM/ZGEMM TCOPY for vector lengths 128 and 256 - improved the performance of SNRM2/DNRM2 for RVV1.0 targets - added optimized ?OMATCOPY_CN kernels for RVV1.0 targets ==================================================================== Version 0.3.28 8-Aug-2024 general: - Reworked the unfinished implementation of HUGETLB from GotoBLAS for allocating huge memory pages as buffers on suitable systems - Changed the unfinished implementation of GEMM3M for the generic target on all architectures to at least forward to regular GEMM - Improved multithreaded GEMM performance for large non-skinny matrices - Improved BLAS3 performance on larger multicore systems through improved parallelism - Improved performance of the initial memory allocation by reducing locking overhead - Improved performance of GBMV at small problem sizes by introducing a size barrier for the switch to multithreading - Added an implementation of the CBLAS_GEMM_BATCH extension - Fixed miscompilation of CAXPYC and ZAXPYC on all architectures in CMAKE builds (error introduced in 0.3.27) - Fixed corner cases involving the handling of NAN and INFINITY arguments in ?SCAL on all architectures - Added support for cross-compiling to WEBM with CMAKE (in addition to the already present makefile support) - Fixed NAN handling and potential accuracy issues in compilations with Intel ICX by supplying a suitable fp-model option by default - The contents of the github project wiki have been converted into a new set of documentation included with the source code. 
- It is now possible to register a callback function that replaces the built-in support for multithreading with an external backend like TBB (openblas_set_threads_callback_function) - Fixed potential duplication of suffixes in shared library naming - Improved C compiler detection by the build system to tolerate more naming variants for gcc builds - Fixed an unnecessary dependency of the utest on CBLAS - Fixed spurious error reports from the BLAS extensions utest - Fixed unwanted invocation of the GEMM3M tests in cross-compilation - Fixed a flaw in the makefile build that could lead to the pkgconfig file containing an entry of UNKNOWN for the target cpu after installing - Integrated fixes from the Reference-LAPACK project: - Fixed uninitialized variables in the LAPACK tests for ?QP3RK (PR 961) - Fixed potential bounds error in ?UNHR_COL/?ORHR_COL (PR 1018) - Fixed potential infinite loop in the LAPACK testsuite (PR 1024) - Make the variable type used for hidden length arguments configurable (PR 1025) - Fixed SYTRD workspace computation and various typos (PR 1030) - Prevent compiler use of FMA that could increase numerical error in ?GEEVX (PR 1033) x86-64: - reverted thread management under Windows to its state before 0.3.26 due to signs of race conditions in some circumstances now under study - fixed accidental selection of the unoptimized generic SBGEMM kernel in CMAKE builds for CooperLake and SapphireRapids targets - fixed a potential thread buffer overrun in SBSTOBF16 on small systems - fixed an accuracy issue in ZSCAL introduced in 0.3.26 - fixed compilation with CMAKE and recent releases of LLVM - added support for Intel Emerald Rapids and Meteor Lake cpus - added autodetection support for the Zhaoxin KX-7000 cpu - fixed autodetection of Intel Prescott (probably broken since 0.3.19) - fixed compilation for older targets with the Yocto SDK - fixed compilation of the converter-generated C versions of the LAPACK sources with gcc-14 - improved compiler options 
when building with CMAKE and LLVM for AVX512-capable targets - added support for supplying the L2 cache size via an environment variable (OPENBLAS_L2_SIZE) in case it is not correctly reported (as in some VM configurations) - improved the error message shown when thread creation fails on startup - fixed setting the rpath entry of the dylib in CMAKE builds on MacOS arm: - fixed building for baremetal targets with make arm64: - Added a fast path forwarding SGEMM and DGEMM calls with a 1xN or Mx1 matrix to the corresponding GEMV kernel - added optimized SGEMV and DGEMV kernels for A64FX - added optimized SVE kernels for small-matrix GEMM - added A64FX to the cpu list for DYNAMIC_ARCH - fixed building with support for cpu affinity - worked around accuracy problems with C/ZNRM2 on NeoverseN1 and Apple M targets - improved GEMM performance on Neoverse V1 - fixed compilation for NEOVERSEN2 with older compilers - fixed potential miscompilation of the SVE SDOT and DDOT kernels - fixed potential miscompilation of the non-SVE CDOT and ZDOT kernels - fixed a potential overflow when using very large user-defined BUFFERSIZE - fixed setting the rpath entry of the dylib in CMAKE builds on MacOS power: - Added a fast path forwarding SGEMM and DGEMM calls with a 1xN or Mx1 matrix to the corresponding GEMV kernel - significantly improved performance of SBGEMM on POWER10 - fixed compilation with OpenMP and the XLF compiler - fixed building of the BLAS extension utests under AIX - fixed building of parts of the LAPACK testsuite with XLF - fixed CSWAP/ZSWAP on big-endian POWER10 targets - fixed a performance regression in SAXPY on POWER10 with OpenXL - fixed accuracy issues in CSCAL/ZSCAL when compiled with LLVM - fixed building for POWER9 under FreeBSD - fixed a potential overflow when using very large user-defined BUFFERSIZE - fixed an accuracy issue in the POWER6 kernels for GEMM and GEMV riscv64: - Added a fast path forwarding SGEMM and DGEMM calls with a 1xN or Mx1 matrix to the 
corresponding GEMV kernel - fixed building for RISCV64_GENERIC with OpenMP enabled - added DYNAMIC_ARCH support (comprising GENERIC_RISCV64 and the two RVV 1.0 targets with vector length of 128 and 256) - worked around the ZVL128B kernels for AXPBY mishandling the special case of zero Y increment loongarch64: - improved GEMM performance on servers of the 3C5000 generation - improved performance and stability of DGEMM - improved GEMV and TRSM kernels for LSX and LASX vector ABIs - fixed CMAKE compilation with the INTERFACE64 option set - fixed compilation with CMAKE - worked around spurious errors flagged by the BLAS3 tests - worked around a miscompilation of the POTRS utest by gcc 14.1 mips64: - fixed ASUM and SUM kernels to accept negative step sizes in X - fixed complex GEMV kernels for MSA ==================================================================== Version 0.3.27 4-Apr-2024 general: - added initial (generic) support for the CSKY architecture - capped the maximum number of threads used in GEMM, GETRF and POTRF to avoid creating underutilized or idle threads - sped up multithreaded POTRF on all platforms - added extension openblas_set_num_threads_local() that returns the previous thread count - re-evaluated the SGEMV and DGEMV load thresholds to avoid activating multithreading for too small workloads - improved the fallback code used when the precompiled number of threads is exceeded, and made it callable multiple times during the lifetime of an instance - added CBLAS interfaces for the BLAS extensions ?AMIN,?AMAX, CAXPYC and ZAXPYC - fixed a potential buffer overflow in the interface to the GEMMT kernels - fixed use of incompatible pointer types in GEMMT and C/ZAXPBY as flagged by GCC-14 - fixed unwanted case sensitivity of the character parameters in ?TRTRS - sped up the OpenMP thread management code - fixed sizing of logical variables in INTERFACE64 builds of the C version of LAPACK - fixed inclusion of new LAPACK and LAPACKE functions from LAPACK 3.11 
in the shared library - added a testsuite for the BLAS extensions - modified the error thresholds for SGS/DGS functions in the LAPACK testsuite to suppress spurious errors - added support for building the benchmark collection with CMAKE - added rewriting of linker options to avoid linking both libgomp and libomp in CMAKE builds with OpenMP enabled that use clang with gfortran - fixed building on systems with ucLibc - added support for calling ?NRM2 with a negative increment value on all architectures - added support for the LLVM18 version of the flang-new compiler - fixed handling of the OPENBLAS_LOOPS variable in several benchmarks - Integrated fixes from the Reference-LAPACK project: - Increased accuracy in C/ZLARFGP (Reference-LAPACK PR 981) x86: - fixed handling of NaN and Inf arguments in ZSCAL - fixed GEMM3M functions failing in CMAKE builds x86-64: - removed all instances of sched_yield() on Linux and BSD - fixed a potential deadlock in the thread server on MSWindows (introduced in 0.3.26) - fixed GEMM3M functions failing in CMAKE builds - fixed handling of NaN and Inf arguments in ZSCAL - added compiler checks for AVX512BF16 compatibility - fixed LLVM compiler options for Sapphire Rapids - fixed cpu handling fallbacks for Sapphire Rapids with disabled AVX2 in DYNAMIC_ARCH mode - fixed extensions SCSUM and DZSUM - improved GEMM performance for ZEN targets arm: - fixed handling of NaN and Inf arguments in ZSCAL arm64: - added initial support for the Cortex-A76 cpu - fixed handling of NaN and Inf arguments in ZSCAL - fixed default compiler options for gcc (-march and -mtune) - added support for ArmCompilerForLinux - added support for the NeoverseV2 cpu in DYNAMIC_ARCH builds - fixed mishandling of the INTERFACE64 option in CMAKE builds - corrected SCSUM kernels (erroneously duplicating SCASUM behaviour) - added SVE-enabled kernels for CSUM/ZSUM - worked around an inaccuracy in the NRM2 kernels for NeoverseN1 and Apple M power: - improved performance of SGEMM 
on POWER8/9/10 - improved performance of DGEMM on POWER10 - added support for OpenMP builds with xlc/xlf on AIX - improved cpu autodetection for DYNAMIC_ARCH builds on older AIX - fixed cpu core counting on AIX - added support for building a shared library on AIX riscv64: - added support for the X280 cpu - added support for semi-generic RISCV models with vector length 128 or 256 - added support for compiling with either RVV 0.7.1 or RVV 1.0 standard compilers - fixed handling of NaN and Inf arguments in ZSCAL - improved cpu model autodetection - fixed corner cases in ?AXPBY for C910V - fixed handling of zero increments in ?AXPY kernels for C910V loongarch64: - added optimized kernels for ?AMIN and ?AMAX - fixed handling of NaN and Inf arguments in ZSCAL - fixed handling of corner cases in ?AXPBY - fixed computation of SAMIN and DAMIN in LSX mode - fixed computation of ?ROT - added optimized SSYMV and DSYMV kernels for LSX and LASX mode - added optimized CGEMM and ZGEMM kernels for LSX and LASX mode - added optimized CGEMV and ZGEMV kernels mips: - fixed utilizing MSA on P5600 and related cpus (broken in 0.3.22) - fixed handling of NaN and Inf arguments in ZSCAL - fixed mishandling of the INTERFACE64 option in CMAKE builds zarch: - fixed handling of NaN and Inf arguments in ZSCAL - fixed calculation of ?SUM on Z13 @ text @$NetBSD$ Reform library linking commands to work for any library naming setup, ensuring that we got the generic libopenblas.so.0 and libopenblas.so names, either as symlink or as library file. Remove duplicate definition of SUFFIX64. Simplify and fix the pkg-config file for FIXED_LIBNAME. --- Makefile.install.orig 2026-01-15 22:57:26.000000000 +0000 +++ Makefile.install @@@@ -2,15 +2,11 @@@@ TOPDIR = . 
export GOTOBLAS_MAKEFILE = 1 -include $(TOPDIR)/Makefile.conf_last include ./Makefile.system -LNCMD = ln -fs ifdef THELIBNAME LIBNAME=$(THELIBNAME) LIBSONAME=$(THELIBSONAME) endif -ifeq ($(FIXED_LIBNAME), 1) -LNCMD = true -endif ifeq ($(INTERFACE64),1) USE_64BITINT=1 endif @@@@ -32,7 +28,7 @@@@ PKG_EXTRALIB := $(EXTRALIB) ifeq ($(INTERFACE64),1) SUFFIX64=64 endif -PKGFILE="$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/$(LIBSONAMEBASE)$(SUFFIX64).pc" +PKGFILE="$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/$(LIBNAMEBASE).pc" ifeq ($(USE_OPENMP), 1) ifeq ($(C_COMPILER), PGI) @@@@ -124,29 +120,23 @@@@ ifneq ($(NO_STATIC),1) @@echo Copying the static library to $(DESTDIR)$(OPENBLAS_LIBRARY_DIR) @@install -m644 $(LIBNAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" @@cd "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" ; \ - $(LNCMD) $(LIBNAME) $(LIBPREFIX).$(LIBSUFFIX) + $(call LNCMD,$(LIBNAME),$(LIBPREFIX).$(LIBSUFFIX)) endif #for install shared library ifneq ($(NO_SHARED),1) @@echo Copying the shared library to $(DESTDIR)$(OPENBLAS_LIBRARY_DIR) -ifeq ($(OSNAME), $(filter $(OSNAME),Linux SunOS Android Haiku FreeBSD DragonFly)) +ifeq ($(OSNAME), $(filter $(OSNAME),Linux SunOS Android Haiku FreeBSD DragonFly OpenBSD NetBSD)) @@install -m755 $(LIBSONAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" @@cd "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" ; \ - $(LNCMD) $(LIBSONAME) $(LIBPREFIX).so ; \ - $(LNCMD) $(LIBSONAME) $(LIBPREFIX).so.$(MAJOR_VERSION) -endif - -ifeq ($(OSNAME), $(filter $(OSNAME),OpenBSD NetBSD)) - @@cp $(LIBSONAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" - @@cd "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" ; \ - $(LNCMD) $(LIBSONAME) $(LIBPREFIX).so + $(call LNCMD,$(LIBSONAME),$(LIBPREFIX).so) ; \ + $(call LNCMD,$(LIBSONAME),$(LIBPREFIX).so.$(MAJOR_VERSION)) endif ifeq ($(OSNAME), Darwin) @@-cp $(LIBDYNNAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" @@-install_name_tool -id "$(OPENBLAS_LIBRARY_DIR)/$(LIBPREFIX).$(MAJOR_VERSION).dylib" "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)/$(LIBDYNNAME)" @@cd "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" ; \ - 
$(LNCMD) $(LIBDYNNAME) $(LIBPREFIX).dylib ; \ - $(LNCMD) $(LIBDYNNAME) $(LIBPREFIX).$(MAJOR_VERSION).dylib + $(call LNCMD,$(LIBDYNNAME),$(LIBPREFIX).dylib) ; \ + $(call LNCMD,$(LIBDYNNAME),$(LIBPREFIX).$(MAJOR_VERSION).dylib) endif ifeq ($(OSNAME), WINNT) @@-cp $(LIBDLLNAME) "$(DESTDIR)$(OPENBLAS_BINARY_DIR)" @@@@ -174,30 +164,23 @@@@ ifneq ($(NO_STATIC),1) @@echo Copying the static library to $(DESTDIR)$(OPENBLAS_LIBRARY_DIR) @@installbsd -c -m 644 $(LIBNAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" @@cd "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" ; \ - $(LNCMD) $(LIBNAME) $(LIBPREFIX).$(LIBSUFFIX) + $(call LNCMD,$(LIBNAME),$(LIBPREFIX).$(LIBSUFFIX)) endif #for install shared library ifneq ($(NO_SHARED),1) @@echo Copying the shared library to $(DESTDIR)$(OPENBLAS_LIBRARY_DIR) @@installbsd -c -m 755 $(LIBSONAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" @@cd "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" ; \ - $(LNCMD) $(LIBSONAME) $(LIBPREFIX).so ; \ - $(LNCMD) $(LIBSONAME) $(LIBPREFIX).so.$(MAJOR_VERSION) + $(call LNCMD,$(LIBSONAME),$(LIBPREFIX).so) ; \ + $(call LNCMD,$(LIBSONAME),$(LIBPREFIX).so.$(MAJOR_VERSION)) endif endif #Generating openblas.pc -ifeq ($(INTERFACE64),1) - SUFFIX64=64 -endif - PKGFILE="$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/$(LIBSONAMEBASE)$(SUFFIX64).pc" - - @@echo Generating $(LIBSONAMEBASE)$(SUFFIX64).pc in "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)" + @@echo Generating "$$(basename "$(PKGFILE)")" in "$$(dirname "$(PKGFILE)")" @@echo 'libdir='$(OPENBLAS_LIBRARY_DIR) > "$(PKGFILE)" - @@echo 'libprefix='$(LIBNAMEPREFIX) >> "$(PKGFILE)" - @@echo 'libnamesuffix='$(LIBNAMESUFFIX) >> "$(PKGFILE)" - @@echo 'libsuffix='$(SYMBOLSUFFIX) >> "$(PKGFILE)" + @@echo 'libname='$(LIBLINKPREFIX) >> "$(PKGFILE)" @@echo 'includedir='$(OPENBLAS_INCLUDE_DIR) >> "$(PKGFILE)" @@echo 'omp_opt='$(FOMP_OPT) >> "$(PKGFILE)" @@echo 'openblas_config= USE_64BITINT='$(INTERFACE64) 'DYNAMIC_ARCH='$(DYNAMIC_ARCH) 'DYNAMIC_OLDER='$(DYNAMIC_OLDER) 'NO_CBLAS='$(NO_CBLAS) 'NO_LAPACK='$(NO_LAPACK) 
'NO_LAPACKE='$(NO_LAPACKE) 'NO_AFFINITY='$(NO_AFFINITY) 'USE_OPENMP='$(USE_OPENMP) $(TARGET) 'MAX_THREADS='$(NUM_THREADS)>> "$(PKGFILE)" @ 1.7 log @math/openblas*: more portable sed for .pc modification The old path added \b, which is not POSIX BRE. [:space:] works better with differing seds. It removes more than \b, but in our installs, the following suffix variable is emtpy, anyway. @ text @d1 1 a1 1 $NetBSD: patch-Makefile.install,v 1.6 2023/10/08 15:41:33 thor Exp $ d3 3 a5 2 Second part of removing the special library names. Separate options for "install" (needed at least on Darwin). d7 5 a11 1 --- Makefile.install.orig 2023-09-03 20:58:32.000000000 +0000 d13 17 a29 1 @@@@ -17,7 +17,7 @@@@ PKG_EXTRALIB := $(EXTRALIB) d38 1 a38 2 @@@@ -90,29 +90,37 @@@@ endif ifneq ($(NO_STATIC),1) a40 1 +ifneq ($(LIBNAME), $(LIBPREFIX).$(LIBSUFFIX)) d42 2 a43 1 ln -fs $(LIBNAME) $(LIBPREFIX).$(LIBSUFFIX) a44 1 +endif d48 2 a49 1 ifeq ($(OSNAME), $(filter $(OSNAME),Linux SunOS Android Haiku FreeBSD DragonFly)) d52 10 a61 13 - ln -fs $(LIBSONAME) $(LIBPREFIX).so ; \ - ln -fs $(LIBSONAME) $(LIBPREFIX).so.$(MAJOR_VERSION) + if ! test $(LIBSONAME) = $(LIBPREFIX).so; then \ + ln -fs $(LIBSONAME) $(LIBPREFIX).so ; fi ; \ + if ! test $(LIBSONAME) = $(LIBPREFIX).so.$(MAJOR_VERSION); then \ + ln -fs $(LIBSONAME) $(LIBPREFIX).so.$(MAJOR_VERSION); fi endif ifeq ($(OSNAME), $(filter $(OSNAME),OpenBSD NetBSD)) @@cp $(LIBSONAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" +ifneq ($(LIBSONAME), $(LIBPREFIX).so) @@cd "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" ; \ ln -fs $(LIBSONAME) $(LIBPREFIX).so a62 1 +endif d67 4 a70 4 + if ! 
test $(LIBDYNNAME) = $(LIBPREFIX).dylib; then \ ln -fs $(LIBDYNNAME) $(LIBPREFIX).dylib ; \ + fi ; \ ln -fs $(LIBDYNNAME) $(LIBPREFIX).$(MAJOR_VERSION).dylib d73 2 a74 2 @@@@ -140,16 +148,20 @@@@ endif ifneq ($(NO_STATIC),1) a76 1 +ifneq ($(LIBNAME), $(LIBPREFIX).$(LIBSUFFIX)) d78 2 a79 1 ln -fs $(LIBNAME) $(LIBPREFIX).$(LIBSUFFIX) a80 1 +endif d86 4 a89 6 - ln -fs $(LIBSONAME) $(LIBPREFIX).so ; \ - ln -fs $(LIBSONAME) $(LIBPREFIX).so.$(MAJOR_VERSION) + if ! test $(LIBSONAME) = $(LIBPREFIX).so; then \ + ln -fs $(LIBSONAME) $(LIBPREFIX).so ; fi ; \ + if ! test $(LIBSONAME) = $(LIBPREFIX).so.$(MAJOR_VERSION); then \ + ln -fs $(LIBSONAME) $(LIBPREFIX).so.$(MAJOR_VERSION); fi d93 5 a97 4 @@@@ -158,7 +170,7 @@@@ endif ifeq ($(INTERFACE64),1) SUFFIX64=64 endif d99 3 a101 3 + PKGFILE="$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/$(LIBNAMEBASE).pc" @@echo Generating $(LIBSONAMEBASE)$(SUFFIX64).pc in "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)" d103 7 a109 9 @@@@ -167,7 +179,7 @@@@ endif @@echo 'openblas_config= USE_64BITINT='$(INTERFACE64) 'DYNAMIC_ARCH='$(DYNAMIC_ARCH) 'DYNAMIC_OLDER='$(DYNAMIC_OLDER) 'NO_CBLAS='$(NO_CBLAS) 'NO_LAPACK='$(NO_LAPACK) 'NO_LAPACKE='$(NO_LAPACKE) 'NO_AFFINITY='$(NO_AFFINITY) 'USE_OPENMP='$(USE_OPENMP) $(CORE) 'MAX_THREADS='$(NUM_THREADS)>> "$(PKGFILE)" @@echo 'version='$(VERSION) >> "$(PKGFILE)" @@echo 'extralib='$(PKG_EXTRALIB) >> "$(PKGFILE)" - @@cat openblas.pc.in >> "$(PKGFILE)" + @@cat openblas.pc.in | sed -e 's,-lopenblas[^[:space:]]*,-l$(LIBNAMEBASE),' >> "$(PKGFILE)" #Generating OpenBLASConfig.cmake @ 1.6 log @math/openblas: Fix pkg-config file for current version. The last update broke the library name in the installed pkg-config file for the openblas variants. Fixing this. Our type of library naming should be pushed upstream, or adapted to some other scheme. 
@ text @d1 1 a1 1 $NetBSD: patch-Makefile.install,v 1.5 2023/09/17 08:56:19 adam Exp $ d94 1 a94 1 + @@cat openblas.pc.in | sed -e 's,-lopenblas\b,-l$(LIBNAMEBASE),' >> "$(PKGFILE)" @ 1.5 log @openblas*: updated to 0.3.24 OpenBLAS 0.3.24 general: declared the arguments of cblas_xerbla as const (in accordance with the reference implementation and others, the previous discrepancy appears to have dated back to GotoBLAS) fixed the implementation of ?GEMMT that was added in 0.3.23 made cpu-specific SWITCH_RATIO parameters for GEMM available to DYNAMIC_ARCH builds fixed application of SYMBOLSUFFIX in CMAKE builds fixed missing SSYCONVF function in the shared library fixed parallel build logic used with gmake added support for compilation with LLVM17, in particular its new Fortran compiler added support for CMAKE builds using the NVIDIA HPC compiler fixed INTERFACE64 builds with CMAKE and the f95 Fortran compiler fixed cross-build detection and management in c_check disabled building of the tests with CMAKE when ONLY_CBLAS is defined fixed several issues with the handling of runtime limits on the number of OPENMP threads corrected the error code returned by SGEADD/DGEADD when LDA is too small corrected the error code returned by IMATCOPY when LDB is too small updated ?NRM2 to support negative increment values (as introduced in release 3.10.0 of the Reference BLAS) updated ?ROTG to use the safe scaling algorithm introduced in release 3.10.0 of the Reference BLAS fixed OpenMP builds with CLANG for the case where libomp is not in a standard location fixed a potential overwrite of unrelated memory during thread initialisation on startup fixed a potential integer overflow in the multithreading threshold for ?SYMM/?SYRK fixed build of the LAPACKE interfaces for the LAPACK 3.11.0 ?TRSYL functions added in 0.3.22 fixed installation of .cmake files in concurrent 32 and 64bit builds with CMAKE applied additions and corrections from the development branch of Reference-LAPACK: fixed 
actual arguments passed to a number of LAPACK functions (from Reference-LAPACK PR 885) fixed workspace query results in LAPACK ?SYTRF/?TRECV3 (from Reference-LAPACK PR 883) fixed derivation of the UPLO parameter in LAPACKE_?larfb (from Reference-LAPACK PR 878) fixed a crash in LAPACK ?GELSDD on NRHS=0 (from Reference-LAPACK PR 876) added new LAPACK utility functions CRSCL and ZRSCL (from Reference-LAPACK PR 839) corrected the order of eigenvalues for 2x2 matrices in ?STEMR (Reference-LAPACK PR 867) removed spurious reference to OpenMP variables outside OpenMP contexts (Reference-LAPACK PR 860) updated file comments on use of LAMBDA variable in LAPACK (Reference-LAPACK PR 852) fixed documentation of LAPACK SLASD0/DLASD0 (Reference-LAPACK PR 855) fixed confusing use of "minor" in LAPACK documentation (Reference-LAPACK PR 849) added new LAPACK functions ?GEDMD for dynamic mode decomposition (Reference-LAPACK PR 736) fixed potential stack overflows in the EIG part of the LAPACK testsuite (Reference-LAPACK PR 854) applied small improvements to the variants of Cholesky and QR functions (Reference-LAPACK PR 847) removed unused variables from LAPACK ?BDSQR (Reference-LAPACK PR 832) fixed a potential crash on allocation failure in LAPACKE SGEESX/DGEESX (Reference-LAPACK PR 836) added a quick return from SLARUV/DLARUV for N < 1 (Reference-LAPACK PR 837) updated function descriptions in LAPACK ?GEGS/?GEGV (Reference-LAPACK PR 831) improved algorithm description in ?GELSY (Reference-LAPACK PR 833) fixed scaling in LAPACK STGSNA/DTGSNA (Reference-LAPACK PR 830) fixed crash in LAPACKE_?geqrt with row-major data (Reference-LAPACK PR 768) added LAPACKE interfaces for C/ZUNHR_COL and S/DORHR_COL (Reference-LAPACK PR 827) added error exit tests for SYSV/SYTD2/GEHD2 to the testsuite (Reference-LAPACK PR 795) fixed typos in LAPACK source and comments (Reference-LAPACK PRs 809,811,812,814,820) adopt refactored ?GEBAL implementation (Reference-LAPACK PR 808) x86_64: added cpu model 
autodetection for Intel Alder Lake N added activation of the AMX tile to the Sapphire Rapids SBGEMM kernel worked around miscompilations of GEMV/SYMV kernels by gcc's tree-vectorizer fixed compilation of Cooperlake and Sapphire Rapids kernels with CLANG fixed runtime detection of Cooperlake and Sapphire Rapids in DYNAMIC_ARCH fixed feature-based cputype fallback in DYNAMIC_ARCH added support for building the AVX512 kernels with the NVIDIA HPC compiler corrected ZAXPY result on old pre-AVX hardware for the INCX=0 case fixed a potential use of uninitialized variables in ZTRSM ARMV8: added cpu model autodetection for Apple M2 fixed wrong results of CGEMM/CTRMM/DNRM2 under OSX (use of reserved register) added support for building the SVE kernels with the NVIDIA HPC compiler added support for building the SVE kernels with the Apple Clang compiler fixed compiler option handling for building the SVE kernels with LLVM implemented SWITCH_RATIO parameter for improved GEMM performance on Neoverse activated SVE SGEMM and DGEMM kernels for Neoverse V1 improved performance of the SVE CGEMM and ZGEMM kernels on Neoverse V1 improved kernel selection for the ARMV8SVE target and added it to DYNAMIC_ARCH fixed runtime check for SVE availability in DYNAMIC_ARCH builds to take OS or container restrictions into account fixed a potential use of uninitialized variables in ZTRSM fix a potential misdetection of ARMV8 hardware as 32bit in CMAKE builds LOONGARCH64: added ABI detection added support for cpu affinity handling fixed compilation with early versions of the Loongson toolchain added an optimized SGEMM kernel for 3A5000 added optimized DGEMV kernels for 3A5000 improved the performance of the DGEMM kernel for 3A5000 MIPS64: fixed miscompilation of TRMM kernels for the MIPS64_GENERIC target POWER: fixed compiler warnings in the POWER10 SBGEMM kernel RISCV: fixed application of the INTERFACE64 option when building with CMAKE fix a potential misdetection of RISCV hardware as 32bit in 
CMAKE builds fixed IDAMAX and DOT kernels for C910V fixed corner cases in the ROT and SWAP kernels for C910V fixed compilation of the C910V target with recent vendor compilers @ text @d1 1 a1 1 $NetBSD: patch-Makefile.install,v 1.4 2021/06/07 15:52:04 adam Exp $ d94 1 a94 1 + @@cat openblas.pc.in | sed -e 's,-lopenblas$$,-l$(LIBNAMEBASE),' >> "$(PKGFILE)" @ 1.4 log @openblas: fix building on Darwin @ text @d1 1 a1 1 $NetBSD: patch-Makefile.install,v 1.3 2021/05/29 19:57:21 thor Exp $ d6 1 a6 1 --- Makefile.install.orig 2021-05-02 21:50:22.000000000 +0000 d8 10 a17 17 @@@@ -74,40 +74,46 @@@@ endif ifneq ($(OSNAME), AIX) ifndef NO_LAPACKE @@echo Copying LAPACKE header files to $(DESTDIR)$(OPENBLAS_INCLUDE_DIR) - @@-install -pm644 $(NETLIB_LAPACK_DIR)/LAPACKE/include/lapack.h "$(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapack.h" - @@-install -pm644 $(NETLIB_LAPACK_DIR)/LAPACKE/include/lapacke.h "$(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapacke.h" - @@-install -pm644 $(NETLIB_LAPACK_DIR)/LAPACKE/include/lapacke_config.h "$(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapacke_config.h" - @@-install -pm644 $(NETLIB_LAPACK_DIR)/LAPACKE/include/lapacke_mangling_with_flags.h.in "$(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapacke_mangling.h" - @@-install -pm644 $(NETLIB_LAPACK_DIR)/LAPACKE/include/lapacke_utils.h "$(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapacke_utils.h" + @@-install -p -m 644 $(NETLIB_LAPACK_DIR)/LAPACKE/include/lapack.h "$(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapack.h" + @@-install -p -m 644 $(NETLIB_LAPACK_DIR)/LAPACKE/include/lapacke.h "$(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapacke.h" + @@-install -p -m 644 $(NETLIB_LAPACK_DIR)/LAPACKE/include/lapacke_config.h "$(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapacke_config.h" + @@-install -p -m 644 $(NETLIB_LAPACK_DIR)/LAPACKE/include/lapacke_mangling_with_flags.h.in "$(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapacke_mangling.h" + @@-install -p -m 644 $(NETLIB_LAPACK_DIR)/LAPACKE/include/lapacke_utils.h "$(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapacke_utils.h" endif #for install 
static library d20 1 a20 2 - @@install -pm644 $(LIBNAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" + @@install -p -m 644 $(LIBNAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" d30 1 a30 1 @@install -pm755 $(LIBSONAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" d51 3 a53 2 - ln -fs $(LIBDYNNAME) $(LIBPREFIX).dylib ; \ + if ! test $(LIBDYNNAME) = $(LIBPREFIX).dylib; then ln -fs $(LIBDYNNAME) $(LIBPREFIX).dylib; fi ; \ d57 1 a57 1 @@@@ -135,28 +141,32 @@@@ endif d76 1 a76 1 + ln -fs $(LIBSONAME) $(LIBPREFIX).so.$(MAJOR_VERSION) ; fi d80 15 a94 15 #Generating openblas.pc @@echo Generating $(LIBSONAMEBASE).pc in "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)" - @@echo 'libdir='$(OPENBLAS_LIBRARY_DIR) > "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/$(LIBSONAMEBASE).pc" - @@echo 'includedir='$(OPENBLAS_INCLUDE_DIR) >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/$(LIBSONAMEBASE).pc" - @@echo 'openblas_config= USE_64BITINT='$(USE_64BITINT) 'DYNAMIC_ARCH='$(DYNAMIC_ARCH) 'DYNAMIC_OLDER='$(DYNAMIC_OLDER) 'NO_CBLAS='$(NO_CBLAS) 'NO_LAPACK='$(NO_LAPACK) 'NO_LAPACKE='$(NO_LAPACKE) 'NO_AFFINITY='$(NO_AFFINITY) 'USE_OPENMP='$(USE_OPENMP) $(CORE) 'MAX_THREADS='$(NUM_THREADS)>> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/$(LIBSONAMEBASE).pc" - @@echo 'version='$(VERSION) >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/$(LIBSONAMEBASE).pc" - @@echo 'extralib='$(PKG_EXTRALIB) >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/$(LIBSONAMEBASE).pc" - @@cat openblas.pc.in >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/$(LIBSONAMEBASE).pc" + @@echo 'libdir='$(OPENBLAS_LIBRARY_DIR) > "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/$(LIBNAMEBASE).pc" + @@echo 'includedir='$(OPENBLAS_INCLUDE_DIR) >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/$(LIBNAMEBASE).pc" + @@echo 'openblas_config= USE_64BITINT='$(USE_64BITINT) 'DYNAMIC_ARCH='$(DYNAMIC_ARCH) 'DYNAMIC_OLDER='$(DYNAMIC_OLDER) 'NO_CBLAS='$(NO_CBLAS) 'NO_LAPACK='$(NO_LAPACK) 'NO_LAPACKE='$(NO_LAPACKE) 'NO_AFFINITY='$(NO_AFFINITY) 'USE_OPENMP='$(USE_OPENMP) $(CORE) 'MAX_THREADS='$(NUM_THREADS)>> 
"$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/$(LIBNAMEBASE).pc" + @@echo 'version='$(VERSION) >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/$(LIBNAMEBASE).pc" + @@echo 'extralib='$(PKG_EXTRALIB) >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/$(LIBNAMEBASE).pc" + @@cat openblas.pc.in | sed -e 's,-lopenblas$$,-l$(LIBNAMEBASE),' >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/$(LIBNAMEBASE).pc" @ 1.3 log @math/openblas: update to version 0.3.15 This includes a rework of our patchery with the hope of upstreaming a good deal of it. These are the upstream changes since 0.3.10: Version 0.3.15 2-May-2021 common: - imported improvements and bugfixes from Reference-LAPACK 3.9.1 - imported LAPACKE interface fixes from Reference-LAPACK PRs 534 + 537 - fixed a problem in the cpu detection of 0.3.14 that prevented cross-compilation - fixed a sequence problem in the generation of softlinks to the library in GMAKE RISC V: - fixed compilation on RISCV (missing entry in getarch) - fixed a potential division by zero in CROTG and ZROTG POWER: - fixed LAPACK testsuite failures seen with the NVIDIA HPC compiler - improved CGEMM, DGEMM and ZGEMM performance on POWER10 - added an optimized ZGEMV kernel for POWER10 - fixed a potential division by zero in CROTG and ZROTG x86_64: - added support for Intel Control-flow Enforcement Technology (CET) - reverted the DOMATCOPY_RT code to the generic C version - fixed a bug in the AVX512 SGEMM kernel introduced in 0.3.14 - fixed misapplication of -msse flag to non-SSE cpus in DYNAMIC_ARCH - added support for compilation of the benchmarks on older OSX versions - fix propagation of the NO_AVX512 option in CMAKE builds - fix compilation of the AVX512 SGEMM kernel with clang-cl on Windows - fixed compilation of the CTESTs with INTERFACE64=1 (random faults on OSX) - corrected the Haswell DROT kernel to require AVX2/FMA3 rather than AVX512 ARM: - fixed a potential division by zero in CROTG and ZROTG - fixed a potential overflow in IMATCOPY/ZIMATCOPY and the CTESTs ARM64: - fixed 
spurious reads outside the array in the SGEMM tcopy macro - fixed a potential division by zero in CROTG and ZROTG - fixed a segmentation fault in DYNAMIC_ARCH builds (reappeared in 0.3.14) MIPS - fixed a potential division by zero in CROTG and ZROTG - fixed a potential overflow in IMATCOPY/ZIMATCOPY and the CTESTs MIPS64: - fixed a potential division by zero in CROTG and ZROTG SPARC: - fixed a potential division by zero in CROTG and ZROTG ==================================================================== Version 0.3.14 17-Mar-2021 common: * Fixed a race condition on thread shutdown in non-OpenMP builds * Fixed custom BUFFERSIZE option getting ignored in gmake builds * Fixed CMAKE compilation of the TRMM kernels for GENERIC platforms * Added CBLAS interfaces for CROTG, ZROTG, CSROT and ZDROT * Improved performance of OMATCOPY_RT across all platforms * Changed perl scripts to use env instead of a hardcoded /usr/bin/perl * Fixed potential misreading of the GCC compiler version in the build scripts * Fixed convergence problems in LAPACK complex GGEV/GGES (Reference-LAPACK #477) * Reduced the stacksize requirements for running the LAPACK testsuite (Reference-LAPACK #335) RISCV: * Fixed compilation on RISCV (missing entry in getarch) POWER: * Fixed compilation for DYNAMIC_ARCH with clang and with old gcc versions * Added support for compilation on FreeBSD/ppc64le * Added optimized POWER10 kernels for SSCAL, DSCAL, CSCAL, ZSCAL * Added optimized POWER10 kernels for SROT, DROT, CDOT, SASUM, DASUM * Improved SSWAP, DSWAP, CSWAP, ZSWAP performance on POWER10 * Improved SCOPY and CCOPY performance on POWER10 * Improved SGEMM and DGEMM performance on POWER10 * Added support for compilation with the NVIDIA HPC compiler x86_64: * Added an optimized bfloat16 GEMM kernel for Cooperlake * Added CPUID autodetection for Intel Rocket Lake and Tiger Lake cpus * Improved the performance of SASUM,DASUM,SROT,DROT on AMD Ryzen cpus * Added support for compilation with the NAG Fortran 
compiler * Fixed recognition of the AMD AOCC compiler * Fixed compilation for DYNAMIC_ARCH with clang on Windows * Added support for running the BLAS/CBLAS tests on Windows * Fixed signatures of the tls callback functions for Windows x64 * Fixed various issues with fma intrinsics support handling ARM: * Added support for embedded Cortex M targets via a new option EMBEDDED ARMV8: * Fixed the THUNDERX2T99 and NEOVERSEN1 DNRM2/ZNRM2 kernels for inputs with Inf * Added support for the DYNAMIC_LIST option * Added support for compilation with the NVIDIA HPC compiler * Added support for compiling with the NAG Fortran compiler ==================================================================== Version 0.3.13 12-Dec-2020 common: * Added a generic bfloat16 SBGEMV kernel * Fixed a potentially severe memory leak after fork in OpenMP builds that was introduced in 0.3.12 * Added detection of the Fujitsu Fortran compiler * Added detection of the (e)gfortran compiler on OpenBSD * Added support for overriding the default name of the library independently from symbol suffixing in the gmake builds (already supported in cmake) RISCV: * Added a RISC V port optimized for C910V POWER: * Added optimized POWER10 kernels for SAXPY, CAXPY, SDOT, DDOT and DGEMV_N * Improved DGEMM performance on POWER10 * Improved STRSM and DTRSM performance on POWER9 and POWER10 * Fixed segmemtation faults in DYNAMIC_ARCH builds * Fixed compilation with the PGI compiler x86: * Fixed compilation of kernels that require SSE2 intrinsics since 0.3.12 x86_64: * Added an optimized bfloat16 SBGEMV kernel for SkylakeX and Cooperlake * Improved the performance of SASUM and DASUM kernels through parallelization * Improved the performance of SROT and DROT kernels * Improved the performance of multithreaded xSYRK * Fixed OpenMP builds that use the LLVM Clang compiler together with GNU gfortran (where linking of both the LLVM libomp and GNU libgomp could lead to lockups or wrong results) * Fixed miscompilations by old 
gcc 4.6 * Fixed misdetection of AVX2 capability in some Sandybridge cpus * Fixed lockups in builds combining DYNAMIC_ARCH with TARGET=GENERIC on OpenBSD ARM64: * Fixed segmemtation faults in DYNAMIC_ARCH builds MIPS: * Improved kernels for Loongson 3R3 ("3A") and 3R4 ("3B") models, including MSA * Fixed bugs in the MSA kernels for CGEMM, CTRMM, CGEMV and ZGEMV * Added handling of zero increments in the MSA kernels for SSWAP and DSWAP * Added DYNAMIC_ARCH support for MIPS64 (currently Loongson3R3/3R4 only) SPARC: * Fixed building 32 and 64 bit SPARC kernels with the SolarisStudio compilers ==================================================================== Version 0.3.12 24-Oct-2020 common: * Fixed missing BLAS/LAPACK functions (inadvertently dropped during the build system restructuring) * Fixed argument conversion macro in LAPACKE_zgesvdq (LAPACK #458) POWER: * Added optimized SCOPY/CCOPY kernels for POWER10 * Increased and unified the default size of the GEMM BUFFER * Fixed building for POWER10 in DYNAMIC_ARCH mode * POWER10 compatibility test now checks binutils version as well * Cleaned up compiler warnings x86_64: * corrected compiler version checks for AVX2 compatibility * added compiler option -mavx2 for building with flang * fixed direct SGEMM pathway for small matrix sizes (broken by the code refactoring in 0.3.11) * fixed unhandled partial register clobbers in several kernels for AXPY,DOT,GEMV_N and GEMV_T flagged by gcc10 tree-vectorizer ARMV8: * improved Apple Vortex support to include cross-compiling ==================================================================== Version 0.3.11 17-Oct-2020 common: * API change: the newly added BFLOAT16 functions were renamed to use the letter "B" instead of "H" to avoid potential confusion with the IEEE "half precision float" type, i.e. the 0.3.10 SHGEMM is now SBGEMM and the corresponding build option was changed from "BUILD_HALF" to "BUILD_BFLOAT16". 
* Reduced the default BLAS3_MEM_ALLOC_THRESHOLD (used as an upper limit for placing temporary arrays on the stack) to be compatible with a stack size of 1mb (as imposed by the JAVA runtime library) * Added mixed-precision dot function SBDOT and utility functions shstobf16, shdtobf16, sbf16tos and dbf16tod to convert between single or double precision float arrays and bfloat16 arrays * Fixed prototypes of LAPACK_?ggsvp and LAPACK_?ggsvd functions in lapack.h * Fixed underflow and rounding errors in LAPACK SLANV2 and DLANV2 (causing miscalculations in e.g. SHSEQR/DHSEQR, LAPACK issue #263) * Fixed workspace calculation in LAPACK ?GELQ (LAPACK issue #415) * Fixed several bugs in the LAPACK testsuite * Improved performance of TRMM and TRSM for certain problem sizes * Fixed infinite recursions and workspace miscalculations in ReLAPACK * CMAKE builds no longer require pkg-config for creating the .pc file * Makefile builds no longer misread NO_CBLAS=0 or NO_LAPACK=0 as enabling these options * Fixed detection of gfortran when invoked through an mpi wrapper * Improve thread reinitialization performance with OpenMP after a fork * Added support for building only the subset of the library required for a particular precision by specifying BUILD_SINGLE, BUILD_DOUBLE * Optional function name prefixes and suffixes are now correctly reflected in the generated cblas.h * Added CMAKE build support for the LAPACK and multithreading tests POWER: * Added optimized support for POWER10 * Added support for compiling for POWER8 in 32bit mode * Added support for compilation with LLVM/clang * Added support for compilation with NVIDIA/PGI compilers * Fixed building on big-endian POWER8 * Fixed miscompilation of ZDOTC by gcc10 * Fixed alignment errors in the POWER8 SAXPY kernel * Improved CPU detection on AIX * Supported building with older compilers on POWER9 x86_64: * Added support for Intel Cooperlake * Added autodetection of AMD Renoir/Matisse/Zen3 cpus * Added autodetection of Intel Comet 
Lake cpus * Reimplemented ?sum, ?dot and daxpy using universal intrinsics * Reset the fpu state before using the fpu on Windows as a workaround for a problem introduced in Windows 10 build 19041 (a.k.a. SDK 2004) * Fixed potentially undefined behaviour in the dot and gemv_t kernels * Fixed a potential segmentation fault in DYNAMIC_ARCH builds * Fixed building for ZEN with PGI/NVIDIA and AMD AOCC compilers ARMV7: * Fixed cpu detection on BSD-like systems ARMV8: * Added preliminary support for Apple Vortex cpus * Added support for the Cavium ThunderX3T110 cpu * Fixed cpu detection on BSD-like systems * Fixed compilation in -std=C18 mode IBM Z: * Added support for compiling with the clang compiler * Improved GEMM performance on Z14 @ text @d1 1 a1 1 $NetBSD: patch-Makefile.install,v 1.2 2021/03/25 23:22:35 thor Exp $ d4 1 d8 17 a24 1 @@@@ -85,29 +85,35 @@@@ endif d27 2 a28 1 @@install -pm644 $(LIBNAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" @ 1.2 log @openblas: update to 0.3.10, fixing build with gcc 10 Fix openblas build issues, mainly by updating to 0.3.10. This pulls in these commits from WIP (newest first): commit 3c6284cba90280bc367cf4d1d8252ae4d6e92e76 Author: Jason Bacon Date: Thu Feb 25 11:56:13 2021 -0600 openblas: Update ONLY_FOR_PLATFORMS documentation commit 8071bf28f3ffc95af046ff3eaaac6983f4f70035 Author: Jason Bacon Date: Thu Feb 25 11:51:32 2021 -0600 openblas*: Successful build on NetBSD commit 056e3d5c972a4b286e8755dbee323a9951855165 Author: Dr. Thomas Orgis Date: Wed Feb 24 18:40:17 2021 +0100 openblas: flags from environment again to un-break PICy build The bug that prompted us to force the compiler flags in the make arguments is fixed. Now we got a different one: The logic that decides to add -fPIC where needed is overridden when doing this, resulting in relocation errors (strangely, not with every toolchain). So let's remove that again and take FFLAGS and friends from the environment again. commit 86af17db8526e629c2c02c6af1f1ce7db6f6ba6d Author: Dr. 
Thomas Orgis Date: Thu Nov 12 12:44:39 2020 +0100 openblas: version 0.3.10 This updated fixes the build with gcc 10 (segfault in cblat1 test). I did not go all the way to the current 0.3.12, as that would need some hacking of chosen compiler flags. 0.3.13 should be the next one. This commit also fixes the ARCH → ARCH_ sed to change all occurences on a line. This fulfills pkg/55999 and was approved by wiz during freeze. @ text @d1 1 a1 1 $NetBSD: patch-Makefile.install,v 1.1 2020/11/05 16:31:45 bacon Exp $ d5 1 a5 1 --- Makefile.install.orig 2020-06-14 20:03:04.000000000 +0000 d7 1 a7 1 @@@@ -62,8 +62,6 @@@@ endif d11 3 a13 2 - @@cd "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" ; \ - ln -fs $(LIBNAME) $(LIBPREFIX).$(LIBSUFFIX) d15 1 d18 1 a18 1 @@@@ -71,8 +69,7 @@@@ ifneq ($(NO_SHARED),1) d24 4 a27 1 + ln -fs $(LIBSONAME) $(LIBPREFIX).so d31 16 a46 1 @@@@ -112,16 +109,13 @@@@ endif d50 3 a52 2 - @@cd "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" ; \ - ln -fs $(LIBNAME) $(LIBPREFIX).$(LIBSUFFIX) d54 1 d62 4 a65 1 + ln -fs $(LIBSONAME) $(LIBPREFIX).so d69 18 @ 1.1 log @math/openblas: import openblas-0.3.7 OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version. OpenBLAS is an open source project supported by Lab of Parallel Software and Computational Science, ISCAS. This package builds the serial library. Needs patching and testing on platforms besides Linux @ text @d1 1 a1 1 $NetBSD$ d5 1 a5 1 --- Makefile.install.orig 2019-08-11 21:23:27.000000000 +0000 d7 1 a7 1 @@@@ -61,8 +61,6 @@@@ endif d16 2 a17 2 @@@@ -70,8 +68,7 @@@@ ifneq ($(NO_SHARED),1) ifeq ($(OSNAME), $(filter $(OSNAME),Linux SunOS Android Haiku)) d25 2 a26 2 ifeq ($(OSNAME), $(filter $(OSNAME),FreeBSD OpenBSD NetBSD DragonFly)) @@@@ -110,16 +107,13 @@@@ endif @