mirror of
https://github.com/ultimatepp/ultimatepp.git
synced 2026-05-15 06:05:58 -06:00
[PR #231] [CLOSED] New simd functions (GetMask variants (i8, i16, i32, f32) and equality operator (i8) for SSE2 and NEON). #252
Reference: github-starred/ultimatepp#252
📋 Pull Request Information
Original PR: https://github.com/ultimatepp/ultimatepp/pull/231
Author: @ismail-yilmaz
Created: 2/7/2025
Status: ❌ Closed
Base: master ← Head: new_simd_functions
📝 Commits (5)
d62bf06 Core/SSE2: GetMask function variants (i8,i16,i32, f32) are added.
eb8d6a7 Core/NEON: GetMask function variants (i8,i16,i32, f32) are added.
32b4c24 Core/SIMD: Equality operator (==) support for i8x16 type (NEON & SSE2)
6301841 autotest: SIMD test code for GetMask (i8, i16, i32, f32) variants.
00d7e59 autottest: Etalog log for SIMD3 test is added.
📊 Changes
5 files changed (+155 additions, -0 deletions)
➕ autotest/SIMD3/Etalon.log (+11 -0)
➕ autotest/SIMD3/SIMD3.cpp (+66 -0)
➕ autotest/SIMD3/SIMD3.upp (+10 -0)
📝 uppsrc/Core/SIMD_NEON.h (+53 -0)
📝 uppsrc/Core/SIMD_SSE2.h (+15 -0)
📄 Description
This PR adds some very useful and crucial code that seems to be missing from the U++ SIMD functions:
GetMaski8x16(), GetMaski16x8(), GetMaski32x4(), GetMaskf32x4().
These functions let developers perform several useful operations easily with SIMD instructions: counting, accumulating, and determining positions in arrays (e.g. they can be used to vectorize string/byte searches).
A simple example is the reference/StreamGetSzPointer example. The vectorized version of the example can be as follows:
The results of this operation with a ~2 GiB file on a Ryzen 5600 with 16 GiB RAM (CLANG) are as follows (note that the scalar version was widened to 16-byte chunks with 16 newline checks, to give the compiler an opportunity to vectorize):
Admittedly, this is not definitive: GCC can vectorize better, and in fact it does. However, the aim of this example is to show how the usage pattern can be simplified using masks and the equality operator for the i8x16 data type.
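For readers unfamiliar with the pattern, here is a minimal sketch of the mask-based counting and position search described above. It is not the code from this PR and does not use the U++ SIMD wrappers: it assumes raw SSE2 intrinsics (with a scalar fallback on other targets), and the names CountNewlines and FindNewline are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#endif

// Hypothetical helper (not part of U++): counts '\n' bytes in s[0..len).
inline std::size_t CountNewlines(const char *s, std::size_t len)
{
    assert(s != nullptr || len == 0);
    std::size_t count = 0;
    std::size_t i = 0;
#ifdef SKETCH_HAS_SSE2
    const __m128i nl = _mm_set1_epi8('\n');
    for(; i + 16 <= len; i += 16) {
        // Unaligned load of 16 bytes, then byte-wise equality compare.
        __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i *>(s + i));
        __m128i eq = _mm_cmpeq_epi8(chunk, nl);
        // One bit per byte lane: bit k is set if byte k equals '\n'.
        unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(eq));
        // Portable population count of the 16-bit mask.
        while(mask) {
            mask &= mask - 1;
            count++;
        }
    }
#endif
    for(; i < len; i++) // scalar tail (or whole buffer without SSE2)
        count += s[i] == '\n';
    return count;
}

// Hypothetical helper (not part of U++): index of the first '\n', or len if none.
inline std::size_t FindNewline(const char *s, std::size_t len)
{
    assert(s != nullptr || len == 0);
    std::size_t i = 0;
#ifdef SKETCH_HAS_SSE2
    const __m128i nl = _mm_set1_epi8('\n');
    for(; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i *>(s + i));
        unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, nl)));
        if(mask) {
            // The lowest set bit gives the position of the first match in the chunk.
            std::size_t pos = 0;
            while(!(mask & 1u)) {
                mask >>= 1;
                pos++;
            }
            return i + pos;
        }
    }
#endif
    for(; i < len; i++)
        if(s[i] == '\n')
            return i;
    return len;
}
```

The compare produces 0xFF in every matching byte lane, and the mask extraction collapses those lanes into one bit each, so counting becomes a population count and searching becomes a lowest-set-bit lookup. On NEON there is no single instruction equivalent to the SSE2 mask extraction, which is presumably why the NEON side of this patch is larger than the SSE2 side.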
There is also an autotest for the patch.
Please check.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.