Description
We extract a sub-function from MAKENAME(_nofilter_DX) that does only one thing: it reads an index array, uses each index to look up the src array, and writes the result to the dst array.
Because of the scatter-gather nature of this access pattern, we cannot do much burst/batch reading or writing to improve performance.
We tried NEON vector instructions. We also tried hand-optimizing the compiler-generated (non-NEON) assembly code. The latter seems to give the better gain: about a 6% improvement, which is not much.
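For readers unfamiliar with the access pattern, here is a minimal C++ sketch of the kind of loop the description refers to. It is not the code in this patch: the function name `gather_row`, the 16-bit index type, and the 32-bit pixel type are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration of an index-driven copy (gather).
// Assumptions: indices are 16-bit, pixels are 32-bit, and every
// index is a valid position in src.
void gather_row(const uint16_t* indices, const uint32_t* src,
                uint32_t* dst, size_t count) {
    assert(count == 0 || (indices && src && dst));
    for (size_t i = 0; i < count; ++i) {
        // The load address depends on the value of indices[i], so
        // consecutive iterations may touch unrelated parts of src.
        dst[i] = src[indices[i]];
    }
}
```

Writes to dst are sequential, but reads from src are not, so the reads generally cannot be merged into one wide load. That is consistent with the limited gain reported above.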
Patch Set 1
Patch Set 2 : Update license information in header
Total comments: 7
Patch Set 3 : Update patch upon comments
Messages
Total messages: 6