Description
We don't get any speedup from unrolling the GL loop on Linux, although unrolling the convolution loop did seem to help, so this change just proposes a benchmark.
Erode/Dilate are quite slow over a 400x400 circle, so this benchmark only repeats 3x. We might consider reducing the size of the circle, too - this is just cut & pasted from the blur benchmark.
If erode/dilate are commonly seen in the wild, we can probably SSE-accelerate them for large wins; more than 60% of the time in this benchmark is spent on:
unpack an ARGB into 4 unsigned bytes
take the max of each of the four components
If the memory->register penalty isn't too bad, pmaxub seems custom-made for this issue.
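As a rough illustration of the idea above (not the actual Skia code, and not part of this patch), the per-channel max over packed ARGB pixels could be sketched with the SSE2 intrinsic `_mm_max_epu8`, which compiles to pmaxub. The function names `max_argb_scalar` and `max_argb_row` are hypothetical, and the sketch assumes 32-bit pixels with 8 bits per channel. It falls back to the scalar unpack-and-compare path when SSE2 is not available and for the leftover pixels at the end of a row.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics, including _mm_max_epu8 (pmaxub)
#endif

// Scalar reference (hypothetical helper): unpack each 32-bit ARGB pixel into
// its four 8-bit channels and keep the larger value of each channel.
static inline uint32_t max_argb_scalar(uint32_t a, uint32_t b) {
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t ca = (a >> shift) & 0xFF;
        uint32_t cb = (b >> shift) & 0xFF;
        out |= (ca > cb ? ca : cb) << shift;
    }
    return out;
}

// Hypothetical row kernel: dst[i] = per-channel max(dst[i], src[i]).
void max_argb_row(uint32_t* dst, const uint32_t* src, size_t count) {
    size_t i = 0;
#if defined(__SSE2__)
    // Four pixels (16 bytes) per iteration; the unsigned byte-wise max needs
    // no unpacking. Unaligned loads/stores are used because alignment of the
    // pixel rows is not assumed here.
    for (; i + 4 <= count; i += 4) {
        __m128i d = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst + i));
        __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i),
                         _mm_max_epu8(d, s));
    }
#endif
    // Remaining pixels (or all pixels when SSE2 is unavailable).
    for (; i < count; ++i) {
        dst[i] = max_argb_scalar(dst[i], src[i]);
    }
}
```

The same shape would presumably work for the opposite operation by swapping in `_mm_min_epu8` (pminub); whether the load/store cost leaves a net win would still need to be measured with the benchmark.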
Patch Set 1
Patch Set 2 : Remove ineffective GLSL changes
Total comments: 2
Patch Set 3 : Fix copyright date
Messages
Total messages: 9