|
|
Created:
12 years, 2 months ago by apatrick1 Modified:
12 years, 1 month ago CC:
angleproject-review_googlegroups.com, VangelisK, John Bauman Base URL:
http://angleproject.googlecode.com/svn/trunk/ Visibility:
Public. |
DescriptionPrototype asynchronous glTex*Image2D.
Patch Set 1 : #
Total comments: 15
MessagesTotal messages: 8
Hi Daniel, I think Vangelis mentioned this prototype I was working on last week. The idea is to allow ANGLE to asynchronously upload textures, i.e. have glTexImage2D return before the texture is uploaded to the default pool, and do so from shared memory that can be directly written to by the renderer processes. The trick is to use Windows copy-on-write and another mapped view of our shared memory transfer buffers. Explanation inline in the patch. It seems to significantly reduce the time that glTexImage2D takes. In my testing so far, the cost of glTexImage2D for the fast path is more than halved and that's on a workstation. I think it could be better on a lower end machine. I thought you might be interested. It's not ready for review yet. https://codereview.appspot.com/5696096/ Al https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp File src/libGLESv2/Texture.cpp (right): https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp#newco... src/libGLESv2/Texture.cpp:129: void* page= MapViewOfFile(fileMapping, FILE_MAP_COPY, 0, pageAlignedOffset, sizeToMap); FILE_MAP_COPY causes the physical pages referenced by pixels to be mapped to a second address range in such a way that Windows will copy them on demand in the event that either address range is modified. This means the caller of glTexImage2D can modify the pixels immediately after glTexImage2D returns without modifying the bits that ANGLE is uploading. https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp#newco... src/libGLESv2/Texture.cpp:1550: static __int64 times[2]; This profiling code is just here for testing performance. I'll take it out. https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp#newco... src/libGLESv2/Texture.cpp:1563: if (mFileMappingHandle && path == 0 && image->getFormat() == GL_BGRA_EXT && image->getType() == GL_UNSIGNED_BYTE && unpackAlignment == 4 /* && unpackReverseRowOrder */) It's probably also worth having a heuristic based on texture size. If it's really small the overhead of calling MapViewOfFile might be more than copying the texels synchronously. Also, UNPACK_REVERSE_ROW_ORDER is not implemented but I don't see any technical issues there. https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp#newco... src/libGLESv2/Texture.cpp:1598: int inputSize = ComputeCompressedSize(image->getWidth(), image->getHeight(), image->getFormat()); If UNPACK_REVERSE_ROW_ORDER was supported I think I could do this path too. I don't see any channel swizzling in the compressed texture upload code. https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp#newco... src/libGLESv2/Texture.cpp:1658: /*else if (mFileMappingHandle && && format == GL_BGRA_EXT && type == GL_UNSIGNED_BYTE && unpackAlignment == 4 && unpackReverseRowOrder) Chrome would use this path a lot so I'll have a go at prototyping it as well. https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.h File src/libGLESv2/Texture.h (right): https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.h#newcode240 src/libGLESv2/Texture.h:240: void setFileMappingHandle(HANDLE fileMapping); I don't think the file mapping should be a parameter of the texture. It could actually be context state though then it isn't clear what kind of data it applies to, e.g. can buffer data be uploaded from a file mapping. I was thinking I could maybe make a file mapping a parameter of a buffer or a buffer like object. Then having the pixels pointer actually be an offset into the buffer is more natural. https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/libGLESv2.cpp File src/libGLESv2/libGLESv2.cpp (right): https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/libGLESv2.cpp#new... src/libGLESv2/libGLESv2.cpp:5299: texture->setFileMappingHandle((HANDLE)param); These handles are the ones returned by Win32 CreateFileMapping. They can either be memory mapped files or in Chrome's case inter-process shared memory backed by the system swap file. This means the renderer processes can write to shared memory that can be without a copy and asynchronously uploaded to the default pool by ANGLE.
Sign in to reply to this message.
Interesting approach. It's kind of like PBOs (http://www.opengl.org/registry/specs/ARB/pixel_buffer_object.txt), but in reverse -- instead of the driver giving the memory to the application to write to, the app gives the memory to the driver to read from. Speaking of PBO's had you considered using them? Cleaner than I was expecting though, and I believe it relies on having ANGLE_unpack_reverse_row_order implemented, right? Seems like this approach could work out though. https://codereview.appspot.com/5696096/diff/6/include/GLES2/gl2ext.h File include/GLES2/gl2ext.h (right): https://codereview.appspot.com/5696096/diff/6/include/GLES2/gl2ext.h#newcode241 include/GLES2/gl2ext.h:241: #define GL_TEXTURE_FILE_MAPPING_ANGLE 0x88FF Gah! that's not an ANGLE enum, that is MAX_ARRAY_TEXTURE_LAYERS See: http://www.opengl.org/registry/doc/enums.html (use enums between 0x6000 and 0x7FFF for temporary development purposes) https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp File src/libGLESv2/Texture.cpp (right): https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp#newco... src/libGLESv2/Texture.cpp:1346: HRESULT hr = device->CreateTexture(mWidth, mHeight, 1, 0, D3DFMT_A8R8G8B8, D3DPOOL_SYSTEMMEM, &newTexture, &handle); Note that this will only work on d3d9ex devices https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp#newco... src/libGLESv2/Texture.cpp:1563: if (mFileMappingHandle && path == 0 && image->getFormat() == GL_BGRA_EXT && image->getType() == GL_UNSIGNED_BYTE && unpackAlignment == 4 /* && unpackReverseRowOrder */) On 2012/02/28 22:09:45, apatrick1 wrote: > It's probably also worth having a heuristic based on texture size. If it's > really small the overhead of calling MapViewOfFile might be more than copying > the texels synchronously. Also, UNPACK_REVERSE_ROW_ORDER is not implemented but > I don't see any technical issues there. Let me know if you plan to implement UNPACK_REVERSE_ROW_ORDER, I believe I have a draft extension spec for it somewhere. Actually that extension is necessary for this to work at all isn't it? I'd prefer if that were implemented separately first. https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp#newco... src/libGLESv2/Texture.cpp:1598: int inputSize = ComputeCompressedSize(image->getWidth(), image->getHeight(), image->getFormat()); On 2012/02/28 22:09:45, apatrick1 wrote: > If UNPACK_REVERSE_ROW_ORDER was supported I think I could do this path too. I > don't see any channel swizzling in the compressed texture upload code. We do flip the rows when uploading compressed textures, so you would need to ensure that UNPACK_REVERSE_ROW_ORDER was enabled as well to take this path. https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp#newco... src/libGLESv2/Texture.cpp:1667: // - Main thread joins the worker thread before subsequently assuming mImageData bits are valid. this sounds like "fun"… :-) https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.h File src/libGLESv2/Texture.h (right): https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.h#newcode240 src/libGLESv2/Texture.h:240: void setFileMappingHandle(HANDLE fileMapping); Context state does seem a bit more natural. It's similar to how VBOs and PBOs work. As for which kind of data it would apply to, it's whatever is specified in the extension. VBOs/PBOs started like this, VBO was done first for vertex data, PBO extended it for texture data. On 2012/02/28 22:09:45, apatrick1 wrote: > I don't think the file mapping should be a parameter of the texture. It could > actually be context state though then it isn't clear what kind of data it > applies to, e.g. can buffer data be uploaded from a file mapping. I was thinking > I could maybe make a file mapping a parameter of a buffer or a buffer like > object. Then having the pixels pointer actually be an offset into the buffer is > more natural.
Sign in to reply to this message.
Yes I was thinking about using PBOs. The reason I want the application to allocate the memory is because I want it to be shared memory that another process can write to. I think it might be interesting to implement PBOs and then add an extension to associate a file mapping with them. Then it is natural for the pixels pointers to be reinterpreted as offsets within a PBO and there is a natural place for the file mapping state to go. Another option I am researching is having ANGLE enumerate the process handle table to find which file mapping is associated with a given address. Then all this stuff would be an implementation detail and glTex*Image2D would just always be asynchronous if a file mapping was located. Scary multithreading / accidentally holding very large files open for extended periods issues. I'll do another prototype, potentially completely different, once we have pushed the next chrome stable. https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp File src/libGLESv2/Texture.cpp (right): https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp#newco... src/libGLESv2/Texture.cpp:1563: if (mFileMappingHandle && path == 0 && image->getFormat() == GL_BGRA_EXT && image->getType() == GL_UNSIGNED_BYTE && unpackAlignment == 4 /* && unpackReverseRowOrder */) On 2012/03/07 22:05:19, dgkoch wrote: > On 2012/02/28 22:09:45, apatrick1 wrote: > > It's probably also worth having a heuristic based on texture size. If it's > > really small the overhead of calling MapViewOfFile might be more than copying > > the texels synchronously. Also, UNPACK_REVERSE_ROW_ORDER is not implemented > but > > I don't see any technical issues there. > > Let me know if you plan to implement UNPACK_REVERSE_ROW_ORDER, I believe I have > a draft extension spec for it somewhere. Actually that extension is necessary > for this to work at all isn't it? I'd prefer if that were implemented > separately first. As things stands UNPACK_REVERSE_ROW_ORDER would be needed. I have a couple of ideas to make it work but they imply more overhead. One is to reverse the rows on another thread and synchronize with the thread before the texture is next accessed (bleh). The other is to upload asynchronously to a temporary texture and then flip it with the blitter. This would also allow the mapped view to be unmapped sooner and hopefully before the client next writes to the memory references by pixels, which implies a page fault and a copy of the page in question. That would mean converting the texture to a render target unfortunately. I will mess around some more. https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp#newco... src/libGLESv2/Texture.cpp:1667: // - Main thread joins the worker thread before subsequently assuming mImageData bits are valid. On 2012/03/07 22:05:19, dgkoch wrote: > this sounds like "fun"… :-) It is proving to not be fun at all...
Sign in to reply to this message.
On 2012/03/07 22:37:03, apatrick1 wrote: > Yes I was thinking about using PBOs. The reason I want the application to > allocate the memory is because I want it to be shared memory that another > process can write to. I think it might be interesting to implement PBOs and then > add an extension to associate a file mapping with them. Then it is natural for > the pixels pointers to be reinterpreted as offsets within a PBO and there is a > natural place for the file mapping state to go. I agree, this does seem somewhat like an extension to PBO. Although it does also remind me of APPLE_client_storage (http://www.opengl.org/registry/specs/APPLE/client_storage.txt). > Another option I am researching is having ANGLE enumerate the process handle > table to find which file mapping is associated with a given address. Then all > this stuff would be an implementation detail and glTex*Image2D would just always > be asynchronous if a file mapping was located. Scary multithreading / > accidentally holding very large files open for extended periods issues. Erm.. ew? How would you know which file mapping to use? I think I prefer the approach in this patch! > > I'll do another prototype, potentially completely different, once we have pushed > the next chrome stable. > > https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp > File src/libGLESv2/Texture.cpp (right): > > https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp#newco... > src/libGLESv2/Texture.cpp:1563: if (mFileMappingHandle && path == 0 && > image->getFormat() == GL_BGRA_EXT && image->getType() == GL_UNSIGNED_BYTE && > unpackAlignment == 4 /* && unpackReverseRowOrder */) > On 2012/03/07 22:05:19, dgkoch wrote: > > On 2012/02/28 22:09:45, apatrick1 wrote: > > > It's probably also worth having a heuristic based on texture size. If it's > > > really small the overhead of calling MapViewOfFile might be more than > copying > > > the texels synchronously. Also, UNPACK_REVERSE_ROW_ORDER is not implemented > > but > > > I don't see any technical issues there. > > > > Let me know if you plan to implement UNPACK_REVERSE_ROW_ORDER, I believe I > have > > a draft extension spec for it somewhere. Actually that extension is necessary > > for this to work at all isn't it? I'd prefer if that were implemented > > separately first. > > As things stands UNPACK_REVERSE_ROW_ORDER would be needed. I have a couple of > ideas to make it work but they imply more overhead. One is to reverse the rows > on another thread and synchronize with the thread before the texture is next > accessed (bleh). The other is to upload asynchronously to a temporary texture > and then flip it with the blitter. This would also allow the mapped view to be > unmapped sooner and hopefully before the client next writes to the memory > references by pixels, which implies a page fault and a copy of the page in > question. That would mean converting the texture to a render target > unfortunately. I will mess around some more. If the row reversal is proving to be a limiting factor here, I'd much rather we change the way we handle the origin differences in ANGLE. We do have another approach that we believe is viable which we could implement to avoid flipping-the-textures-on-load-and-rewriting-the-sampling-instructions approach. > > https://codereview.appspot.com/5696096/diff/6/src/libGLESv2/Texture.cpp#newco... > src/libGLESv2/Texture.cpp:1667: // - Main thread joins the worker thread before > subsequently assuming mImageData bits are valid. > On 2012/03/07 22:05:19, dgkoch wrote: > > this sounds like "fun"… :-) > > It is proving to not be fun at all...
Sign in to reply to this message.
> > As things stands UNPACK_REVERSE_ROW_ORDER would be needed. I have a couple of > > ideas to make it work but they imply more overhead. One is to reverse the rows > > on another thread and synchronize with the thread before the texture is next > > accessed (bleh). The other is to upload asynchronously to a temporary texture > > and then flip it with the blitter. This would also allow the mapped view to be > > unmapped sooner and hopefully before the client next writes to the memory > > references by pixels, which implies a page fault and a copy of the page in > > question. That would mean converting the texture to a render target > > unfortunately. I will mess around some more. > > If the row reversal is proving to be a limiting factor here, I'd much rather we > change the way we handle the origin differences in ANGLE. We do have another > approach that we believe is viable which we could implement to avoid > flipping-the-textures-on-load-and-rewriting-the-sampling-instructions approach. > I think if we didn't reverse the row order of textures, that might open up some other possibilities for uploading asynchronously. I tried hard not to reverse the row order and we discussed different approaches here: http://codereview.appspot.com/3265041 I'm interested in what you have in mind.
Sign in to reply to this message.
Another thought. Chrome will probably soon not use EGL window surfaces at all. We're only using them on XP at the moment. I believe Firefox never has, just pbuffers. Therefore adding back the redundant blit to reverse row order of the default FBO prior to calling Present and chucking out all the row reversal code for texture uploads and FBO readbacks is something that would be okay from my point of view. It would simplify the code a lot. It would also satisfy a nagging feeling of mine that if Direct3D surfaces can be bound to EGLImages, our reversing the row ordering in memory is a little arbitrary from the point of view of a public API. The pbuffers shared with other Direct3D devices would all turn "upside down" of course. That would take some coordination, though it seems to just be an issue of flipping the V coordinate in whatever is consuming the texture. I can do that easily for Chrome.
Sign in to reply to this message.
That would certainly simplify the implementation, but unfortunately it isn't really correct. The other idea we've got (referred to as the "third way" in the ANGLE chapter) is to invert the rendering (in the shader) only when rendering to a texture and to use the shader unmodified when rendering to a window. This ensures that rendered textures (and pbuffers) have the correct orientation, and that windowed rendering also has the correct orientation. This will potentially required us to compile shaders differently depending if they are rendering on-screen or offscreen, but it's possible we could pass that information in with a uniform. I now think that this is overall the best way of solving the problem.
Sign in to reply to this message.
I remember I tried doing the flip in the vertex shader. I almost got it working. There were some corner cases that I couldn't think of a way around though at the time. For example, glBlitFramebuffer on an FBO with a depth buffer attached is hard if the differing "defaultness" of the draw and read FBOs implies a flip. Direct3D's StretchRect can't do it given its restrictions. There are some notes here on line 3531. http://codereview.appspot.com/3265041/diff/12001/src/libGLESv2/Context.cpp Thinking about it some more though, we could make the depth buffers lockable and do the "blit" using the CPU when a flip is required. Since using lockable depth formats might have a performance impact, on 9Ex, we could start out using a regular depth format and replace it with a lockable one if glBlitFramebuffer ever requires it. 9Ex can copy a non-lockable depth buffer into a lockable one using UpdateSurface, like this: http://msdn.microsoft.com/en-us/library/windows/desktop/bb219800(v=vs.85).asp... I think ANGLE_framebuffer_multisampling is sufficiently restrictive about multisampled depth stencil buffers. I think patch set #2 here is more or less what you are describing. It used a uniform to control the flip depending on the type of FBO. From patch set #3 on it did the flip in the fragment shader on texture lookup. http://codereview.appspot.com/3265041 I think it might be worth trying to get #2 working again.
Sign in to reply to this message.
|