Rietveld Code Review Tool

Issue 1696060: Review: LLVM optimization pass diddling (Closed)

13 years, 9 months ago by larrygritz
13 years, 9 months ago
osl-dev@googlegroups.com, dev-osl@imageworks.com

This rearrangement of LLVM optimization passes approximately doubles the performance of the resulting code. Yes, I know it looks like all the pieces of the engine are splayed across the floor of the garage. That's because I'm still tinkering furiously, but I wanted to get such a big speedup out there (and checkpoint what I have). And no, I don't expect anybody to understand exactly why these particular passes are good, or what the next step is; it's all trial and error.

This comes along with some other minor changes: I track the number of times each group is run, and I change 'optimized' from a bool to an int, in anticipation of doing less optimization at first and more once it's clear that the shaders of a group are being used heavily. I also suppress the voluminous LLVM IR output unless use_llvm > 1 (and so promoted it to an int as well).
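The run-count idea above can be sketched roughly like this. This is a hypothetical illustration, not OSL's actual code: the names (ShaderGroupSketch, desired_opt_level) and the threshold of 10 runs are made up; the real patch only adds the counter and widens 'optimized' to an int in anticipation of such a scheme.

```cpp
#include <atomic>
#include <cassert>

// Sketch of deferred optimization: 'optimized' is a level (int), not a
// bool, and an atomic counter tracks executions so that the expensive
// pass pipeline runs only once a group is clearly hot.
struct ShaderGroupSketch {
    std::atomic<int> m_executions {0};
    int optimized = 0;               // 0 = none, 1 = quick, 2 = full

    int desired_opt_level () {
        int runs = m_executions.load();
        if (runs < 10)               // illustrative threshold
            return 1;                // cheap passes only
        return 2;                    // group is hot: full pipeline
    }

    void run () {
        m_executions.fetch_add (1);
        int want = desired_opt_level ();
        if (optimized < want)
            optimized = want;        // re-JIT at the higher level here
    }
};
```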

Patch Set 1 #

Total comments: 2
Stats: +160 lines, -26 lines

src/liboslexec/context.cpp         1 chunk,   +5 lines,  -2 lines,  0 comments
src/liboslexec/instance.cpp        1 chunk,   +24 lines, -0 lines,  1 comment
src/liboslexec/llvm_instance.cpp   6 chunks,  +113 lines, -14 lines, 1 comment
src/liboslexec/oslexec_pvt.h       4 chunks,  +13 lines, -8 lines,  0 comments
src/liboslexec/runtimeoptimize.h   3 chunks,  +4 lines,  -1 line,   0 comments
src/liboslexec/shadingsys.cpp      1 chunk,   +1 line,   -1 line,   0 comments


Total messages: 3
13 years, 9 months ago (2010-08-03 00:38:25 UTC) #1
LGTM, just a few small comments. The level of compatibility between interpreter and LLVM will ...
13 years, 9 months ago (2010-08-03 01:11:49 UTC) #2
13 years, 9 months ago (2010-08-03 03:55:54 UTC) #3
On Aug 2, 2010, at 6:11 PM, <ckulla@gmail.com> wrote:

> The level of compatibility between interpreter and LLVM will have to be
> really high to pull off the delayed optimization well without obscure
> bugs. Did you gather stats about shaders in typical scenes to see if we
> have any shaders that are executed only a handful of times? What is your
> intuition about the percentage of shaders that only get evaluated a few
> times? If this percentage is low, there may not be much mileage to be
> had from delaying optimization.

For exactly these reasons, I'm not having that part reviewed yet.  I don't know
whether the delayed-optimization strategy is very helpful, not least because it
requires the interpreter to run flawlessly and interchangeably with the JITed
code.  Another strategy is to never rely on the interpreter at all: always JIT,
but JIT quick and easy first, and then after enough runs JIT again, long and
hard.

I haven't tested a wide range of scenes, but on 1/4 res Tweedle, 10-25% of
shader groups run only dozens to hundreds of times, versus others that run tens
of thousands of times.  Maybe it's not worth extra mechanism to cut out perhaps
1/4
of the optimization time.  But I'm worried about short frames (low-res test
frames, etc.) for which perhaps most or all shader groups aren't run enough for
the optimization to pay off.  Currently, we spend about a minute on Tweedle. 
That's nothing for a long render, but what if there were 10x as many shader
groups and it was just a quick lighting check? So I am very concerned about
finding ways to cut out the optimization time.

I really have only three ideas for how to do it:

1. Keep randomly walking through the optimization pass combinatorics and hope to
find a sequence that is low overhead and still does a good job speeding up the
generated code.

2. Only spend time optimizing the groups that will run enough for it to pay off.

3. Cache the post-optimized IR to disk for subsequent runs (and, coincidentally,
for identical shader groups within the same run).  The tricky part will be
determining, absolutely positively, that the cached IR on disk corresponds
EXACTLY to the unoptimized one in memory.  Exactly what do we hash to ensure
that?
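One answer to the hashing question in idea 3 is to key the cache on everything that could change the generated code: the unoptimized IR itself, the LLVM version, and the pass configuration. The sketch below is purely illustrative (the function names are made up, and FNV-1a is used only for brevity; a real on-disk cache would want a cryptographic hash such as SHA-256 to make collisions negligible):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// FNV-1a: a tiny, well-known string hash, used here just to show the
// chaining idea. 'h' carries the running hash between inputs.
uint64_t fnv1a (const std::string &s,
                uint64_t h = 14695981039346656037ull) {
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;
    }
    return h;
}

// Hypothetical cache key: chain every input that affects codegen, so a
// change to any one of them yields a different key.
uint64_t cache_key (const std::string &unoptimized_ir,
                    const std::string &llvm_version,
                    const std::string &pass_config) {
    uint64_t h = fnv1a (unoptimized_ir);
    h = fnv1a (llvm_version, h);
    h = fnv1a (pass_config, h);
    return h;
}
```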

I have no other ideas at the moment.  Anybody else?

> src/liboslexec/instance.cpp:436: m_executions = 0;
> Why is this field initialized here instead of in the initialization list
> with the other member variables?

Because it's an atomic int.  I'm not sure that all our atomic implementations on
all platforms support initialization, but they all allow assignment.
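The distinction being made is small but concrete: initializing in the member-initializer list requires the atomic type to have a suitable constructor, whereas assigning in the constructor body only requires operator=, which every atomic wrapper provides. A minimal illustration using std::atomic (the class name here is hypothetical; OSL's wrapper at the time was its own type):

```cpp
#include <atomic>
#include <cassert>

struct InstanceSketch {
    std::atomic<int> m_executions;
    InstanceSketch () {
        // Assignment in the constructor body, not the initializer
        // list: this only needs operator=, which all atomic
        // implementations support.
        m_executions = 0;
    }
};
```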

> src/liboslexec/llvm_instance.cpp:3350: if (layer == (nlayers-1)) {
> How about using a variable here to clarify the code:
> bool do_interproc = layer == (nlayers-1);

Sure, will do.

Larry Gritz

