Issue 1696060: Review: LLVM optimization pass diddling (Closed)


Created:
14 years, 11 months ago by larrygritz

Modified:
14 years, 11 months ago

Reviewers:
ckulla

CC:
osl-dev_googlegroups.com, dev-osl_imageworks.com

Base URL:
http://openshadinglanguage.googlecode.com/svn/trunk/

Visibility:
Public.

Description

This rearrangement of LLVM optimization passes approximately doubles the performance of the generated code. Yes, I know it looks like all the pieces of the engine are splayed across the garage floor. That's because I'm still tinkering furiously, but I wanted to get such a big speedup out there (and checkpoint what I have). And no, I don't expect anybody to understand exactly why these optimizations are good, or what the next step is. It's all trial and error.

This comes along with some other minor changes: I track the number of times each group is run, and I changed 'optimized' from a bool to an int, in anticipation of doing less optimization at first and more once it's clear that a group's shaders are being used heavily. I also suppress the voluminous LLVM IR output unless use_llvm > 1 (and so promoted use_llvm to an int).
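A minimal sketch of the bookkeeping described above, assuming a hypothetical GroupSketch type (not OSL's actual ShaderGroup) and assumed meanings for the integer levels: an atomic execution counter that any render thread can bump, 'optimized' as an int level rather than a bool, and IR dumping gated on use_llvm > 1.

```cpp
#include <atomic>
#include <ostream>
#include <string>

// Hypothetical stand-in for a shader group; not OSL's real ShaderGroup.
struct GroupSketch {
    std::atomic<long> executions{0};  // bumped concurrently by render threads
    int optimized = 0;                // assumed levels: 0 = not yet, 1 = quick, 2 = full
    std::string ir = "; LLVM IR for this group";
};

// Called each time the group is shaded.
void note_execution(GroupSketch& g) {
    g.executions.fetch_add(1, std::memory_order_relaxed);
}

// JIT the group at the requested level. The IR is written to the log only
// when use_llvm > 1, so use_llvm == 1 means "JIT quietly".
void jit_group(GroupSketch& g, int level, int use_llvm, std::ostream& log) {
    g.optimized = level;
    if (use_llvm > 1)
        log << g.ir << '\n';
    // ... run the pass pipeline chosen for `level` and emit machine code ...
}
```

Keeping 'optimized' as an int is what leaves room for a later "optimize more" step without another type change.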

Patch Set 1 #

Total comments: 2

Created: 14 years, 11 months ago


Stats (+160 lines, -26 lines)

File                              Chunks  Added  Removed  Comments
src/liboslexec/context.cpp        1       +5     -2       0
src/liboslexec/instance.cpp       1       +24    -0       1
src/liboslexec/llvm_instance.cpp  6       +113   -14      1
src/liboslexec/oslexec_pvt.h      4       +13    -8       0
src/liboslexec/runtimeoptimize.h  3       +4     -1       0
src/liboslexec/shadingsys.cpp     1       +1     -1       0

Messages

Total messages: 3


ckulla

LGTM, just a few small comments. The level of compatibility between interpreter and LLVM will ...

14 years, 11 months ago (2010-08-03 01:11:49 UTC) #2

lg_imageworks.com

14 years, 11 months ago (2010-08-03 03:55:54 UTC) #3

On Aug 2, 2010, at 6:11 PM, ckulla@gmail.com wrote:

> The level of compatibility between interpreter and LLVM will have to be
> really high to pull off the delayed optimization well without obscure
> bugs. Did you gather stats about shaders in typical scenes to see if we
> have any shaders that are executed only a handful of times? What is your
> intuition about the percentage of shaders that only get evaluated a few
> times? If this percentage is low, there may not be much mileage to be
> had from delaying optimization.

For exactly these reasons, I'm not having this reviewed yet. I don't know if
the strategy is very helpful, not least because it requires the interpreter to
run flawlessly and interchangeably with the LLVM-generated code. Another
strategy, though, is never to rely on the interpreter: always JIT, but JIT
quickly and cheaply first, then after enough runs JIT again, long and hard.
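The "always JIT, cheap first, then long and hard" idea amounts to a two-tier policy. A minimal sketch, assuming a hypothetical OptLevel enum and an arbitrary execution threshold supplied by the caller; none of these names come from OSL:

```cpp
#include <cstdint>

// Hypothetical optimization tiers; no interpreter tier at all.
enum class OptLevel { None = 0, Quick = 1, Full = 2 };

// Decide what level a group should be at, given how often it has run.
// full_threshold is an assumed tuning knob, not a measured value.
OptLevel next_level(OptLevel current, std::int64_t executions,
                    std::int64_t full_threshold) {
    if (current == OptLevel::None)
        return OptLevel::Quick;   // first use: fast, light JIT
    if (current == OptLevel::Quick && executions >= full_threshold)
        return OptLevel::Full;    // hot group: worth the expensive passes
    return current;               // otherwise keep the code we already have
}
```

In a multithreaded renderer the Quick-to-Full recompile would also need a guard so that only one thread performs it while the others keep running the quick code.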

I haven't tested a wide range of scenes, but on 1/4-res Tweedle, 10-25% of
shader groups run only dozens to hundreds of times, versus others that run tens
of thousands of times. Maybe it's not worth extra mechanism to cut out perhaps
1/4 of the optimization time. But I'm worried about short frames (low-res test
frames, etc.), for which perhaps most or all shader groups aren't run enough for
the optimization to pay off. Currently, we spend about a minute optimizing on
Tweedle. That's nothing for a long render, but what if there were 10x as many
shader groups and it was just a quick lighting check? So I am very concerned
about finding ways to cut the optimization time.
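The "pay off" question can be written as a break-even condition. Here C, t_0, t_1 and N are illustrative symbols, not measured quantities; the last step simply plugs in the roughly 2x speedup mentioned in the patch description:

```latex
% C   : one-time cost of running the expensive passes on a group
% t_0 : time per execution without those passes
% t_1 : time per execution with them
% N   : number of times the group executes in the frame
N\,(t_0 - t_1) > C
\quad\Longrightarrow\quad
N > \frac{C}{t_0 - t_1}
\;\overset{t_1 \approx t_0/2}{=}\; \frac{2C}{t_0}
```

So a group that runs only dozens of times needs C to be tiny relative to t_0 for heavy optimization to win, which is exactly the short-frame worry.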

I really have only three ideas for how to do it:

1. Keep randomly walking through optimization pass combinatorics and hope to
find a sequence that is low overhead and still does a good job speeding up the
code.

2. Only spend time optimizing the groups that will run enough for it to pay off.

3. Cache the post-optimized IR to disk for subsequent runs (and, coincidentally,
for identical shader groups within the same run). It will probably be tricky to
determine with absolute certainty that the cached IR on disk corresponds EXACTLY
to the unoptimized IR in memory. Exactly what do we hash to ensure that?
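One possible answer to "what do we hash" is everything that can change the generated IR. A minimal sketch, assuming a hypothetical GroupCacheInputs struct whose field list is a guess, and using 64-bit FNV-1a only because it is short; a real cache would want a stronger hash or a full comparison on a hit:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// 64-bit FNV-1a (standard offset basis and prime).
struct Fnv1a64 {
    std::uint64_t h = 14695981039346656037ULL;
    void add(const void* data, std::size_t n) {
        const unsigned char* p = static_cast<const unsigned char*>(data);
        for (std::size_t i = 0; i < n; ++i) {
            h ^= p[i];
            h *= 1099511628211ULL;
        }
    }
    void add(const std::string& s) {
        std::uint64_t len = s.size();
        add(&len, sizeof len);  // length prefix: "ab"+"c" differs from "a"+"bc"
        add(s.data(), s.size());
    }
};

// Count prefix keeps entries from sliding between adjacent lists.
void add_list(Fnv1a64& f, const std::vector<std::string>& v) {
    std::uint64_t n = v.size();
    f.add(&n, sizeof n);
    for (const auto& s : v) f.add(s);
}

// Hypothetical inputs that could affect the optimized IR of a group.
struct GroupCacheInputs {
    std::vector<std::string> layer_bytecode;  // each layer's compiled shader code
    std::vector<std::string> param_values;    // instance parameter values, serialized
    std::vector<std::string> connections;     // inter-layer connections, serialized
    int opt_level = 0;                        // which pass pipeline was used
    std::string toolchain_version;            // OSL and LLVM versions
};

std::uint64_t cache_key(const GroupCacheInputs& in) {
    Fnv1a64 f;
    add_list(f, in.layer_bytecode);
    add_list(f, in.param_values);
    add_list(f, in.connections);
    f.add(&in.opt_level, sizeof in.opt_level);  // raw bytes: host-endian, fine for a local cache
    f.add(in.toolchain_version);
    return f.h;
}
```

Folding in the toolchain version matters as much as the shader inputs: a pass reshuffle like this patch would otherwise silently reuse stale cached IR.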

I have no other ideas at the moment.  Anybody else?

> src/liboslexec/instance.cpp:436: m_executions = 0;
> 
> Why is this field initialized here instead of in the initialization list
> with the other member variables?

Because it's an atomic int. I don't think all our atomic implementations on all
platforms support initialization, but they all allow assignment.
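For comparison, with C++11's std::atomic (standardized after this thread) the member-initializer form is well defined, because std::atomic<int> has a constexpr value constructor. It is also the safer choice there: before C++20 a default-constructed std::atomic<int> holds an indeterminate value. A sketch with a hypothetical class, not OSL's:

```cpp
#include <atomic>

// Hypothetical class illustrating initialization of an atomic member.
class InstanceStatsSketch {
public:
    InstanceStatsSketch() : m_executions(0), m_optimized(0) {}
    void executed() { ++m_executions; }  // atomic increment
    int executions() const { return m_executions.load(); }
    int optimized() const { return m_optimized; }
private:
    std::atomic<int> m_executions;  // initialized in the list, no assignment needed
    int m_optimized;
};
```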

> src/liboslexec/llvm_instance.cpp:3350: if (layer == (nlayers-1)) {
> 
> How about using a variable here to clarify the code:
> 
> bool do_interproc = layer == (nlayers-1);

Sure, will do.
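The suggested change in context, as a hypothetical sketch: passes_for_layer and the pass names are placeholders, and it assumes (as the name do_interproc suggests) that the branch enables interprocedural passes only when the last layer is being generated.

```cpp
#include <string>
#include <vector>

// Hypothetical helper choosing passes for one layer of a shader group.
// Pass names are placeholders, not real LLVM pass identifiers.
std::vector<std::string> passes_for_layer(int layer, int nlayers) {
    // Named flag instead of testing layer == (nlayers-1) inline.
    bool do_interproc = (layer == nlayers - 1);
    std::vector<std::string> passes = {"per-function-cleanup"};
    if (do_interproc) {
        // Assumption: cross-function work waits until every layer's
        // code has been generated, so it can see the whole group.
        passes.push_back("interprocedural");
    }
    return passes;
}
```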

--
Larry Gritz
lg@imageworks.com
