java coins.driver.Driver -coins:hirOpt=hiroptspec/hiroptspec/...
where each of the hiroptspec specifies some optimizing option such as
cf for constant folding, cpf for constant propagation, cse for local common
subexpression elimination, etc.
ex. java coins.Driver -S -coins:hirOpt=cf/cpf
The order of the optimizations is approximately the same to the order
of the optimization specifications by the command line, but in some
cases, some optimization may be inserted to make ready to apply another
specified optimization, or neglected when the effect of the optimization
is covered by another optimization specified. For example, by the command
java coins.Driver -S -coins:hirOpt=cf/cpf
constant folding is done at first and then constant propagation and
folding.
java coins.Driver -S -coins:hirOpt=pre
common subexpression elimination within basic blocks is automatically
inserted. As another example, by a command
java coins.Driver -S -coins:hirOpt=pre/cse
cse is neglected because pre will cover the effect of cse.
java coins.driver.Driver -coins:hirOpt=fromc
It is not treated by coins.opt.Opt but treated by C-front
java coins.driver.Driver -coins:hirOpt=globalReform
It can cover wide variety of transformations according to the patterns
given as a part of the input program.
noSimplify // do no simplification
cf // constant folding
cpf // constant propagation and folding triggered by the propagation
cse // common subexpression elimination within basic blocks
dce // dead code elimination
fromc // simple optimizations done by C parser
gt // global variable temporalization within basic blocks
The optimizers cf and gt do not require data flow analysis, however,
cpf, cse, dce require some result of data flow analysis.
noSimplify specifies not to do HIR simplification. When this option is not
specified, unused labels are removed immediately before transforming
HIR to LIR. It removes such labels as else-label generated for if-statement
without else part.
java cooins.driver.Driver -S -coins:hirOpt=cf/cpf
int main ()
{
int a, b, c, d, x;
a = 1;
b = 2;
c = a + b;
d = a + b;
if (a+1 < c-1)
x = a - b;
else
x = a + b;
printf("%d\n",x);
}
is compiled by
java coins.driver.Driver -S -coins:hirOpt=cf/cpf/dce,hir2c=opt
then following C program will be generated by converting resulting
HIR to C by hir2c.
int main( )
{
int a;
int b;
int c;
int d;
int x;
a = (int )1;
b = (int )2;
if (((int )2) < ((int )2))
{
x = ((a) - (b));
}
else
{
x = ((a) + (b));
}
(&(printf))( (const char * )((("%d\n"))),x);
}
The statement
d = a + b;
is eliminated as a dead code and condition expression is changed to
an expression that can be evaluated as false at compile time.
In the backend part of COINS, code sequence for only else-part
is generated for such if-statement as follows:
.global main
main:
save %sp,-96,%sp
.L18:
mov 1,%i1
mov 2,%i0
.L20:
add %i1,%i0,%o1
.L21:
sethi %hi(string.17),%o0
or %o0,%lo(string.17),%o0
call printf
nop
.L22:
ret
restore
java -coins:regpromote
is specified, it is not required to specify hirOpt=gt because
register promotion for global variables is done in the COINS backend.
In many cases, regpromote option will produce more efficient code than
hirOpt=gt option.
ex. new ConstFolding(lResults).doSubp(lSubpFlow);The method will return a boolean value indicating whether the program has been changed (optimized) as a result of the call to this optimizer method. For more examples, see the basic optimizer driver class coins.opt.Opt.
java coins.driver.Driver -S -coins:hirOpt=loopexp
java coins.driver.Driver -S -coins:hirOpt=loopif
java coins.driver.Driver -S -coins:hirOpt=inline
java coins.driver.Driver -S -coins:hirOpt=pre
The option inlinedepth controls inline expansion
as it is mentioned later.
The option globalReform is explained in section 5.7.
4 if number of registers N <= 16
8 if N > 16
as default, but it can be changed by specifying expansion number as a
sub-option in such a way as
hirOpt=loopexp.6
Following loops are not expanded:
(1) Outer loop (not an inner-most loop)
(2) Loop including subprogram call
(3) Loop including volatile variable
(4) Non-simple for-loop, that is, a loop having some of following
characteristics
not a for-loop
start condition is null
start condition is not a simple arithmetic comparison expression
for loop control variable
loop control variable is changed in the loop body (not in loop step part)
complexity level of the loop body is large, that is,
if (((R <= 8)&&(E * N > 200))||
((R <= 32)&&(E * N > 300))||
((R > 32)&&(E * N > 600)))
is true, where,
R: number of general registers of the target machine.
E: expansion number
N: the number of executable operators and assign-operators
Statements in the loop body may be changed if reordering does not affect
execution results. For example, following loop
for (i = 0; i < 100; i++) {
sum1 = sum1 + i;
sum2 = sum2 + a[i];
}
will be expanded as follows:
_var5 = 1*7;
_var7 = 1*8;
for (i = 0; i < 100 - _var5; i = i + _var7) {
sum1 = sum1 + i + i + 1 + i + 2 + i + 3 + i + 4 + i + 5
+ i + 6 + i + 7;
sum2 = sum2 + a[i] + a[i+1] + a[i+2] + a[i+3]
+ a[i+4] + a[i+5] + a[i+6] + a[i+7];
}
for (; i < 100; i = i + 1) {
sum1 = sum1 + i;
sum2 = sum2 + a[i];
}
for (i = 0; i < pn; i++) {
lSum = lSum + pa[i];
if (pMode > 0)
lSum = lSum + i;
else
lSum = lSum + i * i;
}
will be transformed as follows:
if (pMode > 0) {
for (i = 0; i < pn; i++) {
lSum = lSum + pa[i];
lSum = lSum + i;
}
}else {
for (i = 0; i < pn; i++) {
lSum = lSum + pa[i];
lSum = lSum + i * i;
}
}
If hirOpt=loopif/loopexp is specified, then the resultant loops in
then-part and else-part will be expanded by the loop expansion optimizer.
(1) Complexity of subprogram body is large
Subprograms having more than 100 HIR nodes are not expanded.
This complexity threshold can be changed by sub-option; for example,
hirOpt=inline.200
will expand subprograms up to 200 HIR nodes.
(2) Call is included in conditional expression of if-statement,
loop-statement, or included in case-selection expression of switch-statement.
(3) Subprogram whose definition is not given in the same compile unit.
Subprograms called before giving its definition can be expanded. Recursive
subprograms are also expanded up to 2 times. For example,
int fact(int p) {
if (p > 0)
return p * fact(p - 1);
else
return 1;
}
will be expanded as follows:
int fact( int p ) {
int _var1, _var3, _var5, _var7, _var9, _var11;
if (p > 0) {
_var1 = p - 1;
if (_var1 > 0) {
_var5 = _var1 - 1;
if (_var5 > 0) {
_var7 = _var5 - 1;
if (_var7 > 0) {
_var9 = _var7 * fact(_var7 - 1);
goto _lab19;
}else {
_var9 = 1;
goto _lab19;
}
_lab19:;
_var11 = _var5 * _var9;
goto _lab22;
}else {
_var11 = 1;
goto _lab22;
}
_lab22:;
_var3 = _var1 * _var11;
goto _lab12;
}else {
_var3 = 1;
goto _lab12;
}
_lab12:;
return p * _var3;
}
else {
return 1;
}
}
The option inlinedepth changes this limitation. The option
locally available: EGen (downward exposed)
After computation, operands are not changed.
available: AvailIn
locally anticipable: AntLoc (upward exposed)
Operands are not set in preceding operations
(before use) in a basic block.
safe: Either anticipable or available.
e-path([b_i ... b_k]) = set of eliminatable computation e included
in b_k, i.e.
{e | e is locally available in b_i and locally anticipable in b_k } &
empty((b_i ... b_k)) & // not computed in intermediate point
e is safe at exit of each node on the path [b_i ... b_k),
where, b_i, ..., b_k are basic block i, ..., basic block k, respectively.
e-path suffix is the maximal suffix of an E-path such that
AntIn * (not AvIn) = true for each node in it.Data flow properties are as follows:
Comp_i : e is locally available in b_i Antloc_i : e is locally anticipable in b_i Transp_i : b_i does not contain definitions of e's operands AvIn_i : e is available at entry of b_i AvOut_i : e is available at exit of b_i AntIn_i : e is anticipable at entry of b_i AntOut_i : e is anticipable at exit of b_i EpsIn_i : entry of b_i is in an e-path suffix EpsOut_i : exit of b_i is in an e-path suffix Redund_i : Occurrence of e in b_i is redundant Insert_i : Insert t_e := e in node b_i Insert_i_j : Insert t_e := e along edge (b_i, b_j) SaIn_i : A Save must be inserted above the entry of b_i SaOut_i : A Save must be inserted above the exit of b_i Save_i : e should be saved in t_e in node b_iwhere, t_e is a temporal variable to hold the value of e.
AvIn_i = PAI_p (AvOut_p) AvOut_i = AvIn_i * Transp_i + Comp_i AntIn_i = AntOut_i * Transp_i + Antloc_i AntOut_i = PAI_s (AntIn_s) EpsIn_i = SIGMA_p (AvOut_p + EpsOut_p) * AntIn_i * (not AvIn_i) EpsOut_i = EpsIn_i * (not Antloc_i) Redund_i = (EpxIn_i + AvIn_i) * Antloc_i Insert_i = (not AvOut_i) * (not EpsOut_i) * PAI_s(EpsIn_s) Insert_i_j = (not AvOut_i) * (not EpsOut_i) * (not Insert_i) * EpsIn_j SaOut_i = SIGMA_s (EpsIn_s + Redund_s + SaIn_s) * AvOut_i SaIn_i = SaOut_i * (not Comp_i) Save_i = SaOut_i * Comp_i * (not Redund_i * Transp_i)where, _s means successor and _p means predecessor.
Save the value of e: computation t_e is inserted before an occurrence of e and the occurrence of e are replaced by t_e (as indicated by Save_i). Insert an evaluation of e: A computation t_e <- e is inserted (as indicated by Insert_i and Insert_i_j). Eliminate a redundant evaluation of e: An occurrence of e is replaced by t_e (as indicated by Redund_i).Before doing partial redundancy elimination, critical edges[3] in control flow graphs are removed by preparatory transformation phase (NormalizeHir). A critical edge is an edge that goes from a basic block having multiple successors to a basic block having multiple predecessors. For example,
switch (i) {
case 0:
s = 0;
case 1:
s = s + i;
....
}
will be changed by the preparatory transformation phase to
switch (i) {
case 0:
s = 0;
goto _lab11;
case 1:
{ _lab11:;
s = s + i;
}
......
}
java coins.driver.Driver -S -coins:hirOpt=fromc xxx.c
Most optimizations done by fromc specification are covered
by other HIR and LIR optimizations. The effect of the optimization
in C front is to make slim the HIR representation of source program
so that succeeding processing will be simplified.
coins.flow: Flow analysis used currently in all HIR optimizations
such as loopif/loopexp/inline/cf/cpf/cse/gt/pre.
coins.aflow: Old version flow analysis which is used currently in
loop parallelizer, coarse grain parallelizing module (-coins:mdf).
In building new modules, it is recommended to use coins.flow version
because coins.aflow version may take long compile time and huge
storage space for large subprograms.
flowRoot: instance of FlowRoot (usually passed from Driver).
subpDefinition: instance for SubpDefinition representing
the HIR subtree of a subprogram.
subpFlow: instance of SubpFlow to represent control/data flow information
of specified subprogram.
The instance of HirSubpFlowImpl is required to be made only once
for each subprogram. All control flow information and data flow
information are linked from this instance and if you renew the
instance, then all flow information previously computed will be reset.
coins.flow.SubpFlow lSubpFlow = new HirSubpFlowImpl(flowRoot, subpDefinition);
and after executing it, the instance can be referred by
flowRoot.fSubpFlow
To do control flow analysis, it is necessary to prepare for it by
flowRoot.flow.controlFlowAnal(lSubpFlow);
After executing this statement, methods related to basic blocks such as
cfgIterator(), getEntryBBlock(), getBBlockOfIR(ir.getIndex()), bblockSubtreeIterator(bblock), ...are made available. There are many other methods for control/data flow analysis as they are shown in the interface coins.flow.SubpFlow. After executing controlFlowAnal(lSubpFlow), methods of coins.flow.BBlock interface such as
getPredList(), getSuccList(), getImmediateDominator(), getPostDominatedChildren(), ...are made available. To see the result of control flow analysis, the coding sequence
coins.flow.ShowControlFlow lShow = flowRoot.controlFlow.getShowControlFlow();
lShow.showAll();
will print the result of control flow analysis.
flowRoot.flow.controlFlowAnal(lSubpFlow);
resets previous control flow analysis information and begins to re-compute.
If there is no change in HIR subtree of SubpDefinition instance, then it is
not necessary to re-compute it. It is recommended to avoid it by
following coding sequence:
if (flowRoot.flow.getFlowAnalStateLevel() <
coins.flow.Flow.STATE_CFG_AVAILABLE)
flowRoot.flow.controlFlowAnal(lSubpFlow);
The method finishHir() and setIndexNumbetToAllNodes() of HIR0 interface
will make
getGlowAnalStateLevel() < coins.flow.Flow.STATE_CFG_AVAILABLE)
true so as to inform re-computation is required.
x, y, t, u : variable or register representing an operand.
(variable may be a compound variable such as
array element or structure element.)
op : operator.
def(x) : shows that value of x is defined (value is set).
def(x, y, ...) : shows that values of x, y, ... are defined.
use(x) : shows that x is used.
p(use(x)) : x is used at program point p.
or_all(z) : construct a set by applying or-operation
on all components indicated by z.
and_all(z) : construct a set by applying and-operation
on all components indicated by z.
The data flow analyzer will compute following information according to requests:
Def(B) =
{ p | for some x, p(def(x)) is included in B and after that point
there is no p'(def(x)) in B. }
Kill(B) =
{ p | for some x, p(def(x)) is included in B' (where, B' != B)
and there exists some defining point of x p'(def(x)) in B. }
Reach(B)=
{ p | there is some path from program point p defining x
(that is p(def(x))) to the entry of B such that there is
no p'(def(x)) on that path. }
Reach(B) = or_all( (Def(B') | (Reach(B') - Kill(B')))
for all predecessors B' of B)
Defined(B) =
{ x | x is defined in B. }
Exposed(B) =
{ x | x is used in B and x is not defined in B
before x is used. }
Used(B) =
{x|x is used in B}
EGen(B) =
{ op(x,y) | expression op(x,y) is computed in B and after
that point, neither x nor y are defined in B. }
Thus, the result of op(x,y) is available after B.
EKill(B) =
{ op(x,y) | operand x or y is defined in B and the
expression op(x,y) is not re-evaluated after
that definition in B. }
If t = op(x,y) is killed in B,
then op(t,u) should also be killed in B.
AvailIn(B) =
{ op(x,y) | op(x,y) is computed in every paths to B and
x, y are not defined after the computations
on the paths. }
Thus, the result of op(x,y) can be used without
re-evaluation in B.
AvailOut(B) =
{ op(x,y) | op(x,y) is computed in every paths to the exit of B and
x, y are not defined after the computations
on the paths. }
Thus, op(x,y) can be used without re-evaluation after B.
Following relations hold.
AvailIn(B) = and_all(AvailOut(B') for all predecessors B' of B)
if B is not an entry block;
AvailIn(B) = { } if B is an entry block.
AvailOut(B) = EGen(B) | (AvailIn(B) - EKill(B))
LiveIn(B) =
{ x | x is alive at entry to B, that is, on some path from
entrance point of B to use point of x, x is not defined. }
Thus, x in LiveIn(B) should not be changed until it is used.
LiveOut(B) =
{ x | x is live at exit from B, that is, there is some
path from B to B' where x is in Exposed(B'). }
Following relations hold.
LiveOut(B) = or_all(LiveIn(B') for all successors B' of B
LiveIn(B) = Exposed(B) | (LiveOut(B) - Defined(B))
DefIn(B) =
{ x | x is always defined at entry to B whichever path
may be taken. }
DefIn(B) = and_all(DefOut(B') for all predecessors B' of B)
DefOut(B) =
{ x | x is always defined at exit from B whichever path
may be taken.}
DefOut(B) = Defined(B) | DefIn(B)
Reach(p(use(x))) =
{ p'(def(x)) | there are some paths from p to p' on which
x is not re-defined. }
DefUseList(p(def(x))) =
{ p'(use(x)) | p(def(x)) is included in p'(use(x)). }
UseDefList(p(use(x))) =
{ p'(def(x)) | p'(def(x)) is included in p(use(x)). }
flowRoot.flow.dataFlwoAnal(subpDefinition);
at the first time. This makes coins.flow.SubpFlow methods such as
getDefinedSyms(), getUsedSyms(), ...
available. It also makes coins.flow.BBlock methods such as
getDefIn(), getDefOut(), getRech(), getLiveIn(), getLiveOut(),
getAvailIn(), getAvailOut(), ....
available. available. There are many other methods for accessing data flow
information as shown in the interface SubpFlow. Such methods can be
called via the SubpFlow instance
flowRoot.fSubpFlow
which is prepared by calling dataFlwoAnal(subpDefinition);
if (flowRoot.flow.getFlowAnalStateLevel() <
coins.flow.Flow.STATE_DATA_FLOW_AVAILABLE)
flowRoot.dataFlow = flowRoot.flow.dataFlowAnal(subpDefinition);
The methods finishHir() and setIndexNumbetToAllNodes() of HIR0 interface
will make getFlowAnalStateLevel() as STATE_DATA_UNAVAILABLE, that is, it will
make
getFlowAnalStateLevel() < coins.flow.Flow.STATE_DATA_FLOW_AVAILABLE
true so as to inform re-computation is required.
AssignStmt
Conditional expressions in LoopStmt
Subprogram call
ReturnStmt
which are treated as a statement in the data flow analysis.
Following methods are available for SetRefRepr.
defSym() returns the set of symbols definitely defined.
modSyms() returns the set of symbols that are possibly defined.
useSyms() returns the set of symbols definitely used (referred).
As for expressions, ExpId interface provides following methods:
getOperandSet() returns the set of variables used as leaf operand.
getExpInf().hasCall() returns true if the expression has call.
SubpFlow interface provides following methods for corresponding subprogram:
cfgIterator() traverses all reachable basic blocks of the subprogram.
bblockSubtreeItrator(BBlock pBBlock) returns iterator that traverse top
subtrees of the basic block pBBlock.
Traversed top-subtrees are
LabeledStmt, AssignStmt, ExpStmt, ReturnStmt,
IfStmt, LoopStmt, SwitchStmt
Conditional expression in IfStmt and LoopStmt
Case-selection expression in SwitchStmt
Call subtree (irrespective of contained in ExpStmt or Exp)
bblockStmtIterator(BBlock pBBlock) returns iterator to traverse all
HIR statements in the basic block pBBlock.
bblockNodeIterator(BBlock pBBlock) returns iterator to traverse all
HIR nodes in the basic block pBBlock.
getSetRefReprOfIR(IR pIr) returns SetRefRepr corresponding to pIr
or returns null if pIr has no SetRefRepr instance.
getExpId(IR pIr) returns ExpId corresponding to pIr.
getExpOfTemp(Var pTemp) returns the expression represented by
the temporal variable pTemp.
setOfGlobalVariables() returns the set of global variables appeared.
setOfAddressTakenVariables() returns the set of address taken variables.
getRecordAlias() returns the instance of RecordAlias that is used to
access alias information of the subprogram.
DefUseList and UseDefList are computed by the information of definitely
defined and definitely used relations because if all possibilities are
unconditionally included, define/use lists and use/define lists will become
very large. Possibly defined symbols and possibly used symbols can be get
from defSyms() and useSyms() by using the set of global variables, the
set of address-taken variables, and the set of variables aliased to a variable.
int printf(char*, ...);
int func(int pa[10], int pn);
int ga1[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int main()
{
int a = 1, b = 2, c, d;
int i = 0;
int *ptrc, *ptry;
int sum;
int x[10];
int y[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int z[10] = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
int zz[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
ptrc = &c;
ptry = y;
x[i] = a;
*ptrc = x[i] + 1;
sum = c + func(z, 10);
d = zz[2] + zz[3];
printf(" sum=%d ", sum);
for (i = 0; i < 10; i++) {
d = d + (zz[2] + zz[3]);
d = d + z[i] + z[i];
d = d + zz[i] + zz[i];
sum = ga1[i] + ga1[i];
sum = sum + *ptry;
printf(" *ptry=%d d=%d ", *ptry, d);
ptry = ptry + 1;
sum = sum + z[i] + z[i];
sum = sum + zz[i] + zz[i];
d = d + ga1[i] + ga1[i];
}
d = d + ga1[2] + ga1[2];
printf("\n");
d = d + (zz[2] + zz[3]);
d = d + ga1[2] + ga1[2];
printf("%d %d %d \n", sum, c, d);
return 0;
}
basic blocks are
BBlock 1: statements from the beginning up to "i=0;" of for-statement
BBlock 2: conditional expression "i < 10"
BBlock 3: from "d=d+(zz[2]+zz[3]);" to "d=d+ga1[i]+ga1[i];"
BBlock 4: "i++"
BBlock 5: rest of statements (from "d=d+ga1[2]+ga1[2];" to "return 0;"
and
setOfAddressTakenVariables() = { c, z, y }
Available expressions of basic blocks are
BBlock 1
AvailOut= { zz[2], zz[3], &c, zz[2]+zz[3] }
BBlock 2
AvailIn = { zz[2], zz[3], &c, zz[2]+zz[3] }
AvailOut= { zz[2], zz[3], &c, zz[2]+zz[3], i<10 }
BBlock 3
AvailIn = { zz[2], zz[3], &c, zz[2]+zz[3] }
AvailOut= { zz[2], zz[3], &c, zz[2]+zz[3], i<10, z[i], zz[i], ga1[i] }
BBlock 4
AvailIn = { zz[2], zz[3], &c, zz[2]+zz[3], i<10, z[i], zz[i], ga1[i] }
AvailOut= { zz[2], zz[3], &c, zz[2]+zz[3] }
BBlock 5
AvailIn = { zz[2], zz[3], &c, zz[2]+zz[3], i<10 }
AvailOut= { zz[2], zz[3], &c, zz[2]+zz[3], i<10 }
At the statement "ptrc = x[i] + 1;" address taken variables are assumed
to be modified.
defSym = { *ptrc }
modSyms= { c, *ptrc, z, ptrc, y }
At the statement "sum = c + func(z, 10);" global variables are assumed
to be modified.
defSym = { sum }
modSyms= { z, ga1, sum }
Thus, after partial redundancy elimination, z[i]+z[i] is re-computed
after subprogram call but z[i]+z[i] is not re-computed (eliminated
as common subexpression), and zz[2]+zz[3] is recorded in a temporal
variable before entering the for-loop and all later occurrences of
zz[2]+zz[3] are replaced by the temporal variable.
SubpFlow lSubpFlow = new HirSubpFlowImpl(flowRoot, subpDefinition);
ControlFlow lControlFlow = flowRoot.flow.controlFlowAnal(lSubpFlow);
DataFlow lDataFlow = flowRoot.flow.dataFlowAnal(lSubpFlow);
RecordAlias lRecordAlias = lSubpFlow.getRecordAlias();
for (Iterator lBBlockIterator = lSubpFlow.cfgIterator();
lBBlockIterator.hasNext(); ) {
BBlock lBBlock = (BBlock)lBBlockIterator.next();
ExpVector lAvailableExp = lBBlock.getAvailIn();
for (BBlockSubtreeIterator lSubtreeIterator
= lSubpFlow.bblockSubtreeIterator(lBBlock);
lSubtreeIterator.hasNext(); ) {
HIR lSubtree = (HIR)lSubtreeIterator.next();
SetRefRepr lSetRefRepr = lSubpFlow.getSetRefReprOfIR(lSubtree);
Set lModSyms = lSetRefRepr.modSyms();
Set lModSymsAlias = fRecordAlias.aliasSymGroup(lModSyms); // Set of
// symbold aliased to some of modified variables.
for (ExpVectorIterator lExpIterator = lAvailableExp.expVectorIterator();
lExpIterator.hasNext(); ) {
ExpId lExpId = nextExpId();
Set lOperands = lExpId.getOperandSet();
if (! lOperands.retailAll(lModSymsAlias).isEmpty()) {
// Treat the expression corresponding to lExpId as unavailable
// because some operand may be changed by the subtree lSubtree.
}
......
}
....
}
....
}
To see the result of data flow analysis, execute following statement:
flowRoot.dataFlow.showSummary();
The amount of printed result may be large for subprograms with hundreds
of statements.
package coins.flow;
import coins.FlowRoot;
import coins.ir.hir.SubpDefinition;
import java.util.Iterator;
public class
MySubpFlow extends HirSubpFlowImpl implements HirSubpFlow
{
ExpVector fTransparent[];
public MySubpFlow(FlowRoot pFlowRoot, SubpDefinition pSubpDefinition)
{
super(pFlowRoot, pSubpDefinition);
} // MySubpFlow
public void
computeTransparent()
{
ExpVector lEKillAll;
ExpVector lTemp1 = expVector();
ExpVector lTemp2 = expVector();
FlowAnalSymVector lDefined;
int lBBlockNum;
fTransparent = new ExpVector[fBBlockCount + 1]; // Get space
// to record transparent vectors for all basic blocks.
for (Iterator lIterator = cfgIterator();
lIterator.hasNext(); ) { // Repeat for each basic block.
BBlock lBBlock = (BBlock)lIterator.next();
if (lBBlock == null)
continue;
lBBlockNum = lBBlock.getBBlockNumber(); // Get basic block number.
fTransparent[lBBlockNum] = expVector(); // Initiate by zero vector.
lEKillAll = lBBlock.getEKillAll(); // Get the cumulative set of
//expressions killed by some statements in this BBlock.
lEKillAll.vectorNot(lTemp1); // lTemp1 is negation of lEKillAll..
// Get the set of defined variables.
lDefined = (FlowAnalSymVector)lBBlock.getDefined();
lTemp2 = lDefined.flowAnalSymToExpVector(); // Change the set to vector.
lTemp1.vectorSub(lTemp2, fTransparent[lBBlockNum]);
// fTransparent[lBBlockNum] = lTemp1 - lTemp2
if (fDbgLevel > 1) // If trace=Flow.2 or more, print the result.
ioRoot.dbgFlow.print(2, "Transparent B"+lBBlockNum,
fTransparent[lBBlockNum].toStringShort());
}
setComputedFlag(DF_TRSNSPARENT); // Set already-computed flag.
} // computeTransparent
/**
* Get the transparent expression for the basic block pBBlock.
* Expressions are represented by ExpId corresponding to the expression.
* @param pBBlock basic block.
* @return expression vector showing transparent expressions.
*/
public ExpVector
getTransparent( BBlock pBBlock )
{
if (! isComputed(DF_TRSNSPARENT)) // If already computed,
computeTransparent(); // do not re-compute but reuse.
return fTransparent[pBBlock.getBBlockNumber()];
} // getTransparent
} // MySubpFlow
In this example, it is necessary to add
public static final int DF_TRANSPARENT = 26;
as a flag number to SubpFlow.java. To use the subclass for extending
the flow analysis capability, write such coding as
SubpFlow lSubpFlow = new MySubpFlow(flowRoot, subpDefinition);
instead of
SubpFlow lSubpFlow = new HirSubpFlowImpl(flowRoot, subpDefinition);
which is shown in the previous example.
x, y, t, u : variable or register representing an operand.
op : operator.
def(x) : shows that value of x is definitely defined.
mod(x) : shows that value of x is possibly defined.
use(x) : shows that x is used.
p(def(x)) : value of x is (definitely) modified (i.e. via assign)
at program point p.
p(mod(x, y, ...)) : value of x, y, ... are modified at program point p
(modified means possibly changed).
p(use(x)) : x is used at program point p.
or_all(z) : construct a set by applying or-operation
on all components indicated by z.
and_all(z) : construct a set by applying and-operation
on all components indicated by z.
The data flow analyzer will compute following information according to requests:
PDef(B) =
{ p | p(mod(x, y, ...)) is included in B and after that point there is
no p' s.t. p'(def(x)) nor p" s.t. p"(def(y)), ... in B. }
DKill(B) =
{ p | p(def(x)) is not included in B and
p'(def(x)) is included in B. }
PReach(B)=
{ p | there is some path from program point p
that modifies some variables x, y, ... (that is, p(mod(x, y, ...)))
to the entry of B such that there is no p'(def(x)) or no p''(def(y))
or ... on that path. }
PReach(B) = or_all( (PDef(B') | (PReach(B') - DKill(B')))
for all predecessors B' of B)
DDefined(B) =
{ x | x is definitely modified in B. }
PDefined(B) =
{ x | x is posibly modified in B. }
PExposed(B) =
{ x | x is possibly used in B and x is not definitely set in B
before x is used. }
PUsed(B) = {x|x is possibly used in B}
DEGen(B) =
{ op(x,y) | expression op(x,y) is computed in B and after
that point, neither x nor y are possibly set in B. }
Thus, the result of op(x,y) is available after B.
PEKill(B) =
{ op(x,y) | operand x or y is possibly modified in B and the
expression op(x,y) is not re-evaluated after
that definition in B. }
If t = op(x,y) is killed in B,
then op(t,u) should also be killed in B.
DAvailIn(B) =
{ op(x,y) | op(x,y) is computed in every paths to B and
x, y are not modified after the computations
on the paths. }
Thus, the result of op(x,y) can be used without
re-evaluation in B.
DAvailOut(B) =
{ op(x,y) | op(x,y) is computed in every paths to the exit of B and
x, y are not modified after the computations
on the paths. }
Thus, op(x,y) can be used without re-evaluation after B.
Following relations hold.
DAvailIn(B) = and_all(DAvailOut(B') for all predecessors B'
of B) if B is not an entry block;
DAvailIn(B) = { } if B is an entry block.
DAvailOut(B) = DEGen(B) | (DAvailIn(B) - PEKill(B))
PLiveIn(B) =
{ x | x is alive at entry to B, that is, on some path from
entrance point of B to use point of x, x is not definitely set. }
Thus, x in PLiveIn(B) should not be changed until it is used.
PLiveOut(B) =
{ x | x is live at exit from B, that is, there is some
path from B to B' where x is in PExposed(B'). }
Following relations hold.
PLiveOut(B) = or_all(PLiveIn(B') for all successors B' of B
PLiveIn(B) = PExposed(B) | (PLiveOut(B) - DDefined(B))
DDefIn(B) =
{ x | x is always defined at entry to B whichever path
may be taken. }
DDefIn(B) = and_all(DDefOut(B') for all predecessors B' of B)
DDefOut(B) =
{ x | x is always defined at exit from B whichever path
may be taken.}
DDefOut(B) = DDefined(B) | DefIn(B)
ex. find("Def", lBBlock);
(2) get
ex. get("Def", lBBlock);
(3) put
ex. put("Def", lBBlock, lDefVector);
(4) getRaw
ex. getRaw("Def", lBBlock)
In the following code snippet, the Reach vector for the exit BBlock of the
SubpDefinition variable subpDef is going to be stored in the local variable
lReach.
// Establishes the map between the analysis names and the analyzer methods
// that actually do the analysis.
// A key of this map together with the arguments of the associated
// analyzer class methods forms a piece of information that supports the
// automatic analysis mechanism.
// This method will be called only twice during the program life; once
// for HIR and once for LIR.
FlowResults.putRegClasses(new RegisterFlowAnalClasses());
// Instantiate a FlowResults map.
FlowResults lResults = flow.results();
// Instantiate a SubpFlow object, with FlowResults object passed as an
// argument to the factory method.
SubpFlow lSubpFlow = flow.subpFlow(subpDef, lResults);
// Performs control flow analysis.
// Control flow analysis does not support the automatic flow analysis
// mechanism and must be called explicitly.
lSubpFlow.controlFlowAnal();
// Collects some basic information that does not require a complex
// algorithm.
// Some pieces of information obtained here ARE part of the automatic
// analysis picture, but some are not, so I call it explicitly.
lSubpFlow.initiateDataFlow();
// Finds the Reach vector for each of the BBlocks that belong to lSubpFlow.
// There is no need to call lSubpFlow.getExitBBlock().findDef() or
// lSubpFlow.getExitBBlock().findKill() (or lSubpFlow.findReach()) since
// they are called automatically (automatic analysis).
lReach = lResults.get("Reach", lSubpFlow.getExitBBlock());
// OR lReach = lSubpFlow.getExitBBlock().getReach();
...
The following command invokes HIR flow analyzer.
java coins.driver.Driver -S -coins:hirAnal,trace=Flow.2
The trace option is attached to see the result of the flow analysis.
There are 2 levels of analysis:
Also, there are 2 modes of analysis for both AliasAnalLevel1 and AliasAnalLevel2:
-coins:alias=opt
There are fine computation mode and coarse computation mode in the alias
analysis. The fine computation mode will consume much time and memory.
For large subprograms (more than 1000 HIR nodes), the coarse computation mode
is automatically adopted. In the fine computation mode, size of set
showing aliased symbols will be small compared to the coarse computation mode.
RecordAlias lRecordAlias = flowRoot.subpFlow.getRecordAlias();
....
if(lRecordAlias.mayAlias(x, y)) {
// Assume y may be changed when x is changed.
....
}
Set lSetOfVariablesAliased = lRecordAlias.aliasSyms(x);
where, x, y are variables.
#pragma globalReform patternSym copy
#pragma globalReform target main
void copy( char *pa, char *pb, int pn, int pi )
{
iPattern:
{
for (pi = 0; pi < pn; pi++)
*pb++ = *pa++;
}
oPattern:
{
memcpy(pb, pa, pn);
}
}
int main()
{
char x[100] = {1, 2, 3, 4, 5, 6, 7, 8}, y[100];
int j;
char *px = x;
char *py = y;
for (j = 0; j < 100; j++)
*py++ = *px++;
printf(" %d %d %d \n", y[0], y[1], y[2]);
return 0;
}
In this example, the in-pattern is
for (pi = 0; pi < pn; pi++)
*pb++ = *pa++;
and the out-pattern is
memcpy(pb, pa, pn);The description of the global patterns may have some pragmas that start with the keyword "globalReform". The pragma
#pragma globalReform patternSym copyindicates that the subprogram showing the correspondence of above in-pattern and out-pattern is "copy".
#pragma globalReform target mainindicates that above transformation should be applied to the subprogram "main".
for (j = 0; j < 100; j++)
*py++ = *px++;
fits in with the in-pattern and it is transformed to
memcpy(py, px, 100);
where, the parameters pa, pb, pn, pi correspond
to expressions px, py, 100, j respectively and the
parameters contained in the out-pattern are replaced
with these expressions, respectively.#pragma globalReform patternSym pattern1 pattern2 ...where the keyword "globalReform" indicates that this pragma is specified for the global pattern matching, and the keyword "patternSym" indicates that names of subprograms showing pattern correspondence follows in the same line.
#pragma globalReform target subp1 subp2 ...where the keyword "target" indicates that names of subprograms to be transformed follows in the same line.
void patternSymbol( type1 pParam1, type2 pParam2, ... )
{
iPattern:
{
statement1
statement2
....
}
oPattern:
{
statement4
statement5
....
}
}
where, patternSymbol is an identifier showing the name of
the pattern correspondence.
Parameter types (type1, type2, ... ) are usually a type
of expression (Exp or subclass of Exp) and the corresponding
parameter (formal parameter pParam1, pParam2, ...) fits in
with an expression of the type compatible with the parameter type.
A parameter qualified as "const" can match only with a
constant of the specified type.#pragma globalReform stmtParam param1in the scope of the parameter in such a way as
void patternSymbol( type1 pParam1, type2 pParam2, ... )
{
#pragma globalReform stmtParam param1 param2 ...
iPattern:
{
statement1
statement2
....
}
oPattern:
{
statement1
statement2
....
}
}
For the parameter representing a statement (statement parameter),
give void* as its type in the parameter list. The statement
parameter fits in with a statement that is placed at the same
position in the input program as the position where the statement
parameter is located in the in-pattern. As for other matters,
the same rules as above are applied to the statement parameters.
iPattern
{
p * 10;
}
If an in-pattern is an expression then corresponding out-pattern
should also be an expression, e.g.,
oPattern:
{
p * 8 + p * 2;
}
In the global transformation, each one of in-patterns may be considered
as a template to be matched with a part of the input program,
where a parameter in the in-pattern may be considered as a
hole in the template. The template is considered to be fitted in
with a part of the input program if both of them have the same
form except for the hole where any expression/statement may
fall in, that is, in the pattern matching, HIR subtrees having
the same form as some in-pattern are searched where each parameter
in the in-pattern is treated to be a formal parameter to which
an actual parameter of any complexity may correspond as far as
the type of the actual parameter is compatible with the type
of the formal parameter.
#pragma globalReform patternSym absAdd
#pragma globalReform target main
#define BSIZE 256
void absAdd( signed char pd[], int pi, int pm, int psum )
{
iPattern:
{
psum = 0;
for (pi = 0; pi < pm; pi = pi + 1 ) {
if (pd[pi] >= 0)
psum = psum + pd[pi];
else
psum = psum - pd[pi];
}
}
oPattern:
{
psum = absAddChar(pd, pm);
}
}
int main()
{
signed char buf[BSIZE], v;
int i, j;
int sum;
for (j = 0; j < BSIZE; j++) {
buf[j] = 128 - j;
}
sum = 0;
for (i = 0; i < BSIZE; i = i + 1) {
if (buf[i] >= 0)
sum = sum + buf[i];
else
sum = sum - buf[i];
}
printf("sum= %d\n", sum);
return 0;
}
the sequence of statements
sum = 0;
for (i = 0; i < 256; i = i + 1) {
if (buf[i] >= 0)
sum = sum + buf[i];
else
sum = sum - buf[i];
}
can fit in with the in-pattern of the pattern symbol named "absAdd"
and it is changed to the statement calling "absAddChar" function
that does the same computation by using one of the SIMD instructions.
for (i=0; i< 4; i++) {
t[i]=c[3]*m[i][3]+c[2]*m[i][2]
+c[1]*m[i][1]+c[0]*m[i][0];
}
for(j=0; j< 4; j++) c[j]=t[j];
using a temporal variable t and integer variables
i and j.
mov eax,c //edi = &c[0]
mov ebx,m //ebx = &m[0][0]
movq mm0,[eax] //mm0: c[3]: c[2]: c[1]: c[O]
// move quad words to mm0 (64bits)
movq mm1,[ebx] //mm1: m[0][3]:m[0][2]:m[0][1]:m[0][0]
movq mm2,[ebx+8] //mm2: m[1][3]:m[1][2]:m[1][1]:m[1][0]
movq mm3,[ebx+16]//mm3: m[2][3]:m[2][2]:m[2][1]:m[2][0]
movq mm4,[ebx+24]//mm4: m[3][3]:m[3][2]:m[3][1]:m[3][O]
pmaddwd mm1,mm0 //mm1: c[3]*m[0][3]+c[2]*m[0][2]: c[1]*m[0][1]+c[0]*m[0][0]
pmaddwd mm2,mm0 //mm2: c[3]*m[1][3]+c[2]*m[1][2]: c[1]*m[1][1]+c[0]*m[1][0]
pmaddwd mm3,mm0 //mm3: c[3]*m[2][3]+c[2]*m[2][2]: c[1]*m[2][1]+c[0]*m[2][0]
pmaddwd mm4,mm0 //mm4: c[3]*m[3][3]+c[2]*m[3][2]: c[1]*m[3][1]+c[0]*m[3][0]
packssdw mm1,mm2 //mm1: c[3]*m[1][3]+c[2]*m[1][2]: c[1]*m[1][1]+c[0]*m[1][0]
// : c[3]*m[0][3]+c[2]*m[0][2]: c[1]*m[0][1]+c[0]*m[0][0]
// 4 packed words in mm1, mm2 to 4 packed words in mm1
// with saturation operation.
movq [temp1],mm1 // short temp1[4];
packssdw mm3,mm4 //mm3: c[3]*m[3][3]+c[2]*m[3][2]: c[1]*m[3][1]+c[0]*m[3][0]
// : c[3]*m[2][3]+c[2]*m[2][2]: c[1]*m[2][1]+c[0]*m[2][0]
movq[temp2],mm3 // short temp2[4];
emms // empty MMX state so that FPU reg can be used for floating op.
)
c[0]=temp1[0]+temp1[1]; // Move the results from temp1 and temp2.
c[1]=temp1[2]+temp1[3];
c[2]=temp2[0]+temp2[1];
c[3]=temp2[2]+temp2[3];
In the MMX instruction coding, the saturation
operation is executed but in the above C language
coding, the saturation operation is not yet
considered.
#pragma globalReform patternSym linearTrans
#pragma globalReform target main
int printf(char*, ...);
void linearTrans(short pc[4], short pm[4][4], short pt[4], int pi, int pj)
{
short temp1[4], temp2[4];
iPattern:
{
for (pi=0;pi< 4;pi++) {
pt[pi]=pc[3]*pm[pi][3]+pc[2]*pm[pi][2]+pc[1]*pm[pi][1]+pc[0]*pm[pi][0];
}
for(pj=0;pj< 4;pj++) pc[pj]=pt[pj];
}
oPattern:
{
asm (
"#param %I32,%I32,%I32,%I32\n"
"#clobber %mm0,%mm1,%mm2,%mm3,%mm4\n"
"movq (%1),%mm0\n"
"movq (%2),%mm1\n"
"movq 8(%2),%mm2\n"
"movq 16(%2),%mm3\n"
"movq 24(%2),%mm4\n"
"pmaddwd %mm0,%mm1\n"
"pmaddwd %mm0,%mm2\n"
"pmaddwd %mm0,%mm3\n"
"pmaddwd %mm0,%mm4\n"
"packssdw %mm2,%mm1\n"
"movq %mm1,(%3)\n"
"packssdw %mm4,%mm3\n"
"movq %mm3,(%4)\n"
"emms\n",
pc, pm, temp1, temp2
);
pc[0]=temp1[0]+temp1[1];
pc[1]=temp1[2]+temp1[3];
pc[2]=temp2[0]+temp2[1];
pc[3]=temp2[2]+temp2[3];
}
} // linearTrans
int main()
{
int i;
short tt[4][4] = {{1, 0, 0, 0}, {0, 1, 0, 1}, {0, 0, 1, 0}, {0, 0, 0, 1}};
short xyz[4] = {10, 12, 13, 3};
short tmp[4] = {0};
printf("before %d %d %d %d \n", xyz[0], xyz[1], xyz[2], xyz[3]);
for (i=0;i< 4;i++) {
tmp[i]=xyz[3]*tt[i][3]+xyz[2]*tt[i][2]+xyz[1]*tt[i][1]+xyz[0]*tt[i][0];
}
for(i=0;i< 4;i++) xyz[i]=tmp[i];
printf("linTrans %d %d %d %d \n", xyz[0], xyz[1], xyz[2], xyz[3]);
return 0;
}
The function definition linearTrans defines the
C coding pattern as the block labeled by iPattern
and the corresponding MMX coding sequence as the
block labeled by oPattern.
asm("#param descriptor-list\n"
"#clobber destroyed registers...\n"
"instruction 1\n"
...
"instruction n\n",
input arguments(any expression)...);
where,
for (i=0;i< 4;i++) {
tmp[i]=xyz[3]*tt[i][3]+xyz[2]*tt[i][2]+xyz[1]*tt[i][1]+xyz[0]*tt[i][0];
}
for(i=0;i< 4;i++) xyz[i]=tmp[i];
in the main program matches with this pattern.
with globalReform without globalReform
real 8.712s 17.389s
user 8.624s 17.331s
sys 0.015s 0.031s
This shows the effectiveness of MMX code generation
by using globalReform optimization.
#pragma globalReform patternSym recmult
#pragma globalReform target fact
int recmult( int px )
{
iPattern:
{
if (px <= 1)
return 1;
else
return px * recmult(px - 1);
}
oPattern:
{
int lx, i;
lx = 1;
for (i = 1; i <= px; i++) {
lx = lx * i;
}
return lx;
}
}
int fact( int p )
{
if (p <= 1)
return 1;
else
return p * fact(p - 1);
}
Note that this is not a general transformation of recursion to loop
but a transformation of program fragments that matches with the
given in-pattern.
#pragma globalReform patternSym extractPower
#pragma globalReform nonterminal power
#pragma globalReform target main
int _bnfOr(int p, ...);
double power( double p1 );
double transformPower( double p2 );
void extractPower( double pv1 )
{
iPattern:
{
power(pv1) * pv1;
}
oPattern:
{
transformPower(power(pv1) * pv1);
}
}
double power( double pv2)
{
_bnfOr(2,
pv2 * pv2,
power(pv2) * pv2 );
}
double a = 2.0, b = 3.0, c = 4.0;
int main()
{
double x, y, z;
x = a * a * a;
y = b * b * b * b;
z = a * a * b;
printf(" %f %f %f \n", x, y, z);
return 0;
}
The above example extracts power expressions that multiply the same
variable several times and call the function transformPower.
(This example is made to explain the usage of pattern nonterminals
and is not intended to increase execution speed.)
The function power is a pattern nonterminal that is similar to BNF
nonterminal. In BNF, a power expression of variable v may be
defined as
powerExp ::= powerExp "*" var | varbut if the symbol var is a nonterminal representing variables, then it is not possible to restrict its operand to the same variable. If nonterminals may have parameters, then such restriction can be specified.
double power( double pv2) {
_bnfOr(2, pv2*pv2, power(pv2)*pv2);
}
pv2 is a formal parameter of the pattern nonterminal named "power".
_bnfOr(2, pv2*pv2, power(pv2)*pv2 );
represents to select either
int _bnfOr(int, ...);
The first parameter of _bnfOr is a dummy one attached
to make _bnfOr as a C function having indefinite number of parameters.
The search of production is done from left to right and
the one fitted first is selected. If there is no production fitted,
then the nonterminal is treated to be not fitted with the input program.
a * a * a
is given as an expression of input program, then
power(pv2) * pv2
is selected as the production to be matched because the other production
pv2*pv2 does not fit in with a*a*a.
In the next step, trial matching of power(pv2) with a*a is performed
after tentatively setting pv2 to the variable "a". This trial succeeds
by selecting the production pv2*pv2.
Thus, the in-pattern fits in with the input expression a*a*a.
b * b * b * b
is given as an input expression, at the first trial,
power(pv2) * pv2
is selected matching pv2 to b and trying to match power(pv2) with
b*b*b peeling off the trailing "*b". This trial matching succeeds
and the in-pattern fits in with the expression b*b*b*b, too.
a * a * b
is given as an input expression, power(pv2) does not fits in
with it, because, at the first trial,
power(pv2) * pv2
is selected matching pv2 to b, then in the second trial,
power(pv2) fails to match with a*a
as the power(pv2) requests the variable b as a multiplicand.
#pragma globalReform patternSym loopUnroll
#pragma globalReform nonterminal subsVar expS termS factS iExp
#pragma globalReform target main
#define BSIZE 999
int expS( int pzz[], int pi);
int termS( int pzz[], int pi);
int factS( int pzz[], int pi);
int subsVar( int pzz[], int pi);
int iExp( int px);
int printf(char*, ...);
void loopUnroll( int pzz[], int pi, int pFrom, int pTo)
{
iPattern:
{
for (pi = pFrom; pi < pTo; pi++) {
pzz[iExp(pi)] = expS(pzz, pi);
}
}
oPattern:
{
for (pi = pFrom; pi < pTo-1; pi=pi+2) {
pzz[iExp(pi)] = expS(pzz, pi);
pzz[iExp(pi+1)] = expS(pzz, pi+1);
}
if ((pTo-pFrom) % 2 != 0)
pzz[pTo-1] = expS(pzz, pTo-1);
}
}
int subsVar( int pzz1[], int pi1)
{
pzz1[iExp(pi1)];
}
int expS( int pzz2[], int pi2)
{
#pragma globalReform transparentFitting pc (pzz2, pi2)
int pc;
_bnfOr(1, expS(pzz2,iExp(pi2))+termS(pzz2,iExp(pi2)),
expS(pzz2,iExp(pi2))-termS(pzz2,iExp(pi2)),
expS(pzz2,iExp(pi2))+pc,
expS(pzz2,iExp(pi2))-pc,
expS(pzz2,iExp(pi2)) );
}
int termS( int pzz3[], int pi3 )
{
#pragma globalReform transparentFitting pc2 (pzz3, pi3)
int pc2;
_bnfOr(1, termS(pzz3,iExp(pi3))*factS(pzz3,iExp(pi3)),
termS(pzz3,iExp(pi3))/factS(pzz3,iExp(pi3)),
termS(pzz3,iExp(pi3))*pc2,
termS(pzz3,iExp(pi3))/pc2,
factS(pzz3,iExp(pi3)) );
}
int factS( int pzz4[], int pi4 )
{
#pragma globalReform transparentFitting pc3 (pzz4, pi4)
int pc3;
_bnfOr(2, pc3*subsVar(pzz4,iExp(pi4)),
pc3/subsVar(pzz4,iExp(pi4)),
subsVar(pzz4,iExp(pi4)));
}
int iExp( int px )
{
#pragma globalReform transparentFitting pc4 (px)
int pc4;
_bnfOr(3, px+pc4, px-pc4, px);
}
int main() {
int zz[BSIZE]; int i, n;
n = BSIZE;
for (i = 0; i < n; i++) {
zz[i] = i;
}
for (i = 0; i < n; i++) {
zz[i] = zz[i]*2;
}
printf(" %d %d %d %d %d\n", n, zz[0], zz[1], zz[n-2], zz[n-1]);
return 0;
}
The pattern named loopUnroll changes loops such as
for (i = 0; i < n; i++) {
zz[i] = zz[i]*2;
}
to such a form as
for (i = 0; i < n-1; i=i+2) {
zz[i] = zz[i]*2;
zz[i+1] = zz[i+1]*2;
}
if ((n-0) % 2 != 0)
zz[n-1] = zz[n-1]*2;
by unrolling the loops. The factors of the loop body may be
any subscripted variable having i+c2 or i-c2 as its subscript,
where c1 and c2 may be any integer expression that does not contain i.
Pattern nonterminals are used to specify such syntax. The pattern
nonterminal
int iExp( int px ) {
#pragma globalReform transparentFitting pc4 (px)
int pc4;
_bnfOr(3, px + pc4, px - pc4, px );
}
fits in with one of expressions px+pc4, px-pc4, px
where the expression pc4 does not contain the variable
px which is conveyed from the upper construct.
int subsVar( int pzz1[], int pi1) {
pzz1[iExp(pi1)];
}
fits in with any subscripted variable whose subscript
is an integer expression satisfying the restriction of
iExp(pi1), where pzz1 is an array conveyed from the upper
construct, and pi1 is an integer variable conveyed from
the upper construct.
expS ::= expS "+" termS
| expS "-" termS
| expS "+" pc
| expS "-" pc
| termS
termS ::= termS "*" factS
| termS "/" factS
| termS "*" pc2
| termS "/" pc2
| factS
factS ::= pc3 "*" subsVar
| pc3 "/" subsVar
| subsVar
The pattern nonterminal
int factS( int pzz4[], int pi4 ) {
#pragma globalReform transparentFitting pc3 (pzz4, pi4)
int pc3;
_bnfOr(2,
pc3 * subsVar(pzz4, i(pi4)),
pc3 / subsVar(pzz4, iExp(pi4)),
subsVar(pzz4, iExp(pi4)) );
}
fits in with a subscripted variable as described above or
a multiplication/division expression whose first operand
pc3 does not contain any of pzz4 and pi4. Variables
pzz4, pi4 are conveyed from the upper construct.
int termS( int pzz3[], int pi3 ) {
#pragma globalReform transparentFitting pc2 (pzz3, pi3)
int pc2;
_bnfOr(1,
termS(pzz3, iExp(pi3)) * factS(pzz3, iExp(pi3)),
termS(pzz3, iExp(pi3)) / factS(pzz3, iExp(pi3)),
termS(pzz3, iExp(pi3)) * pc2,
termS(pzz3, iExp(pi3)) / pc2,
factS(pzz3, iExp(pi3)) );
}
fits in with any subscripted expression specified by
factS(pzz3, iExp(pi3)) when the top
operator of the expression is neither '*' nor '/'. If the top
operator of the expression is either '*' or '/', then the pattern
nonterminal fits in with the expression when its first operand
fits in with termS(pzz3, iExp(pi3))
and its second operand is an expression specified by
subsVar(pzz3, iExp(pi3)) or pc2. The variables pzz3, pi3 are
conveyed from the upper construct.
int expS( int pzz2[], int pi2) {
#pragma globalReform transparentFitting pc (pzz2, pi2)
int pc;
_bnfOr(1,
expS(pzz2, iExp(pi2)) + termS(pzz2, iExp(pi2)),
expS(pzz2, iExp(pi2)) - termS(pzz2, iExp(pi2)),
expS(pzz2, iExp(pi2)) + pc,
expS(pzz2, iExp(pi2)) - pc,
termS(pzz2, iExp(pi2)) );
}
matches with any subscripted variable expression specified by
for (i = 0; i < n; i++) {
zz[i] = zz[i]*2;
}
to such form as
for (i = 0; i < n-1; i=i+2) {
zz[i] = zz[i]*2;
zz[i+1] = zz[i+1]*2;
}
if ((n-0) % 2 != 0)
zz[n-1] = zz[n-1]*2;
COINS has the optimization module that is dedicated to loop unrolling.
It is implemented in about 2000 lines of coding and can be applied
to wide variety of loops. The above global pattern matching
example cannot replace the dedicated loop unrolling module but
shows another
implementation of loop unrolling transformation in less than 100
lines of coding applicable only for typical coding patterns.