Newer 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 Older
CS704 – Advanced
Computer Architecture-II
Due Date:
1st June, 2015
Assignment 1
Instructions to Solve
Assignments
The
purpose of assignments is to give you hands on practice. It is expected that
students will solve the assignments themselves. Following rules will apply
during the evaluation of assignment.
·
Cheating from any source will result in zero marks in the
assignment.
·
Any student found cheating in any two of the assignments
submitted will be awarded "F" grade in the course.
·
No assignment after due date will be accepted.
Question 1: Total Points
(10)
The 16-bit Zilog Z8001 has the following
general instruction format:
15
|
14
|
13
|
12
|
11
|
10
|
9
|
8
|
7
|
6
|
5
|
4
|
3
|
2
|
1
|
0
|
Mode
|
Opcode
|
w/b
|
Operand2
|
Operand1
|
The mode
field specifies how to locate the operands from the operand fields. The w/b
filed is used in certain instructions to specify whether the operands are byte
or 16-bit words. The operand 1 field
may (depending on the mode field
contents) specify one of 16 general-purpose registers. The operand 2 field may specify any general-purpose registers except
register 0. When the operand 2 field
is all zeros, each of the original opcodes takes on a new meaning.
(a) How many opcodes are provided on the Z8001?
(b)Suggest
an efficient way to provided more opcodes and indicate the trade-off involved.
Solution:
- Let us analyze the problem a bit:
- The Opcode field of the instruction has five bits available i.e. from bit 9 to 13. Hence this amounts to a total of 25 = 32 Opcodes.
- As stated in the question that when operand 2 field is all zeros, each of original Opcodes takes on new meaning. This means 32 additional Opcodes if the value of operand 2 is all zero.
- Mode field consists of two bytes hence 22 = 4 different modes are possible. Theoretically speaking for each combination of mode field there is a whole class of Opcodes available.
- w/b field consists of one bit so only two possible values exist. Hence in this case also for each possible value of w/b field, there is a whole class of Opcodes available.
Hence there comes a total of (32 + 32)*4*2 = 512
possible Opcodes. These Opcodes result from different combinations of mode
field, w/b field and operand 2 field.
- In order to provide room for additional Opcodes, we must sacrifice some functionality of the system. We can’t reduce the field size of mode field as it will severely reduce the functionality of the system and the total Opcodes will also remain the same i.e. (64 + 64) *2 * 2 = 512. Also the w/b field can’t be reduced further as it is already 1 bit of size. We also can’t reduce the size of operand fields because they have to access sixteen general purpose registers that is possible only through 4 bit value. Hence some other strategy must be considered.
As we have seen that operand 2 can’t access
general purpose register 0 this is because if it has to access register 0 it
will place its address as 0000 but this all zero code actually changes the
meaning of original Opcodes. We can apply the same technique to operand 1 also.
By making it not to use general purpose register 15 or R15 which is addressed
as 1111 we can assign new meanings to all original Opcodes.
Hence by making the operand 1 field not use R15
and saying that if this field contains all 1s then all original Opcodes will
have different meanings we can increase the number of Opcodes to (32 + 32 +
32)*4*2 = 768. So we have increased the number of Opcodes by 768 – 512 by 256
after sacrificing R15 from operand 1 field.
Question 2: Total Points (15)
A computer architect is
designing a hardware datapath implementation and the architect has determined
following circuit element delays.
Instruction Memory
|
150 ps
|
Decode
|
70 ps
|
Register Fetch
|
60 ps
|
ALU
|
150 ps
|
Data Memory
|
200 ps
|
Register Write Back
|
60 ps
|
(a) What is the length of a clock cycle for a single cycle datapath
implementation?
(b)What
would be the frequency of a processor, corresponding to single datapath
implementation?
(c) What would be the length of fastest clock cycle for a
5-stage pipeline datapath? What would be the corresponding processor frequency?
(d)How
much faster is the 5-stage pipelined datapath compared to the single cycle
datapath implementation?
Solution
(A) What is the length
of a clock cycle for a single cycle datapath implementation?
Solution:
Time(lw) = Time(IF)
+ Time(ID+Reg.File) +
Time(ALU) +
Time(MemRead) + Time(Reg.FileWrite)
Time(lw)
= 150 + (70+60) +
200 + 60
Time(lw)
= 690ps
(B) What would be the frequency of a processor,
corresponding to single datapath implementation?
Solution:
As
we know
Frequency = where c is the maximum clock cycle.
The value of clock cycle for single
cycle is 1. It means that frequency of processor corresponding to single cycle
datapath is 1 . Frequency can be defined as
Frequency = 1/clock rate
=1/690* 10^-12*10^9 =1/690*10^
-3=1000/690= 1.44 9GHz=1.45GHz
(C) What would be the length of fastest clock
cycle for a 5-stage pipeline datapath? What would be the corresponding
processor frequency?
Solution:
Fastest clock cycle is that whose latency is minimum i.e 60 ps. but
when we calculate the frequency, we have to consider the slowest cycle length
in multi-cycle datapath that is 200 ps.
Frequency =
So
Frequency
=
(D) How much faster is
the 5-stage pipelined datapath compared to the single cycle datapath
implementation?
Solution:
Single
cycle execution time= 690 ps
Execution
time for multi cycle = 200ps
So
690/200ps=3.45 times faster
It
means that 5-stage pipelined datapath is 3.45 time faster than single cycle
datapath.
If
we analyze this according to frequency point of view then
5GHz/1.45GHz=3.45
times faster
Question 3: Total Points
(10)
Consider
the following mathematical expressions:
W= B
<< 3
X = 7B +
B + C+ D
For
each expression, write an assembly language code:
(a) For Reg-Reg
architecture
(b)For Reg-Mem architecture
Solution
W= B
<< 3
|
|
LOAD R2, B
Shl [R2], 3
STORE R2, W
|
Shl [B], 3
STORE B, W
OR
Shl [R2] , 3
STORE R3,W
|
X = 7B + B + C+ D
|
|
LOAD
R2, B
LOAD
R1, C
LOAD
R3, D
ADD
R4, R3, R1
ADD
R5, R4, R2
MUL
R2, R2, 7
ADD
R4, R5,R2
STORE
R4, X
|
LOAD
R2, B
ADD
R1, R2, C
ADD
R3, R1, D
MUL
R4, R2, 7
ADD
R4, R4, R3
STORE
R4 , X
|
Question 4:
Total Points (15)
Read the research paper
titled “LLVA: A Low-level Virtual
Instruction Set Architecture”, and answer the following question.
What is the basic idea of
LLVA and how it is better than previous virtual architectures (DAISY and
Transmeta’s Crusoe)?
Solution:
Solution:
This paper is actually
proposed a strategy for implementing virtual ISA (LLVA) on arbitrary hardware.
It is basically a layer resides over actual ISA. This layer helps and proved
flexibility for processor designer. Virtual
instruction set is the same as LLVA is to decouple the program representation
from actual hardware implementation. This is same as in high level programming
language command is to computer and compiler is used to translate so that the
microprocessor understand it. Similarly Virtual ISA can include rich commands
and information which can’t be implemented (or is not suitable for direct
implementation) on hardware level. It means working of Virtual ISA is not
possible without translator just like a high level language without compiler or
interpreter. In order to avoid huge and complex changing in design of
microprocessor due to implement some functionality at hardware level and
introduce OP Codes for them, the researchers take interest in virtual ISA use
and development. The computers that
implement Virtual ISA are called Virtual Instruction Set Computers.
Components of LLVA
Following are the main
components of instruction set:
·
Register
Set and Memory Model
Infinite, typed,
register file is used by LLVA where all registers are in Static Single Assignment
(SSA) form. Register can hold scalar values i.e. Boolean, integer, float and
pointer. There is partition of memory into stack, heap, and global memory and
it allocated explicitly. LLVA is a load/store architecture.
·
LLVA
instruction set
There are only 28
instructions in a small, orthogonal instruction set. During translation process
multiple instructions of LLVA are combined into more complex physical ISA
instructions wherever possible. LLVA provides low level operation in machine
independent form that can be used to implement high level language feature, so
LLVA stands for Low Level Virtual instruction set Architecture. Hence it is
virtual and low level as well.
·
Global
Data-flow (SSA) & Control Flow Information
It uses SSA form as
primary representation for scalar register values. This feature enables
efficient dynamic translation. SSA form is widely used for compiler
optimizations. An explicit instruction phi is used by LLVA in order to merge
values at control flow join points and to show SSA information directly in the
code. Another critical feature of LLVA
is exposing an explicit Control flow Graph (CFG).
LLVA Type System
The instruction set of
LLVA is fully typed and source language independent type system. This is simple
system with primitive types in predefined size (ubyte, uint, float, double,
etc...) and derived types with predefined sizes
and 4 derived types (pointer, array, structure, and
function).
The main purpose of this
system is to enable typed memory access. The getelementptr instruction is used to achieve this via type-safe
pointer arithmetic.
How LLVA
is better than previous virtual architectures (DAISY and Transmeta’s Crusoe)?
1. Many
research works has been done for virtual ISA. DAISY and Crusoe are example of
ISAs. The experiences show the major
problem of DAISY and Crusoe was that it used the existing ISA of hardware as
virtual ISA. So the translation and optimization became complicated but this problem
is not with LLVA. In this research paper new translation strategy has proposed
to optimizes the translation process and reduces complications.
Previous experience with
DAISY and Crusoe show three difficult features to follow in traditional hardware
interfaces: 1) load/store dependences. 2) precise exceptions 3) self-modifying
code but The LLVA V-ISA simplifies all
this problems.
2. There
are some deficiencies in DAISY and Crusoe and they depended on OS level design.
Hence things became more complicated.
On the base of three
main improvements we say LLVA is better than DAISY and Crusoe:
a. The
design of LLVA has sufficient features to support sophisticated compiler
analysis. LLVA provides low level operation in machine independent form that
can be used to implement high level language feature and match with underlying
hardware ISA.
b. If
we compare exception handling than LLVA is better than was in DAISY and Crusoe.
c. Offline
translation and offline catching of native code is possible in LLVA’s translation
strategy but not possible with DAISY and Crusoe. Another benefit of LLVA is that it can access
external system resources by using OS independent interface. The goal of the
translation strategy of LLVA ISA to hardware ISA is to minimize the need for
online translation and to exploit the novel optimization capabilities enabled
by a rich, persistent code representation.
3. To
overcome the shortcoming in Daisy and Crusoe, the LLVA comes out with a Virtual
ISA . The author worked to proposed LLVA an ISA which has capability of easy
mapping with actual physical ISA due to low
enough and it is fault tolerant and has excellent translation strategy.
Download: Click here