MSCS (VU).: CS704 - Assignments

Newer 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 Older

CS704 – Advanced Computer Architecture-II

Due Date: 1^st June, 2015

Assignment 1

Instructions to Solve Assignments

The purpose of assignments is to give you hands on practice. It is expected that students will solve the assignments themselves. Following rules will apply during the evaluation of assignment.

· Cheating from any source will result in zero marks in the assignment.

· Any student found cheating in any two of the assignments submitted will be awarded "F" grade in the course.

· No assignment after due date will be accepted.

Question 1: Total Points (10)

The 16-bit Zilog Z8001 has the following general instruction format:

15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
Mode		Opcode					w/b	Operand2				Operand1

The mode field specifies how to locate the operands from the operand fields. The w/b filed is used in certain instructions to specify whether the operands are byte or 16-bit words. The operand 1 field may (depending on the mode field contents) specify one of 16 general-purpose registers. The operand 2 field may specify any general-purpose registers except register 0. When the operand 2 field is all zeros, each of the original opcodes takes on a new meaning.

(a) How many opcodes are provided on the Z8001?

(b)Suggest an efficient way to provided more opcodes and indicate the trade-off involved.

Solution:

Let us analyze the problem a bit:

The Opcode field of the instruction has five bits available i.e. from bit 9 to 13. Hence this amounts to a total of 2⁵ = 32 Opcodes.
As stated in the question that when operand 2 field is all zeros, each of original Opcodes takes on new meaning. This means 32 additional Opcodes if the value of operand 2 is all zero.
Mode field consists of two bytes hence 2² = 4 different modes are possible. Theoretically speaking for each combination of mode field there is a whole class of Opcodes available.
w/b field consists of one bit so only two possible values exist. Hence in this case also for each possible value of w/b field, there is a whole class of Opcodes available.

Hence there comes a total of (32 + 32)*4*2 = 512 possible Opcodes. These Opcodes result from different combinations of mode field, w/b field and operand 2 field.

In order to provide room for additional Opcodes, we must sacrifice some functionality of the system. We can’t reduce the field size of mode field as it will severely reduce the functionality of the system and the total Opcodes will also remain the same i.e. (64 + 64) *2 * 2 = 512. Also the w/b field can’t be reduced further as it is already 1 bit of size. We also can’t reduce the size of operand fields because they have to access sixteen general purpose registers that is possible only through 4 bit value. Hence some other strategy must be considered.

As we have seen that operand 2 can’t access general purpose register 0 this is because if it has to access register 0 it will place its address as 0000 but this all zero code actually changes the meaning of original Opcodes. We can apply the same technique to operand 1 also. By making it not to use general purpose register 15 or R15 which is addressed as 1111 we can assign new meanings to all original Opcodes.

Hence by making the operand 1 field not use R15 and saying that if this field contains all 1s then all original Opcodes will have different meanings we can increase the number of Opcodes to (32 + 32 + 32)*4*2 = 768. So we have increased the number of Opcodes by 768 – 512 by 256 after sacrificing R15 from operand 1 field.

Question 2: Total Points (15)

A computer architect is designing a hardware datapath implementation and the architect has determined following circuit element delays.

Instruction Memory	150 ps
Decode	70 ps
Register Fetch	60 ps
ALU	150 ps
Data Memory	200 ps
Register Write Back	60 ps

(a) What is the length of a clock cycle for a single cycle datapath implementation?

(b)What would be the frequency of a processor, corresponding to single datapath implementation?

(c) What would be the length of fastest clock cycle for a 5-stage pipeline datapath? What would be the corresponding processor frequency?

(d)How much faster is the 5-stage pipelined datapath compared to the single cycle datapath implementation?

Solution

(A) What is the length of a clock cycle for a single cycle datapath implementation?

Solution:

Time(lw) = Time(IF) + Time(ID+Reg.File) + Time(ALU) + Time(MemRead) + Time(Reg.FileWrite)

Time(lw) = 150 + (70+60) + 200 + 60

Time(lw) = 690ps

(B) What would be the frequency of a processor, corresponding to single datapath implementation?

Solution:

As we know

Frequency =

where c is the maximum clock cycle.

The value of clock cycle for single cycle is 1. It means that frequency of processor corresponding to single cycle datapath is 1 . Frequency can be defined as

Frequency = 1/clock rate

=1/690* 10^^-12*10^{^9}=1/690*10^{^}-3=1000/690= 1.44 9GHz=1.45GHz

(C) What would be the length of fastest clock cycle for a 5-stage pipeline datapath? What would be the corresponding processor frequency?

Solution:

Fastest clock cycle is that whose latency is minimum i.e 60 ps. but when we calculate the frequency, we have to consider the slowest cycle length in multi-cycle datapath that is 200 ps.

Frequency =

(D) How much faster is the 5-stage pipelined datapath compared to the single cycle datapath implementation?

Solution:

Single cycle execution time= 690 ps

Execution time for multi cycle = 200ps

So 690/200ps=3.45 times faster

It means that 5-stage pipelined datapath is 3.45 time faster than single cycle datapath.

If we analyze this according to frequency point of view then

5GHz/1.45GHz=3.45 times faster

Question 3: Total Points (10)

Consider the following mathematical expressions:

W= B << 3

X = 7B + B + C+ D

For each expression, write an assembly language code:

(a) For Reg-Reg architecture

(b)For Reg-Mem architecture

Solution

W= B << 3
LOAD R2, B Shl [R2], 3 STORE R2, W	Shl [B], 3 STORE B, W OR Shl [R2] , 3 STORE R3,W
X = 7B + B + C+ D
LOAD R2, B LOAD R1, C LOAD R3, D ADD R4, R3, R1 ADD R5, R4, R2 MUL R2, R2, 7 ADD R4, R5,R2 STORE R4, X	LOAD R2, B ADD R1, R2, C ADD R3, R1, D MUL R4, R2, 7 ADD R4, R4, R3 STORE R4 , X

Question 4: Total Points (15)

Read the research paper titled “LLVA: A Low-level Virtual Instruction Set Architecture”, and answer the following question.

What is the basic idea of LLVA and how it is better than previous virtual architectures (DAISY and Transmeta’s Crusoe)?

Solution:

This paper is actually proposed a strategy for implementing virtual ISA (LLVA) on arbitrary hardware. It is basically a layer resides over actual ISA. This layer helps and proved flexibility for processor designer. Virtual instruction set is the same as LLVA is to decouple the program representation from actual hardware implementation. This is same as in high level programming language command is to computer and compiler is used to translate so that the microprocessor understand it. Similarly Virtual ISA can include rich commands and information which can’t be implemented (or is not suitable for direct implementation) on hardware level. It means working of Virtual ISA is not possible without translator just like a high level language without compiler or interpreter. In order to avoid huge and complex changing in design of microprocessor due to implement some functionality at hardware level and introduce OP Codes for them, the researchers take interest in virtual ISA use and development. The computers that implement Virtual ISA are called Virtual Instruction Set Computers.

Components of LLVA

Following are the main components of instruction set:

· Register Set and Memory Model

Infinite, typed, register file is used by LLVA where all registers are in Static Single Assignment (SSA) form. Register can hold scalar values i.e. Boolean, integer, float and pointer. There is partition of memory into stack, heap, and global memory and it allocated explicitly. LLVA is a load/store architecture.

· LLVA instruction set

There are only 28 instructions in a small, orthogonal instruction set. During translation process multiple instructions of LLVA are combined into more complex physical ISA instructions wherever possible. LLVA provides low level operation in machine independent form that can be used to implement high level language feature, so LLVA stands for Low Level Virtual instruction set Architecture. Hence it is virtual and low level as well.

· Global Data-flow (SSA) & Control Flow Information

It uses SSA form as primary representation for scalar register values. This feature enables efficient dynamic translation. SSA form is widely used for compiler optimizations. An explicit instruction phi is used by LLVA in order to merge values at control flow join points and to show SSA information directly in the code. Another critical feature of LLVA is exposing an explicit Control flow Graph (CFG).

LLVA Type System

The instruction set of LLVA is fully typed and source language independent type system. This is simple system with primitive types in predefined size (ubyte, uint, float, double, etc...) and derived types with predefined sizes and 4 derived types (pointer, array, structure, and

function).

The main purpose of this system is to enable typed memory access. The getelementptr instruction is used to achieve this via type-safe pointer arithmetic.

How LLVA is better than previous virtual architectures (DAISY and Transmeta’s Crusoe)?

1. Many research works has been done for virtual ISA. DAISY and Crusoe are example of ISAs. The experiences show the major problem of DAISY and Crusoe was that it used the existing ISA of hardware as virtual ISA. So the translation and optimization became complicated but this problem is not with LLVA. In this research paper new translation strategy has proposed to optimizes the translation process and reduces complications.

Previous experience with DAISY and Crusoe show three difficult features to follow in traditional hardware interfaces: 1) load/store dependences. 2) precise exceptions 3) self-modifying code but The LLVA V-ISA simplifies all this problems.

2. There are some deficiencies in DAISY and Crusoe and they depended on OS level design. Hence things became more complicated.

On the base of three main improvements we say LLVA is better than DAISY and Crusoe:

a. The design of LLVA has sufficient features to support sophisticated compiler analysis. LLVA provides low level operation in machine independent form that can be used to implement high level language feature and match with underlying hardware ISA.

b. If we compare exception handling than LLVA is better than was in DAISY and Crusoe.

c. Offline translation and offline catching of native code is possible in LLVA’s translation strategy but not possible with DAISY and Crusoe. Another benefit of LLVA is that it can access external system resources by using OS independent interface. The goal of the translation strategy of LLVA ISA to hardware ISA is to minimize the need for online translation and to exploit the novel optimization capabilities enabled by a rich, persistent code representation.

3. To overcome the shortcoming in Daisy and Crusoe, the LLVA comes out with a Virtual ISA . The author worked to proposed LLVA an ISA which has capability of easy mapping with actual physical ISA due to low enough and it is fault tolerant and has excellent translation strategy.

Download: Click here