April 27, 1999
Assembly language lets us define the data environment our program works in. Memory has to be addressed, and the assembler has to associate the symbolic names of variables with the specific data being stored in and read from memory. It needs to know how big to make each variable, what type of data it holds, how to access it, and what value (if any) to initialize it with. The data segment is the collection of these variables in memory, and it begins at an origin, or starting address. That origin, paired with a second value called the offset (the distance in bytes from the origin), gives the address of each individual piece of data. The pair acts as a kind of pointer, and the system is constantly doing this arithmetic to locate data, in much the same way that the code segment is addressed, as we discussed in chapter 1. In the assembler, this data segment has several attributes. First, the data sizes:
1. The byte, containing 8 bits, where each consecutive address in this definition occurs in increments of 1. The byte is the basic building block of memory above the bit level.
My_byte byte 34H or 00110100 or 52 decimal
2. The word, containing 16 bits, which is twice the size, and where each consecutive address occurs in increments of 2.
My_word word 02F5H or 00000010 11110101 or 757 decimal
3. The double word, containing 32 bits, twice the size again, where each consecutive address occurs in increments of 4. It is used mainly in programs assembled for chips newer than the 8088, especially the 80386 and later, which have 32-bit registers.
4. The nibble, half of a byte, which we will discuss later
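To see the three sizes side by side, here is a small C sketch that mirrors the declarations above. It assumes BYTE, WORD and DWORD correspond to C's uint8_t, uint16_t and uint32_t; the array and function names are made up for illustration. Measuring the distance between neighboring array elements shows the increments of 1, 2 and 4.

```c
#include <stdint.h>
#include <stddef.h>

/* C stand-ins for the example declarations (assumption: BYTE = uint8_t,
   WORD = uint16_t, DWORD = uint32_t). */
static const uint8_t  my_byte = 0x34;    /* My_byte byte 34H  */
static const uint16_t my_word = 0x02F5;  /* My_word word 02F5H */

/* Two-element arrays used only to measure the address step. */
static uint8_t  bytes[2];
static uint16_t words[2];
static uint32_t dwords[2];

/* Distance in bytes from element 0 to element 1 of each array. */
size_t byte_step(void)  { return (size_t)((char *)&bytes[1]  - (char *)&bytes[0]);  }
size_t word_step(void)  { return (size_t)((char *)&words[1]  - (char *)&words[0]);  }
size_t dword_step(void) { return (size_t)((char *)&dwords[1] - (char *)&dwords[0]); }
```

Calling word_step() returns 2, and my_byte >> 4 and my_byte & 0x0F give the two nibbles of 34H (3 and 4).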
A typical declaration in the data segment might be
my_var word ?
The ? tells the assembler that no initial value is required, but we need to verify during design that the variable is initialized, or written to, before it is used.
Data types are further divided into variables and constants.
Variables are either unsigned (WORD or BYTE) or signed (SWORD or SBYTE).
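The signed and unsigned types differ only in how the same bits are read. A minimal C sketch (the function names are hypothetical) makes that visible for one byte: bit 7 is worth +128 in a BYTE but -128 in an SBYTE.

```c
#include <stdint.h>

/* Value of an 8-bit pattern read as a BYTE (unsigned, 0 to 255). */
int as_byte(uint8_t bits)
{
    return bits;                       /* bit 7 is worth +128 */
}

/* Value of the same pattern read as an SBYTE (signed, -128 to 127). */
int as_sbyte(uint8_t bits)
{
    /* bit 7 is worth -128 instead of +128: subtract 256 when it is set */
    return (bits & 0x80) ? (int)bits - 256 : (int)bits;
}
```

So 0FFH is 255 as a BYTE but -1 as an SBYTE; only the declaration tells the assembler (and you) which reading is intended.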
A constant may be coded directly into an instruction as a number in binary, decimal, hexadecimal or octal form; as such, it becomes part of the machine code in the code segment.
A constant is often defined by equating a certain string or number with a symbolic name.
A declaration for a string like:
My_string DB 'any thing you wanna display'
or
My_String EQU 'Anything'
gives the string the symbolic name My_string, but the two forms work differently. With DB (define bytes), the assembler sets aside enough memory to store each letter as its ASCII code, which in effect makes a small one-dimensional array, and the symbolic name is turned into the offset of that array. Paired with the value associated with the data segment, the offset gives an address from which the string can be retrieved. With EQU, the assembler only substitutes the text wherever the name appears and reserves no memory for it. To display the DB version:
1. Use the LEA (load effective address) instruction to load the offset of My_string into a register (such as DI; the routine you call determines which register it expects). Paired with the data segment register, that offset points at the origin of the one-dimensional array that makes up My_string. The code segment pair CS:IP is not involved here; it always points at the next instruction to execute. We deal with this access method, called direct addressing, below.
2. Then set the counter register (CX) to the size of this array, which in this case is the number of characters in the string. This is done with the MOV instruction, which is just like a MOVE statement in COBOL, except that the destination comes first: MOV <destination>, <source>.
3. Finally, a call to putstring in the externally linked io.lib moves the characters from memory to the screen, one at a time, until the counter (CX) runs out. CX should equal the number of characters in My_string, so check carefully during coding that the lengths match.
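The three steps amount to "start at an offset, copy CX characters." The C sketch below models that behavior; put_counted is a hypothetical stand-in, not the actual io.lib routine, whose internals may differ. Note that no terminating character is needed: the count alone decides where the string ends, which is why CX has to match the string length.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of a counted-string display routine: start at an
   offset into the data segment and copy exactly cx characters. */
size_t put_counted(const char *data_segment, uint16_t offset,
                   uint16_t cx, char *screen)
{
    size_t shown = 0;
    const char *p = data_segment + offset;   /* like LEA: address of the first character */
    while (cx != 0) {                         /* stop when the counter runs out */
        screen[shown++] = *p++;
        cx--;
    }
    screen[shown] = '\0';                     /* terminate our copy so C can print it */
    return shown;
}
```

If CX is smaller than the string, only the first CX characters appear; if it is larger, whatever bytes follow the string in memory are displayed too.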
The assembler allows the use of a simplified directive, .MODEL, to set up the memory environment. It gives an overall outline of the program's segments in terms of size and of how they relate to one another, and it is a form of memory management within the DOS executable as it initializes. It creates the data and stack segments and determines whether code and data are reached with near addresses (an offset only, within one 64 KB segment) or far addresses (segment and offset), and hence how addresses are encoded in the machine code. The models include tiny, small, medium, compact, large and huge.
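On the 8086/8088, the pairing of a segment value with an offset works out as segment x 16 + offset, giving a 20-bit physical address. A near address supplies only the offset (the segment register already in use provides the rest); a far address supplies both. The C function below is a sketch of that real-mode arithmetic; the function name is hypothetical.

```c
#include <stdint.h>

/* 8086/8088 real-mode address arithmetic: the 16-bit segment is shifted
   left 4 bits (multiplied by 16) and the 16-bit offset is added. These
   chips have only 20 address lines, so the sum wraps around at 1 MB. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

Different pairs can name the same byte: 0FFFH:0010H and 1000H:0000H both give 10000H.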
A stack as used in assembler is just like the stack you have used in data and file structures. It is accessed as a typical LIFO (last-in, first-out) structure, and the assembler lets us use the PUSH and POP instructions to move data in and out of the stack. The format of the instructions is:
PUSH <source> = <symbolic name of a memory location> or <a general register>
POP <destination> = <symbolic name of a memory location> or <a general register>
On the 8086/8088, each item pushed or popped is a 16-bit word.
See the book for more details on implementing this. The instruction to push is actually a copy to the stack, and the source stays unchanged, available for other processes if desired. The pop, by the way, is a copy, but the stack no longer can logically access any popped variable, since the top of stack pointer has been ‘ratcheted’ to the next item.
You have plenty of experience using this structure, and it is used very frequently in assembler to reduce complexity, for example when the contents of a register need to be saved temporarily.
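Here is a toy C model of PUSH and POP, assuming the 8086 arrangement in which items are 16-bit words and the stack pointer moves toward lower addresses on each PUSH. The Stack type and its size are illustrative only, and, like the real CPU, the model does no overflow checking.

```c
#include <stdint.h>

#define STACK_WORDS 16

typedef struct {
    uint16_t mem[STACK_WORDS];
    int sp;                              /* index of the current top item */
} Stack;

void stack_init(Stack *s) { s->sp = STACK_WORDS; }   /* empty: just past the end */

/* PUSH copies the value; the source is left unchanged. */
void push(Stack *s, uint16_t value)
{
    s->sp--;                             /* the stack grows downward */
    s->mem[s->sp] = value;
}

/* POP copies the top item, then moves sp past it. */
uint16_t pop(Stack *s)
{
    uint16_t value = s->mem[s->sp];
    s->sp++;
    return value;                        /* the old slot still holds the value */
}
```

After a POP, the old value is still sitting in mem[], but because sp has moved past it, the next PUSH will overwrite it.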
The addressing modes covered here are direct, register and immediate. They describe the destination or source locations given in an instruction such as MOV (copy), mentioned above. The appendix of the book lists, under each instruction, which addressing modes it allows, and knowing which mode applies is important to implementing the instruction. The choice of addressing mode often depends on the operation being used, so be careful how you use these.
Direct addressing is used when the symbolic name of a memory variable appears in the instruction, such as a name defined in the data segment with BYTE or WORD (for example, MOV AX, My_word). The operand is then a byte or a word somewhere in memory, as defined by the data segment. A name defined with EQU (for example, My_var EQU 2999) is different: the assembler puts the value 2999 into the machine code wherever the source code refers to My_var, hence the 'equivalence'. An instruction that uses such a name is therefore really using an immediate value, not a memory location.
Immediate addressing is the use of a hard-coded value, usually a number in binary, hexadecimal, decimal or octal.
Register addressing is the most frequently used mode. The registers are named in the instruction, and the processor reaches them fastest of all, since they sit next to the ALU inside the CPU itself! The general registers usually accessed this way are the data registers AX, BX, CX and DX. The index registers SI (source index) and DI (destination index) are also addressed fairly often, to control or set up the origins of structures such as strings.
There is a library of routines that makes assembly language less tedious. These so-called external routines are invoked with CALL, and we declare them with EXTRN in the environment of the executable.
Examples are in the book appendix, and include PUTDEC, which displays a decimal on screen, PUTBIN, which displays a binary value on screen, GETDEC, which accepts a signed decimal value (GETDEC$ is for unsigned) from the keyboard, etc.
We can also build our own subprocedures and call them out in this area, and this subject is covered in more detail in chapter 5.
There are specific instructions that are the basic building blocks of any procedure. You know most of them, and the functions they perform are fairly simple. The general format is
<operation> <destination>, <source>
See the appendix for the allowable forms of addressing, which determine what you may use as the destination and the source. Be aware that you must often do a move before arithmetic, because these instructions cannot take two memory operands; one of them must first be placed in a general register.
ADD, SUB, XCHG and MOV are examples of this kind of operation.
Other operations, such as MUL, IMUL, DIV and IDIV, require attention to size and to issues of rounding. Multiplication by its very nature, at the binary level of the machine, produces a result that needs more bit positions than its operands. To allow orderly expansion, the product automatically goes into a field twice the size of the original register within the CPU: multiplying two 8-bit bytes gives a result that spans a full 16-bit word (in AX), and multiplying two words gives a 32-bit double word (in DX:AX). Division works the other way around: the dividend must already be double size (AX, or DX:AX), and the quotient and remainder each come back in one half.
You also choose how to handle the answer. If the result must be no larger than the operands, only the low-order half is kept and the high-order half is discarded (MUL and IMUL set the carry and overflow flags when that high half holds significant bits). For division, the quotient is truncated rather than rounded, and the remainder goes into the other half of the register. For our purposes this is not important, but remember it when some precision work does not come out right until this issue is addressed.
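The widening and the truncation can be modeled in C for the unsigned byte forms of MUL and DIV. This is a sketch of the register results only (AX for MUL, AL and AH for DIV); the struct and function names are hypothetical, and the divide-error case is left out.

```c
#include <stdint.h>

/* MUL with a byte operand: AX = AL * src. CF and OF are set together
   when the high half (AH) is not zero, i.e. the product needed 16 bits. */
typedef struct { uint16_t ax; int cf_of; } MulResult;

MulResult mul8(uint8_t al, uint8_t src)
{
    MulResult r;
    r.ax = (uint16_t)(al * src);     /* 8 x 8 bits -> up to 16 bits */
    r.cf_of = (r.ax >> 8) != 0;      /* is the high half in use? */
    return r;
}

/* DIV with a byte operand: AX / src, quotient in AL, remainder in AH.
   Assumes src != 0 and that the quotient fits in AL (the real CPU
   raises a divide error otherwise). Returns the new AX. */
uint16_t div8(uint16_t ax, uint8_t src)
{
    uint8_t quotient  = (uint8_t)(ax / src);   /* truncated, never rounded up */
    uint8_t remainder = (uint8_t)(ax % src);
    return (uint16_t)((remainder << 8) | quotient);
}
```

mul8(200, 2) gives AX = 400 with the flags set; keeping only AL would leave 144, which is why the high half matters. div8(400, 7) leaves 57 in AL and 1 in AH.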
To understand why all this happens, let's review the mechanics of multiplication and division.
(Demonstrate how multiplication causes a decimal number to grow in digit size, then show how a binary number is dramatically affected because of the radix (base 2), and how the carry to the next position is implemented by the procedure. This also demonstrates the principal reason for implementing the bit 'shift' at the ALU level of the CPU.)
For example, 23 x 6:
      23
   x   6
   -----
      18     (6 x 3)
     12      (6 x 2, notice the shift one place to the left)
   -----
     138     <- larger than either operand
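The same idea in base 2: each digit of the multiplier is either 0 or 1, so every partial product is either 0 or the multiplicand shifted left. The C function below sketches this textbook shift-and-add method; it is not a description of the 8088's internal circuitry, and the function name is hypothetical.

```c
#include <stdint.h>

/* Binary long multiplication by shifting and adding. */
uint16_t shift_add_mul(uint8_t multiplicand, uint8_t multiplier)
{
    uint16_t product = 0;
    uint16_t shifted = multiplicand;     /* room to grow to 16 bits */
    while (multiplier != 0) {
        if (multiplier & 1)              /* this bit of the multiplier is 1 */
            product += shifted;          /* add the partial product */
        shifted <<= 1;                   /* next position: shift left one place */
        multiplier >>= 1;                /* move on to the next multiplier bit */
    }
    return product;
}
```

shift_add_mul(23, 6) adds 46 (23 shifted once, for the 2s bit of 6 = 110B) and 92 (shifted twice, for the 4s bit), giving 138.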
Incidentally, all of these operations have associated flag register responses to their results. Relate the flags to the arithmetic functions of the CPU, and you will see that most of them, such as Overflow, Sign, Carry and Zero, are directly related to arithmetic.
Signed values are implemented by using the 2's complement of a value to represent its negative. Forming the 2's complement means inverting every bit of the binary value and adding 1, which the ALU does automatically. For a negative number, this technique leaves the most significant bit (MSB) set to one, so that bit acts as the sign indicator rather than as a carrier of value. If a signed overflow occurs, the overflow flag is set and the MSB no longer shows the correct sign (for example, adding two negative numbers can produce a result whose MSB is 0), which indicates that an invalid number has resulted.
Demonstrate how subtraction is implemented in the ALU by adding the 2's complement of the subtrahend to the minuend. Subtraction is therefore not done by the adder in the machine the way we think of it. Negative numbers are stored the same way, in 2's complement.
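For example, 00010111 - 00000101 is done as 23 + (-5), that is, 00010111 + (2's complement of 00000101).
Invert 00000101, giving 11111010, then add one, giving 11111011 for -5:
    1 1111 111     <- carries (the leftmost is the carry out of bit 7)
      0001 0111    (23)
    + 1111 1011    (-5)
    -----------
      0001 0010    (+18)
The carry out of bit 7 does not fit in the 8-bit result, but the carry flag is set after this addition. (If you code it as a SUB instruction, the CPU treats the carry flag as a borrow flag instead, and it is cleared here, since no borrow is needed.)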
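A C sketch of the same subtraction: add the inverted subtrahend plus 1 to the minuend and keep the carry out of bit 7. Folding the +1 into the same addition gives the same 8-bit result as the worked example above. The function names are hypothetical.

```c
#include <stdint.h>

/* 2's complement of a byte: invert every bit, then add 1. */
uint8_t twos_complement(uint8_t x)
{
    return (uint8_t)(~x + 1);
}

/* minuend - subtrahend, done as minuend + (inverted subtrahend) + 1.
   *carry_out receives the ninth bit of the sum, the bit that falls
   off the top of the 8-bit result. */
uint8_t sub_by_adding(uint8_t minuend, uint8_t subtrahend, int *carry_out)
{
    unsigned sum = (unsigned)minuend + (uint8_t)~subtrahend + 1u;
    *carry_out = (int)((sum >> 8) & 1u);
    return (uint8_t)sum;             /* keep only the low 8 bits */
}
```

sub_by_adding(23, 5, &c) returns 18 with c = 1; sub_by_adding(5, 23, &c) returns 0EEH (-18) with c = 0. The SUB instruction reports the opposite of this carry, a borrow, in the carry flag.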
By the way, this needs to be understood as part of your code design, since an overflow or sign reversal is not detected or reported automatically, except through the flags in the flag register. The assembler has no automatic facility for checking this, as a higher-level language may have. Therefore, wherever an operation may set the overflow flag, the procedure must check for it (for example, with JO, jump if overflow, right after the arithmetic) and have an appropriate response built into the code. Otherwise, the program will just keep right on chugging to the end, but the results will be erroneous, at best.
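Since nothing checks the flags for you, it helps to see exactly when they are set. The C sketch below computes CF, OF, SF and ZF the way an 8-bit ADD defines them, and checked_add_sbyte plays the role of an ADD followed by a JO to an error branch. All names are hypothetical.

```c
#include <stdint.h>

/* The four arithmetic flags an 8-bit ADD leaves behind. */
typedef struct { int cf, of, sf, zf; } Flags;

uint8_t add8(uint8_t a, uint8_t b, Flags *f)
{
    unsigned wide = (unsigned)a + b;
    uint8_t r = (uint8_t)wide;
    f->cf = wide > 0xFF;                       /* unsigned result did not fit */
    f->of = ((a ^ r) & (b ^ r) & 0x80) != 0;   /* same-sign operands, different-sign result */
    f->sf = (r & 0x80) != 0;                   /* copy of the MSB */
    f->zf = r == 0;
    return r;
}

/* Like ADD followed by JO: refuse to hand back a result that overflowed. */
int checked_add_sbyte(uint8_t a, uint8_t b, uint8_t *sum)
{
    Flags f;
    *sum = add8(a, b, &f);
    return !f.of;                              /* 1 = valid, 0 = overflow detected */
}
```

Adding 100 and 100 gives 200 (0C8H), which reads as -56 when signed: OF is set, so checked_add_sbyte returns 0, while CF stays clear because 200 still fits as an unsigned byte.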
See and demonstrate the interpretation rules for different operations on page 164 for more on this issue.
This is mainly a chapter on coding. All of these structures are familiar, but the implementation of some of them may seem weird. The basic jump instruction, JMP, is like a goto on its own, but it is the most logical way for the designer to build branched sets of instructions in assembler. Just because it is a goto does not mean it is a poor implementation of structured programming rules. On the contrary, using the code segment:offset address to redirect the flow of instructions is a necessity, or at least an optimal aspect, of assembly language, because each consecutive line of source code is mirrored, more or less line for line, in the machine code and in the layout that results in memory.
An unconditional jump alters the normal top-to-bottom flow of a set of code without any exception. It is most often used to escape, or skip over, another branch in that flow. This is easier shown than explained as far as IF-ELSE branches are concerned, which is the next subject.
Generally, as we develop our pseudocode, we write scope delimiters such as END-IF and END-LOOP. Keep this in mind as you develop your program flow. In assembler, we give each branch a label, so that a decision statement, which is often a conditional jump, can name the branch it goes to, and the assembler knows which instruction should come next. Page 189 has a good example of such logic being implemented.
A conditional jump is often used as a form of branch, as in an IF statement. There are several ways to handle and spell out the conditionals of the IF.
See page 172. The processor has built-in conditional jump instructions that react to the flags (which have been set or cleared by prior instructions), as listed in tables 4.1, 4.2 and 4.3.
On the 8086/8088, a conditional jump cannot branch to an instruction more than 128 bytes backward or 127 bytes forward, measured from the end of the jump instruction. This is probably not going to be a problem for you, since most of your labs' code will not be that long. (If it is, reverse the condition and jump over an unconditional JMP, which can reach anywhere in the segment.)
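The IF-ELSE shape is easier to see written out. C happens to have goto and labels, so the sketch below builds an IF-ELSE from nothing but a conditional jump and an unconditional jump, with the corresponding assembler instruction in each comment. The function and label names are illustrative; JNG is the jump-if-not-greater instruction for signed values.

```c
/* Return the larger of two signed values using only jumps and labels. */
int larger(int ax, int bx)
{
    int result;
    if (!(ax > bx)) goto ELSE_PART;    /* CMP AX,BX / JNG ELSE_PART */
    result = ax;                        /* THEN branch */
    goto END_IF;                        /* JMP END_IF: skip over the ELSE branch */
ELSE_PART:
    result = bx;                        /* ELSE branch */
END_IF:
    return result;
}
```

Notice the goto END_IF at the end of the THEN branch: without that unconditional jump, control would fall straight through into the ELSE branch.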
Loops. The structure and syntax of a loop are up to the designer (you), but good examples are shown in chapter 4.
Give the loop a descriptive label and implement it as follows (assume BX has already been set before this structure):
THISLOOP:                    ;DO
        instruction          ;  body of the loop
        another              ;  (something in here must change BX)
        another
        CMP  BX, <somevalue>
        JNE  THISLOOP        ;jump back up to THISLOOP if not equal, else drop out
ENDTHISLOOP:                 ;END-DO
Recall the rule that the loop condition must be changed inside the loop to avoid an endless loop; in a for loop, the counter is either decremented or incremented on each pass. By the way, explain the CMP (compare) instruction: it subtracts the source operand from the destination operand, discards the difference, and sets the flags to reflect the result, leaving both operands unchanged.
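CMP in a C sketch: the difference is computed only to set the flags, then thrown away, and the conditional jumps simply read those flags. The struct and function names are hypothetical, and only ZF, CF and SF are modeled.

```c
#include <stdint.h>

/* Flags left by CMP dest, src on bytes. */
typedef struct { int zf, cf, sf; } CmpFlags;

CmpFlags cmp8(uint8_t dest, uint8_t src)
{
    uint8_t diff = (uint8_t)(dest - src);   /* discarded after setting the flags */
    CmpFlags f = { diff == 0,               /* ZF: the operands were equal */
                   dest < src,              /* CF: a borrow was needed (unsigned below) */
                   (diff & 0x80) != 0 };    /* SF: MSB of the difference */
    return f;
}

/* How some conditional jumps read those flags afterward. */
int je_taken(CmpFlags f)  { return f.zf; }      /* JE / JZ   */
int jne_taken(CmpFlags f) { return !f.zf; }     /* JNE / JNZ */
int jb_taken(CmpFlags f)  { return f.cf; }      /* JB: unsigned below */
```

After cmp8(3, 5), ZF is clear, so JNE would be taken, and CF is set, so JB (jump if below, unsigned) would be taken as well.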
Here is the above assembler code modified to demonstrate a while arrangement
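THISLOOP:                    ;WHILE abc <> xyz (abc, xyz: WORD variables)
        MOV  AX, abc         ;CMP cannot compare two memory operands,
        CMP  AX, xyz         ;  so load one of them into a register first
        JE   ENDTHISLOOP     ;leave the loop once they are equal
        instruction          ;body
        another              ;  of
        another              ;  loop
        another
        JMP  THISLOOP        ;end of loop: return to the top unconditionally
ENDTHISLOOP:                 ;END-WHILE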
The LOOP instruction is similar to the while in underlying format, but it tests the CX register instead of a specified condition, and it tests at the bottom of the loop, so it behaves like a REPEAT...UNTIL.
Initialize the CX register:
MOV CX,somevalue
Then:
TOPOFLOOP:                   ;REPEAT
        instructions         ;  body
        .                    ;  of the
        instructions         ;  loop
        LOOP TOPOFLOOP       ;CX is decremented automatically; jump back if CX <> 0
                             ;UNTIL CX = 0 (if CX starts at 0, the body runs 65,536 times)
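The counting behavior of LOOP can be checked with a short C model. The sketch assumes only what is described above: LOOP decrements the 16-bit CX and jumps back while CX is not 0, without touching the flags. The function name is hypothetical.

```c
#include <stdint.h>

/* Count how many times a LOOP-controlled body runs for a starting CX. */
unsigned long loop_passes(uint16_t cx)
{
    unsigned long passes = 0;
    do {
        passes++;                 /* the loop body */
        cx--;                     /* LOOP: decrement CX (wraps from 0 to 0FFFFH) */
    } while (cx != 0);            /* ...and jump back while CX <> 0 */
    return passes;
}
```

loop_passes(3) returns 3, but loop_passes(0) returns 65536, because the first decrement wraps CX around to 0FFFFH. Guard against a zero count before entering the loop, for example with JCXZ (jump if CX is zero).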
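There are also four related instructions: LOOPE, LOOPNE, LOOPZ and LOOPNZ (LOOPZ and LOOPNZ are simply other names for LOOPE and LOOPNE). Each decrements CX just like LOOP, but it also checks the zero flag: LOOPE/LOOPZ jumps back only while CX <> 0 and ZF = 1, and LOOPNE/LOOPNZ only while CX <> 0 and ZF = 0.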
All of these are convenient for searching arrays and strings as well.
MASM 6.0 and above lets you write the following in place of an explicit LOOP:
        .REPEAT
          code
        .UNTILCXZ
.REPEAT is implemented by the assembler as a shorthand loop: it generates a label at the top and, for .UNTILCXZ, a LOOP instruction at the bottom, so CX is decremented on each pass just as before.
It is a keystone of structured programming to develop modular processes that can be reused, are loosely coupled, and so forth. In assembler it is no different, and this is implemented in two forms. One is the external subprocedure, which is assembled separately from the main program and is then available to more than just the original parent code. To create executable code that incorporates the subprocedure, it is brought in during linking, where it is named along with the parent procedure. The parent, or higher-level, code is called the caller, for obvious reasons.
This should be very familiar to you from your Cobol and data and file structures classes.
If you recall, parameters can be passed by value, in which case any changes the called subprocedure makes have no effect on the caller's environment, or by location (by reference), in which case the subprocedure's changes do change the caller's variable. Which one to use depends on how the data needs to be shared between the caller and the subprocedure.
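In C terms (a sketch with hypothetical function names), call by value hands the subprocedure a copy, while call by location hands it the address of the caller's variable.

```c
/* Call by value: the subprocedure works on its own copy. */
void add_ten_by_value(int n)
{
    n += 10;                 /* changes only the local copy */
    (void)n;
}

/* Call by location: the subprocedure receives the address of the
   caller's variable, so the change is visible to the caller. */
void add_ten_by_location(int *n)
{
    *n += 10;
}
```

After add_ten_by_value(x) the caller's x is unchanged; after add_ten_by_location(&x) it is 10 larger.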
The format for all of this is extremely terse. It takes the form of a pseudo-operation (directive):
<name of the subprocedure> PROC USES <registers to be saved for after returning to the caller's control>, <parameter list>
In MASM, the registers after USES are separated by spaces, as in USES AX BX.
So, from the caller's point of view, registers behave as either call by value or call by location, depending on whether they are listed here or not. If they are listed, the assembler automatically expands USES into code that saves those values on entry to the subprocedure and restores them on exit: it pushes the listed registers onto the stack in order and pops them from the stack in reverse order.
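What USES generates can be modeled in C: push the listed registers in order on entry, pop them in reverse order on exit. The Regs struct, the small stack and the function names are illustrative assumptions; the second function shows what goes wrong if the pops are done in the same order as the pushes.

```c
#include <stdint.h>

/* Illustrative register file and stack (not a real CPU model). */
typedef struct { uint16_t ax, bx; } Regs;

static uint16_t stack_mem[8];
static int stack_top = 8;                   /* stack grows toward lower indexes */

static void push16(uint16_t v) { stack_mem[--stack_top] = v; }
static uint16_t pop16(void)    { return stack_mem[stack_top++]; }

/* Roughly what "PROC USES AX BX" wraps around the body. */
void uses_ax_bx(Regs *r)
{
    push16(r->ax);                          /* entry: push in the listed order */
    push16(r->bx);

    r->ax = 0xAAAA;                         /* the body may use AX and BX freely */
    r->bx = 0xBBBB;

    r->bx = pop16();                        /* exit: pop in REVERSE order */
    r->ax = pop16();
}

/* Popping in the same order as the pushes: AX and BX come back swapped. */
void wrong_order(Regs *r)
{
    push16(r->ax);
    push16(r->bx);

    r->ax = pop16();                        /* receives the old BX */
    r->bx = pop16();                        /* receives the old AX */
}
```

The stack only gives items back last-in, first-out, so the restore order has to be the mirror image of the save order.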
The parameter list names any variables (with symbolic names defined in the caller's environment, of course) that are linked into the calling assembly. The memory location is then accessible to the subprocedure as well as to the caller (i.e., call by location).
Talk about FAR and NEAR data segment configurations.
Macros are pieces of code that are expanded internally at assembly time, as opposed to an external subprocedure, which is brought into play during linking. An important feature of macros is that each reference to a macro in the code causes a complete copy of its code to be placed at that location, so the same code can occur more than once in the assembly and in the executable. A subprocedure is different: its code occurs only once, however many times it is called.
The CPU has built-in logic to handle shifts with ease. As discussed before, shifting is essential to multiply and divide operations. It has other uses, but the basic idea is simply to copy each bit one position to the left or to the right, so that the value appears to shift through the register.
Two characteristics of types of shifts are
1. which direction it is going
2. logical or arithmetic shift
In the first case, the shift is either to the left or to the right. In the second case, the operand is treated either as an unsigned binary integer (logical shift) or as a signed binary integer (arithmetic shift). See page 284 for a display of the various rules the assembler implements to maintain data integrity using the most significant (sign) bit and the carry flag.
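The difference between a logical and an arithmetic right shift is easiest to see on a negative byte. This C sketch (hypothetical function names, one-bit shifts only) models SHR and SAR; for left shifts the two behave the same, which is why SHL and SAL are the same instruction.

```c
#include <stdint.h>

/* SHR: logical shift right by one. A 0 enters bit 7; the bit leaving
   bit 0 goes into the carry flag. */
uint8_t shr1(uint8_t x, int *cf)
{
    *cf = x & 1;
    return (uint8_t)(x >> 1);
}

/* SAR: arithmetic shift right by one. The old sign bit is copied back
   into bit 7, so a negative number stays negative. */
uint8_t sar1(uint8_t x, int *cf)
{
    *cf = x & 1;
    return (uint8_t)((x >> 1) | (x & 0x80));   /* keep the sign bit */
}
```

90H is 144 unsigned or -112 signed. shr1 gives 48H (72, half of 144), while sar1 gives 0C8H (-56, half of -112).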
Some of the guys who took hardware will remember this from the boards you made to demonstrate shifting, rotation and counting.
Rotation is similar to shifting at the lower levels of implementation, and is really just an extension of it, except that the bit moved out of one side is transferred back into the opposite side of the byte or word.
To use either a shift or a rotate instruction, you must specify how many bit positions to move. On the 8086/8088, a count of 1 can be written directly in the instruction; any other count must first be moved into the CL register (the low byte of CX) before the shift or rotate instruction.
The complete code is then
MOV CL,4
RCR AL,CL
This rotates AL to the right through the carry flag 4 times.
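To see the difference between a plain rotate and a rotate through carry, here is a C sketch of ROR and RCR on AL with an explicit carry flag. The count is assumed to be small (the real instruction takes it from CL), and the function names are hypothetical.

```c
#include <stdint.h>

/* ROR: the bit leaving bit 0 goes straight into bit 7 (and into CF). */
uint8_t ror8(uint8_t al, unsigned count, int *cf)
{
    while (count--) {
        int out = al & 1;
        al = (uint8_t)((al >> 1) | (out << 7));
        *cf = out;
    }
    return al;
}

/* RCR: a 9-bit rotation through the carry flag. The old CF enters
   bit 7, and the bit leaving bit 0 becomes the new CF. */
uint8_t rcr8(uint8_t al, unsigned count, int *cf)
{
    while (count--) {
        int out = al & 1;
        al = (uint8_t)((al >> 1) | (*cf << 7));
        *cf = out;
    }
    return al;
}
```

Starting from AL = 01H with CF = 0, four ROR steps give 10H, while four RCR steps give 20H: RCR rotates through 9 bits (CF plus the 8 bits of AL), so the bit spends one extra step in CF before coming back around.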
See page 285-288 for the syntax of the shift and rotate instructions.
Recall the work you did in computational structures, in data structures and, if you have taken it, in computer hardware. There are several instructions that apply AND, OR, XOR or NOT to bits individually. Each one actually operates on the entire byte or word at once, bit by bit, and the two operands must match in size, as with many other instructions. By choosing AND or OR, you can build a mask that writes zeros (AND) or ones (OR) into chosen bits. For example
      0000 1101
AND   1011 1011   <- the mask
      ---------
      0000 1001
This mask writes a 0 to the 3rd and 7th bits counting from the right (bits 2 and 6, if you number from bit 0) and leaves every other bit as it was. See the book page 6.5 for a good chart of other Boolean processes.
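NOT, by the way, simply inverts each bit in the byte or word. NEG is different: it replaces the value with its 2's complement (invert each bit, then add 1), that is, it negates the number.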
Flags can be set without any change to the registers by using TEST. That instruction merely does an AND between its two operands and sets the flags. See the book for details. (I did not find it all that useful for lab 4, even though my initial impression was that it would be.)
AND < destination >, < source >
OR < destination >, < source >
XOR < destination >, < source >
This is a review of computational structures at a minimum; data and file structures should have given you more depth in this area. As a review, check and use the truth tables on pages 293 and 294 for these instructions, and try some examples of your own on bytes and words like the one above.
Then demonstrate that you can either write 1s or write 0s to specific bits, or you can check to see if specific ones are a 1 or a 0, and so on.
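Writing 0s, writing 1s and checking bits, sketched in C with hypothetical function names: AND with a mask clears the bits where the mask has 0s, OR sets the bits where the mask has 1s, XOR flips them, and the TEST-style check ANDs without keeping the result. Bits are numbered 0 (rightmost) to 7.

```c
#include <stdint.h>

/* AND writes 0s wherever the mask has a 0. */
uint8_t clear_bits(uint8_t v, uint8_t mask) { return v & mask; }

/* OR writes 1s wherever the mask has a 1. */
uint8_t set_bits(uint8_t v, uint8_t mask)   { return v | mask; }

/* XOR flips the bits wherever the mask has a 1. */
uint8_t flip_bits(uint8_t v, uint8_t mask)  { return v ^ mask; }

/* TEST-style check: AND without storing the result, keeping only
   whether any masked bit was 1 (the opposite of the zero flag). */
int test_bits(uint8_t v, uint8_t mask)      { return (v & mask) != 0; }
```

clear_bits(0x0D, 0xBB) reproduces the example above (0000 1001), and test_bits(0x0D, 0x04) returns 1 because bit 2 of 0000 1101 is set.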
TEST is an assembler instruction to do an AND of the destination and source registers, without changing those registers. It is used to set the flags and utilize that response.
THE END
Have a great semester break!