LM11 – Programming

Lecture Recordings at bottom of page

Opening Example:

In class, you will often hear me state that we always think about security and we must learn Comp Sci from the ground up to ensure a proper foundation so that we can ensure security.  In programming, we cannot rely on IDEs and code generators (including AI) as these may contain flaws that generate and introduce security vulnerabilities in our code.  As an example, consider the recent Experian data breach that occurred because the programmers relied on IDE code generation: http://money.cnn.com/2017/09/16/technology/equifax-breach-security-hole/index.html

Introduction

Please recall the basic computational model consists of Input, Processing & Output (IPO), and Storage (IPOS).

A basic computing machine has memory, a processor, a program, an input file, and an output file (note an executing program is a process and the files are abstractions -> see operating systems for a complete description).

When a program is executed, it becomes a process and the OS allocates the necessary resources to the process.  One resource is memory and the OS provides each process with a code, data, and stack segment in memory.  This was demonstrated in the Fetch Execute Cycle simulation in LM2 and is demonstrated in the Recursive Binary Decimal Algorithm Lecture Capture at the bottom of the page.

Programming languages are notations/communications for specifying organizing and reasoning about computations.  They are a communication mechanism and their result makes the computer easier to use.

Similar to Operating System design, programming language design balances convenience for humans to use and efficient use of the computing machine.  Note, for high-level languages, convenience comes first as without convenience, efficiency is irrelevant.  This is not the case for Operating System design as efficiency is the overriding goal of some systems (real-time systems, servers, etc.).   The two major influences on language design have been machine architecture (Von Neumann) and program design methodologies.

History (sometimes referred to as generations)

Machine language is the native language of a particular computer (architecture). It is unintelligible to humans since it is low-level (0’s & 1’s) and requires human lookup (e.g. to program the binary operation code for addition the programmer must look up the code and its use). It provides a low level of functionality and is very inflexible (recall generation 1 computers w/vacuum tubes needed to be rewired to change the program and/or input data).

Assembly language reflects the machine language (low level) as it establishes a one-to-one correspondence between human usable mnemonics and machine instructions (e.g. add R1, R2 as we did this in the Fetch Execute Simulation). During the assembly phase (i.e. the Assembler), operation and symbol mnemonics are replaced with machine instructions and addresses.

Higher-level languages are independent of the underlying machine.  It is this machine independence that leads to portability (write a program once and compile it for or interpret it on different architectures).  We also need higher-level languages for readability and to facilitate organizing and managing large programs (e.g. MS Windows is 50 million lines of code).  This also supported the evolution and availability of program libraries.  Note, as a definition, higher-level languages are considered to be general-purpose if they may be applied to a wide range of problems.

You have heard me state over and over that reading CompSci is misleading as you see terms that are familiar and often mislead you into thinking you fully understand a concept.  Having stated this what is interpretation in contrast to compilation? You probably read right over this without understanding the full meaning.  

Compilation & Interpretation

Compilation is Static (ex. C, Java although Java is also interpreted)

Compilation is more efficient than interpretation since it translates source code to object code once and the compiled code can be run over and over with no additional compilation/translation. Source programs are translated to machine language – target code. Usually, this takes place in several phases.

The Lexical Analyzer reads characters from the source program and forms them into lexical units. These little lexical units consist of program identifiers, special words, operators, and punctuation while ignoring/discarding comments.

The Syntax Analyzer takes lexical units from the lexical analyzer and constructs a hierarchical structure called a parse tree.  Parse trees represent the syntactic structure of the program.

Symbol Tables are constructed to serve as databases for the compilation process (records type and attribute information of each user-defined name in the program and placed in symbol table by lexical and syntax analyzers)

The Symbol Table is used by the semantic analyzer and code generator. The compiler’s output is object code and is linked with other modules (system code).  This result is called the load module or executable image

The Semantic Analyzer checks for semantic errors (i.e. type discrepancies) and code may be optimized to improve the efficiency.

An Intermediate code generator produces programs in a different language at an intermediate level (very similar to assembly language).

Interpretation is Dynamic (ex. Shell Scripts, LISP, JavaScript)

Interpretation reevaluates and translates every line repeatedly even if it is in the same program (loops and procedure calls).  It is a more flexible system than compilation as it allows programs to be changed on the fly to add features or correct errors (this is the functionality IDE debuggers use).

Program-to-machine translation occurs at runtime therefore the interpreter acts as a software simulation of a machine and the interpreter mimics the fetch execute cycle. This can increase security (See Java) as the environment can have a macro understanding of a program’s intent.

Syntax vs. Semantics

Syntax is the programming language’s grammar (e.g. case sensitivity, punctuation, etc.)

Semantics is logical design (i.e. the meaning).

Programming Methodologies

Top-Down design – this is problem decomposition

Bottom-up design – this is used to build program libraries (APIs).

Reasons for Studying Programming Languages

We study programming languages for many reasons that including:

An increased capacity to express ideas – our ability and depth at which we think is influenced by the expressive power of our language.  It is this expressive power that directly determines the depth of abstraction we may apply (e.g. the types of control structures and data structures that we may use)

An increased ability to learn new languages comes from an understanding of data abstraction and programming constructs.  In today’s world, when we awake after sleeping it is likely the world has changed so continuing education is critical (note this is addressed in the ACM Code of Ethics and Professional Conduct).

An increased understanding of systems architecture since learning a new language forces us to learn implementation and foundational mechanisms resulting in better choices and selection of both programming language and architecture.

An increased ability to design new or extend existing languages.  In CISS 110 Programming and Logic you will use inheritance and create new classes so even in your first programming course you will extend the language leading to an  overall advancement in computing

Influences on Language Design

Computer Architecture: the basic architecture has a major effect on language design and note the prevalent languages designed around Von Neumann architecture.

Imperative languages: Data and programs are stored in the same memory. Separate CPU and memory and the notion of a stored program. Instructions and data are moved between memory and CPU. The central feature is variables are manipulated and moved around memory.

(Please recall RISC vs. CISC noting Intel is CISC and ARM CPUs and Apple’s new M1/M2/M3 chips are RISC)

Functional languages operate differently as they adhere to strict mathematical formalism but to fully optimize this system we would need an architecture to support functional languages.  Note parallel systems improve execution speed but are still not designed for functional languages

4 Major Programming Language Paradigms

Procedural/Structured/Imperative (e.g. C)

Action-oriented in that an algorithm is specified -> sequence of actions.  Basic constructs of execution are Sequential, Selection, and Iteration and there is also Structural Decomposition (e.g. functions, procedures, methods, etc.).

Object-Oriented (e.g. Java)

Based on Inheritance, Encapsulation, and Polymorphism.  Classification of objects into classes and subclasses and this hierarchical structure facilitates abstraction.  On a micro level, similar to imperative or procedure-oriented in their syntax and basic semantics

Logical (e.g. Prolog)

Symbolic computation adheres to mathematical formalism. Specify rules in no particular order and the inference machine generates results (resolves). Often used in AI. Prolog video demonstration in Prolog Sub-menu.

Functional (e.g. Lisp, KIF, Haskell)

Based on mathematical functions – high-level programming including AI.

Note there are also Mark-up Languages (PostScript, HTML, etc.) that describe the general appearances of documents and while this may not be considered computer programming,  the introduction of HTML5 and its built-in programming capabilities is changing this definition.

Programming Domains

Scientific applications: This is the foundation of modern computing as the first digital computers (1940s) were designed for scientific applications.  This environment was characterized by simple data structures (arrays & matrices), simple control structures (counting loops & selection), required representations for large floating-point numbers

Business applications:  In the 1950s special computers were developed along with special languages for business.  These business systems were characterized by facilities for producing elaborate reports and could describe and store decimal numbers and character data.

Relational database design (Codd, 1960s) -> see DBMS.

Artificial intelligence (AI)Characterized by symbolic rather than numeric computations where symbols consisting of names rather than numbers are manipulated.  The linked list is the data structure of choice for functional programming (Lisp and its variants) and formal mathematical logic in logical programming (Prolog).

Systems programming:  Operating system and all programming support tools of a computer system.  These systems are used almost continuously and therefore must have execution efficiency and low-level interfaces to external devices (e.g. Unix & C)

Scripting languages:  Scripting essentially puts a list of commands in a file and the file’s “scripted” commands are executed.  The first language was the Unix shell (sh) and commands were interpreted as calls to the system’s utility subprograms.  Later control flow statements functions and various other capabilities were added increasing the expressive capabilities (ksh, perl, etc.)

Special-purpose languages:  Examples are system simulation  (e.g. GPSS) and real-time systems.

Binding Times

Many of the subtle differences in languages occur with their binding times (translation or execution time).  This includes when variables are bound to values and data structures to memory locations.  An example of this late binding is the recently developed/adopted Swift language used by Apple.

Programming Environments

A programming environment is a collection of tools used in the development of software and typically consists of a file system, editor, linker, and compiler. Extreme Java programming with Eclipse is one of the possible Final Projects.

Agile & Extreme Programming

In contrast to the way the text presents this, agile programming is a methodology and extreme programming is a type of agile programming.

Agile Programming: http://en.wikipedia.org/wiki/Agile_programming

Extreme Programming: http://en.wikipedia.org/wiki/Extreme_programming

Game Design

Some nice open-source Game Design/Development tools:

http://www.unrealengine.com/udk/

http://www.blender.org/

Education

From alice.org, Alice is a graphical object-oriented environment for creating 3D animations.  It was developed to engage the next generation of programmers by teaching programming from a Storyboarding perspective.

I also recommend code.org or W3CSchools.com for Web coding tutorials.

Language Evaluation

There are many dimensions upon which we base evaluation and again, it is our responsibility to properly evaluate systems and alternatives (ACM Code of Ethics) so we must understand programming languages.

Readability: Recall from above that convenience is a driving factor.  Readability is the ease at which programs can be read and understood and therefore has a big impact on the software development life cycle (planning, analysis, design, implementation & maintenance).  Note readability must be considered in the context of the problem domain however, simplicity improves readability and a small number of basic components is easier to learn.

Multiplicity: Is there more than one way to accomplish a particular operation? Examples: count = count + 1, count +=1, count++, ++count

Operator overloading: Does a single operator have more than one meaning?  This is useful for programming but may increase the difficulty of readability

Orthogonality:  This comes from the mathematical concept of orthogonal vectors (i.e. independent of each other).  This is closely tied to simplicity and stipulates a relatively small set of primitive constructs that can be combined in a relatively small number of ways to build control and data structures.  Additionally, this stipulates that every possible combination of primitives is legal and meaningful. Note a lack of orthogonality leads to exceptions and exceptions are bad.

Control Statements:  First note that control statements exist to restrict goto’s. They must precede their targets except when used in loops and their targets must not be too distant.  Also, their numbers must be limited (i.e. a program should not be an endless chain of increasingly incremented nested control statements).

Data Types and Structures:  There must be adequate facilities for defining data types and data structures noting that record data types are far more readable than parallel arrays.

Syntax: Has a significant effect on the readability of programs.

Identifiers: Note that length restrictions can detract from readability.

Special Words: Assist in readability (e.g. Begin, and, for, etc.) and delimit compound statements or statement groups or control structures.  The question arises, may special words be used as variables?

Write-ability: The measure of how easily a language may be used to create programs in chosen problem’s domain.  Very similar to the readability issues.

Support for Abstraction: Ability to separate the theory in use from the implementation.  This is a key concept in contemporary programming language design and program design methodologies.  Note there are 2 key distinctions; Data Abstraction (e.g. structures) and Process Abstraction (e.g. sub-programs like EnQueue and DeQueue).

Reliability:  There are many dimensions upon which to determine the programming languages’ reliability.  Basically, by definition, it is reliable if it performs with specifications under all conditions.

Type Checking:  Testing for type errors in a given program.  Done at compilation time or run-time (interpretation) but note that run-time checking is more expensive.

Exception Handling: The ability to intercept and handle runtime errors as it is beneficial for the system to take corrective measures.  Note this is a big aid to reliability but again this decreases efficiency.

Aliasing: Two or more distinct referencing methods.  This is flexible yet dangerous since one memory cell may be accessed in various manners

Cost: Note there is a business component to language evaluation and selection that is based on the infrastructure cost of hiring and retaining employee programmers.

The total cost is a function of many of the characteristics that include: the cost of training the programmers to use the language and this is partly dependent on the programming language’s function and simplicity (orthogonality) of the language, the experience of the programmers, the cost of writing the programs, the readability the language cost of maintaining the programs, the abstraction allowed, the cost of compiling or interpreting the language, the cost of executing the program and efficiency considerations, the cost of implementing the environment, the cost of failures (consider real-time systems), etc.

Lecture Recording of this page’s presentation

Intro to Logic

Now some foundational computational logic & truth tables. The 3 basic control structures programmers use to control execution are sequential, selection and iteration (noting some authors also include sub-programs) –  presentation document located here: Truth Tables

Textbook Content Lectures

Programming Resources: