Keywords: static typing | strong typing | type safety | dynamic typing | type systems
Abstract: This article analyzes the core distinctions between statically typed and strongly typed languages, which concern two different dimensions: when type checking happens and how strict the type system is. Through comparisons of type characteristics in programming languages like C, Java, and Lua, it explains the advantages of static type checking at compile time and the role of strong typing in preventing circumvention of the type system. The article also discusses the fundamental principles of type safety, including the key concepts of progress and preservation, and explains why ambiguous terms like 'strong typing' and 'weak typing' should be avoided in professional discussions.
Fundamental Classification Dimensions of Type Systems
In discussions of programming language type systems, static typing versus dynamic typing constitutes an important classification dimension. Statically typed languages perform type checking before the program runs, typically at compile time, when a compiler or other static checker verifies that the program is type-correct. Programs that fail this check are rejected and never execute, while programs that pass receive specific guarantees. For example, the compiler can ensure that integer arithmetic instructions are never applied to floating-point numbers.
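The idea of rejecting a program before it runs can be sketched with a toy checker. This is a minimal illustration, not how any real compiler works: the expression classes `IntLit`, `FloatLit`, and `IntAdd`, and the functions `check`, `evaluate`, and `run`, are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IntLit:
    value: int


@dataclass(frozen=True)
class FloatLit:
    value: float


@dataclass(frozen=True)
class IntAdd:
    """Hypothetical integer-only addition instruction."""
    left: "Expr"
    right: "Expr"


Expr = Union[IntLit, FloatLit, IntAdd]


class StaticTypeError(Exception):
    """Raised by check() before any evaluation takes place."""


def check(expr: Expr) -> str:
    """Return the static type of expr ('int' or 'float') without evaluating it."""
    if isinstance(expr, IntLit):
        return "int"
    if isinstance(expr, FloatLit):
        return "float"
    if isinstance(expr, IntAdd):
        left, right = check(expr.left), check(expr.right)
        if left != "int" or right != "int":
            raise StaticTypeError(f"IntAdd needs int operands, got {left} and {right}")
        return "int"
    raise StaticTypeError(f"unknown expression: {expr!r}")


def evaluate(expr: Expr) -> Union[int, float]:
    """Evaluate an expression that has already passed check()."""
    if isinstance(expr, (IntLit, FloatLit)):
        return expr.value
    # After check(), both operands of IntAdd are known to be integers.
    return evaluate(expr.left) + evaluate(expr.right)


def run(expr: Expr) -> Union[int, float]:
    check(expr)  # rejected programs never reach evaluate()
    return evaluate(expr)
```

Here `run(IntAdd(IntLit(1), IntLit(2)))` evaluates normally, while `run(IntAdd(IntLit(1), FloatLit(2.0)))` raises `StaticTypeError` before `evaluate` is ever called, which is the guarantee the paragraph above describes.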
In contrast, dynamically typed languages classify and check values at runtime. Such languages exhibit three characteristic features: values are classified into specific types at runtime; restrictions exist on how these values may be used; and when these restrictions are violated, a dynamic type error is reported. Lua, a dynamically typed language, has a string type, a number type, and a boolean type, among others, and each value belongs to exactly one of them. In Lua, two strings can be concatenated, but a string and a boolean cannot.
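The three features can be observed in any dynamically typed language. The sketch below uses Python rather than Lua; the helper `concat` and the variable names are made up for illustration, and the concatenation rule mirrors the Lua example only loosely.

```python
def concat(a, b):
    """Concatenate two values, relying on Python's runtime type check."""
    return a + b


# 1. Values are classified into types at runtime.
assert type("lua") is str
assert type(True) is bool

# 2. Usage is restricted: concatenating two strings is allowed.
greeting = concat("hello, ", "world")

# 3. Violating a restriction is reported as a dynamic type error.
try:
    concat("hello", True)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
```

Nothing rejects `concat("hello", True)` ahead of time; the error only appears when that call is actually executed.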
Conceptual Analysis of Strong and Weak Typing
Regarding the term 'strong typing,' professional literature lacks a unified definition. The most widely accepted definition is: in strongly typed languages, programmers cannot circumvent the restrictions imposed by the type system. This term is almost always used to describe statically typed languages.
Weak typing, as the opposite of strong typing, means that the type system can be circumvented. The C language is famously weakly typed because any pointer type can be converted to another pointer type through simple type casting. Pascal was originally designed as a strongly typed language, but due to a design oversight with untagged variant records, a loophole was introduced into the type system, making it technically weakly typed.
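What a pointer cast does to the type system can be reproduced from Python through the standard `ctypes` module, which exposes C-style pointers. This is only a sketch of the effect: it assumes a platform with 32-bit IEEE 754 single-precision floats, and the variable names are illustrative.

```python
import ctypes

value = ctypes.c_float(1.0)
float_ptr = ctypes.pointer(value)

# Similar in spirit to the C cast (uint32_t *)float_ptr: nothing checks
# that the memory behind the pointer really holds an integer.
int_ptr = ctypes.cast(float_ptr, ctypes.POINTER(ctypes.c_uint32))

# The float's raw bit pattern is now read as if it were an unsigned integer.
bits = int_ptr.contents.value
print(hex(bits))
```

The same four bytes are viewed first as a float and then as an integer, so the restriction that integer operations apply only to integers has been circumvented by the cast.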
Truly strongly typed languages include CLU, Standard ML, and Haskell. Notably, Standard ML underwent multiple revisions to eliminate type system loopholes discovered after the language was widely deployed.
Core Principles of Type Safety
Type safety is the extent to which a programming language discourages or prevents type errors. Type-safe languages are sometimes also called strongly typed or strictly typed. Behaviors classified as type errors typically result from attempts to perform operations on inappropriate data types, such as trying to add a string to an integer.
Type enforcement can be static, dynamic, or a combination of both. Dynamic type enforcement often allows programs to run that would be invalid under static enforcement, at the cost of deferring type errors to runtime.
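The trade-off can be illustrated with a function that contains an ill-typed branch. The function `total` and its parameters are hypothetical; the point is only that the faulty branch causes no trouble until it is actually reached, whereas a static checker would be expected to reject the whole function.

```python
def total(count, use_fallback):
    """Return count + 1, or take an ill-typed branch when use_fallback is set."""
    if not use_fallback:
        return count + 1
    # Ill-typed when count is an int: this only fails if the line is executed.
    return count + "one"


result = total(2, use_fallback=False)  # runs normally under dynamic enforcement

try:
    total(2, use_fallback=True)
    failure = None
except TypeError as exc:
    failure = type(exc).__name__  # the deferred runtime type error
```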
In the context of static type systems, type safety typically involves a guarantee that the eventual value of any expression will be a legitimate member of that expression's static type. More precise requirements are subtler than this, with concepts like subtyping and polymorphism adding complexity.
Formal Definitions of Type Safety
Robin Milner's famous statement intuitively captures type soundness: 'Well-typed programs cannot go wrong.' In other words, if a type system is sound, then expressions accepted by that type system must evaluate to a value of the appropriate type, rather than produce a value of some other, unrelated type or crash with a type error.
Vijay Saraswat provides a related definition: 'A language is type-safe if the only operations that can be performed on data in the language are those sanctioned by the type of the data.' However, what precisely it means for a program to be 'well typed' or to 'go wrong' are properties of its static and dynamic semantics, which are specific to each programming language.
In 1994, Andrew Wright and Matthias Felleisen formulated the standard definition and proof technique for type safety in languages defined by operational semantics. Under this approach, a language's semantics must possess the following two properties to be considered type-safe:
Progress: A well-typed program never gets 'stuck': every expression is either already a value or can be reduced toward a value in some well-defined way. In other words, the program never enters an undefined state where no further transitions are possible.
Preservation: Whenever a well-typed expression takes an evaluation step, the resulting expression is still well typed and has the same type as before.
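Both properties can be made concrete on a toy language with numbers, booleans, addition, and conditionals. The sketch below is an assumption-laden illustration, not the Wright and Felleisen formalism itself: the classes and the functions `type_of`, `step`, and `run_checked` are invented here, and asserting the properties along one evaluation is a runtime check on a single program, not a proof.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Num:
    n: int


@dataclass(frozen=True)
class Bool:
    b: bool


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class If:
    cond: object
    then: object
    other: object


def is_value(e):
    return isinstance(e, (Num, Bool))


def type_of(e):
    """Static type of e ('num' or 'bool'); raises TypeError if e is ill-typed."""
    if isinstance(e, Num):
        return "num"
    if isinstance(e, Bool):
        return "bool"
    if isinstance(e, Add):
        if type_of(e.left) == "num" and type_of(e.right) == "num":
            return "num"
        raise TypeError("Add needs two num operands")
    if isinstance(e, If):
        t_then, t_else = type_of(e.then), type_of(e.other)
        if type_of(e.cond) == "bool" and t_then == t_else:
            return t_then
        raise TypeError("If needs a bool condition and branches of one type")
    raise TypeError(f"unknown expression {e!r}")


def step(e):
    """One small-step reduction; returns None when no rule applies."""
    if isinstance(e, Add):
        if not is_value(e.left):
            inner = step(e.left)
            return None if inner is None else Add(inner, e.right)
        if not is_value(e.right):
            inner = step(e.right)
            return None if inner is None else Add(e.left, inner)
        if isinstance(e.left, Num) and isinstance(e.right, Num):
            return Num(e.left.n + e.right.n)
        return None  # e.g. Add(Bool, Num) is stuck
    if isinstance(e, If):
        if not is_value(e.cond):
            inner = step(e.cond)
            return None if inner is None else If(inner, e.then, e.other)
        if isinstance(e.cond, Bool):
            return e.then if e.cond.b else e.other
        return None  # a non-boolean condition is stuck
    return None  # values do not step


def run_checked(e):
    """Evaluate a well-typed e, asserting progress and preservation at each step."""
    expected = type_of(e)
    while not is_value(e):
        nxt = step(e)
        assert nxt is not None, "progress violated: well-typed term is stuck"
        assert type_of(nxt) == expected, "preservation violated: type changed"
        e = nxt
    return e
```

For example, `run_checked(If(Bool(True), Add(Num(1), Num(2)), Num(0)))` reduces to `Num(3)` with both assertions holding at every step, whereas the ill-typed `Add(Bool(True), Num(1))` is rejected by `type_of` and, if stepped anyway, is stuck.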
Practical Characteristics of Type Systems
In practice, talking about 'strong' and 'weak' is not very useful. Whether a type system has loopholes is less important than the exact number and nature of the loopholes, how likely they are to arise in practice, and what the consequences are of exploiting a loophole. Practically, it's best to avoid the terms 'strong' and 'weak' altogether because: amateurs often conflate them with 'static' and 'dynamic'; some people apparently use 'weak typing' to describe the relative prevalence or absence of implicit conversions; professionals cannot agree on the exact meaning of these terms; and overall, you are unlikely to inform or enlighten your audience.
When it comes to type systems, 'strong' and 'weak' do not have a universally agreed technical meaning. If you want to discuss the relative strength of type systems, it's better to discuss exactly what guarantees are and are not provided. For example, a good question is: 'Is every value of a given type guaranteed to have been created by calling one of that type's constructors?' In C, the answer is no. In CLU, F#, and Haskell, it is yes.
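The constructor question can be asked of any language. As a hedged illustration, the sketch below suggests that for an ordinary Python class the answer is also no, because an instance can be allocated without running the class's `__init__`. The class `Even` and its invariant are hypothetical.

```python
class Even:
    """Hypothetical type whose initializer enforces an invariant: n is even."""

    def __init__(self, n):
        if n % 2 != 0:
            raise ValueError("Even requires an even integer")
        self.n = n


ok = Even(4)  # created the intended way, so the invariant holds

# Loophole: allocate an instance without running __init__, then fill it by hand.
forged = object.__new__(Even)
forged.n = 3

# The forged object is a genuine Even as far as the runtime is concerned,
# yet it violates the invariant that the initializer was meant to guarantee.
assert isinstance(forged, Even)
assert forged.n % 2 == 1
```

Stating the guarantee this precisely says more about the type system than labelling the language 'strong' or 'weak' would.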
Relationships Between Type System Dimensions
On a pedantic level, there is no implication between strong typing and static typing because the word 'strong' doesn't really mean anything. But in practice, people almost always do one of two things: they incorrectly use 'strong' and 'weak' to mean 'static' and 'dynamic,' in which case they incorrectly use 'strongly typed' and 'statically typed' interchangeably; or they use 'strong' and 'weak' to compare properties of static type systems. It is very rare to hear someone talk about a 'strong' or 'weak' dynamic type system.
Either way, if a person calls a language 'strongly typed,' that person is very likely to be talking about a statically typed language. This association stems from the concept of strong typing primarily applying to the context of static type systems, while dynamic type systems naturally prevent type system circumvention through their runtime checking mechanisms.
Analysis of Programming Language Examples
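Consider the type characteristics of C: void* ptr = malloc(sizeof(int)); int* iptr = (int*)ptr; The cast is accepted without any check that the memory actually holds an int, and the same cast syntax converts between unrelated pointer types. This example demonstrates the weak typing nature of C, where programmers can circumvent type system restrictions through explicit type casting.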
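In contrast, in Haskell: let x = 5 :: Int; let y = "hello" :: String Both bindings are well typed on their own, but an expression that combines them, such as x + y, is rejected by the type checker at compile time because of the type mismatch, and the language offers no cast that reinterprets a String as an Int. This demonstrates the characteristics of strongly typed languages.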
In a dynamically typed language such as Python: x = "hello"; x = 42 shows that types are bound to values rather than to variables, so the same name can refer first to a string and then to an integer, while "hello" + 42 raises a TypeError at runtime, illustrating dynamic type checking.
Relationship Between Type Safety and Memory Safety
Type safety is closely linked to memory safety. For instance, in an implementation of a language that has some type which allows certain bit patterns but not others, a dangling pointer memory error allows writing a bit pattern that does not represent a legitimate member into a dead variable, causing a type error when the variable is read.
Conversely, if a language is memory-safe, it cannot allow an arbitrary integer to be used as a pointer, hence there must be a separate pointer or reference type. As a minimal condition, a type-safe language must not allow dangling pointers across allocations of different types.
Most type-safe languages use garbage collection. However, Rust is generally considered type-safe and uses a borrow checker to achieve memory safety instead of garbage collection.