Assembly Language

From Gentoo Wiki

Assembly language is the lowest level of all programming languages, typically represented as a series of CPU-architecture-specific mnemonics and related operands. Such mnemonics stand in for precise byte sequences, called machine language, which is the code the CPU actually executes at runtime. Assembly language is imperative by nature, though clever use of macros and data structures often permits other programming paradigms. Even object-oriented programming is possible at this level.

An assembler is usually the first piece of software written for a new CPU architecture. Typically, a Forth follows, as it is a self-bootstrapping language that makes an excellent test to confirm a new assembler works as expected. Lastly, a C compiler back end is added targeting the new CPU instruction set. Give or take some intermediate representation, this must ultimately emit machine language in the final stages of compilation. With these three pieces of software written and confirmed functional, existing programs can be ported to the new CPU architecture.

Writing assembly language directly isn't as common as it once was but it is by no means a rare or unusual skill. Security researchers make frequent use of assembly language directly, as do embedded systems programmers and compiler designers. Further, many programmers advocate at least a minimal working knowledge of assembly as it clarifies certain advanced concepts, notably pointers.

Assembly language terminology is often a source of confusion for programmers who are unaccustomed to it. Unfortunately, even official documents have been known to get assembly related terminology wrong. For clarity:

  • The act of converting assembly language source code into an executable is called assembly.
  • Assembly instructions are built from opcodes. Each opcode, together with its addressing mode, represents a unique sequence of bytes.
  • Each opcode accepts zero or more operands as arguments depending upon its addressing mode.
  • The addressing mode of an opcode dictates how the instruction interacts with memory.
  • The resulting binary contains machine language instructions that are specific to the target CPU's instruction set.
  • A program that performs this assembly process is called an assembler.
  • Modern assemblers support macros which combine small groups of instructions and may take arguments, to make code easier to read and maintain.
  • Most also support pseudo operations which are aliases built into the assembler itself, typically to stand in for an opcode with a specific set of arguments that performs a common task.
  • The term hand written assembly usually refers to developing software directly in an assembler without the aid of a compiler. Much more rarely, the phrase means writing an assembly language program with pen and paper, then transcribing it directly into machine language byte values with a hex editor.
  • Converting an existing binary back to a human readable programming language — assembly language or otherwise — is called disassembly. This process is as much art as science because it can be difficult to differentiate CPU instructions from the surrounding data.
  • Disassembly is a useful skill in various professional circles, especially cybersecurity and malware analysis. As a hobby it is most common in retrocomputing circles.
  • Disassembling a binary to learn how it works is called reverse engineering, which is a legally complex topic in most jurisdictions.
  • Disassembling a binary to reconstruct previously lost source code is called source code archaeology. This is occasionally necessary when the original source code to an important legacy program is lost and the program must be patched to address a bug or to add new features.
  • For the sake of completeness: many modern CISC processors use microcode (somewhat akin to on-chip firmware) as part of their instruction decoding logic. In practice this means that such processors can be patched to a limited extent if hardware bugs are found. Assembly language is not microcode; they are different technologies.

So, yes, an assembler assembles an assembly language program into a machine language executable. A disassembler disassembles an architecture-specific machine language binary into (sort of) human-readable source code. The resulting code may or may not actually be assembly language (it may be C or something else), but it is still called "disassembly".
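The terminology above can be made concrete with a small sketch. The Python below is a toy, not a real assembler: the table layout and the function names `assemble` and `disassemble` are hypothetical, and only six entries from the MOS 6502 instruction set are included. It shows how a mnemonic plus an addressing mode selects one opcode byte, how operands follow that byte, and why a naive disassembler cannot tell trailing data apart from instructions.

```python
# Toy sketch: a six-entry subset of the MOS 6502 instruction set.
# Table layout and function names are hypothetical, not a real assembler's API.

# (mnemonic, addressing mode) -> (opcode byte, operand size in bytes)
OPCODES = {
    ("LDA", "immediate"): (0xA9, 1),  # LDA #$nn
    ("LDA", "zeropage"):  (0xA5, 1),  # LDA $nn
    ("LDA", "absolute"):  (0xAD, 2),  # LDA $nnnn
    ("STA", "absolute"):  (0x8D, 2),  # STA $nnnn
    ("NOP", "implied"):   (0xEA, 0),
    ("RTS", "implied"):   (0x60, 0),
}

# Reverse table for the disassembler: opcode byte -> (mnemonic, mode, size)
DECODE = {opcode: (mnemonic, mode, size)
          for (mnemonic, mode), (opcode, size) in OPCODES.items()}


def assemble(program):
    """Turn (mnemonic, mode, operand) tuples into machine language bytes."""
    out = bytearray()
    for mnemonic, mode, operand in program:
        opcode, size = OPCODES[(mnemonic, mode)]
        out.append(opcode)
        if size:
            # The 6502 stores multi-byte operands low byte first.
            out += operand.to_bytes(size, "little")
    return bytes(out)


def disassemble(code):
    """Linear sweep: decode every byte as if it were an instruction."""
    listing, i = [], 0
    while i < len(code):
        if code[i] not in DECODE:
            listing.append((".byte", "data", code[i]))
            i += 1
            continue
        mnemonic, mode, size = DECODE[code[i]]
        operand = int.from_bytes(code[i + 1:i + 1 + size], "little") if size else None
        listing.append((mnemonic, mode, operand))
        i += 1 + size
    return listing


program = [
    ("LDA", "immediate", 0x01),
    ("STA", "absolute", 0x0200),
    ("RTS", "implied", None),
]
machine_code = assemble(program)
print(machine_code.hex(" "))                 # a9 01 8d 00 02 60
print(disassemble(machine_code) == program)  # True

# Two data bytes appended after the code happen to match NOP and RTS,
# so the naive disassembler reports them as instructions.
print(disassemble(machine_code + bytes([0xEA, 0x60]))[-2:])
```

Note that the same mnemonic, LDA, maps to three different opcode bytes depending on its addressing mode. Real disassemblers use control-flow analysis and other heuristics to reduce the code-versus-data ambiguity shown in the last line.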

Adding to this page

Gentoo strives to be a Linux distribution that developers love to use.

The architectures listed here in the main section should each be credible Gentoo Linux installation targets. Less powerful CPUs should be placed into the embedded section towards the bottom of the page, provided an assembler ebuild exists in Portage or GURU. Architectures that used to serve as a Gentoo installation target but are no longer suitable for some reason should be moved to a Retired Architectures section or to the embedded section, whichever makes the most sense.

In general, entries should have the following details:

  1. Who originally designed the ISA.
  2. A few short words about its history and technical details, e.g., whether it is RISC or CISC and its register width, etc.
  3. What market niche the ISA fills.
  4. Anything interesting or unique about assembly language development on the ISA.
  5. Any special tendency the ISA has towards exposing bugs in code that might go unnoticed on other platforms.

Each and every architecture entry here is an invitation to write a corresponding guide that covers best practices for assembly language development on the platform, especially if it's a lesser known or niche ISA. Also, Gentoo maintainers of less popular CPU architectures need your help! Consider contributing to an ISA's Gentoo project, linked from the architecture's name.

Developing in Assembly language on Gentoo

Gentoo has a wide variety of assemblers available that target many major and minor CPU architectures. While the GNU Assembler is the most ubiquitous, it is far from the only option.

  • sys-devel/binutils — Includes the GNU Assembler which is a well-worn part of the GNU compiler collection tool chain targeting multiple CPU architectures.
  • dev-lang/jwasm — A MASM compatible assembler which began as a fork of WASM.
  • dev-lang/mmix — Donald Knuth's MMIX Assembler and Simulator.
  • dev-lang/nasm — A popular x86/amd64 assembler.
  • dev-lang/yasm — An assembler for the x86 and x86_64 instruction sets.

Developing for Embedded Architectures


Learning assembly language today

Warning
Thou shalt not link to pirated books! Although direct links to freely available content are preferred, citing well respected texts that are readily available on the secondary (used) market is perfectly fine. That said, please DO NOT link to pirated copies of programming books, including long out-of-print books. Showing disrespect for the intellectual property rights of others could put the Gentoo project at legal risk and nobody wants that. Linking to official copies of previously published texts that have subsequently been made available for download by the original author(s) or current rights holder is acceptable.

While learning to program in assembly is challenging, it is far easier today than it was in the past. Large, expensive books on a given CPU architecture were once required in order to master the subject. While such books are still available, mostly targeting academic use, the Internet has spread knowledge of assembly language far and wide. There are groups of programmers entirely devoted to learning assembly, and YouTube channels exist that cover the subject in great depth for nearly every ISA, current or historical.

This list is divided into two parts. The first of these covers CPU architectures that are solid Gentoo Linux installation targets, or have fulfilled such a role in the reasonably recent past. The second part of the list concerns itself with embedded CPUs that are unlikely to serve as installation targets. Admittedly there are some edge cases here: many MIPS-based computers can realistically support a Gentoo Linux installation, so MIPS is in the first list, even though some MIPS computers make unlikely installation targets. On the other side of the fence, the venerable Motorola M68k line of embedded CPUs can theoretically support a Gentoo Linux installation, given enough RAM, but such a configuration is rare.

Acorn

Acorn Computers developed the ARM1 circa 1983 for the Acorn Archimedes and as a 32-bit expansion module for the BBC Micro line of 8-bit home computers. Famously the ARM1 CPU was designed in a few short weeks and prioritized low power consumption above nearly everything else in order to prevent thermal overload of the CPU package. (Details on the ARM's design history are well covered in the YouTube documentary: How Amateurs created the world´s most popular Processor.) Today the 32-bit and 64-bit descendants of the original ARM1 processor run in thousands of mobile devices. It is perhaps best known as the heart of the Raspberry Pi line of single board computers. Knowledge of ARM assembly language is a sought-after skill for firmware, embedded, and mobile developers.

ARM

ARM 64

AIM Alliance

The PowerPC is a RISC architecture that was designed by the AIM alliance in 1994. It was intended as a 32-bit ISA but was later extended to 64 bits. It is best known as the CPU of choice for 1990s Macintosh computers. Today it is promoted by the OpenPOWER Foundation as "The Most Open and High-Performance Processor Architecture and Ecosystem in the Industry." With the exception of Power10 (a relatively closed design compared to its counterparts) this statement is true.

At present PowerPC CPUs are most commonly used in high end embedded devices. Many automotive single board computers run some form of the Power ISA. In the aerospace industry the Power ISA is common, and NXP is well known for producing radiation hardened models for use in space satellites. PowerPC is not entirely absent from the personal computer market, and some very high end servers run PowerPC CPUs. Owing to its openness the PowerPC is the CPU of choice in the Linux Open Hardware PowerPC notebook currently under development.

Gentoo's PowerPC project remains active but would welcome new contributors. Even a system as old as the Mac Mini G4 remains a viable Gentoo installation target. It should be noted, however, that high end Power ISA servers exist and are equally valid Gentoo Linux installation targets. Those seeking to learn PowerPC and PowerPC 64 assembly language run the gamut from those interested in retro Mac development to embedded development in the automotive or aerospace industries.

The PowerPC's endianness choices (technically bi-endian, but historically big endian by default) have a reputation for revealing certain kinds of programming errors. Such errors tend to happen when a programmer assumes all build targets are little endian like Intel and ARM CPUs. Endianness issues tend to be subtle, so developers are encouraged to compile their code for big endian PowerPC and run appropriate unit tests. PowerPC(64) Big Endian can be emulated with qemu and chroot.
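As a minimal sketch of the kind of error described above, the Python below serializes one 32-bit value in both byte orders and then reads it back with a function that hard-codes little endian. The function names are hypothetical and chosen only for illustration. The same class of bug appears in C when a byte buffer is cast directly to an integer pointer.

```python
import struct
import sys

value = 0x11223344

# The same 32-bit integer serialized in both byte orders.
as_little = struct.pack("<I", value)   # bytes 44 33 22 11
as_big = struct.pack(">I", value)      # bytes 11 22 33 44


def read_u32_assuming_little(buf):
    """Buggy on big endian data: hard-codes the byte order."""
    return int.from_bytes(buf[:4], "little")


def read_u32(buf, byteorder):
    """Portable: the caller states the byte order of the data explicitly."""
    return int.from_bytes(buf[:4], byteorder)


print(sys.byteorder)                             # byte order of the host running this script
print(hex(read_u32_assuming_little(as_little)))  # 0x11223344
print(hex(read_u32_assuming_little(as_big)))     # 0x44332211, silently wrong
print(hex(read_u32(as_big, "big")))              # 0x11223344
```

Nothing crashes in the buggy case; the value is simply wrong, which is why such errors tend to surface only when the code is actually run on (or emulated for) a big endian target.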

PowerPC

PowerPC 64

  • Power ISA Instruction Set Architecture by the Open Power Foundation Instruction Set Architecture Technical Working Group — extensive technical documentation on the Power ISA and its various revisions, PDF available. Covers both 32-bit and 64-bit PowerPC modes.

Digital Equipment Corporation

The Alpha was a 64-bit RISC processor developed by Digital Equipment Corporation (DEC) and first introduced in 1992. It was designed for the minicomputer and mainframe market, not the home microcomputer market. The Alpha was discontinued after Compaq acquired DEC in 1998, as Compaq chose to continue with the Intel Itanium instead. Today, interest in the Alpha line continues mostly among mainframe computer hobbyists, some of whom choose Gentoo Linux as their preferred operating system.

Alpha's design choices have a reputation for making certain kinds of programming errors much more obvious. Potentially dangerous memory access bugs that would go unnoticed on other architectures cause programs to segfault on this architecture. This is detailed in Gentoo's Alpha Porting Guide. Gentoo's Alpha project would welcome new contributors. Developers are encouraged to compile their code for the Alpha, which can be emulated with qemu and chroot, as a means of bug-hunting.

Alpha

  • Alpha RISC Architecture for Programmers by James S. Evans and Richard H. Eckhouse — highly detailed academic text on writing DEC Alpha assembly language.
  • Alpha Architecture Handbook by Digital Equipment Corporation — the official Alpha handbook, freely available on the Digital.com website circa 1998 shortly after its acquisition by Compaq. Site preserved by Archive.org.
  • Alpha Architecture Reference Manual edited by Richard L. Sites — an official Alpha manual from DEC.
  • Alpha Assembly Language Guide (PDF) by Randal E. Bryant — a short tutorial detailing the inner workings of the Alpha processor.
  • 21164 Alpha Microprocessor Hardware Reference Manual by Digital Equipment Corporation — freely available on the Digital.com website circa 1998 shortly after its acquisition by Compaq. Site preserved by Archive.org.
  • Alpha 21164PC Microprocessor Hardware Reference Manual by Digital Equipment Corporation — freely available on the Digital.com website circa 1998 shortly after its acquisition by Compaq. Site preserved by Archive.org.
  • How much better was DEC Alpha than contemporaneous x86? — a StackExchange article detailing the strengths and weaknesses of the Alpha's design.

Hewlett-Packard

Hewlett-Packard introduced the Hewlett Packard Precision Architecture (HPPA) in 1986. Its original incarnation has a mix of 32-bit and 64-bit registers; in 1996 its ISA was extended to a pure 64-bit design. It was intended for high end HP servers and workstations of its era. The HPPA line was discontinued in 2008, having been displaced in its role by Intel's Itanium before that architecture was also discontinued. Today there are older but functional HPPA workstations and servers that continue to see use past their end-of-life. Some owners of legacy HPPA hardware choose to run Gentoo Linux. Those who learn HPPA assembly language typically do so out of historical interest.

HPPA PA-RISC

Intel

Intel famously created the world's first CPU on an integrated package with the release of the Intel 4004. Other companies followed suit and a short time later hobbyist computer kits became available sparking the home microcomputer revolution.

The x86 line consists of 32-bit descendants of the 16-bit Intel 8086. It remains a viable installation target for Gentoo Linux. Today the oldest Intel CPU that a modern Linux kernel can run on is an i486, though this may soon shift to the Intel Pentium.

The 64-bit Intel Itanium was an EPIC (explicitly parallel instruction computing) design originally intended for the high end server market. It intentionally broke backwards compatibility with the x86 line in order to make it easier to implement improvements in CPU design. The AMD64 is a 64-bit extension of the 32-bit x86 line. It was developed by AMD in 2000 and proved so successful that Intel licensed the technology from its competitor. Today, most Gentoo Linux installation targets are 64-bit and many of those are AMD64. The Itanium was only recently discontinued and was always less common than AMD64 PCs; its current status as an installation target is covered in the note under Itanium below.

Intel assembly language is written in one of two main syntax families:

  • AT&T (.att_syntax): operand order is source → destination.
  • Intel (.intel_syntax): operand order is destination ← source (further divided into NASM-style and MASM/TASM-style dialects).
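The operand order difference can be sketched with a few lines of Python. The helper `intel_to_att` is hypothetical and deliberately narrow: it assumes a two-operand instruction whose operands are both plain registers, and it only swaps the operands and adds the % register prefix used by AT&T syntax. Immediates, memory operands, and size suffixes differ between the two families as well and are not handled here.

```python
def intel_to_att(line):
    """Rewrite 'op dest, src' (Intel order) as 'op %src, %dest' (AT&T order).

    Hypothetical helper for illustration; assumes exactly two register operands.
    """
    mnemonic, operands = line.split(None, 1)
    dest, src = [operand.strip() for operand in operands.split(",")]
    return f"{mnemonic} %{src}, %{dest}"


print(intel_to_att("mov eax, ebx"))   # mov %ebx, %eax
print(intel_to_att("add rax, rcx"))   # add %rcx, %rax
```

In both spellings the instruction copies ebx into eax; only the written order of the operands changes.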

Knowledge of Intel assembly language is useful for a great many things, but especially compiler optimization, vulnerability analysis, and malware reverse engineering. All of these skills are in high demand.

x86

AMD64

Itanium

Note
The Linux kernel dropped support for Intel Itanium in late 2023 and glibc followed suit in early 2024. Without kernel support it is not possible to boot very recent versions of the Linux kernel on Itanium hardware. Although other C runtimes exist, without glibc support maintaining IA64 as a C compilation target is much harder.

For more information, see: Gentoo Linux drops IA-64 (Itanium) support.

Loongson

The first LoongArch CPU was released in 2021 by the privately owned but Chinese state-controlled company Loongson Technology. It is a 64-bit RISC design intended to compete domestically with its modern Intel, ARM, and RISC-V counterparts. Some server-class designs already exist, and future designs for defense and aerospace applications are anticipated. In addition, Loongson intends to produce LoongArch processors for some embedded applications and to embrace and extend the MIPS design for others.

Learning LoongArch assembly language is a useful skill for any Gentoo user interested in this unique CPU design as a learning exercise, for anyone seeking to implement or improve a compiler on the platform, and for anyone interested in embedded aerospace or defense applications of the ISA. Gentoo's LoongArch project is in its early days and would welcome new contributors.

LoongArch

LoongISA

The LoongISA is a superset of MIPS64, see the MIPS Section.

International Business Machines

International Business Machines (IBM) introduced the System/390 mainframe series in 1990. It is an evolution of the System/360, with various models having different processor counts and system memory configurations based upon customer needs. The S/390 has the distinction of being the first high end mainframe built with processors implemented via CMOS fabrication. Architecturally, the S/390 combines multiple big endian 32-bit CISC processors with 64-bit floating point math support. Some models support as many as 10 single core processors and a (then staggering) 32 GB of RAM. The S/390 series was a highly successful product run and support was discontinued by IBM in 2004. Given the tendency of mainframe installations to greatly outlive their manufacturer support contracts, it's no great surprise that some of these units remain in service. More than a few S/390 owners have chosen to migrate to Gentoo Linux for the remainder of their mainframe's useful service lives. Gentoo maintains a sys-apps/s390-tools package which contains device drivers and userspace tools unique to the architecture.

Those interested in the assembly language of the S/360 series are likely those with a history with, or historical interest in, mainframe architecture. Gentoo's S/390 project would welcome new contributors! For those without access to real hardware, the Hercules emulator (app-emulation/hercules) can handle most workloads on commonly available PC hardware.

System/390

MIPS Computer Systems

MIPS Computer Systems released the first MIPS microprocessor in 1985. Both 32-bit and 64-bit implementations of the ISA exist. It's an extremely well-studied design which has gone on to influence nearly all RISC designs that came after it. It originated as a high end server design but found new life in later decades as an embedded CPU. A great many network devices run MIPS processors.

Recently the MIPS corporation has shifted away from the MIPS architecture and has committed itself to designing RISC-V cores going forward. MIPS Open Architecture has relatively permissive licensing terms, and third parties, most notably Loongson, intend to continue to evolve the platform. MIPS remains a viable installation target for Gentoo Linux. That said, the Gentoo MIPS project welcomes new contributors. Those wishing to learn MIPS assembly often do so in academic environments or as prospective embedded systems programmers, most commonly targeting network appliances.

MIPS

Sun Microsystems

Sun Microsystems released the first SPARC processor in 1987 for its line of servers. While the two designs have similarities, SPARC is more a descendant of Berkeley RISC than of MIPS. Early SPARC processors were 32-bit big endian processors. Later processors were bi-endian and 64-bit. Oracle, who acquired Sun, ceased development of the SPARC processor in 2017. The last remaining SPARC producer, Fujitsu, is currently in the process of ending production of SPARC hardware and shifting towards the production of ARM CPUs.

A large number of high end workstations and servers running SPARC processors exist on the secondary (used) market, most of them perfectly suitable as Gentoo installation targets. The Gentoo SPARC team would welcome new contributors. Knowledge of SPARC is useful to anyone who seeks to broaden their knowledge of the inner workings of an interesting RISC ISA.

SPARC

University of California, Berkeley

RISC-V began as a project at the University of California, Berkeley in 2010 as an effort to produce an open standard ISA. There are 32-bit and 64-bit variants. In addition to the core instruction set, RISC-V's modular design allows it to be easily extended to add custom features. RISC-V is a workable installation target for Gentoo Linux, but it is not yet considered stable. The Gentoo RISC-V project is seeking volunteers to join its ranks as it's hard to keep up with everything happening in the RISC-V space. The RISC-V ISA is experiencing rapid growth in the single board computer and embedded markets. Server versions of the chip, including those that support virtualization, are already in development.

Those seeking to learn RISC-V assembly may do so for any number of reasons. A working knowledge of RISC-V assembly language can open doors in multiple market segments.

RISC-V

RISC-V has its own assembly syntax, which is neither AT&T nor Intel and is best described simply as "RISC-V syntax". There is no $ on register specifiers, and operand order is normally destination, source.

Embedded Processors and Microcontrollers

In modern usage a microcontroller is not just an ultra low-spec CPU by contemporary standards. Modern microcontrollers are processors with their own RAM and ROM on a single package. This isn't quite the same thing as a System on a Chip, but the two concepts are similar and both exist with the goal of reducing total chip count to reduce production cost. A good many CPUs that are now produced exclusively as microcontrollers or even just FPGA cores were once considered powerful enough to be the CPU of one or more lines of home computer. Edge cases exist, but in general most microcontrollers do not make good Gentoo Linux installation targets, even with a binhost providing the heavy lifting required to compile packages.

One of the few CPUs on this list that can theoretically support a Gentoo Linux install, provided it has enough RAM, is the Motorola M68k, but doing so is considered highly experimental.

Atmel AVR

Atmel introduced the AVR microcontroller in 1997. It is among the first microcontrollers to contain on-chip flash memory, a practice that is now standard. The original model is an 8-bit RISC design of which the ATmega8 variant, popularized by early Arduino models, is perhaps the best known. The 32-bit AVR32, also produced by Atmel, bears little actual resemblance to its 8-bit counterpart. While Atmel has largely switched to producing ARM processors, 8-bit AVR microcontrollers are still produced in large numbers. Those interested in early Arduino single board computers or hardware hacking more generally will likely be among those who find AVR assembly language useful.

Microchip Technology PIC

MOS Technology 65xx

The 6502 was an early 8-bit processor produced by MOS Technology. It was the first successful example of a simple "low spec" processor. As it was extraordinarily inexpensive to produce, it made its way into dozens of 1970s and 1980s home computers. Eventually, MOS was acquired by Commodore Semiconductor Group, who produced a number of variants of the chip, notably the 6510 for the Commodore 64. After Commodore's bankruptcy Western Design Center acquired the rights to the 6502. It now produces a number of variants of the processor, including 16-bit derivatives. The 6502 continues to be produced in great numbers, and today 65xx processors are embedded in thousands of devices, many of them peripherals. Static core variants of the processor are heavily used in implanted medical devices, due to their ability to consume no power in a sleep state and wait for an externally triggered interrupt from an attached sensor. Those interested in 6502 assembly run the gamut from retrocomputing enthusiasts to hardware design engineers.

  • Programming the 6502 by Rodnay Zaks — generally considered "the 6502 Bible." It was very often used in undergraduate courses on 6502 assembly programming. This text is widely available on the secondary (used) market.
  • NMOS 6510 Unintended Opcodes: No More Secrets, v0.96 (2021-12-24) (PDF) — provides unprecedented details into how the 6502/6510's (in)famous "undocumented opcodes" work. This text is only relevant for the NMOS variants of the 6502 and 6510, common in 1980's home computers. Modern CMOS derivatives of the 65xx produced by the Western Design Center removed these.
  • Western Design Center 65xx Resources — including datasheets and official manuals.
  • Wikibooks 6502 Assembly — a long one-page primer on 6502 assembly.
  • Nerdy Nights Tutorial — a series of articles intended to bring the reader up to speed on the 6502 over the course of a few weeks. This series is popular with the NES and SNES development crowd.
  • 6502 Instruction Set — a thorough and highly readable overview of the original 6502 processor's instructions, including undocumented instructions. The document does not cover the instructions added to CMOS variants of the chip.
  • Rosetta Code: 6502 Assembly — a fairly substantial introduction to 6502 assembly followed by links to articles on how to perform various programming tasks and build common data structures on that architecture.

Motorola M68k

Motorola produced the original M68k series of microprocessors. That IP was sold to Freescale Semiconductor, which produces the Freescale 683XX, a line that is backwards compatible with the original M68k and is still produced in large numbers. Freescale eventually merged with NXP Semiconductor. Hence, NXP's website hosts content that asserts a Motorola or a Freescale copyright and references Freescale processors with M68k instructions.

Zilog Z80

Zilog's Z80 processor entered serial production in 1976. Ever since, the chip has been in continuous production in various form factors. Early in its history, the Z80 was a major player in the business and home microcomputer market thanks to the rise of CP/M, an early operating system that became a standard for almost a decade. Today, the Z80 and its variants are found in the majority of consumer electronics either as the device's CPU or as a support processor. Z80 assembly language is a desirable skill for those seeking to develop skills useful to the consumer embedded electronics market segment.

  • Programming the Z80 by Rodnay Zaks — generally considered a seminal manual on the inner workings of the Z80. It was very often used in undergraduate courses on Z80 assembly programming. This text is widely available on the secondary (used) market.
  • The Undocumented Z80 Documented by Sean Young and Jan Wilmans — a comprehensive analysis of unintentionally created opcodes on the Z80. Like its 6502 counterpart, it's mostly only relevant for early models of the processor.
  • Rosetta Code: Z80 Assembly — a short introduction to Z80 assembly followed by links to articles on how to perform various programming tasks and build common data structures on that architecture.

Virtual Machine Assembly Languages

The two most significant virtual machines that run byte code, effectively machine language for the VM, are Java and WebAssembly. Java, for better or worse, is ubiquitous in modern enterprise computing and in mobile computing. While no CPUs presently exist that have an ISA that maps directly to Java byte code, Java performance is a major consideration for nearly all modern processor designs.

WebAssembly is a virtual machine execution environment primarily targeting modern web browsers. This is distinct from a web browser's two other virtual machines, those being its rendering engine and its JavaScript execution environment. WebAssembly is marketed as allowing web-based application code to be executed at near-native speed.

Direct assembly or disassembly is rare and primarily limited to debugging. In both cases, the respective virtual machine byte codes are primarily intended as compilation targets.
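To make "byte code as machine language for the VM" a little more concrete, the sketch below inspects the first eight bytes of a file the way a debugging tool might. It relies on two well-known header layouts: a Java class file begins with the magic number 0xCAFEBABE followed by big endian minor and major version fields, and a WebAssembly binary module begins with the bytes \0asm followed by a little endian version field. The function name `identify_bytecode` and its tuple return values are hypothetical, chosen only for this illustration.

```python
import struct


def identify_bytecode(data):
    """Classify a byte string by its header (hypothetical helper)."""
    if len(data) >= 8 and data[:4] == b"\xca\xfe\xba\xbe":
        # Java class file: u2 minor_version, then u2 major_version, big endian.
        minor, major = struct.unpack(">HH", data[4:8])
        return ("java-class", major, minor)
    if len(data) >= 8 and data[:4] == b"\x00asm":
        # WebAssembly binary module: 32-bit version field, little endian.
        (version,) = struct.unpack("<I", data[4:8])
        return ("wasm", version)
    return ("unknown",)


# Hand-built headers for illustration; real files continue past these bytes.
java_header = bytes.fromhex("cafebabe 0000 0034")          # major version 52
wasm_header = b"\x00asm" + (1).to_bytes(4, "little")       # version 1

print(identify_bytecode(java_header))   # ('java-class', 52, 0)
print(identify_bytecode(wasm_header))   # ('wasm', 1)
print(identify_bytecode(b"\x7fELF"))    # ('unknown',)
```

Note that the two formats disagree on byte order for their version fields, so even a header check has to be explicit about endianness.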

Java Bytecode

WebAssembly

See also

  • Binutils — a set of tools for creating and manipulating certain types of binary files.
  • C — a programming language developed for Bell Labs in the early 1970s.
  • C++ — a general-purpose programming language that originated from C.
  • Fortran — a general-purpose, compiled imperative programming language that is especially suited to numeric computation and scientific computing.
  • Forth — a heavily stack-oriented self-compiling procedural programming language that is only slightly more abstract than assembly.
  • Rust — a general-purpose, multi-paradigm, compiled, programming language.

External resources