More Than Enough Linux to Get By: Part 1

First published: .

Part 1: Linux Is About Options

Pre-Rant

In my years of working as a software engineer, leading developer teams, interviewing hundreds of candidates, and—since opening my business—"travelling" through the R&D teams of various software companies, it has always been my impression that—despite the sweeping production usage of Linux in these companies—most developers (and companies) only have a tenuous grasp of what Linux even is. This is a shame, since new generations of developers are progressively losing information that is absolutely vital to the art/practice/occupation of writing software.

Most prominently, developers are increasingly blocked off from matters related to application deployment, system orchestration, software packaging and distribution, all now part of what is commonly referred to as DevOps. You know, that lie that promised to shift responsibilities left towards the developer (The Ole Shift-Left Maneuver), but really only renamed the unfashionable "System Administrators Team" to the hashtag-worthy "DevOps Team" and installed Terraform. But I digress.

The advantages to you, as a developer, of strengthening your knowledge and understanding of Linux, could be immense, and perhaps set you apart from your colleagues. With most development teams using Microsoft Windows or Apple Mac on workstations, developers are growing more and more unfamiliar with their production environments. Let's change that.

In this series of articles, I will be giving you more than enough Linux knowledge to get by. You will learn more about operating systems in general and Linux in particular; how open-source software gets distributed; practical usage of the command line; Linux/Unix-targeted software development; and more. I will try to make it as practical as possible, but this first one will be more "theoretical". Note that we are not learning about Linux kernel development, although we will obviously be talking about the kernel throughout this series.

When I use the term/name "Linux," I mean it as it is currently being used in the industry: a full operating system with some basic expectations as to its usage. This is an incorrect definition, but it can only be corrected after first explaining about operating systems in general, so follow along to the next section.

If you don't really know what an operating system is, please refer to Wikipedia for that. The first sentence is a pretty good definition: "An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs".

Prevalent Operating Systems

Note that I will be limiting our discussion to desktop operating systems in this part, ignoring mobile operating systems, which will be discussed in a subsequent feature.

Aside from Linux, the most prevalent desktop operating systems in the industry today are:

Of course there are more, but these are the ones you are more likely to encounter in the industry. All of these operating systems share some common themes. For example, they are generally composed of two components: a kernel and a base system.

The kernel is the core program of the operating system. It is the only program that can access the hardware directly, which it does through "device drivers"; it provides an API for other programs to access the OS and the computer's hardware; and it manages the execution of those programs on that hardware, e.g. the CPU, the memory, the hard disks, etc.
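To make the "API for other programs" part a little more concrete, here is a minimal sketch in Python. It assumes nothing beyond the standard library; the function name write_and_read_back is made up for illustration. The functions in Python's os module are thin wrappers around operating system calls, so each line below is, roughly speaking, a request to the kernel rather than direct access to the disk.

```python
import os
import tempfile


def write_and_read_back(message: bytes) -> bytes:
    """Store bytes in a temporary file and read them back via low-level OS calls."""
    # Ask the OS to create and open a temporary file; we get back a file descriptor.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, message)              # request: store these bytes
        os.lseek(fd, 0, os.SEEK_SET)       # request: move back to the start of the file
        return os.read(fd, len(message))   # request: hand the bytes back
    finally:
        os.close(fd)                       # request: release the descriptor
        os.unlink(path)                    # request: remove the file


if __name__ == "__main__":
    print("process id assigned by the kernel:", os.getpid())
    print(write_and_read_back(b"hello, kernel\n"))
```

The program never talks to the disk controller itself. It only asks, and the kernel (through its device drivers) does the actual work.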

The base system is a standard set of software that comes "preloaded" with the operating system. For example, a command line interface (CLI), a graphical user interface (GUI), a text editor, a web browser, etc. This base system is developed and maintained by the same people/company/project that develop the kernel. They are both published and distributed together, as a unit, rather than separately. They most likely can't be used separately.

Here's a comparison of three of the most prominent operating systems:

| | Windows | macOS | OpenBSD |
|---|---|---|---|
| Developer | Microsoft | Apple | The OpenBSD Project |
| Kernel Type | Hybrid | Hybrid (XNU) | Monolithic |
| Base System: GUI | Windows shell | Aqua | X11 (Xenocara) |
| Base System: CLI | cmd / PowerShell | zsh | pdksh |
| Base System: Text Editor | Notepad | TextEdit | vi, ed, Xedit |
| Base System: Task Manager | Task Manager | Activity Monitor | ps |
| Base System: File Manager | Windows/File Explorer | Finder | cat, ls, rm… |
| Base System: Init System | It's complicated | launchd | init |

If some of these terms are unfamiliar to you, don't worry, they're either not important for the discussion, or will be elaborated upon later. What to take from this comparison is that when you encounter a computer with one of these operating systems on it, you can have some basic expectations with regards to the services/programs available on it, how to use it, and how it behaves.

So Where Does Linux Come In?

Linux is harder to discuss within the context of the previous comparison because it is a different beast. The name "Linux" really only belongs to an operating system kernel rather than a full operating system in the same sense as described before.

The Linux kernel is a Free Software/Open Source project created by Linus Torvalds in 1991. Today, much of the code in the Linux kernel has been contributed by companies such as Intel, Red Hat, IBM, Samsung and Google, along with many smaller companies and individual contributors. It is an independent program that can be used on almost every computer. But it doesn't come with a base system.

The kernel is written in C, and is highly modular. It can be compiled with different suites of configuration options, enabling or disabling features, and even choosing different implementations for certain features. For example, the kernel can be compiled with a different process scheduler, meaning the way processes are scheduled and executed on the operating system changes from the default. Some implementations are more suited for server usage while others are more suited for desktop usage, for example. You can compile the Linux kernel with or without certain device drivers and with or without support for certain technologies and hardware. If you don't have any need for USB interfaces (say, you're compiling Linux for embedded usage on a USB-less microcontroller), you can compile Linux without that support.
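The result of all those compile-time choices is recorded in a plain-text configuration file. On many distributions a copy can be found under /boot (named after the kernel release) or, when the kernel was built with that feature, in /proc/config.gz, but neither location is guaranteed. The sketch below parses the general shape of such a file. The sample fragment is illustrative and not taken from a real build, and parse_kernel_config is a hypothetical helper name.

```python
SAMPLE_CONFIG = """\
# Illustrative fragment, not taken from a real kernel build
CONFIG_USB_SUPPORT=y
CONFIG_EXT4_FS=m
# CONFIG_BTRFS_FS is not set
CONFIG_HZ=250
"""

NOT_SET_SUFFIX = " is not set"


def parse_kernel_config(text: str) -> dict:
    """Map option names to values: 'y' (built in), 'm' (module), 'n' (disabled), or a literal."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            # Disabled options are written as comments: "# CONFIG_FOO is not set"
            name = line[2:-len(NOT_SET_SUFFIX)]
            options[name] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


if __name__ == "__main__":
    for name, value in parse_kernel_config(SAMPLE_CONFIG).items():
        print(f"{name:24} {value}")
```

In this made-up fragment, USB support is compiled into the kernel, ext4 is built as a loadable module, and Btrfs support is left out entirely.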

With no base system bundled with the kernel, Linux had to rely on an existing system. During its original development, Linus used the MINIX base system with the Linux kernel. The name MINIX may be familiar to you if you've read Andrew Tanenbaum's book Modern Operating Systems, which is a standard textbook in Computer Science and Software Engineering studies. MINIX means "mini-Unix".

Wait, Wait, Wait, What's Unix?

By now, you may have heard that name mentioned many times in relation to Linux (or software development, whatever), but are too afraid to ask what it means. Unix was an operating system developed by AT&T beginning in 1969. After years of internal usage, AT&T started licensing Unix to other companies, leading to many Unix variants. The University of California, Berkeley created the BSD operating system. IBM created AIX. Sun Microsystems created SunOS (and later Solaris). Hewlett-Packard created HP-UX, and even Microsoft had a variant called Xenix.

Unix's major influence was its base system. It came with a comprehensive suite of tools geared towards software development: a C compiler called cc; a linker called ld; a text editor called ed; a build manager called make; a command line interface called sh; file management commands such as ls, cp, grep, find; a documentation reader called man; and many more. It was also highly influential in its file system hierarchy, its treatment of all files as simple streams of bytes, and its representation of various hardware devices as files. For example, a printer was represented as a file in the filesystem, and to print documents, you would write to that file (well, at least your printing program would do that for you).
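The "devices are files" idea is easy to try out. The sketch below assumes a Unix-like system where /dev/zero and /dev/urandom exist (it checks first and skips them otherwise); read_device and discard are hypothetical helper names. Note that it uses the exact same open/read/write calls you would use for a regular file.

```python
import os


def read_device(path: str, count: int) -> bytes:
    """Read a few bytes from a device file, exactly as from a regular file."""
    with open(path, "rb", buffering=0) as device:
        return device.read(count)


def discard(data: bytes) -> int:
    """Write bytes to the null device, which throws them away; returns bytes written."""
    with open(os.devnull, "wb", buffering=0) as sink:
        return sink.write(data)


if __name__ == "__main__":
    if os.path.exists("/dev/zero"):
        print("from /dev/zero:", read_device("/dev/zero", 4))
    if os.path.exists("/dev/urandom"):
        print("from /dev/urandom:", read_device("/dev/urandom", 4).hex())
    print(discard(b"gone"), "bytes written to", os.devnull)
```

None of these "files" store anything on disk. The kernel answers the reads and writes itself, which is the same trick that lets a printer, a terminal or a disk show up in the filesystem.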

Since the '90s, the UNIX trademark has been owned by a consortium called The Open Group, which publishes a specification called the Single UNIX Specification. Any operating system that complies with this specification can register to use the UNIX trademark, or in other words brand itself as a true Unix system.

Many of today's most prominent operating systems are either direct descendants of Unix, or "spiritual descendants" of it. Apple's macOS is a registered UNIX implementation, having originally been based on BSD. At its core, macOS is built on top of Apple's open source operating system called Darwin. Every macOS system is also a Darwin system. This is why various programming languages, such as Go for example, use "darwin" instead of "macos" as a compilation target. Darwin lacks the features more closely associated with macOS, such as the Aqua graphical user interface.
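Go is not the only language that exposes this naming. Python also reports "darwin" rather than "macos" in sys.platform, and "Darwin" from platform.system(). Here is a small sketch; the FRIENDLY_NAMES table and the friendly_platform function are my own illustrative additions, not part of any library.

```python
import platform
import sys

# Hypothetical lookup table mapping sys.platform prefixes to readable names.
FRIENDLY_NAMES = {
    "darwin": "macOS (Darwin)",
    "linux": "Linux",
    "win32": "Windows",
}


def friendly_platform(sys_platform: str) -> str:
    """Translate a sys.platform value into a human-readable name."""
    for prefix, name in FRIENDLY_NAMES.items():
        if sys_platform.startswith(prefix):
            return name
    return f"unrecognized ({sys_platform})"


if __name__ == "__main__":
    print("sys.platform:     ", sys.platform)
    print("platform.system():", platform.system())
    print("friendly name:    ", friendly_platform(sys.platform))
```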

Operating systems like FreeBSD, OpenBSD, NetBSD and DragonFly BSD are direct descendants of BSD (a true Unix), but are not registered to use the UNIX trademark. Instead, they call themselves Unix-like, meaning they behave like Unix, but are not certified as conforming to its specification. Linux is, more often than not, one of those "Unix-like" operating systems.

Most of these "Unix-like" operating systems aim (or not) to be compatible with the POSIX standards, which predate—and form the core of—the aforementioned Single UNIX Specification. POSIX, which stands for Portable Operating System Interface, defines APIs that an OS should provide, command line shells, process management, common tools, and much more. Some operating systems are POSIX-certified, such as IBM z/OS and Apple macOS. Others, such as the BSD descendants (the aforementioned FreeBSD, OpenBSD, etc.), are "mostly" POSIX-compliant. Others provide a compatibility layer for POSIX applications. Others ignore POSIX completely.
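A program can ask the system what it claims about POSIX. The sketch below is a rough probe and nothing more: os.name being "posix" only means Python found a POSIX-style API to build on, and the SC_VERSION value (where available) is the POSIX revision the C library says it implements. Neither proves certification. The posix_report function name is hypothetical.

```python
import os


def posix_report() -> dict:
    """Collect what the running system claims about its POSIX support."""
    report = {
        "os_name": os.name,
        "posix_api": os.name == "posix",
    }
    # sysconf is itself a POSIX interface, so it may be missing on other systems.
    names = getattr(os, "sysconf_names", {})
    if "SC_VERSION" in names:
        # Typically a year-and-month number identifying a POSIX revision.
        report["posix_version"] = os.sysconf("SC_VERSION")
    return report


if __name__ == "__main__":
    for key, value in posix_report().items():
        print(f"{key}: {value}")
```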

All of this can definitely lead to some confusion. Some operating systems are Unix, some are Unix-like, some are POSIX-compliant, some are mostly POSIX-compliant, some are neither of the above.

So Where Does Linux Come In Again?

Since Linux does not have its own base system, it cannot in itself be said to necessarily be any of these things. It's only when a base system (and everything else that forms a full-fledged OS) is attached to it that it becomes something. But we don't really say "Linux operating system," we say "Linux distribution".

When the Linux kernel was first released, it came at a great time for the GNU Project, started by Richard Stallman way back in 1983. This project aimed to build GNU—a recursive acronym meaning GNU's Not Unix—a free software operating system that is like the proprietary Unix, but isn't Unix. By the time Linux was released, Stallman and his project had already developed free implementations of common Unix tools. In place of Unix's cc C compiler, GNU developed gcc, the GNU C Compiler. Instead of awk, they developed gawk. Free versions of tools such as ls, grep, make, ld, ed, and more were also developed, along with original tools like the Emacs text editor.

What GNU didn't have, though, was a production-ready kernel. Quite quickly after the release of the Linux kernel, people started joining it with GNU software to create standalone operating systems. Some of these early versions of what would grow to be called "Linux distributions" included Yggdrasil, SLS, and Slackware. Since these distributions combined the Linux kernel with the GNU base system, they were often labeled "GNU/Linux," meaning "the GNU system on top of the Linux kernel".

Many years have passed since then, and the number of Linux distributions has grown considerably. DistroWatch.com, which tracks their development, lists hundreds of distributions.

Each Linux distribution makes its own choices with regards to the base system. Debian, for example, mostly uses GNU software as a base system, but also some software developed in-house. Ubuntu, which is based on Debian, uses software from the Debian project, the GNU project, and from Canonical, the maintainers of Ubuntu itself. Alpine Linux comes with a base system completely devoid of GNU software.

This is probably the most prominent difference between Linux and other operating systems. Whereas most operating systems are the product of one company or project, with a kernel and base system that are developed and released together, a Linux distribution is a combination of work by multiple people/companies/projects. It is a combination of the Linux kernel with a base system curated by the distribution maintainers.

Different Linux distributions also have different distribution channels and mechanisms. Many provide ISO or USB images of the OS, allowing users to install from a CD or USB device. Some distributions support installation over the network. Some are simply "bootstrapped" by copying a downloaded file tree into a mounted file system. Some provide an installation script or a graphical wizard. Some simply give you a list of commands to execute manually. Most provide multiple methods of installation.

Most Linux distributions are binary distributions, meaning the kernel comes pre-compiled by the distribution maintainers, as does the base system, but some distributions are source-based, meaning everything is compiled from source during installation. It's the distribution maintainers who also usually create the installation program, if any. The maintainers of a distribution can be nothing more than distributors, with no substantial software of their own being part of the distribution, or full-fledged software companies developing a full-fledged Linux system, such as Red Hat Enterprise Linux. Hell, you can go through the process of compiling Linux and a base system of your curation, creating your own Linux OS.

I'm Over-Simplifying

Distributions are, though, much more complicated than merely a combination of the Linux kernel with a base system, an installation process, and maybe some default configuration files. The truth is that Linux distributions don't even really have a base system that is "set in stone," but merely have a default base system, if even that. What most Linux distributions do have is a package manager. Some sources go so far as to define the term "Linux distribution" as a combination of the Linux kernel with a package manager rather than a base system.

A package manager is a program that automates the installation, upgrading and configuration of software. If you're coming from Windows or possibly even Mac, you're probably doing most of your software installations by downloading an installer from the official website of that software. With a package manager, you can install software from a central location. If this sounds similar to your smartphone's App Store, trust me, it's not, and I will explain later why.

Package managers are quite common in Free Software operating systems, but they are mostly associated with Linux distributions. Recent versions of Windows and Mac may have some similar concepts, and there are open source package managers you can install on them too. Wikipedia is a good source if you're interested. Still, the package managers of Linux and Unix-like OSs are (or can be, depending on the OS) a lot more than just tools to install software through.

The package manager is perhaps the biggest differentiator between the various Linux distributions themselves. There are many package managers out there, and they can be categorized based on the format of packages used. Distributions like Debian and Ubuntu, for example, use the deb format (handled by the low-level dpkg tool), and the package manager is usually apt, but there are others. Distributions like Red Hat Enterprise Linux, the soon-to-be-discontinued CentOS, and many others use the rpm format, with package managers such as dnf, yum or others. Alpine Linux has apk-tools. Many distributions use custom tarball-based formats, managed by tools such as Arch Linux's pacman and Slackware's pkgtool. And there are a lot more.
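One crude way to see which of these tools a machine has is to look for their commands on the PATH. The sketch below does just that. The KNOWN_MANAGERS table is a small, hand-picked and incomplete list of my own, and finding a command proves only that it is installed, not that it is the distribution's official package manager.

```python
import shutil

# Hand-picked, incomplete mapping of command names to where they are usually found.
KNOWN_MANAGERS = {
    "apt": "deb packages (Debian, Ubuntu)",
    "dnf": "rpm packages (Red Hat family)",
    "yum": "rpm packages (Red Hat family)",
    "apk": "apk-tools (Alpine Linux)",
    "pacman": "Arch Linux",
    "xbps-install": "Void Linux",
}


def find_package_managers(which=shutil.which) -> list:
    """Return the known package manager commands that can be found on the PATH."""
    return [name for name in KNOWN_MANAGERS if which(name) is not None]


if __name__ == "__main__":
    found = find_package_managers()
    if not found:
        print("no known package manager found on PATH")
    for name in found:
        print(f"{name}: {KNOWN_MANAGERS[name]}")
```

The which parameter exists only so the lookup can be swapped out, for example to try the function without touching the real PATH.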

Most package managers install pre-compiled binary versions of software, but some allow compiling from source code. Gentoo, for example, has the Portage package manager, which compiles everything from source. Arch Linux has the Arch Build System (ABS) to allow installing software from source.

How Free/Open Source Software Is Distributed

To understand Linux package managers, let's look at a specific software project. We'll use the Firefox web browser as an example. Firefox is developed by the Mozilla Foundation and the Mozilla Corporation. It is a Free Software/Open Source project whose source code you can download and compile yourself, but this is rare. What you will most probably do, at least on Windows and Mac, is download a pre-compiled binary or an installation program from the official Mozilla website.

The original source of software distribution is called the "upstream". The upstream for the Firefox web browser is the aforementioned website and/or source code repository. Most software you install on Windows comes from upstream sources.

Package managers, however, do not install software from the upstream. Instead, they install software from package repositories. Every Linux distribution that uses a package manager has a set of official package repositories.

This is where things get interesting. With their package repositories, operating system vendors take upstream sources, compile them, package them into the format of their chosen package manager, and host these binary packages in their repositories. Different OS vendors have different release formats and methodologies. For example, at any given point in time, there are three different branches of the Debian distribution: stable, testing and unstable. Each branch has a different set of package repositories and a different set of available software packages, and therefore a different version of Firefox will be available in each of these branches. In the current stable branch of Debian, Firefox may only ever reach a specific version of the program, which may be significantly older than the most recent upstream version. This is because Debian aims to maintain backwards compatibility and stability above all else, so software receives updates only if absolutely necessary. Eventually, Debian will release a new stable version of the OS, with a more recent version of Firefox, but Debian users will need to upgrade their entire operating system, much like how you would upgrade your Windows installation from version 10 to 11.
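On Debian-style systems, the branch you follow shows up in the repository configuration, traditionally one line per repository in /etc/apt/sources.list. The sketch below parses only that traditional one-line style (type, optional bracketed options, URI, suite, components) and ignores newer configuration formats. AptSource and parse_source_line are hypothetical names, and the sample lines are illustrative.

```python
from typing import NamedTuple, Optional


class AptSource(NamedTuple):
    kind: str          # "deb" for binary packages, "deb-src" for sources
    uri: str           # where the repository lives
    suite: str         # the branch or release, e.g. "stable"
    components: tuple  # sections of the repository, e.g. ("main",)


def parse_source_line(line: str) -> Optional[AptSource]:
    """Parse one traditional one-line repository entry; return None if it is not one."""
    fields = line.split("#", 1)[0].split()  # drop trailing comments
    if len(fields) < 3 or fields[0] not in ("deb", "deb-src"):
        return None
    kind, rest = fields[0], fields[1:]
    if rest[0].startswith("["):
        # Skip a bracketed option block, which may span several tokens.
        while rest and not rest[0].endswith("]"):
            rest = rest[1:]
        rest = rest[1:]
    if len(rest) < 2:
        return None
    return AptSource(kind, rest[0], rest[1], tuple(rest[2:]))


if __name__ == "__main__":
    for suite in ("stable", "testing", "unstable"):
        line = f"deb http://deb.debian.org/debian {suite} main"
        print(parse_source_line(line))
```

Switching the suite field from one branch to another is what points the package manager at a different set of packages, and therefore at a different version of Firefox.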

Debian is a security-conscious distribution. When a security vulnerability is found in a packaged software, it is often fixed in a new version by the upstream, but that version may be far ahead of the one in Debian's stable branch, and a direct upgrade to that version may be out of the question. In this case, Debian will often backport the fix to the version available in the Debian package repositories. This is critical to understand, because it means the same version of Firefox in the official Debian package repositories may be different from the one in the upstream. I'll explain how to recognize this in the next part of the series.
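As a small taste, Debian package versions follow the shape [epoch:]upstream_version[-debian_revision]. The upstream part is the version the original developers released, and the revision belongs to the packagers. The sketch below splits a version string along those lines. The example strings are made up for illustration, and split_debian_version is a hypothetical helper, not a Debian tool.

```python
from typing import NamedTuple


class DebianVersion(NamedTuple):
    epoch: int      # rarely used override for ordering; 0 when absent
    upstream: str   # the version released by the upstream developers
    revision: str   # the packagers' own revision; "" when absent


def split_debian_version(version: str) -> DebianVersion:
    """Split '[epoch:]upstream[-revision]' into its three parts."""
    epoch = 0
    if ":" in version:
        epoch_text, version = version.split(":", 1)
        epoch = int(epoch_text)
    # The revision is whatever follows the LAST hyphen, if there is one.
    upstream, separator, revision = version.rpartition("-")
    if not separator:
        upstream, revision = version, ""
    return DebianVersion(epoch, upstream, revision)


if __name__ == "__main__":
    # Made-up version strings, for illustration only.
    for text in ("102.8.0esr-1~deb11u1", "1:2.3-4", "1.0"):
        print(text, "->", split_debian_version(text))
```

When two packages share the same upstream part but differ in revision, that is a hint the distribution changed something, such as a backported fix, without moving to a newer upstream release.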

It is quite common for distributions to patch upstream versions for various reasons, most often security and/or compatibility with other software or the configuration of the OS. Not all distributions, however, employ Debian's methodologies. Some distributions use a rolling-release methodology, where only one branch of the OS exists and programs are constantly updated to more recent versions, and the distribution maintainers only rarely provide security fixes/patches of their own. Some have no security practices at all and rely entirely on the upstream developers to provide security fixes.

When you install a Linux distribution, you will often get the choice of selecting which software to install as part of the process, or at least which "groups" of software to install, meaning you get a certain degree of control over the base system you will be getting "out of the box". For example, you may choose not to install a graphical user interface at all, or install one that differs from the one you would have gotten had you not made any specific choices at all. These packages will be installed through the package manager. This is why Linux distributions only have a "default" base system. This may also differ from other operating systems that have a package manager, for example OpenBSD, whose base system comes preloaded with the OS and not installed through the package manager, which is only used to install supplementary software.

Here is a comparison of several Linux distributions, their kernel versions and their default base systems:

| | Debian | Debian | Slackware | Arch Linux | Alpine Linux | Void Linux |
|---|---|---|---|---|---|---|
| Maintainer | The Debian Project | The Debian Project | Patrick Volkerding | Independent Contributors | Independent Contributors | Independent Contributors |
| Release Methodology | Stable releases on a non-fixed schedule | Stable releases on a non-fixed schedule | Semiannual releases | Rolling release | Semiannual releases | Rolling release |
| OS Version | 7 | 11 | 15 | Current | 3.17 | Current |
| Linux Kernel Version | 3.2.41 | 5.10.46 | 6.1.8 | 6.1.8 | 5.15.86 | 5.19 |
| Package Manager | apt | apt | pkgtools | pacman | apk-tools | xbps |
| Core Utils | GNU | GNU | GNU | GNU | busybox | GNU |
| C Standard Library | glibc | glibc | glibc | glibc | musl | glibc |
| GUI | GNOME | GNOME | KDE | None | None | XFCE |
| CLI | dash | dash | ash | bash | busybox sh | dash/bash |
| Init System | SysV | systemd | SysV | systemd | busybox init | runit |

As you can see, there is a lot of variety, and a lot of commonalities. And this is basically a negligible sample size. Once again, we will learn more about the different terms in the table in future parts.

Let's Make Things More Confusing

By now we've established that a Linux distribution generally combines the Linux kernel with a certain package manager tied to a set of package repositories, and a set of packages that are installed by default during the OS's installation, unless otherwise instructed. And by now, you're probably good and confused. But wait, it's even more confusing, because even the package manager and the package repositories are merely "defaults".

No matter which Linux distribution you're using, you can install other package managers alongside the official one. You can change the package repositories and/or add unofficial repositories. You can remove the package manager altogether, if you're adventurous and crazy enough. You can install one package manager through another package manager. And, of course, you can simply download software from the upstream like you would in any other operating system and install it from there.

Linux (both the OS and the kernel) is about options and flexibility. It allows you to do whatever you want. You can twist it as you please, to the point of breaking it, if you so choose. There are even joke distributions, such as Suicide Linux, which removes all the files in your file system if you type a command incorrectly.

It's important to note, however, that this flexibility also means you can make your installation inflexible, hardening it to prevent all these crazy shenanigans from happening. We'll get to that later on in the series.

Conclusion

What all this means is that when you get a computer that has "Linux installed on it," this doesn't really give you much information, at least on paper. You don't know which distribution it is, if any; which package manager it is, if any; which package repositories are used, if any; which base system is installed; and many more things that you can take for granted with a different OS.
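Many (though not all) distributions describe themselves in a small text file, /etc/os-release, made of KEY=value lines with optional shell-style quoting. The sketch below parses that shape. The sample content is illustrative, parse_os_release is a hypothetical helper, and a missing file tells you only that this particular convention is not followed.

```python
import os
import shlex

# Illustrative content in the KEY=value shape used by os-release files.
SAMPLE_OS_RELEASE = """\
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
ID=debian
"""


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines, removing shell-style quotes from the values."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, raw = line.split("=", 1)
        parts = shlex.split(raw)  # handles "double" and 'single' quoting
        fields[key] = parts[0] if parts else ""
    return fields


if __name__ == "__main__":
    path = "/etc/os-release"
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            info = parse_os_release(handle.read())
    else:
        info = parse_os_release(SAMPLE_OS_RELEASE)
    print(info.get("PRETTY_NAME", "unknown distribution"))
```

Recent Python versions (3.10 and later) also ship platform.freedesktop_os_release(), which does this reading for you.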

In the next part, we will take all this craziness and turn it into something more palatable. Among other things, we'll learn how to recognize Linux distributions, how to use them, what coreutils is, and more. Stay tuned.