More Than Enough Linux to Get By: Part 2

First published: 2023-03-15.

Part 2: Getting Started

This article is the second part in a series of articles about the Linux operating system, targeting software developers who need to work with Linux machines in their professional careers. In the first part, we learned that the Linux operating system differs conceptually from most other operating systems, in that it is really a family of so-called "distributions", which are combinations of the Linux kernel with a curated default base system and (optionally) a package manager, and that the components of the OS are not developed by just one entity. I finished that article by describing how getting a computer that has "Linux installed on it" doesn't give us much information about what's actually there. In this part, we will learn what we can expect to find on most Linux distributions, how to recognize them, and some initial, basic usage information.

As a developer, your main interface with the OS will probably be the command line interface rather than a graphical user interface, whether you access that interface directly, through the network (e.g. via SSH), or another way. We will not be learning about terminals, command lines and shells just yet; this will come in a later part. We will, however, be using the command line extensively throughout this text. I assume you're already using a Linux command line in your work, even if you don't necessarily understand exactly what's going on behind the scenes and exactly what all the concepts mean. I will reveal more and more parts of the picture as we progress through this series.

The Core Utilities

In the previous part, I explained how the earliest Linux distributions utilized the GNU base system, a free software recreation of Unix tools and commands, and that these commands are described in the POSIX specification. Unix had many built-in commands such as ls, cp, grep and many more, and they were implemented as actual commands that were integral parts of the system. The GNU project, however, implemented them as individual, independent programs, not commands. Originally, these programs were distributed in several separate packages, but eventually they were combined into one package called coreutils.

The GNU coreutils package implemented and extended certain parts of the POSIX specification (but not all). This, coupled with the fact that virtually all early Linux distributions relied on them, has resulted in this group of tools being a de-facto standard one can expect to find when encountering a Linux OS installation, regardless of that distribution specifically striving for POSIX-compatibility or not. However, while most distributions do use the GNU coreutils package, some use alternative implementations, such as BusyBox, sbase, toybox, and others. If you've ever used Alpine Linux, which is quite popular in the container world, then the ls command that you used came from BusyBox, not GNU coreutils.

For a full list of tools included with GNU coreutils, see the official manual; you can just skim the table of contents. As I said, GNU coreutils does not implement the entire POSIX standard. Other tools from the standard, and certain tools that are Linux-specific, are instead provided by the util-linux package, which is developed by the Linux Kernel Organization. It is very likely that your Linux machine will have both the GNU coreutils package and the util-linux package installed by default. The aforementioned alternatives (e.g. BusyBox) often provide implementations of tools from both packages.

I will be introducing tools from both packages as we move along in the series, but I will not introduce all of them. I will attempt to provide information that is valid for all or most implementations of these tools, rather than specific to GNU coreutils/util-linux. It is important to understand, though, that the implementations do differ in certain ways. You can expect that a basic tool such as ls will work the same regardless of implementation, but more advanced features and flags may be missing, or behave differently.

Where Are We?

Let's imagine that we were dropped into a command line interface in a Linux installation and given no further information about it. Let's gather as much information as we can.

To begin with, we'll start with the Linux kernel. To get information about the kernel, we'll use the uname program. This program is part of the coreutils package. The samples presented in this text use the $ character to signify input from us to the command line. Lines that do not begin with this character are output for the commands we execute. The outputs are specific to the machine I am running the commands on, so you can expect them to be different if you're running the commands yourself.

$ uname
Linux

Well, that didn't give us much information except for letting us know this is a Linux machine. By default, the uname program prints out the name of the kernel installed. This is equivalent to running uname with the -s flag:

$ uname -s
Linux

This is a hint to the fact that the coreutils package is portable, meaning it can run on different operating systems, with different kernels, and on different architectures. Speaking of the architecture, let's see what it is:

$ uname -m
x86_64

The x86_64 architecture is the 64-bit CPU architecture that most desktops and servers use. You may also see it referred to by its original name, "amd64". Other architectures you are likely to encounter are from the ARM family, which is common on embedded platforms, single board computers such as the Raspberry Pi, mobile phones, and new Apple M1 computers. You may also see 686 or x86, which you may remember from the '90s, when your computer became obsolete every two weeks because 386 was replaced with 486, then 586, then 686. These machines, however, are now quite rare, except for very old servers perhaps.
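If we ever need to branch on the architecture inside a script, a case statement over the output of uname -m is usually enough. What follows is only a minimal sketch: the function name arch_family and the labels it prints are made up for this example, and the list of patterns is far from exhaustive.

```shell
# Map a machine hardware name (as printed by `uname -m`) to a rough family.
# The function name and the labels are hypothetical, chosen for illustration.
arch_family() {
    case "$1" in
        x86_64|amd64)        echo "x86, 64-bit" ;;
        i386|i486|i586|i686) echo "x86, 32-bit" ;;
        aarch64|arm64)       echo "ARM, 64-bit" ;;   # must come before arm*
        arm*)                echo "ARM, 32-bit" ;;   # e.g. armv6l, armv7l
        *)                   echo "other: $1" ;;
    esac
}

# Usage: arch_family "$(uname -m)"
```

Taking the name as an argument, rather than calling uname inside the function, makes it easy to try values from machines we don't have at hand.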

Let's continue and ask uname which operating system is running:

$ uname -o
GNU/Linux

This is an interesting output, but still doesn't give us all the information we need. Before we continue with recognizing the system, however, it's important to know how to learn to use these command line programs. How do we know about the different flags to uname, for example? Most programs will have a flag we can use to get help about the program's usage. This flag will either be -h, --help, or just help. Sometimes, we'll just have to try:

$ uname -h
uname: invalid option -- 'h'
Try 'uname --help' for more information.

$ uname --help
Usage: uname [OPTION]...
Print certain system information.  With no OPTION, same as -s.

  -a, --all                print all information, in the following order,
                             except omit -p and -i if unknown:
  -s, --kernel-name        print the kernel name
  -n, --nodename           print the network node hostname
  -r, --kernel-release     print the kernel release
  -v, --kernel-version     print the kernel version
  -m, --machine            print the machine hardware name
  -p, --processor          print the processor type (non-portable)
  -i, --hardware-platform  print the hardware platform (non-portable)
  -o, --operating-system   print the operating system
      --help        display this help and exit
      --version     output version information and exit

GNU coreutils online help: 
Full documentation 
or available locally via: info '(coreutils) uname invocation'
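The "just try" approach can be sketched as a small shell function. This is an illustration only: try_help is a hypothetical name, and the sketch assumes that a program exits with a non-zero status when it doesn't recognize a flag, which is common but not guaranteed. It deliberately does not try the bare word help, since some programs would treat that as a file name or a subcommand.

```shell
# Try the usual help flags in order and print the first help text that works.
# try_help is a hypothetical helper, not a standard tool.
try_help() {
    for flag in --help -h; do
        # Capture stdout and stderr; accept the output only on exit status 0.
        if out=$("$1" "$flag" 2>&1); then
            printf '%s\n' "$out"
            return 0
        fi
    done
    echo "no help flag worked for $1; try: man $1" >&2
    return 1
}

# Usage: try_help uname
```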

The help flag is mostly useful for a quick review of the supported flags, options, and arguments, but doesn't teach us how to use the program and what its purpose is. For this, we'll use man. This program is not part of the coreutils package, but is installed by default in most distributions:

$ man uname

NAME
    uname - print system information

SYNOPSIS
    uname [OPTION]...

DESCRIPTION
    Print certain system information.  With no OPTION, same as -s.

    -a, --all
           print all information, in the following order, except omit -p and -i if unknown:

    -s, --kernel-name
           print the kernel name

    [...]

    --help display this help and exit

    --version
           output version information and exit

AUTHOR
    Written by David MacKenzie.

REPORTING BUGS
    GNU coreutils online help: 
    Report any translation bugs to 

COPYRIGHT
    Copyright © 2022 Free Software Foundation, Inc.  License GPLv3+: GNU GPL version 3 or later .
    This is free software: you are free to change and redistribute it.  There is NO WARRANTY, to the extent permitted by law.

SEE ALSO
    arch(1), uname(2)

    Full documentation 
    or available locally via: info '(coreutils) uname invocation'

The man program will open a "pager" with the manual for the program we've asked about. A pager is a program that allows us to scroll up and down through the output of another program, search through it, and more. This is useful when our terminal is short, doesn't have a scrollbar, or the output is long.

We can scroll through the output with the Enter key, the up and down arrow keys, and the j and k keys. The latter two are there for compatibility with the vi text editor. You will find that many programs that display data will have such compatibility. You can search by pressing the slash (/) key, and entering some text. You can then cycle through the results (if any) by pressing the n key. Move backwards in the results by pressing Shift+n.

The manual pages on a Linux machine (much like other Unix-like OSs) are categorized into different sections. We can see them if we run man on itself:

$ man man

NAME
    man - an interface to the system reference manuals

SYNOPSIS
    man [man options] [[section] page ...] ...
    man -k [apropos options] regexp ...
    man -K [man options] [section] term ...
    man -f [whatis options] page ...
    man -l [man options] file ...
    man -w|-W [man options] page ...

DESCRIPTION
    man  is  the  system's  manual pager.  Each page argument given to
    man is normally the name of a program, utility or function.  The manual
    page associated with each of these arguments is then found and displayed.
    A section, if provided, will direct man to look only in that section
    of the manual.  The default action is to search in all of the available
    sections following a pre-defined order (see DEFAULTS), and to show only
    the first page found, even if page exists in several sections.

    The table below shows the section numbers of the manual followed by the
    types of pages they contain.

    1   Executable programs or shell commands
    2   System calls (functions provided by the kernel)
    3   Library calls (functions within program libraries)
    4   Special files (usually found in /dev)
    5   File formats and conventions, e.g. /etc/passwd
    6   Games
    7   Miscellaneous (including macro packages and conventions),
        e.g. man(7), groff(7), man-pages(7)
    8   System administration commands (usually only for root)
    9   Kernel routines [Non standard]

    A manual page consists of several sections.

    Conventional section names include NAME, SYNOPSIS, CONFIGURATION,
    DESCRIPTION, OPTIONS, EXIT STATUS, RETURN VALUE, ERRORS, ENVIRONMENT,
    FILES, VERSIONS, CONFORMING TO, NOTES, BUGS, EXAMPLE, AUTHORS, and
    SEE ALSO.

I have found that many developers are unaware of the existence of the man command, leading them to make convoluted Google searches to try to understand how to use a Linux machine. You should make man an integral part of your Linux command arsenal.

Sometimes, we don't know what the name of a manual page is. We can search for manual pages with the apropos command. Let's say I want to search manual pages related to file compression:

$ apropos compress
7z (1)               - A file archiver with highest compression ratio
7za (1)              - A file archiver with highest compression ratio
7zr (1)              - A file archiver with highest compression ratio
archive_util (3)     - libarchive utility functions
archive_read_filter (3) - functions for reading streaming archives
archive_write_filter (3) - functions enabling output filters
brotli (1)           - brotli, unbrotli - compress or decompress files
bunzip2 (1)          - a block-sorting file compressor, v1.0.8
bzcat (1)            - decompresses files to stdout
bzip2 (1)            - a block-sorting file compressor, v1.0.8
...

The output shows us the name of the manual page, and the section that it belongs to in parentheses. Sometimes, the same name can belong to multiple manual pages, in different sections. In that case, you may need to use man SECTION PAGE to get the correct manual, e.g. man 3 printf will show us the manual for the printf function in the standard C library, rather than the printf program from coreutils.
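To find out which sections a name appears in before choosing one, we can filter the "name (section) - description" lines shown above. Here is a minimal sketch; sections_of is a hypothetical helper name, and it assumes the output has exactly the layout shown in the apropos sample, which other man implementations may not follow.

```shell
# Read apropos/whatis-style lines ("name (section) - description") from
# standard input and print the sections in which the given page name appears.
# sections_of is a hypothetical helper, not a standard tool.
sections_of() {
    awk -v page="$1" '$1 == page { gsub(/[()]/, "", $2); print $2 }'
}

# Usage: man -f printf | sections_of printf
# (man -f is listed in the SYNOPSIS of "man man"; it prints the short
# description line of every page with that name.)
```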

To get a list of all manual pages, we can give apropos a single dot as the search term: apropos . (the search term is a regular expression, and a dot matches any character). This will probably yield a very long list. It may be more useful to list all pages in a specific section. For example, apropos -s 5 . will list all manual pages in section 5, which contains manual pages about file formats and conventions.

We still haven't figured out the Linux distribution we're running on. For this, let's first introduce the cat command. This command concatenates files and prints them to the console. We won't need to concatenate multiple files right now, but remember that this is something that cat can do. The first thing we will do with cat is look at a file called /etc/os-release:

$ cat /etc/os-release
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
PRIVACY_POLICY_URL="https://terms.archlinux.org/docs/privacy-policy/"
LOGO=archlinux-logo

Finally, we can see which distribution is installed. In this case, the distribution is Arch Linux. You want to learn what all these keys and values mean? I'll leave this one to you, just run man os-release and read the manual. You can expect to find this file on most modern Linux installations. Older installations may not have it, in which case you can expect to find a file called /etc/lsb-release, or to have a tool called lsb_release. LSB was a standard some distributions complied with in the past, but it has mostly been abandoned.
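Since /etc/os-release consists of simple KEY=value lines in a format that a shell can read, a script can load it directly instead of parsing it by hand. The following is a minimal sketch; describe_os is a hypothetical function name, and the fallback values mirror the defaults that the os-release manual documents for missing keys.

```shell
# Print "PRETTY_NAME (ID)" from an os-release style file.
# describe_os is a hypothetical helper; pass a path that contains a slash,
# because the "." command searches PATH for bare file names.
describe_os() {
    file=${1:-/etc/os-release}
    if [ ! -r "$file" ]; then
        echo "unknown (cannot read $file)"
        return 1
    fi
    (
        # Work in a subshell so the file's variables don't leak into ours.
        unset NAME PRETTY_NAME ID
        . "$file"
        printf '%s (%s)\n' "${PRETTY_NAME:-${NAME:-Linux}}" "${ID:-linux}"
    )
}

# Usage: describe_os
```

Keep in mind that loading the file this way executes it as shell code, so it is only appropriate for files we trust, such as the one shipped by the distribution.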

Some distributions will also provide their own specific identification files. For example, Debian and its descendants (including Ubuntu) will include an /etc/debian_version file:

$ cat /etc/debian_version
11.6

Another thing to note is the name of the machine, or "host", we are connected to. Every Linux machine (as most other OSs) will have a name, called "hostname". We'll talk more about hostname when we discuss networking, but for now it's enough to just run the hostname command and note the output:

$ hostname
my-server

When Are We?

Let's get the operating system's current date:

$ date
Wed Mar 15 09:47:06 PM IST 2023

By default, the date tool returns the date in a non-standardized format. Let's ask the tool to return the date in the ISO 8601 format:

$ date -Iseconds
2023-03-15T21:57:25+02:00

The -I flag tells the date tool to use the ISO 8601 format. By default, though, it will only print the date. By adding seconds, I am instructing the tool to print the date along with the time, in second resolution. As always, use man date for more information.
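The -I flag is not part of the POSIX specification of date, so an implementation may lack it. The format string syntax (a plus sign followed by conversion specifiers) is specified by POSIX, and can produce a similar result. Below is a sketch that prints the current time in UTC; the function name iso_utc_now is made up for this example.

```shell
# Print the current time as an ISO 8601 timestamp in UTC, using only the
# POSIX "+FORMAT" syntax. iso_utc_now is a hypothetical helper name.
iso_utc_now() {
    date -u '+%Y-%m-%dT%H:%M:%SZ'
}

# Usage: iso_utc_now
```

Using -u (UTC) with a literal Z suffix avoids having to print a numeric time zone offset, which is handy for timestamps in logs that are collected from machines in different time zones.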

Let's also see how long this operating system has been running continuously:

$ uptime
 21:44:32  up 123 days,  6:22, 2 users, load averages: 1.00, 1.00, 1.00

Here we see that the operating system has been running nonstop for the past 123 days. We also get some additional information, such as the current time, the number of users logged in, and the system's load average, which we'll talk about in a later article. It is quite common for servers to have uptimes of years. Your desktop/laptop machine will often have much shorter ones.
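The output of uptime is meant for humans and differs between implementations. On Linux, the kernel also exposes the raw number in the file /proc/uptime, whose first field is the uptime in seconds. The sketch below converts such a number into days, hours and minutes; format_uptime is a hypothetical name, and the output layout only imitates the sample above.

```shell
# Convert an uptime in seconds (fractions allowed) to "up D days, H:MM".
# format_uptime is a hypothetical helper; it always prints "days", even for 1.
format_uptime() {
    secs=${1%%.*}                     # drop the fractional part, if any
    days=$((secs / 86400))
    hours=$((secs % 86400 / 3600))
    mins=$((secs % 3600 / 60))
    printf 'up %d days, %d:%02d\n' "$days" "$hours" "$mins"
}

# Usage (Linux only):
#   read -r up _ < /proc/uptime
#   format_uptime "$up"
```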

Who Are We?

The next tools in our arsenal deal with users. Linux machines can work in one of two modes: single user, and multi user. In single user mode, only one user can be logged in to the operating system - the root user. This user is akin to the "Administrator" user in other operating systems. It has full access to all features of the operating system, all hardware interfaces, all file systems and files, everything. In single user mode, only a subset of system services are started, and many applications cannot work. It is mostly used for administrative purposes, rescue operations, etc.

Multi user mode, the one which you'll use almost always, allows multiple sessions by multiple users. The same user can even log in multiple times. Of course, this includes the root user. Hopefully, your organization hasn't given you root access to its servers, so let's see which user we are logged in as. We can do this with the whoami tool from coreutils:

$ whoami
ido50

My personal user is named ido50. The whoami tool is equivalent to running a different tool, called id, with the -un flags:

$ id -un
ido50

Let's run id without any flags:

$ id
uid=1000(ido50) gid=2000(devs) groups=2000(devs),977(docker)

This time we get more information that reveals more about Linux (well, POSIX) user management. Users not only have names, but also integer IDs. In my case, my user ID (a.k.a. uid) is 1000. Users are also assigned to groups. Groups also have names and IDs (a.k.a. gid). Every user, by default, has a primary group. On desktops, it is common for that group to have the same name as the user itself. On servers, this may not be the case. In the above example, my primary group is called "devs", and its gid is 2000. I also belong to another group, called "docker", whose gid is 977.

Different activities and features of the operating system can be made available to a user either by virtue of its uid, or by its association with a group. While not part of the POSIX standard, it is likely that you will also find a tool called groups installed on the machine (GNU coreutils includes one, although some distributions ship it from a different package). When run without arguments, this tool prints the names of all the groups the current user belongs to:

$ groups
devs docker
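Scripts often need to check a group membership before attempting something that depends on it. Here is a minimal sketch built on id -Gn, which prints the names of all groups of the current user. The function name in_group is hypothetical, and the sketch assumes group names contain no whitespace.

```shell
# Succeed (exit status 0) if the current user belongs to the named group.
# in_group is a hypothetical helper; it assumes group names have no spaces.
in_group() {
    for g in $(id -Gn); do
        [ "$g" = "$1" ] && return 0
    done
    return 1
}

# Usage:
#   if in_group docker; then echo "member of docker"; fi
```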

Now let's use who to see which users are currently logged in to the machine:

$ who
ido50      tty1         2023-03-14 11:40 (:0)
jimbo      tty2         2023-03-15 16:30 (:0)

Here we get a list of all users currently logged in, including the date and time they logged in. Ignore the "tty" thing for now; we'll get to that later. We can also use the users tool for simpler output (but read the manual page to learn the difference from who):

$ users
ido50
jimbo
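Because who prints one line per login session, a user who is logged in more than once appears more than once. If all we want is each name once, we can reduce the output ourselves. A minimal sketch, where logged_in_users is a hypothetical function name:

```shell
# Read `who`-style lines from standard input and print each user name once.
# logged_in_users is a hypothetical helper name.
logged_in_users() {
    awk '{ print $1 }' | sort -u
}

# Usage: who | logged_in_users
```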

What's Our Package Manager?

At this point, we know the kernel and the architecture, the identity of the Linux distribution, the system's time, hostname, our user identity, and how to use the system manual. Let's identify the package manager as well.

Since multiple package managers can be installed, or even no package manager, no one package manager can "claim ownership" of the system. Our best bet (and usually the right one) is to look for the package manager based on external knowledge. We know that Debian and its descendants use the APT/dpkg package manager. Still, we have to just try.

In your work, you will most probably be using one of the "big" Linux distributions: Red Hat Enterprise Linux (RHEL), CentOS, or Ubuntu. We can probably expect to find Amazon Linux and Alpine Linux too if the company you work for uses cloud services such as AWS or uses containers. Some distributions are based on other ones. Ubuntu is based on Debian. Amazon Linux and CentOS (now discontinued, but still in extensive usage) are based on Red Hat Enterprise Linux. That means the number of package managers we need to "know" isn't that big for most of our needs doing professional work with Linux.

Let's look at a map of common distributions, the package managers they use, and how to check if that package manager is installed:

                 Debian            RHEL              Arch Linux          Alpine
Family           Debian-based      RHEL-based        Arch-based          Alpine
Package Manager  APT/dpkg          yum/dnf           pacman              apk
Check With       `dpkg --version`  `rpm --version`   `pacman --version`  `apk --version`
                 `apt --version`   `yum --version`
                                   `dnf --version`

If we run the commands described and receive a non-error response, then that package manager is installed. Note that some columns show multiple commands. Let's look at the Debian-based distributions, for example. The dpkg program is the main part of the package manager: it is the program that installs and uninstalls package archives in the "deb" format, verifies dependencies between packages, etc. It can only work with locally available package archives. The apt program uses dpkg internally, providing a higher-level interface to it, and adds the ability to download and install packages from the distribution's package repositories (i.e. the Internet). The same is true for yum, which is a higher-level interface to rpm.
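Rather than running each command by hand and watching for errors, we can ask the shell whether a program exists at all. The sketch below uses command -v, which is specified by POSIX; the function name available is made up for this example.

```shell
# Print every given command name that can be found by the shell.
# available is a hypothetical helper name.
available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}

# Usage: available dpkg apt rpm yum dnf pacman apk
```

Note that this only tells us a program with that name is installed, not that it actually manages the system's packages, so the distribution information we gathered earlier remains the better hint.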

My development machine, as we've already seen, uses the Arch Linux distribution. While being one of the most popular Linux distributions out there, it is not commonly found in professional usage. Arch Linux uses the pacman package manager. On my machine, the command pacman --version returns the following output:

$ pacman --version

 .--.                  Pacman v6.0.2 - libalpm v13.0.2
/ _.-' .-.  .-.  .-.   Copyright (C) 2006-2021 Pacman Development Team
\  '-. '-'  '-'  '-'   Copyright (C) 2002-2006 Judd Vinet
 '--'
                       This program may be freely redistributed under
                       the terms of the GNU General Public License.

Which Packages Are Installed?

To get a list of packages that were installed by the package manager, let's refer to the table again:

           Debian, Ubuntu  RHEL, CentOS, Amazon Linux  Arch Linux   Alpine
Family     Debian-based    RHEL-based                  Arch-based   Alpine
List With  `dpkg --list`   `rpm -qa`                   `pacman -Q`  `apk info`

When I run pacman -Q on my machine, I get a long list that starts like this:

$ pacman -Q
acl 2.3.1-3
acorn 1:8.8.2-1
alsa-lib 1.2.8-1
alsa-plugins 1:1.2.7.1-2
alsa-topology-conf 1.2.5.1-1
alsa-ucm-conf 1.2.8-1
alsa-utils 1.2.8-1
android-file-transfer 4.2-3
aom 3.6.0-1
apparmor 3.1.3-1
...

We see the names of packages installed, and their versions. It's important to pause here and talk about version numbers for a bit. In the previous part, I explained how the original source of a piece of software is called the "upstream", and how distributions package binary versions of upstream software and publish them to their package repositories. The numbers we see above reflect both the upstream version, and the package version.

Most software uses a three-part versioning scheme: three numbers, separated by dots, e.g. "1.22.3". The first number (1) is the MAJOR part of the version number, the second number (22) is the MINOR part, and the last number (3) is the PATCH part. Many only use two numbers (e.g. "1.22"). Some use a completely different scheme, but not many.

When an upstream software gets packaged, two more parts are added to this versioning scheme. This is true for almost all package managers and distributions, with the biggest difference being that different package managers have different algorithms for sorting version numbers. Any new version of a software/package must be "larger than" any previous version.
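To see why sorting version numbers needs a dedicated algorithm, consider that a plain text sort places "1.10" before "1.9". The sketch below relies on the -V (version sort) flag of sort, which is an extension found in GNU coreutils and is an assumption about your system. It is not the algorithm used by any particular package manager, only an approximation that handles simple dotted versions. The function name newest_version is hypothetical.

```shell
# Read one version per line from standard input and print the largest one.
# Assumes `sort` supports -V (a GNU extension); newest_version is a
# hypothetical helper and only approximates real package manager ordering.
newest_version() {
    sort -V | tail -n 1
}

# Usage: printf '1.9\n1.10\n1.2\n' | newest_version
```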

In the list above, the version number for the "acorn" package is "1:8.8.2-1". The upstream of this software is hosted on GitHub. Let's look at this version number more closely:

1  :  8.8.2  -  1
E  :    U    -  P

The meanings of the letters in the above figure are as follows:

U: The upstream version. This is the version of the upstream "acorn" software that is being packaged. This is a version that you can find in the upstream website/repository.
P: The package version. This is an integer that gets increased every time the same upstream version is packaged again. This happens when there are bugs in the package itself (rather than the actual upstream software), or for any other reason the package maintainers may choose to repackage the software. The first time a software version is packaged, its "P" component receives the value 1.
E: Epoch. This is another integer that gets increased, but this one begins at 0, and is omitted if it is indeed 0 (so, you won't see "0:1.2.3-1", you'll see "1.2.3-1"). The epoch is only used when there is a sorting issue with the other two components. If a new version of a package is released, but for any reason the version isn't "larger than" previous versions, the epoch forces it to be so.
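The three components can be pulled apart with nothing more than shell parameter expansion. Here is a minimal sketch; split_version is a hypothetical function name, and it assumes the "E:U-P" layout described above, as printed by pacman -Q. The upstream part is taken to be everything up to the last hyphen.

```shell
# Split a package version string of the form [E:]U-P into its components.
# split_version is a hypothetical helper name.
split_version() {
    v=$1
    case $v in
        *:*) epoch=${v%%:*}; v=${v#*:} ;;   # text before the first colon
        *)   epoch=0 ;;                     # an omitted epoch means 0
    esac
    case $v in
        *-*) upstream=${v%-*}; pkgrel=${v##*-} ;;  # split at the last hyphen
        *)   upstream=$v; pkgrel= ;;               # no package part present
    esac
    printf 'epoch=%s upstream=%s package=%s\n' "$epoch" "$upstream" "$pkgrel"
}

# Usage: split_version 1:8.8.2-1
```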

I will not be showing you how to install packages or perform other package manager-related actions in this part, as this will require more information that I am not ready to give you just yet. This will come later.

In our next part of the series, we will be learning about the file system, how to use files, device drivers, partitions, mount points, and more. Stay tuned.