More Than Enough Linux to Get By: Part 2
Part 2: Getting Started
This article is the second part in a series of articles about the Linux operating system, targeting software developers who need to work with Linux machines in their professional careers. In the first part, we learned that the Linux operating system differs conceptually from most other operating systems, in that it is really a family of so-called "distributions", which are combinations of the Linux kernel with a curated default base system and (optionally) a package manager, and that the components of the OS are not developed by just one entity. I finished that article by describing how getting a computer that has "Linux installed on it" doesn't give us much information about what's actually there. In this part, we will learn what we can expect to find on most Linux distributions, how to recognize them, and some basic usage.
As a developer, your main interface with the OS will probably be the command line interface rather than a graphical user interface, whether you access that interface directly, through the network (e.g. via SSH), or another way. We will not be learning about terminals, command lines and shells just yet (this will come in a later part), but we will be using the command line extensively throughout this text. I assume you're already using a Linux command line in your work, even if you don't necessarily understand exactly what's going on behind the scenes and exactly what all the concepts mean. I will reveal more and more parts of the picture as we progress through this series.
The Core Utilities
In the previous part, I explained how the earliest Linux distributions
utilized the
GNU base system, a free
software recreation of Unix tools and commands, and that these commands are
described in the
POSIX specification. Unix had many built-in commands such as ls, cp,
grep and many more, and they were implemented as actual
commands that were integral parts of the system. The GNU project, however,
implemented them as individual, independent programs, not commands.
Originally, these programs were distributed in several separate packages,
but eventually they were combined into one package called
coreutils.
The GNU coreutils package implemented and extended certain parts of the
POSIX specification (but not all). This, coupled with the fact that
virtually all early Linux distributions relied on it, has made this
group of tools a de facto standard that one can expect to find on a
Linux installation, whether or not the distribution specifically
strives for POSIX compatibility. However, while most
distributions do use the GNU coreutils package, some use alternative
implementations, such as
BusyBox,
sbase, toybox, and
others. If you've ever used Alpine Linux, which is quite popular in the
container world, then the ls command that you used came from
BusyBox, not GNU coreutils.
For a full list of the tools included with GNU coreutils, see the official manual; skimming its table of contents is enough. As I said, GNU coreutils does not implement the entire POSIX standard. Other tools from the standard, and certain tools that are Linux-specific, are instead provided by the util-linux package, which is developed by the Linux Kernel Organization. It is very likely that your Linux machine will have both the GNU coreutils package and the util-linux package installed by default. The aforementioned alternatives (e.g. BusyBox) often provide implementations of tools from both packages.
I will be introducing tools from both packages as we move along in the
series, but I will not introduce all of them. I will attempt to provide
information that is valid to all/most implementations of these tools, rather
than specific to GNU coreutils/util-linux. It is important to understand,
though, that the implementations do differ in certain ways. You can expect
that a basic tool such as ls will work the same regardless of
implementation, but more advanced features and flags may be missing, or
behave differently.
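As a rough illustration of telling implementations apart, here is a minimal sketch that classifies the text a tool prints when asked for its version. The function name classify_impl is made up for this example, and the matching is only a heuristic: it assumes GNU tools mention "GNU coreutils" in their --version output, and that BusyBox and toybox mention their own names in whatever they print when given that flag.

```shell
#!/bin/sh
# classify_impl (hypothetical helper): guess which implementation produced
# the given version/help text. This is a heuristic, not an official check.
classify_impl() {
    case "$1" in
        *"GNU coreutils"*) echo "GNU coreutils" ;;
        *BusyBox*)         echo "BusyBox" ;;
        *toybox*)          echo "toybox" ;;
        *)                 echo "unknown" ;;
    esac
}

# Capture both stdout and stderr, since some implementations report
# an unrecognized flag on stderr together with their banner.
classify_impl "$(ls --version 2>&1)"
```

On a machine that uses GNU coreutils this should print "GNU coreutils"; anything the sketch does not recognize is reported as "unknown" rather than guessed.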
Where Are We?
Let's imagine that we were dropped into a command line interface in a Linux installation and given no further information about it. Let's gather as much information as we can.
To begin with, we'll start with the Linux kernel. To get information about
the kernel, we'll use the uname program. This program is part
of the coreutils package. The samples presented in this text use the
$ character to signify input from us to the command line. Lines
that do not begin with this character are output for the commands we
execute. The outputs are specific to the machine I am running the commands
on, so you can expect them to be different if you're running the commands
yourself.
$ uname
Linux
Well, that didn't give us much information except for letting us know this
is a Linux machine. By default, the uname program prints out
the name of the kernel installed. This is equivalent to running
uname with the -s flag:
$ uname -s
Linux
This is a hint to the fact that the coreutils package is portable, meaning it can run on different operating systems, with different kernels, and on different architectures. Speaking of the architecture, let's see what it is:
$ uname -m
x86_64
The x86_64 architecture is the 64-bit CPU architecture that most desktops and servers use. You may also see it referred to as "amd64". Other architectures you are likely to encounter are from the ARM family, which is common on embedded platforms, single board computers such as the Raspberry Pi, mobile phones, and newer Apple computers (M1 and later). You may also see i686 or x86, which you may remember from the '90s, when your computer became obsolete every two weeks because 386 was replaced with 486, then 586, then 686. These machines, however, are now quite rare, except for very old servers perhaps.
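To make the architecture names above a little more concrete, here is a small sketch that maps the output of uname -m to a broad family. The function name arch_family and the family labels are invented for this example, and the list of patterns is deliberately incomplete.

```shell
#!/bin/sh
# arch_family (hypothetical helper): map a machine name, as printed by
# `uname -m`, to a broad CPU family. Only a few common values are covered.
arch_family() {
    case "$1" in
        x86_64|amd64)   echo "64-bit x86" ;;
        i?86|x86)       echo "32-bit x86" ;;
        aarch64*|arm64) echo "64-bit ARM" ;;
        arm*)           echo "32-bit ARM" ;;
        *)              echo "other" ;;
    esac
}

arch_family "$(uname -m)"
```

The 64-bit ARM patterns are listed before the generic arm* pattern because case uses the first pattern that matches.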
Let's continue and ask uname which operating system is running:
$ uname -o
GNU/Linux
This is an interesting output, but still doesn't give us all the information
we need. Before we continue with recognizing the system, however, it's
important to know how to learn to use these command line programs. How do we
know about the different flags to uname, for example? Most
programs will have a flag we can use to get help about the program's usage.
This flag will either be -h, --help, or just
help. Sometimes, we'll just have to try:
$ uname -h
uname: invalid option -- 'h'
Try 'uname --help' for more information.
$ uname --help
Usage: uname [OPTION]...
Print certain system information. With no OPTION, same as -s.
-a, --all print all information, in the following order,
except omit -p and -i if unknown:
-s, --kernel-name print the kernel name
-n, --nodename print the network node hostname
-r, --kernel-release print the kernel release
-v, --kernel-version print the kernel version
-m, --machine print the machine hardware name
-p, --processor print the processor type (non-portable)
-i, --hardware-platform print the hardware platform (non-portable)
-o, --operating-system print the operating system
--help display this help and exit
--version output version information and exit
GNU coreutils online help:
Full documentation
or available locally via: info '(coreutils) uname invocation'
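The "we'll just have to try" approach can be sketched as a small loop. The function name try_help is made up for this example. It assumes that a program exits with a zero status when it understands the help flag, which is common but not guaranteed. Be careful with unfamiliar programs: a bare help argument may be interpreted as something else entirely.

```shell
#!/bin/sh
# try_help (hypothetical helper): run the given command with each of the
# usual help flags and print the first one that exits successfully.
try_help() {
    for flag in --help -h help; do
        if "$1" "$flag" >/dev/null 2>&1; then
            echo "$flag"
            return 0
        fi
    done
    return 1
}

try_help uname || echo "no help flag found"
```

With GNU coreutils, this should print --help for uname, matching what we saw above.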
The help flag is mostly useful for a quick review of the supported flags,
options, and arguments, but doesn't teach us how to use the program
and what its purpose is. For this, we'll use man. This program
is not part of the coreutils package, but is installed by default
in most distributions:
$ man uname
NAME
uname - print system information
SYNOPSIS
uname [OPTION]...
DESCRIPTION
Print certain system information. With no OPTION, same as -s.
-a, --all
print all information, in the following order, except omit -p and -i if unknown:
-s, --kernel-name
print the kernel name
[...]
--help display this help and exit
--version
output version information and exit
AUTHOR
Written by David MacKenzie.
REPORTING BUGS
GNU coreutils online help:
Report any translation bugs to
COPYRIGHT
Copyright © 2022 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later .
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
SEE ALSO
arch(1), uname(2)
Full documentation
or available locally via: info '(coreutils) uname invocation'
The man program will open a "pager" with a manual for the
program we've asked about. A pager is a program that allows us to scroll up and
down through the output of another program, search through it, and more.
This is useful when our terminal is short, doesn't have a scrollbar, or the
output is long.
We can scroll through the output with the Enter key, the up and down arrow
keys, and the j and k keys. The latter two are there for compatibility with
the vi text editor. You will find that many programs that
display data will have such compatibility. You can search by pressing the
slash (/) key, and entering some text. You can then cycle through the
results (if any) by pressing the n key. Move backwards in the results by
pressing Shift+n.
The manual pages on a Linux machine (much like other Unix-like OSs) are
categorized into different sections. We can see them if we run
man on itself:
$ man man
NAME
man - an interface to the system reference manuals
SYNOPSIS
man [man options] [[section] page ...] ...
man -k [apropos options] regexp ...
man -K [man options] [section] term ...
man -f [whatis options] page ...
man -l [man options] file ...
man -w|-W [man options] page ...
DESCRIPTION
man is the system's manual pager. Each page argument given to
man is normally the name of a program, utility or function. The manual
page associated with each of these arguments is then found and displayed.
A section, if provided, will direct man to look only in that section
of the manual. The default action is to search in all of the available
sections following a pre-defined order (see DEFAULTS), and to show only
the first page found, even if page exists in several sections.
The table below shows the section numbers of the manual followed by the
types of pages they contain.
1 Executable programs or shell commands
2 System calls (functions provided by the kernel)
3 Library calls (functions within program libraries)
4 Special files (usually found in /dev)
5 File formats and conventions, e.g. /etc/passwd
6 Games
7 Miscellaneous (including macro packages and conventions),
e.g. man(7), groff(7), man-pages(7)
8 System administration commands (usually only for root)
9 Kernel routines [Non standard]
A manual page consists of several sections.
Conventional section names include NAME, SYNOPSIS, CONFIGURATION,
DESCRIPTION, OPTIONS, EXIT STATUS, RETURN VALUE, ERRORS, ENVIRONMENT,
FILES, VERSIONS, CONFORMING TO, NOTES, BUGS, EXAMPLE, AUTHORS, and
SEE ALSO.
I have found that many developers are unaware of the existence of the
man command, leading them to make convoluted Google searches to
try to understand how to use a Linux machine. You should make
man an integral part of your Linux command arsenal.
Sometimes, we don't know what the name of a manual page is. We can search
for manual pages with the apropos command. Let's say I want to
search manual pages related to file compression:
$ apropos compress
7z (1) - A file archiver with highest compression ratio
7za (1) - A file archiver with highest compression ratio
7zr (1) - A file archiver with highest compression ratio
archive_util (3) - libarchive utility functions
archive_read_filter (3) - functions for reading streaming archives
archive_write_filter (3) - functions enabling output filters
brotli (1) - brotli, unbrotli - compress or decompress files
bunzip2 (1) - a block-sorting file compressor, v1.0.8
bzcat (1) - decompresses files to stdout
bzip2 (1) - a block-sorting file compressor, v1.0.8
...
The output shows us the name of the manual page, and the section that it
belongs to in parentheses. Sometimes, the same name can belong to multiple
manual pages, in different sections. In that case, you may need to use
man SECTION PAGE to get the correct manual, e.g.
man 3 printf will show us the manual for the
printf function in the standard C library, rather than the
printf program from coreutils.
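Since apropos prints the section next to each name, its output can be turned into ready-to-run man commands. The sketch below is one way to do that; the function name to_man_commands is invented for this example, and it assumes lines in the simple "name (section) - description" shape shown above. Lines that list several names before the section are skipped.

```shell
#!/bin/sh
# to_man_commands (hypothetical helper): read apropos-style lines on stdin
# and print a "man SECTION PAGE" command for each line that matches the
# simple "name (section) - description" shape.
to_man_commands() {
    sed -n 's/^\([^ ]*\) (\([^)]*\)).*$/man \2 \1/p'
}

# Example input written by hand, so the sketch does not depend on which
# manual pages happen to be installed.
printf '%s\n' \
    'printf (1) - format and print data' \
    'printf (3) - formatted output conversion' | to_man_commands
```

On a real machine, you could pipe apropos into it instead, e.g. apropos -s 1 compress | to_man_commands.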
To get a list of all manual pages, we can use apropos . (the argument is a
single dot, a regular expression that matches any character). This
will probably yield a very long list. It may be more useful to list all
pages in a specific section: apropos -s 5 . will, for example,
list all manual pages in section 5, which contains manual pages about file
formats and conventions.
We still haven't figured out the Linux distribution we're running on. For
this, let's first introduce the cat command. This command
concatenates files and prints them to the console. We won't need to
concatenate multiple files for now, but remember that this is something that
cat can do. The first thing we will do with cat is
look at a file called /etc/os-release:
$ cat /etc/os-release
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
PRIVACY_POLICY_URL="https://terms.archlinux.org/docs/privacy-policy/"
LOGO=archlinux-logo
Finally, we can see which distribution is installed. In this case, the
distribution is
Arch Linux. Want to
learn what all these keys and values mean? I'll leave this one to you: just
run man os-release and read the manual. You can expect to find
this file on most modern Linux installations. Older installations may not
have it, in which case you can expect to find a file called
/etc/lsb-release, or to have a tool called lsb_release.
LSB
was a standard some distributions complied with in the past, but it has
mostly been abandoned.
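Because /etc/os-release is written as a list of shell-compatible variable assignments, a script can read it directly. The following is a minimal sketch under that assumption; the function name distro_name is made up for this example, and it simply prints "unknown" when the file is missing instead of falling back to /etc/lsb-release.

```shell
#!/bin/sh
# distro_name (hypothetical helper): print the human-readable distribution
# name from an os-release style file (defaults to /etc/os-release).
distro_name() {
    file="${1:-/etc/os-release}"
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    # Source the file in a subshell so its variables do not leak into
    # the calling shell.
    ( . "$file" && echo "${PRETTY_NAME:-${NAME:-unknown}}" )
}

distro_name || true
```

On the machine used in this article, this would print Arch Linux.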
Some distributions will also provide their own specific identification files. For example, Debian and its descendants (including Ubuntu) will include an /etc/debian_version file:
$ cat /etc/debian_version
11.6
Another thing to note is the name of the machine, or "host", we are
connected to. Every Linux machine (as most other OSs) will have a name,
called "hostname". We'll talk more about hostname when we discuss
networking, but for now it's enough to just run the
hostname command and note the output:
$ hostname
my-server
When Are We?
Let's get the operating system's current date:
$ date
Wed Mar 15 09:47:06 PM IST 2023
By default, the date tool returns the date in a
non-standardized format. Let's ask the tool to return the date in the
ISO 8601
format:
$ date -Iseconds
2023-03-15T21:57:25+02:00
The -I flag tells the date tool to use the ISO
8601 format. By default, though, it will only print the date. By adding
seconds, I am instructing the tool to print the date along with
the time, in second resolution. As always, use man date for
more information.
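The -I flag is an extension rather than part of the POSIX specification of date, so it may be missing on some implementations. A more portable sketch, assuming you are fine with a UTC timestamp, is to spell out the format yourself. The function name iso_now is invented for this example.

```shell
#!/bin/sh
# iso_now (hypothetical helper): print the current time in UTC as an
# ISO 8601 timestamp, using only a POSIX format string.
iso_now() {
    date -u +%Y-%m-%dT%H:%M:%SZ
}

iso_now
```

The trailing Z marks the timestamp as UTC, which is why the -u flag is used instead of printing a local time with an offset.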
Let's also see how long this operating system has been running continuously:
$ uptime
21:44:32 up 123 days, 6:22, 2 users, load averages: 1.00, 1.00, 1.00
Here we see that the operating system has been running nonstop for the past 123 days. We also get some additional information, such as the current time, the number of users logged in, and the system's load average, which we'll talk about in a later article. It is quite common for servers to have uptimes of years. Your desktop/laptop machine will often have much shorter ones.
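If a script needs the uptime as a number, parsing the output of uptime is awkward. On Linux, the first value in /proc/uptime is the number of seconds since boot, so a rough sketch could convert that instead. The function name days_from_seconds is made up for this example, and the /proc file is Linux-specific.

```shell
#!/bin/sh
# days_from_seconds (hypothetical helper): convert a number of seconds
# (possibly with a fractional part) to whole days.
days_from_seconds() {
    # Drop the fractional part, then use integer division.
    echo $(( ${1%.*} / 86400 ))
}

# /proc/uptime holds two numbers; the first is seconds since boot.
if [ -r /proc/uptime ]; then
    days_from_seconds "$(cut -d ' ' -f 1 /proc/uptime)"
fi
```

For the machine above, which has been up for 123 days and a few hours, this would print 123.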
Who Are We?
The next tools in our arsenal deal with users. Linux machines can work in one of two modes: single user, and multi user. In single user mode, only one user can be logged in to the operating system - the root user. This user is akin to the "Administrator" user in other operating systems. It has full access to all features of the operating system, all hardware interfaces, all file systems and files, everything. In single user mode, only a subset of system services are started, and many applications cannot work. It is mostly used for administrative purposes, rescue operations, etc.
Multi user mode, the one which you'll use almost always, allows multiple
sessions by multiple users. The same user can even log in multiple times. Of
course, this includes the root user. Hopefully, your organization hasn't
given you root access to its servers, so let's see which user we are logged
in as. We can do this with the whoami tool from coreutils:
$ whoami
ido50
My personal user is named ido50. The whoami tool is equivalent to
running a different tool, id, with the -un flags:
$ id -un
ido50
Let's run id without any flags:
$ id
uid=1000(ido50) gid=2000(devs) groups=2000(devs),977(docker)
This time we get more information that reveals more about Linux (well, POSIX) user management. Users not only have names, but also integer IDs. In my case, my user ID (a.k.a. uid) is 1000. Users are also assigned to groups. Groups also have names and IDs (a.k.a. gid). Every user, by default, has a primary group. On desktops, it is common for that group to have the same name as the user itself. On servers, this may not be the case. In the above example, my primary group is called "devs", and its gid is 2000. I also belong to another group, called "docker", whose gid is 977.
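Scripts often need to answer a narrower question: does this user belong to a given group? Here is a minimal sketch built on id -Gn, which prints the names of all groups a user belongs to. The function name in_group is invented for this example, and it assumes group names do not contain spaces.

```shell
#!/bin/sh
# in_group (hypothetical helper): succeed if the user belongs to a group.
#   $1: group name to look for
#   $2: optional user name (defaults to the current user)
in_group() {
    # Print one group per line, then look for an exact, literal match.
    id -Gn ${2:+"$2"} | tr ' ' '\n' | grep -qxF -- "$1"
}

if in_group docker; then
    echo "member of docker"
else
    echo "not a member of docker"
fi
```

For the ido50 user shown above, this would report membership in docker. A similar check for the root user is to compare the output of id -u with 0.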
Different activities and features of the operating system can be made
available to a user either by merit of its uid, or by its association with a
group. While not part of the POSIX standard, you will most likely also find
a tool called groups installed on the machine (GNU coreutils provides one).
This tool prints the names of the groups the current user belongs to, or
those of another user given as an argument; it does not list every group
defined on the machine:
$ groups
devs docker
Now let's use who to see which users are currently logged in to
the machine:
$ who
ido50 tty1 2023-03-14 11:40 (:0)
jimbo tty2 2023-03-15 16:30 (:0)
Here we get a list of all users currently logged in, including the date and
time they logged in. Ignore the "tty" thing for now; we'll get to that later. We
can also use the users tool for simpler output (but read the
manual page to learn the difference from who):
$ users
ido50 jimbo
What's Our Package Manager?
At this point, we know the kernel name and architecture, the identity of the Linux distribution, the system's time, hostname, our user identity, and how to use the system manual. Let's identify the package manager as well.
Since multiple package managers can be installed, or even no package manager, no one package manager can "claim ownership" of the system. Our best bet (and usually the right one) is to look for the package manager based on external knowledge. We know that Debian and its descendants use the APT/dpkg package manager. Still, we have to just try.
In your work, you will most probably be using one of the "big" Linux distributions: Red Hat Enterprise Linux (RHEL), CentOS, or Ubuntu. We can probably expect to find Amazon Linux and Alpine Linux too if the company you work for uses cloud services such as AWS or uses containers. Some distributions are based on other ones. Ubuntu is based on Debian. Amazon Linux and CentOS (now discontinued, but still in extensive usage) are based on Red Hat Enterprise Linux. That means the number of package managers we need to "know" isn't that big for most of our needs doing professional work with Linux.
Let's look at a map of common distributions, the package managers they use, and how to check if that package manager is installed:
| | Debian-based | RHEL-based | Arch-based | Alpine |
|---|---|---|---|---|
| Distributions | Debian, Ubuntu | RHEL, CentOS, Amazon Linux | Arch Linux | Alpine Linux |
| Package Manager | APT/dpkg | yum/dnf | pacman | apk |
| Check With | dpkg --version, apt --version | rpm --version, yum --version, dnf --version | pacman --version | apk --version |
If we run the commands described and receive a non-error response, then that
package manager is installed. Note that some columns show multiple commands.
Let's look at the Debian-based distributions, for example. The
dpkg program is the main part of the package manager: it is the
program that installs and uninstalls package archives in the "deb" format,
verifies dependencies between packages, etc. It can only work with locally
available package archives. The apt program uses
dpkg internally, providing a higher-level interface to it, and
adds the ability to download and install packages from the distribution's
package repositories (i.e. the Internet). The same is true for
yum, which is a higher-level interface to rpm.
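Rather than running each version command by hand, we can ask the shell which of these programs exist at all. The sketch below uses command -v, which is specified by POSIX; the function name find_commands is made up for this example. Finding a program only tells us it is installed, not that it is the package manager the distribution actually relies on.

```shell
#!/bin/sh
# find_commands (hypothetical helper): print each of the given command
# names that can be found by the shell.
find_commands() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
        fi
    done
}

# Candidates taken from the table above.
find_commands dpkg apt rpm yum dnf pacman apk
```

On the Arch Linux machine used in this article, the output would be expected to include pacman.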
My development machine, as we've already seen, uses the Arch Linux
distribution. While being one of the most popular Linux distributions out
there, it is not commonly found in professional usage. Arch Linux uses the
pacman package manager. On my machine, the command
pacman --version
returns the following output:
$ pacman --version
.--. Pacman v6.0.2 - libalpm v13.0.2
/ _.-' .-. .-. .-. Copyright (C) 2006-2021 Pacman Development Team
\ '-. '-' '-' '-' Copyright (C) 2002-2006 Judd Vinet
'--'
This program may be freely redistributed under
the terms of the GNU General Public License.
Which Packages Are Installed?
To get a list of packages that were installed by the package manager, let's refer to the table again:
| | Debian-based | RHEL-based | Arch-based | Alpine |
|---|---|---|---|---|
| Distributions | Debian, Ubuntu | RHEL, CentOS, Amazon Linux | Arch Linux | Alpine Linux |
| List With | dpkg --list | rpm -qa | pacman -Q | apk info |
When I run pacman -Q on my machine, I get a long list that
starts like this:
$ pacman -Q
acl 2.3.1-3
acorn 1:8.8.2-1
alsa-lib 1.2.8-1
alsa-plugins 1:1.2.7.1-2
alsa-topology-conf 1.2.5.1-1
alsa-ucm-conf 1.2.8-1
alsa-utils 1.2.8-1
android-file-transfer 4.2-3
aom 3.6.0-1
apparmor 3.1.3-1
...
We see the names of the installed packages, and their versions. It's important to pause here and talk about version numbers for a bit. In the previous part, I explained how the original source of a piece of software is called the "upstream", and how distributions package binary versions of upstream software and publish them to their package repositories. The numbers we see above reflect both the upstream version and the package version.
Most software uses a three-part versioning scheme: three numbers, separated by dots, e.g. "1.22.3". The first number (1) is the MAJOR part of the version number, the second number (22) is the MINOR part, and the last number (3) is the PATCH part. Many only use two numbers (e.g. "1.22"). Some use a completely different scheme, but not many.
When an upstream software gets packaged, two more parts are added to this versioning scheme. This is true for almost all package managers and distributions, with the biggest difference being that different package managers have different algorithms for sorting version numbers. Any new version of a software/package must be "larger than" any previous version.
In the list above, the version number for the "acorn" package is "1:8.8.2-1". The upstream of this software is hosted on GitHub. Let's look at this version number more closely:
| E | : | U | - | P |
|---|---|---|---|---|
| 1 | : | 8.8.2 | - | 1 |
The meanings of the letters in the above figure are as follows:
- U: The upstream version. This is the version of the upstream "acorn" software that is being packaged. This is a version that you can find in the upstream website/repository.
- P: The package version. This is an integer that gets increased every time the same upstream version is packaged. This happens when there are bugs in the package itself (rather than the actual upstream software), or for any other reason the package maintainers may choose to repackage the software. The first time a software version is packaged, its "P" component receives the value 1.
- E: Epoch. This is another integer that gets increased, but this one begins at 0, and is omitted if it is indeed 0 (so, you won't see "0:1.2.3-1", you'll see "1.2.3-1"). The epoch is only used when there is a sorting issue with the other two components. If a new version of a package is released, but for any reason the version isn't "larger than" previous versions, the epoch forces it to be so.
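The three components can be pulled apart with plain shell parameter expansion. This is only a sketch of the E:U-P layout described above; the function name split_version is made up for this example, and it is not the comparison algorithm any particular package manager uses.

```shell
#!/bin/sh
# split_version (hypothetical helper): split a package version of the
# form [E:]U[-P] into its epoch, upstream and package parts.
split_version() {
    v="$1"
    # The epoch is everything before the first colon; it defaults to 0.
    case "$v" in
        *:*) epoch="${v%%:*}"; v="${v#*:}" ;;
        *)   epoch=0 ;;
    esac
    # The package part is everything after the last hyphen, if any.
    case "$v" in
        *-*) upstream="${v%-*}"; release="${v##*-}" ;;
        *)   upstream="$v"; release="" ;;
    esac
    echo "epoch=$epoch upstream=$upstream release=$release"
}

split_version "1:8.8.2-1"   # prints: epoch=1 upstream=8.8.2 release=1
split_version "2.3.1-3"     # prints: epoch=0 upstream=2.3.1 release=3
```

Splitting at the last hyphen means an upstream version that itself contains hyphens is kept whole.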
I will not be showing you how to install packages or perform other package manager-related actions in this part, as this will require more information that I am not ready to give you just yet. This will come later.
In our next part of the series, we will be learning about the file system, how to use files, device drivers, partitions, mount points, and more. Stay tuned.