linux:basics

This page covers basic usage of / working with Linux.

Great resources are the man pages and the official documentation of the GNU Coreutils.

Getting Help

type

type is a bash builtin that shows the type of a command, e.g. binary, script, alias or builtin. Depending on the result, you must use the respective help system.

type ls        # show type of ls
type -a kill   # show all commands with the name kill (=one program & one builtin!)
help

help is a bash builtin that gives you help for bash builtins (try help help).

help           # list all builtin bash commands
help type      # get help for builtin command 'type'
man, whatis, apropos

man shows you the man(ual) page for nearly every Linux program, but also for config files or system calls. Man pages are divided into sections; the most important ones are programs (1), config files (5) and sysadmin tools (8).

Man pages are preprocessed with the program groff and displayed using the program less. Use the arrow keys to navigate, type /searchphrase and enter to search, and type q to quit.

man man        # view man page of man itself
man passwd     # view man page of program passwd (section 1)
man 5 passwd   # view man page of /etc/passwd (section 5)

whatis shows the one-line descriptions of man pages whose names match your search phrase.

whatis
man -f
man -aw #show full path to man file instead of name only

apropos shows man pages relevant to a topic by searching through both the “Name” and “Description” sections of the man page database.

apropos
man -k
info

info shows the info page for many GNU programs. Sometimes info gives more in-depth information than man (e.g. info ls). Type h in info to get help on how to navigate with the keyboard.

info
whereis, which

whereis shows the paths of a program's binary, source and man page files. which only shows the full path of a program.

whereis ls
which ls

Bash

bash is a commonly used shell, a.k.a. command-line interpreter; it usually runs inside a terminal. It is used to run programs and scripts and to interact with the system.

Here configuration, basic usage and bash scripting are explained.

Files and Directories

Exploring the File System

cd - change directory
cd /etc/init.d                  # change into directory /etc/init.d
cd  or  cd ~                    # change into home directory
cd ..                           # change into parent directory
cd -                            # change into last directory you visited before

Note that cd is a shell builtin, not a program.

ls - list directory contents
## list files, check free / used disk space
ls [<dir>]                      # list files (in dir)
ls -lah [<dir>]                 # list *all* files in *long format* (permissions, owner, size in *human readable format*,..)
ls -ld [<dir>]                  # info for directory itself

By default ls sorts files by name in ascending order. This can be changed using either short or long options:

ls -lr                          # reverse order
ls -lS                          # ls -l --sort=size (biggest first)
ls -lX                          # ls -l --sort=extension (ascending)
ls -lt                          # ls -l --sort=time (modification time, newest first)
ls -lut                         # ls -l --sort=time --time=atime (last access time, newest first)
ls -lct                         # ls -l --sort=time --time=ctime (time of last status change, newest first)

ls can also be used to search in the current directory. You can of course make use of bash wildcards here.

ls *.txt                        # show all text files in current directory
ls -d *dir*                     # show all entries in the current directory whose name contains 'dir' (-d: list directories themselves, not their contents)
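
The sort options and wildcards can be tried safely in a scratch directory. This is only a sketch; the directory and the three file names are made up for the illustration.

```shell
# Scratch directory with three files of different sizes (names are hypothetical).
demo=$(mktemp -d)
printf 'a'          > "$demo/small.txt"
printf 'aaaaa'      > "$demo/medium.log"
printf 'aaaaaaaaaa' > "$demo/big.txt"

by_name=$(ls "$demo")                   # default: sorted by name
by_size=$(ls -S "$demo")                # biggest file first
txt_only=$(cd "$demo" && ls -- *.txt)   # the shell expands *.txt before ls runs

printf '%s\n' "$by_size"
rm -r "$demo"
```

Note that the wildcard is handled by the shell, not by ls: ls only ever sees the list of matching names.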

Disc Space

du - disc usage

du always calculates the size of a directory recursively. The options limit verbosity of how many lines are printed, but the totals do not change.

du                              # recursively print disc usage of all directories below the current directory (+ the current directory)
du -sh <file>                   # human readable summary of total disk usage, or current directory if no file is given
du -c <file1> <file2>           # grand total is displayed in addition to usage for two directories
du -h -d <depth>                # total disk usage dir and subdirs (show usage for each subdir as well)
du --exclude='*.mp3'            # disk usage except for mp3s (quotes keep the shell from expanding the pattern)
du -a                           # also print usage for files
du -s                           # only print a summary-line (disc usage of current directory)
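
That the options only change how many lines are printed, not the totals, can be checked with a small sketch. The scratch directory layout is an assumption made for the example.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
printf 'some data\n' > "$demo/a/b/file.txt"

du "$demo"        # one line per directory, the last line is the total
du -s "$demo"     # only the total

# Compare the size column (first tab-separated field) of both variants.
total_full=$(du "$demo" | tail -n 1 | cut -f 1)
total_summary=$(du -s "$demo" | cut -f 1)
echo "full: $total_full, summary: $total_summary"
rm -r "$demo"
```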
df - disc free

df is (kind of) the opposite of du: it shows the remaining space (or inodes) on a partition.

df                              # show free space on all mounted filesystems
df -h                           # use human-readable format
df -i                           # show used / free nr of inodes
df -h <file>                    # show free space on the file system where file lies
GUI approaches

Great graphical insights into disc usage are given by programs that analyze a whole directory tree and let you easily identify files or directories that take up the most space:

ncdu is a text-based (ncurses) program, filelight a KDE-based GUI program and baobab aka Disk Usage Analyzer is the GNOME equivalent.

File Handling

file - determine file type

This tool determines the file type (and encoding for text files) by analyzing its contents.

file myfile                # print file type
file -i myfile             # print MIME type
file -bi myfile            # print MIME type (without prepending file name)
touch, mkdir - create
touch file.txt             # creates an empty file using touch
> file.txt                 # creates an empty file using bash redirection (empties file.txt if it already exists!)
mkdir directory            # creates an empty directory
mkdir -p parent/subdir     # create the whole directory tree (p = parent)

Actually, touch was written to change times (by default: atime and mtime - access & modification time) of a file. By default it also creates files if they do not exist yet, the parameter -c suppresses file creation.

touch file.txt                                      # set atime and mtime of file.txt to current time
touch --date="2012-05-01 18:12" file.txt            # set atime and mtime of file.txt
touch --time=atime --date="2012-05-01" file.txt     # only set atime of file.txt
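
A short sketch of the two behaviours described above, assuming GNU touch and GNU date (for --date and date -r). The file names are made up.

```shell
demo=$(mktemp -d)

touch -c "$demo/missing.txt"                        # -c: do not create the file
touch --date="2012-05-01 18:12" "$demo/dated.txt"   # creates the file with that timestamp

if [ -e "$demo/missing.txt" ]; then created=yes; else created=no; fi
mtime=$(date -r "$demo/dated.txt" +%Y-%m-%d)        # GNU date: -r prints a file's modification time

echo "created: $created, mtime: $mtime"
rm -r "$demo"
```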
cp - copy
cp srcfile targetfile      # copy files (dereferences symlinks automatically!)
cp -r srcdir targetdir     # recursively copy directories (GNU cp copies symlinks as symlinks here; add -L to dereference them)
cp -a  or cp -Rpd          # archive (backup) files: recursive, preserve symlinks, owner, timestamps, mode

Some important parameters:

  • -L: always dereference symlinks
  • -P: do not dereference symlinks
  • -d: do not dereference symlinks (like -P) and preserve hard links
  • -p: preserve owner, timestamps, mode (permissions)
  • -R or -r: recursive copy
  • -u: update - copy only if source file is newer than destination or missing
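
The effect of -P and -L on a symlink can be sketched like this. The file names are hypothetical and everything happens in a temporary directory.

```shell
demo=$(mktemp -d)
echo 'content' > "$demo/real.txt"
ln -s real.txt "$demo/link.txt"           # relative symlink next to its target

cp -P "$demo/link.txt" "$demo/copy_P"     # copies the link itself
cp -L "$demo/link.txt" "$demo/copy_L"     # copies the file the link points to

if [ -L "$demo/copy_P" ]; then p_kind=symlink; else p_kind=file; fi
if [ -L "$demo/copy_L" ]; then l_kind=symlink; else l_kind=file; fi
echo "-P gives a $p_kind, -L gives a $l_kind"
rm -r "$demo"
```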
mv - move (or rename)

The program mv moves files or directories from one location to another. If both locations are on the same partition, it is a simple rename; otherwise the content is copied and the source is deleted.

mv src target              # move a file or directory. 

When moving a file to an existing target that is also a file the target is overwritten without any notice. If the existing target is a directory, the file or directory to be moved is moved into the directory. It is therefore advisable to be careful when using mv with files.

mv -i srcfile targetfile   # interactive mode (asks if a file should be overwritten)
mv -n srcfile targetfile   # noclobber mode (does not overwrite files - but also does not print a message)

A practical tool for bulk renaming is the perl program rename. There you can use the full power of perl regular expressions. More examples.

rename -n s/old/new/ *           # dry-run of what would happen when renaming all files and directories in the current directory
rename -v s/old/new/ *           # actually rename and print all changes
rename 's/\.htm$/\.html/' *.htm  # complex example: rename all .htm files to .html

ln - link

To make it easier to remember: as with cp, the first argument is the part that already exists.

ln src target  or  cp -l src target      # create a hardlink (both files reference the same inode)
ln -s src target  or  cp -s src target   # create a symbolic link (symlink)
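
The difference between a hard link and a symlink shows up when the original name is removed. A minimal sketch with made-up file names:

```shell
demo=$(mktemp -d)
echo 'hello' > "$demo/orig.txt"

ln    "$demo/orig.txt" "$demo/hard.txt"   # hard link: second name for the same inode
ln -s orig.txt         "$demo/soft.txt"   # symlink: stores the path 'orig.txt'

inode_orig=$(ls -i "$demo/orig.txt" | awk '{print $1}')
inode_hard=$(ls -i "$demo/hard.txt" | awk '{print $1}')

rm "$demo/orig.txt"
hard_content=$(cat "$demo/hard.txt")      # still readable: the data lives on until the last hard link is gone
if [ -e "$demo/soft.txt" ]; then soft_state=ok; else soft_state=dangling; fi
echo "hard link: $hard_content, symlink: $soft_state"
rm -r "$demo"
```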
rm, rmdir - remove
rm file.txt                 # remove a file
rm -r directory             # recursively remove a directory
rmdir directory             # remove an empty directory
rmdir -p parent/sub         # remove an empty directory tree

Secure File Handling

scp: secure copy

scp can copy files between hosts (also between two remote hosts).

Recursively copy localDir to a remote host:

scp -r localDir user@host:remoteDir 

To copy between two remote hosts either the source host must have credentials to log into the target host or you use the issuing host as third party via -3 (recommended).

scp -3 -r user1@host1:remoteDir1 user2@host2:remoteDir2
rsync: copy / synchronize directories locally and remotely

rsync can keep two directories synchronized intelligently by only syncing changes, which makes it useful for e.g. remote backups.

Simple local usage, where localDir and all its contents will be copied to otherLocalDir/localDir:

rsync -a localDir otherLocalDir

When copying remotely by default ssh is used. Quick setup:

# @destination (where files are copied to)
# set up ssh server (listens for incoming connections)
sudo apt-get install openssh-server
 
# @source
ssh-keygen
ssh-copy-id user@remotehost
rsync -av localdirectory user@remotehost:directory

More on this topic

File Permissions and Attributes

Each file in Linux has

  • a type
  • an owner
  • a group (similar to owner)
  • permissions (can owner/group/others read/write/execute?)
  • attributes

Note that root can not only change file permissions; they also do not apply to root (except for the execution of files)! E.g. root can always write/delete write-protected files or create files in a write-protected directory. To protect files / directories from unwanted changes by root, attributes can be used.

The output of ls -l starts with a string such as drwxrwxrwx. The first character identifies the file type, followed by three access triplets (for owner, group, others).

List of possible values for the file type (first character):

  • - file
  • d directory
  • l symlink
  • c character device (e.g. /dev/random)
  • b block device (e.g. /dev/sda1)
  • p fifo
  • s socket
chown - change owner

With chown both the owner and the group can be changed. Simple usage examples:

chown myuser file.txt           # change the owner of file.txt to 'myuser'
chown myuser:mygroup file.txt   # change the owner of file.txt to 'myuser' and the group to 'mygroup'
chown :mygroup file.txt         # only change the group of file.txt to 'mygroup'
chown -R myuser directory       # change the owner of directory recursively to 'myuser'

If a symbolic link is given, by default the referenced file is changed, not the link itself. Use -h to avoid dereferencing and to change the owner of the symlink itself.

In recursive mode symbolic links are not traversed by default (-P). With -L every encountered symlink is traversed. (FIXME: verify by testing)

To only change the owner for files owned by a certain user and/or group use the from option with the user:group syntax:

chown -R --from=myuser:mygroup otheruser:othergroup directory

More examples

chgrp - change group

FIXME

chmod - change permission

A combination of the letters ugoa controls which users' access to the file will be changed:

  • u the user who owns it
  • g other users in the file's group
  • o other users not in the file's group
  • a all users

The letters rwxXst select file mode bits for the affected users:

  • r read
  • w write
  • x execute (or search for directories)
  • X execute/search only if the file is a directory or already has execute permission for some user
  • s set user or group ID on execution
  • t restricted deletion flag or sticky bit

The letter for affected users and the ones for the mode can be combined with one of =+-. (equals, add, remove)

chmod o+rw file    # add permissions for others (users that are neither the owner nor in the group) to read and write file

Explicit setting of all permissions is also possible with four octal digits (e.g. 0644 or 0755 are common). The first digit represents the suid/sgid/sticky bits, the other three the permissions for u, g and o.

  • digit 1: suid=4, sgid=2, sticky=1
  • digits 2-4: read=4, write=2, execute=1
chmod 0644 file    # do not set suid/sgid/sticky bit, allow owner rw and group/other r

Some advanced examples:

chmod u=srw,g=r,o= file  or  chmod 4640 file # set suid, read and write access for user,
                                             # only read access for group, nothing for other
chmod -R a+rwX <dir>   # recursively give read/write access to a directory tree 
                       # (executable bit is only set for directories - and files where
                       # one of ugo already has the executable bit set)
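
The symbolic and octal notations, and the special behaviour of X, can be checked with a small sketch. It assumes GNU stat (stat -c %a prints the octal mode); the directory layout is made up.

```shell
demo=$(mktemp -d)
mkdir "$demo/tree"
touch "$demo/tree/plain.txt"
chmod 700 "$demo/tree"
chmod 600 "$demo/tree/plain.txt"

chmod -R a+rwX "$demo/tree"                   # X adds execute/search only for directories here
dir_mode=$(stat -c %a "$demo/tree")
file_mode=$(stat -c %a "$demo/tree/plain.txt")

chmod 0640 "$demo/tree/plain.txt"             # octal form: owner rw, group r, other nothing
octal_mode=$(stat -c %a "$demo/tree/plain.txt")
chmod u=rw,g=r,o= "$demo/tree/plain.txt"      # the same in symbolic form
symbolic_mode=$(stat -c %a "$demo/tree/plain.txt")

echo "$dir_mode $file_mode $octal_mode $symbolic_mode"
rm -r "$demo"
```

The plain file stays without execute bit because nobody had it before, while the directory becomes searchable for everyone.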
chattr - change attributes
chattr - change file attributes (a=append only, i=immutable, s=secure deletion,..)
lsattr file       # view file attributes
chattr +a file    # allow only appends to file (requires root to (un)set)

SUID, SGID, Sticky Bit

suid,sgid on files

When users run executables with those bits set, the effective uid/gid of the process is that of the file's owner/group. (This is why regular users can edit /etc/shadow with /bin/passwd.)

sgid on directory

All files created in the directory will get the same group as the directory, and all new subdirectories will inherit the sgid bit.

sticky bit

In sticky directories only the owner of a file/directory, the owner of the sticky directory and root can delete it (as in /tmp).

see also: http://www.bashguru.com/2010/03/unixlinux-advanced-file-permissions.html

umask

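The umask determines which file permissions are set for files and directories when they are created. Note that the umask definition is negative: the bits set in the umask are removed from the default permissions (666 for files, 777 for directories). A typical umask is 022: 666 with the bits of 022 removed gives 644, which means the owner can read and write to a file, group and other can only read. For 022 this looks like the subtraction 666 - 022 = 644, but it is a bit mask, so e.g. umask 033 also yields 644 for files.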

umask is a bash built-in to set & view the current umask of a user. Typically it is invoked in /etc/profile (or a file referenced from there).

umask              # print currently active umask (octal)
umask -S           # print currently active umask (symbolic output)
umask 0xxx         # set a umask (octal)
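
The arithmetic behind the umask can be sketched in the shell itself. The second part assumes GNU stat and uses a made-up scratch directory; the umask is only changed inside a subshell.

```shell
# The mask is removed bit by bit (AND NOT); for 022 that looks like a subtraction.
printf '%03o\n' $(( 0666 & ~0022 ))     # 644
printf '%03o\n' $(( 0666 & ~0033 ))     # 644 as well, not 633

demo=$(mktemp -d)
(
  umask 027                              # subshell: the change does not leak out
  touch "$demo/file"
  mkdir "$demo/dir"
)
file_mode=$(stat -c %a "$demo/file")     # 666 masked with 027
dir_mode=$(stat -c %a "$demo/dir")       # 777 masked with 027
echo "$file_mode $dir_mode"
rm -r "$demo"
```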

Two important tools can be used for searching files in Linux: find and locate. The big difference between those two: find searches the system 'live', locate uses a database that is typically updated only once a day.

find

In the simplest invocation find just prints all files in the current directory and all its subdirectories (similar to tree).

find                        # print all filenames in dir + subdirs
find dir1 dir2              # print many dirs + subdirs

To filter the files an “expression” must be added as last parameter. An expression can be quite complex and can consist of options, tests and actions. The most straight-forward test is -name:

find dir1 dir2 -name "*.jpg"      # find all .jpg files in both directories
find dir1 dir2 -iname "*.jpg"     # same, but ignore case (e.g. also find .JPG)

Some examples for options

find dir -maxdepth 1              # find all files in dir (but not its subdirs)
find dir -xdev -name "*.jpg"      # find files only in the current file system
                                  # (i.e. does not search NFS shares or /proc)

Some examples for tests:

find dir -size +10M               # find files bigger than 10 megabytes
find dir -size -50c               # find files smaller than 50 bytes (c=byte, k=kilobyte, M=Megabyte, G=Gigabyte)
find /bin -perm -u=s              # find all executables in /bin that have the suid bit set for its owner
find dir -perm 644                # find files with permissions being exactly 0644
find dir -atime -3                # find files accessed less than three days ago
find dir -cmin +10                # find files whose status (content, owner, permissions,..) changed more than 10 minutes ago (for modified files: -mmin/-mtime)
find /usr/bin -executable -type f # find all executable files in /usr/bin

Some examples for actions: delete files, execute commands

find dir -exec command {} \;      # execute a command for each of the found files - {} will be replaced with the filename
find dir -exec command {} +       # execute one command for all of the found files - {} will be replaced with the filenames
find dir -execdir command {} \;   # same as -exec, but the execution directory is the directory of the file, not the starting directory
find dir -delete                  # delete found files (careful: without a test this deletes everything below dir, and dir itself)
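
A few of the tests, options and actions combined in one sketch. The directory tree and file names are made up; -iname and -maxdepth assume GNU (or BSD) find.

```shell
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/a.jpg" "$demo/B.JPG" "$demo/sub/c.jpg" "$demo/notes.txt"

exact=$(find "$demo" -name '*.jpg' | wc -l)             # a.jpg and sub/c.jpg
nocase=$(find "$demo" -iname '*.jpg' | wc -l)           # additionally B.JPG
top_level=$(find "$demo" -maxdepth 1 -type f | wc -l)   # files directly in the directory
# -exec ... {} + passes all matches to a single command invocation
listed=$(find "$demo" -type f -name '*.txt' -exec ls {} + | wc -l)

echo "$exact $nocase $top_level $listed"
rm -r "$demo"
```

The patterns are quoted so that find, not the shell, expands them.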

http://linux.101hacks.com/linux-commands/find-command-examples/

locate

locate & slocate..

grep

For searching contents of files grep can be used:

# find all files in or below the current directory that contain "searchString"
grep -r searchString .

Split & Merge

split can split one file into several files. Splitting can be done by lines (-l N) or into a number of equal chunks (-n N). By default files are named x__: x plus a two-letter suffix iterating through the alphabet, from xaa to xzz. The resulting names can be changed: the prefix is an optional argument after the file name, the length of the suffix is set with -a.

split file.txt                      # split file.txt in chunks of 1000 lines and name them xaa, xab,..
cat xa* > merged.txt                # merge split file again
split -a 3 -l 1 big.csv splitfile   # create files with the names 'splitfile***', containing one line each
split -b 700M archive.tar.gz        # split an archive into 700MB pieces

csplit splits files according to a context (regex), hence the c.

csplit file.txt "/regex/"           # split file at first occurrence of regular expression
csplit file.txt "/regex/" "{*}"     # split file at all occurrences of the regular expression

Merging files is handled by cat (concatenate)

cat file1 file2 file3 > mergedfile  
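
A round trip of split and cat can be verified with cmp. This is a sketch; the prefix part_ and the file names are chosen for the example.

```shell
demo=$(mktemp -d)
seq 1 2500 > "$demo/in.txt"

split -l 1000 "$demo/in.txt" "$demo/part_"     # part_aa, part_ab, part_ac
pieces=$(ls "$demo" | grep -c '^part_')
cat "$demo"/part_* > "$demo/merged.txt"        # the glob sorts aa, ab, ac in the right order

if cmp -s "$demo/in.txt" "$demo/merged.txt"; then roundtrip=identical; else roundtrip=different; fi
echo "$pieces pieces, merged file is $roundtrip"
rm -r "$demo"
```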

Zip

Uncompressed archives

tar -cf archive.tar a b c            # create tar archive of file/folders a, b, c
tar -tf archive.tar                  # list contents of archive (actually: tests integrity of archive)
tar -xf archive.tar  -C destination  # extract content of tar archive into folder destination (otherwise: current directory)

tar creates uncompressed archives by default.

Compressing files

For compressing files the compression formats gzip (.gz), bzip2 (.bz2) and xz (.xz, which uses LZMA compression) are commonly used.

Gzip is the most commonly used one, but according to this in-depth comparison .xz is the superior format - it is fast to decompress, achieves high compression rates and also has reasonable compression times up to level 2 or 3.

The following commands (un)compress a single file and then remove the old (un)compressed file.

gzip my.txt                          # creates the gzipped file my.txt.gz and removes my.txt
gunzip my.txt.gz                     # recreates my.txt and removes my.txt.gz; same as gzip -d

The commands bzip2 + bunzip2 and xz + unxz work in the same fashion.

For all of these formats there is a cat-style command that writes the uncompressed data to stdout:

zcat                 # stream uncompressed gzip file to stdout, same as gzip -cd
bzcat
xzcat                # same as xz --decompress --stdout

Compressed Archives

Often you want to compress archives as e.g. tar.gz files. This can either be done with separate calls to tar and gzip or directly with tar:

tar -czf archive.tar.gz a b c                  # create gzipped archive
tar -cf archive.tar a b c && gzip archive.tar  # does the same
tar -xzf archive.tar.gz                        # extract gzipped archive
gunzip archive.tar.gz && tar -xf archive.tar   # does the same

For tarring or untarring compressed archives use the following flags: gzip (-z), bzip2 (-j), or xz (-J).
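
A complete create, list and extract cycle, as a sketch in a scratch directory (the file and directory names are made up):

```shell
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/out"
echo 'hello archive' > "$demo/src/a.txt"

tar -czf "$demo/archive.tar.gz" -C "$demo/src" a.txt   # -C: change directory before adding files
listing=$(tar -tzf "$demo/archive.tar.gz")
tar -xzf "$demo/archive.tar.gz" -C "$demo/out"
restored=$(cat "$demo/out/a.txt")

# Compress and decompress as a stream; the file itself is left untouched.
streamed=$(gzip -c "$demo/src/a.txt" | gzip -cd)
echo "$listing: $restored"
rm -r "$demo"
```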

Working with archives

tar -tf my.tar           # print list of contents
cpio

cpio can copy files to and from archives FIXME

System Information

System information can be gathered from many places. Apart from various programs, the two virtual filesystems procfs and sysfs are quite important. They allow us to peek into the kernel by providing /proc and /sys. There, information about processes and other system information is presented in a hierarchical file-like structure, meaning we can use it like every other file in Linux with our favorite tools.

Get help for procfs with man proc or on Wikipedia.

Sysfs is a successor of procfs and exports kernel data structures, their attributes, and the linkages between them to userspace. Documentation is here.

Hardware Information

With procfs we can get info about the CPU. Interrupts (IRQ) and IO ports of devices can also be viewed via procfs, the latter two only if the kernel module for the device is loaded.

cat /proc/cpuinfo    # CPU
cat /proc/scsi/scsi  # available SCSI devices
cat /proc/interrupts # IRQs
cat /proc/ioports    # IO ports

The following programs also find devices without a working or loaded kernel module by live probing the hardware. If the standard output of these programs is not informative enough, try the verbose flag (-v or even -vv).

lspci                # list PCI devices - live information about PCI buses in the system and devices connected to them.
lsusb                # list USB devices
lshw                 # extract detailed information on the hardware configuration
                     # (basically every possible piece of hardware)

Advanced lshw

lshw -html > /tmp/html # generate a nice html to view it in a browser
lshw -short            # quick overview
lshw -class network    # filter by class (see -short for classes)

The tool get-edid (in the package read-edid) can help to identify monitors (which are not included in lshw or lspci):

sudo apt-get install read-edid
sudo get-edid

GUI applications available include kinfocenter or hardinfo.

Memory

free -m              # quick overview of total, free and used memory (and swap) in megabytes
cat /proc/meminfo    # in-depth look at memory stats via procfs

Sensors

Install the package lm-sensors to view sensor values like temperature or fan rpm:

sensors-detect       # set up which sensors to use
sensors              # view all sensor values

OS / distribution version

uname -a             # print kernel version, platform (i.e. 32/64 bit),..
lsb_release -a       # print distribution information, e.g. release name and version
cat /etc/os-release  # text file containing about the same information as in lsb_release
cat /etc/issue       # text file containing login greeting (typically the release name)

Kernel

Get kernel-related information:

cat /var/log/dmesg   # kernel messages (also the ones during boot)
dmesg                # print all kernel messages in kernel ring buffer (look for error messages there; and local printers)
cat /proc/cmdline    # kernel boot time arguments
lsmod                # show the status of modules in the Linux Kernel
cat /proc/modules    # loaded kernel modules (see also: lsmod, /etc/modules)
modinfo              # get information about kernel modules (e.g. parameters, version)
cat /lib/modules/<kernelversion>/modules.dep   # available kernel modules
lspci -v             # show hardware and which kernel driver they use

Programs & config files to load or unload kernel modules:

modprobe             # program to add and remove modules from the Linux Kernel (handles dependencies)
insmod               # simple program to insert a module into the Linux Kernel
rmmod                # simple program to remove a module from the Linux Kernel
depmod               # program to generate modules.dep
cat /etc/modules.conf   # config file specifying modules loaded at startup

Note: in Ubuntu 12.04 the file /etc/modules.conf seems to be replaced by the combination of /etc/modules and /etc/modprobe.d/*.

Other system information:

cat /var/log/messages   # system log - after start of the logging daemon (since Ubuntu 11.04: /var/log/syslog)
cat /var/log/boot.log   # system log - also before start of the logging daemon (written by dmesg after boot?)

Filesystem information

cat /proc/filesystems  # supported filesystems (currently loaded kernel modules)
cat /proc/swaps      # current swap partitions
cat /etc/mtab        # currently mounted partitions (the same output as when calling mount without parameters)
cat /etc/fstab       # config file for mountpoints

Users

These programs give (slightly different) information about the currently logged in users (and more)

w                    # currently logged in users & what they are doing (+uptime as first line)
who                  # currently logged in users

Uptime & boot date

who -b               # date+time of last system boot
uptime               # current time, how long the system has been running, how many users are currently logged on
                     # and the system load averages for the past 1, 5, and 15 minutes
cat /proc/uptime     # uptime of the system (seconds), and the amount of time spent in idle process (seconds)
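
Because procfs entries behave like ordinary text files, they can be processed with the usual tools. The following sketch turns the first field of /proc/uptime into days, hours and minutes; it assumes a Linux system and simply reports if the file is missing.

```shell
if [ -r /proc/uptime ]; then
    # first field: seconds since boot (with fraction); awk truncates it to an integer
    up_seconds=$(awk '{print int($1)}' /proc/uptime)
    days=$(( up_seconds / 86400 ))
    hours=$(( up_seconds % 86400 / 3600 ))
    minutes=$(( up_seconds % 3600 / 60 ))
    echo "up for ${days}d ${hours}h ${minutes}m"
else
    echo "/proc/uptime is not available on this system"
fi
```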

Date & Time

date is a versatile tool for formatting dates / times and setting the system time (see also).

Print the current time in different formats

date                      # default - a human readable format
date --rfc-3339=date      # only the date - similar to ISO 8601
date --rfc-3339=seconds   # date and time - similar to ISO 8601
date +%Y-%m-%dT%H%M       # custom format useful for scripts, e.g. 2019-08-14T1129

Convert unix timestamps

date -d @1278923870       # print the unix timestamp in a human readable format
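
For reproducible results in scripts it helps to fix the time zone. A small sketch, assuming GNU date, that converts the timestamp above to UTC and back:

```shell
# -u pins the output to UTC so it does not depend on the local time zone
human=$(date -u -d @1278923870 +%Y-%m-%dT%H:%M:%SZ)
echo "$human"

# and back again: parse a date string and print it as seconds since the epoch
stamp=$(date -u -d '2010-07-12 08:37:50' +%s)
echo "$stamp"
```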

Working with Text

more file.txt                       # page through contents of file
less file.txt                       # page through contents of file more conveniently
paste file1 file2                   # view lines of two files next to each other, separated by a tab
paste -d ';' a.csv b.csv            # merge the columns of two .csv files line by line

paste allows viewing files next to each other, i.e. it concatenates them line by line (horizontally). This is useful to e.g. merge .csv files or as a simple drop-in for diff. The delimiter can be changed with -d.

nl file.txt                         # view content of file with line numbers
nl -i 5 -s '|--|' sort.txt          # increment line number by 5 and display the string |--| between line number and line 

nl numbers lines. The width of the line-number can be adjusted (-w N).

fmt lorem.txt                       # reformat text to 75 characters per line

fmt does the same as dynamic word wrap in modern editors. Paragraphs are reformatted to a character width (-w N), optionally with a different indentation for the first line per paragraph (-t).

rev lorem.txt                       # reverse each line and print it to stdout

rev reverses lines of text

pr lorem.txt                        # create pages with 66 lines including a header (in one text file)
pr -2 -l 50 lorem.txt               # 2-column layout, only 50 lines per page

pr converts text files for printing and should be combined with fmt because it takes lines as-is.

tee                                 # write stdin to stdout
echo 'hello' | tee file.txt         # writes 'hello' to stdout and into file.txt

tee writes its input to stdout and to as many files as desired. With -a it appends to the files instead of overwriting them.

wc -l file1.txt file2.txt           # show line count for both files (plus a total)

wc aka 'word count' counts lines (-l), words (-w), (UTF-8) characters (-m), bytes (-c).
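
paste, tee and wc work well together. A sketch with two tiny made-up CSV files:

```shell
demo=$(mktemp -d)
printf 'id\n1\n2\n'       > "$demo/a.csv"
printf 'name\nada\nbob\n' > "$demo/b.csv"

merged=$(paste -d ';' "$demo/a.csv" "$demo/b.csv")    # line n of a.csv + ';' + line n of b.csv
printf '%s\n' "$merged" | tee "$demo/merged.csv"      # show it and save it at the same time

lines=$(wc -l < "$demo/merged.csv")                   # reading from stdin keeps the file name out of the output
rm -r "$demo"
```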

Filtering & Selecting

tr

tr, aka translate characters, is a good supplement to sed when it comes to special characters. With tr newlines, spaces, and non-printing characters can easily be replaced, deleted or squeezed. Important arguments are delete (-d), squeeze (-s) and complement (-c).

tr -d '[:space:]' < lorem.txt    # delete all white-space from lorem.txt and print it to stdout
tr -s '\n' ' ' < file.txt        # join all lines of a file into a single line
tr '[A-Za-z]' '[N-ZA-Mn-za-m]'   # rot13 
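
The three modes (translate, delete, squeeze) side by side, as a small sketch with made-up input strings:

```shell
squeezed=$(printf 'a   b    c\n' | tr -s ' ')          # runs of spaces become one space
stripped=$(printf 'a b\tc\n' | tr -d '[:space:]')      # all whitespace removed, including the newline
upper=$(printf 'hello\n' | tr '[:lower:]' '[:upper:]') # translate one character class into another
rot13=$(printf 'Hello\n' | tr 'A-Za-z' 'N-ZA-Mn-za-m') # ranges work without the surrounding brackets
echo "$squeezed | $stripped | $upper | $rot13"
```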
cat & tac

cat and tac concatenate files (opposite of split). cat is also often used to simply print a file to stdout to view its content or pipe it into other programs.

cat file1 file2    #print file1 and then file2 to stdout
cat -A file1       #show non-printing characters, tabs, line ends. ugly results for non-ASCII characters.
tac file1 file2    #print file1 and then file2 to stdout, but reverse the line order for each file
sort

sort can be used to sort lines in one (or more) files. The ordering can be alphanumeric (default), numeric (-n), human-readable numeric e.g. 2K 1M (-h),… The input can also be sorted reverse (-r) or randomized (-R).

sort file1.txt file2.txt     #print sorted lines from all input files
sort -k 1.3 file.txt         #sort file by the line-content starting with the third character
sort -t ';' -k 3 file.csv    #print lines of csv, sorted by the third column (and everything after it; -k 3,3 limits the key to the third column)
sort -u file.txt             #sort file and only print unique lines (see also: ''uniq'')
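
The difference between the orderings is easiest to see on the same input. A sketch with made-up data; -h assumes GNU sort.

```shell
plain=$(printf '10\n9\n100\n' | sort)          # compares character by character
numeric=$(printf '10\n9\n100\n' | sort -n)     # compares the numeric value
human=$(printf '1M\n2K\n512\n' | sort -h)      # understands suffixes such as K and M
by_col=$(printf 'b;2\na;3\nc;1\n' | sort -t ';' -k 2,2n)   # sort by the second field only, numerically
echo $plain / $numeric / $human / $by_col
```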
shuf

Shuffle - the opposite of sort. People on StackOverflow claim it's faster than sort --random-sort.

uniq

With uniq you can filter (omit) or report adjacent repeated lines.

uniq file.txt                 #filter adjacent repeated lines
uniq -i -u file.txt           #filter while ignoring case and only print unique lines
uniq -c -d file.txt           #filter, and only print duplicates and their count
uniq -s 2 -w 4 file.txt       #filter, but only compare 4 chars not including the first 2 for each line
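
Since uniq only looks at adjacent lines, it is usually combined with sort. A minimal sketch on a made-up four-line input:

```shell
input='a
b
a
a'
adjacent_only=$(printf '%s\n' "$input" | uniq)          # a b a: the first 'a' is not adjacent to the others
all_unique=$(printf '%s\n' "$input" | sort | uniq)      # a b
counts=$(printf '%s\n' "$input" | sort | uniq -c | awk '{print $2 "=" $1}')   # a=3 b=1
dupes=$(printf '%s\n' "$input" | sort | uniq -d)        # only lines that occur more than once
echo $adjacent_only / $all_unique / $counts / $dupes
```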
head & tail

head and tail allow you to print the first / last lines or bytes of a file.

head file.txt             # print first 10 lines
head -n 5 file.txt        # print first 5 lines
head -c 10 file.txt       # print first 10 bytes (no word-option)
tail -n 10 file.txt       # print last 10 lines
tail -n +2 file.txt       # print everything from line 2 on (everything except first line)

tail can also be used to “follow” a file, i.e. print changes to the screen in real time:

tail -f file.txt    # print last 10 lines and all lines that are appended to the file in the future
tail -F file.txt    # same as --retry -f (realizes, when the file to be tailed is deleted and created again)

Regular Expressions

Regular expressions are a very powerful tool to match strings of text. They come in various flavours; the ones of concern here are POSIX.2 regexes, which are documented in man 7 regex. A great resource that also lists the slight differences between regular expressions in different programming languages is Rex Egg.

Tools making heavy use of regular expressions

awk

awk .. 1liners

sed

sed aka 'stream editor' is commonly used to replace strings in lines of text files. See this list of one-liners for more inspiration.

sed
sed s/abc/xxx/ file.txt    #simple use-case: replace the first 'abc' in each line of file.txt with 'xxx' and print it to stdout
sed s/abc/xxx/g file.txt   #global replace: replace all 'abc' per line
sed -i -e 's/a c/x x/g' -e 's/a/b/' file.txt    # replace in-place (overwrite file.txt) with -i, and several expressions with -e

Two good tricks to create readable regexes are to (1) escape the whole regex with single quotes and (2) use a different separator than / when appropriate. See for yourself in this example of replacing all backslashes in a line with slashes.

sed s/\\\\/\\//g count.txt   #wtf
sed s#\\\\#/#g count.txt     #use any character as separator-char
sed 's#\\#/#g' count.txt     #avoid escaping for the shell. we still must escape the backslash for the regex.

Advanced usage examples:

sed -n 1~2p file.txt         #only print odd lines (=print line 1 and then print each line at step 2)

By default sed uses basic regular expressions, where groups have to be written with escaped parentheses \( \). Use the switch -E to activate extended regular expressions and write groups with plain parentheses:

sed -E 's#(.*),(.*)#\1;\2#g' #replace the last comma in each line with a semicolon (.* is greedy)

However, even with extended expressions sed does not support non-greedy quantifiers, which are available in PCRE (Perl-compatible regular expressions). A good alternative is to simply use perl itself:

perl -pe 's/.*thevalue="(.*?)".*/$1/' file.txt   #print only the value of the attribute thevalue (.*? stops at the first closing quote)
grep

grep prints the lines of a file that match a regex.

grep regex file.txt              #print lines containing the string 'regex' in file.txt
grep '^[[:digit:]]' file.txt     #print lines starting with a digit
grep '[[:alpha:]]$' file.txt     #print lines ending with an alphabetic character (use [[:alnum:]] for alphanumeric)
grep '[1-3]a' file.txt           #print lines containing 1a, 2a or 3a

Flavours of grep:

rgrep         #same as grep -r: recursively execute grep for each file under a directory
fgrep         #same as grep -F: search for fixed strings, do not interpret the search string as regex
egrep         #same as grep -E: use extended regular expressions
              #note: GNU grep 3.8 and later warn that egrep and fgrep are obsolescent; prefer grep -E and grep -F
grep -E '([1-3]a){2}' file.txt   #print lines containing two consecutive occurrences of 1a, 2a or 3a (e.g. 2a3a)
grep -E 'one|two' file.txt       #print lines containing either 'one' or 'two' (using extended regular expressions)
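
The following sketch runs the extended and fixed-string modes against a small temporary file, using the grep -E and grep -F spellings. The file contents and variable names are made up for illustration.

```shell
tmp=$(mktemp)
printf '%s\n' 'one 1a2a' 'two 3a' 'three a.c' 'four abc' > "$tmp"

# extended regex: two consecutive occurrences of 1a, 2a or 3a
twice=$(grep -E '([1-3]a){2}' "$tmp")

# alternation; -c prints the number of matching lines instead of the lines
alt_count=$(grep -E -c 'one|two' "$tmp")

# fixed string: the dot is a literal dot
fixed=$(grep -F 'a.c' "$tmp")

# regex: the dot matches any character, so 'abc' matches as well
regex_count=$(grep -c 'a.c' "$tmp")

printf '%s\n' "$twice" "$alt_count" "$fixed" "$regex_count"
rm -f "$tmp"
```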

Regular expression tips

Capture groups

If you want to insert text with regular expressions, groups come in handy. A group is defined by parentheses () in the find-expression. Everything matched by the expression within the parentheses can be reused in the replace-expression as \n (sed) or as $n or ${n} (perl and the perl-based rename), where n is the group number. For example:

rename 's/(^\d*)/${1}_insert/' 1234_abc.txt     # renames file to: 1234_insert_abc.txt

CSV handling à la Database

expand file.txt           #replace each tab with (up to) 8 spaces
expand -t 4 file.txt      #replace each tab with (up to) 4 spaces
expand -t 8,16 file.txt   #first replaced tab has last space at position 8, second tab at 16. Other tabs are replaced with a single space.

expand replaces tabs with spaces. unexpand does the opposite.
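
A quick way to see the effect is to pipe a line with a single tab through both tools. The inputs and variable names in this sketch are made up for illustration.

```shell
# 'a', one tab, 'b': the tab is filled up to the next tab stop
four=$(printf 'a\tb\n' | expand -t 4)    # a + 3 spaces + b
eight=$(printf 'a\tb\n' | expand)        # a + 7 spaces + b (default tab stop: 8)

# unexpand turns leading blanks back into tabs: 8 spaces become one tab
tabs=$(printf '        x\n' | unexpand)

printf '[%s]\n' "$four" "$eight" "$tabs"
```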

cut
cut -c 3-5 file.txt       #print characters 3 to 5 of each line 
cut -d ';' -f 1,3 csv.csv #print columns 1 and 3 of a CSV

cut can be used to cut bytes (-b), characters (-c) or fields (-f); fields are separated by tabs unless another delimiter is given with -d. In the cut shipped with Ubuntu 12.04, -c is treated as -b, so the tool breaks on multi-byte UTF-8 characters.
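
The following sketch applies the field and character modes to inline sample data. The CSV contents and variable names are made up for illustration.

```shell
# Hypothetical semicolon-separated data
csv='id;name;city
1;Anna;Wien
2;Bernd;Graz'

# columns 1 and 3
cols=$(printf '%s\n' "$csv" | cut -d ';' -f 1,3)

# from column 2 to the end of the line
rest=$(printf '%s\n' "$csv" | cut -d ';' -f 2-)

# characters 3 to 5 (safe here because the input is plain ASCII)
chars=$(printf '%s\n' 'abcdefg' | cut -c 3-5)

printf '%s\n' "$cols" "$rest" "$chars"
```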

join a.csv b.csv                   #join two files on first field
join -t ';' -j 2 a.csv b.csv       #join two semicolon-separated files on the second field
join -t ';' -1 1 -2 4 a.csv b.csv  #join first field of a.csv on fourth field of b.csv
join --header a.csv b.csv          #joins and treats first line as header

join is used to join e.g. two CSVs on a common column; both files must be sorted on that column. The default field separator is blanks, i.e. spaces or tabs. By default the result of an unsuccessful join (no equal fields) is empty, because unpairable lines are omitted. These can be printed with -a 1 (unpairable lines of the first file) or -a 2 (of the second file).
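
The sketch below sorts one input first and then contrasts the default join with -a 1. The file names, contents and variable names are made up for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' '2;Bernd' '1;Anna' '3;Clara' > "$dir/a_unsorted.csv"
printf '%s\n' '1;Wien' '3;Graz' '4;Linz'   > "$dir/b.csv"

# join expects both inputs to be sorted on the join field
sort -t ';' -k 1,1 "$dir/a_unsorted.csv" > "$dir/a.csv"

# default: only lines with a partner in both files
inner=$(join -t ';' "$dir/a.csv" "$dir/b.csv")

# -a 1: additionally print unpairable lines of the first file
left=$(join -t ';' -a 1 "$dir/a.csv" "$dir/b.csv")

printf '%s\n' "$inner" '--' "$left"
rm -r "$dir"
```

In database terms the default corresponds to an inner join and -a 1 to a left outer join.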

Internationalization

Text can come in various encodings. recode and iconv convert file contents from one encoding to another, dos2unix converts line endings, and convmv converts file names from one encoding to another.

Common encodings encountered in Austria are:

  • ASCII: 7 bit (128 characters) and base for all other encodings in the list
  • ISO-8859-1 (latin1): old western european ASCII extension
  • ISO-8859-15 (latin9): slightly updated version of latin1 (e.g. with €)
  • CP-1252: code page used by German versions of Windows (superset of ISO-8859-1)
  • UTF-8: variable-width encoding of Unicode (1 to 4 bytes per character)
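
The differences between these encodings show up in the raw bytes. The sketch below converts the euro sign with iconv and prints the result as hex. It assumes an iconv that knows the names ISO-8859-15 and CP1252 (as glibc's does); the helper hex and the variable names are made up for illustration.

```shell
# UTF-8 bytes of the euro sign (E2 82 AC), written as octal escapes
euro=$(printf '\342\202\254')

# helper: print stdin as a compact hex string
hex() { od -An -tx1 | tr -d ' \n'; }

utf8=$(printf '%s' "$euro" | hex)                                      # e282ac
latin9=$(printf '%s' "$euro" | iconv -f UTF-8 -t ISO-8859-15 | hex)    # a4
cp1252=$(printf '%s' "$euro" | iconv -f UTF-8 -t CP1252 | hex)         # 80

# latin1 has no euro sign; glibc's iconv is expected to report an error here
if printf '%s' "$euro" | iconv -f UTF-8 -t ISO-8859-1 >/dev/null 2>&1; then
    latin1='converted'
else
    latin1='not representable'
fi

printf '%s\n' "$utf8" "$latin9" "$cp1252" "$latin1"
```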
recode - in-place recoding & filtering

recode can operate in two modes:

  • as in-place-recoding tool
  • as filter

The 'request' typically looks like from..to, where from and to each consist of charset/surface. Charsets are e.g. UTF-8 or ISO-8859-1, but also HTML or JAVA. Common surfaces for line ends are /CR (carriage return only, classic Mac OS) and /CL (carriage return and line feed, Windows); for most charsets, plain line feeds (Unix) are assumed when no line-end surface is given. You can also convert e.g. from/to base 64 (/b64), hexadecimal (/x1), decimal (/d1) or quoted printable (/QP).

When from, to or the surface is omitted, the default is used. The default charset depends on the system locale, the default surface on the charset.

Some in-place examples:

recode HTML file.txt                # from HTML to default
recode ..HTML file.txt              # from default to HTML
recode UTF-8..HTML file.txt         # from UTF-8 to HTML

In-place line-end switching:

recode /CL.. file.txt               # convert Windows (CR LF) line endings to Unix (LF)
dos2unix file.txt                   # same
recode ../CL file.txt               # convert Unix line endings to Windows (CR LF)
unix2dos file.txt                   # same

Filter example:

cat file.txt | recode /b64 > newfile.txt   # filter mode (stdin to stdout): decode a base64-encoded file to the default encoding
iconv - stream-based recoding
iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt  # stream-based recoding of file contents to UTF-8
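
Because iconv writes to stdout, redirecting the output to the input file would truncate it before it is read. A minimal sketch of converting a file "in place" via a temporary file follows; the directory, file names and sample text are made up for illustration.

```shell
dir=$(mktemp -d)

# "Grüße" in ISO-8859-1: ü = octal 374, ß = octal 337 (6 bytes including the newline)
printf 'Gr\374\337e\n' > "$dir/in.txt"
before=$(wc -c < "$dir/in.txt" | tr -d ' ')

# convert into a temporary file, replace the original only if iconv succeeded
iconv -f ISO-8859-1 -t UTF-8 "$dir/in.txt" > "$dir/in.txt.tmp" \
    && mv "$dir/in.txt.tmp" "$dir/in.txt"

# in UTF-8 both umlauts take two bytes each, so the file grows to 8 bytes
after=$(wc -c < "$dir/in.txt" | tr -d ' ')

printf '%s -> %s bytes\n' "$before" "$after"
rm -r "$dir"
```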
convmv - recoding of file names

By default this command only prints what it would do. Add --notest once the results are as expected.

convmv -f ISO-8859-1 -t UTF-8 --notest file

Power Management

Suspend / Hibernate with the package pm-utils

pm-suspend           # suspend to RAM
pm-hibernate         # suspend to disk

Turn off screen

xset dpms force off
linux/basics.txt · Last modified: 2019/09/03 10:57 by mstraub