silico.biotoul.fr
 

Linux tips


Paths & I/O & files

Linux filesystem organization

/ The root directory.
/boot Boot directory (kernel and boot loader)
/etc Configuration files for the system. e.g. /etc/fstab specifies which drives to mount where. /etc/hosts lists network hosts and IP addresses.
/bin
/usr/bin
The /bin directory has the essential programs that the system requires to operate, while /usr/bin contains applications for the system's users.
/sbin
/usr/sbin
The sbin directories contain programs for system administration, mostly for use by the superuser (root).
/usr contains things that support user applications.
/usr/local /usr/local and its subdirectories (bin, lib, share, ...) are used for the installation of software and other files for use on the local machine i.e., not part of the official distribution.
/var contains files that change as the system is running. This includes: log (logs!), spool (files that are queued for some process, such as mail messages and print jobs)
/lib
/lib64
shared libraries (similar to DLLs of windows)
/home users' personal directories
/root System administrator's home directory
/tmp holds temporary files (anybody/program can write)
/dev In linux, devices are represented by files under that directory (e.g. disks are block devices such as /dev/sda or /dev/hda usually for the 1st hard drive)
/proc virtual directory giving access to the running kernel and system. e.g. /proc/cpuinfo /proc/meminfo /proc/uptime
/media
/run/media
/mnt
removable devices (usb sticks, usb drives, ...) are usually mounted in one of those when plugged

File systems & partitions: fdisk, gdisk, parted, mount, lsof, mkfs

  1. storage devices (hard drives, CD/DVD, usb sticks, ...) are divided in partitions (at least one to be usable)
  2. partitions are formatted with a filesystem
    • fat16, fat32, ntfs from microsoft for windows systems
    • iso9660 for CDs
    • ext2, ext3, ext4 (and others) for linux systems
  3. partitions are mounted somewhere in the filesystem tree (e.g. / or /boot or /mnt/cdrom or /home) to give access to their files
  4. there is another type of partition: LVM (Logical Volume Management), which is not a filesystem but allows a logical volume to span more than one drive:
    1. LVM partitions of possibly different drives are converted to physical volumes
    2. physical volumes are combined in a volume group
    3. logical volumes are created in a volume group and formatted with a filesystem that can then be mounted

Note: different filesystems have different capabilities. For example, FAT does not manage permissions.

The advantage of using LVM is that if a filesystem becomes too small, one can dedicate more physical space to it and resize the logical volume.

TO DO:

  • /etc/fstab
  • mount
  • umount
  • lsof for when umount fails because some files are currently used on the filesystem
  • partitioning with fdisk, gdisk, and other
  • formatting with mkfs (mkfs.ext4, mkfs.iso9660, ...)
  • mount bind, loop fs

Paths & directories: pwd, mkdir, rmdir, rm

  • pwd prints the current directory
  • relative path (from the current directory): e.g. ls subdir/subsubdir or ls ../whatever/
  • absolute path: e.g. ls ~user/path or ls /home/user/path
  • mkdir: create directory. e.g. mkdir ~/newdir or with subdirs mkdir -p ~/new/newsub/newsubsub
  • rmdir dirname removes an empty directory; if it is not empty, use rm -fr dirname
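
The commands above can be tried safely in a throwaway directory. The following is only a sketch: the directory names are made up for the illustration and mktemp -d is assumed to be available (it is on usual Linux systems).

```shell
#!/bin/bash
# Work inside a temporary directory so that nothing in $HOME is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# -p creates the missing parent directories in one call
mkdir -p new/newsub/newsubsub

# a relative path is resolved from the current directory ...
cd new/newsub || exit 1
relative_listing=$(ls ..)               # content of "new", seen from new/newsub

# ... while an absolute path works from anywhere
absolute_listing=$(ls "$workdir/new")

echo "current directory: $(pwd)"
echo "relative: $relative_listing / absolute: $absolute_listing"

# rmdir only removes empty directories; rm -r removes a whole tree
cd "$workdir" || exit 1
rmdir new/newsub/newsubsub
rm -r new
remaining=$(ls -A "$workdir" | wc -l)   # number of entries left in workdir

# clean up the temporary directory itself
cd / && rmdir "$workdir"
```

Both listings show the same content, since the two paths point to the same directory.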

Permissions: chown, chgrp, chmod

$ ls -l /home
drwxr-x---  69 barriot    gsi   4.0K Mar  5 12:09 barriot
drwx------   2 root       root   16K Jul 12  2010 lost+found
drwxr-xr-x  36 micas      stage 4.0K Jul 31  2012 micas
...
 
[barriot@gamborimbo ~]$ ls -lh Documents/TEACHING/2012-2013/M1-MABS/Graph/TP3-igraph.layout/
total 80K
drwxr-xr-x 1 barriot gsi 4.0K Mar 14  2012 HDE.old
-rw-r--r-- 1 barriot gsi  24K Mar 14  2012 91347.nwk
-rw-r--r-- 1 barriot gsi  942 Mar  1 16:02 Cleandb_Luca_1_S_1_1_65_Iso_Tr_1-CC1.cod
-rw-r--r-- 1 barriot gsi  28K Sep  7  2010 Cleandb_Luca_1_S_1_1_65_Iso_Tr_1-CC1.gr
-rw-r--r-- 1 barriot gsi 2.3K Sep  7  2010 Cleandb_Luca_1_S_1_1_65_Iso_Tr_1-CC1.tgr
-rw-r--r-- 1 barriot gsi 4.7K Mar  5 11:42 cmds.R
-rw-r--r-- 1 barriot gsi  871 Mar 14  2012 sample_tree_with_branchlengths.nwk
-rwxr-xr-x 1 barriot gsi  670 Mar 14  2012 drawTree.py
-rw-r--r-- 1 barriot gsi 5.6K Feb 27 16:57 Tree.py

The first character gives the file type: d for a directory, - for a regular file, ... The next 9 characters are the permissions, in 3 groups of 3: for the owner (user), for the group, and for the others.

For a regular file :

  • r for permission to read
  • w for permission to modify
  • x for being able to execute the file (binary executable or script)

For a directory :

  • r to be able to read the content (list files in the directory)
  • w to be able to add or remove files
  • x to be able to pass through that directory, i.e. cd to that dir or a subdir

Modify ownership of a file or directory :

# change owner
chown newuser file
# recursive
chown -R newuser directory
# change group
chgrp newgroup filename
# change both
chown newuser:newgroup filename

Modify permissions:

# numeric notation: r=4, w=2, x=1, thus for rwxr-x--- (7=4+2+1, 5=4+1, 0)
chmod 750 file
# recursively on a sub directory
chmod -R 750 dirname
# symbolic notation:
chmod u=rwx,g=rx,o= filename
# add execute permission for all:
chmod a+x filename
# revoke write permission for others:
chmod o-w filename
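
To check that the numeric and symbolic notations agree, the resulting mode can be read back with stat. This is a small sketch that assumes GNU stat (its -c %a format prints the permissions in octal); the temporary file is only there for the demonstration.

```shell
#!/bin/bash
# Assumption: GNU coreutils stat, where "-c %a" prints the octal permissions.
f=$(mktemp)

chmod 750 "$f"
numeric=$(stat -c %a "$f")        # mode after the numeric notation

chmod 600 "$f"                    # reset to something different first
chmod u=rwx,g=rx,o= "$f"
symbolic=$(stat -c %a "$f")       # mode after the symbolic notation

chmod o+r "$f"                    # add read permission for others
after_add=$(stat -c %a "$f")

echo "numeric=$numeric symbolic=$symbolic after o+r=$after_add"
rm -f "$f"
```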

File info & type: stat, file

[barriot@gamborimbo ~]$ stat /home/barriot
  File: `/home/barriot'
  Size: 12288     	Blocks: 24         IO Block: 4096   directory
Device: fd02h/64770d	Inode: 1048577     Links: 119
Access: (0755/drwxr-xr-x)  Uid: (  500/ barriot)   Gid: (  501/     gsi)
Access: 2013-03-05 10:39:08.927051453 +0100
Modify: 2013-03-05 10:39:00.240074369 +0100
Change: 2013-03-05 10:39:00.240074369 +0100
 Birth: -
[barriot@gamborimbo ~]$ stat .bashrc
  File: `.bashrc'
  Size: 517       	Blocks: 8          IO Block: 4096   regular file
Device: fd02h/64770d	Inode: 1052239     Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (  500/ barriot)   Gid: (  501/     gsi)
Access: 2013-03-02 16:04:19.268619379 +0100
Modify: 2012-10-12 17:24:24.818899216 +0200
Change: 2012-11-18 23:25:18.869870338 +0100
 Birth: -
[barriot@gamborimbo ~]$ file /home/barriot
/home/barriot: directory
[barriot@gamborimbo ~]$ file .bashrc
.bashrc: ASCII text

File content, concatenation, split, ... and redirections: cat, split, head, tail, more, less, tac

# display content
cat somefile.txt
# concatenate 2 or more files
cat file_1.txt file_2.txt
cat *.txt
# redirect to a file (if file exists it will be overwritten otherwise it gets created)
cat file_1.txt file_2.txt > result.txt
# redirect to a file (if file exists it will be appended at the end otherwise it gets created)
cat others*.txt >> result.txt
 
# split a file into smaller parts
## by file size (1kb)
split --bytes 1024 big.file
split -b 1024 big.file
## by number of lines per output files
split --lines 100 big.text.file.txt
split -l 100 big.text.file.txt
## by number of output files
split --number 10 big.file
split -n 10 big.file
## specify output files prefix and numbered numerically (3 digits)
split -n 100 -a 3 -d big.file part_ 
split -n 100 --suffix-length 3 --numeric-suffixes big.file part_ 
 
# display the lines of a file in reverse order
tac file.txt
 
# first 10 lines of files
head -n 10 *.txt
# last 10 lines
tail -n 10 *.txt
# last lines of a file and keeps outputting new lines added to the file
tail -f /var/log/httpd/error.log
 
# content of a file page by page (space for next page, enter for next line)
more file.txt
# content of a file: page up/down to browse. /expr to search (then n for next match and N for previous match). q to exit
less file.txt
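
As an illustration of how split and cat complement each other, the sketch below cuts a generated file into parts and glues them back together. The file names are made up, and the -d option assumes GNU split.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 250 > big.txt                      # sample file with 250 lines

# 100 lines per part, numeric suffixes on 3 digits: part_000, part_001, part_002
split -l 100 -d -a 3 big.txt part_
n_parts=$(ls part_* | wc -l)
last_part_lines=$(wc -l < part_002)      # the last part holds the remainder

# the shell expands part_* in sorted order, so cat rebuilds the original file
cat part_* > rebuilt.txt
if cmp -s big.txt rebuilt.txt; then identical=yes; else identical=no; fi

first=$(head -n 1 rebuilt.txt)
last=$(tail -n 1 rebuilt.txt)
echo "$n_parts parts, identical=$identical, first=$first, last=$last"

cd / && rm -r "$workdir"
```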


grep, cut, sort, wc, find

  • grep

To find files containing some string or regular expression:

grep myWeirdFunctionName *.cpp

Or recursively:

grep -r myWeirdFunctionName *

To display at what line it is found:

grep -n myWeirdFunctionName myweirdlibrary.cpp

  • cut

To display only some columns (example file: Data_Mining_heart.txt):

$ head Data_Mining_heart.txt
age	sex	chest_pain_type	resting_blood_pressure	serum_cholesterol	fasting_blood_sugar	resting_ecg_results	max_heart_rate	exercise_induced_angina	depression_induced	slope	major_vessels	thal	disease
continuous	discrete	discrete	continuous	continuous	discrete	discrete	continuous	discrete	continuous	continuous	continuous	discrete	discrete
													class
70	M	4	130	322	FALSE	2	109	FALSE	2.4	2	3	normal	TRUE
67	F	3	115	564	FALSE	2	160	FALSE	1.6	2	0	reversable_defect	FALSE
57	M	2	124	261	FALSE	0	141	FALSE	0.3	1	0	reversable_defect	TRUE
64	M	4	128	263	FALSE	0	105	TRUE	0.2	2	1	reversable_defect	FALSE
74	F	2	120	269	FALSE	2	121	TRUE	0.2	1	1	normal	FALSE
65	M	4	120	177	FALSE	0	140	FALSE	0.4	1	0	reversable_defect	FALSE
56	M	3	130	256	TRUE	2	142	TRUE	0.6	2	1	fixed_defect	TRUE
 
$ head heart.txt.orange.tab | cut -f 1,4
age	resting_blood_pressure
continuous	continuous
 
70	130
67	115
57	124
64	128
74	120
65	120
56	130
 
# sometimes we need to specify the character delimiting the columns
$ tail clinical_info.csv 
"X86A40";12.17;;"F";"Uppsala";"F";61;"F";24;"G1"
"X87A79";12.08;;"T";"Uppsala";"T";36;"T";12;"G2"
"X88A67";4.25;"F";"T";"Uppsala";"T";63;"T";24;"G3"
"X89A64";12.08;;"T";"Uppsala";"T";60;"T";23;"G1"
"X8B87";11.33;;"T";"Uppsala";"T";58;"T";17;"G2"
"X90A63";2.67;;"T";"Uppsala";"T";76;"T";26;"G3"
"X94A16";11.08;;"T";"Uppsala";"T";73;"T";6;"G2"
"X96A21";0.08;"F";"T";"Uppsala";"T";63;"T";38;"G3"
"X99A50";10.5;;"T";"Uppsala";"T";82;"F";19;"G2"
"X9B52";11.33;;"T";"Uppsala";"T";71;"T";12;"G3"
$ tail clinical_info.csv | cut -f 10 -d';'
"G1"
"G2"
"G3"
"G1"
"G2"
"G3"
"G2"
"G3"
"G2"
"G3"

  • sort, wc
# sort thal values
$ head Data_Mining_heart.txt | cut -f 13 | sort
 
discrete
fixed_defect
normal
normal
reversable_defect
reversable_defect
reversable_defect
reversable_defect
thal
# remove duplicates
$ cat Data_Mining_heart.txt | cut -f 13 | sort -u
discrete
fixed_defect
normal
reversable_defect
thal
 
# number of lines, words, and bytes (in that order)
$ wc *.loocv
   281   1941  14174 knn.10.loocv
   281   1941  14154 knn.11.loocv
   281   1941  14171 knn.12.loocv
   281   1941  14174 knn.13.loocv
   281   1941  14150 knn.14.loocv
   281   1941  14167 knn.15.loocv
   281   1941  14150 knn.16.loocv
   281   1941  14166 knn.17.loocv
   281   1941  14172 knn.18.loocv
   281   1941  14161 knn.19.loocv
   281   1941  14194 knn.1.loocv
   281   1941  14179 knn.20.loocv
   281   1941  14153 knn.2.loocv
   281   1941  14162 knn.3.loocv
   281   1941  14166 knn.4.loocv
   281   1941  14149 knn.5.loocv
   281   1941  14164 knn.6.loocv
   281   1941  14159 knn.7.loocv
   281   1941  14184 knn.8.loocv
   281   1941  14148 knn.9.loocv
   281   1941  14210 NaiveBayes.loocv
  5901  40761 297507 total
# number of unique values in the thal column (the count includes the 3 header rows: thal, discrete and an empty line)
$ cat Data_Mining_heart.txt | cut -f 13 | sort -u | wc -l
6
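
To count only the real values, the header rows can be skipped with tail -n +4 (start at the 4th line) before cutting. The sketch below uses a tiny made-up table with the same 3-line header layout; it is not the original data file.

```shell
#!/bin/bash
# Hypothetical 3-column sample with the same 3 header rows as the file above.
data=$(mktemp)
printf 'age\tsex\tthal\n'                   >  "$data"
printf 'continuous\tdiscrete\tdiscrete\n'   >> "$data"
printf '\t\t\n'                             >> "$data"
printf '70\tM\tnormal\n'                    >> "$data"
printf '67\tF\treversable_defect\n'         >> "$data"
printf '57\tM\treversable_defect\n'         >> "$data"
printf '56\tM\tfixed_defect\n'              >> "$data"
printf '74\tF\tnormal\n'                    >> "$data"

# header rows included: thal, discrete and the empty line are counted as values
with_headers=$(cut -f 3 "$data" | sort -u | wc -l)

# header rows skipped: only the real values remain
values_only=$(tail -n +4 "$data" | cut -f 3 | sort -u | wc -l)

echo "with headers: $with_headers, values only: $values_only"
rm -f "$data"
```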

  • find

find filters files and directories based on various attributes:

  • name/pattern
  • date/age
  • size
  • type
  • permissions
  • and others...
# find files by name recursively, starting from the current directory
# (quote the pattern so that the shell does not expand it before find sees it)
find ./ -name 'what*I*am*looking*'
 
# by time accessed (amin in minutes or atime in days), changed (cmin in minutes or ctime in days), modified (mmin in minutes or mtime in days)
## accessed less than 10 minutes ago
find ./ -amin -10
## changed more than 1 hour ago
find ./ -cmin +60
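
The sketch below shows why the pattern given to -name is better quoted: unquoted, the shell expands it against the current directory before find runs. The file and directory names are made up for the example.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p sub/deeper
touch notes.txt sub/report.txt sub/deeper/data.csv

# quoted pattern: find receives *.txt and matches at every depth
quoted_count=$(find . -type f -name '*.txt' | wc -l)

# unquoted pattern: the shell turns *.txt into notes.txt before find runs,
# so only files named exactly notes.txt are reported
unquoted_count=$(find . -type f -name *.txt | wc -l)

# filter on type (directories below the starting point) and on age
dir_count=$(find . -mindepth 1 -type d | wc -l)
recent_count=$(find . -type f -mmin -10 | wc -l)   # modified < 10 minutes ago

echo "quoted=$quoted_count unquoted=$unquoted_count dirs=$dir_count recent=$recent_count"
cd / && rm -r "$workdir"
```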

  • sed

To replace something (e.g. jamaica.biotoul.fr) by somethingelse (e.g. jamaica.ibcg.biotoul.fr) in a file:

sed  -i 's/jamaica.biotoul.fr/jamaica.ibcg.biotoul.fr/g' gsiwikidb.after_sed.sql

Remove sequence limits from Jalview output:

sed -i 's/\/[0-9]*-[0-9]*//' CleanupFile_slimites.fa

To replace from a file to a file:

sed 's/jamaica.biotoul.fr/jamaica.ibcg.biotoul.fr/g' < gsiwikidb.before_sed.sql > gsiwikidb.after_sed.sql

To apply that to a set of files using find:

find /var/www -type f -exec sed -i 's/jamaica.biotoul.fr/jamaica.ibcg.biotoul.fr/g' {} \;

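The trailing \; marks the end of the command given to -exec; the backslash prevents the shell from interpreting the semicolon itself. {} is replaced by each file found.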
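
One detail worth knowing: in a sed pattern, an unescaped dot matches any character. The replacements above work on the intended host name, but they would also rewrite look-alike strings. A minimal sketch on a made-up file (sed -i as used here assumes GNU sed):

```shell
#!/bin/bash
f=$(mktemp)
printf 'host=jamaica.biotoul.fr\nmirror=jamaicaXbiotoulYfr\n' > "$f"

# unescaped dots: both lines match, since . stands for any character
loose=$(sed 's/jamaica.biotoul.fr/NEW/g' "$f" | grep -c NEW)

# escaped dots: only the literal host name matches
strict=$(sed 's/jamaica\.biotoul\.fr/NEW/g' "$f" | grep -c NEW)

# without -i sed writes to standard output; with -i the file is edited in place
sed -i 's/jamaica\.biotoul\.fr/jamaica.ibcg.biotoul.fr/g' "$f"
first_line=$(head -n 1 "$f")

echo "loose=$loose strict=$strict first line: $first_line"
rm -f "$f"
```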

Processes

See also https://opensource.com/article/21/8/linux-procps-ng

# list processes. The 2nd column is the PID (process id), which can be used to send signals (the 1st column is the user)
$ ps faux | less
 
# top processes, hit M or P to sort by memory or CPU, q to exit
# physical memory used is the RES column (RESident)
$ top
# a more user friendly version: htop http://htop.sourceforge.net/
 
# launch a program in background with &
$ longtask &
# you can run multiple commands in parallel:
$ cmd1 & cmd2 & cmd3
$ date & ls
[1] 8271
Tue Mar  5 17:15:57 CET 2013
heart.txt.orange.tab  knn.13.loocv  knn.17.loocv  knn.20.loocv  knn.5.loocv  knn.9.loocv       NaiveBayes.py
knn.10.loocv          knn.14.loocv  knn.18.loocv  knn.2.loocv   knn.6.loocv  knn.py            sample-heart.tab
[1]+  Done                    date
 
# if you forget the &
# it is possible to stop the process (in the foreground) with control+Z
$ sleep 9999
^Z
[1]+  Stopped                 sleep 9999
$ sleep 1239999
^Z
[2]+  Stopped                 sleep 1239999
# list the jobs of the current shell
$ jobs
[1]-  Stopped                 sleep 9999
[2]+  Stopped                 sleep 1239999
# put job 1 in the foreground
$ fg 1
sleep 9999
^Z
[1]+  Stopped                 sleep 9999
# put it in the background
$ bg 1
[1]+ sleep 9999 &
$ jobs
[1]-  Running                 sleep 9999 &
[2]+  Stopped                 sleep 1239999
$ fg 2
^C
$ ps
  PID TTY          TIME CMD
 8342 pts/0    00:00:00 sleep
 8465 pts/0    00:00:00 ps
10318 pts/0    00:00:00 bash
# kill a process by its PID
$ kill 8342
[1]+  Terminated              sleep 9999
# by default, the kill command asks the process to terminate (SIGTERM)
# but sometimes the process cannot react (e.g. it is stopped)
$ sleep 888888
^Z
^Z
[1]+  Stopped                 sleep 888888
[barriot@gamborimbo TP-Classification]$ ps
  PID TTY          TIME CMD
 8513 pts/0    00:00:00 sleep
 8520 pts/0    00:00:00 ps
10318 pts/0    00:00:00 bash
$ kill 8513
$ ps
  PID TTY          TIME CMD
 8513 pts/0    00:00:00 sleep
 8535 pts/0    00:00:00 ps
10318 pts/0    00:00:00 bash
$ jobs
[1]+  Stopped                 sleep 888888
# nothing happened because the process is stopped and thus cannot handle our request (until it is running again)
# to kill such a process, we have to send a SIGKILL (9) instead of the default SIGTERM (15)
$ kill -9 8513
[1]+  Killed                  sleep 888888
# this way we ask the system to kill the process instead of asking the process itself
 
# list of signals
$ kill -l
 1) SIGHUP	 2) SIGINT	 3) SIGQUIT	 4) SIGILL	 5) SIGTRAP
 6) SIGABRT	 7) SIGBUS	 8) SIGFPE	 9) SIGKILL	10) SIGUSR1
11) SIGSEGV	12) SIGUSR2	13) SIGPIPE	14) SIGALRM	15) SIGTERM
16) SIGSTKFLT	17) SIGCHLD	18) SIGCONT	19) SIGSTOP	20) SIGTSTP
21) SIGTTIN	22) SIGTTOU	23) SIGURG	24) SIGXCPU	25) SIGXFSZ
26) SIGVTALRM	27) SIGPROF	28) SIGWINCH	29) SIGIO	30) SIGPWR
31) SIGSYS	34) SIGRTMIN	35) SIGRTMIN+1	36) SIGRTMIN+2	37) SIGRTMIN+3
38) SIGRTMIN+4	39) SIGRTMIN+5	40) SIGRTMIN+6	41) SIGRTMIN+7	42) SIGRTMIN+8
43) SIGRTMIN+9	44) SIGRTMIN+10	45) SIGRTMIN+11	46) SIGRTMIN+12	47) SIGRTMIN+13
48) SIGRTMIN+14	49) SIGRTMIN+15	50) SIGRTMAX-14	51) SIGRTMAX-13	52) SIGRTMAX-12
53) SIGRTMAX-11	54) SIGRTMAX-10	55) SIGRTMAX-9	56) SIGRTMAX-8	57) SIGRTMAX-7
58) SIGRTMAX-6	59) SIGRTMAX-5	60) SIGRTMAX-4	61) SIGRTMAX-3	62) SIGRTMAX-2
63) SIGRTMAX-1	64) SIGRTMAX	
 
# notice SIGTSTP (sent by Ctrl-Z; SIGSTOP is its variant that cannot be caught) and SIGCONT (sent by fg and bg)
 
# it is possible to kill all processes having a given name
$ killall annoying_process
 
# when you run a command in the shell, the shell is its parent process
# if you log in to a remote host to run a very long analysis and get disconnected
# the remote shell dies and all of its children also die
# to prevent this behavior, it is possible to run the command
$ nohup ./long_analysis &
 
# nohup stands for no hang up. If you forgot the nohup, it is possible to achieve the same as follows:
$ sleep 999999999
^Z
[1]+  Stopped                 sleep 999999999
$ bg
[1]+ sleep 999999999 &
$ ps
  PID TTY          TIME CMD
 8878 pts/0    00:00:00 sleep
 8883 pts/0    00:00:00 ps
10318 pts/0    00:00:00 bash
$ disown -h 8878
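
In a script, the same ideas can be used without typing PIDs by hand: $! holds the PID of the last background command and wait returns its exit status. This is a sketch, with sleep standing in for a real long-running task.

```shell
#!/bin/bash
# start a long task in the background and remember its PID
sleep 30 &
pid=$!

# kill -0 sends no signal; it only checks that the process exists
if kill -0 "$pid" 2>/dev/null; then was_running=yes; else was_running=no; fi

# ask the process to terminate (default signal: SIGTERM, number 15)
kill "$pid"

# wait returns the exit status of the child:
# 128 + signal number when it was ended by a signal
wait "$pid"
status=$?

echo "was running: $was_running, exit status after kill: $status"
```

A status of 143 means 128 + 15, i.e. the process was ended by SIGTERM.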

bash shell

variables and aliases

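# list of shell variables (set also shows the variables that are not exported; use env or printenv to list only the environment variables)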
$ set | head
AUTOJUMP_DATA_DIR=/home/barriot/.local/share/autojump
AUTOJUMP_HOME=/home/barriot
BASH=/usr/bin/bash
BASHOPTS=checkwinsize:cmdhist:expand_aliases:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
BASH_ALIASES=()
BASH_ARGC=()
BASH_ARGV=()
BASH_CMDS=()
BASH_LINENO=()
BASH_SOURCE=()
 
# value of a given variable (PATH is the list of directories in which executables are searched when a command is issued)
$ echo $PATH
/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin:/usr/local/bin:/home/barriot/bin:/usr/local/bin:/software/bin:/opt/bin:/home/barriot/.bin:/opt/bin:/usr/lib64/openmpi/bin
# PS1 is the format of the bash prompt. e.g. for [user@host current_dir]$
$ echo $PS1
[\u@\h \W]\$
 
# set or modify a variable (local to the current shell)
$ MYVAR=youpi
$ echo $MYVAR 
youpi
 
# remove a variable
$ unset MYVAR 
$ echo $MYVAR 
 
$ 
 
# set or modify a variable for the current shell and its future children
$ export MYVAR=yopla
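
The difference between a plain variable and an exported one shows up when a child process is started. A short sketch, where bash -c plays the role of the child:

```shell
#!/bin/bash
unset MYVAR

# plain assignment: visible in the current shell only
MYVAR=youpi
in_child_before=$(bash -c 'echo "${MYVAR:-unset}"')

# after export, children started from now on receive a copy
export MYVAR
in_child_after=$(bash -c 'echo "${MYVAR:-unset}"')

# an assignment placed before a command applies to that command only
one_shot=$(MYVAR=yopla bash -c 'echo "$MYVAR"')
still=$MYVAR          # the current shell keeps its own value

echo "before export: $in_child_before, after export: $in_child_after"
echo "one shot: $one_shot, current shell: $still"
```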

It is possible to customize the environment through the ~/.bashrc script which is run every time the user starts a bash shell. You can for example, add your own bin directory to the PATH variable or alias some commands you use very often:

# .bashrc
 
# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc # the . (or source command) is like an include or import (it executes the script as if it was typed in the current shell)
fi
 
# add my own bin directories
export PATH=$PATH:$HOME/.bin:/opt/bin
 
# User specific aliases and functions
 
alias l='ls -lh'
alias ssh='ssh -Y'
alias psl='ps faux | less'
alias top='htop'
 
export VISUAL=geany
export EDITOR=geany
export RUBYOPT=rubygems
 
# OPENMPI (for phyml)
export PATH=$PATH:/usr/lib64/openmpi/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/openmpi/lib


exit code of a process $?, test, if

Like the main function of a C program (int main), every process exits with an integer value. Usually, 0 means success and anything else means trouble. In the shell, this value is available in the special parameter $?.

$ date
Wed Mar  6 09:56:58 CET 2013
$ echo $?
0
 
$ date -unrecognized_option
date: invalid option -- 'n'
Try `date --help' for more information.
$ echo $?
1

This exit code can be used by the calling program to take appropriate actions.

First, let's focus on && and || boolean operators and the lazy evaluation of an expression:

  • true AND false AND .... will evaluate as false whatever comes after the 1st false encountered (thus, no need to evaluate what's left),
  • false OR false OR true OR ... will evaluate to true whatever comes after the 1st true encountered.

This lazy evaluation can be used to chain commands: with && between commands, commands will be executed until one fails, and with || between commands, they will be executed until one succeeds.

$ ls > /dev/null || echo "unable to ls" && date -unrecognized 2> /dev/null || echo "problem with date command"
problem with date command
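
The lazy evaluation can be made visible with a small helper that records which commands were actually run. The step function below is hypothetical and only serves the demonstration: it logs its name and succeeds when its second argument is ok.

```shell
#!/bin/bash
# step NAME RESULT: append NAME to the log, succeed only if RESULT is "ok"
log=""
step() { log="$log$1 "; [ "$2" = ok ]; }

# with &&, execution stops at the first failure: c is never run
step a ok && step b fail && step c ok
and_status=$?
and_log=$log

# with ||, execution stops at the first success: c is never run either
log=""
step a fail || step b ok || step c ok
or_status=$?
or_log=$log

echo "&& chain ran: $and_log(status $and_status)"
echo "|| chain ran: $or_log(status $or_status)"
```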

The above can be useful but is sometimes too limited for more elaborate tests. That's where the test program comes in.

$ which test
/usr/bin/test

This is a common source of confusion for beginners, who often name their first program test: when they invoke it, they actually run this one. Whatever change they make to their source code, the execution does nothing, no error, nothing... until they invoke ./test

The test program evaluates expressions:

  • integers and strings comparisons
  • file existence, type, permissions, date
# string length (-n non zero length?, -z zero length?)
$ test -n "youpi"
$ echo $?
0
$ test -n ""
$ echo $?
1
 
# string comparison
$ test $HOME = "/home/barriot" && echo $?
0
$ test $HOME != "/root" && echo $?
0

See the man page (man test) for other tests.

The exit code of any process (including test) can be used by an if statement:

$ date
Wed Mar  6 10:43:51 CET 2013
$ date +%H
10
$ current_hour=$(date +%H) # the output of date is stored in the current_hour shell variable
$ echo $current_hour 
10
$ if test $current_hour -ge 12; then echo "already 12"; else echo "patience..."; fi
patience...
$
# often seen shortcut: [ is another name for test (the closing ] is required)
$ if [ $current_hour -ge 12 ]; then echo "already 12"; else echo "patience..."; fi
patience...
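
The file tests mentioned above combine well with if and elif. Here is a sketch with a hypothetical describe function; -d, -f and -e are standard test operators (directory, regular file, exists).

```shell
#!/bin/bash
# describe PATH: print what kind of thing PATH is
describe() {
    if [ -d "$1" ]; then
        echo "directory"
    elif [ -f "$1" ]; then
        echo "regular file"
    elif [ ! -e "$1" ]; then
        echo "missing"
    else
        echo "other"
    fi
}

workdir=$(mktemp -d)
touch "$workdir/a.txt"

r_dir=$(describe "$workdir")
r_file=$(describe "$workdir/a.txt")
r_missing=$(describe "$workdir/nope")

echo "$r_dir / $r_file / $r_missing"
rm -r "$workdir"
```

Quoting "$1" keeps the tests correct when a path contains spaces.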

loops: for

for iterates through a list of values:

# from a given list
$ myList='1st 2nd last'
$ for i in $myList; do echo "Processing $i task"; done
Processing 1st task
Processing 2nd task
Processing last task
 
# combined with seq:
$ seq 2
1
2
$ seq 4 6
4
5
6
$ for i in $(seq 4 6); do echo "Processing task $i"; done
Processing task 4
Processing task 5
Processing task 6

Then one can elaborate, for example to apply sed to files containing a given expression:

# -l prints each matching file name only once; the second grep filters out the .svn directories
files=$(grep -Rl silico * | grep -v '\.svn')
for i in $files; do sed -i 's/silico.biotoul.fr/jamaica.ibcg.biotoul.fr/g' "$i"; done
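
When the files sit in a single directory, looping directly over a glob avoids problems with file names containing spaces. The sketch below runs on made-up files in a temporary directory (sed -i assumes GNU sed).

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'see silico.biotoul.fr\n' > 'page one.html'   # name with a space
printf 'see silico.biotoul.fr\n' > page2.html
printf 'nothing here\n'          > other.html

changed=0
for f in *.html; do
    # grep -q prints nothing: only its exit code is used by if
    if grep -q 'silico\.biotoul\.fr' "$f"; then
        sed -i 's/silico\.biotoul\.fr/jamaica.ibcg.biotoul.fr/g' "$f"
        changed=$((changed + 1))
    fi
done

# number of files still containing the old name
left=$(grep -l 'silico\.biotoul\.fr' *.html | wc -l)

echo "files changed: $changed, files left to change: $left"
cd / && rm -r "$workdir"
```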

Archive

gzip, bzip2

gzip filename will produce filename.gz

gunzip filename.gz will do the opposite.

bzip2 usually produces smaller files (but is slower):

bzip2 file
bunzip2 file.bz2

tar

tar is used to back up or restore directories and files (preserving permissions, owner, ...).

# tar some directory tree
$ tar cvf myproject.tar project
# c creates new archive
# v is for verbose (prints out what files are archived)
# f is for the name of the archive (must be followed by a filename.tar)
 
# same with gzipped archive
$ tar cvzf myproject.tar.gz project
 
 
# backup whole file system
cd /
tar cpjf system_backup.tar.bz2 \
    --exclude=/system_backup.tar.bz2 \
    --exclude=/lost+found \
    --exclude=/media \
    --exclude=/mnt \
    --exclude=/proc \
    /
# c is for create
# p is for preserving permissions
# j is for producing compressed (bzip2) archive
# f is for specifying the archive filename
 
 
# list content of archive
$ tar tvf archive.tar
$ tar tvzf archive.tar.gz
$ tar tvzf archive.tgz
$ tar tvjf archive.tar.bz2
 
# extract whole archive
$ tar xf archive.tar
$ tar xzf archive.tar.gz
$ tar xjpf archive.tar.bz2 # preserve permissions and owner
 
# extract only one file
$ tar xjpf system_backup.tar.bz2 etc/fstab
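
A full round trip (create, list, extract a single file) can be tried on a small made-up project tree. This is a sketch; the -C option, which tells tar in which directory to extract, is not used above.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p project/src
echo 'hello' > project/README
echo 'code'  > project/src/main.txt

# create a gzipped archive of the project directory
tar czf myproject.tar.gz project

# list the content: 2 directories + 2 files
n_entries=$(tar tzf myproject.tar.gz | wc -l)

# extract a single file into another directory
mkdir restore
tar xzf myproject.tar.gz -C restore project/README
restored=$(cat restore/project/README)
if [ -e restore/project/src/main.txt ]; then other_extracted=yes; else other_extracted=no; fi

echo "entries: $n_entries, restored: $restored, other files extracted: $other_extracted"
cd / && rm -r "$workdir"
```

The member name given for extraction has to be written exactly as it appears in the listing.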

rsync

rsync is a great program for backups. It transfers only what differs or is newer, and supports backups over the network (through ssh).

Useful options:

  • --archive
recursive copy, preserves symlinks, permissions, times, owner, group, devices/specials
  • --compress
compress data during the transfer (useful over ssh)
  • --itemize-changes
nice display of what's done
  • --stats
prints statistics on what happened
  • -h
human-readable numbers
  • -v
verbose output
  • --progress
show progress during the transfer
  • --dry-run
to perform a simulation
  • --delete
to remove from the destination what is no longer in the source
  • -c
skip based on checksum, not mod-time & size

examples:

  • simulate a backup
rsync --dry-run --archive --itemize-changes --stats -h src_dir dest_dir
  • backup a directory
rsync --archive --itemize-changes --stats -h src_dir dest_dir
  • backup and remove what's no more in source dir
rsync --delete --archive --itemize-changes --stats -h src_dir dest_dir
  • backup over ssh
rsync --archive --itemize-changes --stats -h -e ssh src_dir user@host:dest_dir

Network

nslookup, ping

All symbolic names are translated to IP addresses (IPv4 or IPv6). IPv4 addresses are of the form 10.0.0.1 while IPv6 addresses are longer (to allow more machines on the network).

When contacting a host, the system needs to find its IP address. This is name resolution and is provided by the DNS (Domain Name System).

$ nslookup silico.biotoul.fr
Server:		192.168.11.1
Address:	192.168.11.1#53
 
Name:	silico.biotoul.fr
Address: 193.48.191.15

To check whether a host is reachable (and thus powered on):

$ ping www.google.com
PING www.google.com (74.125.230.242) 56(84) bytes of data.
64 bytes from par08s10-in-f18.1e100.net (74.125.230.242): icmp_req=1 ttl=51 time=19.9 ms
64 bytes from par08s10-in-f18.1e100.net (74.125.230.242): icmp_req=2 ttl=51 time=21.6 ms

However, some hosts are configured not to answer to ping requests.

wget

Another useful command is wget. It retrieves files and websites from the internet (http, https, http proxy, ftp).

wget www.google.com
wget ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/SOFT/by_platform/GPL199/GPL166_family.soft.gz

See the man page for various usage (authenticated, whole site retrieval, follow links on a page, ...).

ssh, scp

ssh stands for secure shell and is used to connect to a host over an encrypted connection.

ssh host_or_ip
ssh user@host_or_ip

Sometimes, you will want to launch graphical programs, thus the X server connection must be forwarded (the X server is what displays graphics on a linux system):

ssh -X host

or, for trusted X11 forwarding (sometimes needed when -X is too restrictive for an application):

ssh -Y host

scp copies files from or to a remote server:

scp mylocalfile user@host:path/newname

With -r, a whole directory can be copied:

scp -r user@host:/home/barriot ./barriot_copy

Job scheduling: cron, at

Programs can be scheduled to run once (with the at command) or periodically (with the cron system).

cron runs commands periodically:

# list the cron jobs of the current user
$ crontab -l
no crontab for barriot
 
[root@fidji ~]# crontab -l
0 0 * * * /root/backup_scripts/iroise_db_backup daily 
0 8 * * 6 /root/backup_scripts/iroise_db_backup weekly 
0 2 1 * * /root/backup_scripts/iroise_db_backup monthly

crontab syntax (from wikipedia):

*    *    *    *    *  command to be executed
┬    ┬    ┬    ┬    ┬
│    │    │    │    │
│    │    │    │    │
│    │    │    │    └───── day of week (0 - 7) (0 or 7 are Sunday, or use names)
│    │    │    └────────── month (1 - 12)
│    │    └─────────────── day of month (1 - 31)
│    └──────────────────── hour (0 - 23)
└───────────────────────── min (0 - 59)

To modify the crontab:

crontab -e

This launches the default text editor (set by the VISUAL or EDITOR variable, often vi) to edit the schedules. Save the modifications and exit the editor to install the new schedule.

Other things

package managers: yum, synaptic, apt

  • updates:
# yum list updates
Loaded plugins: langpacks, presto, refresh-packagekit
Updated Packages
google-chrome-stable.x86_64                                                      25.0.1364.152-185281                                                      google-chrome
libpinyin.x86_64                                                                 0.8.1-1.fc17                                                              updates      
libpinyin-data.x86_64                                                            0.8.1-1.fc17                                                              updates      
phpMyAdmin.noarch                                                                3.5.7-1.fc17                                                              updates      
ruby-libs.x86_64                                                                 1.9.3.392-29.fc17                                                         updates      
systemd.i686                                                                     44-24.fc17                                                                updates      
systemd.x86_64                                                                   44-24.fc17                                                                updates      
systemd-analyze.x86_64                                                           44-24.fc17                                                                updates      
systemd-sysv.x86_64                                                              44-24.fc17                                                                updates      
# yum update
  • search
# yum search nvidia
Loaded plugins: langpacks, presto, refresh-packagekit
========================================================================= N/S Matched: nvidia ==========================================================================
akmod-nvidia.x86_64 : Akmod package for nvidia kernel module(s)
akmod-nvidia-173xx.x86_64 : Akmod package for nvidia-173xx kernel module(s)
akmod-nvidia-96xx.x86_64 : Akmod package for nvidia-96xx kernel module(s)
kmod-nvidia.x86_64 : Metapackage which tracks in nvidia kernel module for newest kernel
...


  • search a particular file:
# yum provides */libglx.so
Loaded plugins: langpacks, presto, refresh-packagekit
google-chrome/filelists                                                                                                                          | 1.1 kB     00:00     
updates/filelists_db                                                                                                                             |  14 MB     00:02     
xorg-x11-drv-catalyst-12.10-1.fc17.x86_64 : AMD's proprietary driver for ATI graphic cards
Repo        : rpmfusion-nonfree-updates
Matched from:
Filename    : /usr/lib64/xorg/modules/extensions/catalyst/libglx.so



xorg-x11-drv-catalyst-legacy-12.6-3.fc17.x86_64 : AMD's proprietary driver for ATI legacy graphic cards
Repo        : rpmfusion-nonfree-updates
Matched from:
Filename    : /usr/lib64/xorg/modules/extensions/catalyst-legacy/libglx.so 


  • list (installed or available)
# yum list htop
Loaded plugins: langpacks, presto, refresh-packagekit
Installed Packages
htop.x86_64                                                                    1.0.2-1.fc17                                                                     @updates
  • install
yum install kernel.x86_64
  • remove
yum remove xorg-x11-drv-nvidia

diff & diffuse

diff displays the differences between 2 files:

$ diff ~/Documents/Dev/perllibs/DBConnection.pm /software/perllibs/DBConnection.pm | more
2c2
< # Version: $Id: DBConnection.pm 34 2011-03-01 09:46:00Z gsi $
---
> # Version: $Id: DBConnection.pm 45 2012-10-10 10:28:48Z gsi $
14c14
<  # CONNECTION
---
>  # CONNECT
17a18,20
>  # DISCONNECT
>  $db->close;
> 
46a50,55
>  ###########
>  # BONUSES #
>  ###########
> 
...

Or side by side:

$ diff -y ~/Documents/Dev/perllibs/DBConnection.pm /software/perllibs/DBConnection.pm | more
package DBConnection;						package DBConnection;
# Version: $Id: DBConnection.pm 34 2011-03-01 09:46:00Z gsi $ |	# Version: $Id: DBConnection.pm 45 2012-10-10 10:28:48Z gsi $

=head1 NAME							=head1 NAME

DBConnection - a helper/wrapper for MySQL (DBI) database conn	DBConnection - a helper/wrapper for MySQL (DBI) database conn

=head1 SYNOPSYS							=head1 SYNOPSYS

 # USE: installed in /software/perllibs				 # USE: installed in /software/perllibs
 use lib '/software/perllibs';					 use lib '/software/perllibs';
 use DBConnection;						 use DBConnection;

 # CONNECTION						      |	 # CONNECT
 my $db = DBConnection->new(host=>'localhost', db=>'cgdb', us	 my $db = DBConnection->new(host=>'localhost', db=>'cgdb', us
 $db->init || die 'Cannot connect to database';			 $db->init || die 'Cannot connect to database';

							      >	 # DISCONNECT
							      >	 $db->close;
							      >
 # SINGLE SELECT						 # SINGLE SELECT

The diffuse program offers a graphical interface and makes it easier to merge files.

TO ADD

  • init, runlevel, telinit, systemd, systemctl, service, chkconfig
  • X
  • kernel, kmod, akmod
  • bootloader, grub, mkinit
  • git, svn
  • ldd
  • chroot
  • du, df
  • script + parameters, arrays, functions
  • autojump

shebang (#!): make an R script executable

Place an executable runRscript file in your home directory, e.g. /home/user/.bin/runRscript:

#!/bin/bash
Rscript --vanilla "$@"

A sample script (executable also):

#! /bin/bash /home/user/.bin/runRscript
 
# GET THE COMMAND LINE PARAMETERS
params=commandArgs(TRUE)
 
# USAGE
if (length(params)==0) {
	print('Usage:')
	print('   sample_script.R Hello world'); 
	q();
}
 
hello=params[1]
world=params[2]
 
print(hello)
print(world)

Then run as follows:

$ ./sample_script.R Hello bye


Exercises

  • How many sequences are there in the following file? SaurH.proteome.2011.fasta
  • How many residues?
  • Remove SaurH01 from the sequence IDs
  • Replace SaurH01 in the sequence IDs by StaphAureusNCTC8325
  • Get all the sequence IDs (without SaurH01)
  • Get all the sequence IDs sorted alphabetically
  • Is the process httpd running? if yes, then how many? X?
  • In one line, display YES if the process is running and NO otherwise
  • What is the value of the log_errors variable in /etc/php.ini? display only the value