The fun starts here

What can we do with all these (and tons of other) tools?

  • Record their output
  • Get the error status
  • Knit them together to build “pipelines”
  • Running them over batches of files

Input/Ouput/Error streams

POSIX* has 3 standard data streams:

  • 0 STDIN: Standard Input, can send data to a program
  • 1 STDOUT: Standard Output, output data from a program
  • 2 STDERR: Standard Error, errors/warnings from a program

*Portable Operating System Interface: API in Unix

Redirecting Standard Streams

Streams can be redirected using arrows

  • > filename Redirects STDOUT (>> appends)
  • < filename Redirects STDIN
  • 2> filename Redirects STDERR
  • 2>&1 Sends STDERR to STDOUT results in one output stream
  • > /dev/null Sends output to the data Nirvana
$ ls -1 *.fastq.gz > fastq_list.txt
$ cat fastq_list.txt
sample_1.fastq.gz
sample_2.fastq.gz
[...]

$  program sample_1.fastq.gz > result.txt 2> error.txt
$  program sample_1.fastq.gz > result.txt 2>&1

Pipes

This is the Unix philosophy:

  • Write programs that do one thing and do it well
  • Write programs to work together
  • Write programs to handle text streams, because that is a universal interface.

$ cat file.txt | grep -v "^$" | sort k2,2n

$ grep ">" seq.fasta | tr -d ">" > out.txt

$ cat seq.fasta | rev | tr "ACGT" "TGCA" > rev_comp.fasta

Named pipes

Named pipes also known as FIFO:

  • First In, First Out principle
  • extension for traditional pipe concept
  • in contrast to unnamed pipes they use the filesystem
  • mkfifo creates the FIFO
  • do not consume disk space

Two separate processes can access the pipe by name one process can open it as a reader, and the other as a writer.

Named pipes example

$ mkfifo R1
$ mkfifo R2
$ yara_mapper -e 3 -t 4 -f bam y_idx reads_R1.fq reads_R2.fq | \
    samtools view -@ 4 -h -F 4 -b1 | \
    tee R1 R2 > /dev/null &
    samtools view -@ 2 -h -f 0x40 -b1 R1 > mapped_1.bam &
    samtools view -@ 2 -h -f 0x80 -b1 R2 > mapped_2.bam &
wait
rm -f R1 R2

tee reads standard input and writes it to both standard output and one or more files, effectively duplicating its input

System / Process monitoring

  • ps
    • Show all processes:
      ps -e
    • Show all processes for $USER in full format as tree:
      ps -efxu
    • Show all processes for all users in full format as tree:
      ps -afxu
  • Continuous monitoring:
    • top, atop, htop, or glances
  • Disk usage:
    • du, du -h -d <N>, du -sh
    • spacereporter (only on CePH)

Process management

  • Abort the current process: <CTRL>+<c>
  • Stop the current process: <CTRL>+<z>
  • Send job to the background
    bg [jobspec]
  • Bring job to the foreground and make it current
    fg [jobspec]
  • List jobs in current background
    jobs
  • Run job in background:
    samtools sort -o sorted.bam unsorted.bam > sort.log 2>&1 &
    or if you want the job to continue even if you close the shell
    nohup samtools sort -o sorted.bam unsorted.bam > sort.log 2>&1 &
    use disown <PID> if you forgot nohup

Process management

  • Send a signal to a process kill [-signal] <PID>
    • e.g. -TERM, -KILL, -STOP, -CONT
  • Use the k key in top or atop to send a signal to a selected process
  • Use <F9> to send signal to selected processes in htop
  • Kill all processes by name
    • killall <processname>
    • killall -i <processname>

Note:
A user may only kill processes the s*he owns, only root can kill all processes

Loops

for value in {5,10,20,50}
  do
    run_simulation --iterations=$value > ${value}_iterations.log 2>&1
  done

for value in {10..100}
  do
    run_simulation --iterations=$value > ${value}_iterations.log 2>&1
  done

for file in *txt
  do
    echo $file
    grep .sam $file | wc -l
  done

If you need to use loops, consider switching straight to nextflow instead (later lecture!)

tmux

tmux is a terminal multiplexer

  • You can start a tmux session
    • open multiple windows inside a session
    • split into rectangular panes.
  • With tmux it is easy to switch between multiple programs in one terminal
  • sessions may be detached and reattached to a different terminal
  • tmux sessions are persistent

All commands in Tmux start with a prefix, which by default is <CTRL>+b.

tmux

Basic tmux commands

$ tmux new -s <session_name>

<CTRL>+b c Create a new window (with shell)
<CTRL>+b w Choose window from a list
<CTRL>+b , Rename the current window
<CTRL>+b % Split current pane horizontally into two panes
<CTRL>+b " Split current pane vertically into two panes   <CTRL>+b o Go to the next pane
<CTRL>+b ; Toggle between the current and previous pane
<CTRL>+b x Close the current pane

List and attach tmux sessions

$ tmux ls
$ tmux a -t <session_name>

Setting up passwordless authentication for ssh

  • Generate a key pair:
    • private key: stays on your computer
    • public key: put it on the server

Using the public key the server can check that a request comes from your computer.

# generate a key pair
ssh-keygen

# copy the key pair on the server
ssh-copy-id user@zeus

Other resources