Housekeeping

Starting next week, we will begin with student presentations. You can present one of the following:

  • a Bioinformatics method
  • a cool R or Python package
  • your own code (we will review it together)

We will circulate a Google Sheet where you can enter your topic. If you don’t choose a topic, we will assign one.

Command line completion

Interacting with the OS on the command line involves a lot of typing:

  • long paths to input files
  • program arguments

Typing is cumbersome and prone to errors

  • Bash offers auto completion of:
    • paths
    • program names
    • program options
  • Command line completion is achieved by typing a partial path/program and then pressing the TAB key (to the left of Q)

Accessing history / edit commands

  • Up and down arrow keys: step through previous commands
  • Ctrl + R: search your history

To move to the beginning of the current line, use [Ctrl][A]
To move to the end of the current line, use [Ctrl][E]
To move forward one word on the current line, use [Alt][F]
To move backwards one word on the current line, use [Alt][B]

Key commands can do more than move the cursor: they can also manipulate text on the current line.

To clear the characters on the line before the current cursor position, use [Ctrl][U]
To clear the characters on the line after the current cursor position, use [Ctrl][K]

The Unix filesystem

The Unix file system is a hierarchical tree of directories.

  • The filesystem root is: /
  • There are no drive letters as in Windows
  • Storage devices may be mounted at any point in the filesystem tree
  • File extensions are not required, but they are helpful (.fastq, .tar.gz)

Addressing files in the filesystem

Addressing files by their full or absolute path

/home/gringo/Documents/document1.txt
/home/gringo/scripts/script.sh

Addressing files by relative paths

$ cd /home/gringo/scripts
$ ls -l ../Documents/document1.txt
is equivalent to
/home/gringo/Documents/document1.txt


$ ls -l ../../ (?)
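
A minimal sketch of the two addressing styles, assuming Bash and a throwaway directory tree created with mktemp. The variable names base and abs_docs are made up for this illustration; the gringo layout is rebuilt under the temporary directory so nothing on the real system is touched.

```shell
# Build a throwaway copy of the example layout (base is a hypothetical name)
base=$(mktemp -d)
mkdir -p "$base/home/gringo/Documents" "$base/home/gringo/scripts"
touch "$base/home/gringo/Documents/document1.txt"

# Start in the scripts directory, as in the example above
cd "$base/home/gringo/scripts"

# Turn the relative directory path into an absolute one
abs_docs=$(cd ../Documents && pwd)
echo "$abs_docs/document1.txt"

# Both spellings address the same file
ls -l ../Documents/document1.txt
ls -l "$abs_docs/document1.txt"
```

A relative path is always resolved against the current directory, so the first ls only works while we are in scripts; the second works from anywhere.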

Some shortcuts

  • ~ : tilde - the current user’s home directory
    $ cd ~/scripts
  • . : single dot - the current directory
    $ ./script.sh
  • .. : double dot - one directory above the current directory
    $ cd ../Documents
  • cd - : change to the previous directory
  • pwd : print the full path of the current directory
  • pushd : change directory and remember it on the directory stack
    $ pushd .
  • popd : return to the last remembered directory on the stack
  • dirs : show the directory stack
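
The directory shortcuts can be tried out safely in a temporary directory. This is a sketch that assumes Bash (pushd, popd and dirs are Bash builtins); the names start, after_popd and after_cd_dash are made up for illustration.

```shell
# Assumes bash; all work happens in a throwaway directory
start=$(mktemp -d)
mkdir -p "$start/scripts" "$start/Documents"
cd "$start/scripts"

# Go to Documents; scripts stays on the stack underneath
pushd "$start/Documents" > /dev/null
pwd
dirs -p          # print the stack, one entry per line

# Drop the top entry and return to scripts
popd > /dev/null
after_popd=$PWD

# cd - jumps back to wherever we were before the last cd
cd "$start/Documents"
cd - > /dev/null
after_cd_dash=$PWD

echo "$after_popd"
echo "$after_cd_dash"
```

cd - only remembers one previous directory, while the pushd/popd stack can hold several.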

Wildcards

  • Wildcards are provided by the shell
  • They allow quick addressing of multiple files in a single operation
  • Different types of wildcards:
    • * … any number of characters (including none)
    • ? … any single character
    • {<a>,<b>,<..>} … choice of a, b, …

Wildcard examples

  $ ls -1 data/s*.fastq.gz
  data/sample_a.fastq.gz
  data/sample_ab.fastq.gz
  data/sample_b.fastq.gz
  data/sub_sample_x.fastq.gz
  $ ls data/sample_??.fastq.gz
  data/sample_ab.fastq.gz
  $ ls data/s*_{a,x}.fastq.gz
  data/sample_a.fastq.gz data/sub_sample_x.fastq.gz
  $ ls data/reads_R{1,2}.fastq.gz
  data/reads_R1.fastq.gz data/reads_R2.fastq.gz
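
The examples above can be reproduced with empty files. This is a sketch assuming Bash with default settings; workdir and the other variable names are made up for illustration. It also shows one difference worth knowing: * and ? are matched against existing files, while braces simply generate names.

```shell
# Assumes bash (brace expansion is not available in every shell)
workdir=$(mktemp -d)
cd "$workdir"
mkdir data
touch data/sample_a.fastq.gz data/sample_ab.fastq.gz \
      data/sample_b.fastq.gz data/sub_sample_x.fastq.gz

# echo prints the argument list after the shell has expanded it
echo data/s*.fastq.gz

# Count the matches by loading them into the positional parameters
set -- data/s*.fastq.gz
n_star=$#

# ? matches exactly one character, so only sample_ab fits sample_??
two_chars=$(echo data/sample_??.fastq.gz)

# Braces generate names even if no such file exists (no reads_R* files here)
braces=$(echo data/reads_R{1,2}.fastq.gz)

# A * pattern without any match is passed on unchanged by default
no_match=$(echo data/nothing_*.bam)

echo "$n_star" "$two_chars" "$braces" "$no_match"
```

Running echo with a pattern first is a cheap way to check what a command such as rm would receive.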

Soft links / Hard links
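
A minimal sketch of the difference, assuming Bash and a temporary directory; the file names original.txt, hard.txt and soft.txt are made up for illustration. A hard link is a second name for the same data, whereas a soft (symbolic) link stores a path to another name.

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo "ACGT" > original.txt

ln original.txt hard.txt      # hard link: another name for the same inode
ln -s original.txt soft.txt   # soft link: a small file that stores a path

# -i prints the inode number; original.txt and hard.txt share one
ls -li

# Remove the original name and see which link still works
rm original.txt
hard_content=$(cat hard.txt)
if [ -e soft.txt ]; then soft_state=ok; else soft_state=dangling; fi

echo "$hard_content" "$soft_state"
```

After the original name is removed, hard.txt still holds the data, while soft.txt points to a name that no longer exists.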

Some basic file commands

  • cd, pwd, mkdir, rmdir
  • touch, rm, ls, mv
  • tar xvf, tar cvf, tar xzvf, tar czvf
  • gzip, gunzip, pigz, bgzip
  • cp, ln, ln -s
  • rsync
  • tree
  • which, find, locate
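
A short round trip through some of these commands, as a sketch assuming Bash and a tar that supports gzip via the z flag (both GNU and BSD tar do). The directory and file names are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p project/data
echo "hello" > project/data/notes.txt
touch project/run.sh

# c = create, z = gzip-compress, v = verbose, f = archive file name
tar czvf project.tar.gz project

# x = extract; -C chooses the directory to unpack into
mkdir restored
tar xzvf project.tar.gz -C restored

restored_text=$(cat restored/project/data/notes.txt)

# find searches a directory tree, here for names ending in .txt
found=$(find restored -name '*.txt')

echo "$restored_text" "$found"
```

The pattern given to find is quoted so that find receives it unexpanded instead of the shell expanding it first.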

Viewing files

  • cat, zcat : show the entire content of a file (zcat for gzip-compressed files)
  • tac : shows the entire content in reverse line order
  • less : scroll through a file
  • head : shows the first lines of a file (10 by default, or -<x>)
    • head file.txt or head -2 file.txt
  • tail : shows the last lines of a file (10 by default, or -<x>); -n +<x> starts at line <x>
    • tail file.txt or tail -2 file.txt or tail -n +2 file.txt
  • sort : shows the file sorted; options select sort type, key(s) and direction
    • sort file.txt or sort -k1,1 -k3,3n file.txt
  • diff : shows differences between files
  • wc : counts lines, words, and characters (bytes by default)
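
The viewing commands combine well in pipelines. The sketch below assumes Bash and a small tab-separated table invented for this illustration (table.tsv with a header line and three columns).

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'name\tchrom\tscore\ngeneB\tchr2\t10\ngeneA\tchr1\t30\ngeneA\tchr1\t7\ngeneC\tchr1\t2\n' > table.tsv

# First line only (the header)
header=$(head -n 1 table.tsv)

# Number of lines; tr strips the padding some wc versions add
n_lines=$(wc -l < table.tsv | tr -d ' ')

# Skip the header, then sort by column 1 (text) and column 3 (numeric)
tail -n +2 table.tsv | sort -k1,1 -k3,3n
first_sorted=$(tail -n +2 table.tsv | sort -k1,1 -k3,3n | head -n 1)

# Last line of the unsorted file
last=$(tail -n 1 table.tsv)

echo "$header"; echo "$n_lines"; echo "$first_sorted"; echo "$last"
```

Without the n in -k3,3n the scores would be compared as text, which would place 30 before 7.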

Filtering files

  • cut : shows specified columns
    • cut -f2,5 file.txt or cut -d"," -f3-6 file.txt
  • grep : filters lines by a pattern (regular expression)
    • grep Foo file.txt
    • grep '^>' sequences.fasta or grep -v '^>' sequences.fasta
    • grep -v '^$' file.txt
  • uniq : shows unique or duplicate lines only (input must be sorted)
    • uniq file.txt
    • uniq -d file.txt
    • sort -u file.txt
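
A sketch of the filtering commands on a tiny FASTA file, assuming Bash; the file content and the variable names are invented for this illustration. It also shows why uniq needs sorted input: it only compares neighbouring lines.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '>seq1\nACGT\n>seq2\nGGCC\n\n>seq3\nACGT\n' > sequences.fasta

# Count header lines, i.e. the number of records
n_records=$(grep -c '^>' sequences.fasta)

# Keep sequence lines only: drop headers, then drop empty lines
grep -v '^>' sequences.fasta | grep -v '^$' > bases.txt

# uniq on unsorted input misses the repeated ACGT (lines are not adjacent)
unsorted=$(uniq bases.txt | wc -l | tr -d ' ')
sorted_unique=$(sort bases.txt | uniq | wc -l | tr -d ' ')

# Lines that occur more than once
dups=$(sort bases.txt | uniq -d)

# Columns 3 to 6 of a comma-separated line
cols=$(printf 'a,b,c,d,e,f\n' | cut -d"," -f3-6)

echo "$n_records" "$unsorted" "$sorted_unique" "$dups" "$cols"
```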

Editing files or streams

  • sed : stream editor
    • sed -i -r 's/> />/g' seq.fasta (removes the space after ">" in FASTA headers)
  • tr : translate characters
    • tr "ATGC" "TACG" (complements a DNA sequence)
  • rev : reverse the characters of each line
  • paste : merge lines of files
  • awk : powerful pattern scanning and processing language
  • shuf : write lines of a file in random order
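
A few of these tools applied to streams, as a sketch assuming Bash and that rev is installed (it ships with util-linux on most Linux systems). The input strings and file names are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Reverse complement: complement each base with tr, then reverse with rev
revcomp=$(echo "ATGCCG" | tr "ATGC" "TACG" | rev)

# Remove the space after ">" at the start of a FASTA header
header=$(echo "> seq1 human" | sed 's/^> />/')

# Sum the second column with awk
total=$(printf 'geneA\t7\ngeneB\t10\n' | awk '{ sum += $2 } END { print sum }')

# paste joins corresponding lines of two files with a tab
printf 'a\nb\n' > left.txt
printf '1\n2\n' > right.txt
pasted=$(paste left.txt right.txt | head -n 1)

echo "$revcomp"; echo "$header"; echo "$total"; echo "$pasted"
```

Testing a sed expression on a stream like this before adding -i avoids editing a file in place with a wrong pattern.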