What can we do with all these (and tons of other) tools?
- Record their output
- Get the error status
- Knit them together to build “pipelines”
- Run them over batches of files
POSIX* has 3 standard data streams:
- STDIN: Standard Input, can send data to a program
- STDOUT: Standard Output, output data from a program
- STDERR: Standard Error, errors/warnings from a program
*Portable Operating System Interface: API in Unix
Streams can be redirected using arrows
> filename     Redirects STDOUT (>> appends)
< filename     Redirects STDIN
2> filename    Redirects STDERR
2>&1           Sends STDERR to STDOUT, resulting in one output stream
> /dev/null    Sends output to the data Nirvana
$ ls -1 *.fastq.gz > fastq_list.txt
$ cat fastq_list.txt
sample_1.fastq.gz
sample_2.fastq.gz
[...]
$ program sample_1.fastq.gz > result.txt 2> error.txt
$ program sample_1.fastq.gz > result.txt 2>&1
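The redirections above can be tried without any real analysis tool. The following is a minimal sketch, assuming a hypothetical shell function named program that stands in for a real tool and writes one line to STDOUT and one line to STDERR; the file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a real tool: one line to STDOUT, one to STDERR.
program() {
    echo "result for $1"
    echo "warning: $1 looks small" >&2
}

workdir=$(mktemp -d)

# STDOUT and STDERR go to separate files.
program sample_1.fastq.gz > "$workdir/result.txt" 2> "$workdir/error.txt"

# 2>&1 is written after the file redirection, so both streams end up in the file.
program sample_1.fastq.gz > "$workdir/combined.txt" 2>&1

# >> appends to result.txt instead of overwriting it; STDERR is discarded.
program sample_2.fastq.gz >> "$workdir/result.txt" 2> /dev/null

# Show how many lines each file received.
wc -l "$workdir"/*.txt
```

The order matters: writing 2>&1 before the file redirection would point STDERR at the terminal, not at the file.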
This is the Unix philosophy:
$ cat file.txt | grep -v "^$" | sort -k2,2n
$ grep ">" seq.fasta | tr -d ">" > out.txt
$ cat seq.fasta | rev | tr "ACGT" "TGCA" > rev_comp.fasta
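When commands are knitted together like this, the error status needs some care: by default the status of a pipeline is the status of its last command only. The following is a minimal sketch in bash (PIPESTATUS and the pipefail option are shell features, not part of the tools themselves); the input file and its contents are made up for illustration.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf 'b 2\n\na 10\nc 1\n' > "$workdir/file.txt"

# Drop empty lines, then sort numerically on column 2.
grep -v '^$' "$workdir/file.txt" | sort -k2,2n > "$workdir/sorted.txt"
echo "exit status of the pipeline: $?"

# grep fails on a missing file, but sort succeeds, so the pipeline reports success.
grep -v '^$' "$workdir/missing.txt" 2> /dev/null | sort -k2,2n > /dev/null
statuses=("${PIPESTATUS[@]}")   # one exit status per command in the pipe
echo "per-command statuses: ${statuses[*]}"

# With pipefail the pipeline reports the last non-zero status instead.
set -o pipefail
pipefail_status=0
grep -v '^$' "$workdir/missing.txt" 2> /dev/null | sort -k2,2n > /dev/null || pipefail_status=$?
set +o pipefail
echo "status with pipefail: $pipefail_status"
```

PIPESTATUS has to be copied immediately after the pipeline, because the next command overwrites it.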
Named pipes, also known as FIFOs:
- mkfifo creates the FIFO
- Two separate processes can access the pipe by name: one process can open it as a reader, and the other as a writer.
$ mkfifo R1
$ mkfifo R2
$ yara_mapper -e 3 -t 4 -f bam y_idx reads_R1.fq reads_R2.fq | \
samtools view -@ 4 -h -F 4 -b1 | \
tee R1 R2 > /dev/null &
samtools view -@ 2 -h -f 0x40 -b1 R1 > mapped_1.bam &
samtools view -@ 2 -h -f 0x80 -b1 R2 > mapped_2.bam &
wait
rm -f R1 R2
tee reads standard input and writes it to both standard output and one or more files, effectively duplicating its input.
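The same pattern can be tried on a small scale. The following is a minimal sketch, assuming only standard tools (seq, grep, tee) in place of the mapper and samtools; the FIFO and file names are made up. One stream is duplicated into two FIFOs, and two readers filter it in parallel.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
mkfifo "$workdir/odd_pipe" "$workdir/even_pipe"

# Readers: each one blocks until a writer opens its FIFO.
grep '[13579]$' "$workdir/odd_pipe"  > "$workdir/odd.txt" &
grep '[02468]$' "$workdir/even_pipe" > "$workdir/even.txt" &

# Writer: tee copies the numbers 1 to 10 into both FIFOs.
seq 1 10 | tee "$workdir/odd_pipe" "$workdir/even_pipe" > /dev/null

# Wait for both readers to finish, then remove the FIFOs.
wait
rm -f "$workdir/odd_pipe" "$workdir/even_pipe"

cat "$workdir/odd.txt" "$workdir/even.txt"
```

Both readers must be running before or while the writer runs; a FIFO with a writer but no reader (or the reverse) blocks until the other side opens it.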
- ps
- ps -e: all processes
- $USER in full format as tree: ps -efxu, ps -afxu
- top, atop, htop, or glances: interactive process monitors
- du, du -h -d <N>, du -sh: disk usage
- spacereporter (only on Ceph)
- <CTRL>+<c>: interrupt the foreground process
- <CTRL>+<z>: suspend the foreground process
- bg [jobspec]: resume a suspended job in the background
- fg [jobspec]: bring a job to the foreground
- jobs: list the jobs of the current shell
- samtools sort -o sorted.bam unsorted.bam > sort.log 2>&1 &
- nohup samtools sort -o sorted.bam unsorted.bam > sort.log 2>&1 &
- disown <PID> if you forgot nohup
- kill [-signal] <PID>
- Signals: -TERM, -KILL, -STOP, -CONT
- k key in top or atop to send a signal to a selected process
- <F9> to send a signal to selected processes in htop
- killall <processname>
- killall -i <processname>
Note: A user may only kill processes they own; only root can kill any process.
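The background and signal commands above can be combined in a script. The following is a minimal sketch in bash, using sleep as a stand-in for a long-running job; the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Start a long-running stand-in job in the background; $! holds its PID.
sleep 300 &
pid=$!

# kill -0 sends no signal; it only checks that the process exists.
if kill -0 "$pid" 2> /dev/null; then
    echo "job $pid is running"
fi

# Ask the job to terminate (TERM is also the default signal of kill).
kill -TERM "$pid"

# wait returns the job's exit status: 128 + signal number (15 for TERM).
job_status=0
wait "$pid" || job_status=$?
echo "job exited with status $job_status"
```

TERM lets a program clean up before exiting; KILL cannot be caught and is best kept as a last resort.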
for value in {5,10,20,50}
do
run_simulation --iterations=$value > ${value}_iterations.log 2>&1
done
for value in {10..100}
do
run_simulation --iterations=$value > ${value}_iterations.log 2>&1
done
for file in *txt
do
echo "$file"
# count lines containing ".sam"; the dot is escaped so it matches literally
grep '\.sam' "$file" | wc -l
done
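Loops over files tend to break on file names with spaces or when no file matches the pattern. The following is a minimal sketch in bash of a more defensive version of the last loop, assuming made-up input files and a made-up summary.log output; nullglob is a bash-specific option.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf 'a.sam\nb.bam\nc.sam\n' > "$workdir/run 1.txt"
printf 'd.bam\n' > "$workdir/run_2.txt"

# nullglob makes the loop run zero times if no file matches the pattern.
shopt -s nullglob
for file in "$workdir"/*.txt
do
    # Quoting "$file" keeps names with spaces intact.
    # grep -c exits with status 1 when the count is 0, hence the "|| true".
    count=$(grep -c '\.sam$' "$file" || true)
    # Strip the directory and the .txt suffix to get a short name.
    name=$(basename "$file" .txt)
    echo "$name: $count" >> "$workdir/summary.log"
done
shopt -u nullglob

cat "$workdir/summary.log"
```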
If you need to use loops, consider switching straight to Nextflow instead (later lecture!)
tmux is a terminal multiplexer
tmux session
- With tmux it is easy to switch between multiple programs in one terminal
- tmux sessions are persistent
- All commands in tmux start with a prefix, which by default is <CTRL>+b.
Basic tmux commands
$ tmux new -s <session_name>
<CTRL>+b c    Create a new window (with shell)
<CTRL>+b w    Choose window from a list
<CTRL>+b ,    Rename the current window
<CTRL>+b %    Split current pane horizontally into two panes (left and right)
<CTRL>+b "    Split current pane vertically into two panes (top and bottom)
<CTRL>+b o    Go to the next pane
<CTRL>+b ;    Toggle between the current and previous pane
<CTRL>+b x    Close the current pane
List and attach tmux sessions
$ tmux ls
$ tmux a -t <session_name>
Using the public key, the server can check that a request comes from your computer, which holds the matching private key.
# generate a key pair
ssh-keygen
# copy the public key to the server
ssh-copy-id user@zeus