Installation¶

Code in this repository is provided under an MIT license. This documentation is provided under a CC-BY-4.0 license.

Visit our lab website here. Contact Benjamin Hogan at ben.hogan@petermac.org.

Install¶

Apptainer¶

First, install apptainer.

Then pull the provided image. The two commands below are equivalent; use whichever runtime (apptainer or singularity) is installed.

apptainer pull docker://tyronechen/emumadz:latest
singularity pull docker://tyronechen/emumadz:latest
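If you are unsure which runtime is available on your system, a small helper can detect it before pulling. This is a minimal sketch: `pick_runtime` is a hypothetical helper name that is not part of this repository, and the pull itself is left commented out so that pasting the block does not start a download.

```shell
# Print the first container runtime found on PATH (apptainer preferred).
# pick_runtime is a hypothetical helper, not part of the repository.
pick_runtime() {
  for candidate in apptainer singularity; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  echo "neither apptainer nor singularity found on PATH" >&2
  return 1
}

# Example (commented out on purpose):
# runtime=$(pick_runtime) && "$runtime" pull docker://tyronechen/emumadz:latest
```

The helper only checks that an executable exists; it does not check the runtime version.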

Conda¶

First, install conda, mamba or micromamba. You can find install instructions at https://mamba.readthedocs.io/en/latest/.

Install the provided conda environment.

to be written.

Manual¶

Base software¶

The critical packages and their version numbers are listed below for reference:

bcftools==1.19-gcc-13.2.0
gatk==4.5.0.0-gcc-13.2.0
pandas==2.3.0
python==3.13.3
pysam==0.16.0
samtools==1.19.2-gcc-13.2.0
snpeff==5.2
ensembl-vep==109.3
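After a manual install, it can help to confirm that the command-line tools are reachable before going further. The sketch below only checks that each executable is on your PATH and does not compare version numbers. `report_tool` is a hypothetical helper, and the executable names (for example `snpEff` and `vep`) are assumptions that may differ on your system. The Python packages pandas and pysam are not covered.

```shell
# Report whether a tool is on PATH; this does not check the version.
# report_tool is a hypothetical helper, not part of the repository.
report_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found"
  fi
}

# Executable names are assumptions; adjust them to match your install.
for tool in bcftools gatk python samtools snpEff vep; do
  report_tool "$tool"
done
```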

Variant annotators¶

Caution

Make sure to install the genome assembly that matches your data so that variant coordinates are correct. Matching the exact release number is a lower priority, since it only affects the genome annotations.

VEP¶

For a manual install, you can follow the install instructions on the VEP GitHub repository, or use the conda / docker environments provided.

Note

The conda environment is maintained by a third party unaffiliated with the authors. If that does not work, try building from source or using the official container instead.

conda install bioconda::ensembl-vep

apptainer pull --name vep.sif docker://ensemblorg/ensembl-vep
singularity pull --name vep.sif docker://ensemblorg/ensembl-vep

Manually install the following libraries if you are running their INSTALL.pl script:

perl==5.32.1
perl-dbi==1.643
perl-archive-zip==1.68
perl-dbd-mysql==4.050
perl-set-intervaltree==0.12
perl-json==4.10
perl-perlio-gzip==0.20
perl-bio-bigfile==1.07
perl-list-moreutils==0.430
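To see which of these libraries are already available to your Perl interpreter, you can try loading each one. This is a minimal sketch: `check_perl_module` is a hypothetical helper, and the mapping from the package names above to Perl module names (for example perl-bio-bigfile to `Bio::DB::BigFile`) is an assumption. It does not check version numbers.

```shell
# Try to load one Perl module; prints "<module>: ok" or "<module>: missing".
# check_perl_module is a hypothetical helper, not part of VEP.
check_perl_module() {
  if perl -M"$1" -e 1 >/dev/null 2>&1; then
    echo "$1: ok"
  else
    echo "$1: missing"
  fi
}

# Module names are assumed from the package names listed above.
for module in DBI Archive::Zip DBD::mysql Set::IntervalTree JSON \
              PerlIO::gzip Bio::DB::BigFile List::MoreUtils; do
  check_perl_module "$module"
done
```

A module is also reported as missing if `perl` itself is not installed.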

Regardless of the container or install method used, additional setup is required for the offline cache to work:

1. Download the latest cache available for the zebrafish Zv9 assembly (Ensembl release 79).
2. Unpack, rename and move the cache to its required location.

During use, you will need to provide specific options.

Note

We want the regulatory regions also, so we get the merged tarball specifically.

Caution

The cache directory defaults to ~/.vep, but we suggest saving your downloaded cache files elsewhere, as the home directory usually has limited storage. You can then symlink the cache files to the default directory. For example:

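# create the cache directory on a volume with enough space
cd /place/with/storage/
mkdir -p .vep
# move the downloaded cache into it
mv my_cache_dir /place/with/storage/.vep/
# link it to the default location (assumes ~/.vep does not already exist)
ln -s /place/with/storage/.vep ~/.vep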

# get cache
wget 'https://ftp.ensembl.org/pub/release-79/variation/VEP/danio_rerio_merged_vep_79_Zv9.tar.gz'
tar -xzvf danio_rerio_merged_vep_79_Zv9.tar.gz

# if no cache directory is specified, VEP defaults to ~/.vep
# the directory must be renamed to drop the trailing _merged
mkdir -p ~/.vep
mv danio_rerio_merged ~/.vep/danio_rerio

# then this should work, note that these specific options are required
# [--cache --dir_cache --species --assembly --cache_version --offline]
# this will be covered in detail in the relevant documentation section
vep \
  --cache \
  --dir_cache ~/.vep/ \
  --species danio_rerio \
  --assembly Zv9 \
  --cache_version 79 \
  --offline \
  --regulatory \
  --vcf \
  ... \
  --input_file /path/to/input.vcf \
  --output_file /path/to/output.vcf
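Before running VEP, you may want to confirm that the cache ended up where the options above point. The sketch below assumes the cache is laid out as `<dir_cache>/<species>/<cache_version>_<assembly>`, which is the layout the unpacked tarball is expected to have. `check_vep_cache` is a hypothetical helper and not part of VEP.

```shell
# Print the directory VEP is expected to read for a given species,
# cache version and assembly, and fail if it does not exist.
# check_vep_cache is a hypothetical helper; the layout is an assumption.
check_vep_cache() {
  cache_root=$1
  species=$2
  version=$3
  assembly=$4
  target="$cache_root/$species/${version}_${assembly}"
  if [ -d "$target" ]; then
    echo "found: $target"
  else
    echo "missing: $target" >&2
    return 1
  fi
}

# Example:
# check_vep_cache ~/.vep danio_rerio 79 Zv9
```

If the directory is reported as missing, check that the rename step was applied and that any symlink to ~/.vep resolves.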

Caution

Problems can occur if the VEP version, genome version and chromosome nomenclature conflict with one another. If you get any errors, please check your files and follow the guidelines on the ENSEMBL-VEP website.
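One quick check for chromosome nomenclature is to look at how the first record in your VCF names its chromosome. Ensembl resources generally use names without a "chr" prefix (for example "1"), so a "chr1" style input may need renaming. This is a minimal sketch for an uncompressed VCF; `vcf_contig_style` is a hypothetical helper, and it inspects only the first data record.

```shell
# Classify the chromosome name of the first data record in an
# uncompressed VCF as "chr-prefixed" or "plain".
# vcf_contig_style is a hypothetical helper, not part of VEP.
vcf_contig_style() {
  awk '
    !/^#/ {
      if ($1 ~ /^chr/) print "chr-prefixed"; else print "plain"
      found = 1
      exit
    }
    END { if (!found) print "no-records" }
  ' "$1"
}

# Example:
# vcf_contig_style /path/to/input.vcf
```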

snpEff¶

We recommend installing snpEff through conda.

conda install bioconda::snpeff

Warning

Manually installing this software is not recommended.

If necessary, follow the instructions on the website.

You will need to download the corresponding zebrafish genome database. Once the install has succeeded, run the command below to download it.

snpEff download Zv9.75
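If the download fails, it can be useful to confirm that the database name is one snpEff knows about. The sketch below filters the output of `snpEff databases` by its first column. `has_database` is a hypothetical helper, and it assumes that the genome name is the first whitespace-separated field on each line of that listing.

```shell
# Read a database listing on stdin and succeed only if some line has
# the requested name as its first field.
# has_database is a hypothetical helper, not part of snpEff.
has_database() {
  awk -v name="$1" '$1 == name { hit = 1 } END { exit !hit }'
}

# Example:
# snpEff databases | has_database Zv9.75 && echo "Zv9.75 is listed"
```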

Visualisation module (OPTIONAL)¶

Caution

This part is more involved and is intended for developers who want to host a web server to view the data. It also works on a local machine.

Install Node.js and npm following the instructions for your own operating system. A few Linux examples are provided.

# Ubuntu/Debian
sudo apt install nodejs npm

# CentOS/RHEL
sudo yum install nodejs npm

Hint

If you get an error saying the host name cannot be resolved, try the following (at your own risk).

sudo sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-*
sudo sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-*

Then install express and download the igv.js distribution files into igv-dist.

npm install express

mkdir igv-dist
curl -o igv-dist/igv.min.js https://cdn.jsdelivr.net/npm/igv@2.15.11/dist/igv.min.js
curl -o igv-dist/igv.css https://cdn.jsdelivr.net/npm/igv@2.15.11/dist/igv.css
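Because `curl -o` saves whatever the server returns, a failed request can leave an empty or unexpected file behind. As a light sanity check, the sketch below confirms that each file exists and is not empty. `check_download` is a hypothetical helper, and it will not detect a saved error page, so adding curl's `--fail` option to the downloads is worth considering as well.

```shell
# Succeed only if the file exists and is non-empty.
# check_download is a hypothetical helper, not part of igv.js.
check_download() {
  if [ -s "$1" ]; then
    echo "$1: ok"
  else
    echo "$1: missing or empty" >&2
    return 1
  fi
}

# Example:
# check_download igv-dist/igv.min.js && check_download igv-dist/igv.css
```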