💻🧬📝👨🏻‍🎓

Andrey Tomarovsky's CV


Andrey Tomarovsky

Software enthusiast in bioinformatics based in St. Petersburg, Russia

📧Email / 💬Telegram / 💬Facebook / 📜Google Scholar

📝PDF CV Download

👨🏻‍🎓 Education

September 2021 - present: PhD student in Genetics. Novosibirsk State University, Novosibirsk, Russia. PhD thesis: “Obtaining genomic assemblies and phylogenetic analysis of members of the genus Martes (fam. Mustelidae)”.
September 2019 - July 2021: MS in Bioinformatics. Saint Petersburg State University, St. Petersburg, Russia. MS thesis: “Assembly and annotation of the sable (Martes zibellina) and pine marten (Martes martes) genomes”.
September 2015 - July 2019: BS in Biotechnology. Belgorod State National Research University, Belgorod, Russia.

🏆 Work experience

1) March 2021 - present: Research programmer at the Genomic Diversity Research Center, ITMO University. Research on the genomics of the genus Martes.

2) July 2021 - present: Technical and software support for Blastim courses.

3) December 2020 - January 2024: ResOps and system administration of the MSU FBB computing cluster.

🛠 Skills

OS: Linux, Windows
Shell: Bash
Good knowledge of common shell tools such as Awk, Grep and Sed.
Programming: Python
Python libraries: Biopython, Matplotlib, NumPy, pandas, scikit-learn
Experience writing scripts and data visualizations in Jupyter Notebook and as standalone Python packages.
- Parsing data from files or websites into pandas DataFrames.
- Calculating descriptive statistics (mean, median, minimum and maximum) for datasets.
- Visualizing results with Matplotlib as plots, histograms and Venn diagrams.
- Basic experience in machine learning (kNN, clustering, linear regression).
Statistics: R
R libraries: readxl, dplyr, car, cowplot, ggplot2
Experience analyzing datasets such as data on different cancer types and patient survival times.
- Linear and multiple regression.
- Description and significance testing of linear models.
- Comparison of linear models.
- Testing statistical hypotheses.
Workflow managers: Snakemake
Experience writing complex Snakemake pipelines, including benchmarking, logging, task grouping and execution on a compute cluster, as well as collaborative pipeline development.
Workload managers: Slurm, PBS.
ResOps experience on the MSU FBB, ICG, IMCB and ITMO computing clusters.
- Running large-scale computational tasks using Slurm, PBS and Snakemake.
- Installing and managing Conda environments.
Others:
- SQL (creating databases, simple and moderately complex queries)
- Circos (basic level; visualization of mDNA and its coverage)
- Tcl (basic level; writing module files)

📌 On The Side

Snakemake pipelines:

BuscoClade. Pipeline for constructing species phylogenies from universal single-copy orthologs (BUSCOs).
ITSpipe. Pipeline for analyzing ITS sequences from the ribosomal cluster. It visualizes coverage with Matplotlib and calls variants with GATK, Pisces and BCFtools.
varcaller. Pipeline for calling genetic variants. It also visualizes coverage and calculates pseudoautosomal region (PAR) coordinates.

Others:

Biocrutch. A custom Python package for bioinformatics research. It contains scripts for genome and coverage statistics, repeat masking, determining pseudoautosomal region coordinates, filtering 10x Genomics linked reads, combining PSMC data, and more.
Bashare. A repository of custom Bash scripts and pipelines for data processing.

📝 Grants

📝 Articles

👨🏻‍💼 Conferences

💬 Languages

Russian: Native
English: Pre-Intermediate