Andrey Tomarovsky's CV
Software enthusiast in bioinformatics based in St. Petersburg, Russia
📧Email / 💬Telegram / 💬Facebook / 📜Google Scholar
September, 2021 - present | PhD student in Genetics. Novosibirsk State University, Novosibirsk, Russia. PhD thesis: “Obtaining genomic assemblies and phylogenetic analysis of members of the genus Martes (fam. Mustelidae)”. |
September, 2019 - July 2021 | MS in Bioinformatics. Saint-Petersburg State University, St. Petersburg, Russia. MS thesis: “Assembly and annotation of the sable (Martes zibellina) and pine marten (Martes martes) genomes”. |
September, 2015 - July 2019 | BS in Biotechnologies. Belgorod State National Research University, Belgorod, Russia. |
1) March, 2021 - present Research programmer at the Genomic Diversity Research Center, ITMO University. Conducts research on genomics of the genus Martes:
2) July, 2021 - present Technical and software support for Blastim courses:
3) December, 2020 - January, 2024 ResOps and system administration experience on the MSU FBB computing cluster.
OS: | Linux, Windows |
Shell: | Bash. Good knowledge of shell tools such as Awk, Grep, and Sed |
Programming: | Python |
Python libraries: | Biopython, Matplotlib, NumPy, Pandas, scikit-learn. Experience writing scripts and data visualizations in Jupyter Notebook and as standalone Python packages. - Parsing data from files or websites into Pandas dataframes. - Calculating mean, median, minimum, and maximum values in datasets. - Visualizing results with Matplotlib as plots, histograms, and Venn diagrams. - Basic experience in ML (kNN, clustering, linear regression). |
Statistics: | R |
R libraries: | readxl, dplyr, car, cowplot, ggplot2. Experience analyzing various datasets, such as those containing information on different types of cancer and patient survival times. - Simple and multiple linear regression. - Description and significance testing of linear models. - Comparison of linear models. - Testing statistical hypotheses. |
Workflow managers: | Snakemake. Experience writing complex Snakemake pipelines, including benchmarking, logging, task grouping, and running on a compute cluster. Experience in collaborative development. |
Workload managers: | Slurm, PBS. ResOps experience on the MSU FBB, ICG, IMCB, and ITMO computing clusters. - Running large-scale computational tasks using Slurm, PBS, and Snakemake. - Installing and working with Conda environments. |
Others: | - SQL (creating a database, simple and medium-complexity queries) - Circos (basic level, experience in visualizing mtDNA and its coverage) - Tcl (basic level, experience in writing module files) |
Snakemake pipelines:
BuscoClade. Pipeline to construct species phylogenies using universal single-copy orthologs (BUSCOs). |
ITSpipe. Pipeline for the analysis of ITS sequences from the ribosomal cluster. It performs coverage visualization with Matplotlib and variant calling with GATK, Pisces, and Bcftools. |
varcaller. Pipeline for calling genetic variants. Includes coverage visualization and calculation of pseudoautosomal region (PAR) coordinates. |
Others:
Biocrutch. A custom Python package for bioinformatics research. It contains scripts for genome and coverage statistics, repeat masking, determining pseudoautosomal region coordinates, filtering 10x Genomics linked reads, combining PSMC data, and other tasks. |
Bashare. The repository contains custom Bash scripts and pipelines for data processing. |
Russian: Native
English: Pre-Intermediate