Hanna Paulava



Wrocław, Poland
LinkedIn


I found out while studying that creativity can be boosted by mathematical exactness.
Since then I have loved Mathematics for its harmony and for how it makes sense of many things in the world.
It’s great to see how data that looks like a complete mess at first sight can give us some really interesting results!


Experience

Immersive Fox, R&D Specialist

August 2022 – July 2023 | Wrocław, PL (remote)

  • Building the Immersive Fox main engine: a set of Deep Learning models that create an “AI presenter”, a model trained on video data of a person that can then produce that person’s talking face from text or audio input

  • Involved in the whole ML team pipeline: data collection, storage and preparation; model selection, training and fine-tuning; model deployment

  • Working in close collaboration with the Web and Business teams to ensure fast and efficient problem solving, the startup way :)

  • Partly working as a project manager for the ML and Web teams

  • Meetings with investors and customers; project presentations

  • Toolbox & frameworks:

    • Python: PyTorch, opencv, numpy, audio processing libraries
    • MLOps: GCP, AWS, Docker, Lambda, Ansible, wandb, bash scripting (storage solutions, instance setup for ML workflows)
    • Studying related work (open-source libraries, arXiv papers) and applying it to specific use cases
    • JIRA, Agile (partly as a Scrum Master), Notion, GitHub

Pivotics, Senior Data Scientist

August 2021 – August 2022 | Wrocław, PL

  • Deep Learning and image processing
  • Time series - logs analysis, outlier detection
  • Model deployment

More on this position:

Aug 2021 - Aug 2022

Project under NDA, Senior Data Scientist

  • Responsibilities:

    • Working with customers to identify internal data sources containing valuable data for the stated task
    • Data preprocessing, augmentation, automatic outlier detection pipelines
    • Building Deep Learning, statistical and time series models for image data analysis
    • Building a log analysis pipeline, including data cleaning, outlier detection, clustering and semi-supervised learning methods
    • Close communication with the customer at all stages (task statement, progress reports, model deployment).
  • Toolbox & frameworks:

    • Python: TensorFlow, Keras, scikit-learn, opencv, numpy, pandas, jupyter
    • Time series analysis, outlier detection, semi-supervised learning

Teqniksoft, Data Analyst, then Data Scientist

January 2016 – August 2021 | Minsk, BY

  • Statistical data analysis - models, pipelines
  • Machine learning - Computer Vision, Deep Learning
  • Unsupervised learning - clustering, outlier detection
  • Data pipelines - data cleaning, image preprocessing and augmentation, model deployment

More on this position:

Jul 2019 - Aug 2021

Project under NDA, Senior Data Scientist

  • Responsibilities:

    • Working closely with the customer on data collection and storage methods
    • Formulating hypotheses about what inference is possible with the available data (mostly image data / photos)
    • Data preparation, augmentation, automatic outlier detection pipelines
    • Feature engineering
    • Building Deep Learning, statistical and time series models for image data analysis
    • Results aggregation and presentation, documentation
    • Model deployment into customer’s internal systems.
  • Toolbox & frameworks:

    • Python: TensorFlow, Keras, scikit-learn, opencv, numpy, pandas, jupyter
    • GPUs
    • TensorFlow Object Detection API, VGG, Mask R-CNN
    • Outlier detection, unsupervised methods (PCA, clustering, variance analysis).

April 2018 - Jul 2019

Project under NDA, Data Scientist

  • Responsibilities:

    • Data preprocessing
    • Feature engineering
    • Building pipelines of image data processing with statistical methods
    • Creating deep learning models for image data of different sources
    • Fine-tuning well-known deep learning models as well as developing custom pipelines
    • Results aggregation and presentation, documentation
    • Communicating with the customer.
  • Toolbox & frameworks:

    • Python: TensorFlow, Keras, scikit-learn, opencv, plotly, numpy, pandas, jupyter
    • GPUs
    • TensorFlow Object Detection API
    • Outlier detection.

February 2017 - April 2018

Project under NDA, Data Analyst - Data Scientist

  • Responsibilities:

    • Data mining and analysis of large volumes of structured data containing observations of various physical processes
    • Building and analyzing linear regression models; automating the model-building process; applying feature selection methods (backward & forward subset selection); model improvement
    • Working with heavily imbalanced samples in classification problems
    • Graphical analysis of data
    • Performing a full stack of data analysis procedures (get & clean data, evaluate and tune analysis, present results)
    • Preparing a documentation framework
    • Automating analysis and preparing sets of data analysis (DA) scripts
    • Data cleaning pipeline with advanced techniques
    • Communicating with the customer.
  • Toolbox & frameworks:

    • R and RStudio (data.table, dplyr, caret, xgboost, lm, ggplot2)
    • SQL, Hive, Impala (through a user interface), integrated database tools in R
    • Atlassian set of tools for project management (Jira, Bitbucket, Confluence)
    • Python (IPython, pandas, xgboost).

March 2016 - March 2017

Project under NDA, Data Analyst

  • Responsibilities:

    Including but not limited to:

    • Sanity checks and intelligent maintenance of databases
    • Development of an integrated automatic Data Analysis reporting module
    • Data-driven research for the project’s future needs
    • Maintaining documentation on database design
    • Reporting on unusual data behavior and incorrect data usage
    • Constantly checking the correctness of data flow and integration
    • Communicating with the customer.
  • Toolbox & frameworks:

    • R and RStudio
    • MongoDB
    • Regular Expressions
    • server-side analytics
    • Jenkins
    • Linux, Cron (Linux daemon).

January 2016 - March 2016

Project under NDA, Junior Data Analyst

  • Responsibilities:

    Text mining improvements for the project’s needs, including but not limited to:

    • Communicating with the customer
    • Text mining using Regular Expressions
    • Text-specific pattern recognition.
  • Toolbox & frameworks:

    • R and RStudio, Excel, Regular Expressions.

Research Institute for Applied Problems of Mathematics and Informatics, Assistant

October 2013 – December 2015 | BSU, Minsk, BY

  • Research on autoregressive time series
  • Working in the big data laboratory
  • Statistical analysis of social data (together with InData Labs)
  • Data cleaning
  • Simulations using C, C++, Python
  • Real-data applications
  • Statistical inference.

EXADEL, System QA Programmer

August 2012 – August 2013 | HTP, Minsk, BY

  • Automated tests
  • C on Linux
  • Low-level software (OS).

Side projects & Open Source

Autoranking

A side project originally owned by me: a small framework for computing sport rankings for Belarusian orienteers. Active in 2016-2022.

  • Written in R, with Google Sheets as a database.
  • Can be found under this link.

Open Source

Contributing to OpenCV and TensorFlow Models.

Skills

  • Data Science: Neural Networks, CNN, GAN, loss functions, hyperparameter tuning, autoencoders

  • Data Analysis: standard supervised/unsupervised learning methods (regression, classification), outlier detection, data cleaning, time series analysis

  • Technical:

    • Python: NumPy, SciPy, scikit-learn, PyTorch, TensorFlow, Keras, Jupyter, opencv, matplotlib, plotly, librosa
    • MLOps: GCP, AWS, Docker, Lambda, Ansible, wandb, bash scripting (storage solutions, instance setup for ML workflows)
    • R: incl. base & ggplot graphics, Rmd, other popular R packages
    • Databases: SQL, MongoDB
    • Reporting and work organization: JIRA, GitHub, Markdown, Microsoft Office
    • Inactive knowledge: C, C++, HTML
    • Familiar: Java, MATLAB, Wolfram Mathematica, Statistica
  • Human languages:

    • Belarusian, Russian (native speaker)
    • English (advanced)
    • Polish (intermediate)
    • Want to learn: Spanish, German.
  • Other soft-skills:

    • Communicative and proactive
    • Responsible for myself and my team
    • Self-organized
    • Good at problem solving; I use logic and creativity to find the best solutions
    • Aim to solve the task, not to struggle with busy work
    • Able to work both as a team player and independently
    • Love to study and motivate myself!
    • Yes, I’m intelligent :D
  • What I love to do:

    • Tuning a model after initial experiment is done and you know the direction
    • Discussing task statements aaaaand brainstorming!
    • Data preparation for analysis, especially the cleaning and tidying part
    • Data pipeline design, model deployment
    • Documentation for reproducibility
  • What I can be bad at:

    • Recognizing the moment when exploration is enough and it’s time to finalize
    • Getting stuck in tedious routines if they are not automated in time (e.g. running Docker manually for each experiment)
    • Sometimes I get too enthusiastic about a task that is unlikely to be solvable and spend time trying to solve it instead of moving forward

Education

2015-2016

MSc, Applied Computer Data Analysis; Belarusian State University, Faculty of Applied Mathematics and Computer Science

Studies focused on Data Analysis. Scientific research on autoregressive time series.

Thesis title: Autoregressive time series observed under classification.

2010-2015

BSc, Applied Mathematics; Belarusian State University, Faculty of Applied Mathematics and Computer Science

Department of Mathematical Modelling and Data Analysis

Finished with distinction.

Self-Education

2012

Coursera, edX, Kaggle

Mar 2017 – 2022

OpenDataScience: meetups & community work, ML course in data science (http://mlcourse.ai/).

Mar 2017 – 2019

Belarus Big Data User Group

Attending and giving talks at Belarus Big Data User Group (https://www.facebook.com/groups/big.data.nerds.minsk/)

Nov 2017

CS231n

Convolutional Neural Networks for Visual Recognition (http://cs231n.stanford.edu/).

Thesis Directions

Graduate

  • Time series analysis
  • Autoregressive processes
  • Statistical estimation of parameters
  • Estimation using ML, MM (maximum likelihood, method of moments)
  • Simulations (Python)
  • Computer Graphics + Practicum (R)

Undergraduate

  • Time series analysis
  • Autoregressive processes
  • Statistical estimation of parameters
  • Least Squares
  • Simulations (C, C++, Python)
  • Computer Graphics + Practicum (Wolfram Mathematica, R)

Research

BSU | Mathematical Modelling and Data Analysis Dept.

Jan 2015 – Dec 2015 | BSU, Minsk, BY

Grant-funded research task on advanced time series models: “Autoregressive time series observed under classification”.

BSU | Mathematical Modelling and Data Analysis Department

Jan 2014 – Dec 2014 | BSU, Minsk, BY

Grant-funded research task on advanced time series models. Report completed.

APMI Research Institute, Applied Mathematics Lab

Jan 2013 – Dec 2013 | BSU, Minsk, BY

Worked on advanced time series analysis models and methods. Completed the grant-funded task “Statistical analysis of discrete time series and geospatial data”. Scientific adviser: Yu. S. Kharin. Report completed.

Awards

  • 2015 – Grand Prix – FAMCS young scientists contest;
  • 2015 – 1st Prize – Republican contest of scientific works of students;
  • 2014 – Diploma – Best report of section “Mathematics” (XVI Republican scientific conference of young scientists);
  • 2014 – 2nd Prize – Republican contest of scientific works of students.

Societies

2014 - 2016 – Belarus Statistical Association (member).