Hanna Paulava
hanna.v.paulava@gmail.com
Wrocław, Poland
LinkedIn
Creativity can be boosted by mathematical exactness - I found this
to be true while studying.
Since then I have loved Mathematics for its harmony and for the way it
makes sense of many things in the world.
It’s great to see how data that looked like a complete mess at first
sight gives us some really interesting results!
Experience
Immersive Fox, R&D Specialist
August 2022 – July 2023 | Wrocław, PL (remote)
- Building the Immersive Fox main engine - a set of Deep Learning
models that create an “AI presenter”: a model trained on video data
that can then produce a talking face of the person from text or audio
data
- Involved in the whole pipeline of the ML team - data collection,
storage, preparation; model selection, training and fine-tuning; model
deployment
- Working in close collaboration with the Web and Business teams to
ensure fast and efficient problem solving the startup way :)
- Partially working as a project manager for the ML and Web teams
- Meetings with investors and customers, project presentations
Toolbox & frameworks:
- Python: PyTorch, opencv, numpy, audio processing libraries
- MLOps: GCP, AWS, docker, lambda, ansible, wandb, bash scripting -
involving storage solutions, instances setup for ML flows
- Studying related works - open source libraries, arXiv articles,
applying them in order to solve several use cases
- JIRA, agile (partially as a scrum master), Notion, GitHub
Pivotics, Senior Data Scientist
August 2021 – August 2022 | Wrocław, PL
- Deep Learning and image processing
- Time series - logs analysis, outlier detection
- Model deployment
More on this position:
- Aug 2021 - Aug 2022: Project under NDA, Senior Data Scientist
Responsibilities:
- Working with customers to identify internal data sources with data
valuable for solving the stated task
- Data preprocessing, augmentation, automatic outlier detection
pipelines
- Building Deep Learning, statistical and time series models for image
data analysis
- Building log analysis pipeline, including data cleaning, outlier
detection, clustering, semi-supervised learning methods
- Close communication with the customer at all stages (task
statement, progress reports, model deployment).
Toolbox & frameworks:
- Python: TensorFlow, Keras, scikit-learn, opencv, numpy, pandas,
jupyter
- Time series analysis, outlier detection, semi-supervised
learning
Teqniksoft, Data Analyst, then Data Scientist
January 2016 – August 2021 | Minsk, BY
- Statistical data analysis - models, pipelines
- Machine learning - Computer Vision, Deep Learning
- Unsupervised learning - clustering, outlier detection
- Data pipelines - data cleaning, image preprocessing and
augmentation, model deployment
More on this position:
- Jul 2019 - Aug 2021: Project under NDA, Senior Data Scientist
Responsibilities:
- Closely working with the customer on data collection, storage
methods
- Formulating hypotheses on what inference is possible with the
available data (mostly image data / photos)
- Data preparation, augmentation, automatic outlier detection
pipelines
- Feature engineering
- Building Deep Learning, statistical and time series models for image
data analysis
- Results aggregation and presentation, documentation
- Model deployment into customer’s internal systems.
Toolbox & frameworks:
- Python: TensorFlow, Keras, scikit-learn, opencv, numpy, pandas,
jupyter
- GPUs
- TensorFlow Object detection API, VGG, mask RCNN
- Outlier detection, Unsupervised methods (PCA, clustering, variance
analysis).
- April 2018 - Jul 2019: Project under NDA, Data Scientist
Responsibilities:
- Data preprocessing
- Feature engineering
- Building pipelines of image data processing with statistical
methods
- Creating deep learning models for image data from different
sources
- Fine-tuning well-known deep learning models as well as developing
custom pipelines
- Results aggregation and presentation, documentation
- Communicating with the customer.
Toolbox & frameworks:
- Python: TensorFlow, Keras, scikit-learn, opencv, plotly, numpy,
pandas, jupyter
- GPUs
- TensorFlow Object detection API
- Outlier detection.
- February 2017 - April 2018: Project under NDA, Data Analyst - Data Scientist
Responsibilities:
- Data mining and data analysis of huge amounts of structured data
with observations of different physical processes
- Building and analyzing linear regression models, automation of model
building process, working with feature selection methods (backward &
forward subset selection), model improvement
- Working with heavily imbalanced samples in classification problems
- Graphical analysis of data
- Performing a full stack of data analysis procedures (get & clean
data, evaluate and tune analysis, present results)
- Preparing a documentation framework
- Automation of analysis and preparation of sets of data analysis scripts
- Data cleaning pipeline with advanced techniques
- Communicating with the customer.
Toolbox & frameworks:
- R and RStudio (data.table, dplyr, caret, xgboost, lm, ggplot2)
- SQL, Hive, Impala (through user interface), integrated database tool
in R
- Atlassian set of tools for project management (Jira, Bitbucket,
Confluence)
- Python (IPython, pandas, xgboost).
- March 2016 - March 2017: Project under NDA, Data Analyst
- January 2016 - March 2016: Project under NDA, Junior Data Analyst
Research Institute for Applied Problems of Mathematics and Informatics, Assistant
October 2013 – December 2015 | BSU, Minsk, BY
- Research on autoregressive time series
- Working for the big data laboratory
- Statistical analysis of social data (together with InData Labs)
- Data cleaning
- Simulations using C, C++, Python
- Real-data applications
- Statistical inference.
EXADEL, System QA Programmer
August 2012 – August 2013 | HTP, Minsk, BY
- Automated tests
- C on Linux
- Low-level software (OS).
Side projects & Open Source
- Autoranking: a side project I originally owned, a small framework for
computing sport rankings for Belarusian orienteers. Active in
2016-2022.
- Written in R with Google Sheets as a database.
- Can be found under this link.
- Open Source: contributing to OpenCV and TensorFlow models.
Skills
Data Science: Neural Networks, CNN, GAN, losses,
hyperparameter tuning, autoencoders
Data Analysis: all the standard methods of
supervised/unsupervised learning (regression, classification), outlier
detection, data cleaning, time series analysis
Technical:
- Python: NumPy, SciPy, scikit-learn, PyTorch,
TensorFlow, Keras, Jupyter, opencv, matplotlib, plotly, librosa
- MLOps: GCP, AWS, docker, lambda, ansible, wandb,
bash scripting - involving storage solutions, instances setup for ML
flows
- R: incl. base & ggplot graphics, Rmd, other
popular R packages
- Databases: SQL, MongoDB
- Reporting and work organization: JIRA, GitHub,
Markdown, Microsoft Office
- Inactive knowledge: C, C++, HTML
- Familiar: Java • Matlab • Wolfram Mathematica •
Statistica
Human languages:
- Belarusian, Russian (native speaker)
- English (advanced)
- Polish (intermediate)
- Want to learn: Spanish, German.
Other soft-skills:
- Communicative and proactive
- Responsible for myself and my team
- Self-organized
- Good at problem solving, use logic and creativity in order to find
best solutions
- Aim to solve the task, not to struggle with busy work
- Able to work both in a team and independently
- Love to study and motivate myself!
- Yes, I’m intelligent :D
What I love to do:
- Tuning a model after initial experiment is done and you know the
direction
- Communication on task statements aaaaand brainstorming!
- Data preparation for analysis, especially the cleaning and tidying
part
- Data pipelines design, model deployment
- Documentation for reproducibility
What I can be bad at:
- Deciding when I have explored enough and it is time to finalize
- Getting stuck in awkward routines if they are not automated in time
(e.g. running docker manually for each experiment)
- Sometimes I get too enthusiastic about a task that is unlikely to be
solvable and spend time trying to solve it instead of moving
forward
Education
- 2015-2016: MSc, Applied Computer Data Analysis; Belarusian
State University, Faculty of Applied Mathematics and Computer
Science
Studies focused on Data Analysis. Scientific research on
autoregressive time series.
Thesis title: Autoregressive time series observed under
classification.
- 2010-2015: BSc, Applied Mathematics; Belarusian State
University, Faculty of Applied Mathematics and Computer Science
Department of Mathematical Modelling and Data Analysis
Finished with distinction.
Thesis Directions
- Graduate:
Time series analysis
Autoregressive processes
Statistical estimation of parameters
Estimation using ML, MM
Simulations (Python)
Computer Graphics + Practicum (R)
- Undergraduate:
Time series analysis
Autoregressive processes
Statistical estimation of parameters
Least Squares
Simulations (C, C++, Python)
Computer Graphics + Practicum (Wolfram Mathematica, R)
Awards
- 2015 – Grand Prix – FAMCS young scientists
contest;
- 2015 – 1st Prize – Republican contest of scientific
works of students;
- 2014 – Diploma – Best report of section
“Mathematics” (XVI Republican scientific conference of young
scientists);
- 2014 – 2nd Prize – Republican contest of scientific
works of students.
Societies
2014 - 2016 – Belarus Statistical Association
(member).