Mustafa Genc

I AM A
ML ENGINEER & NLP EXPERT

My Work

Portfolio

Welcome to my portfolio! Here, you'll find a showcase of projects that blend my professional expertise with academic insights. Each project represents a unique idea and practical application, crafted to demonstrate the skills and knowledge I've developed throughout my journey. For each project, I’ve included links to explore further on GitHub. In addition to GitHub, I’ve also recorded some walkthroughs and demos on YouTube so you can see the projects in action. I hope you find these projects inspiring, insightful, and reflective of my passion for data science and technology.

Streamlit, Tesseract OCR, LLM (Gemini), Prefect, Selenium
Resume Job Matcher
Chainforge, Llama3, Mistral, Ollama
Chainforge
Streamlit, MyMemory API, Ollama, Transformers Local TTS, Python, Regex
German Translator
spaCy, NetworkX, Sentence-Transformers Qdrant, Ollama, Streamlit
Literary GraphRAG
Gradio, FastAPI, Uvicorn, Whisper LlamaIndex, PyMuPDF, MLflow, Docker
Elderly Care Assessment Assistant
pandas, numpy, matplotlib, seaborn, sklearn, xgb
Fairness in ML
pyLDAvis, gensim, string, re
Topic extraction
spaCy, web scraping, Selenium
AUDI Google reviews scraping
transformers, spacy, numpy, pandas
AUDI Reviews Sentiment Analysis
sklearn, selenium, BeautifulSoup, nltk, matplotlib
Customer review analysis on Amazon dataset

About Me

I am an ML Engineer with a strong foundation in NLP, machine learning, and document processing. My experience spans a range of impactful projects, including fine-tuning large language models (LLMs), building Retrieval-Augmented Generation (RAG) systems, and developing semantic similarity search engines using vector databases. I have expertise in document clustering, OCR solutions, and creating advanced Named Entity Recognition (NER) models for information extraction. Additionally, I’ve implemented Explainable AI to make model outputs accessible to stakeholders and used visualization tools like Tableau to drive data insights. My technical skills extend to deploying scalable microservices with Docker and FastAPI, and managing projects with GitLab and Confluence. My thesis focused on enhancing text simplification in LegalTech using curriculum learning and the T5 model, showcasing my dedication to pushing the boundaries of NLP applications.

ML Engineer / NLP Expert
2024-Current

• Fine-tuned LLMs (GPT-4, LLaMA 2, Deepseek, BERT) for various NLP tasks
• Developed RAG and GraphRAG systems using vector databases (Qdrant/FAISS) for document search and retrieval
• Built a document clustering pipeline with OCR, layout analysis, and embedding-based similarity
• Designed an NLP microservice pipeline for chatbot deployment with retrieval and synthetic text generation
• Developed NLP microservices with FastAPI, Docker, and Streamlit for scalable model deployment
• Implemented semantic similarity search using vector databases
• Applied unsupervised document clustering to organize and cluster related documents
• Benchmarked tokenization strategies (WordPiece, Byte-Pair Encoding, SentencePiece) for structured document processing
• Developed NER models for extracting information from unstructured text
• Implemented OCR solutions for structured text extraction
• Applied Explainable AI techniques to interpret model decisions
• Collaborated with cross-functional teams to integrate AI-driven solutions

Data Scientist
2022-2023

• Fine-tuned transformers (BERT, RoBERTa, T5) for classification and emotion detection
• Applied Explainable AI techniques to interpret model insights for stakeholders and clients
• Created visualizations and dashboards with Tableau
• Analyzed dataset features for model optimization
• Web scraping across countries for automotive data using Playwright, API calls, Selenium, Beautiful Soup
• Processed PDF documents for text extraction and analysis
• Managed projects with GitLab, Confluence, Anaconda, Poetry, Pre-commit
• Integrated AWS S3 and Lambda for data storage and processing

Data Scientist Intern
2022

• Contributed to an AI-based persona project for the automotive sector using German and English datasets
• Fine-tuned transformer models (BERT, RoBERTa, T5) for text classification and emotion detection
• Prepared and processed complex datasets with Pandas for analysis
• Benchmarked ML (sklearn) and DL (PyTorch) models for customer review insights
• Analyzed customer interviews using spaCy to support persona development
• Documented code and presented findings to stakeholders

Master's Degree
2019-2023

• Completed a Master’s degree in Politics & Technology at the Technical University of Munich.
• Focused on data science and machine learning modules with an emphasis on:
• Advanced programming skills
• Hands-on, practical applications
• Gained a broad and interdisciplinary foundation, enhancing expertise in:
Natural Language Processing (NLP)
Machine Learning
• Developed the skills and knowledge needed to address complex, real-world challenges across multiple domains.

Thesis: Enhancing Text Simplification through Baby-Step Curriculum Learning: A Case Study on Privacy Policies Using the T5 Model
NLP for Text Simplification: Addressed the challenges of simplifying complex texts, specifically privacy policies, using advanced NLP techniques.
Curriculum Learning Approach: Applied a baby-step curriculum learning approach to improve text simplification, a novel technique that structures training in progressive stages.
Model and Tools: Leveraged the T5 Transformer model, with evaluations based on BLEU, SARI, and FKGL metrics.

Familiar Tools & Technologies

A curated list of tools and technologies I frequently use, categorized based on their application areas.

πŸ” LLM Training & Fine-Tuning

Transformers, SentenceTransformers, OpenAI, LangChain, TRL, Ollama, PEFT/QLoRA, DeepSeek-R1, CohereForAI

⚑ LLM Inference & Optimization

Groq LLM Inference Engine, CLIP Models, FastText, Gensim, Mixedbread Tokenizers (German Language Processing)

πŸ—ƒοΈ Vector Databases & Semantic Search

Qdrant, FAISS, ChromaDB, Haystack, Weaviate, JinaAI

βš™οΈ Machine Learning & Experiment Tracking

PyTorch, PyTorch Lightning, Scikit-learn, MLFlow, DVC, ONNX

πŸ“ OCR & Document Layout Analysis

Mistral OCR, Tesseract, HURIDOCS/pdf-document-layout-analysis, LayoutLMv3, PDFMiner, PyMuPDF, Textract, PaddleOCR

πŸ“– NLP & Text Processing

SpaCy, NLTK, Doccano

πŸ“Š Data Visualization

Tableau, Matplotlib, Seaborn, Plotly

πŸ“ Data Processing & Analysis

Pandas, NumPy

🌐 Web Scraping & Automation

BeautifulSoup, Selenium, Playwright, Scrapy

☁️ Cloud Infrastructure & Containers

AWS-EC2, AWS-Lambda, AWS-S3, Docker, Prefect

πŸš€ Backend & APIs

FastAPI, Celery, Redis, Flower

πŸ’¬ Conversational AI & UI

Chainlit, Chainforge, Streamlit, Gradio

πŸ“‚ Databases

SQL, PostgreSQL, MongoDB

πŸ’» System & Workflow

Linux, Agile-Scrum

Volunteering & Mentorship

In addition to my professional work, I am passionate about giving back to the community and helping others on their career paths.

Volunteer Teaching

Volunteer teacher in Machine Learning at ReDI School, helping to bridge the gap in technology education.

Career Mentorship

If you would like a one-on-one session for career guidance, feel free to arrange a free session by sending an email to mustafa.gencc94@gmail.com.

Contact Me

Thank you for exploring my portfolio! If you have any questions or would like to connect, please feel free to reach out – I’d be happy to discuss my work or explore potential collaborations.