Welcome to my portfolio! Here, you'll find a showcase of projects that blend my professional expertise with academic insights. Each project represents a unique idea and practical application, crafted to demonstrate the skills and knowledge I've developed throughout my journey. For each project, I've included links to explore further on GitHub. In addition to GitHub, I've also recorded walkthroughs and demos on YouTube so you can see the projects in action. I hope you find these projects inspiring, insightful, and reflective of my passion for data science and technology.
Gradio, FastAPI, Uvicorn, Whisper
LlamaIndex, PyMuPDF, MLflow, Docker
I am an ML Engineer with a strong foundation in NLP, machine learning, and document processing. My experience spans a range of impactful projects, including fine-tuning large language models (LLMs), building Retrieval-Augmented Generation (RAG) systems, and developing semantic similarity search engines using vector databases. I have expertise in document clustering, OCR solutions, and creating advanced Named Entity Recognition (NER) models for information extraction. Additionally, I've implemented Explainable AI to make model outputs accessible to stakeholders and used visualization tools like Tableau to drive data insights. My technical skills extend to deploying scalable microservices with Docker and FastAPI, and managing projects with GitLab and Confluence. My thesis focused on enhancing text simplification in LegalTech using curriculum learning and the T5 model, showcasing my dedication to pushing the boundaries of NLP applications.
• Fine-tuned LLMs (GPT-4, LLaMA 2, Deepseek, BERT) for various NLP tasks
• Developed RAG and GraphRAG systems using vector databases (Qdrant/FAISS) for document search and retrieval
• Built a document clustering pipeline with OCR, layout analysis, and embedding-based similarity
• Designed an NLP microservice pipeline for chatbot deployment with retrieval and synthetic text generation
• Developed NLP microservices with FastAPI, Docker, and Streamlit for scalable model deployment
• Implemented semantic similarity search using vector databases
• Applied unsupervised clustering to organize and group related documents
• Benchmarked tokenization strategies (WordPiece, Byte-Pair Encoding, SentencePiece) for structured document processing
• Developed NER models for extracting information from unstructured text
• Implemented OCR solutions for structured text extraction
• Applied Explainable AI techniques to interpret model decisions
• Collaborated with cross-functional teams to integrate AI-driven solutions
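The semantic similarity search listed above can be illustrated with a minimal sketch: rank documents by cosine similarity between embedding vectors. The 3-dimensional vectors and document names below are invented for illustration; a production system would use real transformer embeddings indexed in Qdrant or FAISS.

```python
import math

# Toy in-memory "vector database": invented 3-d embeddings for illustration.
DOCS = {
    "invoice":  [0.9, 0.1, 0.0],
    "contract": [0.8, 0.3, 0.1],
    "recipe":   [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, k=2):
    """Return the ids of the k documents most similar to the query embedding."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)
    return ranked[:k]

print(search([0.85, 0.2, 0.05]))  # -> ['invoice', 'contract']
```

Vector databases like Qdrant and FAISS implement exactly this ranking, but with approximate nearest-neighbor indexes so it scales to millions of documents.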
• Fine-tuned transformers (BERT, RoBERTa, T5) for classification and emotion detection
• Applied Explainable AI techniques to interpret model insights for stakeholders and clients
• Created visualizations and dashboards with Tableau
• Analyzed dataset features for model optimization
• Scraped automotive data across countries using Playwright, API calls, Selenium, and Beautiful Soup
• Processed PDF documents for text extraction and analysis
• Managed projects with GitLab, Confluence, Anaconda, Poetry, Pre-commit
• Integrated AWS S3 and Lambda for data storage and processing
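The scraping work above relied on Playwright, Selenium, and Beautiful Soup; the core idea of extracting structured data from HTML can be sketched with just the standard library's `html.parser` (swapped in here for Beautiful Soup so the example is self-contained). The HTML snippet and link paths are invented for illustration.

```python
from html.parser import HTMLParser

# Invented static page standing in for a scraped automotive listing.
SAMPLE_HTML = """
<html><body>
  <a href="/models/ev-suv">EV SUV</a>
  <a href="/models/sedan">Sedan</a>
  <p>No link here</p>
</body></html>
"""

class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkExtractor()
parser.feed(SAMPLE_HTML)
print(parser.links)  # -> ['/models/ev-suv', '/models/sedan']
```

Beautiful Soup offers the same extraction with a friendlier API (`soup.find_all("a")`), while Playwright and Selenium drive a real browser for pages that render content with JavaScript.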
• Contributed to an AI-based persona project for the automotive sector using German and English datasets
• Fine-tuned transformer models (BERT, RoBERTa, T5) for text classification and emotion detection
• Prepared and processed complex datasets with Pandas for analysis
• Benchmarked ML (sklearn) and DL (PyTorch) models for customer review insights
• Analyzed customer interviews using spaCy to support persona development
• Documented code and presented findings to stakeholders
• Completed a Master's degree in Politics & Technology at the Technical University of Munich.
• Focused on data science and machine learning modules with an emphasis on:
  • Advanced programming skills
  • Hands-on, practical applications
• Gained a broad and interdisciplinary foundation, enhancing expertise in:
  • Natural Language Processing (NLP)
  • Machine Learning
• Developed the skills and knowledge needed to address complex, real-world challenges across multiple domains.
Thesis: Enhancing Text Simplification through Baby-Step Curriculum Learning: A Case Study on Privacy Policies Using the T5 Model
• NLP for Text Simplification: Addressed the challenges of simplifying complex texts, specifically privacy policies, using advanced NLP techniques.
• Curriculum Learning Approach: Applied a baby-step curriculum learning approach to improve text simplification, a novel technique that structures training in progressive stages.
• Model and Tools: Leveraged the T5 Transformer model, with evaluations based on BLEU, SARI, and FKGL metrics.
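The baby-step schedule described above can be sketched in a few lines: bucket training examples by a difficulty proxy and feed the model progressively larger, harder cumulative subsets. The sentences below are invented stand-ins for privacy-policy text, and sentence length is used as a simple difficulty proxy; the actual thesis fine-tuned T5 and evaluated with BLEU, SARI, and FKGL.

```python
import math

# Invented examples standing in for privacy-policy sentences.
SENTENCES = [
    "Data is stored.",
    "We may share data with partners.",
    "Personal information collected from users can be disclosed to third parties.",
    "Short note.",
]

def curriculum_stages(examples, n_stages=3):
    """Yield cumulative training subsets, easiest (shortest) first.

    Each stage is a superset of the previous one, so the model revisits
    easy examples while gradually encountering harder ones.
    """
    ordered = sorted(examples, key=len)  # length as a difficulty proxy
    for stage in range(1, n_stages + 1):
        end = math.ceil(stage * len(ordered) / n_stages)
        yield ordered[:end]

stages = list(curriculum_stages(SENTENCES, n_stages=3))
# Stage 1 holds only the shortest sentences; the final stage holds them all.
```

In the thesis setting, each stage corresponds to a fine-tuning phase of the T5 model, with later phases mixing in the longer, more complex policy sentences.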
A curated list of tools and technologies I frequently use, categorized based on their application areas.
Transformers, SentenceTransformers, OpenAI, LangChain, TRL, Ollama, PEFT/QLoRA, DeepSeek-R1, CohereForAI
Groq LLM Inference Engine, CLIP Models, FastText, Gensim, Mixedbread Tokenizers (German Language Processing)
Qdrant, FAISS, ChromaDB, Haystack, Weaviate, JinaAI
PyTorch, PyTorch Lightning, Scikit-learn, MLflow, DVC, ONNX
Mistral OCR, Tesseract, HURIDOCS/pdf-document-layout-analysis, LayoutLMv3, PDFMiner, PyMuPDF, Textract, PaddleOCR
spaCy, NLTK, Doccano
Tableau, Matplotlib, Seaborn, Plotly
Pandas, NumPy
BeautifulSoup, Selenium, Playwright, Scrapy
AWS-EC2, AWS-Lambda, AWS-S3, Docker, Prefect
FastAPI, Celery, Redis, Flower
Chainlit, Chainforge, Streamlit, Gradio
SQL, PostgreSQL, MongoDB
Linux, Agile-Scrum
In addition to my professional work, I am passionate about giving back to the community and helping others on their career paths.
Volunteer teacher in Machine Learning at ReDI School, helping to bridge the gap in technology education.
If you would like a one-on-one session for career guidance, feel free to arrange a free session by sending an email to mustafa.gencc94@gmail.com.