Kuriko IWAI - Architect of Kernel Labs

Welcome to Kernel Labs by Kuriko IWAI.

A comprehensive resource on Machine Learning frameworks and MLOps.

This website hosts a comprehensive framework covering the entire machine learning lifecycle, from algorithmic deep dives to robust MLOps practice. Explore:

  • Masterclass: Build eight AI systems to master LLM techniques.
  • ML Research & Blogs
    • Theory: Technical blogs on LLMs, Generative AI, Deep Learning, and traditional ML.
    • Learning Scenario: Specialized research into Unsupervised, Reinforcement, Meta, and Online Learning.
    • MLOps: Best practices on CI/CD integration, ML Lineage, and system architectures.
    • LLM & NLP: Advanced LLM engineering techniques and neural architecture deep dives.
  • Labs: Experimentations on ML systems with walk-through tutorials and code snippets.
  • Solution: ML system and data pipeline engineering, AI audit services.


Get 5-Point AI Security Checklist:

What's New

Aligning LLMs with Direct Preference Optimization (DPO)

Learn the fundamentals and follow a technical walkthrough with Unsloth and Llama.

Machine Learning · Deep Learning · Data Science · Python · LLM

Reinforcement Learning from Human Feedback (RLHF) has long been the gold standard for LLM alignment, but its complexity—requiring separate reward models and unstable PPO loops—is a significant barrier.

Direct Preference Optimization (DPO) simplifies this by treating alignment as a direct classification problem.

This article breaks down the mathematical foundation of DPO, provides a hands-on implementation guide using the Unsloth framework, and explores the strategic trade-offs between DPO and traditional RLHF.
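The "direct classification" view can be sketched in a few lines of plain Python. The function below computes the standard DPO objective, -log σ(β · margin), for a single preference pair; the sequence-level log-probabilities are illustrative numbers, not outputs of a real model:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * margin).

    The margin is how much more the policy prefers the chosen response
    over the rejected one, relative to the frozen reference model.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Positive margin (policy agrees with the human preference): small loss.
loss_good = dpo_loss(-5.0, -9.0, -6.0, -8.0)   # margin = +2
# Negative margin (policy prefers the rejected response): larger loss.
loss_bad = dpo_loss(-9.0, -5.0, -8.0, -6.0)    # margin = -2
```

Because the loss depends only on log-probabilities from the policy and a frozen reference, no separate reward model or PPO loop is needed.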

Continue Reading

A Technical Guide to QLoRA and Memory-Efficient Fine-Tuning

Learn how QLoRA enables 70B-model tuning on consumer GPUs by leveraging NF4, Double Quantization, and Paged Optimizers.

Machine Learning · Deep Learning · Data Science · Python · LLM

As Large Language Models scale, the hardware requirements for fine-tuning have become prohibitive for the average developer.

Quantized Low-Rank Adaptation (QLoRA) changes the game by shrinking VRAM requirements by over 95%.

This deep dive explores the core mechanics—NormalFloat 4 (NF4), Double Quantization, and Paged Optimizers—that allow a 70B parameter model to be tuned on a single 48GB GPU without sacrificing 16-bit performance levels.
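The VRAM claim can be sanity-checked with back-of-the-envelope arithmetic, assuming full fine-tuning in mixed precision with Adam (roughly 16 bytes per parameter) versus a 4-bit NF4 base model; LoRA adapter weights and paged optimizer states are small enough to ignore in this sketch:

```python
params = 70e9  # 70B parameters

# Full fine-tuning, mixed precision with Adam:
# fp16 weights (2 B) + fp16 grads (2 B) + fp32 master weights (4 B)
# + fp32 Adam moments (8 B) = ~16 bytes per parameter.
full_ft_gb = params * 16 / 1e9

# QLoRA: NF4 base weights at 4 bits (0.5 B) per parameter;
# adapters and quantization constants add only a few percent on top.
qlora_gb = params * 0.5 / 1e9

savings = 1 - qlora_gb / full_ft_gb
print(f"{full_ft_gb:.0f} GB vs {qlora_gb:.0f} GB ({savings:.0%} smaller)")
```

At ~35 GB for the frozen base weights, the model fits on a single 48GB GPU with headroom for adapters and activations, which is where the "over 95%" figure comes from.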

Continue Reading

Is 4-Bit All You Need? The Math Behind Modern LLM Compression

The Engineer's Guide to LLM Quantization: learn how quantization makes 70B models run on a local GPU.

Machine Learning · Deep Learning · LLM

A technical exploration of numerical precision in Large Language Models.

This article deconstructs standard FP32 formats and evaluates modern quantization schemes—including Integer, NormalFloat, and Microscaling—to help developers balance computational efficiency with model fidelity.
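As a toy illustration of the integer family, here is symmetric absmax quantization to int8. This is a minimal sketch: production schemes quantize per block or per channel and handle outliers separately.

```python
def quantize_absmax(xs, bits=8):
    """Symmetric absmax quantization: the largest |x| maps to qmax."""
    qmax = 2 ** (bits - 1) - 1            # 127 for int8
    scale = max(abs(x) for x in xs) / qmax
    q = [round(x / scale) for x in xs]    # integer codes in [-qmax, qmax]
    deq = [v * scale for v in q]          # dequantized approximation
    return q, deq, scale

q, deq, scale = quantize_absmax([0.12, -0.5, 1.0, 0.03])
```

The worst-case rounding error is half the scale step, which is why a single outlier weight inflates the error for every other value in the block — the motivation behind block-wise scales and formats like NF4.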

Continue Reading


AI Engineering Masterclass

Module 1

The LLM Backbone: Building a RAG-Based GPT from Scratch

Explore the core mechanism and hands-on implementation of RAG, tokenizer, and inference logic.

PyTorch · Tensor · HuggingFace Transformers · Decoder-only LLM · Causal Inference · WARC · Streamlit · uv

You'll Build: Website Summarizer with LLM Configuration Playground

Production Goals:

  • Implement a custom BPE tokenizer, logits adjustment, and the major decoding methods.

LLM Techniques to Master:

  • Perform Common Crawl ingestion and heuristic filtering.
  • Build a BPE tokenizer to map text to tokens.
  • Adjust logits via logits bias, temperature, and repetition penalty.
  • Interactively apply stochastic/deterministic decoding methods.
  • Deploy the inference via an API as a microservice.
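The logits-adjustment step is easy to prototype without a framework. Below is a hedged sketch of logit bias, repetition penalty, and temperature applied to a raw logit vector; the penalty follows the common divide-positive/multiply-negative convention, and function and parameter names are illustrative, not the masterclass's API:

```python
import math

def adjust_logits(logits, generated_ids, temperature=0.8,
                  repetition_penalty=1.2, logit_bias=None):
    """Apply logit bias, repetition penalty, then temperature scaling."""
    out = list(logits)
    for tid, bias in (logit_bias or {}).items():
        out[tid] += bias                            # additive per-token bias
    for tid in set(generated_ids):                  # penalize seen tokens
        out[tid] = (out[tid] / repetition_penalty if out[tid] > 0
                    else out[tid] * repetition_penalty)
    return [x / temperature for x in out]           # temperature < 1 sharpens

def softmax(xs):
    m = max(xs)                                     # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Token 0 was already generated, so its probability is pushed down.
probs = softmax(adjust_logits([2.0, 1.0, 0.5], generated_ids=[0]))
```

Stochastic decoding then samples from `probs`, while deterministic (greedy) decoding takes its argmax.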

Agentic AI framework

MIT license · PyPI · Python

versionhq is a Python framework for autonomous agent networks that handle complex task automation without human interaction.

Key Features

versionhq is a Python framework designed for automating complex, multi-step tasks using autonomous agent networks.

Users can either configure their agents and network manually or allow the system to automatically manage the process based on provided task goals.

Agent Network

When multiple agents handle a task, they adapt to a specific network formation based on the task and network complexity.

You can specify a desired formation or allow the leader to determine it autonomously (default).

Formations

  • Solo Agent (solo)
    • Usage: A single agent with tools, knowledge, and memory. When self-learning mode is on, it turns into the Random formation.
    • Use case: An email agent drafts a promo message for the given audience.
  • Supervising (supervisor)
    • Usage: A leader agent gives directions while sharing its knowledge and memory. Subordinates can be solo agents or networks.
    • Use case: The leader agent strategizes an outbound campaign plan and assigns components such as media mix or message creation to subordinate agents.
  • Squad (squad)
    • Usage: Members share tasks, knowledge, and memory across the network.
    • Use case: An email agent and a social media agent share product knowledge and deploy a multi-channel outbound campaign.
  • Random (random)
    • Usage: A single agent handles tasks, asking for help from other agents without sharing its memory or knowledge.
    • Use case: 1. An email agent drafts a promo message for the given audience, asking email agents that oversee other clusters for insights on tone. 2. An agent calls an external agent to deploy the campaign.

Kuriko IWAI

Kernel Labs Pte. Ltd.

Looking for Solutions?

Related Books

These books cover a wide range of ML theory and practice, from fundamentals to PhD level.

Linear Algebra Done Right

Foundations of Machine Learning, second edition (Adaptive Computation and Machine Learning series)

Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications

Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps
