Advanced Cross-Validation for Sequential Data: A Guide to Avoiding Data Leakage
Improve generalization capabilities while keeping data in order
By Kuriko IWAI

Table of Contents
Introduction
What is Cross Validation (CV)

Introduction
Cross-validation is an effective technique to prevent overfitting and provides a more reliable estimate of a model’s performance on unseen data.
When applied to sequential data, however, it carries risks such as data leakage and autocorrelation, which can lead to over-optimistic performance estimates.
In this article, I’ll explore cross-validation methods for time series data, including specialized cross-validation techniques with practical implementation on PyTorch GRU and Scikit-Learn SVR.
What is Cross Validation (CV)
Cross-validation (CV) is a statistical technique to evaluate generalization capabilities of a machine learning model.
CV first partitions the original dataset into training sets and validation (or test) sets.
Then, it repeatedly trains a model on a different training set and validates its performance on a separate validation set, making the evaluation more reliable than a single train-test split.
◼ When to Apply Cross-Validation
By repeating these train-validation cycles, CV can prevent overfitting while mitigating biases of the model.
Best practices in applying CV include:
Model selection or tuning: Raises confidence in selection/tuning results, even with a small-scale cross-validation,
Sequential data analysis where a single random holdout split might lead to data leakage, and
Highly imbalanced target variables in classification tasks, where a single random split might produce extremely imbalanced classes and bias the model. Alongside data augmentation techniques, cross-validation can mitigate this bias.
On the other hand, CV is not necessary when:
The risk of a biased model is limited because a large, non-sequential training set is available,
The model has already shown a stable loss history, achieving competitive generalization capability, and
Computational cost must be kept low, as in a quick initial experiment.
◼ Major Cross Validation Methods
There are many CV methods, each of which has its own unique way to partition original data.
The diagram below compares common methods of K-fold, Stratified K-fold, and LOOCV:

Kernel Labs | Kuriko IWAI | kuriko-iwai.com
Figure A. Comparing data partition in K-fold based CV methods (Created by Kuriko IWAI)
In the diagram:
K-Fold CV (left) evaluates model performance based on the average results of the K equal-sized folds,
Stratified K-Fold CV (middle) is used for a variation of K-fold CV for classification problems, and
Leave-One-Out CV (LOOCV) (right) is another variation of K-fold CV where K is equal to the number of samples in the dataset so that each sample is used as a validation set exactly once.
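These three schemes map directly onto scikit-learn splitter classes. A minimal sketch on a toy dataset (my own illustration; variable names here are not part of the later simulation):

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, LeaveOneOut

X_toy = np.arange(20).reshape(10, 2)   # 10 samples, 2 features
y_toy = np.array([0] * 5 + [1] * 5)    # balanced binary target

# K-Fold: 5 equal-sized folds, each used once as validation
kfold_sizes = [len(va) for _, va in KFold(n_splits=5).split(X_toy)]

# Stratified K-Fold: every fold preserves the 50/50 class ratio
strat_ratios = [float(y_toy[va].mean())
                for _, va in StratifiedKFold(n_splits=5).split(X_toy, y_toy)]

# LOOCV: K equals the sample count, so each sample is validated exactly once
loo_folds = list(LeaveOneOut().split(X_toy))
```

Here every K-fold validation fold holds 2 of the 10 samples, every stratified fold keeps one sample per class, and LOOCV produces 10 single-sample folds.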
These K-fold concepts are applicable to sequential data, but their effectiveness depends on core principles.
Let us explore more in the next section.
Core Principles for Cross-Validating Sequential Data
When running cross-validation on sequential data, the most critical consideration is to avoid data leakage.
◼ Data Leakage
Data leakage is a problem in machine learning where information from outside the training dataset is used to create (train) the model.
This external information allows a model to cheat during training or validation, so it cannot achieve the same performance on truly unseen data.
There are two main types of data leakage:
Direct Leakage:
A simple case where the target variable itself is included in the training data as a feature.
e.g., A model predicting home sale prices is trained on data that includes the actual sale price as a feature.
Indirect Leakage:
A feature that is highly correlated with the target variable, yet unknown at prediction time, is included in the training data.
e.g., A model predicting home sale prices is trained on data that includes the current property tax rate as a feature.
The leakage arises because the future property tax rate is unknown when the model predicts the sale price.
The diagram below shows how the leakage flows:

Figure B. Data leakage on sales price prediction (Created by Kuriko IWAI)
First, the model is trained on input features including the property tax.
The model predicts home sales price for $450,000 using the current property tax rate.
The home is sold for $500,000.
The local government updates the assessed value of the home based on the sale price.
The new property tax is calculated based on this new, higher assessed value.
The model is then trained with this new property tax value as a feature.
The model mistakenly learns that a high tax rate means a high sale price.
This learning is spurious: the correlation between the tax rate and the sale price exists only because the tax rate was calculated after the sale.
So the tax rate shouldn't be included in the training data, and the model should focus on predicting the pre-tax sale price.
◼ Preventing Data Leakage in Sequential Data
In sequential data, data leakage happens when information from the future (validation data) directly or indirectly contaminates the past (training data).
To avoid the leakage, we:
Must maintain temporal order: Ensure the sequence of events is preserved.
Use time-series specific validation: Employ validation methods designed for sequential data.
Prevent autocorrelation: Avoid situations where training and validation data points that are close in time correlate with one another.
Satisfying Condition 1 is mandatory, and depending on the data type, Conditions 2 and 3 also need to be met.
Let us explore more in the next section.
1. Maintaining Temporal Order
When working with sequential data, it’s crucial to preserve its temporal order.
Even when using simple CV methods that aren’t specific to time series, we must sort the data and avoid shuffling it.
◼ Single Train-Test Split (Holdout)
This is the simplest approach where the dataset is divided into two segments: a training set (the first part) and a validation set (the last part).
The model is trained once on the historical data and evaluated on the validation set:

Figure C-1. Data partition image (Created by Kuriko IWAI)
Best When:
- A large dataset is available, and a single evaluation is sufficient to mitigate bias.
Disadvantage:
- May underperform if the validation period is not representative of future data.
◼ Monte Carlo Cross-Validation
Monte Carlo CV randomly selects data for each fold, assuming that
The data points are independent, and
Their statistical properties like mean, mode, or median do not change over time.
When these conditions are met, random sampling does not break the underlying data structure, making the method a valid alternative to traditional time-series cross-validation.

Figure C-2. Data partition image (Created by Kuriko IWAI)
Best When:
- Stationary data without autocorrelation (e.g., repeated coin tosses).
Disadvantage:
- Completely breaks the temporal order when applied to non-stationary time series.
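One way to sketch Monte Carlo CV is with scikit-learn's ShuffleSplit, which draws a fresh random partition per fold (an illustration; any random resampler works):

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

X_demo = np.arange(100).reshape(-1, 1)  # 100 time-ordered samples

# each fold draws a fresh random 80/20 partition; temporal order is ignored,
# which is only acceptable for stationary, non-autocorrelated data
mc = ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
mc_splits = list(mc.split(X_demo))
tr0, va0 = mc_splits[0]
```

Note that the validation indices are not contiguous in time, which is exactly why the method breaks non-stationary series.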
◼ Blocked K-Fold Cross-Validation
A modified version of K-fold in which the data is not shuffled.
Data is divided into contiguous blocks (folds), and each block is used as the validation set in turn.

Figure C-3. Data partition image (Created by Kuriko IWAI)
Best When:
Non-stationary data without strong autocorrelation.
Only a small sample is available, and all samples are needed for training/validation.
Disadvantage:
- Risk of data leakage when the data has strong autocorrelation.
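Blocked folds fall out of scikit-learn's KFold when shuffling is disabled; a quick sketch verifying that each validation fold is a contiguous block:

```python
import numpy as np
from sklearn.model_selection import KFold

X_demo = np.arange(100).reshape(-1, 1)  # 100 time-ordered samples

# shuffle=False keeps each validation fold a contiguous block in time
blocked = KFold(n_splits=5, shuffle=False)
val_blocks = [va for _, va in blocked.split(X_demo)]

# every block is contiguous (indices increase by exactly 1)
all_contiguous = all((np.diff(va) == 1).all() for va in val_blocks)
block_starts = [int(va[0]) for va in val_blocks]
```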
2. Time-Series Specific Validation
The second principle is to use time-series specific validation methods.
These methods are specially designed to maintain temporal order of data by focusing on validating model performance in forward-looking scenarios.
◼ Time Series Cross-Validation (“Growing Window”)
A sequential approach where each fold’s training set expands to include all previous data.
The model is retrained on this ever-growing history and validated on a fixed-size block of subsequent data.

Figure D-1. Data partition image (Created by Kuriko IWAI)
Best When:
Model performance benefits from more data.
You want to simulate a production environment where the model is regularly updated with new information.
Disadvantage:
- Computationally expensive for later folds due to the increasing training set size.
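scikit-learn's TimeSeriesSplit implements the growing window directly; a quick sketch showing the training set expanding fold by fold while the validation block stays fixed:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_demo = np.arange(120).reshape(-1, 1)  # 120 time-ordered samples

# each fold trains on ALL data before its validation block
grow = TimeSeriesSplit(n_splits=5)
grow_train_sizes = [len(tr) for tr, _ in grow.split(X_demo)]
grow_val_sizes = [len(va) for _, va in grow.split(X_demo)]
```

With 120 samples and 5 splits, the training set grows by one validation-block's worth (20 samples) per fold, which is also why later folds cost more to retrain.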
◼ Walk-Forward Validation (“Rolling Window” or “Sliding Window”)
The walk-forward method shares the same concept as the growing window method, but the training and validation window sizes remain constant.
As the validation window moves forward, old training data is discarded.
This is a common method for cross-validating sequential data.

Figure D-2. Data partition image (Created by Kuriko IWAI)
Best When:
- Extremely long sequential data where older temporal data is less relevant for future predictions.
Disadvantage:
Computationally expensive because the model is retrained at each step of the rolling window.
Discarding older data could lose valuable long-term trends or seasonal patterns.
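One way to approximate a sliding window (an assumption on my part; other implementations exist, including the hand-rolled one in the simulation below) is TimeSeriesSplit's max_train_size parameter, which caps the training window so older data rolls out:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_demo = np.arange(120).reshape(-1, 1)  # 120 time-ordered samples

# cap the training window at 20 samples; older data rolls out as the window slides
slide = TimeSeriesSplit(n_splits=5, max_train_size=20)
slide_train_sizes = [len(tr) for tr, _ in slide.split(X_demo)]
slide_train_starts = [int(tr[0]) for tr, _ in slide.split(X_demo)]
```

Unlike the growing window, every fold trains on exactly 20 samples, and the window's starting index advances with each fold.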
3. Preventing Autocorrelation
Autocorrelation is a typical source of data leakage in sequential data.
Some CV methods are designed to prevent autocorrelation by intentionally adding gaps between training sets and validation sets.
◼ Time Series Cross-Validation with a Gap (“Gap“)
This method is a variation of time-series specific CV with a gap inserted between the training and validation sets.
The gap helps ensure independence between the training and validation sets.

Figure E-1. Data partition image (Created by Kuriko IWAI)
Best When:
- You need a strict separation between training and validation data to avoid data leakage.
Disadvantage:
- Gaps leave some training data unused. When the data is small, a model might underfit.
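TimeSeriesSplit also has a built-in gap parameter, which drops the training samples immediately before each validation block; a quick sketch verifying the gap width:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_demo = np.arange(120).reshape(-1, 1)  # 120 time-ordered samples

# gap=5 drops the 5 training samples immediately before each validation block
gapped = TimeSeriesSplit(n_splits=5, gap=5)
gap_widths = [int(va[0] - tr[-1] - 1) for tr, va in gapped.split(X_demo)]
```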
◼ hv-Blocked K-Fold Cross-Validation
hv-Blocked K-fold is an advanced form of blocked validation with a time gap between training and validation blocks.

Figure E-2. Data partition image (Created by Kuriko IWAI)
Best When:
- The data has strong autocorrelation, and a rigorous form of blocked validation is needed to prevent leakage.
Disadvantage:
- The gaps leave some training data unused. When the data is small, a model might underfit.
◼ Purged & Embargo Cross-Validation
The Purged & Embargo method is designed to prevent data leakage.
It “purges” (removes) training data points that are too close to the validation period, then applies an “embargo” by removing training data after the validation period that could be affected by future information.
Best When:
- The time series has high autocorrelation and strict prevention of data leakage is paramount (e.g., finance)
Disadvantage:
- Some training data is left unused because of the gaps, which could lead to underfitting (compared with K-fold based methods, the amount of discarded data can be large).
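scikit-learn has no built-in purged/embargo splitter, so a hand-rolled sketch is needed. The helper below (purged_embargo_splits is my own illustrative function, with purge and embargo widths as assumed parameters) shows the idea on blocked folds:

```python
import numpy as np
from sklearn.model_selection import KFold


def purged_embargo_splits(n_samples, n_folds=5, purge=5, embargo=5):
    """Blocked folds where training indices inside the purge zone before each
    validation block, and inside the embargo zone after it, are removed."""
    splits = []
    for train_idx, val_idx in KFold(n_splits=n_folds, shuffle=False).split(np.arange(n_samples)):
        lo = val_idx[0] - purge      # purge zone: just before validation
        hi = val_idx[-1] + embargo   # embargo zone: just after validation
        keep = (train_idx < lo) | (train_idx > hi)
        splits.append((train_idx[keep], val_idx))
    return splits


pe_splits = purged_embargo_splits(100)
```

For the first fold (validation at the start), only the embargo applies; for middle folds, both sides of the validation block lose samples, so more training data is discarded than in plain blocked K-fold.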
That’s all for cross-validation methods for sequential data.
The choice of method is heavily influenced by the data type.
In the next section, I’ll explore how model sensitivities impact performance after cross-validation.
Simulation
In this section, I’ll explore how each cross-validation method works on:
A GRU (Gated Recurrent Units) network built on PyTorch and
A simpler SVR (Support Vector Regression) on Scikit-Learn.
All code is available in my Github Repo: GRU / SVR
◼ Creating Original Dataset
First, I loaded and engineered CSV data from the UC Irvine Machine Learning repository:

Figure F. Screenshot of the variable table (source)
In this project, I’ll predict traffic_volume using time-series features such as rain_1h and date_time:
import pandas as pd

# load data as df
file_path = 'data/Metro_Interstate_Traffic_Volume.csv'
df = pd.read_csv(file_path, sep=',')

# add datetime-related features (critical step for the gru)
df['date_time'] = pd.to_datetime(df['date_time'])
df['year'] = df['date_time'].dt.year
df['month'] = df['date_time'].dt.month
df['hour'] = df['date_time'].dt.hour
df['day_of_week'] = df['date_time'].dt.dayofweek  # categorical (0 to 6)
df['is_weekend'] = df['day_of_week'].isin([5, 6])
df['is_holiday'] = df['holiday'].notna()

# drop unnecessary columns
df = df.drop(columns=['holiday', 'weather_description', 'date_time'])

# create input and target vars
target_col = 'traffic_volume'
y = df[target_col]
X = df.drop(target_col, axis=1)

# split the data into two groups: train and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
To preserve the temporal order, the original data must not be shuffled when creating the training and test sets.
X_test is for assessing generalization capabilities. It must not be used during training / validation phase to avoid data leakage.
Lastly, transform the input features:
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from category_encoders import BinaryEncoder

# classify numerical and categorical features
cat_cols, num_cols = [], []
for col in df.columns.to_list():
    if col == target_col:
        continue
    if df[col].dtype == 'object' or df[col].dtype == 'bool':
        cat_cols.append(col)
    else:
        num_cols.append(col)

# define column transformer
num_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])
cat_transformer = Pipeline(steps=[('encoder', BinaryEncoder(cols=cat_cols))])
preprocessor = ColumnTransformer(
    transformers=[
        ('num', num_transformer, num_cols),
        ('cat', cat_transformer, cat_cols)
    ],
    remainder='passthrough'
)

# fit on the training set only, then transform both sets (prevents leakage)
X_train = preprocessor.fit_transform(X_train)
X_test = preprocessor.transform(X_test)
The final training set has 38,563 samples with 16 input features.
◼ Defining the GRU Model
Next, I defined the GRU class on the PyTorch library:
import torch.nn as nn

# define a simple gru model (many-to-one architecture)
class GRU(nn.Module):
    def __init__(self, input_size=X_train.shape[1], hidden_size=64, output_size=1):
        super(GRU, self).__init__()
        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        h_gru, _ = self.gru(x)              # hidden states: (batch, seq_len, hidden)
        o_final = self.fc(h_gru[:, -1, :])  # use only the last time step
        return o_final
The GRU class uses a simple many-to-one architecture, where the hidden states across time steps are collapsed into one final output.
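To see the many-to-one behavior concretely, here is a standalone shape check (GRUDemo mirrors the class above but takes input_size explicitly instead of reading it from X_train):

```python
import torch
import torch.nn as nn


class GRUDemo(nn.Module):
    # same many-to-one architecture as the GRU class above
    def __init__(self, input_size, hidden_size=64, output_size=1):
        super().__init__()
        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        h_gru, _ = self.gru(x)          # (batch, seq_len, hidden)
        return self.fc(h_gru[:, -1, :]) # last time step only -> (batch, output_size)


model_demo = GRUDemo(input_size=16)
x_demo = torch.randn(32, 10, 16)  # 32 sequences, 10 time steps, 16 features
out_demo = model_demo(x_demo)
```

However long the input sequence, the output collapses to one value per sequence.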
◼ Running Cross-Validation
Next, I defined the train_and_validate function, where X_train is split into five folds based on the selected cross-validation method, and the model is trained and validated on these folds.
Following best practice in cross-validation, the function re-initializes the model and the optimizer for every fold.
For comparison, I used the same params across all methods:
num_epochs: The number of training epochs (set as 300)
lr: The learning rate of the optimizer (set as 0.001)
num_folds: The number of folds created by the cross-validation method (set as 5), and
test_size: The size of test (validation) data from the training data (set as 0.2 (20%)).
import numpy as np
import torch
from sklearn.model_selection import KFold, TimeSeriesSplit, train_test_split
from tqdm import tqdm

# test cross val methods
def train_and_validate(
        validation_method,
        num_epochs,
        lr,
        X_train=X_train, y_train=y_train,  # split X_train into folds based on the given cross val method
        num_folds=5,
        test_size=0.2
    ) -> dict:

    # recording loss history
    fold_train_losses = []
    fold_val_losses = []
    model = None

    # define splits based on the chosen validation method
    match validation_method:
        case "Holdout":
            train_size = int((1 - test_size) * len(X_train))
            train_indices = np.arange(train_size)
            val_indices = np.arange(train_size, len(X_train))
            splits = [(train_indices, val_indices)]

        case "Monte Carlo":
            splits = []
            for _ in range(num_folds):
                train_indices, val_indices = train_test_split(
                    np.arange(len(X_train)),
                    test_size=test_size,
                    shuffle=True
                )
                splits.append((train_indices, val_indices))

        case "K-Fold":
            kf = KFold(n_splits=num_folds, shuffle=True, random_state=42)
            splits = list(kf.split(X_train))

        case "Blocked K-Fold":
            kf_blocked = KFold(n_splits=num_folds, shuffle=False)
            splits = list(kf_blocked.split(X_train))

        case "Growing Window":
            tss = TimeSeriesSplit(n_splits=num_folds)
            splits = list(tss.split(X_train))

        case "Sliding Window":
            splits = []
            window_size = int(len(X_train) / (num_folds + 1))
            for i in range(num_folds):
                train_start = i * window_size
                train_end = train_start + window_size
                val_start = train_end
                val_end = min(val_start + window_size, len(X_train))
                splits.append(
                    (np.arange(train_start, train_end), np.arange(val_start, val_end))
                )

        case "Gap":
            splits = []
            tss_gap = TimeSeriesSplit(n_splits=num_folds)
            for train_idx, val_idx in tss_gap.split(X_train):
                gap_size = int(0.1 * len(val_idx))
                if gap_size > 0:  # drop the tail of the training set to create the gap
                    train_idx = train_idx[:-gap_size]
                splits.append((train_idx, val_idx))

        case "hv-Blocked K-Fold":
            splits = []
            kf_blocked_gap = KFold(n_splits=num_folds, shuffle=False)
            for train_idx, val_idx in kf_blocked_gap.split(X_train):
                gap_size = int(0.1 * len(val_idx))
                # purge training indices within the gap on BOTH sides of the validation block
                lower = val_idx[0] - gap_size
                upper = val_idx[-1] + gap_size
                train_idx = train_idx[(train_idx < lower) | (train_idx > upper)]
                splits.append((train_idx, val_idx))

        case _:
            raise ValueError(f"Unknown validation method: {validation_method}")

    # training and validation loop
    for fold, (train_idx, val_idx) in enumerate(tqdm(splits, desc=f"Training with {validation_method}")):
        # initialize a new model, optimizer, and loss function for each fold
        model = GRU(hidden_size=64, output_size=1)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = nn.MSELoss()

        # separate features X and target y for training and validation, using the fold's indices
        X_train_cv, y_train_cv = X_train[train_idx, :], y_train.iloc[train_idx]
        X_val_cv, y_val_cv = X_train[val_idx, :], y_train.iloc[val_idx]

        # convert numpy arrays to pytorch tensors
        X_train_cv = torch.from_numpy(X_train_cv).float()
        y_train_cv = torch.from_numpy(y_train_cv.values).float()
        X_val_cv = torch.from_numpy(X_val_cv).float()
        y_val_cv = torch.from_numpy(y_val_cv.values).float()

        # create the tensor dataset and data loader
        train_dataset = torch.utils.data.TensorDataset(X_train_cv, y_train_cv)
        train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=False)

        val_dataset = torch.utils.data.TensorDataset(X_val_cv, y_val_cv)
        val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=32, shuffle=False)

        # start validation
        fold_train_loss_history = []
        fold_val_loss_history = []
        for _ in range(num_epochs):
            # training loop
            model.train()
            train_loss = 0
            for X_batch, y_batch in train_loader:
                X_batch = X_batch.unsqueeze(1)  # add a sequence-length dim: (batch, 1, features)
                y_batch = y_batch.unsqueeze(1)

                outputs = model(X_batch)
                loss = criterion(outputs, y_batch)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                train_loss += loss.item()

            avg_train_loss = train_loss / len(train_loader)
            fold_train_loss_history.append(avg_train_loss)

            # validation
            model.eval()
            val_loss = 0
            with torch.inference_mode():
                for X_batch, y_batch in val_loader:
                    X_batch = X_batch.unsqueeze(1)
                    y_batch = y_batch.unsqueeze(1)
                    outputs = model(X_batch)
                    val_loss += criterion(outputs, y_batch).item()

            avg_val_loss = val_loss / len(val_loader) if len(val_loader) > 0 else 0
            fold_val_loss_history.append(avg_val_loss)

        fold_train_losses.append(fold_train_loss_history)
        fold_val_losses.append(fold_val_loss_history)

    # after completing the cv loop, retrain the model on the entire X_train / y_train set
    if model is not None:
        # convert numpy arrays to pytorch tensors
        X_train = torch.from_numpy(X_train).float()
        y_train = torch.from_numpy(y_train.values).float()

        train_dataset = torch.utils.data.TensorDataset(X_train, y_train)
        train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=False)

        for _ in range(num_epochs):
            model.train()
            for X_batch, y_batch in train_loader:
                optimizer.zero_grad()
                X_batch = X_batch.unsqueeze(1)
                y_batch = y_batch.unsqueeze(1)
                outputs = model(X_batch)
                loss = criterion(outputs, y_batch)
                loss.backward()
                optimizer.step()

    return {
        'model': model,
        "fold_train_losses": fold_train_losses,
        "fold_val_losses": fold_val_losses,
        "average_train_loss": np.mean(fold_train_losses),
        "average_val_loss": np.mean(fold_val_losses)
    }
◼ Performing Inference
After training, the model performed inference on new, unseen data (X_test).
Losses are recorded to assess the model’s generalization capability.
# convert X_test (numpy) to torch tensors
X_test_float = torch.from_numpy(X_test).float()
y_test_float = torch.from_numpy(y_test.values).float()

# create test loader
test_dataset = torch.utils.data.TensorDataset(X_test_float, y_test_float)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)

# perform inference
model.eval()
test_loss = 0
criterion = nn.MSELoss()
with torch.inference_mode():
    for X_batch, y_batch in test_loader:
        X_batch = X_batch.unsqueeze(1)
        y_batch = y_batch.unsqueeze(1)
        outputs = model(X_batch)
        test_loss += criterion(outputs, y_batch).item()

# compute average loss (MSE)
ave_test_loss = test_loss / len(test_loader)
◼ Results
▫ 1. GRU
Blocked K-fold achieved the best generalization loss (MSE) across all methods.
Each graph below plots CV losses (for all folds), the average loss (blue line), and the generalization loss over X_test (red vertical line) by CV method.
The colored area indicates how well the model generalizes its learning (smaller is better).
Overfitting appears when the average CV loss (blue line) drops below the generalization loss (red line).

Figure G. Comparison of loss histories by cross-validation methods (GRU) (Created by Kuriko IWAI)
Blocked K-fold achieved the best generalization result, followed by K-fold CV with a Gap, but both start to overfit at around the 150th epoch. Implementing early stopping would further refine the results.
Growing Window achieved well balanced results, generalizing the learning well to avoid overfitting while minimizing the generalization error.
Holdout and Monte Carlo methods show the most severe overfitting to the training setup, resulting in extremely high test errors (large pink area). For this specific dataset and the model, these two are unsuitable CV methods.
▫ 2. SVR
Sliding Window achieved the best performance across all methods.
Each graph below plots the CV losses across five folds, showing the average CV loss (blue line) and the generalization loss (red line) for each CV method.
Similar to the GRU, a discrepancy between the blue and red lines indicates overfitting.

Figure H. Comparison of loss histories by cross-validation methods (SVR) (Created by Kuriko IWAI)
Sliding Window achieved the best accuracy with the lowest average MSE (0.6149) and high stability. This implies that with SVR, using a fixed-size training window is the most effective way to tackle local autocorrelation in this dataset.
Growing Window and hv-Blocked K-Fold showed the worst-case errors in some folds, indicating that the model can be prone to severe overfitting to past data under these schemes.
Other standard methods like Holdout, Monte Carlo, and K-Fold showed good generalization, but their losses remained high, indicating that they couldn’t learn as much as with the Sliding Window.
Wrapping Up
Cross-validation is a powerful technique for evaluating time series models, as it helps them generalize effectively and avoid overfitting by mirroring the structure of the original data.
In our simulation, we learned that choosing the right CV technique for the model and data type matters when we assess generalization capabilities of the model.
The goal is to develop a model that performs best in production, and cross-validation is key for achieving this by simulating real-world data.
Continue Your Learning
If you enjoyed this blog, these related entries will complete the picture:
Deep Dive into Recurrent Neural Networks (RNN): Mechanics, Math, and Limitations
Mastering Long Short-Term Memory (LSTM) Networks
Understanding GRU Architecture and the Power of Path Signatures
A Deep Dive into Bidirectional RNNs, LSTMs, and GRUs
Deep Recurrent Neural Networks: Engineering Depth for Complex Sequences
Data Augmentation Techniques for Tabular Data: From Noise Injection to SMOTE
A Guide to Synthetic Data Generation: Statistical and Probabilistic Approaches
Maximum A Posteriori (MAP) Estimation: Balancing Data and Expert Knowledge
Beyond Simple Imputation: Understanding MICE for Robust Data Science
Maximizing Predictive Power: Best Practices in Feature Engineering for Tabular Data
Related Books for Further Understanding
These books cover a wide range of theories and practices, from fundamentals to PhD level.

Linear Algebra Done Right

Foundations of Machine Learning, second edition (Adaptive Computation and Machine Learning series)

Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications

Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps

Practical Time Series Analysis: Prediction with Statistics and Machine Learning
Share What You Learned
Kuriko IWAI, "Advanced Cross-Validation for Sequential Data: A Guide to Avoiding Data Leakage" in Kernel Labs
https://kuriko-iwai.com/advanced-cross-validation-techniques
Looking for Solutions?
- Deploying ML Systems 👉 Book a briefing session
- Hiring an ML Engineer 👉 Drop an email
- Learn by Doing 👉 Enroll in the AI Engineering Masterclass
Written by Kuriko IWAI. All images, unless otherwise noted, are by the author. All experimentations on this blog utilize synthetic or licensed data.









