src.data.datasets module
- src.data.datasets.create_linear_data_loader(num_workers, batch_size, worker_id, n_samples=100, n_features=110, noise=0.0, val_size=0.01, test_size=0.2, random_state=42)
Return a DataLoader over the shard of the linear training set assigned to worker_id, together with the input dimension.
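The sharding step can be pictured with plain index arithmetic. The following is a minimal sketch, not the module's implementation: it assumes that worker_id is zero-based, that shards are contiguous and near-equal in size, and that batches are taken in order. The helpers shard_indices and iter_batches are hypothetical names, the count of 79 training rows is only illustrative, and the DataLoader itself is left out.

```python
import numpy as np

def shard_indices(n_train, num_workers, worker_id):
    """Hypothetical helper: indices of one worker's contiguous shard."""
    # Assumption: worker_id is zero-based.
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    # array_split tolerates n_train not being divisible by num_workers.
    return np.array_split(np.arange(n_train), num_workers)[worker_id]

def iter_batches(indices, batch_size):
    """Hypothetical helper: yield consecutive index batches; the last may be short."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]

# Illustrative numbers: 79 training rows shared by 4 workers.
shards = [shard_indices(79, num_workers=4, worker_id=w) for w in range(4)]
batches = list(iter_batches(shards[0], batch_size=8))
```

Under these assumptions every training row belongs to exactly one worker, and shard sizes differ by at most one.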
- src.data.datasets.create_linear_dataset(n_samples=100, n_features=110, noise=0.0, random_state=None)
Overparameterized linear regression dataset (by default n_features > n_samples):
X is sampled from U(-3, 3)
y = X @ w_true + noise
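The recipe above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the module's code: the docstring does not say how w_true is drawn or how the noise argument is used, so the sketch assumes standard-normal weights and treats noise as the standard deviation of additive Gaussian noise. The name sketch_linear_dataset and the choice to return w_true are hypothetical.

```python
import numpy as np

def sketch_linear_dataset(n_samples=100, n_features=110, noise=0.0, random_state=None):
    """Hypothetical re-implementation of the documented recipe."""
    rng = np.random.default_rng(random_state)
    # Inputs drawn uniformly from [-3, 3), as the docstring states.
    X = rng.uniform(-3.0, 3.0, size=(n_samples, n_features))
    # Assumption: ground-truth weights are standard normal.
    w_true = rng.standard_normal(n_features)
    # Assumption: `noise` is the standard deviation of additive Gaussian noise.
    y = X @ w_true + noise * rng.standard_normal(n_samples)
    return X, y, w_true

X, y, w_true = sketch_linear_dataset(random_state=0)
```

With the defaults there are more features (110) than samples (100), so infinitely many weight vectors fit the noise-free targets exactly.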
- src.data.datasets.create_poly_varied_data_loader(num_workers, batch_size, worker_id, n_samples=100, n_features=110, max_degree=4, noise=0.0, val_size=0.2, test_size=0.2, random_state=42)
Return a DataLoader over the shard of the poly-varied training set assigned to worker_id, together with the input dimension (equal to n_features).
- src.data.datasets.create_poly_varied_dataset(n_samples=100, n_features=110, max_degree=4, noise=0.0, random_state=None)
Overparameterized nonlinear regression dataset (by default n_features > n_samples):
X is sampled from U(-3, 3)
Each feature i is raised to its own degree_i ∈ [1, max_degree]
y = sum_i w_true[i] * (X[:, i] ** degree_i) + noise
Returns: X_raw, y, degrees
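A minimal NumPy sketch of this construction is given below. It is not the module's code: it assumes the degrees are integers drawn uniformly from 1 to max_degree inclusive, that w_true is standard normal, and that noise is the standard deviation of additive Gaussian noise. The name sketch_poly_varied_dataset is hypothetical.

```python
import numpy as np

def sketch_poly_varied_dataset(n_samples=100, n_features=110, max_degree=4,
                               noise=0.0, random_state=None):
    """Hypothetical re-implementation of the documented recipe."""
    rng = np.random.default_rng(random_state)
    X_raw = rng.uniform(-3.0, 3.0, size=(n_samples, n_features))
    # Assumption: one integer degree per feature, uniform on [1, max_degree].
    degrees = rng.integers(1, max_degree + 1, size=n_features)
    # Assumption: ground-truth weights are standard normal.
    w_true = rng.standard_normal(n_features)
    # Broadcasting raises column i of X_raw to the power degrees[i].
    X_poly = X_raw ** degrees
    # Assumption: `noise` is the standard deviation of additive Gaussian noise.
    y = X_poly @ w_true + noise * rng.standard_normal(n_samples)
    # Same return order as the docstring: raw inputs, targets, per-feature degrees.
    return X_raw, y, degrees

X_raw, y, degrees = sketch_poly_varied_dataset(random_state=0)
X_again, y_again, degrees_again = sketch_poly_varied_dataset(random_state=0)
```

Returning X_raw rather than the powered features means a model trained on this data has to learn the nonlinearity itself.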
- src.data.datasets.load_linear_data(n_samples=100, n_features=110, noise=0.0, val_size=0.01, test_size=0.2, random_state=None)
Generate an overparameterized linear dataset and split it into training, validation, and test sets. Returns: X_train, y_train, X_val, y_val, X_test, y_test
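One way such a three-way split could work is sketched below. The documentation does not say whether val_size and test_size are fractions of the full dataset or of what remains after the first split; this sketch assumes fractions of the full dataset and a single random permutation. The name sketch_split is hypothetical.

```python
import numpy as np

def sketch_split(X, y, val_size=0.01, test_size=0.2, random_state=None):
    """Hypothetical three-way split; not the module's actual code."""
    rng = np.random.default_rng(random_state)
    n = len(X)
    perm = rng.permutation(n)
    # Assumption: both sizes are fractions of the full dataset.
    n_test = int(round(test_size * n))
    n_val = int(round(val_size * n))
    test_idx = perm[:n_test]
    val_idx = perm[n_test:n_test + n_val]
    train_idx = perm[n_test + n_val:]
    # Same return order as the docstring.
    return (X[train_idx], y[train_idx],
            X[val_idx], y[val_idx],
            X[test_idx], y[test_idx])

# Toy data so each row can be traced through the split.
X = np.arange(200.0).reshape(100, 2)
y = np.arange(100.0)
X_train, y_train, X_val, y_val, X_test, y_test = sketch_split(X, y, random_state=0)
```

Under this assumption the defaults leave 79 training rows, 1 validation row, and 20 test rows out of 100 samples.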