src.data package
Submodules
src.data.base module
Shared abstract class for dataset builders.
src.data.datasets module
- src.data.datasets.create_linear_data_loader(num_workers, batch_size, worker_id, n_samples=100, n_features=110, noise=0.0, val_size=0.01, test_size=0.2, random_state=42)
Returns a DataLoader for a shard of the linear training set, along with the input dimension.
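How the training set is divided among workers is not stated in this reference. As a minimal sketch, assuming contiguous, near-equal shards selected by worker_id, the row selection might look like the following. The helper name shard_indices is hypothetical and not part of src.data.datasets.

```python
import numpy as np

def shard_indices(n_train: int, num_workers: int, worker_id: int) -> np.ndarray:
    """Hypothetical helper: row indices of the shard owned by `worker_id`."""
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    # np.array_split tolerates n_train not being divisible by num_workers;
    # the first shards simply receive one extra row each.
    return np.array_split(np.arange(n_train), num_workers)[worker_id]

# Assumed scenario: 10 training rows shared by 3 workers.
shards = [shard_indices(10, 3, w) for w in range(3)]
```

Under this assumption the shards are disjoint and together cover every training row exactly once.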
- src.data.datasets.create_linear_dataset(n_samples=100, n_features=110, noise=0.0, random_state=None)
- Overparameterized linear regression dataset:
X sampled U(-3, 3)
y = X @ w_true + noise
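The recipe above can be sketched with NumPy. This is an illustration, not the module's code: the function name linear_dataset_sketch is hypothetical, w_true is assumed to be drawn from a standard normal, and noise is assumed to be the standard deviation of additive Gaussian noise. With the defaults, n_features (110) exceeds n_samples (100), which is what makes the problem overparameterized.

```python
import numpy as np

def linear_dataset_sketch(n_samples=100, n_features=110, noise=0.0, random_state=None):
    """Hypothetical stand-in for create_linear_dataset."""
    rng = np.random.default_rng(random_state)
    # X sampled U(-3, 3), as stated in the docstring.
    X = rng.uniform(-3.0, 3.0, size=(n_samples, n_features))
    # Assumption: the distribution of w_true is not documented.
    w_true = rng.standard_normal(n_features)
    # Assumption: `noise` scales zero-mean Gaussian noise.
    y = X @ w_true + noise * rng.standard_normal(n_samples)
    return X, y, w_true

X, y, w_true = linear_dataset_sketch(random_state=0)
```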
- src.data.datasets.create_poly_varied_data_loader(num_workers, batch_size, worker_id, n_samples=100, n_features=110, max_degree=4, noise=0.0, val_size=0.2, test_size=0.2, random_state=42)
Returns a DataLoader for a shard of the poly-varied training set, along with the input dimension (equal to n_features).
- src.data.datasets.create_poly_varied_dataset(n_samples=100, n_features=110, max_degree=4, noise=0.0, random_state=None)
- Overparameterized nonlinear regression dataset:
X sampled U(-3, 3)
Each feature i raised to its own degree_i ∈ [1, max_degree]
y = sum_i w_true[i] * (X[:, i] ** degree_i) + noise
Return: X_raw, y, degrees
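A minimal NumPy sketch of this construction follows. The function name poly_varied_sketch is hypothetical, and three details are assumptions not confirmed by this reference: degrees are drawn uniformly from the integers 1 through max_degree, w_true is standard normal, and noise is the standard deviation of additive Gaussian noise.

```python
import numpy as np

def poly_varied_sketch(n_samples=100, n_features=110, max_degree=4,
                       noise=0.0, random_state=None):
    """Hypothetical stand-in for create_poly_varied_dataset."""
    rng = np.random.default_rng(random_state)
    # X sampled U(-3, 3), as stated in the docstring.
    X_raw = rng.uniform(-3.0, 3.0, size=(n_samples, n_features))
    # Assumption: one integer degree per feature, uniform on [1, max_degree].
    degrees = rng.integers(1, max_degree + 1, size=n_features)
    # Assumption: the distribution of w_true is not documented.
    w_true = rng.standard_normal(n_features)
    # Broadcasting raises column i to the power degrees[i].
    X_poly = X_raw ** degrees
    y = X_poly @ w_true + noise * rng.standard_normal(n_samples)
    return X_raw, y, degrees

X_raw, y, degrees = poly_varied_sketch(random_state=0)
```

Returning X_raw rather than the transformed features matches the documented return value: a model sees only the raw inputs and has to recover the nonlinearity itself.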
- src.data.datasets.load_linear_data(n_samples=100, n_features=110, noise=0.0, val_size=0.01, test_size=0.2, random_state=None)
Generates an overparameterized linear dataset and splits it into training, validation, and test sets. Returns: X_train, y_train, X_val, y_val, X_test, y_test
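The splitting step could be sketched as below. Whether val_size is a fraction of the full dataset or of what remains after the test split is not documented; this sketch assumes both sizes are fractions of the full dataset. The name split_sketch is hypothetical.

```python
import numpy as np

def split_sketch(X, y, val_size=0.01, test_size=0.2, random_state=None):
    """Hypothetical three-way split returning the documented tuple order."""
    rng = np.random.default_rng(random_state)
    n = len(X)
    perm = rng.permutation(n)  # shuffle row indices once
    # Assumption: both sizes are fractions of the full dataset.
    n_test = int(round(test_size * n))
    n_val = int(round(val_size * n))
    test_idx = perm[:n_test]
    val_idx = perm[n_test:n_test + n_val]
    train_idx = perm[n_test + n_val:]
    return (X[train_idx], y[train_idx],
            X[val_idx], y[val_idx],
            X[test_idx], y[test_idx])

# Toy data so the row counts are easy to check.
X = np.arange(200.0).reshape(100, 2)
y = np.arange(100.0)
X_train, y_train, X_val, y_val, X_test, y_test = split_sketch(X, y, random_state=0)
```

With the default sizes and 100 samples, this assumption leaves 79 training rows, 1 validation row, and 20 test rows.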
src.data.full module
An ultra-simple builder that always returns the complete dataset.
Use this when you want every worker to look at identical data, exactly like the FullDataLoaderBuilder from your notebook.
src.data.linear module
Synthetic over-parameterised linear regression dataset builder.
- class src.data.linear.LinearRegressionBuilder(num_workers: int, n_samples: int = 100, n_features: int = 110, noise: float = 0.0, shard: bool = False, seed: int | None = None)
Bases: AbstractDataBuilder
src.data.poly module
Non-linear “poly-varied” synthetic regression builder.
- class src.data.poly.PolyVariedBuilder(num_workers: int, n_samples: int = 100, n_features: int = 110, max_degree: int = 4, noise: float = 0.0, shard: bool = False, seed: int | None = None)
Bases: AbstractDataBuilder