src.data.datasets module

src.data.datasets.create_linear_data_loader(num_workers, batch_size, worker_id, n_samples=100, n_features=110, noise=0.0, val_size=0.01, test_size=0.2, random_state=42)

Return a DataLoader for one shard of the linear training set, together with the input dimension.
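The sharding scheme is not documented here. As a minimal sketch, assuming each worker receives a contiguous, near-equal slice of the training indices, the per-worker selection could look like the following. `shard_indices` is a hypothetical helper, not part of this module, and the training-set size of 79 is purely illustrative.

```python
import numpy as np


def shard_indices(n_train, num_workers, worker_id):
    """Hypothetical helper: indices of one worker's shard of range(n_train).

    Assumption: shards are contiguous and differ in size by at most one.
    The real module may shard differently (e.g. strided or shuffled).
    """
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    return np.array_split(np.arange(n_train), num_workers)[worker_id]


# Illustrative sizes only: 79 training rows split across 4 workers.
shards = [shard_indices(79, 4, w) for w in range(4)]
print([len(s) for s in shards])  # → [20, 20, 20, 19]
```

Under this assumption every training row belongs to exactly one worker, so the shards together cover the training set without overlap.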

src.data.datasets.create_linear_dataset(n_samples=100, n_features=110, noise=0.0, random_state=None)
Overparameterized linear regression dataset (by default n_features exceeds n_samples):
  • X sampled from U(-3, 3)
  • y = X @ w_true + noise
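The recipe above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the module's code: `sketch_linear_dataset` is a hypothetical name, `w_true` is assumed to be drawn from a standard normal, and `noise` is assumed to be the standard deviation of additive Gaussian noise.

```python
import numpy as np


def sketch_linear_dataset(n_samples=100, n_features=110, noise=0.0, random_state=None):
    """Hypothetical re-creation of the documented recipe."""
    rng = np.random.default_rng(random_state)
    # X sampled from U(-3, 3), one row per sample.
    X = rng.uniform(-3.0, 3.0, size=(n_samples, n_features))
    # Assumption: the ground-truth weights are standard normal.
    w_true = rng.standard_normal(n_features)
    # Assumption: `noise` scales zero-mean Gaussian noise added to the targets.
    y = X @ w_true + noise * rng.standard_normal(n_samples)
    return X, y, w_true


X, y, w_true = sketch_linear_dataset(random_state=0)
print(X.shape, y.shape)  # → (100, 110) (100,)
```

With the default shapes there are more unknown weights (110) than equations (100), so infinitely many weight vectors fit the noiseless data exactly.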

src.data.datasets.create_poly_varied_data_loader(num_workers, batch_size, worker_id, n_samples=100, n_features=110, max_degree=4, noise=0.0, val_size=0.2, test_size=0.2, random_state=42)

Return a DataLoader for one shard of the poly-varied training set, together with the input dimension (equal to n_features).

src.data.datasets.create_poly_varied_dataset(n_samples=100, n_features=110, max_degree=4, noise=0.0, random_state=None)
Overparameterized nonlinear regression dataset:
  • X sampled from U(-3, 3)
  • Each feature i raised to its own degree_i ∈ [1, max_degree]
  • y = sum_i w_true[i] * (X[:, i] ** degree_i) + noise

Returns: X_raw, y, degrees
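A hedged NumPy sketch of this construction is shown below. `sketch_poly_varied_dataset` is a hypothetical name. The integer degrees drawn uniformly from 1 to max_degree inclusive, the standard-normal `w_true`, and the Gaussian noise scaled by `noise` are all assumptions, since the docstring does not specify these distributions.

```python
import numpy as np


def sketch_poly_varied_dataset(n_samples=100, n_features=110, max_degree=4,
                               noise=0.0, random_state=None):
    """Hypothetical re-creation of the documented recipe."""
    rng = np.random.default_rng(random_state)
    # Raw inputs sampled from U(-3, 3).
    X_raw = rng.uniform(-3.0, 3.0, size=(n_samples, n_features))
    # Assumption: one integer degree per feature, uniform on {1, ..., max_degree}.
    degrees = rng.integers(1, max_degree + 1, size=n_features)
    # Assumption: standard-normal ground-truth weights.
    w_true = rng.standard_normal(n_features)
    # Broadcasting raises column i to degrees[i]; the matrix product then
    # computes sum_i w_true[i] * X_raw[:, i] ** degrees[i].
    y = (X_raw ** degrees) @ w_true + noise * rng.standard_normal(n_samples)
    return X_raw, y, degrees


X_raw, y, degrees = sketch_poly_varied_dataset(random_state=0)
print(X_raw.shape, y.shape, degrees.shape)  # → (100, 110) (100,) (110,)
```

Returning the raw inputs together with `degrees`, rather than the transformed features, matches the documented return value and leaves it to the caller to decide whether a model sees the polynomial features.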

src.data.datasets.load_linear_data(n_samples=100, n_features=110, noise=0.0, val_size=0.01, test_size=0.2, random_state=None)

Generate an overparameterized linear dataset and split it. Returns: (X_train, y_train, X_val, y_val, X_test, y_test)

src.data.datasets.load_poly_varied_data(n_samples=100, n_features=110, max_degree=4, noise=0.0, val_size=0.2, test_size=0.2, random_state=42)

Generate a polynomial-varied dataset, split it, and also return degrees. Returns: (X_train, y_train, X_val, y_val, X_test, y_test, degrees)

src.data.datasets.split_data(X, y, val_size=0.0, test_size=0.2, random_state=None)
Split (X, y) into train/val/test subsets, each given as a fraction of the samples:
  • train: 1 - val_size - test_size
  • val: val_size
  • test: test_size
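The fractions above can be illustrated with a small NumPy sketch. `sketch_split` is a hypothetical stand-in: the shuffle-then-slice approach, the rounding of subset sizes, and the return order (borrowed from load_linear_data) are assumptions rather than documented behavior of split_data.

```python
import numpy as np


def sketch_split(X, y, val_size=0.0, test_size=0.2, random_state=None):
    """Hypothetical three-way split by shuffled index slices."""
    if val_size < 0 or test_size < 0 or val_size + test_size >= 1:
        raise ValueError("val_size and test_size must be >= 0 and sum to < 1")
    n = len(X)
    perm = np.random.default_rng(random_state).permutation(n)
    # Assumption: subset sizes are rounded to the nearest integer.
    n_test = int(round(test_size * n))
    n_val = int(round(val_size * n))
    test_idx = perm[:n_test]
    val_idx = perm[n_test:n_test + n_val]
    train_idx = perm[n_test + n_val:]  # remainder: 1 - val_size - test_size
    return (X[train_idx], y[train_idx],
            X[val_idx], y[val_idx],
            X[test_idx], y[test_idx])


X = np.arange(200).reshape(100, 2)
y = np.arange(100)
parts = sketch_split(X, y, val_size=0.1, test_size=0.2, random_state=0)
print([len(p) for p in parts[::2]])  # → [70, 10, 20]
```

Because the three index slices come from a single permutation, the subsets are disjoint and together cover every sample.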