TrainingSet (starclass.training_sets.TrainingSet)

class starclass.training_sets.TrainingSet(level='L1', datalevel='corr', tf=0.0, linfit=False, random_seed=42)[source]

Bases: object

Generic Training Set.

key

Unique identifier for training set.

Type:

str

linfit

Indicates whether the linfit mechanism is enabled.

Type:

bool

testfraction

Test-fraction.

Type:

float

StellarClasses

Enum of the classes associated with this training set.

Type:

enum

random_seed

Random seed in use.

Type:

int

features_cache

Path to the directory where the cache of extracted features is stored.

Type:

str

train_idx

Type:

ndarray

test_idx

Type:

ndarray

crossval_folds

Number of cross-validation folds the training set has been split into. If 0, the training set has not been split.

Type:

int

fold

The current cross-validation fold. This is 0 in the original training set.

Type:

int
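
The relationship between testfraction, random_seed, train_idx and test_idx can be illustrated with a small stratified split. This is a minimal sketch under stated assumptions, not the starclass implementation: the helper name split_indices and the per-class rounding rule are hypothetical, and plain lists stand in for the ndarray attributes.

```python
import random
from collections import defaultdict


def split_indices(labels, testfraction=0.0, random_seed=42):
    """Return (train_idx, test_idx), splitting every class in the same proportion.

    Hypothetical helper for illustration only; it is not part of starclass.
    """
    if not 0.0 <= testfraction < 1.0:
        raise ValueError("testfraction must be in the interval [0, 1)")

    # Group the sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(random_seed)  # seeded, so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        # Assumed rounding rule: nearest integer number of test samples per class.
        n_test = round(len(members) * testfraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])

    return sorted(train_idx), sorted(test_idx)


labels = ["A"] * 10 + ["B"] * 5
train_idx, test_idx = split_indices(labels, testfraction=0.2, random_seed=42)
```

With a test fraction of 0 the second list is empty, which mirrors the documented default of not holding out a test set.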

__init__(level='L1', datalevel='corr', tf=0.0, linfit=False, random_seed=42)[source]

Initialize TrainingSet.

Parameters:
  • level (str) – Level of the classification. Choices are 'L1' and 'L2'. Default is 'L1'.

  • tf (float) – Test-fraction. Default=0.

  • linfit (bool) – Whether linfit should be enabled for the training set. If linfit is enabled, light curves are detrended using a linear trend before being passed on for frequency extraction. See BaseClassifier.calc_features() for details. Default=False.

  • random_seed (int) – Random seed. Default=42.

  • datalevel (str) – Deprecated.

Code author: Rasmus Handberg <rasmush@phys.au.dk>
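
The linear detrending mentioned for linfit can be pictured as fitting a straight line to the light curve and removing it. The following is a sketch only: the function name detrend_linear is hypothetical, and it assumes the trend is removed by subtraction of an ordinary least-squares line, which may differ from what BaseClassifier.calc_features() actually does.

```python
def detrend_linear(time, flux):
    """Subtract the ordinary least-squares straight line from flux.

    Hypothetical helper for illustration only; it is not part of starclass.
    """
    n = len(time)
    if n < 2 or n != len(flux):
        raise ValueError("time and flux must have equal length of at least 2")

    mean_t = sum(time) / n
    mean_f = sum(flux) / n
    var_t = sum((t - mean_t) ** 2 for t in time)
    if var_t == 0:
        raise ValueError("time values must not all be identical")

    # Closed-form least-squares slope and intercept.
    slope = sum((t - mean_t) * (f - mean_f) for t, f in zip(time, flux)) / var_t
    intercept = mean_f - slope * mean_t

    # Residuals after removing the fitted line.
    return [f - (slope * t + intercept) for t, f in zip(time, flux)]


time = [0.0, 1.0, 2.0, 3.0, 4.0]
wiggle = [0.1, -0.1, 0.0, 0.1, -0.1]
flux = [1.0 + 0.5 * t + w for t, w in zip(time, wiggle)]
residual = detrend_linear(time, flux)
```

After detrending, the residuals have zero mean and no remaining linear correlation with time, so a slow drift no longer dominates the low-frequency end of the spectrum.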

clear_cache()[source]

Clear features cache.

This will delete the features cache directory in the training-set data directory, and delete all MOAT cache tables in the training-set.

Code author: Rasmus Handberg <rasmush@phys.au.dk>

close()[source]

features()[source]

Iterator of features for training.

Returns:

Iterator of dicts containing features to be used for training.

Return type:

Iterator

Code author: Rasmus Handberg <rasmush@phys.au.dk>

features_test()[source]

Iterator of features for testing.

Returns:

Iterator of dicts containing features to be used for testing.

Return type:

Iterator

Code author: Rasmus Handberg <rasmush@phys.au.dk>

classmethod find_input_folder()[source]

Find the folder containing the data for the training set.

This is a class method, so it can be called without having to initialize the training set.
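
What "can be called without having to initialize the training set" means in practice is sketched below. DemoTrainingSet, its key and its data_root are hypothetical stand-ins; the real method locates the folder in its own way, which this page does not describe.

```python
import os
import tempfile


class DemoTrainingSet:
    """Hypothetical stand-in for illustration; not the starclass class."""

    key = "demo"
    data_root = os.path.join(tempfile.gettempdir(), "training_sets")

    @classmethod
    def find_input_folder(cls):
        # Only class-level state is used, so no instance is required.
        return os.path.join(cls.data_root, cls.key)


class OtherTrainingSet(DemoTrainingSet):
    key = "other"  # subclasses resolve to their own folder


# Called directly on the classes; no object is constructed.
folder = DemoTrainingSet.find_input_folder()
other_folder = OtherTrainingSet.find_input_folder()
```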

folds(n_splits=5)[source]

Split training set object into stratified folds.

Parameters:

n_splits (int, optional) – Number of folds to split training set into. Default=5.

Returns:

Iterator of folds, which are also TrainingSet objects.

Return type:

Iterator of TrainingSet objects
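
The idea of stratified folds can be sketched with plain index lists. This is an illustration under assumptions, not the starclass implementation: the real folds() yields TrainingSet objects, whereas the hypothetical stratified_folds below yields (train_idx, test_idx) pairs and assigns samples round-robin without shuffling.

```python
from collections import defaultdict


def stratified_folds(labels, n_splits=5):
    """Yield (train_idx, test_idx) pairs, one per fold.

    Hypothetical helper for illustration only; it is not part of starclass.
    """
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")

    # Group the sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class round-robin across the folds so that every fold gets
    # approximately the same class proportions.
    fold_members = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        for position, idx in enumerate(by_class[label]):
            fold_members[position % n_splits].append(idx)

    for k in range(n_splits):
        test_idx = sorted(fold_members[k])
        train_idx = sorted(
            idx
            for j, members in enumerate(fold_members)
            if j != k
            for idx in members
        )
        yield train_idx, test_idx


labels = ["A"] * 10 + ["B"] * 5
folds = list(stratified_folds(labels, n_splits=5))
```

Each sample lands in the test part of exactly one fold, and every fold keeps the 2:1 ratio between the two classes of the example.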

generate_todolist()[source]

Generate todo.sqlite file in training set directory.

Code author: Rasmus Handberg <rasmush@phys.au.dk>

labels()[source]

Labels of training-set.

Returns:

Tuple of labels associated with features in features().

Each element is itself a tuple of enums of StellarClasses.

Return type:

tuple

Code author: Rasmus Handberg <rasmush@phys.au.dk>

labels_test()[source]

Labels of test-set.

Returns:

Tuple of labels associated with features in features_test().

Each element is itself a tuple of enums of StellarClasses.

Return type:

tuple

Code author: Rasmus Handberg <rasmush@phys.au.dk>
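
Because each label is itself a tuple of enum members, a star can belong to more than one class. The sketch below shows one way such a structure could be turned into per-class indicator rows. DemoClasses and to_indicator_rows are hypothetical; the members of the real StellarClasses enum depend on the training set.

```python
import enum


class DemoClasses(enum.Enum):
    """Hypothetical classes; the real StellarClasses enum is defined by the training set."""

    SOLARLIKE = "solar"
    ECLIPSE = "eclipse"
    CONSTANT = "constant"


# One entry per star, in the same order as the features; a star may carry
# several classes at once.
labels = (
    (DemoClasses.SOLARLIKE,),
    (DemoClasses.ECLIPSE, DemoClasses.SOLARLIKE),
    (DemoClasses.CONSTANT,),
)


def to_indicator_rows(labels, classes):
    """Return one 0/1 row per star, with columns in enum definition order."""
    order = list(classes)
    return [[1 if c in star_labels else 0 for c in order] for star_labels in labels]


rows = to_indicator_rows(labels, DemoClasses)
```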

load_targets()[source]

reload()[source]

Reload in-memory TaskManager connected to TrainingSet todo-file.

tset_datadir(url)[source]

Set up the TrainingSet data directory. If the directory doesn’t already exist, the training set is downloaded from the given URL.

Parameters:

url (string) – URL from which to download the training set if it doesn’t already exist.

Returns:

Path to directory where training set is stored.

Return type:

string

Code author: Rasmus Handberg <rasmush@phys.au.dk>
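
The "use the directory if present, otherwise download" behaviour can be sketched as follows. This is not the starclass implementation: ensure_datadir and fake_download are hypothetical, the download step is passed in as a callable so that no network access is needed, and the URL is a placeholder.

```python
import os
import tempfile


def ensure_datadir(path, url, download):
    """Return path, calling download(url, path) first if the directory is missing.

    Hypothetical helper for illustration only; it is not part of starclass.
    """
    if not os.path.isdir(path):
        os.makedirs(path)
        download(url, path)
    return path


calls = []


def fake_download(url, destination):
    # Stand-in for a real download: record the call and write a placeholder file.
    calls.append(url)
    with open(os.path.join(destination, "placeholder.txt"), "w") as fid:
        fid.write("placeholder")


url = "https://example.invalid/demo.zip"  # placeholder URL, never contacted
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "demo")
    first = ensure_datadir(target, url, fake_download)   # directory missing: downloads
    second = ensure_datadir(target, url, fake_download)  # directory present: no download
    downloaded_once = len(calls) == 1
```

The second call returns the same path without triggering another download, which is the property that makes repeated initialisation of a training set cheap.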