TrainingSet (`starclass.training_sets.TrainingSet`)

class starclass.training_sets.TrainingSet(level='L1', datalevel='corr', tf=0.0, linfit=False, random_seed=42)[source]

Bases: object

Generic Training Set.

key

Unique identifier for training set.

Type:: str

linfit

Indicates whether the linfit mechanism is enabled.

Type:: bool

testfraction

Test-fraction.

Type:: float

StellarClasses

Enum of the classes associated with this training set.

Type:: enum

random_seed

Random seed in use.

Type:: int

features_cache

Path to directory where cache of extracted features is being stored.

Type:: str

train_idx

Type:: ndarray

test_idx

Type:: ndarray

crossval_folds

Number of cross-validation folds the training set has been split into. If 0, the training set has not been split.

Type:: int

fold

The current cross-validation fold. This is 0 in the original training set.

Type:: int
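The testfraction, random_seed, train_idx and test_idx attributes together describe a reproducible train/test split. How the library performs the split is not stated on this page, so the following is only a minimal standard-library sketch of one plausible approach, a seeded and class-stratified hold-out. The function name stratified_split and the class names class_a and class_b are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, testfraction=0.0, random_seed=42):
    """Return (train_idx, test_idx), holding out about `testfraction` of each class.

    Hypothetical helper for illustration; not part of starclass.
    """
    rng = random.Random(random_seed)  # a fixed seed makes the split reproducible

    # Group the sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_test = round(len(members) * testfraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Illustrative labels: 8 targets of one class and 4 of another.
labels = ["class_a"] * 8 + ["class_b"] * 4
train_idx, test_idx = stratified_split(labels, testfraction=0.25, random_seed=42)
```

With a test fraction of 0 the test index list comes out empty, which matches the documented default of no held-out test set.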

__init__(level='L1', datalevel='corr', tf=0.0, linfit=False, random_seed=42)[source]

Initialize TrainingSet.

Parameters:

level (str) – Level of the classification. Choices are 'L1' and 'L2'. Default='L1'.
tf (float) – Test-fraction. Default=0.
linfit (bool) – Should linfit be enabled for the training set? If linfit is enabled, lightcurves are detrended with a linear trend before being passed on for frequency extraction. See BaseClassifier.calc_features() for details.
random_seed (int) – Random seed. Default=42.
datalevel (str) – Deprecated.

Code author: Rasmus Handberg <rasmush@phys.au.dk>
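The linfit option refers to removing a linear trend from each lightcurve. The actual procedure lives in BaseClassifier.calc_features() and is not reproduced on this page; the following is a minimal sketch of linear detrending in general, assuming the trend is a least-squares straight line that is subtracted from the flux. The function name linear_detrend and the sample values are hypothetical.

```python
def linear_detrend(time, flux):
    """Subtract the least-squares straight line from `flux`.

    Hypothetical helper for illustration; not part of starclass.
    """
    n = len(time)
    mean_t = sum(time) / n
    mean_f = sum(flux) / n

    # Ordinary least-squares slope and intercept of flux against time.
    sxx = sum((t - mean_t) ** 2 for t in time)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(time, flux))
    slope = sxy / sxx
    intercept = mean_f - slope * mean_t

    # What remains is the variability around the fitted trend.
    return [f - (slope * t + intercept) for t, f in zip(time, flux)]


# Illustrative lightcurve: a rising trend with a small wiggle on top.
time = [0.0, 1.0, 2.0, 3.0, 4.0]
flux = [10.0, 12.5, 14.0, 16.5, 18.0]
residuals = linear_detrend(time, flux)
```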

clear_cache()[source]

Clear features cache.

This will delete the features cache directory in the training-set data directory, and delete all MOAT cache tables in the training-set.

Code author: Rasmus Handberg <rasmush@phys.au.dk>

close()[source]

features()[source]

Iterator of features for training.

Returns:: Iterator of dicts containing features to be used for training.
Return type:: Iterator

Code author: Rasmus Handberg <rasmush@phys.au.dk>

features_test()[source]

Iterator of features for testing.

Returns:: Iterator of dicts containing features to be used for testing.
Return type:: Iterator

Code author: Rasmus Handberg <rasmush@phys.au.dk>

classmethod find_input_folder()[source]

Find the folder containing the data for the training set.

This is a class method, so it can be called without having to initialize the training set.
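As a reminder of what calling a class method without an instance looks like, here is a minimal sketch using a stand-in class. ToyTrainingSet, its subclass, and the folder layout built from key are all hypothetical; the real method's lookup logic and return value are not described on this page.

```python
import os


class ToyTrainingSet:
    """Hypothetical stand-in class; not the real starclass TrainingSet."""

    key = "toy"  # unique identifier, mirroring the documented `key` attribute

    @classmethod
    def find_input_folder(cls):
        # Assumed layout for illustration: one folder per training-set key.
        return os.path.join("data", cls.key)


class OtherToyTrainingSet(ToyTrainingSet):
    key = "other_toy"


# No instance is created; the method is called on the classes themselves.
folder = ToyTrainingSet.find_input_folder()
other_folder = OtherToyTrainingSet.find_input_folder()
```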

folds(n_splits=5)[source]

Split training set object into stratified folds.

Parameters:

n_splits (int, optional) – Number of folds to split training set into. Default=5.

Returns:

Iterator of folds, which are also TrainingSet objects.

Return type:

Iterator of TrainingSet objects
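To illustrate what a stratified split into folds means, the following standard-library sketch deals the members of each class round-robin across the folds, so every fold keeps roughly the same class proportions. It yields plain index lists rather than TrainingSet objects, does no shuffling, and is not the library's implementation; the function name stratified_folds and the labels are hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, n_splits=5):
    """Yield one (train_idx, test_idx) pair per fold.

    Hypothetical helper for illustration; not part of starclass.
    """
    # Group the sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's members round-robin into the folds.
    fold_members = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        for position, idx in enumerate(by_class[label]):
            fold_members[position % n_splits].append(idx)

    # Each fold is held out once; everything else is used for training.
    all_idx = set(range(len(labels)))
    for held_out in fold_members:
        test_idx = sorted(held_out)
        train_idx = sorted(all_idx - set(held_out))
        yield train_idx, test_idx


# Illustrative labels: 10 targets of one class and 5 of another.
labels = ["class_a"] * 10 + ["class_b"] * 5
folds = list(stratified_folds(labels, n_splits=5))
```

With the real class, each yielded fold is documented to be a TrainingSet in its own right, so its features() and labels() methods would presumably be used in place of the index lists here.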

generate_todolist()[source]

Generate todo.sqlite file in training set directory.

Code author: Rasmus Handberg <rasmush@phys.au.dk>

labels()[source]

Labels of training-set.

Returns:

Tuple of labels associated with the features in features(). Each element is itself a tuple of enums of StellarClasses.

Return type:

tuple

Code author: Rasmus Handberg <rasmush@phys.au.dk>
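Since features() is documented as an iterator of dicts and labels() as a tuple of tuples of enums, the two can be paired element by element. The sketch below uses a stand-in class to show that pairing. It assumes both come back in the same order, which the description above implies but does not spell out. ToyClasses, ToyTrainingSet and the feature name feature_1 are hypothetical.

```python
from enum import Enum


class ToyClasses(Enum):
    """Hypothetical stand-in for a StellarClasses enum."""
    CLASS_A = "class_a"
    CLASS_B = "class_b"


class ToyTrainingSet:
    """Hypothetical stand-in mimicking the documented return shapes."""

    def __init__(self, rows):
        self._rows = rows  # list of (feature dict, tuple of labels)

    def features(self):
        # A generator: it can be consumed only once per call.
        for feat, _ in self._rows:
            yield dict(feat)

    def labels(self):
        # One tuple of enum members per target.
        return tuple(lbl for _, lbl in self._rows)


tset = ToyTrainingSet([
    ({"feature_1": 0.5}, (ToyClasses.CLASS_A,)),
    ({"feature_1": 1.5}, (ToyClasses.CLASS_A, ToyClasses.CLASS_B)),
])

# Pair every feature dict with its label tuple.
pairs = list(zip(tset.features(), tset.labels()))
```

Because a label element is a tuple, a single target can carry more than one class, as the second row here does.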

labels_test()[source]

Labels of test-set.

Returns:

Tuple of labels associated with the features in features_test(). Each element is itself a tuple of enums of StellarClasses.

Return type:

tuple

Code author: Rasmus Handberg <rasmush@phys.au.dk>

load_targets()[source]

reload()[source]: Reload in-memory TaskManager connected to TrainingSet todo-file.

tset_datadir(url)[source]

Set up the TrainingSet data directory. If the directory doesn’t already exist, the training set is downloaded from the given URL.

Parameters:: url (string) – URL from where to download the training-set if it doesn’t already exist.
Returns:: Path to directory where training set is stored.
Return type:: string

Code author: Rasmus Handberg <rasmush@phys.au.dk>
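The download-only-if-missing behaviour described by the url parameter can be sketched as follows. This is a minimal illustration, not the library's code: the function ensure_datadir, the injected fetch callable, the directory name and the example URL are all hypothetical, and the download is replaced by a recording stub so the sketch runs offline.

```python
import os
import tempfile


def ensure_datadir(path, url, fetch):
    """Return `path`, calling fetch(url, path) only if the directory is missing.

    Hypothetical helper for illustration; not part of starclass.
    """
    if not os.path.isdir(path):
        os.makedirs(path)
        fetch(url, path)
    return path


# Stub that records calls instead of downloading anything.
calls = []


def fake_fetch(url, destination):
    calls.append((url, destination))


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "toy_tset")
    # First call: the directory is missing, so the fetch is triggered.
    first = ensure_datadir(target, "https://example.invalid/toy.zip", fake_fetch)
    # Second call: the directory exists, so nothing is fetched again.
    second = ensure_datadir(target, "https://example.invalid/toy.zip", fake_fetch)
```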
