TrainingSet (starclass.training_sets.TrainingSet)
- class starclass.training_sets.TrainingSet(level='L1', datalevel='corr', tf=0.0, linfit=False, random_seed=42)[source]
Bases: object
Generic Training Set.
- key
Unique identifier for training set.
- Type:
str
- linfit
Indicates whether the linfit mechanism is enabled.
- Type:
bool
- testfraction
Test-fraction.
- Type:
float
- StellarClasses
Enum of the classes associated with this training set.
- Type:
enum
- random_seed
Random seed in use.
- Type:
int
- features_cache
Path to directory where cache of extracted features is being stored.
- Type:
str
- train_idx
- Type:
ndarray
- test_idx
- Type:
ndarray
- crossval_folds
Number of cross-validation folds the training set has been split into. If 0, the training set has not been split.
- Type:
int
- fold
The current cross-validation fold. This is 0 in the original training set.
- Type:
int
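The relationship between testfraction, random_seed, train_idx and test_idx can be illustrated with a small sketch. This is not the library's implementation: the helper split_indices is hypothetical, it uses only the standard library, and it performs a plain (non-stratified) shuffle, whereas the real class may well split differently.

```python
import random


def split_indices(n_items, testfraction=0.0, random_seed=42):
    """Hypothetical helper: split range(n_items) into train and test indices.

    Returns (train_idx, test_idx) as sorted lists. With testfraction=0
    every index ends up in the training part and the test part is empty.
    """
    if not 0.0 <= testfraction < 1.0:
        raise ValueError("testfraction must be in the interval [0, 1)")

    indices = list(range(n_items))
    if testfraction == 0:
        return indices, []

    # A dedicated generator keeps the split reproducible for a given seed
    # without touching the global random state.
    rng = random.Random(random_seed)
    rng.shuffle(indices)

    n_test = int(round(n_items * testfraction))
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return train_idx, test_idx


train_idx, test_idx = split_indices(10, testfraction=0.2, random_seed=42)
```

Calling the helper twice with the same seed gives the same partition, which is the property a fixed random_seed is meant to provide.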
- __init__(level='L1', datalevel='corr', tf=0.0, linfit=False, random_seed=42)[source]
Initialize TrainingSet.
- Parameters:
level (str) – Level of the classification. Choices are 'L1' and 'L2'. Default is 'L1'.
tf (float) – Test-fraction. Default=0.
linfit (bool) – Should linfit be enabled for the training set? If linfit is enabled, lightcurves will be detrended using a linear trend before being passed on to have frequencies extracted. See BaseClassifier.calc_features() for details.
random_seed (int) – Random seed. Default=42.
datalevel (str) – Deprecated.
Code author: Rasmus Handberg <rasmush@phys.au.dk>
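To make the linfit description more concrete, the following is a minimal sketch of detrending a lightcurve with a linear trend. It is an illustration under assumptions, not the code used by BaseClassifier.calc_features(): the function name detrend_linear is hypothetical, the fit is an ordinary least-squares line, and the trend is subtracted (the library might instead divide by it or handle missing data differently).

```python
def detrend_linear(time, flux):
    """Hypothetical helper: subtract a least-squares straight line from flux.

    time and flux are equal-length sequences of floats. Returns a list of
    residuals, flux minus the fitted linear trend.
    """
    n = len(time)
    if n != len(flux):
        raise ValueError("time and flux must have the same length")
    if n < 2:
        raise ValueError("at least two points are needed to fit a line")

    mean_t = sum(time) / n
    mean_f = sum(flux) / n

    # Closed-form ordinary least-squares estimates of slope and intercept.
    sxx = sum((t - mean_t) ** 2 for t in time)
    if sxx == 0:
        raise ValueError("all time stamps are identical")
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(time, flux))
    slope = sxy / sxx
    intercept = mean_f - slope * mean_t

    return [f - (slope * t + intercept) for t, f in zip(time, flux)]


# A perfectly linear lightcurve detrends to (numerically) zero residuals.
residuals = detrend_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```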
- clear_cache()[source]
Clear features cache.
This will delete the features cache directory in the training-set data directory, and delete all MOAT cache tables in the training-set.
Code author: Rasmus Handberg <rasmush@phys.au.dk>
- features()[source]
Iterator of features for training.
- Returns:
Iterator of dicts containing features to be used for training.
- Return type:
Iterator
Code author: Rasmus Handberg <rasmush@phys.au.dk>
- features_test()[source]
Iterator of features for testing.
- Returns:
Iterator of dicts containing features to be used for testing.
- Return type:
Iterator
Code author: Rasmus Handberg <rasmush@phys.au.dk>
- classmethod find_input_folder()[source]
Find the folder containing the data for the training set.
This is a class method, so it can be called without having to initialize the training set.
- folds(n_splits=5)[source]
Split training set object into stratified folds.
- Parameters:
n_splits (int, optional) – Number of folds to split training set into. Default=5.
- Returns:
Iterator of folds, which are also TrainingSet objects.
- Return type:
Iterator of TrainingSet objects
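What "stratified" means here can be shown with a small standard-library sketch. It is not how folds() is implemented, and it yields index pairs rather than TrainingSet objects: the function stratified_folds is hypothetical, it assumes one label per star, and it assigns members of each class to folds round-robin without shuffling.

```python
from collections import defaultdict


def stratified_folds(labels, n_splits=5):
    """Hypothetical helper: yield (train_idx, test_idx) for each fold.

    Indices of each class are dealt out round-robin, so every fold holds
    roughly the same share of each class as the full set.
    """
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")

    # Group the indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class out over the folds in turn.
    fold_members = [[] for _ in range(n_splits)]
    for members in by_class.values():
        for pos, idx in enumerate(members):
            fold_members[pos % n_splits].append(idx)

    # Each fold is held out once; the remaining folds form the training part.
    for k in range(n_splits):
        held_out = sorted(fold_members[k])
        training = sorted(
            idx
            for j, members in enumerate(fold_members)
            if j != k
            for idx in members
        )
        yield training, held_out


example_labels = ["A"] * 6 + ["B"] * 3
example_folds = list(stratified_folds(example_labels, n_splits=3))
```

With six stars of class A and three of class B split three ways, each held-out part contains two A stars and one B star, mirroring the 2:1 ratio of the full set.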
- generate_todolist()[source]
Generate todo.sqlite file in training set directory.
Code author: Rasmus Handberg <rasmush@phys.au.dk>
- labels()[source]
Labels of training-set.
- Returns:
Tuple of labels associated with features in features(). Each element is itself a tuple of enums of StellarClasses.
- Return type:
tuple
Code author: Rasmus Handberg <rasmush@phys.au.dk>
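The nested shape of the return value (a tuple with one entry per star, each entry itself a tuple of enum members) can be sketched as follows. The enum ExampleClasses and its members are invented stand-ins for the training set's StellarClasses, and count_labels is a hypothetical helper, not part of the library.

```python
from collections import Counter
from enum import Enum


class ExampleClasses(Enum):
    """Invented stand-in for a training set's StellarClasses enum."""
    SOLARLIKE = "solar"
    ECLIPSE = "eclipse"
    CONSTANT = "constant"


# Assumed shape: one inner tuple per star, holding one or more classes.
example_star_labels = (
    (ExampleClasses.SOLARLIKE,),
    (ExampleClasses.ECLIPSE, ExampleClasses.SOLARLIKE),
    (ExampleClasses.CONSTANT,),
)


def count_labels(labels):
    """Hypothetical helper: count how often each class occurs in labels."""
    counts = Counter()
    for star_labels in labels:
        counts.update(star_labels)
    return counts


label_counts = count_labels(example_star_labels)
```

Because each element is a tuple, code that consumes the labels should iterate over the inner tuple rather than assume a single class per star.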
- labels_test()[source]
Labels of test-set.
- Returns:
Tuple of labels associated with features in features_test(). Each element is itself a tuple of enums of StellarClasses.
- Return type:
tuple
Code author: Rasmus Handberg <rasmush@phys.au.dk>
- tset_datadir(url)[source]
Set up the TrainingSet data directory. If the directory doesn’t already exist, the training set is downloaded from the given URL.
- Parameters:
url (string) – URL from where to download the training-set if it doesn’t already exist.
- Returns:
Path to directory where training set is stored.
- Return type:
string
Code author: Rasmus Handberg <rasmush@phys.au.dk>
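The parameter description suggests a download-if-missing pattern, which can be sketched as below. This is only an illustration under assumptions: ensure_datadir and the fetch callable are hypothetical, the real method takes a URL and performs the download itself, and the stand-in fetch function here writes a placeholder file instead of touching the network.

```python
import os
import tempfile


def ensure_datadir(path, fetch):
    """Hypothetical helper: return path, populating it only if missing.

    fetch is a callable that receives the newly created directory and is
    expected to fill it (in the real class, by downloading from the URL).
    """
    if not os.path.isdir(path):
        os.makedirs(path)
        fetch(path)
    return path


fetch_calls = []


def placeholder_fetch(target):
    """Stand-in for a download: record the call and write a dummy file."""
    fetch_calls.append(target)
    with open(os.path.join(target, "placeholder.txt"), "w") as fh:
        fh.write("downloaded content would go here\n")


with tempfile.TemporaryDirectory() as tmp:
    datadir = os.path.join(tmp, "tset")
    first = ensure_datadir(datadir, placeholder_fetch)
    second = ensure_datadir(datadir, placeholder_fetch)  # already present
    n_fetches = len(fetch_calls)
    same_path = first == second
```

The second call returns the same path without fetching again, which is the behavior the "if it doesn’t already exist" wording implies.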