BaseClassifier (starclass.BaseClassifier)

class starclass.BaseClassifier(tset=None, features_cache=None, plot=False, data_dir=None, truncate_lightcurves=None)[source]

Bases: object

The basic stellar classifier class for the TASOC pipeline. All other specific stellar classification algorithms will inherit from BaseClassifier.

plot

Indicates whether plotting is enabled.

Type:

bool

data_dir

Path to directory where classifiers store auxiliary data. Different directories will be used for each classification level.

Type:

str

features_cache

Path to directory where calculated features will be saved/loaded as needed.

Type:

str

classifier_key

Keyword/name of the current classifier.

Type:

str

StellarClasses

Enum of all possible labels the classifier should be able to classify stars into. This will depend on the level which the classifier is run on.

Type:

enum.Enum

features_names

List of names of features used by the classifier.

Type:

list

truncate_lightcurves

Indicates whether Kepler/K2 lightcurves will be truncated to 27.4 days when loaded. The default is to truncate lightcurves when running with short (27.4 day) training-sets and not to truncate when running with long (90 day) training-sets.

Type:

bool

Code author: Rasmus Handberg <rasmush@phys.au.dk>

__init__(tset=None, features_cache=None, plot=False, data_dir=None, truncate_lightcurves=None)[source]

Initialize the classifier object.

Parameters:
  • tset (TrainingSet) – Training-set from which the classifier should be loaded.

  • level (str, optional) – Classification-level to load. Choices are 'L1' and 'L2'. Default is 'L1'.

  • features_cache (str, optional) – Path to directory where calculated features will be saved/loaded as needed.

  • plot (bool, optional) – Create plots as part of the output. Default is False.

  • data_dir (str) – Path to directory where classifiers store auxiliary data.

  • truncate_lightcurves (bool) – Force truncation of lightcurves to 27.4 days. If None, the default will be decided based on the training-set provided in tset.

Code author: Rasmus Handberg <rasmush@phys.au.dk>
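The rule for truncate_lightcurves described above can be sketched as a small function. This is an illustration only, not the pipeline's code: the helper name resolve_truncation and the training_set_is_short flag are hypothetical, standing in for whatever information the classifier reads from tset.

```python
from typing import Optional


def resolve_truncation(truncate_lightcurves: Optional[bool],
                       training_set_is_short: bool) -> bool:
    """Hypothetical helper: decide whether lightcurves get truncated.

    An explicit True/False from the caller always wins. If the caller
    passes None, fall back to the documented default: truncate for
    short (27.4 day) training-sets, do not truncate for long (90 day) ones.
    """
    if truncate_lightcurves is None:
        return training_set_is_short
    return bool(truncate_lightcurves)


# Default behaviour follows the training-set length.
print(resolve_truncation(None, training_set_is_short=True))
print(resolve_truncation(None, training_set_is_short=False))
# An explicit value overrides the default.
print(resolve_truncation(True, training_set_is_short=False))
```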

classify(task)[source]

Classify a star from the lightcurve and other features.

Will run the do_classify() method and check some of the output and calculate various performance metrics.

Parameters:

features (dict) – Dictionary of features, including the lightcurve itself.

Returns:

Dictionary of classifications

Return type:

dict

Code author: Rasmus Handberg <rasmush@phys.au.dk>

close()[source]

Close the classifier.

do_classify(features)[source]

Classify a star from the lightcurve and other features.

This method should be overwritten by child classes.

Parameters:

features (dict) – Dictionary of features of star, including the lightcurve itself.

Returns:

Dictionary where the keys should be from StellarClasses and the corresponding values indicate the probability of the star belonging to that class.

Return type:

dict

Raises:

NotImplementedError – If classifier has not implemented this subroutine.
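A minimal sketch of the return contract follows. To stay runnable without the package installed, it uses a stand-in base class and a made-up two-label enum; neither is part of starclass, and a real classifier would inherit from starclass.BaseClassifier and use its StellarClasses attribute instead.

```python
import enum


class ExampleClasses(enum.Enum):
    # Hypothetical labels, for illustration only.
    CLASS_A = 'a'
    CLASS_B = 'b'


class StandInBase:
    """Stand-in for the base class: do_classify must be overridden."""

    def do_classify(self, features):
        raise NotImplementedError()


class UniformClassifier(StandInBase):
    """Toy child class that assigns equal probability to every label."""

    def do_classify(self, features):
        n_labels = len(ExampleClasses)
        # Keys are enum members, values are probabilities.
        return {label: 1.0 / n_labels for label in ExampleClasses}


result = UniformClassifier().do_classify({'lightcurve': None})
for label, probability in result.items():
    print(label.name, probability)
```

Calling do_classify on the stand-in base directly raises NotImplementedError, mirroring the behaviour described above for classifiers that do not override the method.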

feature_importance_complete(tset=None, features=None, probs=None, diagnostics=None)[source]

Function which will be called when feature importance is finishing.

Parameters:
  • tset

  • features

  • probs

  • diagnostics

Code author: Rasmus Handberg <rasmush@phys.au.dk>

load_star(task)[source]

Receives a task from the TaskManager, loads the lightcurve, and returns derived features.

Parameters:

task (dict) – Task dictionary as returned by TaskManager.get_task().

Returns:

Dictionary with features.

Return type:

dict

Code author: Rasmus Handberg <rasmush@phys.au.dk>

parse_labels(labels)[source]

Convert iterator of labels into full numpy array, with only one label per star.

TODO: How do we handle multiple labels better?

test(tset, save=None, feature_importance=False)[source]

Test classifier using training-set, which has been created with a test-fraction.

Parameters:
  • tset (TrainingSet) – Training-set to run testing on.

  • save (callable, optional) – Function to call for saving test-predictions.

test_complete(tset=None, features=None, probs=None, diagnostics=None)[source]

Function which will be called when testing is finishing.

Parameters:
  • tset

  • features

  • probs

  • diagnostics

Code author: Rasmus Handberg <rasmush@phys.au.dk>

train(tset)[source]

Train classifier on training set.

This method should be overwritten by child classes.

Parameters:

tset (TrainingSet) – Training-set to train classifier on.

Raises:

NotImplementedError – If classifier has not implemented this subroutine.

property classifier_model

property random_seed

Random seed used in derived classifiers.

property random_state

Random state (numpy.random.RandomState) corresponding to random_seed.
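The relationship between a seed and a numpy.random.RandomState can be sketched as below. The class SeededExample and the seed value are hypothetical, and creating a fresh RandomState on every property access is an assumption of this sketch, not a statement about how BaseClassifier implements it.

```python
import numpy as np


class SeededExample:
    """Hypothetical class pairing a fixed seed with a RandomState."""

    def __init__(self, seed=12345):
        # The seed value is arbitrary and chosen for illustration.
        self._seed = seed

    @property
    def random_seed(self):
        return self._seed

    @property
    def random_state(self):
        # Assumption: a new RandomState is built from the seed each time.
        return np.random.RandomState(self.random_seed)


# Two states built from the same seed yield identical draws.
first = SeededExample().random_state.rand(3)
second = SeededExample().random_state.rand(3)
print(np.array_equal(first, second))
```

Seeding every derived classifier from one value in this way makes runs reproducible.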