SortingHat Classifier (starclass.SortingHatClassifier)

class starclass.SortingHatClassifier(clfile='sortinghat_classifier_v01.pickle', n_estimators=1000, max_features='auto', min_samples_split=2, *args, **kwargs)[source]

Bases: BaseClassifier

Sorting-Hat Classifier

__init__(clfile='sortinghat_classifier_v01.pickle', n_estimators=1000, max_features='auto', min_samples_split=2, *args, **kwargs)[source]

Initialize the classifier object.

Parameters:
  • clfile (str) – Filepath to previously pickled Classifier_obj.

  • featfile (str) – Filepath to pre-calculated features, if available.

  • n_estimators (int) – Number of trees in the forest.

  • max_features (int or str) – See sklearn.ensemble.RandomForestClassifier.

  • min_samples_split (int) – See sklearn.ensemble.RandomForestClassifier.

classify(task)

Classify a star from the lightcurve and other features.

Runs the do_classify() method, checks parts of its output and calculates various performance metrics.

Parameters:

features (dict) – Dictionary of features, including the lightcurve itself.

Returns:

Dictionary of classifications

Return type:

dict

Code author: Rasmus Handberg <rasmush@phys.au.dk>
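
The relationship between classify() and do_classify() can be pictured as a template method: a base class wraps the subclass hook and validates what it returns. The sketch below is illustrative only. SketchBaseClassifier, DummyClassifier, the class names in the result and the specific check (every value is a probability between 0 and 1) are assumptions, not the starclass implementation.

```python
class SketchBaseClassifier:
    """Hypothetical stand-in for a base classifier (not starclass code)."""

    def do_classify(self, features):
        # Subclasses implement the actual classification.
        raise NotImplementedError

    def classify(self, features):
        # Delegate to the subclass hook, then sanity-check its output.
        result = self.do_classify(features)
        if not isinstance(result, dict):
            raise TypeError("do_classify() must return a dict")
        for label, prob in result.items():
            # Assumed check: each value is a probability.
            if not 0.0 <= prob <= 1.0:
                raise ValueError(f"Invalid probability for {label!r}: {prob}")
        return result


class DummyClassifier(SketchBaseClassifier):
    """Toy subclass returning fixed, made-up probabilities."""

    def do_classify(self, features):
        return {"class_a": 0.8, "class_b": 0.2}


result = DummyClassifier().classify({"lightcurve": None})
```

Keeping the checks in the wrapper means each derived classifier only has to implement do_classify().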

close()

Close the classifier.

do_classify(features, recalc=False)[source]

Classify a single lightcurve.

Parameters:

features (dict) – Dictionary of features.

Returns:

Dictionary of stellar classifications.

Return type:

dict

featcalc(features, total=None, recalc=False)[source]

Calculate features for a set of lightcurves.

feature_importance_complete(tset=None, features=None, probs=None, diagnostics=None)

Callback invoked when the feature-importance calculation finishes.

Parameters:
  • tset

  • features

  • probs

  • diagnostics

Code author: Rasmus Handberg <rasmush@phys.au.dk>

load(infile)[source]

Load classifier object.

load_star(task)

Receive a task from the TaskManager, load the lightcurve and return derived features.

Parameters:

task (dict) – Task dictionary as returned by TaskManager.get_task().

Returns:

Dictionary with features.

Return type:

dict

Code author: Rasmus Handberg <rasmush@phys.au.dk>

parse_labels(labels)

Convert an iterator of labels into a full NumPy array with only one label per star.

TODO: How do we handle multiple labels better?
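
One way to picture the "one label per star" reduction is to keep the first label of each star. This is a simplified sketch in plain Python: the function name first_label_per_star, the tuple-of-strings input and the keep-first rule are all assumptions made for illustration, and the actual method returns a NumPy array rather than a list.

```python
def first_label_per_star(labels):
    """Reduce an iterable of per-star label tuples to one label per star.

    Hypothetical helper: keeps the first label listed for each star.
    """
    return [star_labels[0] for star_labels in labels]


# Made-up labels; the second star carries two labels.
labels = iter([("class_a",), ("class_b", "class_c"), ("class_a",)])
single = first_label_per_star(labels)
```

Any rule that discards labels loses information for multi-labelled stars, which is the concern the TODO raises.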

save(outfile)[source]

Save the classifier object with pickle.
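
Since save() is documented as using pickle, the save/load pair can be sketched as a plain pickle round trip. The example below uses a made-up dictionary as a stand-in for the classifier state and writes to a temporary file. It does not reproduce what starclass actually stores.

```python
import os
import pickle
import tempfile


def save(obj, outfile):
    # Serialize the object to disk with pickle.
    with open(outfile, "wb") as fid:
        pickle.dump(obj, fid)


def load(infile):
    # Restore a previously pickled object.
    with open(infile, "rb") as fid:
        return pickle.load(fid)


# Hypothetical classifier state, for illustration only.
state = {"n_estimators": 1000, "min_samples_split": 2}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy_classifier.pickle")
    save(state, path)
    restored = load(path)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.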

test(tset, save=None, feature_importance=False)

Test the classifier using a training set that was created with a test fraction.

Parameters:
  • tset (TrainingSet) – Training-set to run testing on.

  • save (callable, optional) – Function to call for saving test-predictions.

test_complete(tset=None, features=None, probs=None, diagnostics=None)

Callback invoked when testing finishes.

Parameters:
  • tset

  • features

  • probs

  • diagnostics

Code author: Rasmus Handberg <rasmush@phys.au.dk>

train(tset, savecl=True, recalc=False, overwrite=False)[source]

Train the classifier.

Parameters:
  • labels (ndarray, [n_objects]) – Labels for the training-set lightcurves.

  • features (iterable of dict) – Features, including lightcurves.

  • savecl (bool) – Whether to save the classifier. For an existing classifier to be overwritten, overwrite or recalc must be True.

  • overwrite (bool) – Rerun the SOM.

  • recalc (bool) – Recalculate features.
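
The note on savecl can be read as the following rule: a classifier file is written when none exists yet, while an existing one is replaced only if overwrite or recalc is set. The helper below, should_save_classifier, is hypothetical and encodes that reading of the parameter description, not the actual starclass logic.

```python
def should_save_classifier(savecl, file_exists, overwrite=False, recalc=False):
    """Decide whether to write the classifier file (hypothetical helper)."""
    if not savecl:
        # Saving was not requested at all.
        return False
    if file_exists:
        # An old classifier is only replaced on explicit request.
        return overwrite or recalc
    return True
```

For instance, should_save_classifier(True, file_exists=True) is False under this reading, whereas passing overwrite=True or recalc=True makes it True.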

property classifier_model

property random_seed

Random seed used in derived classifiers.

property random_state

Random state (numpy.random.RandomState) corresponding to random_seed.