RFGCClassifier (starclass.RFGCClassifier)

class starclass.RFGCClassifier(clfile='rfgc_classifier_v01.pickle', somfile='rfgc_som.txt', dimx=1, dimy=400, cardinality=64, n_estimators=1000, max_features=4, min_samples_split=2, *args, **kwargs)[source]

Bases: BaseClassifier

General Random Forest classifier.

Code author: David Armstrong <d.j.armstrong@warwick.ac.uk>

__init__(clfile='rfgc_classifier_v01.pickle', somfile='rfgc_som.txt', dimx=1, dimy=400, cardinality=64, n_estimators=1000, max_features=4, min_samples_split=2, *args, **kwargs)[source]

Initialize the classifier object.

Parameters:
  • clfile (str) – Filepath to previously pickled Classifier_obj.

  • somfile (str) – Filepath to trained SOM saved using fc.kohonenSave.

  • featfile (str) – Filepath to pre-calculated features, if available.

  • dimx (int) – First dimension of the SOM in somfile, if given.

  • dimy (int) – Second dimension of the SOM in somfile, if given.

  • cardinality (int) – Number of bins per SOM pixel in somfile, if given.

  • n_estimators (int) – Number of trees in the forest.

  • max_features (int) – See sklearn.ensemble.RandomForestClassifier.

  • min_samples_split (int) – See sklearn.ensemble.RandomForestClassifier.
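
To show how the SOM-related arguments relate to each other, the sketch below collects the documented defaults in a plain dataclass and computes the number of SOM weights they would imply. RFGCSettings and som_size are hypothetical names used only for this example. The assumption that the SOM holds dimx × dimy pixels with cardinality bins each is inferred from the parameter descriptions above, not from the implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RFGCSettings:
    """Hypothetical container mirroring the documented constructor defaults."""
    clfile: str = 'rfgc_classifier_v01.pickle'
    somfile: str = 'rfgc_som.txt'
    dimx: int = 1
    dimy: int = 400
    cardinality: int = 64
    n_estimators: int = 1000
    max_features: int = 4
    min_samples_split: int = 2

    def som_size(self) -> int:
        # Assumption: one weight per bin, `cardinality` bins per SOM pixel.
        return self.dimx * self.dimy * self.cardinality


defaults = RFGCSettings()
som_weights = defaults.som_size()
```

If the assumption holds, changing dimx, dimy or cardinality changes the expected SOM size, which would explain why these values have to match the SOM stored in somfile.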

classify(task)

Classify a star from the lightcurve and other features.

Runs the do_classify() method, checks some of the output and calculates various performance metrics.

Parameters:

features (dict) – Dictionary of features, including the lightcurve itself.

Returns:

Dictionary of classifications

Return type:

dict

Code author: Rasmus Handberg <rasmush@phys.au.dk>
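
The relationship between classify() and do_classify() described above can be pictured as a thin wrapper around the classifier-specific method. The following is a minimal sketch only: checked_classify and dummy_do_classify are hypothetical, and both the checks and the timing metric are assumptions about what such a wrapper might do, not the checks starclass performs.

```python
import time


def checked_classify(do_classify, features):
    """Run a do_classify-style callable and sanity-check its result.

    Hypothetical helper: the checks below are assumptions, not the
    checks performed by the library.
    """
    start = time.perf_counter()
    result = do_classify(features)
    elapsed = time.perf_counter() - start

    if not isinstance(result, dict):
        raise TypeError('classifier must return a dict of classifications')
    for label, prob in result.items():
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f'probability for {label!r} outside [0, 1]: {prob}')
    return result, elapsed


def dummy_do_classify(features):
    # Stand-in for a real classifier; returns fixed probabilities.
    return {'class_a': 0.8, 'class_b': 0.2}


result, elapsed = checked_classify(dummy_do_classify, {'lightcurve': None})
```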

close()

Close the classifier.

do_classify(features, recalc=False)[source]

Classify a single lightcurve.

Parameters:

features (dict) – Dictionary of features.

Returns:

Dictionary of stellar classifications.

Return type:

dict

featcalc(features, total=None, cardinality=64, linflatten=False, recalc=False)[source]

Calculates derived features for the given set of features.

feature_importance_complete(tset=None, features=None, probs=None, diagnostics=None)

Function which will be called when feature importance is finishing.

Parameters:
  • tset

  • features

  • probs

  • diagnostics

Code author: Rasmus Handberg <rasmush@phys.au.dk>

load(infile, somfile=None)[source]

Loads classifier object.

somfile MUST match the som used to train the classifier.

load_star(task)

Receives a task from the TaskManager, loads the lightcurve and returns derived features.

Parameters:

task (dict) – Task dictionary as returned by TaskManager.get_task().

Returns:

Dictionary with features.

Return type:

dict

Code author: Rasmus Handberg <rasmush@phys.au.dk>

loadsom(somfile)[source]

Loads a SOM, if not done at init.

parse_labels(labels)

Convert iterator of labels into full numpy array, with only one label per star.

TODO: How do we handle multiple labels better?
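
One simple way to reduce several labels to a single label per star is to keep the first one listed. The sketch below does that with plain Python lists instead of a numpy array, so it runs without third-party packages. first_label_per_star is a hypothetical helper, the label names are placeholders, and the "first label wins" rule is an assumption rather than the documented behaviour.

```python
def first_label_per_star(labels):
    """Reduce an iterable of per-star label collections to one label each.

    Assumption: the first label listed for a star is the one kept.
    """
    chosen = []
    for star_labels in labels:
        star_labels = tuple(star_labels)
        if not star_labels:
            raise ValueError('every star needs at least one label')
        chosen.append(star_labels[0])
    return chosen


# Placeholder labels; the second star carries two labels.
labels = iter([('A',), ('B', 'C'), ('C',)])
flat = first_label_per_star(labels)
```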

save(outfile, somoutfile='som.txt')[source]

Saves the classifier object with pickle.

The SOM object is saved as well, since it MUST be the one used to train the classifier.
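
The idea of storing the classifier and its SOM side by side, so that they can be loaded together again, can be sketched with the standard library alone. In this example a plain dict stands in for the trained classifier, and the SOM is written as whitespace-separated text. save_pair and load_pair are hypothetical helpers, and the text layout is an assumption; the actual format written by fc.kohonenSave is not described here.

```python
import os
import pickle
import tempfile


def save_pair(clf, som_rows, outfile, somoutfile):
    # Write the classifier with pickle and the SOM as plain text beside it.
    with open(outfile, 'wb') as fid:
        pickle.dump(clf, fid)
    with open(somoutfile, 'w') as fid:
        for row in som_rows:
            fid.write(' '.join(str(v) for v in row) + '\n')


def load_pair(infile, somfile):
    # Read both files back; the SOM must be the one saved with the classifier.
    with open(infile, 'rb') as fid:
        clf = pickle.load(fid)
    with open(somfile) as fid:
        som_rows = [[float(v) for v in line.split()] for line in fid]
    return clf, som_rows


with tempfile.TemporaryDirectory() as tmpdir:
    clfile = os.path.join(tmpdir, 'classifier.pickle')
    somfile = os.path.join(tmpdir, 'som.txt')
    stand_in = {'n_estimators': 1000}  # placeholder for a trained classifier
    save_pair(stand_in, [[0.1, 0.2], [0.3, 0.4]], clfile, somfile)
    restored, som = load_pair(clfile, somfile)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.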

test(tset, save=None, feature_importance=False)

Test classifier using training-set, which has been created with a test-fraction.

Parameters:
  • tset (TrainingSet) – Training-set to run testing on.

  • save (callable, optional) – Function to call for saving test-predictions.

test_complete(tset=None, features=None, probs=None, diagnostics=None)

Function which will be called when testing is finishing.

Parameters:
  • tset

  • features

  • probs

  • diagnostics

Code author: Rasmus Handberg <rasmush@phys.au.dk>

train(tset, savecl=True, recalc=False, overwrite=False)[source]

Train the classifier.

Parameters:
  • tset (TrainingSet) – labels for training set lightcurves.

  • features (iterable of dict) – features, inc lightcurves.

  • savecl (bool, optional) – Save classifier? (overwrite or recalc must be true for an old classifier to be overwritten).

  • overwrite (bool, optional) – Reruns SOM.

  • recalc (bool, optional) – Recalculates features.
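
The interaction between savecl, overwrite and recalc stated above can be written out as a small decision function. should_save_classifier is a hypothetical helper that only encodes the rule as documented; it is not taken from the implementation, and the behaviour when no old classifier exists is an assumption.

```python
def should_save_classifier(savecl, clfile_exists, overwrite=False, recalc=False):
    """Decide whether a trained classifier would be written to disk.

    Hypothetical helper encoding the rule from the `savecl` description.
    """
    if not savecl:
        return False
    if clfile_exists:
        # An old classifier is only replaced when overwrite or recalc is set.
        return overwrite or recalc
    # Assumption: with no old classifier on disk, saving always goes ahead.
    return True


keeps_old = should_save_classifier(savecl=True, clfile_exists=True)
replaces_old = should_save_classifier(savecl=True, clfile_exists=True, recalc=True)
```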

property classifier_model

property random_seed

Random seed used in derived classifiers.

property random_state

Random state (numpy.random.RandomState) corresponding to random_seed.