RFGCClassifier (starclass.RFGCClassifier)
- class starclass.RFGCClassifier(clfile='rfgc_classifier_v01.pickle', somfile='rfgc_som.txt', dimx=1, dimy=400, cardinality=64, n_estimators=1000, max_features=4, min_samples_split=2, *args, **kwargs)
Bases: BaseClassifier
General random forest classifier.
Code author: David Armstrong <d.j.armstrong@warwick.ac.uk>
- __init__(clfile='rfgc_classifier_v01.pickle', somfile='rfgc_som.txt', dimx=1, dimy=400, cardinality=64, n_estimators=1000, max_features=4, min_samples_split=2, *args, **kwargs)
Initialize the classifier object.
- Parameters:
clfile (str) – Filepath to previously pickled Classifier_obj.
somfile (str) – Filepath to trained SOM saved using fc.kohonenSave.
featfile (str) – Filepath to pre-calculated features, if available.
dimx (int) – First dimension of the SOM in somfile, if given.
dimy (int) – Second dimension of the SOM in somfile, if given.
cardinality (int) – Number of bins per SOM pixel in somfile, if given.
n_estimators (int) – Number of trees in the forest.
max_features (int) – See sklearn.ensemble.RandomForestClassifier.
min_samples_split (int) – See sklearn.ensemble.RandomForestClassifier.
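The forest parameters share their names with scikit-learn's. A minimal sketch, assuming RFGCClassifier forwards n_estimators, max_features and min_samples_split unchanged to sklearn.ensemble.RandomForestClassifier (the forwarding is an assumption, and make_forest is a hypothetical helper):

```python
from sklearn.ensemble import RandomForestClassifier


def make_forest(n_estimators=1000, max_features=4, min_samples_split=2, random_state=None):
    """Hypothetical helper: build a forest using the RFGCClassifier defaults."""
    # Each keyword below is a real RandomForestClassifier parameter.
    return RandomForestClassifier(
        n_estimators=n_estimators,            # number of trees in the forest
        max_features=max_features,            # features considered at each split
        min_samples_split=min_samples_split,  # minimum samples needed to split a node
        random_state=random_state,
    )


forest = make_forest(random_state=42)
print(forest.get_params()["n_estimators"])  # 1000
```

In scikit-learn an integer max_features is an absolute count of features tried at each split, not a fraction.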
- classify(task)
Classify a star from the lightcurve and other features.
Will run the do_classify() method, check some of the output and calculate various performance metrics.
- Parameters:
features (dict) – Dictionary of features, including the lightcurve itself.
- Returns:
Dictionary of classifications
- Return type:
dict
See also
Code author: Rasmus Handberg <rasmush@phys.au.dk>
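The split between classify() and do_classify() follows a template-method pattern: the base class wraps the subclass's classification step. The sketch below illustrates that pattern with hypothetical classes and class names; the specific checks (the result is a dict, every probability lies in [0, 1]) are assumptions, since this page only says that some of the output is checked.

```python
class BaseClassifierSketch:
    """Hypothetical stand-in for BaseClassifier, showing the classify/do_classify split."""

    def do_classify(self, features):
        # Subclasses such as RFGCClassifier supply the actual classification.
        raise NotImplementedError

    def classify(self, features):
        # Delegate to the subclass, then sanity-check what came back (assumed checks).
        result = self.do_classify(features)
        if not isinstance(result, dict):
            raise TypeError("do_classify must return a dict")
        for label, prob in result.items():
            if not 0.0 <= prob <= 1.0:
                raise ValueError(f"probability for {label!r} outside [0, 1]: {prob}")
        return result


class ConstantClassifier(BaseClassifierSketch):
    """Toy subclass returning fixed probabilities, for illustration only."""

    def do_classify(self, features):
        return {"class_a": 0.7, "class_b": 0.3}


print(ConstantClassifier().classify({"lightcurve": None}))  # {'class_a': 0.7, 'class_b': 0.3}
```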
- close()
Close the classifier.
- do_classify(features, recalc=False)
Classify a single lightcurve.
- Parameters:
features (dict) – Dictionary of features.
- Returns:
Dictionary of stellar classifications.
- Return type:
dict
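Since the classifier is a random forest, the returned dictionary presumably pairs each class with a probability. A sketch of that conversion, assuming scikit-learn-style classes_ and predict_proba output; probabilities_to_dict and the class names are hypothetical:

```python
import numpy as np


def probabilities_to_dict(classes, proba_row):
    """Hypothetical helper: pair each class label with its predicted probability.

    `classes` mimics the `classes_` attribute and `proba_row` one row of
    `predict_proba` output from a fitted scikit-learn classifier.
    """
    proba_row = np.asarray(proba_row, dtype=float)
    if len(classes) != proba_row.size:
        raise ValueError("need exactly one probability per class")
    return {label: float(p) for label, p in zip(classes, proba_row)}


result = probabilities_to_dict(["class_a", "class_b", "class_c"], [0.1, 0.6, 0.3])
print(max(result, key=result.get))  # class_b
```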
- featcalc(features, total=None, cardinality=64, linflatten=False, recalc=False)
Calculates the classifier features for the given set of features.
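The cardinality argument matches the number of bins per SOM pixel, which suggests the SOM works on fixed-length vectors. A minimal sketch, assuming each light curve is phase-folded, averaged into `cardinality` phase bins, scaled to [0, 1] and mapped to its best-matching SOM pixel; the binning and scaling choices are assumptions, the helper names are hypothetical, and the random weights stand in for a trained SOM loaded from somfile.

```python
import numpy as np


def bin_phase_curve(phase, flux, cardinality=64):
    """Average flux in `cardinality` equal-width phase bins over [0, 1)."""
    bins = np.minimum((np.asarray(phase) * cardinality).astype(int), cardinality - 1)
    sums = np.bincount(bins, weights=flux, minlength=cardinality)
    counts = np.bincount(bins, minlength=cardinality)
    # Empty bins are left at 0 here; a real pipeline might interpolate instead.
    return np.divide(sums, counts, out=np.zeros(cardinality), where=counts > 0)


def best_matching_unit(som_weights, vector):
    """Return the (x, y) index of the SOM pixel closest to `vector`."""
    # som_weights has shape (dimx, dimy, cardinality).
    dists = np.linalg.norm(som_weights - vector, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)


rng = np.random.default_rng(0)
phase = rng.uniform(0, 1, 500)
flux = np.sin(2 * np.pi * phase)
vec = bin_phase_curve(phase, flux)
vec = (vec - vec.min()) / (vec.max() - vec.min())  # scale to [0, 1]
som = rng.uniform(0, 1, size=(1, 400, 64))          # placeholder for a trained SOM
print(best_matching_unit(som, vec))
```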
- feature_importance_complete(tset=None, features=None, probs=None, diagnostics=None)
Called when the feature-importance calculation finishes.
- Parameters:
tset
features
probs
diagnostics
See also
Code author: Rasmus Handberg <rasmush@phys.au.dk>
- load(infile, somfile=None)
Loads the classifier object.
somfile MUST match the SOM used to train the classifier.
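A shape check is a cheap first guard against loading the wrong SOM. This sketch assumes the SOM is held as a NumPy array of shape (dimx, dimy, cardinality); check_som_shape is hypothetical, and a SOM with the right shape but different training would still pass, so it cannot replace keeping the classifier and SOM files together.

```python
import numpy as np


def check_som_shape(som_weights, dimx=1, dimy=400, cardinality=64):
    """Hypothetical guard: confirm a loaded SOM has the shape the classifier expects."""
    expected = (dimx, dimy, cardinality)
    if np.shape(som_weights) != expected:
        raise ValueError(f"SOM shape {np.shape(som_weights)} does not match expected {expected}")
    return som_weights


check_som_shape(np.zeros((1, 400, 64)))  # passes silently with the default dimensions
```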
- load_star(task)
Receive a task from the TaskManager, loads the lightcurve and returns derived features.
- Parameters:
task (dict) – Task dictionary as returned by TaskManager.get_task().
- Returns:
Dictionary with features.
- Return type:
dict
See also
Code author: Rasmus Handberg <rasmush@phys.au.dk>
- parse_labels(labels)
Convert an iterator of labels into a full numpy array, with only one label per star.
TODO: How do we handle multiple labels better?
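One simple policy for the multiple-label case raised in the TODO is to keep the first label for each star. The sketch below assumes each element is either a single label or a list/tuple of labels; parse_labels_sketch is hypothetical and the real method's rule is not stated on this page.

```python
import numpy as np


def parse_labels_sketch(labels):
    """Keep the first label of each star, returning a 1-D numpy array.

    Assumes each element of `labels` is either a single label or a
    sequence of labels (list or tuple) for one star.
    """
    first = []
    for lbl in labels:
        if isinstance(lbl, (list, tuple)):
            first.append(lbl[0])  # multiple labels: keep only the first
        else:
            first.append(lbl)
    return np.array(first)


print(parse_labels_sketch([("a", "b"), "c", ["d"]]))  # ['a' 'c' 'd']
```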
- save(outfile, somoutfile='som.txt')
Saves the classifier object with pickle.
The SOM is saved as well (to somoutfile), as it MUST be the one used to train the classifier.
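Keeping the classifier and its SOM together can be sketched as a save/load round trip. This assumes the model is picklable and stores the SOM as plain text via numpy.savetxt; the actual SOM file is written by fc.kohonenSave, whose format is not described here, and save_sketch/load_sketch are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np


def save_sketch(model, som_weights, outfile, somoutfile):
    """Pickle the model and write the SOM it was trained with next to it."""
    with open(outfile, "wb") as fh:
        pickle.dump(model, fh)
    # Flatten to 2-D so np.savetxt can write it as plain text.
    np.savetxt(somoutfile, som_weights.reshape(-1, som_weights.shape[-1]))


def load_sketch(infile, somfile, dimx, dimy):
    """Load the pickled model and restore the SOM to (dimx, dimy, cardinality)."""
    with open(infile, "rb") as fh:
        model = pickle.load(fh)
    som = np.loadtxt(somfile, ndmin=2)
    return model, som.reshape(dimx, dimy, -1)


with tempfile.TemporaryDirectory() as tmp:
    som = np.random.default_rng(1).uniform(size=(1, 400, 64))
    clf_path, som_path = os.path.join(tmp, "clf.pickle"), os.path.join(tmp, "som.txt")
    save_sketch({"trees": 1000}, som, clf_path, som_path)
    model, som_back = load_sketch(clf_path, som_path, 1, 400)
    print(model, np.allclose(som, som_back))  # {'trees': 1000} True
```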
- test(tset, save=None, feature_importance=False)
Test the classifier using a training set that was created with a test fraction.
- Parameters:
tset (TrainingSet) – Training set to run testing on.
save (callable, optional) – Function to call for saving test predictions.
- test_complete(tset=None, features=None, probs=None, diagnostics=None)
Called when testing finishes.
- Parameters:
tset
features
probs
diagnostics
See also
Code author: Rasmus Handberg <rasmush@phys.au.dk>
- train(tset, savecl=True, recalc=False, overwrite=False)
Train the classifier.
- Parameters:
tset (TrainingSet) – Labels for training-set lightcurves.
features (iterable of dict) – Features, including lightcurves.
savecl (bool, optional) – Save classifier? (overwrite or recalc must be true for an old classifier to be overwritten.)
overwrite (bool, optional) – Reruns SOM.
recalc (bool, optional) – Recalculates features.
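The overwrite rule for savecl can be expressed as a small predicate. This is a sketch of the documented rule only; should_save is hypothetical and says nothing about the rest of what train() does.

```python
import os
import tempfile


def should_save(clfile, savecl=True, overwrite=False, recalc=False):
    """Decide whether the classifier file would be written (hypothetical helper).

    Mirrors the documented rule: an existing classifier file is replaced
    only when overwrite or recalc is True.
    """
    if not savecl:
        return False
    if os.path.exists(clfile):
        return overwrite or recalc
    return True


with tempfile.NamedTemporaryFile() as fh:
    print(should_save(fh.name))               # False: file exists and no flag is set
    print(should_save(fh.name, recalc=True))  # True
```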
- property classifier_model
- property random_seed
Random seed used in derived classifiers.
- property random_state
Random state (numpy.random.RandomState) corresponding to random_seed.
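Seeding a numpy.random.RandomState is what makes repeated runs reproducible. A minimal sketch with a made-up seed value (2187 is hypothetical); scikit-learn estimators also accept a RandomState instance as their random_state argument.

```python
import numpy as np

random_seed = 2187  # hypothetical seed value
random_state = np.random.RandomState(random_seed)

# Two RandomState objects built from the same seed produce identical draws,
# which is what makes seeded classifier training reproducible.
a = np.random.RandomState(random_seed).uniform(size=3)
b = np.random.RandomState(random_seed).uniform(size=3)
print(np.array_equal(a, b))  # True
```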