SMILE documentation

Introduction

The SMILE package implements the Linear Genetic Programming (LGP) algorithm in Python with a scikit-learn style API. LGP is a genetic programming paradigm that represents each program as a linear sequence of instructions. A population of diverse candidate models is initialized randomly, and its prediction accuracy improves gradually over a number of generations, each evaluated on a randomly sampled training set. After evolution, the model with the highest fitness score (i.e. accuracy on the randomly sampled training set) is returned as the output.

The package retains the familiar scikit-learn fit/predict API and works with existing scikit-learn modules (e.g. grid search).
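
To make the idea of "linearly sequenced instructions" concrete, here is a minimal sketch of how such a program could be executed over a set of registers. The instruction tuple format, the `OPS` table, and `run_program` are hypothetical names chosen for illustration; they are not taken from the package.

```python
import operator

# Hypothetical instruction format: (operator symbol, destination, source 1, source 2),
# where the last three entries are indices into the register list.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def run_program(program, registers):
    """Execute the instructions in order and return the final register contents."""
    regs = list(registers)  # copy so the caller's registers stay unchanged
    for op, dst, a, b in program:
        regs[dst] = OPS[op](regs[a], regs[b])
    return regs


# r0 is a calculation register, r1 and r2 hold two feature values, r3 holds a constant.
registers = [0.0, 2.0, 3.0, 5.0]
program = [
    ("+", 0, 1, 2),  # r0 = r1 + r2 = 5.0
    ("*", 0, 0, 3),  # r0 = r0 * r3 = 25.0
    ("-", 0, 0, 1),  # r0 = r0 - r1 = 23.0
]
result = run_program(program, registers)
print(result[0])  # 23.0
```

Evolution then amounts to changing the instruction list (adding, deleting, or altering instructions) and keeping the programs whose output classifies the training samples best.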

LGP API

class linear_genetic_programming.lgp_classifier.LGPClassifier(numberOfInput, numberOfOperation=5, numberOfVariable=4, numberOfConstant=9, max_prog_ini_length=30, min_prog_ini_length=10, maxProgLength=300, minProgLength=10, pCrossover=0.75, pConst=0.5, pInsert=0.5, pRegmut=0.6, pMacro=0.75, pMicro=0.5, tournamentSize=2, maxGeneration=200, fitnessThreshold=1.0, populationSize=1000, showGenerationStat=True, isRandomSampling=True, constInitRange=(1, 11, 1), randomState=None, testingAccuracy=-1, validationScores=None, names=None)

Linear Genetic Programming algorithm with a scikit-learn inspired API.

Parameters

numberOfInput: integer, required: Number of features; can be obtained with X.shape[1].
numberOfOperation: integer, optional (default=5): Number of operations. Operations consist of arithmetic (+, -, *, /, ^) and branches (if less, if more).
numberOfVariable: integer, optional (default=4): Number of additional registers used to aid the calculations performed by a program. It should be at least half the number of features.
numberOfConstant: integer, optional (default=9): Number of constants in the register. Constants are stored in write-protected registers, which are initialized only once, at the beginning, with values from constInitRange.
max_prog_ini_length: integer, optional (default=30): Maximum program length at initialization.
min_prog_ini_length: integer, optional (default=10): Minimum program length at initialization.
maxProgLength: integer, optional (default=300): Maximum program length allowed during evolution.
minProgLength: integer, optional (default=10): Minimum program length required during evolution.
pCrossover: float, optional (default=0.75): Probability of exchanging genetic information between two parent programs.
pConst: float, optional (default=0.5): Probability of using a constant. During instruction initialization, it controls whether a register will be a constant. During micromutation, it controls whether a register will mutate to a constant.
pInsert: float, optional (default=0.5): Probability of insertion in macromutation. An insertion adds a random instruction to the program.
pRegmut: float, optional (default=0.6): Probability of register mutation in micromutation. A register mutation changes either register1, register2 or the return register.
pMacro: float, optional (default=0.75): Probability of macromutation. Macromutation operates on the level of the program: it adds or deletes an instruction and therefore affects program size.
pMicro: float, optional (default=0.5): Probability of micromutation. Micromutation operates on the level of instruction components (micro level) and manipulates registers, operators, and constants.
tournamentSize: integer, optional (default=2): The size of tournament selection, i.e. the number of programs that compete to become part of the next generation.
maxGeneration: integer, optional (default=200): The number of generations to evolve.
fitnessThreshold: float, optional (default=1.0): When not using random sampling, terminate the evolution once the threshold is met. When using random sampling, fitnessThreshold has no effect.
populationSize: integer, optional (default=1000): Size of the population.
showGenerationStat: boolean, optional (default=True): If True, print statistics for each generation. Set to False to save time, since some of the average statistics are time-consuming to compute.
isRandomSampling: boolean, optional (default=True): Train the genetic algorithm on a randomly sampled dataset (sampled without replacement).
constInitRange: tuple (start, stop, step), optional (default=(1, 11, 1)): Initialization of the constant set, using the half-open range [start, stop).
randomState: int, optional (default=None): Controls the randomness of the algorithm.
testingAccuracy: int, optional (default=-1): Used to save the testing set accuracy score.
validationScores: dict, optional (default=None): Used to hold validation metrics during a run.
names: list, optional (default=None): Feature names of the dataset.
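
As an illustration of what tournamentSize controls, the following is a minimal sketch of tournament selection in general. The function `tournament_select` and the program names are hypothetical, and drawing contestants without replacement is an assumption; the package's own selection routine may differ in detail.

```python
import random


def tournament_select(population, fitness, tournament_size, rng):
    """Return the fittest of `tournament_size` programs drawn at random."""
    # Assumption: contestants are drawn without replacement.
    contestants = rng.sample(range(len(population)), tournament_size)
    winner = max(contestants, key=lambda i: fitness[i])
    return population[winner]


rng = random.Random(0)
population = ["prog_a", "prog_b", "prog_c", "prog_d"]  # stand-ins for programs
fitness = [0.55, 0.90, 0.70, 0.60]                     # e.g. training accuracy

# With tournament_size=2, a weaker program can still win if it avoids the best one.
winner = tournament_select(population, fitness, tournament_size=2, rng=rng)

# With tournament_size equal to the population size, the best program always wins.
everyone = tournament_select(population, fitness, tournament_size=4, rng=rng)
print(winner, everyone)
```

A larger tournament increases selection pressure, while a small one (such as the default of 2) keeps more diversity in the population.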

Attributes

register_: array of shape (numberOfInput + numberOfVariable + numberOfConstant, ): Register stores the calculation variables, feature values and constants.
bestProg_: class Program: The best program, a list of Instructions used for the classification calculation
bestEffProg_: Best program with structural introns and semantic introns removed
bestProFitness_: float: Training set accuracy score of the best program
bestProgStr_: str: String representation of the best program
bestEffProgStr_: str: String representation of the best program with introns removed
populationAvg_: float: Average fitness of the final generation
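
The shape given for register_ can be pictured with a short sketch. The ordering of the three segments, the initial value of the calculation registers, and sampling constants with replacement are all assumptions made for illustration; `build_registers` is a hypothetical helper, not part of the package.

```python
import random


def build_registers(features, number_of_variable, number_of_constant,
                    const_init_range, rng):
    """Lay out feature, calculation, and constant registers in one flat list."""
    start, stop, step = const_init_range
    pool = list(range(start, stop, step))  # half-open range: stop is excluded
    # Assumption: each constant is drawn independently from the pool.
    constants = [float(rng.choice(pool)) for _ in range(number_of_constant)]
    variables = [0.0] * number_of_variable  # assumption: start at zero
    return [float(x) for x in features] + variables + constants


rng = random.Random(42)
features = [5.1, 3.5, 1.4]  # one sample with three features
registers = build_registers(features, 4, 9, (1, 11, 1), rng)
print(len(registers))  # 3 + 4 + 9 = 16
```

With the default constInitRange of (1, 11, 1), every constant lies between 1 and 10 inclusive.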

Methods

`fit`(self, X, y)	Fit the Genetic Program according to X, y.
`get_params`(self[, deep])	Get parameters for this estimator.
`load_model`([fname, mode])	Load an LGP object from a pickle file.
`load_model_directly`(pickle_file_input)	Load an LGP object from a byte stream, e.g. a file read on a website.
`predict`(self, X)	Predict using the best fit genetic model.
`predict_proba`(self, X)	Probability estimates.
`save_model`(self[, fname, mode])	Save the current object into a pickle file.
`score`(self, X, y[, sample_weight])	Return the mean accuracy on the given test data and labels.
`set_params`(self, **params)	Set the parameters of this estimator.

fit(self, X, y)

Fit the Genetic Program according to X, y.

Parameters

X: array-like, shape = [n_samples, n_features]: Training vectors, where n_samples is the number of samples and n_features is the number of features.
y: array-like, shape = [n_samples]: Target values.

Returns

self: object: Returns self, which holds the best program for classification.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters

deep: bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: mapping of string to any: Parameter names mapped to their values.

classmethod load_model(fname='lgp.pkl', mode='rb')

Load an LGP object from a pickle file. The file is assumed to be in the same directory.

Parameters

fname: string (default='lgp.pkl'): Name of the pickle file to load
mode: string (default='rb'): Mode in which the file is opened

Returns

lgp: LGPClassifier generator: generator

classmethod load_model_directly(pickle_file_input)

Load an LGP object from a byte stream, e.g. a file read on a website.

Parameters

pickle_file_input: byte stream: BytesIO input

Returns

lgp: LGPClassifier generator: generator
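
The byte-stream input can be pictured with a small round trip through `io.BytesIO` using only the standard library. The dictionary below is a hypothetical stand-in for a fitted classifier; the sketch shows the general pickle mechanism and makes no claim about how load_model_directly is implemented.

```python
import io
import pickle

# Hypothetical stand-in for a fitted model object.
model_stand_in = {"bestProgStr_": "r0 = r1 + r2", "populationAvg_": 0.87}

# Serialize into an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
pickle.dump(model_stand_in, buffer)

# Rewind to the start before reading, as a freshly uploaded stream would be.
buffer.seek(0)
restored = pickle.load(buffer)
print(restored == model_stand_in)  # True
```

A stream that has already been read must be rewound with `seek(0)` before it is unpickled; otherwise `pickle.load` starts at the end of the data.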

predict(self, X)

Predict using the best fit genetic model.

Parameters

X: array-like or sparse matrix, shape (n_samples, n_features): Samples.

Returns

C: array, shape (n_samples,): Returns predicted values.

predict_proba(self, X)

Probability estimates. The returned estimates for all classes are ordered by the label of classes.

Parameters

X: array-like of shape (n_samples, n_features): Vector to be scored, where n_samples is the number of samples and n_features is the number of features.

Returns

T: array-like of shape (n_samples, n_classes): Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
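
Because the columns follow the order of self.classes_, a probability matrix can be mapped back to labels by taking the index of the largest value in each row. The sketch below uses plain lists with made-up class names and probabilities; `labels_from_proba` is a hypothetical helper.

```python
def labels_from_proba(proba, classes):
    """Pick, for every row, the class whose column has the highest probability."""
    labels = []
    for row in proba:
        best_column = max(range(len(row)), key=lambda j: row[j])
        labels.append(classes[best_column])
    return labels


classes = ["setosa", "versicolor", "virginica"]  # stand-in for self.classes_
proba = [
    [0.7, 0.2, 0.1],  # column 0 is largest
    [0.1, 0.3, 0.6],  # column 2 is largest
]
predicted = labels_from_proba(proba, classes)
print(predicted)  # ['setosa', 'virginica']
```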

save_model(self, fname='lgp.pkl', mode='ab')

Save the current object into a pickle file. The file is assumed to be in the same directory.

Parameters

fname: string (default='lgp.pkl'): Name of the output file
mode: string (default='ab'): Mode in which the file is opened

Returns

True: if the object was saved successfully
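
The default mode of 'ab' opens the file for appending. The sketch below shows what that means for pickle files in general, using the standard library and strings as hypothetical stand-ins for models: appended dumps accumulate in one file, a single `pickle.load` returns only the first of them, and 'wb' replaces the file contents. It does not describe how load_model treats such a file.

```python
import os
import pickle
import tempfile


def dump(obj, path, mode):
    """Pickle `obj` to `path`, opening the file with the given mode."""
    with open(path, mode) as fh:
        pickle.dump(obj, fh)


def load_all(path):
    """Read every pickled object stored back to back in one file."""
    objects = []
    with open(path, "rb") as fh:
        while True:
            try:
                objects.append(pickle.load(fh))
            except EOFError:  # raised once the end of the file is reached
                break
    return objects


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lgp.pkl")
    dump("first model", path, "ab")
    dump("second model", path, "ab")   # appended after the first object
    appended = load_all(path)
    with open(path, "rb") as fh:
        first_only = pickle.load(fh)   # a single load returns the oldest object
    dump("third model", path, "wb")    # 'wb' truncates the file first
    overwritten = load_all(path)

print(appended, first_only, overwritten)
```

When the same file name is reused across runs, passing mode='wb' is one way to keep only the most recent object in the file.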

score(self, X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires the entire label set of each sample to be predicted correctly.

Parameters

X: array-like of shape (n_samples, n_features): Test samples.
y: array-like of shape (n_samples,) or (n_samples, n_outputs): True labels for X.
sample_weight: array-like of shape (n_samples,), default=None: Sample weights.

Returns

score: float: Mean accuracy of self.predict(X) with respect to y.
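
Subset accuracy counts a sample as correct only when every one of its labels matches. A minimal sketch of the metric on made-up multi-label data, ignoring sample weights, might look like this; `subset_accuracy` is a hypothetical helper name.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose whole label set is predicted exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]

# The second sample has one wrong label out of three, so it counts as a full miss.
score = subset_accuracy(y_true, y_pred)
print(score)  # 0.75
```

Eleven of the twelve individual labels are correct here, yet the score is 0.75, which is why the metric is described as harsh.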

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

**params: dict: Estimator parameters.

Returns

self: object: Estimator instance.
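
The <component>__<parameter> naming can be illustrated by splitting keys on the first double underscore. This is a simplified sketch of the convention, not the scikit-learn implementation; `split_params` and the component name `clf` are hypothetical.

```python
def split_params(params):
    """Separate an estimator's own parameters from those of nested components."""
    own, nested = {}, {}
    for key, value in params.items():
        name, sep, sub_key = key.partition("__")
        if sep:  # key looked like <component>__<parameter>
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_params({
    "populationSize": 500,       # parameter of the estimator itself
    "clf__pMacro": 0.6,          # parameter of a nested component named "clf"
    "clf__tournamentSize": 3,
})
print(own)     # {'populationSize': 500}
print(nested)  # {'clf': {'pMacro': 0.6, 'tournamentSize': 3}}
```

This naming is what lets tools such as grid search address a parameter of a classifier that sits inside a pipeline step.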
