
# Using Genetic Algorithms To Do Portfolio Optimisation

(@david)
Joined: 4 years ago
Posts: 27
25/10/2019 4:36 pm

In this approach, I will be talking about a problem posted on the Auquan website. Go here to view the problem and code along: quant-quest.auquan.com/competitions/uk-data-science-varsity

If you want more detail on the deap implementation, head to:

#### Introduction

There are many ways to solve optimisation problems and this is by no means the best or easiest way. Evolutionary algorithms are typically used to provide good approximate solutions to problems that cannot be solved easily using other techniques. Given that this problem has some difficult constraints, this is an approach that would be useful in generating an initial answer.

In general, for any optimisation problem where the solution space we are searching isn't easy to understand, or has complex boundaries with many constraints, EAs may provide good results. This does still require that we are able to develop an effective EA representation of that space.

The problem presented in the Global Recruiting Competition (see above) has many constraints, and it's difficult to understand how they affect the solution space, so we will try solving it using an evolutionary algorithm-based approach.

If you are earlier in your data science journey, you may want to read a few articles on EAs before this one.

### Creating a Solution

#### Imports

We first import the required modules for our approach; we will use deap for the EA implementation. (Learn more about deap: Paper | Github)

• `random` gives us a way to generate random numbers;
• `deap` implements the functionality for evolutionary algorithms in Python; we'll use the following submodules:
  • `base` gives us access to the Toolbox and base Fitness;
  • `creator` allows us to create our types;
  • `tools` grants us access to the operators bank;
  • `algorithms` provides some ready-made generic evolutionary loops.

```python
import random
from deap import base, creator, tools, algorithms
```

We are going to try to find the optimal weights for only a single date, but this approach should work for any date in the problem dataset. I am loading the optimisation parameters directly from pickle files.

```python
import pickle

# "ri", "Dt", "St" and "wi" refer to features of the supplied dataset -
# see the problem description for details.
with open("ri.pickle", "rb") as f:  # predicted returns for this date
    ri = pickle.load(f)
with open("Dt.pickle", "rb") as f:  # duration values for assets
    Dt = pickle.load(f)
with open("St.pickle", "rb") as f:  # spread values for assets
    St = pickle.load(f)
with open("wi.pickle", "rb") as f:  # weights in the index for assets
    wi = pickle.load(f)
```

#### Type Creation

The first step with deap is to create the types we are going to need for our algorithm. Usually, the types created are 'the fitness' and 'the individual'. For this particular problem, we want to create a portfolio with the best returns possible. Thus we need a list of real numbers for the weights, and our fitness will be just the net returns (adjusted by penalties for violating constraints).

Type creation is done by calling the function `create` in the creator module. This function takes two mandatory arguments plus additional optional arguments. The first argument is the name of the type we want to create. The second argument is the base class the new type should inherit from. The optional arguments are members to add to the new type. For a detailed understanding of deap types, see the deap documentation.

```python
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)
```

The first line creates a maximising fitness by replacing, in the base type Fitness, the pure virtual weights attribute with (1.0,), which means we maximise a single-objective fitness. The second line creates an Individual class that inherits the properties of list and has a fitness attribute of the type FitnessMax we just created. (We use FitnessMax because we want to maximise returns; if this were a minimisation problem we would use FitnessMin, created with weights=(-1.0,).)
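To see why the sign of the weight flips maximisation into minimisation, here is a plain-Python sketch of the comparison deap performs (a simplified model of what the library does internally, not deap's actual code):

```python
# Simplified sketch of deap's weighted-fitness comparison: raw objective
# values are multiplied element-wise by the weights tuple, and the
# weighted values are always maximised. A weight of -1.0 therefore makes
# deap prefer *smaller* raw values, i.e. minimisation.

def weighted(values, weights):
    return tuple(v * w for v, w in zip(values, weights))

# Maximising with weights=(1.0,): the larger raw value wins.
assert weighted((3.0,), (1.0,)) > weighted((2.0,), (1.0,))

# "Minimising" with weights=(-1.0,): the smaller raw value wins.
assert weighted((2.0,), (-1.0,)) > weighted((3.0,), (-1.0,))
```

This is also why `weights` is a tuple: with several entries you get a multi-objective fitness, each objective maximised or minimised according to its own sign.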

The classes we have created are made available in the creator module. We can directly instantiate objects of each class as follows:

```python
ind = creator.Individual([1, 0, 1, 1, 0])  # create an individual with these weights
print(ind)
print(type(ind))
print(type(ind.fitness))
```

#### Initialising the individual and population

An individual, in the evolutionary-algorithm sense, is just a set of weights, and a population is just a set of multiple individuals. The goal is to find the best individual as the population evolves, where "best" is defined by fitness.

The Toolbox class in deap stores functions together with their arguments for evolving the population. We register a function with two mandatory arguments: the alias to give to the function and the function it will be associated with. Any additional arguments can be supplied when the alias is called.

```python
toolbox = base.Toolbox()
toolbox.register("attr_bool", random.random)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attr_bool, n=len(ri))
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
```

In this last block of code, we created a toolbox object and registered three functions:

• The first one, `attr_bool`, creates a random real number in the interval [0, 1) using the `random` module we imported earlier (despite its name, it produces a float, not a bool).
• The second, `individual`, uses the `initRepeat` function (from deap's tools) to fill an `Individual` with what is produced by repeated calls to the previously registered `attr_bool`; this initialises the weights for a random portfolio.
• The final one, `population`, does the same with a list of individuals, initialising our population.

For example, calling each function individually shows what it produces.

```python
bit = toolbox.attr_bool()
ind = toolbox.individual()
pop = toolbox.population(n=500)

print("bit is of type %s\n" % (type(bit)))
print("ind is of type %s and contains %d bits\n" % (type(ind), len(ind)))
print("pop is of type %s and contains %d individuals\n" % (type(pop), len(pop)))
```

#### Evaluation Function

In genetic algorithms, the evaluation function is where all the magic happens. This is the function used to define the fitness of an individual; this is where we define our objective. In the cell below we define our objective as the returns, and we also apply penalties for violating any constraints (these are defined in the problem copy). In this walkthrough, we have only implemented simple duration and spread constraints; it's up to you to make the rest!

Although the function below doesn't have all the constraints implemented, it should be easy enough for you to add the rest. The good thing about evolutionary algorithms is that we don't have to worry about things like gradient and Hessian calculation, so we can have non-differentiable objective functions, including things like step functions. One example would be applying a very high penalty whenever a particularly important constraint is violated. In this setup, for instance, a penalty of 10000 would be so large that the individual would almost certainly not survive to the next generation. These are sometimes called death penalties.

If you do decide to use an EA approach, your main task is to come up with this evaluation function to make sure the constraints are satisfied at least 80% of the time while maximising returns.
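The death-penalty idea above can be sketched as follows. This is a hedged illustration: the `max_weight` cap is a made-up hard constraint for demonstration, not part of the competition problem.

```python
import numpy as np

DEATH_PENALTY = 10000  # large enough that a violator is effectively eliminated

def eval_with_death_penalty(individual, ri, max_weight=0.5):
    # Illustrative hard constraint: no single asset may exceed max_weight.
    # (This cap is hypothetical; substitute whichever constraint you
    # consider non-negotiable.)
    objective = np.dot(individual, ri)
    if np.max(individual) > max_weight:
        # Step-function penalty: no gradient needed, and the individual
        # almost certainly won't survive tournament selection.
        objective -= DEATH_PENALTY
    return (objective,)
```

Because the penalty is a step function rather than a smooth term, this would be awkward for gradient-based solvers but poses no problem for an EA.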

```python
import numpy as np

weight1 = 10  # penalty weight in our objective function for spread
weight2 = 10  # penalty weight for duration

def evalPortfolio(individual, St=St, wi=wi, ri=ri, Dt=Dt):
    objective = np.dot(individual, ri)  # calculating returns

    # If the spread constraint is violated, subtract a penalty from the
    # fitness value: the larger the violation, the less fit the
    # individual. The values of weight1 and weight2 specify which
    # constraint we value more - for example, keeping weight1 at 10 but
    # raising weight2 to 100 penalises duration violations more heavily.
    if (np.dot(St, individual) / np.dot(St, wi)) - 1 > 0:
        objective -= weight1 * ((np.dot(St, individual) / np.dot(St, wi)) - 1)

    # Same idea for the duration constraint.
    if (np.dot(Dt, individual) / np.dot(Dt, wi)) - 1 > 0:
        objective -= weight2 * ((np.dot(Dt, individual) / np.dot(Dt, wi)) - 1)

    return objective,  # deap fitnesses are tuples
```
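To sanity-check the function before handing it to deap, it can be exercised on small synthetic stand-ins for the pickled inputs (the numbers below are purely illustrative, not from the competition dataset):

```python
import numpy as np

ri = np.array([0.02, 0.05, 0.01])  # stand-in predicted returns
St = np.array([1.0, 2.0, 3.0])     # stand-in spreads
Dt = np.array([4.0, 5.0, 6.0])     # stand-in durations
wi = np.array([0.3, 0.4, 0.3])     # stand-in index weights
weight1 = weight2 = 10

def evalPortfolio(individual, St=St, wi=wi, ri=ri, Dt=Dt):
    objective = np.dot(individual, ri)
    spread_excess = np.dot(St, individual) / np.dot(St, wi) - 1
    if spread_excess > 0:
        objective -= weight1 * spread_excess
    duration_excess = np.dot(Dt, individual) / np.dot(Dt, wi) - 1
    if duration_excess > 0:
        objective -= weight2 * duration_excess
    return (objective,)

# A portfolio equal to the index weights has spread and duration ratios
# of exactly 1, so neither penalty fires: the fitness is just the raw
# predicted return.
print(evalPortfolio(wi))
```

Overweighting the wide-spread, long-duration asset, by contrast, pushes both ratios above 1 and drags the fitness down.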

#### Genetic Operators

These are the operators used to evolve the population. If you want to learn how they work, look at the articles on EAs mentioned above.

Registering the operators and their default arguments in the toolbox is done as follows.

```python
toolbox.register("evaluate", evalPortfolio)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutFlipBit, indpb=0.10)
toolbox.register("select", tools.selTournament, tournsize=3)
```

(If you don’t want to go into details of how the population is evolved, you can skip this section.)

The evaluation function is given the alias `evaluate`. As it doesn't take any arguments other than the individual itself, we don't need to add anything. The two-point crossover function is registered the same way under the alias `mate`; we're just using the predefined function from the deap toolbox.

The mutation, for its part, needs one argument to be fixed: the independent probability `indpb` of each attribute being mutated. Finally, the selection operator is registered under the alias `select` with the tournament size set to 3.

We can, for example, mutate an individual and expect 10% of its attributes to be flipped.

```python
ind = toolbox.individual()
print(ind)
print("\n")
toolbox.mutate(ind)
print(ind)
```

```
[0.5828673696058406, 0.11366978696534202, 0.3375211617962608, 0.3998475660022206, 0.6608856743010275, 0.8803437895548394, 0.0411546208458744, 0.5374129121886653, 0.42636462233221684, 0.857497221679833, 0.44360394141642034, 0.5606787522881802, 0.3215982070032286, 0.514570845181715, 0.3032848172603626, 0.9550253885848993, 0.14695497774781818, 0.7261529437292629, 0.19772593773149327, 0.17496858916582325, 0.058915574639683554, 0.05282789654984488, 0.14229041732505954, 0.47347302527629187, 0.4239547330119139, 0.6261109729805046, 0.03632821035165801, 0.7562922249613215, 0.5981761054657581, 0.8951669524921041, 0.7969022215470213, 0.629130909604167, 0.7895227643622067, 0.05251624082969941, 0.474078511882579, 0.42510387579952913, 0.27176574523503405, 0.4887504917363862, 0.4180443419987684, 0.09724195363682164, 0.6463995607565507, 0.5250955563584663, 0.17999165178792353, 0.7993156155187991, 0.4306168784713159, 0.6271311547539816, 0.28114655910628805, 0.9629001274296468, 0.24255727043218245, 0.6417819388281502, 0.4782451132157759, 0.3992679686646209, 0.5472617220756029, 0.8861454710404537, 0.3033614411604343, 0.5473879486591839, 0.48891161310442965, 0.013477032287178004, 0.5849723466518983, 0.9862835912777634, 0.686561334269392, 0.5684763413298725, 0.747610241443788, 0.7027432225020004, 0.022220118118156607, 0.9049328730056511, 0.11989917307709574, 0.15352517436590252, 0.4443494107520093, 0.5630954514349847, 0.1385205591642642, 0.1501749362990964, 0.3029983430997525, 0.6660829511423043, 0.6281682337982342, 0.8299655681140754, 0.6522269482319345, 0.38467801110472155, 0.7580278230001196, 0.01960680567661144, 0.7210372348101194, 0.7802174779644195, 0.9636268112795919, 0.3704593418242361, 0.18941624475794683, 0.6784929267319626, 0.8883247242902804, 0.5051784786589903, 0.8406932615605996, 0.7631444098278283, 0.828015597902116, 0.44519027205471284, 0.287603157379565, 0.4735382634830282]

vs.

[0.5828673696058406, 0.11366978696534202, 0.3375211617962608, 0.3998475660022206, 0.6608856743010275, 0.8803437895548394, 0.0411546208458744, 0.5374129121886653, 0.42636462233221684, 0.857497221679833, 0.44360394141642034, 0.5606787522881802, 0.3215982070032286, 0.514570845181715, 0.3032848172603626, 0.9550253885848993, 0.14695497774781818, 0.7261529437292629, 0.19772593773149327, 0.0, 0.058915574639683554, 0.05282789654984488, 0.14229041732505954, 0.47347302527629187, 0.4239547330119139, 0.0, 0.0, 0.7562922249613215, 0.5981761054657581, 0.8951669524921041, 0.7969022215470213, 0.629130909604167, 0.7895227643622067, 0.05251624082969941, 0.474078511882579, 0.42510387579952913, 0.0, 0.4887504917363862, 0.4180443419987684, 0.09724195363682164, 0.6463995607565507, 0.5250955563584663, 0.17999165178792353, 0.7993156155187991, 0.4306168784713159, 0.6271311547539816, 0.28114655910628805, 0.9629001274296468, 0.24255727043218245, 0.6417819388281502, 0.4782451132157759, 0.3992679686646209, 0.5472617220756029, 0.8861454710404537, 0.3033614411604343, 0.5473879486591839, 0.48891161310442965, 0.013477032287178004, 0.5849723466518983, 0.9862835912777634, 0.686561334269392, 0.5684763413298725, 0.747610241443788, 0.7027432225020004, 0.022220118118156607, 0.9049328730056511, 0.11989917307709574, 0.15352517436590252, 0.4443494107520093, 0.5630954514349847, 0.1385205591642642, 0.1501749362990964, 0.3029983430997525, 0.6660829511423043, 0.6281682337982342, 0.8299655681140754, 0.0, 0.38467801110472155, 0.7580278230001196, 0.01960680567661144, 0.7210372348101194, 0.0, 0.9636268112795919, 0.0, 0.18941624475794683, 0.6784929267319626, 0.8883247242902804, 0.5051784786589903, 0.8406932615605996, 0.7631444098278283, 0.828015597902116, 0.44519027205471284, 0.287603157379565, 0.4735382634830282]
```
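Note that `mutFlipBit` is designed for boolean genes: on floats it applies `not x` and casts the result back to float, which is why every mutated weight in the output above becomes 0.0 (and a mutated 0.0 would become 1.0). A plain-Python sketch of that behaviour, not deap's actual source:

```python
import random

def mut_flip_bit(individual, indpb):
    # Sketch of deap's tools.mutFlipBit: each attribute is independently
    # "flipped" with probability indpb by applying `not x` and casting
    # back to the attribute's type. For a float, any non-zero value
    # flips to 0.0 and 0.0 flips to 1.0.
    for i in range(len(individual)):
        if random.random() < indpb:
            individual[i] = type(individual[i])(not individual[i])
    return (individual,)

ind = [0.58, 0.11, 0.0, 0.39]
mut_flip_bit(ind, indpb=1.0)  # flip every attribute to make the effect visible
print(ind)  # → [0.0, 0.0, 1.0, 0.0]
```

For real-valued weights you may get better results from a real-coded mutation such as deap's `mutGaussian`, which perturbs values rather than zeroing them; `mutFlipBit` still "works" here only because zeroed weights can later be flipped back.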

#### Evolving the Population

We evolve the population in the main function below. It works by generating a population and handing it to the algorithm to be evolved. We are also going to employ some helpful introspection tools, Statistics and a HallOfFame. The Statistics object keeps track of various statistics of the population (as you'd expect), and the hall of fame keeps track of the best individuals that appear during the evolution. Once the evolution is finished, the population contains the individuals from the last generation.

```python
import numpy

def main():
    pop = toolbox.population(n=5000)
    hof = tools.HallOfFame(1)
    stats = tools.Statistics(lambda ind: ind.fitness.values)
    stats.register("avg", numpy.mean)
    stats.register("min", numpy.min)
    stats.register("max", numpy.max)

    pop, logbook = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2,
                                       ngen=100, stats=stats,
                                       halloffame=hof, verbose=True)
    return pop, logbook, hof
```

Next, we call our main function and launch the evolution; the verbose argument tells it to output the stats for every generation. We can print and plot the data returned.

```python
if __name__ == "__main__":
    pop, log, hof = main()
    print("Best individual is: %s\nwith fitness: %s" % (hof[0], hof[0].fitness))

    import matplotlib.pyplot as plt

    gen, avg, min_, max_ = log.select("gen", "avg", "min", "max")
    plt.plot(gen, avg, label="average")
    plt.plot(gen, min_, label="minimum")
    plt.plot(gen, max_, label="maximum")
    plt.xlabel("Generation")
    plt.ylabel("Fitness")
    plt.legend(loc="lower right")
    plt.show()
```

```
gen   nevals  avg       min       max
0     5000    -1594.87  -1925.37  -1223.06
1     2990    -1474.65  -1809.43  -944.361
2     3002    -1359.97  -1703.98  -910.383
3     3075    -1249.51  -1538.01  -824.059
4     2972    -1149.89  -1489.25  -727.176
5     2981    -1060.2   -1382.69  -697.436
...   (intermediate generations omitted for readability)
95    3003    -61.9024  -740.262  0.885206
96    3057    -61.4431  -669.255  0.885206
97    2980    -63.8181  -645.92   0.885206
98    3000    -64.4953  -725.343  0.885206
99    3002    -61.1017  -769.52   0.885206
100   2956    -61.0198  -682.371  0.885206

Best individual is: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.10845951732475545, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.09090024122397655, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.013945053542042118, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1658076885870895, 0.0, 0.015302503323372574, 0.0, 0.010987439226133655, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3663735881178046, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1089114620959789, 0.0, 0.0, 0.004518749886925355, 0.0]
with fitness: (0.8852062433280787,)
```

We can see the evolution of our population: as expected, with each new generation the max fitness in the population increases until it reaches an approximate optimum. We also see the best individual, i.e. the portfolio weights for this date.

#### Further improvements

As with all our approaches and walkthroughs, this is not the best implementation or even the only way to approach this problem. What this should do is allow you to get started and hopefully, by building on this, create your own evolutionary algorithm approach!

Some of the things you could do to improve your solution:

1. Include other constraints in the evaluation function
2. Since EAs only provide an approximate solution, we may want to use this result as the initial point for a more conventional gradient-based optimisation approach
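The second idea can be sketched as follows. This is a hedged illustration: the toy objective, the budget constraint (weights summing to 1) and the SLSQP method are my own choices for demonstration, not requirements from the problem.

```python
import numpy as np
from scipy.optimize import minimize

ri = np.array([0.02, 0.05, 0.01])  # synthetic predicted returns
x0 = np.array([0.2, 0.5, 0.3])     # pretend these weights came from the EA

# Maximise returns subject to weights in [0, 1] that sum to 1;
# scipy minimises, so we negate the objective. Warm-starting from the
# EA's best individual gives the local solver a feasible-ish point
# inside a hard-to-navigate constraint region.
res = minimize(
    lambda w: -np.dot(w, ri),
    x0,
    method="SLSQP",
    bounds=[(0.0, 1.0)] * len(ri),
    constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
)
print(res.x)  # refined weights, polished from the EA's starting point
```

In practice you would replace the toy objective with the (differentiable parts of the) competition objective and encode its constraints in the `constraints` list.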

