In this walkthrough, I will be talking about a problem posted on the Auquan website. Go here to view the problem and code along: quant-quest.com/competitions/uk-data-science-varsity

If you want more detail on the deap implementation, head to:
https://deap.readthedocs.io/en/master/


Introduction

There are many ways to solve optimisation problems and this is by no means the best or easiest way. Evolutionary algorithms are typically used to provide good approximate solutions to problems that cannot be solved easily using other techniques. Given that this problem has some difficult constraints, this is an approach that would be useful in generating an initial answer.

In general, for any optimisation problem where the solution space we are searching isn't easy to understand, or has complex boundaries with many constraints, EAs may provide good results. This does still require that we are able to develop an effective EA representation of that space.

The problem presented in the Global Recruiting Competition (see above) has many constraints, and it is difficult to understand how they affect the solution space, so we will try solving it using an evolutionary algorithm-based approach.

If you are earlier in your data science journey, you may want to look at a few articles on EAs before reading this one.


Creating a Solution

Imports

We first import the required modules for our approach; we will use deap for the EA implementation. (Learn more about deap: Paper | Github)

  • random gives us a way to generate random numbers;
  • deap implements the functionality for evolutionary algorithms in Python; we'll use the following submodules:
  • base gives us access to the Toolbox and base Fitness;
  • creator allows us to create our types;
  • tools grants us access to the operators bank;
  • algorithms provides some ready-made generic evolutionary loops.

import random
from deap import base, creator, tools, algorithms

We are going to try to get the optimal weights for only a single date, but this approach should work for any date in the problem dataset. I am loading the optimisation parameters directly from pickle files.

import pickle

## "ri", "Dt", "St" and "wi" refer to features of the supplied dataset -
## see the problem description for details

with open("ri.pickle", "rb") as f:
    ri = pickle.load(f)  ## predicted returns for this date

with open("Dt.pickle", "rb") as f:
    Dt = pickle.load(f)  ## duration values for assets

with open("St.pickle", "rb") as f:
    St = pickle.load(f)  ## spread values for assets

with open("wi.pickle", "rb") as f:
    wi = pickle.load(f)  ## weights in the index for assets

Type Creation

The first step with deap is to create the types we are going to need for our algorithm. Usually, the types created are 'the fitness' and 'the individual'. For this particular problem, we want to create a portfolio with the best returns possible. Thus we need a list of real numbers for the weights, and our fitness will simply be the net returns (adjusted with penalties for violating constraints).

Type creation is done by calling the function create in the creator module. This function takes two mandatory arguments plus additional optional arguments. The first argument is the name of the type we want to create. The second argument is the base class that the new type should inherit from. Finally, the optional arguments are members to add to the new type. For a detailed understanding of deap types, check out the deap documentation linked above.

creator.create("FitnessMax", base.Fitness, weights=(1.0,))

creator.create("Individual", list, fitness=creator.FitnessMax)
 

The first line creates a maximising fitness by replacing, in the base type Fitness, the pure virtual weights attribute with (1.0,), which means we maximise a single-objective fitness. The second line creates an Individual class that inherits the properties of list and has a fitness attribute of the type FitnessMax that was just created. (We are using FitnessMax because we want to maximise returns; if this were a minimisation problem we would use FitnessMin.)

The classes we have created are made available in the creator module. We can directly instantiate objects of each class as follows:

ind = creator.Individual([1, 0, 1, 1, 0]) ## create an individual with these weights

print(ind)

print(type(ind))

print(type(ind.fitness)) 

Initialising the individual and population

An individual, in the sense of evolutionary algorithms, is just a set of weights. A population is simply a collection of individuals. The goal is to find the best individual as the population evolves, where 'best' is defined on the basis of fitness.

The toolbox class in deap is intended to store functions together with their arguments for evolving the population. We register a function with two mandatory arguments: the alias to give to the function, and the function it will be associated with. Any additional arguments are stored and automatically supplied when the alias is called.

toolbox = base.Toolbox()

toolbox.register("attr_bool", random.random)

toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_bool, n=len(ri))
 
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

 

In this last block of code, we created a toolbox object and registered three functions:

  • The first one, attr_bool, creates a random real number in the interval [0, 1) using the random module we imported earlier.
  • The second function, individual, uses the initRepeat function (from deap's tools module) to fill an Individual class with what is produced by repeated calls to the previously defined attr_bool function; this basically initialises the weights for a random portfolio.
  • The final one, population, does the same thing with toolbox.individual to initialise our population.

For example, calling each function individually shows what it produces.

bit = toolbox.attr_bool()

ind = toolbox.individual()

pop = toolbox.population(n=500)

print("bit is of type %s\n" % (type(bit)))

print("ind is of type %s and contains %d attributes\n" % (type(ind), len(ind)))

print("pop is of type %s and contains %d individuals\n" % (type(pop), len(pop))) 

Evaluation Function

In genetic algorithms, the evaluation function is where all the magic happens. This is the function used to define the fitness of an individual; it is where we define our objective. In the cell below we define our objective as the returns, and we apply penalties for violating any constraints (these are defined in the problem copy). In this walkthrough, we have only implemented simple duration and spread constraints; it is up to you to make the rest!

Although the function below doesn't have all the constraints implemented, it should be easy enough for you to do the rest. The good thing about evolutionary algorithms is that we don't have to worry about things like gradient and Hessian calculations, so we can have non-differentiable objective functions, including things like step functions. One example would be applying a very high penalty when a very important constraint is violated. For instance, in this setup, a penalty of 10000 would be very high and would almost certainly mean the individual does not survive to the next generation. These are sometimes called death penalties.

If you do decide to use an EA approach, your main task is to come up with this evaluation function to make sure the constraints are satisfied at least 80% of the time while maximising returns.
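As a sketch of what such a death penalty might look like (using small, made-up arrays for St, wi and ri rather than the competition data), a hard-constraint version of an evaluation function could be:

```python
import numpy as np

# Made-up stand-in data for illustration; the real St, wi and ri
# come from the pickle files loaded earlier
St = np.array([1.0, 2.0, 3.0])     # spreads
wi = np.array([0.3, 0.3, 0.4])     # index weights
ri = np.array([0.05, 0.02, 0.08])  # predicted returns

DEATH_PENALTY = 10000  # large enough that violators never survive selection

def eval_hard_spread(individual, St=St, wi=wi, ri=ri):
    objective = np.dot(individual, ri)  # portfolio returns
    # Treat the spread constraint as inviolable: any violation at all
    # makes the individual effectively unfit
    if (np.dot(St, individual) / np.dot(St, wi)) - 1 > 0:
        objective -= DEATH_PENALTY
    return (objective,)  # deap expects fitnesses to be tuples
```

Any individual violating the spread constraint ends up with a fitness far below every feasible individual, so tournament selection will essentially never carry it forward.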

import numpy as np

weight1 = 10  ## penalty weight for the spread constraint in our objective function
weight2 = 10  ## penalty weight for the duration constraint

def evalPortfolio(individual, St=St, wi=wi, ri=ri, Dt=Dt):
    objective = np.dot(individual, ri)  ## calculating returns
    ## If a constraint is violated, we subtract a penalty from the fitness:
    ## the larger the violation, the less fit the individual becomes. The
    ## relative sizes of weight1 and weight2 specify which constraint we
    ## value more; for example, keeping weight1 at 10 but raising weight2
    ## to 100 penalises duration violations more heavily.
    if (np.dot(St, individual) / np.dot(St, wi)) - 1 > 0:  ## spread constraint violated
        objective = objective - weight1 * ((np.dot(St, individual) / np.dot(St, wi)) - 1)
    if (np.dot(Dt, individual) / np.dot(Dt, wi)) - 1 > 0:  ## duration constraint violated
        objective = objective - weight2 * ((np.dot(Dt, individual) / np.dot(Dt, wi)) - 1)
    return objective,  ## deap expects fitnesses to be tuples, hence the trailing comma

Genetic Operators

These are the operators used to evolve the population. If you want to learn how they work, look at the articles on EAs mentioned above.

Registering the operators and their default arguments in the toolbox is done as follows.

toolbox.register("evaluate", evalPortfolio)

toolbox.register("mate", tools.cxTwoPoint)
 
toolbox.register("mutate", tools.mutFlipBit, indpb=0.10)
 
toolbox.register("select", tools.selTournament, tournsize=3)

 

(If you don’t want to go into details of how the population is evolved, you can skip this section.) 

The evaluation function is given the alias evaluate. As it doesn't take any arguments other than the individual itself, we don't need to add anything. The two-point crossover function is registered the same way under the alias mate; we're just going to use the predefined function from the deap tools module.

The mutation, for its part, needs an argument to be fixed (the independent probability of each attribute to be mutated indpb). Finally, the selection operator is registered under the name select and the size of the tournament set to 3.

We can, for example, mutate an individual and expect about 10% of its attributes to be flipped. (Note that mutFlipBit applies a boolean not to each selected attribute, so for our real-valued genes a flipped non-zero weight becomes 0.0.)

ind = toolbox.individual()

print(ind)
 
print("\n")
 
toolbox.mutate(ind)
 
print(ind) 
[0.5828673696058406, 0.11366978696534202, 0.3375211617962608, 0.3998475660022206, 0.6608856743010275, ..., 0.44519027205471284, 0.287603157379565, 0.4735382634830282]
vs.
[0.5828673696058406, 0.11366978696534202, 0.3375211617962608, 0.3998475660022206, 0.6608856743010275, ..., 0.44519027205471284, 0.287603157379565, 0.4735382634830282]
(output truncated; the mutated copy is identical except that roughly 10% of its attributes have been flipped to 0.0)
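mutFlipBit was designed for bit strings; on real-valued genes it applies a boolean not, which is why flipped weights become 0.0. A Gaussian perturbation is often more natural for real-valued weights; deap provides tools.mutGaussian for this. Below is a minimal standalone sketch of the same idea that also clips the result back into [0, 1] (which deap's version does not do):

```python
import random

def mut_gaussian_clipped(individual, mu=0.0, sigma=0.1, indpb=0.1):
    """Perturb each gene with probability indpb by Gaussian noise,
    clipping the result back into [0, 1] so weights stay valid."""
    for i in range(len(individual)):
        if random.random() < indpb:
            individual[i] = min(1.0, max(0.0, individual[i] + random.gauss(mu, sigma)))
    return (individual,)  # deap mutation operators return a tuple
```

This could then replace mutFlipBit via toolbox.register("mutate", mut_gaussian_clipped, indpb=0.10).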

Evolving the Population

We evolve the population in the function below (main). It works by generating a population and giving it to the algorithm to be evolved. We are also going to employ some helpful introspection tools such as Statistics and a Hall of Fame. The Statistics object keeps track of various statistics of the population (as you'd expect), and the Hall of Fame keeps track of the best individuals that appear during the evolution. Once the evolution is finished, the population contains the individuals from the last generation.

def main():
    import numpy
    pop = toolbox.population(n=5000)
    hof = tools.HallOfFame(1)
    stats = tools.Statistics(lambda ind: ind.fitness.values)
    stats.register("avg", numpy.mean)
    stats.register("min", numpy.min)
    stats.register("max", numpy.max)
    pop, logbook = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2,
                                       ngen=100, stats=stats,
                                       halloffame=hof, verbose=True)
    return pop, logbook, hof

Next, we call our main function and launch the evolution; the verbose argument tells it to output the stats for every generation. We can print and plot the data returned.

if __name__ == "__main__":
    pop, log, hof = main()
    print("Best individual is: %s\nwith fitness: %s" % (hof[0], hof[0].fitness))

    import matplotlib.pyplot as plt
    gen, avg, min_, max_ = log.select("gen", "avg", "min", "max")
    plt.plot(gen, avg, label="average")
    plt.plot(gen, min_, label="minimum")
    plt.plot(gen, max_, label="maximum")
    plt.xlabel("Generation")
    plt.ylabel("Fitness")
    plt.legend(loc="lower right")
    plt.show()
gen	nevals	avg     	min     	max     
0 5000 -1594.87 -1925.37 -1223.06
1 2990 -1474.65 -1809.43 -944.361
2 3002 -1359.97 -1703.98 -910.383
3 3075 -1249.51 -1538.01 -824.059
4 2972 -1149.89 -1489.25 -727.176
5 2981 -1060.2 -1382.69 -697.436
...
(generations 6-95 omitted: the average fitness climbs steadily while the maximum plateaus at 0.885206 from around generation 70)
...
96 3057 -61.4431 -669.255 0.885206
97 2980 -63.8181 -645.92 0.885206
98 3000 -64.4953 -725.343 0.885206
99 3002 -61.1017 -769.52 0.885206
100 2956 -61.0198 -682.371 0.885206
Best individual is: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.10845951732475545, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.09090024122397655, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.013945053542042118, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1658076885870895, 0.0, 0.015302503323372574, 0.0, 0.010987439226133655, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3663735881178046, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1089114620959789, 0.0, 0.0, 0.004518749886925355, 0.0]
with fitness: (0.8852062433280787,)

We can see the evolution of our population: as expected, with each new generation the max fitness in the population increases until it reaches an approximate optimum. We also see the best individual and the weights of the portfolio for this date.

Further improvements

As with all our approaches and walkthroughs, this is not the best implementation or even the only way to approach this problem. What this should do is allow you to get started and hopefully, by building on this, create your own evolutionary algorithm approach!

Some of the things you could do to improve your solution:

  1. Include other constraints in the evaluation function
  2. Since EAs only provide an approximate solution, we may want to use this as the initial point in more conventional gradient-based optimization approaches
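As a toy illustration of the second point (with a made-up returns vector, ignoring the constraints), the best EA individual could seed a simple projected gradient ascent; a real refinement step would use a proper constrained optimiser such as scipy.optimize.minimize with hof[0] as the starting point:

```python
import numpy as np

# Made-up predicted returns; in practice ri comes from the pickle file
ri = np.array([0.05, 0.02, 0.08, 0.01])
# Starting point, standing in for hof[0], the best individual found by the EA
x0 = np.array([0.2, 0.2, 0.2, 0.2])

def refine(x0, ri, lr=0.01, steps=100):
    """Projected gradient ascent on the linear returns objective x . ri,
    clipping the weights back into [0, 1] after each step."""
    x = x0.copy()
    for _ in range(steps):
        x = np.clip(x + lr * ri, 0.0, 1.0)  # the gradient of x . ri is ri
    return x

refined = refine(x0, ri)
```

Because the objective here is linear, the refinement just pushes every weight upward in proportion to its return; with the real penalised objective, the gradient would also have to account for the constraint terms.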

David

I'm the community lead at Auquan. Feel free to message me if you get into any problems. A fun fact about me is I actually studied medicine at university.
