Behind the Algorithm: Bitsapiens
Hey guys, can you give me a little bit of information about your background?
Aditya: I’m Aditya, and I’m doing an integrated bachelor’s in computer science and master’s in chemistry at BITS Pilani. I became interested in data science after I competed in an Auquan competition alongside my summer internship last year.
This gave me an opportunity to learn about the wonders of data science and an insight into how data science will change society. I’m mostly self-taught, using Auquan’s platform and other competitions.
Nikhil: I’m Nikhil. I’m on the same integrated degree as Aditya, but with chemical engineering instead of chemistry. I came to learn about data science in my third year. As I did, I became more interested in Python and took a course in machine learning to learn more about how you can use these techniques to make more sense of data.
Lots of companies have lots of data and need to make decisions based on it, so I’m learning how to make decisions from that data. Like Aditya, I use multiple online sources.
You’ve both mentioned that you’ve used online resources to learn. Would you be able to share some of these?
Aditya: I have particularly used a course on Udacity called “AI for Trading” ( https://eu.udacity.com/course/ai-for-trading--nd880 ). It is a full month long and was really helpful for learning the fundamentals of how markets and trading work. Machine Learning Mastery was really useful as well. Also, I think there are loads of good Medium blogs, including those written by Shub and Chandini [Auquan founders] ( https://medium.com/auquan ).
Nikhil: I have used DataQuest. Also, the Python for trading course on DataCamp that is part of the Auquan Winter Training Program was really good. There are also machine learning courses on AWS.
Given that you both taught yourselves data science, when did you decide to start doing competitions?
Aditya: For me, it was after I’d finished my Udacity course. I wanted to test out the skills I had learnt and see what I could do. Fortunately, there was an Auquan challenge that came up right after I had finished, so I gave it a go.
Nikhil: When I finished each module of the course on DataQuest, there were mini problems to tackle. After a couple of these problems I realised that I could probably use the same skills in competitions. This would give me a better idea of how to work with larger datasets.
OK, what about this competition? What did you think when you first started this problem?
Nikhil: Problem one had two data sets. G1 confused us quite a bit at first because there wasn’t much supporting information. We also thought the data and features might be too difficult for us to tackle head-on. Because of this, we decided to put in time cleaning the data and trying to visualise and understand it better. Once the worksheets came out, we got a much better idea of how to use the features and manipulate certain variables.
Is there any advice you would give someone trying to tackle this (or a similar problem)?
Nikhil: The first and most important thing is the data.
Aditya: I completely agree. It’s really important to take time to clean the data, visualise it and understand it. What does it look like? Can you reprocess it and make some graphs to see how it is behaving?
Because we’re self-taught, we’ve become really comfortable trying out different models and learning on the fly, so that part isn’t really that daunting. Before you can do that, though, you need to understand the basics: how many data points and how many features are you working with? This will help you work out which models are likely to be effective.
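To make that first look at the data concrete, here is a minimal sketch in Python with pandas. It is not the team's code: the data is randomly generated, and the column names (feature_a, feature_b, target), the helper first_look and the median fill are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a competition data set.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_a": rng.normal(size=200),
    "feature_b": rng.normal(size=200),
    "target": rng.normal(size=200),
})
# Knock out a few values so there is something to clean.
df.loc[[3, 50, 120], "feature_b"] = np.nan


def first_look(frame):
    """Summarise size and missing values before choosing a model."""
    n_rows, n_cols = frame.shape
    return {
        "n_rows": n_rows,
        "n_columns": n_cols,
        "missing_per_column": frame.isna().sum().to_dict(),
    }


summary = first_look(df)
print(summary)

# One simple cleaning choice (an assumption, not the only option):
# fill the gaps with each column's median.
clean = df.fillna(df.median(numeric_only=True))
print(clean.describe())
```

From here, clean.hist() or clean.plot() (both need matplotlib installed) give quick graphs of how each column behaves.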
Is there anything in particular you feel you learnt from this competition?
Aditya: I think that normally when you work on problems in online courses, it’s a very controlled environment and the problems are specifically designed to help your learning. In these competitions, the data is rawer and you get a much more hands-on experience.
Nikhil: Adding to that, when I used to do the DataQuest problems they would tell you when you’d solved a certain step and move you on to the next task. In real problems it isn’t like that, and you have to work out for yourself when to move on and when to do something different. A lot of the learning comes from searching Google and other websites for information. About 50% of our time would be spent looking up how to do things.
Good to know! Where would you go to look for this information?
Nikhil: Math Overflow and lots of Medium articles. On the new challenge [The Cricket World Cup Challenge], Crickalytics is a good source. There is also lots of documentation available for new Python functions. Reading this allows you to learn about the hyperparameters, which are key to producing the best model.
Aditya: Reading the documentation really helped me. Python has so many libraries that we’re not aware of all the functionality. These libraries (e.g. scipy) are very famous, but most users don’t explore the depth of what they offer. It’s not until you read into the documentation that you learn how much specialised stuff you can do with a specific module.
Nikhil: Normally people don’t look at the hyperparameters because the default settings work pretty well for most data. But if you want to get the best results you have to change them, and most people never do.
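As a rough illustration of moving beyond the defaults, the sketch below compares a scikit-learn model with its default hyperparameters against a small grid search. The synthetic data, the choice of a random forest and the grid values are assumptions made for the example, not what the team used in the competition.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic data; it stands in for a real data set.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Baseline: the library defaults, scored with 3-fold cross-validation.
default_model = RandomForestClassifier(random_state=0)
default_score = cross_val_score(default_model, X, y, cv=3).mean()

# A small, hypothetical grid over two documented hyperparameters.
# It includes the default values, so the search can only match or beat them.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

print("default score:", round(default_score, 3))
print("best params:", search.best_params_)
print("best score:", round(search.best_score_, 3))
```

On real data the grid would normally be guided by the model's documentation, and the final choice checked on data the search never saw.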
That’s really useful for people starting out. Finally, in one sentence, is there anything you would do differently?
Nikhil: I think I would spend more time on the data. The data was a time series, and I’ve seen that our model was slightly skewed towards some features. I’d like to correct that.
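One common way to spot that kind of skew is to look at a model's feature importances. The sketch below is an assumption-laden illustration rather than the team's model: the data is synthetic, the feature names f0 to f3 are made up, and the target is deliberately built to depend mostly on f0.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
# Target constructed to depend mostly on column 0 (an assumption for the demo).
y = 3.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.1, size=300)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Hypothetical feature names, paired with the fitted importances.
names = ["f0", "f1", "f2", "f3"]
importances = dict(zip(names, model.feature_importances_))

# Print features from most to least important.
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

If one or two features carry nearly all of the importance, that is a prompt to revisit the data and feature engineering before trying more models.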
Aditya: Since the competition, I’ve learnt about a lot more models that I would like to try.
If you think you can take these insights and take the top spot in our next competition, check out: https://quant-quest.auquan.com/competitions