The team is in different places on their respective machine learning journeys – from “Kaggle regulars” to “yet to write a single line of Python.” This project would certainly go more smoothly if we were all Machine Learning gurus, but that would be no fun. Pursuing an innovation project is as much about learning something new as it is about building something new. The fun is in wrangling the project’s unknowns and finding yourself getting smarter in the process.
Along those lines, I think it is crucial for project team members to be each other’s biggest fans. Projects are hard. Innovation projects are even harder. Team success comes from team support.
So it is important for our team to encourage each other, especially as we tackle the new-to-us subject of ML. Thus, at the end of each team meeting, we close with the following question: “What interesting thing have you learned lately?”
Personally, I find this discussion incredibly motivating. It makes me consider subjects that I had not before. It helps me discover new resources to help me get smarter.
And it encourages me. I am not the only one trying to make sense of this whole ML thing. And I am not alone in getting excited about dorky things like data and algorithms.
Below you will find a list of the things – online courses, books, podcasts and more – that I have found helpful. Many were recommendations from the team. If you have additional suggestions, please leave them in the comments. Let’s support each other!
I have fallen head over heels for online courses or MOOCs. The price is right, often cheap or free. They are easy to use, so portable that I have been known to watch lectures on the sidelines of my kids’ soccer practices. And, most importantly, they are effective. I am so much smarter thanks to MOOCs.
Here is a rundown of what I have taken so far:
- A few of my colleagues crowd-shamed me into taking Andrew Ng’s Machine Learning class on Coursera. They hooked me with, “It’s only math.” True to their word, there was a lot of math. Thankfully, a hundred years ago I spent decent time studying linear algebra and data modeling, and that came back pretty easily. As they say, just like riding a bike. What they forgot to mention was that there was also coding homework. Writing code did not come back so easily. That, unfortunately, was more like the first time you rode a bike after taking off the training wheels. Crash and burn. Repeatedly. But I muddled through and actually passed the class. Now my eyes don’t glaze over when the team throws out words like “logistic regression” and “backpropagation in the neural network.” If I had to do it again, I would probably skip the homework and concentrate on the lectures and quizzes. While coding is always a good skill to practice, the homework is in MATLAB, which does not seem as easily transferable to ML as Python or R.
- Ng’s class hooked me on MOOCs, and as soon as I was done, I signed up for Johns Hopkins’ Executive Data Science Certification (also on Coursera). The class is high-level and approaches data science from a business perspective. The five-part series’ classes are entitled “A Crash Course in Data Science,” “Building a Data Science Team,” “Managing Data Analysis,” “Data Science in Real Life” and “Executive Data Science Capstone.” It was a good series of classes, but I was hungry to get hands-on, so I ended up a little disappointed. In retrospect, this would have been a good series to start with. That said, the Capstone class was FUN. It was completely different from other online courses. The best way I can describe it is as “Choose Your Adventure in Data Science,” where you make business decisions based on dilemmas posed by real Zillow employees via video. Choose the wrong path and your project flops. Choose the right path and discover data gold!
- Once we settled on using Azure for this project, I started poking around to find out what training Microsoft offered. I was excited to learn they offer an entire Data Science Professional track on edX. It is time-intensive but, so far, totally worth it. I will detail the courses within the track below.
Microsoft Data Science Professional Certification
There are ten required courses for certification, with each class taking between 16 and 32 hours to complete. There are actually sixteen classes in total, as some skills let you choose among multiple courses (such as Python or R). I am about 60% complete. Here is a rundown:
- Data Science Orientation – My first consulting internship (I will not give the year, but let’s say it was not in this century) involved a copious amount of data crunching in Excel. By the end of the summer, I could pivot-table and VB script LIKE A BOSS. So, I thought this first course in the Microsoft Data Science certificate (“Use Microsoft Excel to explore data”) would be a breeze. Ha. Microsoft has added a lot of features since my “Summer of Spreadsheets.” Things like this and this. Note to self: It’s probably good to brush up on your software skills every decade or so.
- Querying Data with Transact-SQL – Back in olden times, I managed an Oracle database team and subsequently became pretty proficient in PL/SQL. However, I had not written a lick of SQL in at least a decade. So I was pleasantly surprised that all the OUTER JOINs and INNER JOINs came back pretty easily. I guess it is true that you retain skills more or less forever.
- Analyzing and Visualizing Data – Two options here, Excel or Power BI. I chose the latter since I had never used it before. The last time I used data visualization tools (a hundred years ago, see above bullet point), they were pretty awkward and exasperating to use. In the ensuing years, I had little use for reporting tools and lost track of where the market was. Seeing and using Power BI felt like going from driving a horse-and-buggy straight to a Tesla Roadster. I have already used it for some other work stuff unrelated to this project.
- Essential Statistics for Data Analysis using Excel – The reviews for this course were not great, and I was anxious to get to the ML stuff, so I skipped this one. I will circle back to it eventually, as it is required to earn the Data Science certification.
- Introduction to Python for Data Science – Two options here, R or Python. I chose Python since that is what our project is based on. This was a fun class, not just by programming course standards! The instructor is wry, and I found myself smiling at some of his offhand comments. The content is presented simply but effectively. I thought this was the best class in the series so far.
- Data Science Essentials – I will admit, the lectures were kind of painful. If you had a Stats class back in college, you would probably find it tedious as well. However, the lab exercises were excellent. They allow you to get hands-on in AzureML. I found that the labs made things click for me. By the end of the class, I had an understanding of how to digest data and develop an algorithm in AzureML, as well as how to share it via an API.
- Principles of Machine Learning – I am currently taking this class. The biggest thing I have learned so far is that you can play the lectures at 2x speed. Game changing. More content in less time. Plus the lecturers are slow talkers, and I am admittedly impatient, so it allows me to get through the class without yelling at my computer “Hurry it up!” In all seriousness, there is good content about the principles behind ML. But if you have taken Andrew Ng’s class, it will feel repetitive.
- Programming with Python for Data Science – Two options again, R or Python. I will take Python. I hope the same guy who taught the Intro class teaches this one.
- Applied Data Science – Four(!) choices here: Applied Machine Learning, Implementing Predictive Solutions with Spark in HDInsight, Developing Intelligent Applications, and Analyzing Big Data with Microsoft R Server. I am planning on doing the Applied Machine Learning class, but they all seem interesting. I may audit the others.
- Capstone Final Project
All of these titles were recommended to me by someone awesome, and they all seem like awesome books. But I am a total slacker and have not finished any of them. Not awesome. I did download them to my Kindle. That counts for something, right?
- Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian and Tom Griffiths
- Business Analytics for Managers by Gert H. N. Laursen
- Competing on Analytics: The New Science of Winning by Thomas Davenport and Jeanne Harris
- Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz
- Machine Learning for Dummies by John Paul Mueller and Luca Massaron
- The Master Algorithm by Pedro Domingos
- Predictive Analytics by Eric Siegel
- Predictive Analytics for Dummies by Anasse Bari and Mohamed Chaouchi
- The Download – Produced daily by MIT Technology Review, it’s a well-curated email digest of articles across the web on Artificial Intelligence, Big Data, and other hot tech topics.
- Partially Derivative – One of our team members turned me on to this podcast. Called the “Car Talk of the Data Science community,” the hosts are real and the content accessible. Even a newbie like me can understand, and enjoy, the topics.
- Significant Digits – Another newsletter I have been digging. It’s short and sweet and full of numbers. What more can you want?
We finally buttoned up the infrastructure. Now, we get to revisit the fun stuff – ML. August and September will be what I call “fussing with algorithms” months. Other status updates:
- What’s making me feel smarter – When we started this project I only had a vague concept of what ML was. Now I feel like I could have a semi-intelligent conversation with a layperson on the subject. Just listing out the ML resources above made me feel proud. I put in the work to learn this stuff, and it’s paying off.
- What’s making me feel dumb – I still do not feel confident that I can have a semi-intelligent conversation with an expert. There is just so much to learn about artificial intelligence – and it all keeps changing. Just yesterday, I saw that Andrew Ng came out with a new course on Deep Learning. Something else I need to learn about… I feel like I will never catch up!
- What’s keeping me up at night – The more I grow in my knowledge of ML, the more insecure I am about our current plan. My self-talk: “Should we have done this instead of that? Should we do this instead of what we planned? How about that?” Sometimes a little knowledge is a bad thing 😉 I know every project manager goes through periods of self-doubt, no matter what the technology platform. Perhaps I should take a dose of my own medicine and give myself some self-encouragement…
This is the sixth installment of my real-time case study on my first AI project. I plan to share what we are working on, what is going well, what is sucking at the moment – everything – as it happens.
My hope is that by sharing our project’s small victories and painful bruises, you will be encouraged to tackle a project that scares the sh?! out of you too.