Have you found learning at home difficult? Most of us are in the same boat – there are too many things to juggle during these tumultuous times and learning has, contrary to our initial expectations, taken a back seat.
So how can we get back on track? How can we combine our data science learning with practical experience?
One key thing that has helped me immensely is picking an open-source data science project and running with it. This not only helps me understand the key areas I need to improve on but also shows me the way forward.
And these projects aren’t your run-of-the-mill data science projects. These are specific projects that tackle a certain data science sub-field, such as computer vision, web analytics, and so on. The project could be a dataset, a state-of-the-art library that has brought the data science field forward, or even an open-source analytics tool.
So, pick a project that intrigues you and start working on it today!
6 Open-Source Data Science Projects to Try During this Lockdown Period
Open Source Computer Vision Projects
Thanks to the power of PyTorch, we’re seeing a slew of awesome use cases in the computer vision space this year. Here, I have picked out a few outstanding computer vision projects you’ll love exploring and diving into.
This is an exquisite use case of computer vision. Converting an image into a 3-dimensional photo required sophisticated and in-depth knowledge of tools such as Photoshop at one point in time. Now, thanks to the advances in deep learning and computer vision, we can perform this transformation in just a few lines of code!
This project, open-sourced on GitHub, does exactly that. It takes a single RGB-D input image and converts it into a 3D photo. If you prefer deep learning terms, then this is “a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view”.
Check out an example of what you can do using this framework:
Pretty awesome, right? This project, as you might have guessed already, has been done using PyTorch. That’s a framework you should really start getting familiar with.
This is a sweet side project to work on if you don’t have a lot of time on your hands. It does what it says on the box – you give the model an input image, and it’ll transform that into a cartoon version:
Can you take a guess as to what computer vision concept is behind this project? Yes – Generative Adversarial Networks (GANs). I am truly amazed at the rapid advancements we’ve seen in GANs since they were introduced to the community in 2014. From CycleGANs to StarGANs, there’s no shortage of frameworks you can pick up and work on.
The developers behind this photo-to-cartoon project have open-sourced a pretrained model to help you quickly load and execute this on your machine. I have seen a few attempts at this before but this is the most realistic transformation I’ve come across.
Object detection frameworks have seen remarkable progress in recent years. We have gone from generating simple bounding boxes on static images to tracking dynamic objects in videos. That’s the power of computer vision.
However, progress in uniting the concepts of object detection and re-identification has been slow (to say the least!). In this fascinating study, the researchers present a simple baseline to address this gap using one-shot multi-object tracking.
Check out their model in action:
The baseline model they have open-sourced outperforms the state-of-the-art on public datasets at 30 fps. You can find both the code and the research paper at the link I have mentioned above.
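To get a feel for what "tracking" adds on top of detection, here is a minimal sketch of the association step: matching boxes detected in the current frame to objects tracked in earlier frames. Note that this is not the researchers' method. Their model learns re-identification features, whereas this sketch swaps in plain box overlap (intersection over union, or IoU) with greedy matching. The function names `iou` and `associate` and the `threshold` value are my own assumptions for illustration.

```python
from typing import Dict, List, Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: Dict[int, Box],
    detections: List[Box],
    threshold: float = 0.3,  # hypothetical cut-off, not taken from the paper
) -> Dict[int, int]:
    """Greedily map each track ID to the index of its best-overlapping detection."""
    # Score every (track, detection) pair, best overlap first.
    pairs = sorted(
        (
            (iou(box, det), track_id, det_idx)
            for track_id, box in tracks.items()
            for det_idx, det in enumerate(detections)
        ),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_detections = set()
    for score, track_id, det_idx in pairs:
        if score < threshold:
            break  # every remaining pair overlaps even less
        if track_id in matches or det_idx in used_detections:
            continue
        matches[track_id] = det_idx
        used_detections.add(det_idx)
    return matches


# Two tracked objects, three new detections (the last one is a newcomer).
tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]
matches = associate(tracks, detections)
```

Any detection left unmatched (index 2 here) would start a new track. Overlap alone breaks down when objects cross paths or disappear behind each other, which is exactly where re-identification features earn their keep.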
Other Awesome Open Source Data Science Projects
I have curated a list of miscellaneous open source data science projects here, from audio generation to sports analytics. Have a crack at your favorite and enjoy the learning experience!
I clicked on this project as soon as I saw OpenAI in the headline. I’m a big fan of their work, and I appreciate their stance on open-sourcing the major developments to the general data science community. Who doesn’t love GPT-2?
Jukebox, as music fans will intuitively understand, is a neural network model that generates music with singing in the raw audio domain. OpenAI has open-sourced the model weights and code, along with a tool to explore the generated samples.
Here’s how Jukebox works – we provide the genre, artist, and lyrics as input, and the neural network gives us a new music sample produced from scratch. The range of music Jukebox can generate is staggering in its scope. This is a fascinating project to work on!
You can see (and hear) Jukebox in action on OpenAI’s site. And you can also check out Analytics Vidhya’s articles on working with audio data:
- Getting Started with Audio Data Analysis using Deep Learning
- Generate your own Music using Deep Learning
Do you use web analytics tools like Google Analytics to track your site’s performance? The issue with these tools is that your data lives on someone else’s servers, so there is little privacy for your organization. Additionally, you might need to fork out some money if you want the premium features. Not ideal for everyone, then.
These are the key gaps ShyNet aims to bridge. Here’s how the developers put it:
“You host it yourself, so the data is yours. It works without cookies, so you don’t need any intrusive cookie notices. It collects just enough data to be useful, but not enough to be creepy. It’s open source and intended to be self-hosted. And you may even find the interface easy to use.”
Here’s a sample screenshot of ShyNet’s default homepage:
And if you’re wondering what key metrics ShyNet can give you, your wait is over:
- Page load time
- Bounce rate
- Operating system
- Geographic location & network
- Device type
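If a couple of these metrics are new to you, here is a minimal sketch of how two of them could be computed from raw session records. This is not ShyNet's code or data model. The `Session` class and its fields are hypothetical, and I am assuming the common definition of a bounce: a session in which the visitor viewed exactly one page.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Session:
    """Hypothetical record of one visit; not ShyNet's actual schema."""
    pages_viewed: int
    load_times_ms: List[float]  # one load time per page viewed


def bounce_rate(sessions: List[Session]) -> float:
    """Share of sessions with exactly one page view (assumed definition)."""
    if not sessions:
        return 0.0
    bounces = sum(1 for s in sessions if s.pages_viewed == 1)
    return bounces / len(sessions)


def average_load_time_ms(sessions: List[Session]) -> float:
    """Mean page load time across every page view in every session."""
    times = [t for s in sessions for t in s.load_times_ms]
    return mean(times) if times else 0.0


# Made-up sample data, purely for illustration.
sessions = [
    Session(pages_viewed=1, load_times_ms=[200.0]),
    Session(pages_viewed=3, load_times_ms=[100.0, 300.0, 200.0]),
    Session(pages_viewed=1, load_times_ms=[300.0]),
    Session(pages_viewed=2, load_times_ms=[150.0, 150.0]),
]

print(bounce_rate(sessions))           # 2 of 4 sessions bounced -> 0.5
print(average_load_time_ms(sessions))  # 1400 ms over 7 page views -> 200.0
```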
Keep in mind that ShyNet in its current format is great if you have a small or medium-sized business. It might not be ideal to use if you’re in a big firm. The GitHub repository I have linked above contains a comprehensive run-through of how ShyNet works and how you can start using it.
I recommend going through the below in-depth guide to learn about the world of digital marketing (of which web analytics is a part):
This is a personal favorite. I’m a huge football fan and have been delving into the world of sports analytics for quite some time now. Progress in this field has been far slower than in other industries, but in the last couple of years, teams and franchises have been waking up to the power of analytics and data science.
American sports are way ahead of the rest of the world in terms of progress and adaptability, but European football clubs are finally starting to play ball. Liverpool, for example, relies heavily on a data-driven approach from top to bottom, including in planning its recruitment strategy.
So, if you’re a sports fan and want to dabble in the world of analytics, this is the perfect open source project for you.
The GitHub repository contains a plethora of resources to get you started, including:
- Resources and suggestions for technical skills worth having for work in football analytics
- A collection of Python tutorials that showcase how to work with football datasets
- Research papers and articles about state-of-the-art developments in football analytics