Git & GitHub for Data Scientists: A Beginner's Guide - Rise Networks


My first day in a tech lecture as a university “freshman” already looked daunting. Things only got worse when a few coursemates with more experience than I had started talking about Version Control Systems (VCS) and how they are a godsend. They were like, “Who listens to any self-acclaimed tech guy who does not have a GitHub account?” I felt uncomfortable in my stomach (feel free to laugh) and quickly tried to register a GitHub account (as a sharp guy). I succeeded, but things became even more confusing when I saw terms like “Repository” on the app! Phew! I quietly took my L.

Hopefully, you have subscribed to our newsletters and regularly follow our educational content at Rise Networks, so that the fate of the guy in paragraph one does not befall you. Notice how I am not that guy anymore? Good! Even I have denied him. (Feel free to laugh)

Let’s get more serious…

INTRODUCTION

“Talent wins games, but TEAMWORK and intelligence win championships.” 

– Michael Jordan

The need for efficient collaboration on tech projects, and for full control over the versions of a codebase, makes Git and GitHub essential tools for programmers.

WHAT IS GIT AND WHY?

Git is a free and open-source distributed version control system designed to handle everything from small to very large projects with speed and efficiency, according to the Git website.

It can be used for many kinds of work, including:

-Serving as a standard tool for production code in industry

-Collaborating in a hackathon

-Drafting a book

-Keeping lecture and practical notes

-A slew of other possibilities…

Git differs from GitHub: Git is the system that tracks the versions of a project on your own machine, whereas GitHub is a remote platform that hosts Git-based projects.

This means you can use Git without GitHub. You only need GitHub if you want a copy of your repository on a remote platform, for backup or for collaboration.

Your Very First Git Repository

We won't be using the Jupyter Notebooks we already have to interface with Git and GitHub; instead, we'll be using the command line/terminal.

I use Git Bash for this, especially for GitHub activities, because it lets me work with GitHub quickly and easily, but you may use your favourite terminal on Mac or Bash on Windows.

To create a project, we must first establish a directory or folder for it. 

We may do this on the command line by typing mkdir, followed by the name of the folder, which will create a folder on your machine.

mkdir my-data-science-project

We then use the cd command to navigate to that folder from the command line, which allows us to change the directory as follows:

cd my-data-science-project

Once we're in that directory, we can run the git init command to initialize the project with Git, so that Git can manage the project's versions:

git init

You should now have a .git folder in your directory, which is hidden by default. Your local workspace is the project folder you just created.
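The three steps above can be sketched end to end as follows. This is only a sketch: the throwaway directory created with mktemp is our own addition, so that trying it out does not touch any real project.

```shell
# Work in a throwaway directory (an assumption of this sketch, not a Git requirement).
workdir="$(mktemp -d)"
cd "$workdir"

# Create the project folder, move into it, and initialize Git.
mkdir my-data-science-project
cd my-data-science-project
git init

# .git is hidden, so a plain `ls` will not list it; the -a flag shows hidden entries.
ls -a
```

If .git appears in the listing, the folder is now a Git repository.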

Adding a commit to the repo

Now that we've created the repository, we need to populate it! We can write the first few lines of our project from the command line by typing nano followed by a file name, which opens a text editor on that file and creates it when we save.

We'll call this file project.txt in our case:

nano project.txt

Here you can go ahead and type in your code or notes.

After that, press Ctrl+O and then Enter to save the file, and Ctrl+X to exit the editor.

To keep a record of our work as it progresses, we need to add our changes to the staging area before we can commit them (this is known as staging the changes).

This is done with the git add command, which lets you either name the files you want to add or use . to add every change in the current directory, as shown below:

git add project.txt

The status command can then be used to check our stage:

git status

What will we see?

On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   project.txt

This tells us that we have changes staged for commit, namely the new file project.txt, but that no commits have been made so far.
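To see the difference between naming a file and using . with git add, here is a small sketch. The second file, analysis.py, and the temporary directory are made up for illustration; git status --short is a compact form of the status command.

```shell
# Throwaway repository for the experiment (hypothetical setup).
workdir="$(mktemp -d)"
cd "$workdir"
git init --quiet

# Two new files: the one from the article and a hypothetical second one.
echo "# project notes" > project.txt
echo "print('hello')" > analysis.py

# Stage a single file by name. In the short status, "A" marks a staged
# new file and "??" marks a file Git is not tracking yet.
git add project.txt
git status --short

# Stage every change in the current directory.
git add .
git status --short
```

After the second git add, both files should be listed as staged.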

With this in mind, we can use the commit command to commit our changes.

The important thing to note here is that we can use the -m option to supply a message that explains what we're committing, what we changed, and why.

This helps us remember any significant changes, so that if our code breaks or we wish to undo part of our work, we can refer to the messages to decide which commit to return to. We can write our first commit like this:

git commit -m "start the project with an intro comment"

We've made our first commit to the project!
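To show how commit messages help us find a version to return to, here is a sketch with two commits. The identity values, the second commit, and the temporary directory are placeholders of ours; Git may refuse to commit when no name and email are configured, so the sketch sets them for this repository only.

```shell
# Throwaway repository with a placeholder identity (assumptions of this sketch).
workdir="$(mktemp -d)"
cd "$workdir"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

# First commit, as in the article.
echo "intro" > project.txt
git add project.txt
git commit --quiet -m "start the project with an intro comment"

# A second, hypothetical change; -a stages files Git already tracks.
echo "more notes" >> project.txt
git commit --quiet -am "add more notes"

# One line per commit: a short hash followed by the message.
git log --oneline

# Bring project.txt back to how it looked one commit before the latest.
git checkout HEAD~1 -- project.txt
cat project.txt
```

The log is where a clear message pays off: it tells us which commit holds the version we want before we restore anything.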

Adding to a remote repository

Of course, having all of the changes on our local machine is great for version control (no more final final final version 27.txt files!), but what if we want to collaborate with others or keep a remote backup in case something goes wrong?

We can leverage GitHub’s magic for this.

To do so, we'll need to sign up for GitHub. After you've set up your account, create a new repository, but make sure not to initialize it with a README, licence or .gitignore file. Doing so creates a commit on the remote that your local repository doesn't have, and your first push will then be rejected.

But don’t worry; these files can be created after you’ve pushed to the remote repository.
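As a rough sketch, these files can also be created locally and committed like any other change. The ignore patterns and the data/raw folder below are hypothetical examples of things a data science project might keep out of version control; adjust them to your own project.

```shell
# Throwaway repository (hypothetical setup for this sketch).
workdir="$(mktemp -d)"
cd "$workdir"
git init --quiet

# A one-line README and a .gitignore with example patterns.
printf '# my-data-science-project\n' > README.md
printf '%s\n' '.ipynb_checkpoints/' '__pycache__/' 'data/raw/' > .gitignore

# A data file that matches one of the ignore patterns.
mkdir -p data/raw
echo "id,value" > data/raw/sample.csv

# Stage everything; Git skips whatever .gitignore excludes.
git add .
git status --short
```

Only README.md and .gitignore should show up as staged, while the file under data/raw is left out.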

We only need to link the remote repository to our own because we’ve already committed to our own local repository. 

We need the repository URL, which GitHub shows on the page of the new repository.

We can then link it from our terminal using git remote add origin:

git remote add origin <your-repository-url>

Make sure to use your own URL in place of the placeholder.

We can then use git remote -v to check that the connection was made, which should print the remote url that you specified.

We’ve now established a connection to our remote repository!
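If you want to try the linking step before touching GitHub, a bare repository on your own disk can stand in for the remote. This is purely an assumption of the sketch below: with GitHub you would pass your repository URL instead of the local path.

```shell
# A bare repository stands in for the GitHub repository (hypothetical setup).
workdir="$(mktemp -d)"
git init --quiet --bare "$workdir/remote.git"

# The local repository we want to connect.
git init --quiet "$workdir/local"
cd "$workdir/local"

# Register the remote under the conventional name "origin".
git remote add origin "$workdir/remote.git"

# List the remotes; origin should appear once for fetch and once for push.
git remote -v
```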

One difficulty remains: we haven't told Git which branch on the remote to push to.

We have a branch in our local repository, but the remote repository has none yet.

We can specify the target branch as follows:

git push --set-upstream origin HEAD:master

This pushes the current HEAD of our local repository to the master branch on the remote and records that branch as the upstream of our local branch.

We can also push the current HEAD explicitly to the remote branch that has the same name as our local branch:

git push origin HEAD

This means that in the future, after we've added and committed our files, all we have to do is run git push (as long as the local branch has the same name as its upstream, master here), which is pleasant and straightforward! You can double-check that your files are on the remote repository by heading to the repository in your GitHub account and looking for them. This will allow you to collaborate with other programmers as well as store your work on a remote repository!
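The whole push workflow can be rehearsed locally with a bare repository standing in for GitHub. Everything about the setup below is an assumption of the sketch: the temporary directory, the placeholder identity, the second commit, and the git branch -M master line, which renames the local branch to master in case your Git names new branches differently.

```shell
# Stand-in remote and a local repository (hypothetical setup).
workdir="$(mktemp -d)"
git init --quiet --bare "$workdir/remote.git"
git init --quiet "$workdir/local"
cd "$workdir/local"
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$workdir/remote.git"

# First commit, then make sure the local branch is called master.
echo "intro" > project.txt
git add project.txt
git commit --quiet -m "start the project with an intro comment"
git branch -M master

# First push: create master on the remote and remember it as the upstream.
git push --quiet --set-upstream origin HEAD:master

# Later changes only need add, commit, and a plain git push.
echo "more notes" >> project.txt
git commit --quiet -am "add more notes"
git push --quiet

# Confirm what the remote has received.
git --git-dir="$workdir/remote.git" log --oneline master
```

Both commits should appear in the log of the stand-in remote.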

So, you’ve made your first local repository, committed your files, and then pushed them to a remote repository.

 
