Level up your Data Visualizations with quick plot
10 hours ago, towardsdatascience

K-Means plot for Spotify. Data Visualization is an essential part of a Data Scientist’s workflow. It allows us to visually understand our problem, analyze our models, and provide deep, meaningful understanding to communities. As Data Scientists, we always look for new ways of improving our data science workflow. Why should I use this over ggplot? Are there any differences? What are the benefits? Qplot allows sharper, shorter and more concise syntax for declaring ggplot visualizations. It is a ...

Playlist Classification on Spotify using KNN and Naive Bayes Classification
10 hours ago, towardsdatascience

https://unsplash.com/@usefulcollective One day, I thought it would be cool if Spotify helped me pick a playlist when I like a song. The idea is to tap the plus button when my phone is locked and have Spotify add the song to one of my playlists rather than to my library, so that I don’t have to go into the app and pick a playlist that suits it well. This way, I wouldn’t have to choose among all of my playlists and I would just leave it to the algorithms. Then, I realized it makes a good side projec...

Mass Shootings and Terrorism
15 hours ago, towardsdatascience

Our obsession with small probabilities and rare events. I started considering this article last month around the anniversary of the death of my father. Even with Christmas, the weeks leading up to and after the holiday are always a little somber. Thoughts of death and mortality intermingle with my children’s innocent excitement for Santa’s arrival and the welcoming of the new year. My dad passed away in 2001, a few months after September 11th. He died after a short battle with gastrointestinal can...

Graph Databases. What’s the Big Deal?
16 hours ago, towardsdatascience

Continuing the analysis on semantics and data science, it’s time to talk about graph databases and what they have to offer us. Introduction. Should we invest our precious time in learning a new way of ingesting, storing and analyzing data? With a touch of the mathematics of graphs? For me, the answer was unclear when I started my investigation, but after a little while, my answer was: https://medium.com/media/cf9dcc1b9f2e3a14c0425bfd27e118e1/href Here in this article, I’ll discuss some ideas and concepts ...

Experiment sample size calculation using power analysis
17 hours ago, towardsdatascience

If you use experiments to evaluate a product feature, and I hope you do, the question of the minimum required sample size to get statistically significant results is often brought up. In this article, we explain how we apply mathematical statistics and power analysis to calculate AB testing sample size. Before launching an experiment, it is essential to calculate ROI and estimate the time required to get statistical significance. The AB test cannot last forever. However, if we don’t collect enoug...
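As a rough illustration of the kind of calculation the teaser describes, here is a minimal sketch of a sample size estimate for an AB test on conversion rates. It assumes a two-sided two-proportion z-test with equal group sizes and the standard normal-approximation formula; the function name `sample_size_per_group` and the example rates are hypothetical, and the article itself may use a different test or formula.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_variant, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-proportion z-test.

    Assumes equal group sizes and the normal approximation (illustrative only).
    """
    if p_control == p_variant:
        raise ValueError("the two conversion rates must differ")
    # Critical value for the chosen significance level (two-sided).
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Quantile corresponding to the desired statistical power.
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the Bernoulli variances of the two groups.
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    # Hypothetical example: baseline 10% conversion, hoping to detect 12%.
    print(sample_size_per_group(0.10, 0.12))
```

Halving the detectable difference roughly quadruples the required sample, which is why the minimum effect worth detecting is usually fixed before the experiment starts.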

Mastering the Data Science Interview
17 hours ago, towardsdatascience

Mastering the Data Science Interview Loop. In 2012, Harvard Business Review announced that Data Science would be the sexiest job of the 21st Century. Since then, the hype around data science has only grown, with recent reports showing that demand for data scientists far exceeds the supply. However, the reality is that most of these jobs are for those who already have experience in data science. Entry level data science jobs, on the other hand, are extremely competitive due to the supply/demand dynamics. ...

Is the Difference in Work Hours the Real Reason for the Gender Wage Gap? [Interactive Infographic]
17 hours ago, towardsdatascience

Every year, the Department of Labor issues a report on the pay gap between women and men. Women earn a median of $30,000 per year, while men earn $40,000 per year. In other words, working women earn 75% of what men earn. But this gap doesn’t take into account the fact that on average, men work more hours than women. According to U.S. census data, men spend an average of 41.0 hours per week at their jobs, while women work an average of 36.3 hours per week. Many argue that gender discrimination expl...

Scrape Reddit data using Python and Google BigQuery
17 hours ago, towardsdatascience

A user-friendly approach to accessing the Reddit API and Google BigQuery. Reddit is one of the oldest social media platforms, and it is still going strong in terms of its users and the content generated every year. Behind that age-old user interface is a treasure trove of information that millions of users create on a daily basis in the form of questions and comments. In this post, we will see how to get data from the Reddit website using Python and Google BigQuery, step by step. To illustr...

Python for Data Science: From Scratch (Part II)
20 hours ago, towardsdatascience

Learning about Data Structures and important packages like NumPy and Pandas in Python. One word for this epic shot: PERSEVERANCE! This article is the second piece in the Python For Data Science Series. In case you haven’t gone through the introduction to Python (Part 1), go ahead and skim through that article here. After knowing about the basics, it’s time to indulge in more challenging topics of Python. In this article, we shall be looking into Python’s usage for data representation and manipulatio...

Getting Started with Recommender Systems and TensorRec
20 hours ago, towardsdatascience

Prototyping a recommender system, step-by-step. Recommender systems are used in many products to present users with relevant or personalized items (food, movies, music, books, news, etc). To do this, they learn from users’ previous interactions with items to identify users’ tastes and improve future recommendations. This post will walk us through the prototyping of a new recommender system in Python using TensorRec including input data manipulation, algorithm design, and usage for prediction. You c...

Infection Modeling — Part 3
22 hours ago, towardsdatascience

Optimizing a vaccination strategy with network science. In part 1 of this series, we modeled the spread of a pathogen through a social network and determined that an R0 between 2 and 3 is most likely, with roughly 80% of the network becoming infected at some point. In part 2, we used a genetic algorithm (GA) to identify a vaccination strategy that minimizes the spread of the infection, assuming only 25% of the population can receive a vaccine. The resulting strategy lowered ...

Redlining: Mapping Inequality in Peer-2-Peer Lending using Geopandas — Part 2
22 hours ago, towardsdatascience

1930s Redlining in the US set the rules for nearly a century of real estate practice and racial inequality that so profoundly shaped cities that we feel its legacy to this day. Figure 1. Background. In Part 1 of this series we investigated the correlation between Redlining maps and today’s credit landscape by cross-relating today’s loan applications to old Redlining zip codes. We found some signs of unfair algorithms or historic...

Why spectrogram-based VGGs suck?
22 hours ago, towardsdatascience

Why do spectrogram-based VGGs suck? Me: VGGs suck because they are computationally inefficient, and because they are a naive adoption of a computer vision architecture. Random person on the Internet: Jordi, you might be wrong. People use VGGs a lot! No more introduction is required; this series of posts is about exactly that. I want to share my honest thoughts on this discussion, to reflect on what role computer vision deep learning architectures play in the audio field. Post I: Why do spectrogr...

Introduction to Logistic Regression
22 hours ago, towardsdatascience

Introduction. In this blog, we will discuss the basic concepts of Logistic Regression and what kinds of problems it can help us solve. GIF: University of Toronto. Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. Some examples of classification problems are email spam or not spam, online transactions fraud or not fraud, and tumor malignant or benign. Logistic regression transforms its output using the logistic sigmoid function to return a...
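To make the sigmoid step in the teaser concrete, here is a minimal sketch in plain Python. It is not code from the article: the helper names `sigmoid` and `predict_class`, the example weights, and the 0.5 decision threshold are all assumptions chosen for illustration. The sigmoid squashes any real-valued score into the interval between 0 and 1, and a threshold then turns that value into a class label.

```python
import math


def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)  # safe: z is negative, so exp(z) is at most 1
    return exp_z / (1.0 + exp_z)


def predict_class(features, weights, bias, threshold=0.5):
    """Return (sigmoid output, class label) for a single observation.

    The weights, bias, and threshold are hypothetical inputs; a real model
    would learn the weights and bias from training data.
    """
    score = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(score)
    return probability, int(probability >= threshold)


if __name__ == "__main__":
    # Hypothetical one-feature model: score = -1.0 + 1.5 * x
    print(predict_class([2.0], weights=[1.5], bias=-1.0))
```

A score of zero maps to exactly 0.5, so the default threshold corresponds to the decision boundary where the linear score changes sign.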

My Machine Learning Journey and First Kaggle Competition
1 day ago, towardsdatascience

How I started Data Science, and my first experience. Image taken from Pexels (Gavin Tracy). Beginning of the Journey. After working as an Electronic Engineer, I decided to change my career path to Data Scientist. To reach my Data Science career goal, I started reviewing MOOCs in this field. Here is the list that I found helpful in my journey: Intro to Machine Learning https://www.udacity.com/course/intro-to-machine-learning--ud120; Machine Learning A-Z https://www.udemy.com/machinelearning/; Machine Learning h...

Sixth Man of The Year: Data Acquisition
1 day ago, towardsdatascience

All of my eggs are currently in the Data Science basket, and as such I try to spend as much time as possible collecting little nuggets of advice from those who have been practicing data science for a while. After all — I’d much rather learn from other people’s mistakes than my own, that way I can make more advanced mistakes. There is one piece of advice that has been beaten to death at the hands of many different professionals. If you interact with Data Scientists on a daily basis, I’m sure that...

A Comprehensive List of Handy R Packages
1 day ago, towardsdatascience

Stuff I have found super useful for work and life. Whether Python or R is superior for Data Science / Machine Learning is an open debate. Despite its quirkiness and not-so-true-but-generally-perceived slowness, R really shines in exploratory data analysis (EDA), in terms of data wrangling, visualizations, dashboards, and myriad choices of statistical packages (and bugs), so I have always found it helpful to dual wield R and Python, especially with improved inter-operability using reticulate and rp...

Detecting malaria using deep learning.
1 day ago, towardsdatascience

Building a convolutional neural network to quickly predict the presence of malaria parasitized cells in a thin blood smear. Nightmare: Malaria (Source). Although the malaria parasite doesn’t take the form of a mutant mosquito, it sure feels like a mutant problem. The deadly disease has reached epidemic, even endemic proportions in different parts of the world, killing around 400,000 people annually [1]. In other areas of the world, it’s virtually nonexistent. Some areas are just particularly prone to...

QuickBlarks
1 day ago, towardsdatascience

Is the Difficulty Bomb Exploding? I wanted to share with you (through a series of charts) what happens when one releases a world-class data scientist such as Ed Mazurek on fresh-baked Ethereum difficulty data. You get QuickBlarks (that’s a portmanteau of “QuickBlocks” and “R” in case you were wondering). If you don’t know about “R” and R Studio, you should. It’s amazing. With little to no explanation, I am going to copy and paste the “R” code used to create each chart right next to it. Ask Ed what th...

Quality over quantity: building the perfect data science project
1 day ago, towardsdatascience

credit: https://www.housetohouse.com/diamonds-in-the-rough/ In startup lingo, a “vanity metric” is a number that companies keep track of in order to convince the world, and sometimes themselves, that they’re doing better than they actually are. To pick on a prominent example, about eight years ago Twitter announced that 200 million tweets per day were being sent on its app. That’s a big number, but it’s not as relevant as it might seem: a large fraction of these tweets were sent by bots. Besides...
