Visualizing Seattle & Boston Airbnb Data

Jonathan Vasquez
4 min read · Nov 13, 2020

Udacity’s Write a Data Science Blog Post

Overview:

As part of the curriculum for Udacity’s Data Scientist Nanodegree Program, the first project is to take a dataset and write a blog post about it. For this project I chose the Seattle and Boston Airbnb datasets, which were provided by Udacity and are readily available on Kaggle.

Furthermore, I’m supposed to pose at least three questions related to business or real-world applications of the data, create a Jupyter Notebook, prepare the data, gather the data needed to answer my questions, handle categorical and missing data, explain the methods I chose and why I chose them, and then analyze, model, and generate visualizations. I’m then to draw a clear connection between my business questions and how the data answers them in order to communicate my business insights. Moreover, I’m to create a GitHub repository to share my code and demonstrate data wrangling and modeling techniques with a technical audience in mind. Lastly, I’m to create a blog post to share my questions and insights with a non-technical audience, hence this post.

Regarding the data for this project, the Boston Airbnb dataset has 3,585 rows and 95 columns, while the Seattle Airbnb dataset has 3,818 rows and 92 columns. Neither dataset was tidy, so both required cleaning. The datasets include information such as amenities, price, neighborhoods, bedrooms, and beds.
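For technical readers, here is a minimal sketch of one typical cleaning step. It assumes the price column is stored as text such as "$1,250.00" and that missing prices appear as empty strings or missing values; the function name and sample values are hypothetical, and this is not necessarily the exact cleaning done in my notebook.

```python
from typing import Optional


def parse_price(raw: Optional[str]) -> Optional[float]:
    """Convert a price string such as "$1,250.00" to a float.

    Returns None when the value is missing or empty, so missing prices
    can be handled explicitly later instead of silently becoming 0.
    """
    if raw is None:
        return None
    cleaned = raw.strip().replace("$", "").replace(",", "")
    if not cleaned:
        return None
    return float(cleaned)


# Made-up sample values, not rows from the real datasets.
sample_prices = ["$85.00", "$1,250.00", "", None]
parsed = [parse_price(p) for p in sample_prices]
```

Keeping missing prices as None rather than zero avoids dragging averages down when prices are later aggregated by neighborhood.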

The following questions will be answered utilizing visuals:

1. What are the top 10 amenities for Seattle and Boston?

As the visuals below show, the top 10 amenities in Seattle and Boston are almost identical, with wireless internet, heating, kitchen, and essentials making up the top 4. One notable difference is how much higher air conditioning ranks in Boston than in Seattle. This is likely because “[a]verage monthly temperatures vary by 11.5 °C (20.7°F) less in Seattle, Washington [than Boston, Massachusetts].” (Read more)
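For readers curious how such a ranking can be produced, here is a rough sketch. It assumes each listing stores its amenities as a single brace-wrapped, comma-separated string such as {TV,"Wireless Internet",Kitchen}; the helper names and the sample rows are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def parse_amenities(raw: str) -> List[str]:
    """Split a string like '{TV,"Wireless Internet"}' into amenity names."""
    inner = raw.strip().strip("{}")
    if not inner:
        return []
    # csv.reader handles quoted items that contain spaces or commas.
    return [item.strip() for item in next(csv.reader([inner])) if item.strip()]


def top_amenities(rows: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count how many listings offer each amenity and return the n most common."""
    counts = Counter()
    for raw in rows:
        # set() ensures an amenity is counted at most once per listing.
        counts.update(set(parse_amenities(raw)))
    return counts.most_common(n)


# Made-up sample rows, not taken from the real datasets.
sample = [
    '{TV,"Wireless Internet",Kitchen,Heating}',
    '{"Wireless Internet",Heating,"Air Conditioning"}',
    '{"Wireless Internet",Kitchen}',
    '{}',
]
top = top_amenities(sample, n=3)
```

Running the same function separately on the Seattle rows and the Boston rows would give the two rankings to compare side by side.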

2. What are the top 10 potential revenue generating neighborhoods for Seattle and Boston?

As the visuals below show, Boston’s neighborhoods are likely to generate more revenue overall than Seattle’s. This became even clearer once the Seattle and Boston datasets were combined: Seattle’s highest-earning neighborhood ranked only 6th.
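The combined ranking can be sketched roughly as follows. This is an illustration under assumptions: it approximates potential revenue by the average nightly price per neighborhood, which may differ from the measure behind the charts, and the neighborhood names and prices are placeholders rather than real data.

```python
from collections import defaultdict
from statistics import mean

# Placeholder rows: (city, neighborhood, nightly price in USD). Values are made up.
listings = [
    ("Boston", "Neighborhood A", 240.0),
    ("Boston", "Neighborhood A", 260.0),
    ("Boston", "Neighborhood B", 200.0),
    ("Seattle", "Neighborhood C", 180.0),
    ("Seattle", "Neighborhood C", 160.0),
    ("Seattle", "Neighborhood D", 120.0),
]


def rank_neighborhoods(rows, n=10):
    """Return the n (city, neighborhood) pairs with the highest average price."""
    prices = defaultdict(list)
    for city, neighborhood, price in rows:
        prices[(city, neighborhood)].append(price)
    averages = {key: mean(values) for key, values in prices.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]


ranking = rank_neighborhoods(listings)

# 1-based position of the best Seattle neighborhood in the combined ranking.
best_seattle_rank = next(
    position
    for position, ((city, _name), _avg) in enumerate(ranking, start=1)
    if city == "Seattle"
)
```

Keying the groups on both city and neighborhood keeps two neighborhoods that happen to share a name in different cities from being merged.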

3. What are the top 10 amenities ranked by their importance to price?

Beyond simply listing the amenities, it is important to uncover how much each one matters to price. After combining both datasets, I fit a Lasso regression to select the more significant features and then built a random forest model to show which features were most important. As the visual below shows, the greatest influence on price by far was the number of bedrooms. This makes sense: as the number of bedrooms increases, so does the number of guests a listing can host, and with it the price that can reasonably be charged.
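For technical readers, the two-step idea (Lasso for feature selection, then a random forest for importances) can be sketched with scikit-learn as below. The data is synthetic, the feature names are hypothetical, and the alpha value and forest settings are assumptions chosen for illustration rather than the settings used in my notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
feature_names = ["bedrooms", "has_wifi", "has_kitchen", "noise"]

# Synthetic features: bedroom count, two 0/1 amenity flags, and pure noise.
X = np.column_stack([
    rng.integers(1, 6, n),
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
    rng.normal(size=n),
]).astype(float)

# Synthetic price driven mostly by bedrooms, slightly by wifi.
y = 50.0 * X[:, 0] + 10.0 * X[:, 1] + rng.normal(scale=5.0, size=n)

# Step 1: Lasso on standardized features; keep features with non-zero coefficients.
X_scaled = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=1.0).fit(X_scaled, y)
selected = [i for i, coef in enumerate(lasso.coef_) if coef != 0.0]
selected_names = [feature_names[i] for i in selected]

# Step 2: random forest on the selected features; rank them by importance.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X[:, selected], y)
ranked = sorted(
    zip(selected_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Standardizing before the Lasso step matters because the L1 penalty treats all coefficients alike, so features on larger scales would otherwise be penalized unevenly.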

Summary:

The following questions were answered:

  • What are the top 10 amenities for Seattle and Boston?
  • What are the top 10 potential revenue generating neighborhoods for Seattle and Boston?
  • What are the top 10 amenities ranked by their importance to price?

While these questions were answered, there are plenty more insights that could be gathered from the data. I plan to circle back to this project once I develop more data science skills through Udacity’s Data Scientist Nanodegree Program.
