What is selling on Amazon?

I have been interested for a while now in selling goods through Fulfillment by Amazon (FBA). The idea is simple: buy cheap products from Alibaba, have them shipped to an Amazon warehouse, create a listing on Amazon for the product, and profit. There are many articles and blogs on how to do this successfully. There are also data-mining software packages that let you see what is being sold on Amazon. I thought it would be interesting to see what was sold in the Electronics department over the month ending on 7/16/2018. I gathered this data from Jungle Scout, and to my knowledge it has not been studied before. The objective is twofold:

1. Is it worth it to sell electronics on Amazon?
2. Can data science be used to determine what the best product is to be sold on Amazon?

One of the main goals of a data scientist is to look at a dataset and determine what value can be extracted from it. This is done through exploratory data analysis and by looking for correlations in the data.

In [1]:
import pandas as pd
import os
import glob
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
In [2]:
currdir = os.listdir()

All of the data is in the same folder as our Jupyter Notebook, so we will load one of the files and look at its first few rows.

In [3]:
path = 'Jungle Scout CSV Export - Mon Jul 16 2018 18_08_38 GMT-0500 (Central Daylight Time).csv'
df = pd.read_csv(path, skiprows=2, index_col=False)
df.head()
Out[3]:
| | ASIN | Product Name | Brand | Seller | Category | Price | Fees | Net | Weight (lbs) | Product Tier | Reviews | Avg. Rating | Rank | Est. Monthly Sales | Est. Monthly Revenue | LQS | Number Sellers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B0723599RQ | Motorola DOCSIS 3.1 Gig-speed Cable Modem Mod... | Motorola | MTRLC LLC | Electronics | 158.00 | 27.9 | 130.1 | 2.0503 | Standard (Large) | 573 | 4.1 | 495 | 2539 | 401162.00 | 54 | 22 |
| 1 | B01MSTB5KW | Motorola MG7540 16x4 Cable Modem plus AC1600 D... | Motorola | MTRLC LLC | Electronics | 129.99 | 23.93 | 106.06 | 2.65 | Standard (Large) | 774 | 4.2 | 170 | 3855 | 501111.45 | 52 | 9 |
| 2 | B01LXRSS36 | Motorola MG7550 16x4 Cable Modem plus AC1900 D... | Motorola | MTRLC LLC | Electronics | 169.99 | 29.97 | 140.02 | 2.75 | Standard (Large) | 1149 | 4.3 | 3187 | 612 | 104033.88 | 53 | 1 |
| 3 | B01JGT2JI6 | Motorola MG7550 16x4 Cable Modem plus AC1900 D... | Motorola | Etech Galaxy | Electronics | 163.92 | 29.06 | 134.85999999999999 | 2.75 | Standard (Large) | 1150 | 4.3 | 1535 | 1280 | 209817.60 | 63 | 44 |
| 4 | B07BRZ2KW5 | Motorola MG7700 24X8 Cable Modem plus AC1900 D... | Motorola | MTRLC LLC | Electronics | 184.99 | 32.22 | 152.77 | 2.7492 | Standard (Large) | 17 | 4.7 | 2836 | 481 | 88980.19 | 53 | 2 |

We have several CSV files that need to be combined into one dataset, so we loop over the files, read each one into a DataFrame, and concatenate the results:

In [4]:
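# Collect one DataFrame per CSV export found in the current folder
frames = []
for csv_path in glob.glob("*.csv"):
    # Skip the first two lines of each export, as in the single-file load
    data = pd.read_csv(csv_path, skiprows=2, index_col=False)
    frames.append(data)
# Stack all exports into a single DataFrame
df = pd.concat(frames)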
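
The loop above keeps each file's own row numbers, so index values repeat in the combined frame, and any listing that appears in more than one export is counted more than once. A minimal sketch of a loader that handles both cases is shown below. The function name `load_exports` and the `source_file` column are illustrative names, not part of the Jungle Scout export, and treating fully identical rows as duplicates is an assumption about the data rather than something verified here.

```python
import glob
import os

import pandas as pd


def load_exports(pattern="*.csv", skiprows=2):
    """Read every CSV matching `pattern` and stack them into one DataFrame.

    `load_exports` and the `source_file` column are illustrative names.
    """
    frames = []
    for csv_path in sorted(glob.glob(pattern)):
        frame = pd.read_csv(csv_path, skiprows=skiprows, index_col=False)
        # Remember which export each row came from
        frame["source_file"] = os.path.basename(csv_path)
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no files match {pattern!r}")

    # ignore_index=True builds one 0..n-1 index instead of repeating
    # each file's own row numbers
    combined = pd.concat(frames, ignore_index=True)

    # Assumption: a row identical in every original column is a repeat
    data_columns = [c for c in combined.columns if c != "source_file"]
    deduped = combined.drop_duplicates(subset=data_columns)
    return deduped.reset_index(drop=True)
```

Calling `df = load_exports()` in the notebook folder would replace the loop; comparing `len(df)` before and after gives a quick check of how many repeated rows, if any, the exports contain.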
In [5]:
df.describe()
Out[5]:
| | Rank | Est. Monthly Sales | Est. Monthly Revenue | LQS | Number Sellers |
|---|---|---|---|---|---|
| count | 3.912200e+04 | 39122.000000 | 3.912200e+04 | 39122.000000 | 39087.000000 |
| mean | 9.151861e+04 | 219.173420 | 1.389535e+04 | 48.737795 | 8.062860 |
| std | 1.636627e+05 | 923.034722 | 8.544046e+04 | 13.742925 | 15.792671 |
| min | 1.000000e+00 | 1.000000 | 1.000000e+01 | 0.000000 | 1.000000 |
| 25% | 7.230750e+03 | 10.000000 | 1.652802e+03 | 38.000000 | 1.000000 |
| 50% | 2.493200e+04 | 42.000000 | 3.104305e+03 | 49.000000 | 2.000000 |
| 75% | 8.108150e+04 | 137.000000 | 7.996000e+03 | 60.000000 | 7.000000 |
| max | 1.828111e+06 | 90709.920000 | 8.288314e+06 | 87.000000 | 245.000000 |

The describe function summarizes every numeric column in the DataFrame. Estimated monthly revenue paints a positive picture at first glance: the mean is \$13,895 per listing. Not bad for a side gig. However, the 50th percentile (the median) is only \$3,104 per month, which indicates that the top sellers capture a disproportionate share of the sales. Competition is skewed the same way: a product has about 8 sellers on average, but the median is 2. Columns such as Price, Fees, and Reviews do not appear in the summary, which suggests pandas did not parse them as numbers.
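
The gap between the mean and the median can be made concrete by asking what share of total revenue the top listings capture. The sketch below is one way to compute that. The helper name `revenue_concentration` and the 10% cutoff are illustrative choices, and the toy numbers are made up for the example; they are not taken from the Jungle Scout export.

```python
import pandas as pd


def revenue_concentration(revenue, top_fraction=0.10):
    """Summarize how unevenly revenue is spread across listings.

    `revenue_concentration` is a hypothetical helper for illustration.
    """
    revenue = pd.Series(revenue, dtype="float64").dropna()
    # Number of listings in the top group, at least one
    n_top = max(1, int(round(len(revenue) * top_fraction)))
    # Share of total revenue earned by the top group
    top_share = revenue.nlargest(n_top).sum() / revenue.sum()
    return {
        "mean": revenue.mean(),
        "median": revenue.median(),
        "top_share": top_share,
    }


# Toy numbers for illustration only; NOT from the Jungle Scout export
toy = [100, 200, 300, 400, 500, 600, 700, 800, 900, 14500]
summary = revenue_concentration(toy)
print(summary)
```

In the toy data a single large listing pulls the mean well above the median, which is the same pattern the summary table shows. On the real data the call would be `revenue_concentration(df["Est. Monthly Revenue"])`.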

Next:

Cleaning the Data