Background
This is a simple project in R demonstrating how to calculate the
average customer lifetime value (CLV) for an example Amazon business
using Amazon-provided business reports.
CLV is a useful metric that lets a business estimate the average
amount that a customer spends on the business over the course of time.
Some businesses have products or business models with strong customer
repurchase rates (think for example a supplement company or an apparel
company with strong brand loyalty), whereas others will have a less
frequent relationship with their customers. Knowing the lifetime value
of a customer could help a business decide whether the cost of acquiring
a new customer justifies its current marketing spend. It could also help
the business create particular financial projections or serve as a
metric for customer loyalty.
Set up initial libraries:
library(tidyverse)
library(tidyr)
library(dplyr)
library(janitor)
library(readr)
Obtain the relevant reports:
The appropriate sales report needed for this project can be obtained
from one’s Amazon Seller Central account: Reports –> Fulfillment
–> Amazon Fulfilled Shipments
Download reports covering at least 1 year of sales. While the project
below uses ~1.5 years of reports (the maximum available), it may be
easier to stick to 1 year of sales data for ease in interpreting the CLV
number.
For our project, we obtained reports from 12/12/2020 - 5/31/2022.
Reports were generated for 1 month intervals. There were 18 final output
files, and each file was given a date-descriptive name such as:
6_16_21_to_7_16_21.csv 7_17_21_to_8_16_21.csv 8_17_21_to_9_16_21.csv
Each file contain the following columns: Amazon Order Id Merchant
Order Id Shipment ID Shipment Item Id Amazon Order Item Id Merchant
Order Item Id Purchase Date Payments Date Shipment Date Reporting Date
Buyer Email Buyer Name Buyer Phone Number Merchant SKU Title Shipped
Quantity Currency Item Price Item Tax Shipping Price Shipping Tax Gift
Wrap Price Gift Wrap Tax Ship Service Level Recipient Name Shipping
Address 1 Shipping Address 2 Shipping Address 3 Shipping City Shipping
State Shipping Postal Code Shipping Country Code Shipping Phone Number
Billing Address 1 Billing Address 2 Billing Address 3 Billing City
Billing State bill-postal-code bill-country Item Promo Discount Shipment
Promo Discount Carrier Tracking Number Estimated Arrival Date FC
Fulfillment Channel Sales Channel
As of this post (2022), several of the columns are no longer
populated with data by Amazon out of concerns for customer privacy,
including buyer name, phone number and address.
The 18 CSV files are joined together under a variable named df:
temp <- list.files(pattern="*.csv")
myfiles <- lapply(temp, read_csv)
df <- bind_rows(myfiles)
We have a table that includes 60,030 entries and 48 columns.
Clean the titles to make them easier to work with:
df <- clean_names(df)
View(df)
Export the joined single file for convenient use in the future
(change the path to one of your own choosing):
write_csv(df, "/Users/YourName/Desktop/FileName.csv")
Converting item_price column to a single currency:
The Amazon seller account we are working with sells to customers in
the US, Canada and Mexico, and therefore the order amounts are given in
USD, MXN and CAD, which can confuse our summation calculations unless
they are converted to a single currency.
We first confirm that only USD, MXN and CAD are represented in the
dataset:
df %>% filter(currency !="CAD" & currency !="MXN" & currency !="USD")
Let’s use the case_when and round functions to convert our values
into a USD value, which will be put in a new column in a new dataframe
we will call new_df. Note that the conversion values given below reflect
the rates at the time of writing this. Search online for the latest
conversion values.
new_df <- df %>%
mutate(item_price_USD = case_when(
df$currency == "USD" ~ df$item_price,
df$currency == "CAD" ~ round(df$item_price * 0.794585, digits = 2),
df$currency == "MXN" ~ round(df$item_price * 0.051111, digits = 2)))
View(new_df)
Find a unique identifier for customers in the reports:
As noted above, several columns are no longer populated with data by
Amazon out of concerns for customer privacy, including buyer name, phone
number and address. Without this unique identifier information, we will
need to be creative to determine each unique customer. We can do so by
referring to the buyer email address, an Amazon-generated email address
that is not the actual buyer’s email address, but an Amazon-server-tied
email address that appears to be consistent across orders when the same
user purchases.
To verify, try searching a few of the provided Amazon customer email
in our table to see if they are used for more than one order. This would
indicate that the emails are linked to a unique customers, and not
created for each order. When you identify an email that is linked to
more than one order, you can verify it is the same customer because the
customer’s address city and zip code will match, this address
information being the limited customer-specific data that the Amazon
report DOES provide.
To test, try a few emails from the table and replace them with the
the email in this code snippet:
Let’s identify the number of unique buyers over the time period of
our data (1.5 years) by counting the unique email addresses:
unique_buyers <- length(unique(new_df$buyer_email))
unique_buyers
[1] 37796
We have been able to determine that there are 37796 unique customers
represented in our table.
Find the total sales over this time period by summing the
item_price_USD column:
total_sales <- sum(new_df$item_price_USD)
total_sales
[1] 876787.1
Over this time period our example Amazon account has sold 876,787.10
worth of products.
Because data in our report is given on each item sold and not each
order, let us derive the number of unique orders over this period:
unique_orders <- length(unique(new_df$amazon_order_id))
unique_orders
[1] 47788
We find that our example account had made 47788 unique orders during
the period of 1.5 years.
Determining the average order value:
avg_order_value <- round(total_sales/unique_orders, digits=2)
avg_order_value
[1] 18.35
We find that the average order value for this account is $18.35.
Calculating the average order frequency for our customers (total
orders / unique customers):
order_freq <- round(unique_orders/unique_buyers, digits=2)
order_freq
[1] 1.26
We find that the customers of this Amazon business have an average
order frequency of 1.26 orders over the period of 1.5 years.
Calculating the average customer value (average order value / order
frequency):
avg_cust_value = round(avg_order_value*order_freq, digit=2)
avg_cust_value
[1] 23.12
The average customer value for the given data is 23.12, indicating
that this is the amount we expect our average customer to spend over a
period of 1.5 years. If data for only 1 year was selected, then this
figure represents the customer’s value over just one year. To determine
the lifetime value of our customer, we make a reasonable
estimate given our knowledge of the business that the average customer
engages with this business for 3 years. That would make the total
average lifetime customer value equal to $46.24. Our estimate of 3 years
will vary from business to business, and business stakeholders will be
in the best position to make this determination
---
title: "Customer Lifetime Value from Amazon Seller Data (A Project in R)"
author: "F Abdullah"
date: '2022-06-06'
output: html_notebook
---

## Background

This is a simple project in R demonstrating how to calculate the average customer lifetime value (CLV) for an example Amazon business using Amazon-provided business reports.

CLV is a useful metric that lets a business estimate the average amount that a customer spends on the business over the course of time. Some businesses have products or business models with strong customer repurchase rates (think for example a supplement company or an apparel company with strong brand loyalty), whereas others will have a less frequent relationship with their customers. Knowing the lifetime value of a customer could help a business decide whether the cost of acquiring a new customer justifies its current marketing spend. It could also help the business create particular financial projections or serve as a metric for customer loyalty.

#### Set up initial libraries:

```{r message=FALSE, warning=FALSE}
library(tidyverse)
library(tidyr)
library(dplyr)
library(janitor)
library(readr)
```

#### Obtain the relevant reports:
The appropriate sales report needed for this project can be obtained from one's Amazon Seller Central account:
Reports --> Fulfillment --> Amazon Fulfilled Shipments

Download reports covering at least 1 year of sales. While the project below uses ~1.5 years of reports (the maximum available), it may be easier to stick to 1 year of sales data for ease in interpreting the CLV number.

For our project, we obtained reports from 12/12/2020 - 5/31/2022. Reports were generated for 1 month intervals. There were 18 final output files, and each file was given a date-descriptive name such as:
6_16_21_to_7_16_21.csv
7_17_21_to_8_16_21.csv
8_17_21_to_9_16_21.csv

Each file contain the following columns:
Amazon Order Id
Merchant Order Id
Shipment ID
Shipment Item Id
Amazon Order Item Id
Merchant Order Item Id
Purchase Date
Payments Date
Shipment Date
Reporting Date
Buyer Email
Buyer Name
Buyer Phone Number
Merchant SKU
Title
Shipped Quantity
Currency
Item Price
Item Tax
Shipping Price
Shipping Tax
Gift Wrap Price
Gift Wrap Tax
Ship Service Level
Recipient Name
Shipping Address 1
Shipping Address 2
Shipping Address 3
Shipping City
Shipping State
Shipping Postal Code
Shipping Country Code
Shipping Phone Number
Billing Address 1
Billing Address 2
Billing Address 3
Billing City
Billing State
bill-postal-code
bill-country
Item Promo Discount
Shipment Promo Discount
Carrier
Tracking Number
Estimated Arrival Date
FC
Fulfillment Channel
Sales Channel

As of this post (2022), several of the columns are no longer populated with data by Amazon out of concerns for customer privacy, including buyer name, phone number and address.

The 18 CSV files are joined together under a variable named df:

```{r message=FALSE, warning=FALSE}
temp <- list.files(pattern="*.csv")
myfiles <- lapply(temp, read_csv)
df <- bind_rows(myfiles)
```

We have a table that includes 60,030 entries and 48 columns.

#### Clean the titles to make them easier to work with:

```{r}
df <- clean_names(df)

View(df)
```

#### Export the joined single file for convenient use in the future (change the path to one of your own choosing):
```{r}
write_csv(df, "/Users/YourName/Desktop/FileName.csv")
```

#### Converting item_price column to a single currency:
The Amazon seller account we are working with sells to customers in the US, Canada and Mexico, and therefore the order amounts are given in USD, MXN and CAD, which can confuse our summation calculations unless they are converted to a single currency.

We first confirm that only USD, MXN and CAD are represented in the dataset:
```{r}
df %>% filter(currency !="CAD" & currency !="MXN" & currency !="USD")
```

Let's use the case_when and round functions to convert our values into a USD value, which will be put in a new column in a new dataframe we will call new_df. Note that the conversion values given below reflect the rates at the time of writing this. Search online for the latest conversion values.

```{r}
new_df <- df %>% 
  mutate(item_price_USD = case_when(
    df$currency == "USD" ~ df$item_price,
    df$currency == "CAD" ~ round(df$item_price * 0.794585, digits = 2),
    df$currency == "MXN" ~ round(df$item_price * 0.051111, digits = 2)))
View(new_df)
```

#### Find a unique identifier for customers in the reports:
As noted above, several columns are no longer populated with data by Amazon out of concerns for customer privacy, including buyer name, phone number and address. Without this unique identifier information, we will need to be creative to determine each unique customer. We can do so by referring to the buyer email address, an Amazon-generated email address that is not the actual buyer's email address, but an Amazon-server-tied email address that appears to be consistent across orders when the same user purchases. 

To verify, try searching a few of the provided Amazon customer email in our table to see if they are used for more than one order. This would indicate that the emails are linked to a unique customers, and not created for each order. When you identify an email that is linked to more than one order, you can verify it is the same customer because the customer's address city and zip code will match, this address information being the limited customer-specific data that the Amazon report DOES provide.

To test, try a few emails from the table and replace them with the the email in this code snippet:
```{r eval=FALSE, include=FALSE}
df %>% filter(buyer_email=="abcd1abcde5abcd@marketplace.amazon.com")
```

#### Let's identify the number of unique buyers over the time period of our data (1.5 years) by counting the unique email addresses:
```{r}
unique_buyers <- length(unique(new_df$buyer_email))
unique_buyers
```

We have been able to determine that there are 37796 unique customers represented in our table.

#### Find the total sales over this time period by summing the item_price_USD column:
```{r}
total_sales <- sum(new_df$item_price_USD)
total_sales
```

Over this time period our example Amazon account has sold 876,787.10 worth of products.


#### Because data in our report is given on each item sold and not each order, let us derive the number of unique orders over this period:
```{r}
unique_orders <- length(unique(new_df$amazon_order_id))
unique_orders
```

We find that our example account had made 47788 unique orders during the period of 1.5 years.

#### Determining the average order value:
```{r}
avg_order_value <- round(total_sales/unique_orders, digits=2)
avg_order_value
```

We find that the average order value for this account is $18.35.

#### Calculating the average order frequency for our customers (total orders / unique customers):
```{r}
order_freq <- round(unique_orders/unique_buyers, digits=2)
order_freq
```

We find that the customers of this Amazon business have an average order frequency of 1.26 orders over the period of 1.5 years.

#### Calculating the average customer value (average order value / order frequency):
```{r}
avg_cust_value = round(avg_order_value*order_freq, digit=2)
avg_cust_value
```

The average customer value for the given data is 23.12, indicating that this is the amount we expect our average customer to spend over a period of 1.5 years. If data for only 1 year was selected, then this figure represents the customer's value over just one year. To determine the *lifetime value* of our customer, we make a reasonable estimate given our knowledge of the business that the average customer engages with this business for 3 years. That would make the total average lifetime customer value equal to $46.24. Our estimate of 3 years will vary from business to business, and business stakeholders will be in the best position to make this determination