Background

This is a simple project in R demonstrating how to calculate the average customer lifetime value (CLV) for an example Amazon business using Amazon-provided business reports.

CLV is a useful metric that lets a business estimate the average amount that a customer spends on the business over the course of time. Some businesses have products or business models with strong customer repurchase rates (think for example a supplement company or an apparel company with strong brand loyalty), whereas others will have a less frequent relationship with their customers. Knowing the lifetime value of a customer could help a business decide whether the cost of acquiring a new customer justifies its current marketing spend. It could also help the business create particular financial projections or serve as a metric for customer loyalty.

Set up initial libraries:

library(tidyverse)
library(tidyr)
library(dplyr)
library(janitor)
library(readr)

Obtain the relevant reports:

The appropriate sales report needed for this project can be obtained from one’s Amazon Seller Central account: Reports –> Fulfillment –> Amazon Fulfilled Shipments

Download reports covering at least 1 year of sales. While the project below uses ~1.5 years of reports (the maximum available), it may be easier to stick to 1 year of sales data for ease in interpreting the CLV number.

For our project, we obtained reports from 12/12/2020 - 5/31/2022. Reports were generated for 1 month intervals. There were 18 final output files, and each file was given a date-descriptive name such as: 6_16_21_to_7_16_21.csv 7_17_21_to_8_16_21.csv 8_17_21_to_9_16_21.csv

Each file contain the following columns: Amazon Order Id Merchant Order Id Shipment ID Shipment Item Id Amazon Order Item Id Merchant Order Item Id Purchase Date Payments Date Shipment Date Reporting Date Buyer Email Buyer Name Buyer Phone Number Merchant SKU Title Shipped Quantity Currency Item Price Item Tax Shipping Price Shipping Tax Gift Wrap Price Gift Wrap Tax Ship Service Level Recipient Name Shipping Address 1 Shipping Address 2 Shipping Address 3 Shipping City Shipping State Shipping Postal Code Shipping Country Code Shipping Phone Number Billing Address 1 Billing Address 2 Billing Address 3 Billing City Billing State bill-postal-code bill-country Item Promo Discount Shipment Promo Discount Carrier Tracking Number Estimated Arrival Date FC Fulfillment Channel Sales Channel

As of this post (2022), several of the columns are no longer populated with data by Amazon out of concerns for customer privacy, including buyer name, phone number and address.

The 18 CSV files are joined together under a variable named df:

temp <- list.files(pattern="*.csv")
myfiles <- lapply(temp, read_csv)
df <- bind_rows(myfiles)

We have a table that includes 60,030 entries and 48 columns.

Clean the titles to make them easier to work with:

df <- clean_names(df)

View(df)

Export the joined single file for convenient use in the future (change the path to one of your own choosing):

write_csv(df, "/Users/YourName/Desktop/FileName.csv")

Converting item_price column to a single currency:

The Amazon seller account we are working with sells to customers in the US, Canada and Mexico, and therefore the order amounts are given in USD, MXN and CAD, which can confuse our summation calculations unless they are converted to a single currency.

We first confirm that only USD, MXN and CAD are represented in the dataset:

df %>% filter(currency !="CAD" & currency !="MXN" & currency !="USD")

Let’s use the case_when and round functions to convert our values into a USD value, which will be put in a new column in a new dataframe we will call new_df. Note that the conversion values given below reflect the rates at the time of writing this. Search online for the latest conversion values.

new_df <- df %>% 
  mutate(item_price_USD = case_when(
    df$currency == "USD" ~ df$item_price,
    df$currency == "CAD" ~ round(df$item_price * 0.794585, digits = 2),
    df$currency == "MXN" ~ round(df$item_price * 0.051111, digits = 2)))
View(new_df)

Find a unique identifier for customers in the reports:

As noted above, several columns are no longer populated with data by Amazon out of concerns for customer privacy, including buyer name, phone number and address. Without this unique identifier information, we will need to be creative to determine each unique customer. We can do so by referring to the buyer email address, an Amazon-generated email address that is not the actual buyer’s email address, but an Amazon-server-tied email address that appears to be consistent across orders when the same user purchases.

To verify, try searching a few of the provided Amazon customer email in our table to see if they are used for more than one order. This would indicate that the emails are linked to a unique customers, and not created for each order. When you identify an email that is linked to more than one order, you can verify it is the same customer because the customer’s address city and zip code will match, this address information being the limited customer-specific data that the Amazon report DOES provide.

To test, try a few emails from the table and replace them with the the email in this code snippet:

Let’s identify the number of unique buyers over the time period of our data (1.5 years) by counting the unique email addresses:

unique_buyers <- length(unique(new_df$buyer_email))
unique_buyers
[1] 37796

We have been able to determine that there are 37796 unique customers represented in our table.

Find the total sales over this time period by summing the item_price_USD column:

total_sales <- sum(new_df$item_price_USD)
total_sales
[1] 876787.1

Over this time period our example Amazon account has sold 876,787.10 worth of products.

Because data in our report is given on each item sold and not each order, let us derive the number of unique orders over this period:

unique_orders <- length(unique(new_df$amazon_order_id))
unique_orders
[1] 47788

We find that our example account had made 47788 unique orders during the period of 1.5 years.

Determining the average order value:

avg_order_value <- round(total_sales/unique_orders, digits=2)
avg_order_value
[1] 18.35

We find that the average order value for this account is $18.35.

Calculating the average order frequency for our customers (total orders / unique customers):

order_freq <- round(unique_orders/unique_buyers, digits=2)
order_freq
[1] 1.26

We find that the customers of this Amazon business have an average order frequency of 1.26 orders over the period of 1.5 years.

Calculating the average customer value (average order value / order frequency):

avg_cust_value = round(avg_order_value*order_freq, digit=2)
avg_cust_value
[1] 23.12

The average customer value for the given data is 23.12, indicating that this is the amount we expect our average customer to spend over a period of 1.5 years. If data for only 1 year was selected, then this figure represents the customer’s value over just one year. To determine the lifetime value of our customer, we make a reasonable estimate given our knowledge of the business that the average customer engages with this business for 3 years. That would make the total average lifetime customer value equal to $46.24. Our estimate of 3 years will vary from business to business, and business stakeholders will be in the best position to make this determination

---
title: "Customer Lifetime Value from Amazon Seller Data (A Project in R)"
author: "F Abdullah"
date: '2022-06-06'
output: html_notebook
---

## Background

This is a simple project in R demonstrating how to calculate the average customer lifetime value (CLV) for an example Amazon business using Amazon-provided business reports.

CLV is a useful metric that lets a business estimate the average amount that a customer spends on the business over the course of time. Some businesses have products or business models with strong customer repurchase rates (think for example a supplement company or an apparel company with strong brand loyalty), whereas others will have a less frequent relationship with their customers. Knowing the lifetime value of a customer could help a business decide whether the cost of acquiring a new customer justifies its current marketing spend. It could also help the business create particular financial projections or serve as a metric for customer loyalty.

#### Set up initial libraries:

```{r message=FALSE, warning=FALSE}
library(tidyverse)
library(tidyr)
library(dplyr)
library(janitor)
library(readr)
```

#### Obtain the relevant reports:
The appropriate sales report needed for this project can be obtained from one's Amazon Seller Central account:
Reports --> Fulfillment --> Amazon Fulfilled Shipments

Download reports covering at least 1 year of sales. While the project below uses ~1.5 years of reports (the maximum available), it may be easier to stick to 1 year of sales data for ease in interpreting the CLV number.

For our project, we obtained reports from 12/12/2020 - 5/31/2022. Reports were generated for 1 month intervals. There were 18 final output files, and each file was given a date-descriptive name such as:
6_16_21_to_7_16_21.csv
7_17_21_to_8_16_21.csv
8_17_21_to_9_16_21.csv

Each file contain the following columns:
Amazon Order Id
Merchant Order Id
Shipment ID
Shipment Item Id
Amazon Order Item Id
Merchant Order Item Id
Purchase Date
Payments Date
Shipment Date
Reporting Date
Buyer Email
Buyer Name
Buyer Phone Number
Merchant SKU
Title
Shipped Quantity
Currency
Item Price
Item Tax
Shipping Price
Shipping Tax
Gift Wrap Price
Gift Wrap Tax
Ship Service Level
Recipient Name
Shipping Address 1
Shipping Address 2
Shipping Address 3
Shipping City
Shipping State
Shipping Postal Code
Shipping Country Code
Shipping Phone Number
Billing Address 1
Billing Address 2
Billing Address 3
Billing City
Billing State
bill-postal-code
bill-country
Item Promo Discount
Shipment Promo Discount
Carrier
Tracking Number
Estimated Arrival Date
FC
Fulfillment Channel
Sales Channel

As of this post (2022), several of the columns are no longer populated with data by Amazon out of concerns for customer privacy, including buyer name, phone number and address.

The 18 CSV files are joined together under a variable named df:

```{r message=FALSE, warning=FALSE}
temp <- list.files(pattern="*.csv")
myfiles <- lapply(temp, read_csv)
df <- bind_rows(myfiles)
```

We have a table that includes 60,030 entries and 48 columns.

#### Clean the titles to make them easier to work with:

```{r}
df <- clean_names(df)

View(df)
```

#### Export the joined single file for convenient use in the future (change the path to one of your own choosing):
```{r}
write_csv(df, "/Users/YourName/Desktop/FileName.csv")
```

#### Converting item_price column to a single currency:
The Amazon seller account we are working with sells to customers in the US, Canada and Mexico, and therefore the order amounts are given in USD, MXN and CAD, which can confuse our summation calculations unless they are converted to a single currency.

We first confirm that only USD, MXN and CAD are represented in the dataset:
```{r}
df %>% filter(currency !="CAD" & currency !="MXN" & currency !="USD")
```

Let's use the case_when and round functions to convert our values into a USD value, which will be put in a new column in a new dataframe we will call new_df. Note that the conversion values given below reflect the rates at the time of writing this. Search online for the latest conversion values.

```{r}
new_df <- df %>% 
  mutate(item_price_USD = case_when(
    df$currency == "USD" ~ df$item_price,
    df$currency == "CAD" ~ round(df$item_price * 0.794585, digits = 2),
    df$currency == "MXN" ~ round(df$item_price * 0.051111, digits = 2)))
View(new_df)
```

#### Find a unique identifier for customers in the reports:
As noted above, several columns are no longer populated with data by Amazon out of concerns for customer privacy, including buyer name, phone number and address. Without this unique identifier information, we will need to be creative to determine each unique customer. We can do so by referring to the buyer email address, an Amazon-generated email address that is not the actual buyer's email address, but an Amazon-server-tied email address that appears to be consistent across orders when the same user purchases. 

To verify, try searching a few of the provided Amazon customer email in our table to see if they are used for more than one order. This would indicate that the emails are linked to a unique customers, and not created for each order. When you identify an email that is linked to more than one order, you can verify it is the same customer because the customer's address city and zip code will match, this address information being the limited customer-specific data that the Amazon report DOES provide.

To test, try a few emails from the table and replace them with the the email in this code snippet:
```{r eval=FALSE, include=FALSE}
df %>% filter(buyer_email=="abcd1abcde5abcd@marketplace.amazon.com")
```

#### Let's identify the number of unique buyers over the time period of our data (1.5 years) by counting the unique email addresses:
```{r}
unique_buyers <- length(unique(new_df$buyer_email))
unique_buyers
```

We have been able to determine that there are 37796 unique customers represented in our table.

#### Find the total sales over this time period by summing the item_price_USD column:
```{r}
total_sales <- sum(new_df$item_price_USD)
total_sales
```

Over this time period our example Amazon account has sold 876,787.10 worth of products.


#### Because data in our report is given on each item sold and not each order, let us derive the number of unique orders over this period:
```{r}
unique_orders <- length(unique(new_df$amazon_order_id))
unique_orders
```

We find that our example account had made 47788 unique orders during the period of 1.5 years.

#### Determining the average order value:
```{r}
avg_order_value <- round(total_sales/unique_orders, digits=2)
avg_order_value
```

We find that the average order value for this account is $18.35.

#### Calculating the average order frequency for our customers (total orders / unique customers):
```{r}
order_freq <- round(unique_orders/unique_buyers, digits=2)
order_freq
```

We find that the customers of this Amazon business have an average order frequency of 1.26 orders over the period of 1.5 years.

#### Calculating the average customer value (average order value / order frequency):
```{r}
avg_cust_value = round(avg_order_value*order_freq, digit=2)
avg_cust_value
```

The average customer value for the given data is 23.12, indicating that this is the amount we expect our average customer to spend over a period of 1.5 years. If data for only 1 year was selected, then this figure represents the customer's value over just one year. To determine the *lifetime value* of our customer, we make a reasonable estimate given our knowledge of the business that the average customer engages with this business for 3 years. That would make the total average lifetime customer value equal to $46.24. Our estimate of 3 years will vary from business to business, and business stakeholders will be in the best position to make this determination