# Support, Confidence, Strong Rules

Define criteria of frequent patterns and association rules Source: http://www.decidyn.com/SmartBundle.php

E-commerce deals with millions of transactions every day (38.5 billion in 2015; source: http://www.statista.com/). In this chapter, we look at transaction records to find frequently bought items and associations between items in shopping baskets. The information gained from this analysis can be very useful for market analysis, cross-marketing, and product catalogue design.

Frequent patterns are patterns that occur often in a dataset. Items that are frequently bought together, or web pages that are frequently visited together, are examples of frequent patterns.

Let's take a look at the transaction records of a grocery shop. This table shows 6 transaction records.

1. The first row says that a customer bought only one banana.
2. The second row says that a customer bought one banana and one pack of coffee together.

Now let's take a look at the rest of the rows and suggest sets of items that are frequently bought together.

We could identify two interesting sets:

1. {banana, coffee} are bought together 3 times.
2. {coffee, milk} are also bought together 3 times.
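Spotting such pairs by eye stops working once the table grows. The sketch below counts how often each pair of items appears in the same basket. The `BASKETS` data and the `count_pairs` helper are hypothetical and chosen only for illustration; they are not the grocery table from this chapter.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets for illustration only (not the chapter's table).
BASKETS = [
    {"bread"},
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"butter", "jam"},
    {"jam"},
    {"bread", "butter"},
]


def count_pairs(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (alphabetical) form.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


pair_counts = count_pairs(BASKETS)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

In this made-up data, the pair ("bread", "butter") appears most often, in 3 of the 6 baskets.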

Frequent pattern mining is the task of finding such interesting and useful patterns. A frequent pattern is a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a dataset. The idea was first proposed by Agrawal, Imielinski, and Swami in 1993, in the context of frequent itemsets and association rule mining, as a way to find inherent regularities in data.

## Support

Now, let's use some objective measures to find itemsets that are interesting. Let's start with the support measure.

The support of an itemset is defined as the ratio of the number of transactions that contain the itemset to the total number of transactions in the dataset. What, then, is the support of the itemset {coffee, milk}?

1. The itemset occurs 3 times in the dataset.
2. The size of the dataset is 6 records.
3. So the relative support is 3/6, which is 50%.
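The same calculation can be sketched in code. This is a minimal illustration, assuming each transaction is stored as a set of item names; the `TRANSACTIONS` list and the `support` function are hypothetical names, and the data is made up rather than taken from the chapter's table.

```python
# Hypothetical transactions for illustration only (not the chapter's table).
TRANSACTIONS = [
    frozenset({"bread"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "jam"}),
    frozenset({"butter", "jam"}),
    frozenset({"jam"}),
    frozenset({"bread", "butter"}),
]


def support(itemset, transactions):
    """Return the fraction of transactions that contain every item in itemset."""
    if not transactions:
        raise ValueError("transactions must not be empty")
    items = frozenset(itemset)
    # A transaction counts as a hit when the itemset is a subset of it.
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


# {bread, butter} appears in 3 of the 6 hypothetical transactions.
print(support({"bread", "butter"}, TRANSACTIONS))  # 0.5
```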

Now we can define more objectively what the frequent patterns in our dataset are:

“All itemsets having support greater than or equal to a minimum support, which is 50% in this case”

Using this, we now define a frequent pattern formally as follows: an itemset X is a frequent pattern if Supp(X) >= SuppMin.

We know that Supp({coffee, milk}) = 50%. Two questions:

1. If we define SuppMin = 40%, is X = {coffee, milk} a frequent pattern?
2. If we define SuppMin = 60%, is X a frequent pattern?
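To see how the choice of SuppMin changes which itemsets count as frequent, here is a brute-force sketch that tests every possible itemset against the threshold. It assumes a tiny, hypothetical dataset (not the chapter's table); `frequent_itemsets` is an illustrative helper. Enumerating all subsets is only practical for a handful of items, and real miners such as Apriori prune the search instead.

```python
from itertools import combinations

# Hypothetical transactions for illustration only (not the chapter's table).
TRANSACTIONS = [
    frozenset({"bread"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "jam"}),
    frozenset({"butter", "jam"}),
    frozenset({"jam"}),
    frozenset({"bread", "butter"}),
]


def support(itemset, transactions):
    """Return the fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def frequent_itemsets(transactions, supp_min):
    """Return {itemset: support} for every itemset with support >= supp_min."""
    all_items = sorted(set().union(*transactions))
    result = {}
    # Brute force: try every non-empty subset of the item universe.
    for size in range(1, len(all_items) + 1):
        for candidate in combinations(all_items, size):
            s = support(candidate, transactions)
            if s >= supp_min:
                result[frozenset(candidate)] = s
    return result


for supp_min in (0.5, 0.6):
    found = frequent_itemsets(TRANSACTIONS, supp_min)
    print(supp_min, sorted(sorted(itemset) for itemset in found))
```

With this made-up data, raising the threshold from 50% to 60% drops {jam} and {bread, butter}, whose support is exactly 50%.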

## Association Rule

Once we have frequent itemsets, we can then find associations between the items that are purchased together.

Here is an example association rule from the frequent itemset X = {coffee, milk}.

1. Association rule: coffee -> milk

But how do we know whether it should be the other way around: milk -> coffee?

To determine this and also to see if the rules are really applicable, we use another measure called confidence.

## Confidence

The confidence of a rule R: X -> Y is defined as P(Y | X), the conditional probability that answers the question "given X, how likely is Y?"

We can estimate the conditional probability from the transaction data by counting supports:

Conf(R) = Supp(R)/Supp(X)

where Supp(R) = Supp(X ∪ Y).

For R1: coffee -> milk, Conf(R1) = Supp({coffee, milk})/Supp({coffee})

For R2: milk -> coffee, Conf(R2) = Supp({coffee, milk})/Supp({milk})
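The formula translates directly into code. The sketch below is a minimal illustration, assuming hypothetical data (not the chapter's table); `confidence` is an illustrative helper that divides the support of the combined itemset by the support of the left-hand side. It also shows that the two directions of a rule can have different confidence.

```python
# Hypothetical transactions for illustration only (not the chapter's table).
TRANSACTIONS = [
    frozenset({"bread"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "jam"}),
    frozenset({"butter", "jam"}),
    frozenset({"jam"}),
    frozenset({"bread", "butter"}),
]


def support(itemset, transactions):
    """Return the fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(x, y, transactions):
    """Estimate P(Y | X) for the rule X -> Y as Supp(X ∪ Y) / Supp(X)."""
    supp_x = support(x, transactions)
    if supp_x == 0:
        raise ValueError("the left-hand side never occurs in the data")
    return support(frozenset(x) | frozenset(y), transactions) / supp_x


# Same itemset {butter, jam}, two directions, two different confidences.
print(round(confidence({"butter"}, {"jam"}, TRANSACTIONS), 3))  # 2 of 4 butter baskets
print(round(confidence({"jam"}, {"butter"}, TRANSACTIONS), 3))  # 2 of 3 jam baskets
```

Both rules share the same support, because Supp(X ∪ Y) does not depend on direction; only the denominator changes.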

## Strong Rule

Not all rules are useful. We set threshold values to filter out uninteresting rules. We say a rule R: X -> Y is a strong rule if it meets the following two criteria.

1. Supp(R) >= SuppMin. Note: this also means that Supp(X) >= SuppMin, because every transaction that contains X ∪ Y also contains X.
2. Conf(R) >= ConfMin.
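The two criteria can be combined into a single check. This is a sketch under stated assumptions: the data is hypothetical (not the chapter's table), and `is_strong_rule`, `supp_min`, and `conf_min` are illustrative names for the test and the thresholds SuppMin and ConfMin.

```python
# Hypothetical transactions for illustration only (not the chapter's table).
TRANSACTIONS = [
    frozenset({"bread"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "jam"}),
    frozenset({"butter", "jam"}),
    frozenset({"jam"}),
    frozenset({"bread", "butter"}),
]


def support(itemset, transactions):
    """Return the fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def is_strong_rule(x, y, transactions, supp_min, conf_min):
    """Check both criteria for the rule X -> Y: support first, then confidence."""
    supp_rule = support(frozenset(x) | frozenset(y), transactions)
    if supp_rule < supp_min:
        return False
    # Supp(X) >= Supp(X ∪ Y) >= supp_min here, so (for supp_min > 0) no division by zero.
    conf_rule = supp_rule / support(x, transactions)
    return conf_rule >= conf_min


# bread -> butter has support 0.5 and confidence of about 0.75 in this data.
print(is_strong_rule({"bread"}, {"butter"}, TRANSACTIONS, 0.5, 0.7))  # True
print(is_strong_rule({"bread"}, {"butter"}, TRANSACTIONS, 0.5, 0.9))  # False
```

Checking support first mirrors the usual workflow: rules are generated only from itemsets that are already known to be frequent.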

## Quiz

From the transaction table shown previously, calculate these values.

1. Supp( {milk, coffee} ) =
2. Supp( {milk} ) =
3. Supp( {coffee} ) =
4. Confidence( milk -> coffee ) =
5. Confidence( coffee -> milk ) =

Using the criteria SuppMin = 50% and ConfMin = 90%, which of these are strong rules?

1. milk -> coffee
2. coffee -> milk