Market Basket Analysis (Apriori Algorithm)

Market basket analysis mines transaction data for frequently co-purchased items and derives association rules like “if A then B”, enabling product placement, bundling, and recommendation strategies.


Key Metrics

Association rule mining is driven by three metrics: Support (how often an itemset appears across all transactions), Confidence (the estimated probability of Y given X), and Lift (how much more often X and Y occur together than they would if they were independent).

Definitions

  • Support(X) = transactions containing X / total transactions.
  • Confidence(X → Y) = Support(X ∪ Y) / Support(X).
  • Lift(X → Y) = Confidence(X → Y) / Support(Y). Lift > 1 means a positive association, Lift = 1 means X and Y are independent, and Lift < 1 means a negative association.
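
The definitions above can be checked by hand on a small dataset. The following is a minimal sketch in plain Python, using the same eight baskets as the mlxtend example later in this article. The helper names `support`, `confidence`, and `lift` are illustrative and not part of any library.

```python
# Same eight baskets as the mlxtend example in this article.
transactions = [
    {'bread', 'milk', 'eggs'},
    {'bread', 'butter'},
    {'milk', 'butter', 'cheese'},
    {'bread', 'milk', 'butter'},
    {'milk', 'eggs'},
    {'bread', 'milk'},
    {'eggs', 'cheese'},
    {'bread', 'milk', 'butter', 'eggs'},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Support(X ∪ Y) / Support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """Confidence(X → Y) / Support(Y)."""
    return confidence(x, y, transactions) / support(y, transactions)


x, y = {'bread'}, {'butter'}
print(f"Support(bread ∪ butter)    = {support(x | y, transactions):.3f}")
print(f"Confidence(bread → butter) = {confidence(x, y, transactions):.3f}")
print(f"Lift(bread → butter)       = {lift(x, y, transactions):.3f}")
```

Bread appears in 5 of 8 baskets, butter in 4 of 8, and both together in 3 of 8, so the rule bread → butter has support 0.375, confidence 0.6, and lift 1.2.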

The Apriori Algorithm

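Apriori exploits the anti-monotone (downward-closure) property of support: any subset of a frequent itemset must also be frequent. Equivalently, every superset of an infrequent itemset is itself infrequent, so a candidate that contains an infrequent subset can be discarded without ever counting it. This prunes the search space by eliminating infrequent candidates early.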
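
To make the pruning concrete, here is a simplified level-wise sketch in plain Python. It is an illustration of the idea and not how mlxtend implements it. The function name `apriori_sketch` is hypothetical, and the candidate join is a naive pairwise union, not an optimized one.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {
            a | b for a, b in combinations(list(current), 2) if len(a | b) == k
        }
        # Prune step: drop any candidate that has an infrequent (k-1)-subset.
        # This uses only the previous level, with no pass over the data.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: only the surviving candidates are checked against the data.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


transactions = [
    ['bread', 'milk', 'eggs'], ['bread', 'butter'],
    ['milk', 'butter', 'cheese'], ['bread', 'milk', 'butter'],
    ['milk', 'eggs'], ['bread', 'milk'],
    ['eggs', 'cheese'], ['bread', 'milk', 'butter', 'eggs'],
]
result = apriori_sketch(transactions, min_support=0.3)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), -kv[1])):
    print(sorted(itemset), round(s, 3))
```

On this dataset, the pair {bread, eggs} falls below the 0.3 threshold, so the triple {bread, milk, eggs} is discarded in the prune step without being counted.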

Apriori with mlxtend

<pre><code class="language-python">import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Example transaction dataset
transactions = [
    ['bread', 'milk', 'eggs'],
    ['bread', 'butter'],
    ['milk', 'butter', 'cheese'],
    ['bread', 'milk', 'butter'],
    ['milk', 'eggs'],
    ['bread', 'milk'],
    ['eggs', 'cheese'],
    ['bread', 'milk', 'butter', 'eggs']
]

# One-hot encode the baskets into a boolean DataFrame (one column per item)
te = TransactionEncoder()
X = te.fit_transform(transactions)
df = pd.DataFrame(X, columns=te.columns_)

# Mine frequent itemsets
freq_items = apriori(df, min_support=0.3, use_colnames=True)
print(freq_items)

# Generate association rules
rules = association_rules(freq_items, metric='lift', min_threshold=1.2)
print(rules[['antecedents','consequents','support','confidence','lift']].round(3))</code></pre>

Interpreting Rules

Rules with high support, high confidence, and lift > 1 are the most actionable. Sorting by lift identifies the most surprising associations beyond base rates.
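
As a rough illustration of this filtering and ranking, the sketch below enumerates all single-item rules on the example baskets in plain Python, keeps those that pass all three criteria, and sorts them by lift. The thresholds `MIN_SUPPORT` and `MIN_CONFIDENCE` are arbitrary values chosen for this example, not recommendations.

```python
from itertools import permutations

transactions = [
    {'bread', 'milk', 'eggs'}, {'bread', 'butter'},
    {'milk', 'butter', 'cheese'}, {'bread', 'milk', 'butter'},
    {'milk', 'eggs'}, {'bread', 'milk'},
    {'eggs', 'cheese'}, {'bread', 'milk', 'butter', 'eggs'},
]
n = len(transactions)


def support(itemset):
    """Fraction of baskets containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / n


# Build every rule a -> b between two distinct single items.
items = sorted(set().union(*transactions))
rules = []
for a, b in permutations(items, 2):
    s_ab = support({a, b})
    if s_ab == 0:
        continue  # the two items never co-occur
    conf = s_ab / support({a})
    rules.append({
        'rule': f'{a} -> {b}',
        'support': s_ab,
        'confidence': conf,
        'lift': conf / support({b}),
    })

# Illustrative thresholds (assumptions for this example only).
MIN_SUPPORT, MIN_CONFIDENCE = 0.3, 0.6

actionable = [
    r for r in rules
    if r['support'] >= MIN_SUPPORT
    and r['confidence'] >= MIN_CONFIDENCE
    and r['lift'] > 1
]
actionable.sort(key=lambda r: r['lift'], reverse=True)  # most surprising first

for r in actionable:
    print(f"{r['rule']:<16} support={r['support']:.3f} "
          f"confidence={r['confidence']:.3f} lift={r['lift']:.3f}")
```

With an mlxtend rules DataFrame, the same idea is usually expressed as a boolean mask on the `support`, `confidence`, and `lift` columns followed by `sort_values('lift', ascending=False)`.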

Business Applications

  • Product placement: Place frequently co-purchased items near each other.
  • Bundle offers: Discount bundles identified by high-confidence rules.
  • Recommendation engines: Suggest items frequently bought with what's in the cart.
  • Inventory management: Stock frequently co-occurring items together.

FP-Growth: A Faster Alternative

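FP-Growth avoids candidate generation by building a compact prefix tree (FP-tree), which typically makes it significantly faster than Apriori on large datasets. mlxtend.frequent_patterns.fpgrowth takes the same main arguments as apriori (the one-hot encoded DataFrame, min_support, use_colnames, max_len) and returns frequent itemsets in the same format, so it can usually be swapped in directly.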