Apriori Algorithm in Java: Mining Frequent Itemsets and Association Rules | 21xrx.com

2023-11-19 07:57:22 深夜i

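Apriori is a widely used data mining algorithm for finding frequent itemsets and association rules in large data sets. It rests on two key measures: support and confidence.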

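Support is the fraction of all transactions that contain a given itemset. An itemset whose support meets a preset threshold is called a frequent itemset. The confidence of a rule A => B is the fraction of the transactions containing itemset A that also contain itemset B. A rule whose confidence meets a preset threshold is called a strong association rule.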
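To make the two definitions concrete, here is a minimal sketch that computes them directly over four sample shopping baskets. The class name `SupportConfidenceDemo` and the helper names `support` and `confidence` are illustrative assumptions, and the item names are written in English.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SupportConfidenceDemo {
  // Support: fraction of transactions that contain every item of the itemset
  static double support(List<List<String>> transactions, List<String> itemset) {
    int count = 0;
    for (List<String> transaction : transactions) {
      if (transaction.containsAll(itemset)) {
        count++;
      }
    }
    return (double) count / transactions.size();
  }

  // Confidence of antecedent => consequent:
  // support(antecedent and consequent together) / support(antecedent)
  static double confidence(List<List<String>> transactions,
                           List<String> antecedent, List<String> consequent) {
    List<String> both = new ArrayList<>(antecedent);
    both.addAll(consequent);
    return support(transactions, both) / support(transactions, antecedent);
  }

  public static void main(String[] args) {
    // Assumed sample data: four baskets
    List<List<String>> transactions = Arrays.asList(
        Arrays.asList("bread", "milk", "beer"),
        Arrays.asList("bread", "milk", "diapers", "beer"),
        Arrays.asList("diapers", "apple", "bread", "milk"),
        Arrays.asList("diapers", "bread", "milk", "cola"));
    // {bread, milk} is in 4 of 4 baskets
    System.out.println(support(transactions, Arrays.asList("bread", "milk"))); // 1.0
    // {beer} is in 2 of 4 baskets
    System.out.println(support(transactions, Arrays.asList("beer"))); // 0.5
    // beer => diapers: 1 basket has both, 2 baskets have beer
    System.out.println(confidence(transactions,
        Arrays.asList("beer"), Arrays.asList("diapers"))); // 0.5
  }
}
```

With a minimum support of 0.5, both {bread, milk} and {beer} would count as frequent here, while the rule beer => diapers would fall short of a minimum confidence of 0.6.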

The following is a sample implementation of the Apriori algorithm in Java:

import java.util.*;
public class AprioriAlgorithm {
  public static void main(String[] args) {
    // Build the data set that holds all transactions
    List<List<String>> transactions = new ArrayList<>();
    // Add sample transactions (replace with your own data)
    transactions.add(Arrays.asList("bread", "milk", "beer"));
    transactions.add(Arrays.asList("bread", "milk", "diapers", "beer"));
    transactions.add(Arrays.asList("diapers", "apple", "bread", "milk"));
    transactions.add(Arrays.asList("diapers", "bread", "milk", "cola"));
    // Set the minimum support and minimum confidence thresholds
    double minSupport = 0.5;
    double minConfidence = 0.6;
    // Mine the frequent itemsets, then derive rules from them
    List<List<String>> frequentItemsets = apriori(transactions, minSupport);
    List<AssociationRule> associationRules = generateAssociationRules(frequentItemsets, transactions, minConfidence);
    // Print the frequent itemsets
    System.out.println("Frequent Itemsets:");
    for (List<String> itemset : frequentItemsets) {
      System.out.println(itemset);
    }
    // Print the association rules
    System.out.println("Association Rules:");
    for (AssociationRule rule : associationRules) {
      System.out.println(rule);
    }
  }
  // Apriori: level-wise search that counts candidates, keeps the frequent ones,
  // and joins them into longer candidates until none are left
  public static List<List<String>> apriori(List<List<String>> transactions, double minSupport) {
    // All frequent itemsets found so far
    List<List<String>> frequentItemsets = new ArrayList<>();
    // Start from the single-item candidates
    List<List<String>> candidateItemsets = generateCandidateItemsets(transactions);
    while (!candidateItemsets.isEmpty()) {
      // Keep the candidates that meet the minimum support.
      // Walking the candidate list in order preserves its sorted order for the next join.
      List<List<String>> frequentItemsetsInIteration = new ArrayList<>();
      for (List<String> candidateItemset : candidateItemsets) {
        double supportRatio = (double) countSupport(transactions, candidateItemset) / transactions.size();
        if (supportRatio >= minSupport) {
          frequentItemsetsInIteration.add(candidateItemset);
        }
      }
      // Record this level's frequent itemsets
      frequentItemsets.addAll(frequentItemsetsInIteration);
      // Build the next level's candidates
      candidateItemsets = generateNextCandidateItemsets(frequentItemsetsInIteration);
    }
    return frequentItemsets;
  }
  // Count the transactions that contain every item of the itemset
  private static int countSupport(List<List<String>> transactions, List<String> itemset) {
    int count = 0;
    for (List<String> transaction : transactions) {
      if (transaction.containsAll(itemset)) {
        count++;
      }
    }
    return count;
  }
  // Generate the initial single-item candidates
  public static List<List<String>> generateCandidateItemsets(List<List<String>> transactions) {
    // A TreeSet removes duplicates and keeps the items sorted, which the join step relies on
    Set<String> items = new TreeSet<>();
    for (List<String> transaction : transactions) {
      items.addAll(transaction);
    }
    List<List<String>> candidateItemsets = new ArrayList<>();
    for (String item : items) {
      candidateItemsets.add(Collections.singletonList(item));
    }
    return candidateItemsets;
  }
  // Generate the next level's candidates by joining itemsets that share all but their last item.
  // Assumes each itemset is sorted and the list itself is in lexicographic order.
  public static List<List<String>> generateNextCandidateItemsets(List<List<String>> frequentItemsets) {
    List<List<String>> candidateItemsets = new ArrayList<>();
    for (int i = 0; i < frequentItemsets.size(); i++) {
      List<String> itemset1 = frequentItemsets.get(i).subList(0, frequentItemsets.get(i).size() - 1);
      for (int j = i + 1; j < frequentItemsets.size(); j++) {
        List<String> itemset2 = frequentItemsets.get(j).subList(0, frequentItemsets.get(j).size() - 1);
        if (itemset1.equals(itemset2)) {
          List<String> newItemset = new ArrayList<>(frequentItemsets.get(i));
          newItemset.add(frequentItemsets.get(j).get(frequentItemsets.get(j).size() - 1));
          if (!candidateItemsets.contains(newItemset)) {
            candidateItemsets.add(newItemset);
          }
        }
      }
    }
    return candidateItemsets;
  }
  // Generate association rules by splitting each frequent itemset into antecedent => consequent
  public static List<AssociationRule> generateAssociationRules(List<List<String>> frequentItemsets, List<List<String>> transactions, double minConfidence) {
    List<AssociationRule> associationRules = new ArrayList<>();
    for (List<String> frequentItemset : frequentItemsets) {
      int n = frequentItemset.size();
      // Masks 1 .. 2^n - 2 enumerate every non-empty proper subset as the antecedent,
      // so single-item itemsets produce no rules
      for (int mask = 1; mask < (1 << n) - 1; mask++) {
        List<String> antecedent = new ArrayList<>();
        List<String> consequent = new ArrayList<>();
        for (int i = 0; i < n; i++) {
          if ((mask & (1 << i)) != 0) {
            antecedent.add(frequentItemset.get(i));
          } else {
            consequent.add(frequentItemset.get(i));
          }
        }
        double confidence = calculateConfidence(transactions, antecedent, consequent);
        if (confidence >= minConfidence) {
          associationRules.add(new AssociationRule(antecedent, consequent, confidence));
        }
      }
    }
    return associationRules;
  }
  // Confidence of antecedent => consequent:
  // support(antecedent and consequent together) / support(antecedent), counted over the transactions
  public static double calculateConfidence(List<List<String>> transactions, List<String> antecedent, List<String> consequent) {
    List<String> itemset = new ArrayList<>(antecedent);
    itemset.addAll(consequent);
    int supportItemset = countSupport(transactions, itemset);
    int supportAntecedent = countSupport(transactions, antecedent);
    return (double) supportItemset / supportAntecedent;
  }
}
class AssociationRule {
  private final List<String> antecedent;
  private final List<String> consequent;
  private final double confidence;
  public AssociationRule(List<String> antecedent, List<String> consequent, double confidence) {
    this.antecedent = antecedent;
    this.consequent = consequent;
    this.confidence = confidence;
  }
  public List<String> getAntecedent() {
    return antecedent;
  }
  public List<String> getConsequent() {
    return consequent;
  }
  public double getConfidence() {
    return confidence;
  }
  @Override
  public String toString() {
    return antecedent + " => " + consequent + " (confidence: " + confidence + ")";
  }
}

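The AprioriAlgorithm code above implements the Apriori algorithm and computes the frequent itemsets and association rules for a given set of transactions. To use it, pass the transactions and the minimum support threshold to the apriori method, then pass the resulting frequent itemsets and the minimum confidence threshold to generateAssociationRules. The results are available in frequentItemsets and associationRules respectively.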
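The generateNextCandidateItemsets method above only joins itemsets that share a prefix. A common refinement in Apriori is to also prune any candidate that has an infrequent subset, since every subset of a frequent itemset must itself be frequent. The sketch below shows one way to do that. The class `CandidatePruningDemo` and its methods are hypothetical names, the letters A to D are made-up items, and the input is assumed to be sorted (each itemset internally, and the list lexicographically).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CandidatePruningDemo {
  // Join sorted k-itemsets that share their first k-1 items,
  // then drop candidates that have an infrequent k-item subset.
  static List<List<String>> joinAndPrune(List<List<String>> frequentK) {
    Set<List<String>> known = new HashSet<>(frequentK);
    List<List<String>> candidates = new ArrayList<>();
    for (int i = 0; i < frequentK.size(); i++) {
      List<String> a = frequentK.get(i);
      int k = a.size();
      for (int j = i + 1; j < frequentK.size(); j++) {
        List<String> b = frequentK.get(j);
        // Join step: the two itemsets must agree on everything but the last item
        if (!a.subList(0, k - 1).equals(b.subList(0, k - 1))) {
          continue;
        }
        List<String> candidate = new ArrayList<>(a);
        candidate.add(b.get(k - 1));
        // Prune step: keep the candidate only if all its k-subsets are frequent
        if (allSubsetsFrequent(candidate, known)) {
          candidates.add(candidate);
        }
      }
    }
    return candidates;
  }

  // Check every subset obtained by leaving out exactly one item
  static boolean allSubsetsFrequent(List<String> candidate, Set<List<String>> known) {
    for (int skip = 0; skip < candidate.size(); skip++) {
      List<String> subset = new ArrayList<>(candidate);
      subset.remove(skip); // removes by index
      if (!known.contains(subset)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Hypothetical frequent 2-itemsets, already sorted
    List<List<String>> frequentPairs = Arrays.asList(
        Arrays.asList("A", "B"),
        Arrays.asList("A", "C"),
        Arrays.asList("A", "D"),
        Arrays.asList("B", "C"));
    // [A, B, D] and [A, C, D] are pruned because [B, D] and [C, D] are not frequent
    System.out.println(joinAndPrune(frequentPairs)); // prints [[A, B, C]]
  }
}
```

Pruning does not change the final result, because an unpruned candidate with an infrequent subset would fail the support check anyway. It only saves the cost of counting that candidate against every transaction.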

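With the Apriori algorithm, we can mine frequent itemsets and strong association rules from large data sets. These results are valuable in areas such as marketing, product recommendation, and cross-selling.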
