NiaAutoARM: Automated generation and evaluation of Association Rule Mining pipelines

Mlakar, Fister, Fister

The Numerical Association Rule Mining paradigm, which handles numerical and categorical attributes concurrently, is beneficial for discovering associations in datasets that contain both types of features. The process is not easy, since it involves several processing steps that run sequentially and together form an entire pipeline, e.g., preprocessing, algorithm selection, hyper-parameter optimization, and the definition of metrics for evaluating the quality of association rules. In this paper, we propose a novel Automated Machine Learning method, NiaAutoARM, for automatically constructing full association rule mining pipelines based on stochastic population-based meta-heuristics. Along with the theoretical representation of the proposed method, we also present a comprehensive experimental evaluation of it.

Basic Information

Paper ID: 2501.00138
Title: NiaAutoARM: Automated generation and evaluation of Association Rule Mining pipelines
Authors: Uroš Mlakar, Iztok Fister Jr., Iztok Fister (University of Maribor, Slovenia)
Categories: cs.NE (Neural and Evolutionary Computing), cs.AI (Artificial Intelligence)
Published: December 30, 2024 (arXiv preprint)
Paper link: https://arxiv.org/abs/2501.00138

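Abstract

The Numerical Association Rule Mining (NARM) paradigm handles numerical and categorical attributes simultaneously, which makes it well suited for discovering associations in datasets that contain both types of features. The process is not simple, however, because it comprises several sequentially executed processing steps that form a complete pipeline, such as preprocessing, algorithm selection, hyperparameter optimization, and the definition of metrics for evaluating association rule quality. The paper proposes NiaAutoARM, a novel automated machine learning method that automatically constructs complete association rule mining pipelines using stochastic population-based metaheuristics. In addition to a theoretical representation of the method, the paper provides a comprehensive experimental evaluation of it.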

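Research Background and Motivation

1. Problem Definition

Association rule mining (ARM) is a machine learning method for discovering relationships between items in transaction databases. Traditional ARM is limited to categorical attributes. Numerical association rule mining (NARM), a variant of ARM, handles numerical and categorical attributes simultaneously and thereby removes this bottleneck of traditional ARM.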
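To make the notion of a numerical association rule concrete, the following minimal Python sketch evaluates one rule whose antecedent is a numerical interval and whose consequent is a categorical value, using the standard support and confidence measures. The toy dataset, the attribute names `age` and `smoker`, and the helper classes are hypothetical illustrations, not part of NiaAutoARM.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple, Union

Row = Dict[str, Union[float, str]]


@dataclass
class NumericalItem:
    """Item of the form `low <= attribute <= high` (numerical attribute)."""
    attribute: str
    low: float
    high: float

    def matches(self, row: Row) -> bool:
        value = row[self.attribute]
        return isinstance(value, (int, float)) and self.low <= value <= self.high


@dataclass
class CategoricalItem:
    """Item of the form `attribute == value` (categorical attribute)."""
    attribute: str
    value: str

    def matches(self, row: Row) -> bool:
        return row[self.attribute] == self.value


Item = Union[NumericalItem, CategoricalItem]


def covers(items: Sequence[Item], row: Row) -> bool:
    # A transaction satisfies an itemset when every item matches it.
    return all(item.matches(row) for item in items)


def support_confidence(antecedent: Sequence[Item], consequent: Sequence[Item],
                       data: List[Row]) -> Tuple[float, float]:
    n_antecedent = sum(1 for row in data if covers(antecedent, row))
    n_both = sum(1 for row in data
                 if covers(antecedent, row) and covers(consequent, row))
    support = n_both / len(data) if data else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical mixed dataset: one numerical and one categorical attribute.
data: List[Row] = [
    {"age": 23.0, "smoker": "yes"},
    {"age": 35.0, "smoker": "no"},
    {"age": 41.0, "smoker": "no"},
    {"age": 29.0, "smoker": "yes"},
    {"age": 52.0, "smoker": "no"},
]

# Rule: age in [20, 30] => smoker = yes
support, confidence = support_confidence(
    [NumericalItem("age", 20.0, 30.0)], [CategoricalItem("smoker", "yes")], data)
print(f"support={support:.2f}, confidence={confidence:.2f}")  # support=0.40, confidence=1.00
```

A traditional categorical ARM method could not express the interval item `age in [20, 30]` without first discretizing the attribute; NARM treats such intervals as items directly.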

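2. Importance of the Problem

Democratization: Automated Machine Learning (AutoML) aims to make ML methods usable by non-experts and to move away from the "human-in-the-loop" principle
Complexity: an ARM pipeline contains several complex components: data preprocessing, algorithm selection, hyperparameter optimization, and the selection and evaluation of metrics
No universal solution: according to the No Free Lunch theorem, no single ARM metaheuristic performs best on all datasets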

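3. Limitations of Existing Approaches

Building ARM pipelines manually requires substantial human intervention and is time-consuming and complex
Existing research pays insufficient attention to ARM preprocessing steps
No AutoML method is dedicated to the automatic construction of ARM pipelines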

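4. Research Motivation

Inspired by the NiaAML method, the authors model ARM pipeline construction as a continuous optimization problem and use population-based metaheuristics to search automatically for the best pipeline configuration.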
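Treating pipeline construction as continuous optimization means that any population-based metaheuristic working on real-valued vectors can drive the search. The sketch below illustrates this with a basic differential evolution (DE/rand/1/bin) loop over the unit hypercube $[0, 1]^D$. DE is one representative algorithm chosen here for illustration, not necessarily the one used in the paper, and the toy fitness function is a hypothetical stand-in for evaluating a decoded ARM pipeline.

```python
import random
from typing import Callable, List, Tuple


def differential_evolution(fitness: Callable[[List[float]], float], dim: int,
                           pop_size: int = 20, generations: int = 100,
                           F: float = 0.5, CR: float = 0.9,
                           seed: int = 0) -> Tuple[List[float], float]:
    """Maximize `fitness` over [0, 1]^dim with DE/rand/1/bin (needs pop_size >= 4)."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    trial.append(min(1.0, max(0.0, v)))  # clip back into [0, 1]
                else:
                    trial.append(pop[i][j])
            s = fitness(trial)
            if s >= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=lambda k: scores[k])
    return pop[best], scores[best]


# Hypothetical toy fitness: maximal when every gene equals 0.3.
best, score = differential_evolution(lambda x: -sum((v - 0.3) ** 2 for v in x), dim=5)
print([round(v, 2) for v in best], score)
```

Because every gene lives in $[0, 1]$, the optimizer itself is agnostic to what the genes mean; the meaning comes entirely from how a vector is decoded into a pipeline and scored.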

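Core Contributions

First of its kind: the first AutoML solution dedicated to automatically searching for ARM pipelines, formulating the automatic search as an optimization problem
Focus on preprocessing: particular attention is paid to ARM preprocessing steps, addressing a gap in recent research
Implementation: a Python package named NiaAutoARM that provides complete, practical tooling
Comprehensive evaluation: a rigorous experimental evaluation of the proposed method on multiple datasets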

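Method Details

Task Definition

ARM pipeline construction is defined as a continuous optimization problem in which each individual represents a feasible ARM pipeline configuration, comprising:

Algorithm selection
Hyperparameter settings
Preprocessing methods
Evaluation metrics and their weights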
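Since each configuration carries both a set of evaluation metrics and their weights, the metric values of the mined rules have to be combined into a single fitness value. The following sketch assumes a normalized weighted average for that combination; the exact aggregation used by NiaAutoARM may differ, and the metric names and values are hypothetical.

```python
from typing import Dict, List


def weighted_fitness(metric_values: Dict[str, float], selected: List[str],
                     weights: Dict[str, float]) -> float:
    """Normalized weighted average of the selected rule-quality metrics (assumed form)."""
    total_weight = sum(weights[name] for name in selected)
    if total_weight == 0.0:
        return 0.0  # no selected metric or all weights zero: neutral score
    return sum(weights[name] * metric_values[name] for name in selected) / total_weight


# Hypothetical metric values of one mined rule set.
metrics = {"support": 0.4, "confidence": 1.0, "coverage": 0.4}
fitness = weighted_fitness(metrics, ["support", "confidence"],
                           {"support": 1.0, "confidence": 3.0})
print(round(fitness, 4))  # 0.85
```

Normalizing by the total weight keeps the fitness on the same scale as the individual metrics regardless of how many metrics a configuration selects.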

Model Architecture

1. Solution Representation

Each individual $x_i^{(t)}$ is represented as:

$x_i^{(t)} = \langle x_{i,1}^{(t)}, y_{i,1}^{(t)}, y_{i,2}^{(t)}, p_{i,1}^{(t)}, \ldots, p_{i,P}^{(t)}, z_{i,1}^{(t)}, \ldots, z_{i,M}^{(t)}, w_{i,1}^{(t)}, \ldots, w_{i,M}^{(t)} \rangle$
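As an illustration of how such a vector could be decoded, the sketch below maps each block of genes to a pipeline component following the grouping of the formula: one algorithm gene $x_{i,1}$, two hyperparameter genes $y_{i,1}, y_{i,2}$, $P$ preprocessing genes, $M$ metric-selection genes and $M$ weight genes. The component catalogues, the interpretation of $y_{i,1}$ and $y_{i,2}$ as population size and maximum number of evaluations, their ranges, and the 0.5 inclusion threshold are all hypothetical choices made for this example, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical component catalogues (illustrative only).
ALGORITHMS = ["DE", "PSO", "GA"]
PREPROCESSING = ["min_max_scaling", "z_score_scaling", "discretization"]
METRICS = ["support", "confidence", "coverage", "comprehensibility"]
THRESHOLD = 0.5  # assumed inclusion threshold for selection genes


@dataclass
class Pipeline:
    algorithm: str
    population_size: int
    max_evaluations: int
    preprocessing: List[str]
    metrics: List[str]
    weights: List[float]


def scale(gene: float, low: int, high: int) -> int:
    # Map a gene in [0, 1] linearly onto the integer range [low, high].
    return int(round(low + gene * (high - low)))


def decode(x: List[float]) -> Pipeline:
    P, M = len(PREPROCESSING), len(METRICS)
    if len(x) != 3 + P + 2 * M:
        raise ValueError(f"expected {3 + P + 2 * M} genes, got {len(x)}")
    alg_gene, y1, y2 = x[0], x[1], x[2]
    p = x[3:3 + P]            # preprocessing genes p_1..p_P
    z = x[3 + P:3 + P + M]    # metric-selection genes z_1..z_M
    w = x[3 + P + M:]         # metric weights w_1..w_M
    # The algorithm gene selects one candidate via equal-width bins over [0, 1].
    algorithm = ALGORITHMS[min(int(alg_gene * len(ALGORITHMS)), len(ALGORITHMS) - 1)]
    preprocessing = [name for name, g in zip(PREPROCESSING, p) if g > THRESHOLD]
    chosen = [k for k, g in enumerate(z) if g > THRESHOLD]
    return Pipeline(
        algorithm=algorithm,
        population_size=scale(y1, 10, 100),      # hypothetical range
        max_evaluations=scale(y2, 1000, 10000),  # hypothetical range
        preprocessing=preprocessing,
        metrics=[METRICS[k] for k in chosen],
        weights=[w[k] for k in chosen],
    )


example = [0.7, 0.5, 0.0,          # x_1, y_1, y_2
           0.9, 0.2, 0.6,          # p_1..p_3
           0.8, 0.1, 0.55, 0.3,    # z_1..z_4
           0.4, 0.9, 0.2, 0.7]     # w_1..w_4
pipeline = decode(example)
print(pipeline)
```

In this sketch a vector whose metric-selection genes all fall below the threshold yields an empty metric list; a practical decoder would need a fallback for that case, for example always keeping the metric with the largest selection gene.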