ItemSet Mining (Apriori)

Instructions

ItemSet Mining:

ItemSet mining is the procedure of identifying frequent itemsets within a set of given transactions (or a transaction database).
The support of an itemset 𝑋 is the percentage of transactions 𝑇 in the database 𝐷 that contain it (𝑋⊆𝑇). Given a support threshold 𝑠, itemsets that appear in at least 𝑠% of the transactions are called frequent itemsets.
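The definition above can be sketched in a few lines of Python. This is only an illustration: the function names `support` and `is_frequent` and the sample transactions are assumptions made for the example, not part of the tool.

```python
def support(itemset, transactions):
    """Percentage of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    itemset = frozenset(itemset)
    # `itemset <= t` is the subset test X ⊆ T.
    hits = sum(1 for t in transactions if itemset <= t)
    return 100.0 * hits / len(transactions)


def is_frequent(itemset, transactions, s):
    """True if the itemset appears in at least s% of the transactions."""
    return support(itemset, transactions) >= s


# Hypothetical sample database of four transactions.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "D"},
    {"B", "C"},
]

print(support({"A"}, transactions))               # 75.0
print(support({"A", "B"}, transactions))          # 50.0
print(is_frequent({"B", "C"}, transactions, 60))  # False
```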

Assumptions:

There is a many-to-many relationship between items and transactions.

One transaction may contain many items.
One item may appear in many transactions.

The order of items in a transaction does not matter.

We group all the items that appear in a transaction, regardless of their position.

The quantity of items in a transaction is considered insignificant.

If one transaction contains three items of type 'A' and another one contains one item of type 'A', the relevant fact is that both transactions contain item 'A'.

There is a binary relation between transactions and items.

A transaction either contains an item or does not.
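Taken together, these assumptions mean a transaction behaves like a mathematical set. A minimal sketch, assuming Python sets as the representation (the helper names `to_transaction` and `to_binary_row` are hypothetical):

```python
def to_transaction(items):
    """Collapse a raw list of items into a set: order and quantity are dropped."""
    return frozenset(items)


def to_binary_row(transaction, all_items):
    """Binary relation: 1 if the transaction contains the item, 0 otherwise."""
    return [1 if item in transaction else 0 for item in all_items]


t1 = to_transaction(["A", "A", "A", "B"])  # three 'A' and one 'B'
t2 = to_transaction(["B", "A"])            # different order, one 'A'

print(t1 == t2)                              # True
print(to_binary_row(t1, ["A", "B", "C"]))    # [1, 1, 0]
```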

Association Rule Mining:

Association rule mining is the process of finding all association rules in a given database whose support and confidence are high, that is, at least the chosen thresholds.
The confidence (𝑐) of a rule (𝑋→𝑌) is the percentage of the transactions in the database containing 𝑋 that also contain 𝑌.
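The confidence definition can be illustrated as follows. This is a sketch under assumptions: the names `count_containing` and `confidence` and the sample transactions are invented for the example, and a rule whose left-hand side never occurs is given confidence 0 by convention here.

```python
def count_containing(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(x, y, transactions):
    """Percentage of the transactions containing X that also contain Y."""
    x, y = frozenset(x), frozenset(y)
    with_x = count_containing(x, transactions)
    if with_x == 0:
        return 0.0  # assumption: undefined confidence is reported as 0
    return 100.0 * count_containing(x | y, transactions) / with_x


# Hypothetical sample database of four transactions.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "D"},
    {"B", "C"},
]

print(confidence({"C"}, {"B"}, transactions))            # 100.0
print(round(confidence({"A"}, {"B"}, transactions), 1))  # 66.7
```

Note that confidence is not symmetric: here 𝐶→𝐵 holds in every transaction containing 'C', while 𝐵→𝐶 holds in only two of the three transactions containing 'B'.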

How does this program work?

1. Insert transactions

Manually

Fill in the fields with your transactions (at most 20 transactions), separating the items with commas. Multiple spaces or tabs are collapsed into a single space, and all leading and trailing spaces are trimmed. The items are extracted from the transactions.

From file

Only the txt and xml file formats are accepted. For convenience, you can download an example file of the format you want and use it as a template.

A txt file should contain all the transactions, one per line.

An xml file should contain not only all the transactions, but also the support and confidence thresholds.

Random

Here you can select the type of items. The type can be letters of the alphabet, such as 'A', 'B' or 'C', or groceries, such as 'Fish Sticks', 'Bread' or 'Tomatoes'. You need to select the number of different items and the total number of transactions to be generated. Keep in mind that each item has a 50% chance of being included in each transaction. Finally, you can set the random seed so that you can repeat the same experiment many times.
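The random method can be approximated with the sketch below. It is not the tool's own generator: the function name `random_transactions`, the use of uppercase letters as items (so at most 26 items), and the handling of empty transactions (they are simply kept) are all assumptions.

```python
import random
import string


def random_transactions(num_items, num_transactions, seed=None):
    """Generate transactions where each item is included with probability 0.5."""
    rng = random.Random(seed)  # a fixed seed makes the experiment repeatable
    items = list(string.ascii_uppercase[:num_items])
    transactions = []
    for _ in range(num_transactions):
        # Independent coin flip per item, per transaction.
        transactions.append({item for item in items if rng.random() < 0.5})
    return transactions


first = random_transactions(num_items=5, num_transactions=4, seed=42)
second = random_transactions(num_items=5, num_transactions=4, seed=42)
print(first == second)  # True: the same seed reproduces the same transactions
```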

Rules for transactions in file or manual method:

Each line corresponds to one transaction. For example, you can give 'A, B, C, D'. Duplicate transactions are NOT omitted!
Duplicate items in the same transaction ARE omitted! After whitespace is normalized, all of the following are considered the same item:

'Fish Sticks'
' Fish Sticks '
'Fish   Sticks'
'  Fish   Sticks  '
... etc.
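These rules can be sketched as a small parser. The names `parse_transaction` and `parse_transactions` are hypothetical, and skipping empty items and blank lines is an assumption rather than documented behaviour of the tool.

```python
import re


def parse_transaction(line):
    """Split a line on commas, normalize whitespace, drop duplicate items."""
    items = []
    for raw in line.split(","):
        # Collapse runs of spaces/tabs to one space, then trim both ends.
        item = re.sub(r"\s+", " ", raw).strip()
        if item and item not in items:  # assumption: empty items are skipped
            items.append(item)
    return items


def parse_transactions(text):
    """One transaction per line; duplicate transactions are kept."""
    return [parse_transaction(line) for line in text.splitlines() if line.strip()]


print(parse_transaction(" Fish   Sticks ,Bread,  Fish Sticks,Bread\t"))
# ['Fish Sticks', 'Bread']
print(parse_transactions("A, B\nA, B\n"))
# [['A', 'B'], ['A', 'B']]
```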

After you fill out everything you want, click on tab '2. Insert parameters'.

2. Insert parameters

In this step you need to enter the minimum support as a count (a number of transactions) and the minimum confidence as a percentage.

In this step you CANNOT edit the support or the confidence if they were set in an xml file!
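To show how the two parameters are used, here is a minimal sketch of the level-wise Apriori search followed by rule generation. It is a textbook-style illustration and not necessarily how this program is implemented; the function names `apriori` and `rules` and the sample data are assumptions. The minimum support is taken as a count and the confidence as a percentage, as in this step.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for itemsets found in at least `min_count` transactions."""
    transactions = [frozenset(t) for t in transactions]
    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent


def rules(frequent, min_conf):
    """Return (X, Y, confidence %) for every rule X -> Y with confidence >= min_conf."""
    found = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for x in combinations(sorted(itemset), r):
                x = frozenset(x)
                conf = 100.0 * count / frequent[x]  # subsets of frequent sets are frequent
                if conf >= min_conf:
                    found.append((x, itemset - x, conf))
    return found


# Hypothetical sample database of four transactions.
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "D"}, {"B", "C"}]
frequent = apriori(transactions, min_count=2)
for x, y, conf in rules(frequent, min_conf=70):
    print(sorted(x), "->", sorted(y), conf)
# ['C'] -> ['B'] 100.0
```

With a minimum support count of 2, the frequent itemsets in this example are {A}, {B}, {C}, {A, B} and {B, C}; the candidate {A, B, C} is pruned because its subset {A, C} is not frequent.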

After you fill out everything you want, click on tab '3. Results'.

3. Results

You see the results.
