tag rules

This table contains association rules for tags in the MangaDex dataset. The rules describe which tags tend to occur together, so they can be used to explore relationships between tags and to recommend related tags. Also take a look at the network visualization.

notes

We use the FPGrowth algorithm in PySpark to generate the association rules. We also note relevant terminology sourced from Wikipedia below.
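As a rough illustration of that step, the sketch below fits `pyspark.ml.fpm.FPGrowth` on a few made-up rows. The rows, the column names `id` and `tags`, the helper name `mine_tag_rules`, and the threshold values are all assumptions for the example, not the settings used to build the table.

```python
# Hypothetical toy rows: (manga identifier, list of tags). Not real dataset values.
ROWS = [
    (1, ["action", "comedy"]),
    (2, ["action", "romance"]),
    (3, ["action", "comedy", "romance"]),
    (4, ["comedy"]),
]


def mine_tag_rules(rows, min_support=0.25, min_confidence=0.5):
    """Fit FPGrowth on (id, tags) rows and return the association-rules DataFrame."""
    # Imported lazily so the toy data can be inspected without Spark installed.
    from pyspark.ml.fpm import FPGrowth
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]").appName("tag-rules").getOrCreate()
    )
    df = spark.createDataFrame(rows, ["id", "tags"])

    # Assumed thresholds; each row's tag list must not contain duplicates.
    fp_growth = FPGrowth(
        itemsCol="tags", minSupport=min_support, minConfidence=min_confidence
    )
    model = fp_growth.fit(df)

    # One row per rule, with antecedent, consequent, and confidence columns
    # (recent Spark versions also include lift and support).
    return model.associationRules
```

With a local Spark installation, `mine_tag_rules(ROWS).show(truncate=False)` should print one row per rule.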

support

Support is how often an itemset of tags is present in the dataset.

$$supp(X) = \frac{|\{(i, t) \in T : X \subseteq t\}|}{|T|}$$

where $(i, t)$ is the identifier and itemset of a transaction in $T$.
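The definition can be checked by hand on a small example. The sketch below uses four made-up transactions (the tags and identifiers are hypothetical) and counts how many of them contain a given itemset.

```python
from fractions import Fraction

# Hypothetical toy transactions: (identifier, set of tags).
TRANSACTIONS = [
    (1, {"action", "comedy"}),
    (2, {"action", "romance"}),
    (3, {"action", "comedy", "romance"}),
    (4, {"comedy"}),
]


def support(itemset, transactions):
    """Fraction of transactions whose tag set contains every tag in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for _, tags in transactions if itemset <= tags)
    return Fraction(hits, len(transactions))


print(support({"action"}, TRANSACTIONS))            # 3/4
print(support({"action", "comedy"}, TRANSACTIONS))  # 1/2
```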

confidence

Confidence is the fraction of transactions containing one set of tags that also contain a second set of tags.

$$conf(X \rightarrow Y) = \frac{supp(X \cup Y)}{supp(X)}$$

where $X \rightarrow Y$ is the rule, and $supp(X \cup Y)$ is the support of the itemset containing every tag in both $X$ and $Y$.
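A small worked example may help here. The sketch below computes the confidence of a rule on four made-up tag sets (hypothetical data), treating the rule's support as the support of all tags from both sides taken together.

```python
from fractions import Fraction

# Hypothetical toy transactions, each a set of tags.
TAG_SETS = [
    {"action", "comedy"},
    {"action", "romance"},
    {"action", "comedy", "romance"},
    {"comedy"},
]


def support(itemset, tag_sets):
    """Fraction of transactions that contain every tag in `itemset`."""
    itemset = set(itemset)
    return Fraction(sum(1 for tags in tag_sets if itemset <= tags), len(tag_sets))


def confidence(antecedent, consequent, tag_sets):
    """Of the transactions containing `antecedent`, the share that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, tag_sets) / support(antecedent, tag_sets)


print(confidence({"action"}, {"comedy"}, TAG_SETS))   # 2/3
print(confidence({"romance"}, {"action"}, TAG_SETS))  # 1
```

Confidence is not symmetric: in this toy data every "romance" transaction is also tagged "action", but only two of the three "action" transactions are tagged "romance".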

lift

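Lift compares how often two sets of tags occur together with how often they would occur together if they were independent. When lift equals 1, the sets are independent. When lift is greater than 1, the sets occur together more often than expected, so they are positively associated. When lift is less than 1, they occur together less often than expected, so they are negatively associated.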

$$lift(X \rightarrow Y) = \frac{supp(X \cup Y)}{supp(X) \times supp(Y)}$$
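To make the scale concrete, the sketch below evaluates lift on four made-up tag sets (hypothetical data). A value above 1 means the tags co-occur more often than independence would predict, and a value below 1 means they co-occur less often.

```python
from fractions import Fraction

# Hypothetical toy transactions, each a set of tags.
TAG_SETS = [
    {"action", "comedy"},
    {"action", "romance"},
    {"action", "comedy", "romance"},
    {"comedy"},
]


def support(itemset, tag_sets):
    """Fraction of transactions that contain every tag in `itemset`."""
    itemset = set(itemset)
    return Fraction(sum(1 for tags in tag_sets if itemset <= tags), len(tag_sets))


def lift(antecedent, consequent, tag_sets):
    """Observed joint support divided by the joint support expected under independence."""
    both = set(antecedent) | set(consequent)
    expected = support(antecedent, tag_sets) * support(consequent, tag_sets)
    return support(both, tag_sets) / expected


print(lift({"romance"}, {"action"}, TAG_SETS))  # 4/3, above 1
print(lift({"action"}, {"comedy"}, TAG_SETS))   # 8/9, below 1
```

Unlike confidence, lift is symmetric: swapping the two sides of the rule gives the same value.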