Fleiss' Kappa

Fleiss' kappa is a statistical measure used to assess the agreement, or consistency, of multiple raters or judges when they assign categorical ratings. The measure was developed by Joseph L. Fleiss in 1971 and is widely used in fields such as medicine, the social sciences, and education.

Fleiss' kappa is useful in situations where there are three or more raters or judges and each rater assigns a categorical rating to every subject or item being evaluated. The rating could be “yes” or “no,” “agree” or “disagree,” or any other set of mutually exclusive categories.

Fleiss' kappa ranges from -1 to 1. A value of 1 indicates perfect agreement, a value of 0 indicates agreement no better than chance, and negative values indicate agreement worse than would be expected by chance. As a rough guideline, a kappa below 0.4 is considered poor, 0.4 to 0.75 fair to good, and above 0.75 excellent.

The formula for Fleiss' kappa takes into account the observed proportion of agreement among the raters as well as the proportion of agreement expected by chance. The formula is:

K = (Pobs - Pchance) / (1 - Pchance)

Where K is Fleiss' kappa, Pobs is the observed proportion of agreement (the proportion of agreeing rater pairs, averaged over all items), and Pchance is the expected proportion of agreement due to chance.
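To make the formula concrete, here is a minimal Python sketch that computes Fleiss' kappa from a table of per-item category counts. The function name fleiss_kappa and the input layout (one row per item, one column per category) are illustrative choices for this sketch, not part of the original definition.

```python
from typing import Sequence

def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Compute Fleiss' kappa from an items x categories count table.

    Each row holds, for one item, how many raters chose each category.
    Every row must sum to the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])

    # Observed agreement: for each item, the proportion of rater pairs
    # that agree, averaged over all items.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total_ratings = n_items * n_raters
    category_totals = [sum(col) for col in zip(*counts)]
    p_chance = sum((t / total_ratings) ** 2 for t in category_totals)

    return (p_obs - p_chance) / (1 - p_chance)

# Count table for the worked example below: columns are (yes, no) counts.
example = [[2, 1], [1, 2], [3, 0], [0, 3], [3, 0]]
print(round(fleiss_kappa(example), 3))  # ~0.444
```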

For example, suppose three raters are evaluating five items using a categorical rating of “yes” or “no.” The table below shows the ratings assigned by each rater:

| Item | Rater 1 | Rater 2 | Rater 3 |
|------|---------|---------|---------|
| 1    | Yes     | Yes     | No      |
| 2    | No      | Yes     | No      |
| 3    | Yes     | Yes     | Yes     |
| 4    | No      | No      | No      |
| 5    | Yes     | Yes     | Yes     |

To calculate Fleiss' kappa, we first need the observed proportion of agreement, Pobs. For each item, we count the proportion of rater pairs that agree (with three raters there are three possible pairs per item), and then average these proportions across all items:

Item 1 (Yes, Yes, No): 1 of 3 pairs agree = 0.333

Item 2 (No, Yes, No): 1 of 3 pairs agree = 0.333

Item 3 (Yes, Yes, Yes): 3 of 3 pairs agree = 1.000

Item 4 (No, No, No): 3 of 3 pairs agree = 1.000

Item 5 (Yes, Yes, Yes): 3 of 3 pairs agree = 1.000

Pobs = (0.333 + 0.333 + 1 + 1 + 1) / 5

Pobs ≈ 0.733
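The same pair counting can be written in a few lines of Python; the variable names below are illustrative, not taken from the article.

```python
from math import comb

ratings = [
    ["Yes", "Yes", "No"],
    ["No", "Yes", "No"],
    ["Yes", "Yes", "Yes"],
    ["No", "No", "No"],
    ["Yes", "Yes", "Yes"],
]

n_raters = 3
pairs_per_item = comb(n_raters, 2)  # 3 possible rater pairs per item

# Proportion of agreeing rater pairs for each item.
per_item = [
    sum(comb(row.count(cat), 2) for cat in ("Yes", "No")) / pairs_per_item
    for row in ratings
]
p_obs = sum(per_item) / len(per_item)

print([round(p, 3) for p in per_item])  # [0.333, 0.333, 1.0, 1.0, 1.0]
print(round(p_obs, 3))                  # 0.733
```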

Next, we need to calculate the expected proportion of agreement due to chance. This is based on the overall proportion of each category across all 15 ratings (3 raters x 5 items). There are 9 “yes” ratings and 6 “no” ratings:

Pyes = 9 / 15 = 0.6

Pno = 6 / 15 = 0.4

The expected proportion of agreement due to chance is the probability that two raters, choosing categories at random according to these overall proportions, would agree. It is the sum of the squared category proportions:

Pchance = Pyes^2 + Pno^2

Pchance = 0.6^2 + 0.4^2

Pchance = 0.52
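The same chance-agreement step in Python, written as a small self-contained snippet (the list layout is just one way to hold the 15 ratings from the table):

```python
# All 15 individual ratings from the table (3 raters x 5 items).
all_ratings = ["Yes", "Yes", "No",
               "No", "Yes", "No",
               "Yes", "Yes", "Yes",
               "No", "No", "No",
               "Yes", "Yes", "Yes"]

p_yes = all_ratings.count("Yes") / len(all_ratings)  # 9 / 15 = 0.6
p_no = all_ratings.count("No") / len(all_ratings)    # 6 / 15 = 0.4
p_chance = p_yes ** 2 + p_no ** 2

print(round(p_chance, 2))  # 0.52
```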

Finally, we can calculate Fleiss' kappa as:

K = (Pobs - Pchance) / (1 - Pchance)

K = (0.733 - 0.52) / (1 - 0.52)

K ≈ 0.44

Therefore, Fleiss' kappa for these three raters is approximately 0.44, which falls in the “fair to good” range.
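As a cross-check, the same number can be reproduced with an off-the-shelf implementation. The sketch below assumes that statsmodels is installed and that its fleiss_kappa function in statsmodels.stats.inter_rater accepts an items-by-categories count table, as it does in recent versions.

```python
import numpy as np
from statsmodels.stats.inter_rater import fleiss_kappa

# One row per item, one column per category (yes, no); entries are
# how many of the three raters chose that category.
table = np.array([
    [2, 1],   # item 1: Yes, Yes, No
    [1, 2],   # item 2: No, Yes, No
    [3, 0],   # item 3: Yes, Yes, Yes
    [0, 3],   # item 4: No, No, No
    [3, 0],   # item 5: Yes, Yes, Yes
])

print(round(fleiss_kappa(table), 3))  # ~0.444
```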

In conclusion, Fleiss' kappa is a useful tool for assessing agreement among multiple raters or judges who assign categorical ratings. It provides a quantitative measure of agreement that corrects for the agreement expected by chance, and it is widely used across fields to evaluate the reliability of data collection and analysis.
