With this tool, you can easily calculate the degree of agreement between two judges during the selection of studies to be included in a meta-analysis. Fill in the fields to obtain the raw percent agreement and the value of Cohen's kappa.
As Marusteri and Bacarea (9) noted, there is never 100% certainty about research results, even when statistical significance is reached. Statistical results that test hypotheses about the relationship between independent and dependent variables are meaningless when the raters score those variables inconsistently. If percent agreement is below 80%, more than 20% of the data being analyzed are erroneous. With a reliability of only 0.50 to 0.60, it must be understood that 40% to 50% of the data being analyzed are erroneous. When kappa values are below 0.60, the confidence intervals around the obtained kappa are so wide that one can surmise that about half of the data may be incorrect (10). Clearly, statistical significance means little when the results being tested contain so many errors.
Calculating percent agreement is a simple procedure when the values are limited to zero and one and there are two data collectors: the researcher counts the items on which both collectors recorded the same value and divides by the total number of items. When there are more data collectors, the procedure is slightly more complex (Table 2). As long as the scores are limited to two values, however, the calculation remains simple: the researcher calculates the percent agreement for each row and then averages the rows. Another advantage of the matrix is that it lets the researcher determine whether errors are random, and therefore fairly evenly distributed across all raters and variables, or whether a particular data collector frequently records values that differ from those of the other data collectors.
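As a rough illustration of the two quantities the tool reports, the sketch below computes raw percent agreement and Cohen's kappa for two judges. The function names and the include/exclude decisions (1 = include, 0 = exclude) are hypothetical and serve only the example. Kappa is computed with the standard formula (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each judge's marginal frequencies.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters recorded the same value."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over every category.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category throughout; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical decisions for ten candidate studies (assumed data).
judge_1 = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
judge_2 = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0]

print(f"percent agreement: {percent_agreement(judge_1, judge_2):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(judge_1, judge_2):.2f}")
```

With these made-up decisions the judges agree on 8 of 10 studies (80%), yet kappa is only about 0.60, because half of that agreement would be expected by chance alone.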
Table 2, which shows an overall interrater reliability of 90%, reveals that no data collector had an excessive number of outlier scores (scores that did not agree with the majority of the raters' scores). Another advantage of this technique is that it allows the researcher to identify variables that may be problematic. Note that Table 2 shows the raters achieved only 60% agreement for variable 10. This variable may warrant review to determine the cause of such low agreement in its scoring.
Many situations in health care rely on multiple people to collect research or clinical laboratory data. The question of consistency, or agreement, among the individuals collecting data arises immediately because of the variability among human observers. Well-designed research studies must therefore include procedures that measure agreement among the various data collectors. Study designs typically involve training the data collectors and measuring the extent to which they record the same values for the same phenomena. Perfect agreement is seldom achieved, and confidence in study results depends partly on the amount of disagreement, or error, introduced into the study by inconsistency among the data collectors. The extent of agreement among data collectors is called “interrater reliability.”
When there are more than 12 codes, the increment in the expected kappa value becomes flat. As a result, percent agreement could serve the purpose of measuring the amount of agreement. In addition, the increment in the values of the sensitivity performance metric also reaches its asymptote beyond 12 codes.
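The row-by-row check described for the rater matrix can be sketched as follows. This is a minimal example under stated assumptions: the ratings are invented (they are not the values of Table 2), the agreement for a variable is taken to be the share of data collectors who recorded the most common value for that variable, and the helper names are hypothetical. An odd number of collectors is used so that binary scores cannot tie.

```python
from collections import Counter


def row_agreement(scores):
    """Share of collectors who recorded the most common value for one variable."""
    modal_count = Counter(scores).most_common(1)[0][1]
    return modal_count / len(scores)


def outlier_counts(matrix):
    """Per collector (column), count scores that differ from the row's majority value."""
    counts = [0] * len(matrix[0])
    for row in matrix:
        majority = Counter(row).most_common(1)[0][0]
        for collector, score in enumerate(row):
            if score != majority:
                counts[collector] += 1
    return counts


# Assumed data: rows are variables, columns are five data collectors.
ratings = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

per_variable = [row_agreement(row) for row in ratings]
overall = sum(per_variable) / len(per_variable)
per_collector = outlier_counts(ratings)

print("agreement per variable:", per_variable)
print("overall agreement:", round(overall, 2))
print("outlier scores per collector:", per_collector)
```

A variable with a low row value (the third row here, at 60%) is a candidate for review, while a collector with a conspicuously high outlier count would suggest a rater-specific problem rather than random error.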
Statistical packages can calculate a standard score (z-score) for Cohen's kappa or Fleiss's kappa, which can be converted into a P value. However, even when the P value reaches the threshold of statistical significance (typically less than 0.05), it only indicates that the agreement between raters is significantly better than would be expected by chance. The P value does not tell you whether the agreement is good enough to have high predictive value.
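To make the distinction concrete, the sketch below converts a z-score into a two-sided P value using the standard normal distribution from Python's standard library. The kappa and standard-error values are hypothetical, and the z-score is assumed to be kappa divided by its standard error under the null hypothesis of chance agreement; a given statistical package may report a one-sided value or use a different standard error.

```python
from statistics import NormalDist


def p_value_from_z(z):
    """Two-sided P value for a z-score under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical values, for illustration only.
kappa = 0.30            # modest agreement
standard_error = 0.10   # assumed standard error of kappa under the null

z = kappa / standard_error
p = p_value_from_z(z)

print(f"z = {z:.2f}, P = {p:.4f}")
print("significant at 0.05:", p < 0.05)
```

In this made-up case the P value falls well below 0.05 even though a kappa of 0.30 is far below the 0.60 level discussed above, which illustrates why significance alone says little about whether agreement is adequate.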