RHM is an alternative, compression-based method for calculating phylogenetic distances in order to construct phylogenetic trees. The method was proposed in 2006 by Teemu Roos, Tuomas Heikkilä and Petri Myllymäki (hence its name) (Roos et al. 2006). Given a set of textual documents, the method produces a bifurcating stemma. RHM operates in a manner similar to the maximum parsimony method, with certain important differences. Roos and Heikkilä (2009) have argued that RHM and maximum parsimony actually yield the best results when constructing cladistics-based stemmata, but they also point out that the computational cost is high: computing a stemma for a tradition of anywhere between 10 and 50 manuscripts may take considerable time (hours rather than minutes).
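One reason the cost is high is that the number of candidate tree shapes grows extremely fast with the number of witnesses. As a rough illustration, the sketch below uses a standard combinatorial fact, not a figure from Roos and Heikkilä: if a stemma is treated as an unrooted bifurcating tree, there are (2n - 5)!! distinct trees on n labelled witnesses. The function name is hypothetical.

```python
def count_unrooted_binary_trees(n: int) -> int:
    """Return (2n - 5)!!, the number of unrooted bifurcating trees on n labelled leaves."""
    if n < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (the product is 1 when n == 3).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for witnesses in (5, 10, 20, 50):
    print(witnesses, count_unrooted_binary_trees(witnesses))
```

For 10 witnesses this already gives 2,027,025 possible trees, which is why practical methods search the tree space heuristically rather than scoring every tree.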

The RHM method uses an approximation of Kolmogorov complexity, which is defined as the length of the shortest possible complete description of an object (for example, "aaaaa" can be described more compactly as "5a"). This shortest description exists only in theory: for general description languages such as programming languages, it is mathematically impossible to prove that a given description is actually the shortest possible. In practice, therefore, Kolmogorov complexity is always approximated. RHM approximates it with GZIP compression and uses the result to evaluate the distance (i.e. the amount of dissimilarity) between witnesses while constructing a phylogenetic tree. Using GZIP automatically gives greater weight to longer variants: for example, the weight assigned to the variation *“beatus”* vs. *“sanctus”* is six units, while the weight assigned to the variation *“ex”* vs. *“in”* is only three units. Similarly, variation in word order is usually assigned a smaller weight than variation in the actual words. All of this weighting is based only on the actual information content of the texts, not on scholarly evaluation.
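A minimal sketch of the compression-based idea, assuming the common formulation C(x | y) ≈ C(y + x) - C(y), where C(·) is the compressed length in bytes and `+` is concatenation. This illustrates the general approach only and is not necessarily the exact cost function of Roos et al. (2006); Python's standard `gzip` module stands in for GZIP, and the function names are hypothetical.

```python
import gzip


def compressed_size(text: str) -> int:
    """Approximate the complexity C(x) of a text by the length of its GZIP-compressed bytes."""
    return len(gzip.compress(text.encode("utf-8"), compresslevel=9))


def conditional_cost(x: str, y: str) -> int:
    """Approximate C(x | y) as C(y + x) - C(y): the extra compressed bytes x adds once y is known."""
    return compressed_size(y + x) - compressed_size(y)


# Two hypothetical witnesses differing in one word, plus an unrelated text.
a = "beatus vir qui non abiit in consilio impiorum " * 5
b = a.replace("beatus", "sanctus", 1)
print(conditional_cost(b, a))
print(conditional_cost("ex hoc nunc et usque in saeculum", a))
```

Because the compressor re-uses material it has already seen, text that repeats `y` costs little extra, while genuinely new words must be stored almost literally; this is how longer variants end up with larger weights without anyone assigning them by hand.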

Because RHM uses compression-based comparison without user intervention, all weighting of variations is based on information inherent in the text, with no scholarly evaluation involved. This means that applying RHM requires less effort than an analysis based on carefully constructed encodings, in which, for example, variation considered insignificant (capitalisation, punctuation, etc.) is removed by normalisation, variation in word order is encoded using special characters, and so on. It also means that RHM takes as its input aligned text files containing the actual words, instead of variant readings encoded with arbitrary characters such as *A, B, C, …*, as is done in, for instance, the Nexus data matrix format.
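To make the contrast concrete, here is a sketch of the kind of manual preparation a character-matrix approach needs, next to the plain aligned text RHM can use directly. It assumes a hypothetical, very simple normalisation (lower-casing and stripping surrounding punctuation) and codes each aligned position as states A, B, C, … in order of first appearance. It shows only the state coding a Nexus matrix contains, not the Nexus file format itself; all names are hypothetical.

```python
import string


def normalise(word: str) -> str:
    """Hypothetical normalisation: ignore capitalisation and surrounding punctuation."""
    return word.lower().strip(string.punctuation)


def encode_variants(witnesses: dict[str, list[str]]) -> dict[str, str]:
    """Code each aligned position as states A, B, C, ... in order of first appearance."""
    if not witnesses:
        return {}
    sigla = list(witnesses)
    length = len(witnesses[sigla[0]])
    if any(len(words) != length for words in witnesses.values()):
        raise ValueError("all witnesses must be aligned to the same length")
    rows: dict[str, list[str]] = {siglum: [] for siglum in sigla}
    for position in range(length):
        states: dict[str, str] = {}  # normalised reading -> state letter
        for siglum in sigla:
            reading = normalise(witnesses[siglum][position])
            if reading not in states:
                if len(states) == 26:
                    raise ValueError("more than 26 readings at one position")
                states[reading] = string.ascii_uppercase[len(states)]
            rows[siglum].append(states[reading])
    return {siglum: "".join(codes) for siglum, codes in rows.items()}


def as_plain_text(witnesses: dict[str, list[str]]) -> dict[str, str]:
    """RHM-style input: keep the actual words, unnormalised."""
    return {siglum: " ".join(words) for siglum, words in witnesses.items()}


aligned = {
    "W1": ["Beatus", "vir"],
    "W2": ["sanctus", "vir"],
    "W3": ["beatus,", "homo"],
}
print(encode_variants(aligned))  # {'W1': 'AA', 'W2': 'BA', 'W3': 'AB'}
print(as_plain_text(aligned))
```

The coded matrix discards the words themselves, so "beatus" vs. "sanctus" and "vir" vs. "homo" become equally weighted state changes unless the editor adds weights by hand; the plain-text form keeps the words and leaves the weighting to the compressor.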

Another difference between RHM and typical maximum parsimony implementations, such as those in PAUP or PHYLIP, is the search procedure used to find high-scoring tree structures. The search technique used in RHM takes a user-defined parameter, the number of search steps or iterations. The more iterations, the longer the search takes, but the better the solution that can be expected. The maximum parsimony implementations in PAUP and PHYLIP, on the other hand, are faster.
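The trade-off between the iteration budget and solution quality can be sketched with a generic hill-climbing search, assuming that shape of search; this is not necessarily the procedure RHM actually uses. To keep the example short, the "structure" being searched is just an ordering of witnesses with a toy distance, standing in for a bifurcating tree scored by the compression-based cost. `local_search`, `path_cost` and the `seed` parameter are hypothetical.

```python
import random
from typing import Callable, List


def local_search(start: List[int], cost: Callable[[List[int]], float],
                 iterations: int, seed: int = 0) -> List[int]:
    """Iteration-bounded hill climbing: try a random swap, keep it only if it lowers the cost."""
    rng = random.Random(seed)
    current = list(start)
    current_cost = cost(current)
    for _ in range(iterations):
        if len(current) < 2:
            break
        i, j = rng.sample(range(len(current)), 2)
        candidate = list(current)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        candidate_cost = cost(candidate)
        if candidate_cost < current_cost:  # accept improvements only
            current, current_cost = candidate, candidate_cost
    return current


def path_cost(order: List[int]) -> int:
    """Toy cost: total distance between neighbouring witnesses in the ordering."""
    return sum(abs(a - b) for a, b in zip(order, order[1:]))


start = [5, 0, 3, 1, 4, 2]
for budget in (0, 10, 100, 1000):
    best = local_search(start, path_cost, budget, seed=1)
    print(budget, best, path_cost(best))
```

With the same seed, a longer run repeats the shorter run's first steps and only ever accepts improvements, so its final cost can never be worse; it can still get stuck in a local optimum, which is why more iterations raise the expected quality rather than guaranteeing the best tree.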

**References**

– Roos, Teemu, Tuomas Heikkilä, and Petri Myllymäki. 2006. “A Compression-Based Method for Stemmatic Analysis.” In *ECAI 2006: Proceedings of the 17th European Conference on Artificial Intelligence: August 29 – September 1, 2006*, edited by Gerhard Brewka et al., 805–806. Amsterdam: IOS Press.

– Roos, Teemu, and Tuomas Heikkilä. 2009. “Evaluating Methods for Computer-Assisted Stemmatology Using Artificial Benchmark Data Sets.”