In their talk, Arle Lommel (DFKI) and Lucia Specia (University of Sheffield) give an overview and demonstration of the Multidimensional Quality Metrics (MQM) framework, illustrating its advantages for assessing translation quality and showing how it can be implemented and used collaboratively.
Let’s start with machine translation. Before the arrival of MQM, the quality of machine translation was assessed with three simple words: good, bad, and ugly. “Good” means that the translation is passable as is or needs very little revision; “Bad” means that the translation is salvageable but requires post-editing; and “Ugly” describes the laughable translations a machine sometimes produces. Easy enough. However, these seemingly self-explanatory criteria prove ineffective because the boundaries between them are far too ambiguous. Like splashes of different colors on a canvas, the categories blend to the point where some translations are almost good or almost bad. A major change is clearly needed to refine the assessment of machine translation and to make effective use of human translators in a world of growing technology. So the big question is, “How can we recognize translations that are truly good, and how can we improve translations overall?”
The problem is graver than it seems, and its roots become visible when we look at the current situation. The human quality metrics used to evaluate translations define as many as 180 unique issues between them, with very little overlap. Simply put, there is no consistency. MT metrics such as BLEU scores, on the other hand, do not identify the specific issues in a translation; they only indicate how similar the machine translation is to a human translation. Moreover, they do not tell us whether the translation is good or bad.
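The point about similarity-based metrics can be illustrated with a small sketch. The code below is not BLEU itself: it computes only clipped n-gram precision against a single reference, with no brevity penalty or smoothing, and the example sentences and function names are invented for illustration. It shows how a candidate that reverses the meaning of the reference can still overlap heavily with it.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams that also occur in the reference.

    Each match is clipped to the number of times the n-gram appears
    in the reference. This is one ingredient of BLEU-style metrics,
    not the full metric.
    """
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Invented example: the candidate drops "not" and reverses the meaning.
reference = "the contract is not valid"
candidate = "the contract is valid"

unigram = clipped_precision(candidate, reference, 1)
bigram = clipped_precision(candidate, reference, 2)
print(f"unigram precision: {unigram:.2f}")
print(f"bigram precision:  {bigram:.2f}")
```

Every word of the candidate appears in the reference, so the unigram overlap is perfect even though the translation says the opposite. An overlap score of this kind says nothing about which issue occurred or how serious it is.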
Turning to MQM itself, let’s look at its benefits: why do we need MQM, and what distinguishes it from other translation quality assessment standards? Although MQM also defines more than 120 issues, the core of MQM is a much smaller set that is suitable for most purposes. This is where the three issue types of accuracy, fluency, and verity come in. Within these issue types, MQM presents 12 distinct dimensions that guide you toward what exactly to assess in a translation, including language/locale, text type, audience, purpose, register, and style. Combining issue types with dimensions is what makes MQM multidimensional (as the name indicates), allowing the user to measure and evaluate translations accurately for a specific purpose and need.
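As a rough sketch of what issue-type-based assessment looks like in practice, the code below tallies annotated issues under the three issue types named above. The severity levels, their weights, the per-100-words rate, and all names in the code are assumptions made for illustration; this is not the official MQM scoring model.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three top-level issue types named in the talk summary.
ISSUE_TYPES = {"accuracy", "fluency", "verity"}

# Hypothetical severity weights; a real project would choose its own.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


@dataclass
class Issue:
    issue_type: str  # one of ISSUE_TYPES
    severity: str    # one of SEVERITY_WEIGHTS
    note: str = ""


def penalties_by_type(issues):
    """Sum the severity weights of the annotated issues per issue type."""
    totals = defaultdict(int)
    for issue in issues:
        if issue.issue_type not in ISSUE_TYPES:
            raise ValueError(f"unknown issue type: {issue.issue_type}")
        totals[issue.issue_type] += SEVERITY_WEIGHTS[issue.severity]
    return dict(totals)


def penalty_per_100_words(issues, word_count):
    """Normalize the total penalty by text length (illustrative formula)."""
    total = sum(penalties_by_type(issues).values())
    return 100 * total / word_count


# Invented annotations for a 200-word translation.
issues = [
    Issue("accuracy", "major", "negation dropped"),
    Issue("fluency", "minor", "subject-verb agreement"),
    Issue("fluency", "minor", "misspelling"),
]

by_type = penalties_by_type(issues)
rate = penalty_per_100_words(issues, word_count=200)
print(by_type)
print(f"penalty per 100 words: {rate}")
```

Unlike a single overall label, the per-type breakdown shows where a translation falls short, and the set of issue types or weights can be narrowed or adjusted for a particular purpose.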
The benefit of MQM is that it can be incorporated into whatever type of system you already have or would like to develop. In other words, it is flexible and customizable enough to be simplified to meet a specific client’s or vendor’s needs. MQM is also constantly evolving to serve its users better. The assessment management system is one such development, aiming to provide a library of reports, samples, and similar resources to draw on for data and analysis. These assets and developments will ultimately smooth the QA process by helping evaluators identify the specific reasons why a translation is good, bad, or ugly. MQM will pave the way for more creative design and more automated technology in translation quality assessment. Furthermore, it will provide a richer context for translation management, because adopting the MQM model means working with centralized, focused standards that yield more data in the long run.