The ultimate goal of an MT engine is to maintain high translation quality while reducing the time and cost of the translation process, and depending on the type of materials involved, training a Machine Translation (MT) engine can be an arduous undertaking. MT is not perfect: it requires a Post-Edited Machine Translation (PEMT) step, in which human reviewers manually check for critical, major, and minor errors against specific criteria or a Quality Assurance (QA) model determined by the vendor. When the training materials are a mass of legal documents, the process becomes a particular challenge. The following is an Advanced CAT tool project I worked on with fellow Translation, Interpretation, and Localization Management students to train an MT engine, specifically Microsoft Translator Hub, over a period of four weeks, delineating the overall process of trial and error and, of course, the lessons learned.
THE INITIAL PROPOSAL
For the full initial proposal, click here.
Having selected Korean legal documents for translation, we faced the daunting challenge of meeting a high quality requirement. Over a series of ten rounds, we submitted various materials for training, tuning, and testing after manually aligning all of the Korean and English text. Among the successes and failures, we were particularly surprised to discover that training with monolingual texts failed us. Additionally, our initial goals of a 25% reduction in time and cost and zero critical errors were not realistically feasible, and the way we had calculated them did not hold up.
THE FINAL PROPOSAL AND LESSONS LEARNED
For the full final proposal, click here.
For the final presentation on lessons learned, click here.
Notice that we changed our ultimate goals for time and cost as well as our PEMT grading criteria, allowing up to two critical errors and a deduction of up to 20 points. Post-editors 1 and 5, who had particularly low point deductions, reviewed segments that were comparatively simple in sentence structure, whereas the remaining post-editors dealt not only with complicated sentence structures but also with punctuation used differently across the two languages (e.g., the semicolon in Korean versus English contexts), unfamiliar legal jargon, and so on. One of the key lessons learned here was that legal training documents must be closely related to one another: just because they are all legal documents does not necessarily mean the jargon will carry over across the board. In our case, because few available documents related to education and academic law, we added culture-and-public-relations texts to the training set, a step that ultimately led to unavoidable translation errors. Also, given the marked structural differences between Korean and English, a thorough manual cleanup process would help improve accuracy.
KANTAN MT VS MICROSOFT TRANSLATOR HUB
Finally, we took the project in a new direction by trying out the Kantan MT engine with the same legal documents we had used on Microsoft Translator Hub (MTH). The results diverged dramatically.
Above, we have the BLEU score from Microsoft Translator Hub. Considering that the first round recorded a score of 12.56, there was a noticeable increase; however, take a look at the BLEU score from Kantan MT below.
Because Kantan MT reports its BLEU score as a percentage, it is difficult to determine the exact difference between the two scores, but it is quite evident that Kantan's BLEU score is significantly lower.
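For readers unfamiliar with what BLEU actually measures, the following minimal sketch (plain Python, not either engine's actual scorer, and simplified to a single reference with naive smoothing) shows the core idea: clipped n-gram precision combined with a brevity penalty.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: clipped n-gram precision plus brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Naive smoothing to avoid log(0) when no n-grams match
        log_prec += math.log(max(clipped, 1e-9) / total) / max_n
    # Brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec)

print(bleu("the law applies to schools", "the law applies to schools"))  # 1.0
```

A perfect match scores 1.0 (often reported as 100%), which is why a raw score such as 12.56 and a percentage-style score describe the same underlying metric on different scales.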
HUMAN EVALUATED MACHINE TRANSLATIONS
Based on the MQM for MT Diagnostics, the criteria used in the pilot project above, we graded the PEMT results from Kantan and MTH, illustrated below. Note that PEMT was performed on a random group of segments totaling 300 words, for approximately 20 minutes.
Kantan MT failed the grading criteria by a significant margin compared to MTH. This was no surprise, though, as the initial Kantan MT output was riddled with mistranslations (or no translation at all).
Above is a sample of the initial Kantan MT output. A majority of the text was left untranslated or unrecognized even though we submitted more than 20,000 words (in a similar context) to the training batch. Kantan MT also fell victim to a peculiarity of Korean writing: whereas English words must always be separated by spaces to be read as independent words (e.g., school committee), Korean often remains intelligible when words are merged together as one (e.g., schoolcommittee), and in legal text in particular, such merging occurs frequently. So while Kantan MT successfully translated the word "school," for example, it failed to translate "school health law" because the original Korean text had the phrase merged together as one word, which created a trap for Kantan.
NOTABLE FEATURES OF KANTAN MT
First, there is the delete function, which is nonexistent in MTH: when an incorrect file is uploaded to the engine, it can easily be deleted from the batch.
Kantan MT also has a relatively easy-to-navigate user interface, incorporating separate tabs for uploading the necessary files. (See below.)
MTH only allows the user to upload all the files in one batch and manually select them for training, tuning, and testing. Kantan MT, however, allows the user to click on each tab and upload the files separately without any confusion.
Also, Kantan has a separate LQR system that allows the user (the project manager) to invite external reviewers. MTH, by contrast, assigns the reviewer role only to members already invited to the project, who therefore also have access to all the initial training files and other materials. By housing reviews in a separate LQR system, Kantan eliminates potential confusion and clashes among the reviewer, translator, and project manager.
I must note one downside of Kantan MT, though: it enforces a specific file-naming scheme and accepts only a fixed number of data files. For example, MTH allowed us to use five tuning files as a batch, but Kantan MT required a single file, so we had to manually merge the five tuning files (and the five to ten training files) into one document each and then upload that to the engine.
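The merge step itself is trivial to script. A minimal sketch (the directory and file names are hypothetical, and it assumes line-oriented UTF-8 files with one segment per line) looks like this:

```python
from pathlib import Path

def merge_files(sources, dest):
    """Concatenate line-oriented UTF-8 tuning files into one upload file."""
    with Path(dest).open("w", encoding="utf-8") as out:
        for src in sources:
            # Strip trailing newlines, then re-add exactly one,
            # so segments stay one per line with no blank gaps
            out.write(Path(src).read_text(encoding="utf-8").rstrip("\n") + "\n")

# Hypothetical layout: all tuning files in a "tuning" folder
merge_files(sorted(Path("tuning").glob("*.txt")), "tuning_merged.txt")
```

Sorting the file list keeps the merged segment order deterministic, which matters if the merged file ever has to be re-aligned against a reference.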
CONCLUDING REMARKS: KANTAN MT OR MICROSOFT TRANSLATOR HUB?
For the Korean legal documents we used during our MT engine training project, we recommend Microsoft Translator Hub. However, for future training and a more accurate comparison of the two engines, it would be advisable to align the documents manually as thoroughly as possible and to gather documents related to one specific area of law for training. Kantan MT seems to require further cleanup and a more comprehensive single-file-per-batch approach, so performing that process first would allow a fairer comparison. We will most likely continue to use MTH for this particular project regardless, but taking this extra step to finalize the comparison between the two engines may do Kantan MT justice.