Challenges

Memsource

It all began with Memsource. While challenge after challenge presented itself, it appeared that the source of all “evil” was Memsource. But later, it turned out that my own neglect and lack of preparation were the true source of all evil. Let me explain.

The full English transcription was provided by the client, Liberty in North Korea. Delivered in the .srt file format, the subtitles consisted of 6,232 words and 773 lines in total. Notice Line 71 in the screenshot below; it clearly contains two lines of subtitles:

Line 71 contains two lines of subtitles

But when I uploaded the full .srt file to Memsource, the following split occurred (ignore the translation in the right-hand column; the screenshot was taken after the translation had been completed):

Line 71 from the original .srt file is broken into two lines

Not only was Memsource unable to recognize apostrophes, but it also divided single lines of subtitles into two at seemingly random points, causing the uploaded file to consist of 785 lines instead of 773 (see below):

Total of 785 instead of 773 lines
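Discrepancies like 773 vs. 785 can be verified outside the platform with a small script. A minimal sketch, assuming well-formed cues (the parser and the sample cues below are my own illustration, not part of the project’s actual tooling):

```python
import re

def count_srt(text):
    """Count cues and subtitle text lines in an .srt document.

    Each cue is an index line, a timestamp line, and one or more text
    lines, separated from the next cue by a blank line.
    """
    cues = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    # Subtract the index line and timestamp line from each cue.
    text_lines = sum(len(c.splitlines()) - 2 for c in cues)
    return len(cues), text_lines

sample = """1
00:00:01,000 --> 00:00:03,000
Hello there,
my old friend.

2
00:00:03,500 --> 00:00:05,000
Good to see you."""

print(count_srt(sample))  # (2, 3): 2 cues, 3 subtitle text lines
```

Running such a count on the file before and after a platform round-trip makes any silent line splitting immediately visible.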

This issue forced me to ask myself an endless stream of questions: should I go through the entire 785 lines and rejoin the divided ones? Should I fix all the broken apostrophes, or just notify all the translators that the broken symbols represented apostrophes (not that they were difficult to recognize)? Would Memsource jumble up the .srt file when I downloaded the completed file, as it had done on its platform?

The good news was that the .srt file was not corrupted when I downloaded it back onto my desktop (or so it seemed at the time; more on that later). That meant that although Memsource divided the 773 lines further into 785, the final .srt output was not affected. So was this a bug that could have been brushed off? Yes and no, because the ultimate question was, “How am I supposed to assign tasks?” Upload the full .srt file, assign all translators per language, and instruct them to translate only lines 1-245, 246-496, and so on? Or divide the full .srt file into shorter, separate files to upload to Memsource and assign individually to each translator? The former option ran a definite risk of translators overwriting each other’s work without ill intent. Many of the translators were new to Memsource to begin with; what if Translator A accidentally deleted Translator B’s work?

I honestly could not take that risk. Even if it took more time on my part as both engineer and manager, I would rather divide the full .srt file into individual files and assign them to each translator, to prevent confusion and potential mechanical and linguistic errors. The result of my decision was an inevitably massive list of tasks:

File divided into shorter pieces, customized by each translator’s budgeted word count; numerical values in file name represent line numbers from original .srt that linguist should translate up to
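Splitting the file by cue ranges can also be scripted rather than done by hand. A minimal sketch, assuming well-formed cues (the function name and the example ranges are hypothetical):

```python
import re

def split_srt(text, ranges):
    """Split an .srt document into pieces by 1-based, inclusive cue ranges.

    ranges: a list of (start, end) pairs, e.g. [(1, 245), (246, 496)],
    one per translator. Cue numbering from the original file is kept so
    each piece can still be matched against the source line numbers.
    """
    cues = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    return ["\n\n".join(cues[start - 1:end]) for start, end in ranges]

sample = """1
00:00:01,000 --> 00:00:02,500
First cue.

2
00:00:03,000 --> 00:00:04,500
Second cue.

3
00:00:05,000 --> 00:00:06,500
Third cue."""

pieces = split_srt(sample, [(1, 2), (3, 3)])
print(len(pieces))  # 2
```

Keeping the original cue numbers, as above, is what makes file names like the ones in the screenshot (“translate up to line N”) meaningful for each linguist.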

Dividing the full .srt file into shorter pieces did not prevent the line splitting described at the beginning. Frankly, I am still not entirely sure why this bug occurred in some situations and not in others. The splits were not entirely random either, because even when shorter .srt files were uploaded, the same lines were affected, and there was nothing particularly special about those lines (for the most part). Perhaps it was the sheer length of the file, or the fact that it was an .srt file of subtitles interleaved with timestamps. Or perhaps it was the intertitles (on-screen text, or title cards) we had in the .srt file, which brings us to the next biggest issue we faced.

Intertitles

At more than ten points throughout the film, the audience sees intertitles in various forms, including but not limited to quotes, chapter titles, and name plates (see below for examples).

© The Jangmadang Generation, in Russian: Kim Jong-Un’s quote as intertitles

© The Jangmadang Generation, in French: chapter title as intertitles

© The Jangmadang Generation, in Simplified Chinese: individual’s name as intertitles

Recall from Part 1 that the English transcription was completed by the client prior to Globe’s collaboration with LiNK. Assuming that everything was on point, I never gave myself the chance to look over the full .srt file to confirm details or raise questions. I could easily have prepared the file better by double-checking that each subtitle line stayed on screen for a sufficient amount of time, even if that would later be adjusted for character counts and expansion. I could have noticed oddities such as the intertitles included in the .srt file and clarified with the client what their expectations for those elements were. But no, I took the matter so lightly that I neglected the preparation and engineering that were absolutely vital to the process, only to regret it later and perform post-engineering.

Below are screenshots that perfectly represent how my initial neglect caused a terrible butterfly effect that resulted in many more hours of labor on my part (please read the captions to understand the story behind this challenge):

1) Duplicate line, underlined; 2) individual’s name (correction: Hyunsook, not Hyesook) creating three lines of subtitles instead of the maximum of two, and duplicate name, circled; 3) duplicate lines that should be joined and should last until the second end timestamp, triangled

Are you wondering why I didn’t ask the client for clarification about what exactly happened in the disastrous section above? I ask myself that same question. It wasn’t even that I failed to notice; I simply learned the hard way that when in doubt, ask.

File enters Memsource, butchered by the platform

So Memsource divided the lines in a strange way, but in its defense, the intertitles in this particular section produced three lines of subtitles, presumably forcing Memsource to split the lines as it did. After some deliberation, the client requested that the intertitles be collected in a Google Doc for mechanical ease on their end. But as soon as I extracted the intertitles from the .srt file and placed them in a Google Doc, I realized that the extraction offered me mechanical ease as well, and that it was something I could have done much earlier to prevent the mishap on Memsource entirely.

Only the intertitles extracted to a Google Doc

Problem solved? Not yet.

During the later engineering process, without recognizing the two different end timestamps, 10:54,412 and 10:54,646, I deleted the duplicate line “There was no food” from the .srt file along with the latter timestamp. As a result, the end timestamp on this line became 10:54,412, meaning the subtitle did not even stay on screen for a full second:

The infamous timestamps

Ultimately, this particular timestamp issue was resolved either by editing the end timestamp or by joining the two lines highlighted above. It is worth noting, however, that for one of the languages, because the line was particularly short, multiple reviewers performing linguistic and localization QA never caught the briefly displayed subtitle and had no issue with it. At a few other points throughout the film, the same short-readability issue occurs, not because of an engineering mishap but because of character expansion. Granted, the original subtitle already had a somewhat short time frame, but character expansion in some of the languages increased the CPS (characters per second), further reducing readability.
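Short on-screen times and high CPS like the case above can be caught mechanically before QA ever sees the file. A minimal sketch using a one-second minimum and an 18 CPS ceiling (the helper names and the sample’s start time are invented for illustration):

```python
import re

TS = r"(\d{2}):(\d{2}):(\d{2}),(\d{3})"

def to_ms(ts):
    """Convert an .srt timestamp like 00:10:54,412 to milliseconds."""
    h, m, s, ms = map(int, re.match(TS, ts).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def flag_cues(text, min_ms=1000, max_cps=18):
    """Return (cue index, duration in ms, CPS) for cues that are on
    screen too briefly or exceed the readability ceiling."""
    flagged = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        start, end = lines[1].split(" --> ")
        dur = to_ms(end) - to_ms(start)
        chars = sum(len(l) for l in lines[2:])
        cps = chars * 1000 / dur
        if dur < min_ms or cps > max_cps:
            flagged.append((lines[0].strip(), dur, round(cps, 1)))
    return flagged

# A cue ending at 10:54,412 after only 512 ms on screen (start time invented):
sample = """5
00:10:53,900 --> 00:10:54,412
There was no food."""

print(flag_cues(sample))  # [('5', 512, 35.2)]
```

A pass like this over each language’s final .srt would also surface the expansion-driven CPS problems, since the check runs on the translated character counts, not the English ones.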

Review: Consistency, Client Review, Validation, and Engineering

Now that we were past the translation stage, it was time for review. During the review stage, it was crucial to maintain consistency in tone, style, grammar, and punctuation throughout the entire film. This was certainly easier said than done, because we had multiple translators for each language: three for Japanese and six for Spanish, for example. So I downloaded all the separate Memsource files and merged them into one .srt file per language. I then uploaded the full .srt files back to Memsource and assigned each to my best reviewer for that language: someone fully dedicated to the project and, of course, a native speaker of the respective language.
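Merging the downloaded pieces back into one file per language can be sketched as follows, assuming the pieces are clean and supplied in order (the function name is my own):

```python
import re

def merge_srt(pieces):
    """Concatenate .srt pieces in order and renumber cues sequentially,
    so each language ends up as a single, consistently numbered file."""
    cues = []
    for piece in pieces:
        cues += [b for b in re.split(r"\n\s*\n", piece.strip()) if b.strip()]
    renumbered = []
    for i, cue in enumerate(cues, 1):
        lines = cue.splitlines()
        # Replace the piece-local index line with the new global index.
        renumbered.append("\n".join([str(i)] + lines[1:]))
    return "\n\n".join(renumbered) + "\n"
```

Renumbering matters here: if each translator’s piece kept the original cue numbers, simple concatenation would already be consistent, but any joins or deletions made during translation would leave gaps that some players reject.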

Then came the Client Review stage. Our client had numerous acquaintances in their network who came from all walks of life, including journalism, but not all of them were native speakers or professional translators. So rather than training them on a TMS like Memsource, a brand-new platform and concept for them (Memsource does have a client review function as well), I created a separate spreadsheet to receive the client reviewers’ feedback (see below). The spreadsheet gave our Globe reviewers a chance to validate the client feedback and ensure that any suggestions abided by grammar rules as well as subtitling rules such as character limits and readability.

Client review and validation for German

Client review went more smoothly for some languages than for others. We received anywhere from zero lines of feedback to as many as 150. French, for example, went through multiple hand-offs between the client and our reviewers over stylistic choices and linguistic errors, whereas for Portuguese, and particularly German, several pieces of client feedback were not implemented in the final version because of grammatical errors or awkwardness in the target language. Of course, that did not make the feedback any less valuable, especially because every suggestion offered a different perspective on creative expression; and whether a reviewer came from the client side or the vendor side, we were all working toward the same objective of perfecting the translations for our documentary film. At this point, it was crucial that our native-speaking reviewers’ validations carried legitimate reasoning, with linguistic evidence, to either confirm or reject a client reviewer’s feedback.

With this process in place, the final step was to implement all accepted changes into the final .srt files. While I handled the engineering from beginning to (almost) end, toward the final delivery I scouted linguists with technical experience, or trained them, to execute the final engineering themselves, saving many extra steps and much time from a management perspective.

Takeaways

  1. There is no such thing as being too meticulous or too thorough — be as thorough as you can be, whether that is during preparation, translation, review, or engineering.
    • For preparation, be sure to ask the client questions if anything seems unusual about the source file(s).
    • For translation, check the subtitling guidelines. Remember the rules: approximately 15 characters maximum per line for CJK languages and approximately 42 characters maximum per line for Latin-script languages; anything beyond 18 CPS (characters per second) is unreadable; and a subtitle should not remain on screen more than 2 seconds after the corresponding line has been spoken.
    • For review, double-check and triple-check on linguistic points and mechanical points (e.g. is the subtitle too fast?).
    • For final engineering, don’t be afraid to train someone to share the work with you and gain extra sets of eyes; again, be as meticulous as you can be.
  2. TEP (Translation, Editing, Proofreading) and final linguistic QA should be done by a native speaker who is well-versed in the overall translation process; at the very least, in pro bono projects such as this one, with little to no monetary budget, the final linguistic QA needs to be done by a native speaker.
  3. For pro bono projects, use time and word count as your budget. Show appreciation for all partners involved and keep them posted on the progress of your project: remember, they are spending their valuable time and energy on a project close to your heart because they believe in your mission and objective. Don’t take their effort for granted.
  4. Communicate with your client consistently and constantly. Be available to them. Keep them posted on your progress. Be appreciative of their belief in you and for their consideration throughout the project. At the end of the day, the final product is for them.
  5. Be a good mediator between your client and linguists; it is key to good project management.
  6. Learn from your mistakes, do not be afraid to fail, and constantly reflect on your errors as you move forward. As you may have already noticed from reading this post, 99% of the challenges I faced were brought on by none other than myself. It was my neglect during pre-production that resulted in all sorts of mechanical issues. But in the end, I can say with confidence that I learned a valuable, hard lesson.

Results

The Jangmadang Generation is now publicly available in Korean, English, Spanish, German, and French. Thanks to the collaboration between LiNK and Globe, Japanese, Simplified Chinese, Traditional Chinese, Russian, and Brazilian Portuguese versions are still to come.

I am incredibly honored to have been so deeply involved in this project, working closely with stunning colleagues at LiNK and MIIS Globe to share the North Korean people’s stories with audiences all around the world. The response has been overwhelmingly positive, as so many people recognize the need for a different perspective and narrative around the isolated nation of North Korea. It is empowering to know that our hard work can help fill that void, and it is results like these that remind us why we continue to learn and pursue globalization: to bring people around the world closer together.

 

Categories: Localization
