Real costs of quality software translations

Henk Boxma
Localization architect

One of the major criteria for selecting a language service provider (LSP) is the translation cost per word. This makes sense for the translation of a "flowing text" where the full context is available. In that case, it is quite straightforward to perform an assessment and select the LSP that has the optimal quality/cost ratio. But is this also the case for software localization?

Having a background in engineering, I have always wondered why the cost per word is still one of the major selection criteria when an LSP is selected for software localization. The cost of translating software itself is just a fraction of the real cost. The process is quite different compared to translating a flowing text, and software localization should be an integral part of the software development process. In many cases, software localization is treated as a post-development activity, which has significant consequences regarding time, cost and quality.

How general requirements for software translations are implemented is a good predictor of the final attributes that quantify translations, which are time, cost and quality. The interesting point here is that the majority of these requirements need to be implemented by the engineers during the development process. Hence, engineers have more impact on the final translation attributes than the LSP.

In order to steer each of the three attributes in our preferred direction, we should also focus on preventing costs. Today's focus is mostly on reducing them.

The three factors in product development remain in constant tension: time, cost and quality. The theory is that you should pick two attributes. For example, quality in time will result in higher costs. Reducing costs and maintaining high quality result in a longer development cycle. This is true if you do not want to change and want to stick to the current patterns. However, it is possible to save significant costs on software localization and at the same time to reduce the engineering efforts (time) and improve the quality of translations. This can be achieved by working smarter. Therefore, we need to define what software localization costs are exactly.

Characteristics of software localization

Why does software localization differ from translating flowing text? Software contains labels (strings), which are stored in one or more resource files - for example, resx-files. Each label has a unique identifier. The software uses a label by referencing its identifier. Translating basically means that the label-resource file is copied and that all source texts are replaced by the target translations. The resulting file can be loaded into the software build in order to make the translation available to the users.
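The label mechanism described above can be sketched in a few lines. This is a minimal illustration, not how any particular framework implements it: the identifiers (IDS_CANCEL, IDS_EDIT), the in-memory tables that stand in for resource files, and the fallback to the source language are all assumptions made for the example.

```python
# Hypothetical in-memory stand-ins for per-language resource files.
# Each label has a unique identifier; the code only ever refers to the identifier.
RESOURCES = {
    "en": {"IDS_CANCEL": "Cancel", "IDS_EDIT": "Edit"},
    "nl": {"IDS_CANCEL": "Afbreken"},  # IDS_EDIT has not been translated yet
}


def get_label(label_id: str, language: str, fallback: str = "en") -> str:
    """Return the label text for label_id in the requested language.

    Falling back to the source language for untranslated labels is an
    assumption of this sketch, not something every framework does.
    """
    table = RESOURCES.get(language, {})
    if label_id in table:
        return table[label_id]
    return RESOURCES[fallback][label_id]


print(get_label("IDS_CANCEL", "nl"))
print(get_label("IDS_EDIT", "nl"))
```

Note that the tables alone tell the translator nothing about where a label appears or how much room it has, which is the problem discussed next.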

If no precautions are taken, the only information the translator would have is a table with strings. Without context, this is hard to translate because, for example, the label "Edit" could be a noun or a verb. Another problem is that the translator does not know if the translated string would fit, since software may have length restrictions when strings are rendered. An important characteristic is that a software localization tool can render labels in the context of the window that is shown in the target application. This provides the user with contextual information. As seen in Figure 1, Notepad is loaded in a software localization tool. The translator immediately sees that the Dutch translation of Cancel - which is Afbreken - will not fit.

After all translations have been done, the translator would like to see the translations in context in the target application. Navigational instructions guide the translator along all labels. Unfortunately, some specific labels are only shown in situations that are not easy to simulate, such as specific state warnings.

A software application must be internationalized before it can be localized. The objective of the engineering effort is designing a software application that can be adapted to various languages and regions without engineering changes. Unfortunately, during the localization effort internationalization issues can be detected, and each of them should be reported to the developers, usually via a change request. The change control board then decides if an issue should be processed or not. The outcome depends on various criteria and will not always be optimal from a linguistic perspective. Examples of internationalization issues are hard-coded strings, concatenation of strings, reuse of label-identifiers and lack of support for bidirectional languages.
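To make one of these issues concrete, the following sketch contrasts string concatenation with a single label that contains a placeholder. The label identifier MSG_DELETED, the message texts and the function names are hypothetical, and plural handling is deliberately ignored to keep the example short.

```python
def deleted_message_concatenated(count: int) -> str:
    # Internationalization issue: the word order is fixed in code, and the
    # translator only ever sees the fragments "Deleted " and " files".
    return "Deleted " + str(count) + " files"


# Internationalized variant: one complete label per language, with a placeholder
# that the translator can move to wherever the target language needs it.
LABELS = {
    "en": {"MSG_DELETED": "Deleted {0} files"},
    "nl": {"MSG_DELETED": "{0} bestanden verwijderd"},
}


def deleted_message(count: int, language: str) -> str:
    """Look up the complete label and substitute the value at run time."""
    return LABELS[language]["MSG_DELETED"].format(count)


print(deleted_message_concatenated(3))
print(deleted_message(3, "nl"))
```

In the concatenated version the Dutch word order cannot be produced without a code change, which is exactly the kind of issue that ends up as a change request.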

The main reasons why software localization is so different from translating a flowing text are (1) the dependency on the development process and (2) the contextual information is not completely contained within the resources to be translated itself.


Figure 1: A software localization tool provides context for the Dutch translation of Cancel, and the translator sees immediately if the translation will fit.

Requirements

Besides the assumption that the software application is well-internationalized, the minimal requirements in order to obtain high-quality software translations are:

  • Context for source and translation: the translator would like to see the string translated in its full source context and the translated string in its full target context. Seeing the target translation in context may prevent linguistic, cosmetic and functional bugs.
  • Navigation: in addition to context between strings that are shown together on a screen, the translator would like to know how to navigate to a window where the string is shown.
  • Clipping: the translator would prefer to get immediate feedback if the translated string will fit in the target application. The severity of a clipping string depends on the type of product. For a user of a mobile phone application it could be annoying, while it could even be life-critical in a medical application. LISA research shows that these kinds of issues could have a disproportionate impact on purchasing decisions, even more than functional issues.
  • Quality: The translator should have an environment that supports linguistic quality. This can be achieved by checking consistency and correct usage of terms. The environment should also be able to find quality issues, by comparing source and target, for example.
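As a rough illustration of the clipping requirement, the sketch below flags translations that exceed a per-label length budget. Counting characters is a simplifying assumption made here: a real localization tool measures the rendered width in the actual control and font. The function name, the identifiers and the limits are hypothetical.

```python
from typing import Dict, List


def find_clipped(translations: Dict[str, str], max_chars: Dict[str, int]) -> List[str]:
    """Return the identifiers of labels whose translation exceeds its budget.

    Labels without an entry in max_chars are treated as unrestricted.
    """
    return [
        label_id
        for label_id, text in translations.items()
        if label_id in max_chars and len(text) > max_chars[label_id]
    ]


dutch = {"IDS_CANCEL": "Afbreken", "IDS_OK": "OK"}
limits = {"IDS_CANCEL": 6, "IDS_OK": 6}  # assumed button capacity in characters
print(find_clipped(dutch, limits))
```

Even a crude check like this gives the translator feedback at the moment of translating instead of days later, after a new build.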

If these requirements are ignored during the architectural phase of product development, which is frequently the case, this will result in substantial development costs and an unnecessary increase in time-to-market and as a result missed sales opportunities.

Software localization-related costs

Extra costs for product development are typically hidden as part of the costs for localization and are more difficult to quantify than the translation costs per word.

Engineers put effort into supporting the translation cycle, and the costs associated with this could potentially be reduced. Nowadays, many projects are executed in an agile manner, which has the implication that engineers apply changes to resources - create, update and/or delete source strings - during the translation process. The translated resources that the translators return have to be integrated into the software build. This integration process is not a straightforward task without proper tooling - for example, a translation in which the source text changed becomes invalid and has to be retranslated.
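The integration problem can be sketched as follows. The idea, assumed here for illustration, is that each returned translation records the source text it was made from, so that it can be compared with the current source at integration time. The Translation record, the category names and the function name are hypothetical and not taken from any specific tool.

```python
from typing import Dict, List, NamedTuple


class Translation(NamedTuple):
    source_at_translation: str  # source text the translator actually saw
    target: str                 # the translation that was returned


def classify(current_source: Dict[str, str],
             translations: Dict[str, Translation]) -> Dict[str, List[str]]:
    """Sort label identifiers into valid, stale, missing and orphaned."""
    report: Dict[str, List[str]] = {"valid": [], "stale": [], "missing": [], "orphaned": []}
    for label_id, source in current_source.items():
        translation = translations.get(label_id)
        if translation is None:
            report["missing"].append(label_id)    # created during translation
        elif translation.source_at_translation != source:
            report["stale"].append(label_id)      # source updated: retranslate
        else:
            report["valid"].append(label_id)
    for label_id in translations:
        if label_id not in current_source:
            report["orphaned"].append(label_id)   # source string was deleted
    return report


source_now = {"IDS_SAVE": "Save", "IDS_OPEN": "Open file", "IDS_PRINT": "Print"}
returned = {
    "IDS_SAVE": Translation("Save", "Opslaan"),
    "IDS_OPEN": Translation("Open", "Openen"),
    "IDS_EXIT": Translation("Exit", "Afsluiten"),
}
print(classify(source_now, returned))
```

A professional localization tool performs this kind of bookkeeping out of the box; the sketch only shows why doing it by hand quickly becomes error-prone.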

If no software localization tool is available, the engineers have to develop and maintain tooling to support the localization process. These tools can become quite complicated and are in general not easy to maintain. These kinds of engineering activities can be largely avoided because a professional software localization tool will support this functionality. There is a clear return on investment on efforts spent by engineers. It will reduce costs for development and maintenance and will make your environment less dependent on architectural choices.

Translations are typically done at the end of the product development phase. In that phase, the engineers typically have a high workload in order to deliver the product on time. Translators will ask (read: disturb) the engineers to support them with contextual or navigational information. This can be a significant effort, even for a medium-size project, because the same question will appear more or less at the same time from each independent translator. Sometimes the translators may also find internationalization issues, such as hard-coded strings or concatenations. Issues that are found this late in the process are costly to solve. Software localization tools can help engineers identify internationalization issues early in the development cycle.

The real cost of localization is testing, rather than the 19¢, 20¢ or 20.3¢ per word where the focus typically is when a customer selects an LSP. In many cases, the test engineers are not trained to find linguistic issues. Their focus is on finding bugs in product functionality and reporting these to the engineers. So, what are the hidden costs of localization for this discipline?

Fortunately, in many organizations the test department is aware that translators need support. This often results in technical solutions (tools) that are not necessarily the most effective tools for translators. In many cases, these tools are difficult to use, have high maintenance costs and will not deliver the appropriate quality.

It is common practice, for example, that the test department has a scripting environment that runs the target application in a controlled manner through all possible scenarios, thereby displaying all strings on all screens. The full test may take more than a couple of days per language, depending on the complexity of the product that is tested. Imagine that in this example, as a by-product of the scripting process, the engineers were able to quite easily fill a database that records on which screen each string is used, including screenshots. The next step is a small tool - or web application - where the translator can enter a screen identifier and see all relevant contextual information. Unfortunately, in this environment the translator cannot see the translation immediately in the translated context and cannot detect if the translation is clipped. First, the translations need to be sent to the engineers to be imported into the software build, and then the scripting environment has to rerun the scripts to generate the target screenshots. It may take some days before the translator can continue the task. The translator provides feedback to the test engineers that he or she wants to be able to do the translations in context. To support this, the engineers build in an OCR feature that will recognize strings on a bitmap and overwrite these with a target translation. Great, but later it will become clear that this feature does not work for strings that are underlined. Of course, the engineers have a solution for this, and slowly but surely the small application grows and grows.

The key point here is that applications developed in-house by test engineers focus on functional rather than linguistic testing. In general, these in-house tools will not have the state-of-the-art features that professional software localization tools have, such as a connection to a translation memory or support for fuzzy matching or concordance searches. The challenge is how to integrate third-party software localization tools into the software development process.

Burden on translators

A translation may break the code if, for example, the target text does not contain a variable that was used in the source, such as {0}. During execution, the application will throw an exception when it tries to substitute the non-existing variable with a value.

There are two ways of dealing with this:

  • Full freedom to the translators. As a consequence, before the product can be shipped to the market, expensive tests have to be executed per language.
  • Restrict the translators to changing only strings that are in scope for the new feature set. This decision makes sense from a technical and project management point of view, but may degrade the linguistic quality of the final product. As a consequence, the final approver of the product may decide not to ship before the inconsistencies have been resolved. Developers can implement scripts to verify that translators change only in-scope labels.

A software localization translation environment could have prevented this kind of error. Technically, this prevention can be done by a quality assurance (QA) component. Such a feature would, for example, be able to compare the translated string against the source and detect a missing variable. Another method of prevention is more process-oriented, by letting the project manager mark the not in-scope strings as read-only. This will prevent the translator from touching them.
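The QA check mentioned above, comparing the translated string against the source to detect a missing variable, can be sketched as follows. It assumes numbered placeholders of the form {0}, {1} and so on; the function name and the sample strings are hypothetical.

```python
import re
from typing import List

# Matches numbered placeholders such as {0} or {12}. Format specifiers like
# {0:d} are out of scope for this sketch.
PLACEHOLDER = re.compile(r"\{\d+\}")


def missing_placeholders(source: str, target: str) -> List[str]:
    """Return the placeholders that occur in source but not in target."""
    in_source = set(PLACEHOLDER.findall(source))
    in_target = set(PLACEHOLDER.findall(target))
    return sorted(in_source - in_target)


print(missing_placeholders("Saved {0} of {1} files", "{0} bestanden opgeslagen"))
print(missing_placeholders("Open {0}", "{0} openen"))
```

Running such a check while the translator is typing catches the error before it reaches the build, instead of relying on an expensive test round per language.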

Suboptimal localization support is a high burden on the daily work of translators and the translation project manager. After the translator has finished the translation, it may take a couple of days until the engineers have processed the translations and generated the target materials. The translator reviews and adapts the materials, and then the cycle is repeated. As a result, both translators and engineers are continuously interrupted and cannot focus on one task. This is highly unproductive.

Each customer, or even a separate development group within a company, has different processes and architectures, each requiring a different translation environment for the translator. In many cases, the translation environment is more or less dictated by the development department. Translators need training for each environment, which is a hidden cost. This would not have been the case if the translators had a uniform translation environment that supports most architectures and processes. Switching tasks consumes a lot of time. In some cases, the translator has to perform a re-installation of in-house developed solutions when new materials become available.

After translations are ready, compiled into the application and tested by the functional testers, the strings can be visualized in context. Linguistic errors may be found that were not detected by the functional testers - for example, issues with bidirectional languages. In the worst case, this can have a major impact on design and increase time-to-market. It could even lead to a decision not to release the product for these languages. In general, a bug that is found late in the process results in significant costs.

An in-country specialist will review the product for the specific translation. These product specialists have a full agenda, and reviewing is certainly not their core activity. It is important that the review process be done in an efficient manner. Ideally, the specialist has the flexibility to determine the moments in time by himself or herself. Inefficiencies in this process result in a waste of time and money for the specialist. The specialist needs clear instructions on how to navigate quickly through the application and which screens to inspect.

Tools for software localization

A software localization tool such as SDL Passolo or CATALYST offers a translation environment that, at first impression, satisfies the high-level requirements. Resources can be visualized for most commonly used architectures, as was illustrated in Figure 1 with the Notepad example. It provides the user with an environment where the translation can be done in context - the user sees both source and target - and the tool will also validate whether strings fit. Besides that, a QA component is available to do a large number of quality checks. It is easy to navigate quickly between screens in the project, but these tools do not provide a scripting engine that will guide users through the screens.

Besides the lack of a navigation feature, the current software localization tools do not support the following:

  • Dynamic content: Localization tools cannot display strings in the context of a window that the code generates on the fly. The window resources that are loaded in the software localization tool do not refer to these strings, so the tool cannot show them in context.
  • Proprietary resources: Most common software architectures are supported, but what about older or proprietary architectures? Many organizations still have large amounts of resources that cannot be visualized in today’s out-of-the-box software localization tools.

The tool vendors cannot be blamed for this because it mainly depends on the way that the engineers created the resources. Fortunately, there are possible workarounds that provide great value, but this will require an additional investment. Business cases demonstrate a substantial return on investment.

Recommendations

Software localization costs are often of significant magnitude. Many of these costs are hidden in the development organization and are difficult to quantify.

The translation cost per word is easy to quantify and is typically where LSPs compete. QA and the quality of terminology are the areas in which agencies often try to distinguish themselves. I do not argue that these issues are unimportant. On the contrary, my question is: How can someone deliver good software translation quality when the inputs are not optimal? The only way to deliver quality in that situation is to expend an enormous effort on testing. You will literally need to test the quality into the product.

In most cases, the burden of testing falls on the customer's testing department, since it has to deliver all the required materials to the translators. Therefore, my thesis is that the real costs for software localization are mainly paid by the development departments, even without them knowing it.

There are ways to reduce these costs significantly. My most important recommendations are:

  • Involve the LSP or a localization project manager in an early stage of the development process. The LSP provides requirements from a linguistic perspective, which will have implications on the way that an application will be developed.
  • Use third-party software localization tools for their powerful linguistic features. Try to avoid in-house solutions.
  • Run a pilot in an early stage. Find issues and implement QA solutions.

There is a huge opportunity for LSPs to offer new value-adding services to their customers during the early phases of a project in order to improve development processes.

Independently as well as through his association with RIGI Technologies, Henk Boxma provides his client base with localization and internationalization expertise.