Open-source Software and Localization

An introduction to OSS and
its impact on the language industry


FRANK BERGMANN


Open-source software (OSS) is already part of mainstream information technology. Most medium-sized and large companies in the world already use it in one way or another. Apart from being cheaper, OSS is considered to be more secure and more flexible than its commercial counterparts. Corporate customers love the independence from a particular software vendor and the ability to customize the software to the company's needs, which makes it difficult for closed-source providers to compete with OSS.

However, OSS has only recently become a candidate for "the next big thing" in the IT industry: the driver of a major wave of change that might radically alter market forces, comparable only to the introduction of the PC or the internet. This time, though, the revolution is not so much about technology as about the business models of IT companies. This article explores some of these potential changes and how they might affect localization customers and providers.

The Rise of Open-source Software

Before discussing the impact of OSS on the software localization process, we need to understand how OSS went from its roots to its conquest of the corporate world. OSS was "born" in the 1960s and 1970s in the university and research environment. Researchers started to use computer programs for their activities and, working in a non-competitive environment, began to share the resulting programs among themselves just as they shared their research findings. These groups of collaborating software developers are today known as "open-source developer communities."

These early open-source developers wanted to make sure, however, that they were recognized as the authors of the code, in a manner similar to the scientific system of citing research publications. So the GNU General Public License (GPL) emerged, applying the scientific citation rule to the domain of intellectual property rights. The GPL stipulates that everybody can use, modify and redistribute "GPLed" software, provided that the initial authorship information is maintained. However, modifications and additions to GPLed software must themselves be distributed under the GPL, creating what is known today as a "viral effect." The GPL "infects" other code that is combined with it, so that the body of OSS grows and grows.


References & Resources
"A Brief History of Free/Open Source Software Movement": http://www.openknowledge.org/writing/open-source/scb/brief-open-source-history.html

European Commission IDABC Open Source Observatory: http://europa.eu.int/idabc/en/chapter/452

GNU gettext Utilities: http://www.gnu.org/software/gettext/manual/html_mono/gettext.html

GNU General Public License: http://www.gnu.org/copyleft/gpl.html

"Governments Mull Open Source": http://www.businessempowered.com/issues/2004/03/en/dept_shortcuts.shtml#opensource

KBabel: http://i18n.kde.org/tools/kbabel

Linux Online: http://www.linux.org

"Open, closed: Novell's 'mixed source' software": http://star-techcentral.com/tech/story.asp?file=/2004/9/10/technology/8872977&sec=technology

OpenOffice.org: http://www.openoffice.org

The L10N-Framework of OpenOffice.org: http://l10n.openoffice.org/L10N_Framework/index.html

OpenOffice Localization Pilot Process: http://l10n.openoffice.org/localization/L10n_pilotprocess.html

International Institute of Infonomics FLOSS Final Report: http://www.infonomics.nl/FLOSS/report

Project/Open: http://www.project-open.com

OSS Leaves the Academic Niche

A major breakthrough for OSS came with the advent of the dot.com boom. The internet was initially developed in research institutions, and most of it is based on OSS. The first industrial-strength versions of Linux also appeared during this time, creating an ideal environment for young entrepreneurs. So it is no surprise that many startups during the dot.com boom used the readily available OSS as a base for their businesses. Google, eBay, Yahoo! and Amazon are all still using this infrastructure.

Another breakthrough came with the need of these first OSS companies to support and maintain their software. They started to outsource these services to other companies, effectively creating a market for the first Linux distribution companies such as Red Hat and SuSE. The business model of these companies is based on selling professional services around the free OSS product.

The support work of these companies contributed to the quality of the OSS, lifting it to the same level of quality as its closed-source competitors. And the availability of professional services made OSS an attractive choice for companies of all sizes that had to slash costs after the dot.com bust.

Finally, another important wave of change is just starting: OSS-based companies have begun to offer "mixed-source" software, extending OSS with proprietary functionality. These companies use OSS merely as a base, while providing the same service level to their customers as their closed-source competitors. As a result, the marketing muscle of these companies now pushes OSS. The most famous examples in this field are IBM and Novell with their Linux strategy and Sun Microsystems with its StarOffice/OpenOffice and Java Desktop products.

The "Pure OSS" Localization Market

But what is the localization market that is created by these new players going to look like? To answer this question we are going to differentiate between "pure" and "mixed-source" OSS.

Looking at the localization needs of "pure" open-source developer communities, we may find that these communities are not very attractive customers because they do not earn any revenues from their software products. Instead, they have to rely on volunteers from within the open-source community in the same way as they rely on volunteers for software development. The quality of these translations is, in general, not as high as in closed-source software. However, this situation actually stimulates unhappy users to participate in the open-source project and to contribute an improved translation.

There are, however, some notable exceptions to this system, namely when OSS customers are willing to pay for professional localization. This is particularly the case in the public sector, where government agencies around the world seem to favor OSS over proprietary software. Bodies in the European Union are facilitating these efforts, so we may expect increasing standardization in the products being employed and a need for professional localization.

The Mixed-source Localization Market

The situation is more promising in the realm of mixed-source companies that somehow combine OSS with proprietary software in order to deliver a professional product to the market. These companies need to provide high-quality localizations and have a budget and an organization in place to provide this service. For instance, Melissa Biggs from the globalization engineering group at Sun Microsystems said in a telephone interview that the "localization processes for OpenOffice are basically the same as for other Sun products."

Mixed-source companies can also rely, however, on the localization volunteers from the open-source community, depending on quality and completeness requirements and the available budget. The Sun globalization engineering group, for instance, has started a "Pilot Process" to "improve communication" between the Sun globalization group and the open-source community.

OSS Localization Technology

We now turn to the technical resources and skills that a localization company needs in order to enter the OSS localization market. I will illustrate the localization architectures of three very different OSS applications: Linux is an operating system, OpenOffice is a desktop application similar to Microsoft Office, and Project/Open is a web-based application.

The three systems also differ in how their localization is organized. Linux is "pure" OSS and is localized by community volunteers. OpenOffice splits localization management by language: Sun manages ten languages and the open-source community handles the rest. Project/Open splits localization by application module.

Common to all three systems is that their localization processes are considerably different from the ones used for standard Windows applications. Every system comes with its own set of localization tools and philosophy, requiring a considerable learning effort from a potential localization provider.

Linux Localization

Linux is probably the best-known open-source product. Linux servers represent 15.6% of the 2003 overall server market with growth rates of 40% annually, according to IDC. Linux is currently localized into some 73 languages.

The Linux localization software architecture is based on the GNU "gettext" tool suite, together with a range of gettext-compatible translation tools such as KBabel, poEdit, GTranslator and others. Gettext makes it possible to identify translatable strings in the Linux source code and to extract them into a format suitable for KBabel and the other localization tools. This localization architecture is shared by the majority of open-source projects, forming the de facto standard in open-source localization.
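The extraction step described above can be illustrated with a minimal Python sketch. It is not the gettext implementation: the regular expression, the function names and the sample source are assumptions made for illustration, and only the general idea (strings marked with `_()` are collected into `msgid`/`msgstr` entries of a PO catalog) reflects how gettext-style tools work.

```python
from __future__ import annotations

import re

# Toy stand-in for an extraction tool (assumption: strings are marked as _("...")).
# Matches a double-quoted literal, allowing backslash escapes inside it.
MARKED_STRING = re.compile(r'_\(\s*"((?:[^"\\]|\\.)*)"\s*\)')


def extract_msgids(source: str) -> list[str]:
    """Return marked strings in order of first appearance, without duplicates."""
    seen = {}
    for match in MARKED_STRING.finditer(source):
        seen.setdefault(match.group(1), None)
    return list(seen)


def to_po(msgids: list[str]) -> str:
    """Render msgids as PO entries with empty translations for the translator."""
    entries = [f'msgid "{msgid}"\nmsgstr ""\n' for msgid in msgids]
    return "\n".join(entries)


# Hypothetical C-like source with three marked strings, one of them repeated.
SAMPLE_SOURCE = '''
printf(_("File not found"));
printf(_("Disk full"));
puts(_("File not found"));
'''

if __name__ == "__main__":
    print(to_po(extract_msgids(SAMPLE_SOURCE)))
```

A translator's tool such as KBabel then fills in the empty `msgstr` lines. The sketch lists each string only once, so a repeated message is translated a single time.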

KBabel main translation screen


KBabel Catalog Screen allows keeping up with translation in large projects


KBabel directory for basic terminology maintenance

The quality requirements for the Linux operating system and server software in general are low because most Linux users are system administrators with a good command of English. Also, users of open-source software typically don't expect a very high level of translation quality and completeness.

The localization "market" of gettext is organized as groups of volunteers from the target language countries. Most of these volunteers are university students who are using the software for their own purposes.

OpenOffice Localization

OpenOffice is an open-source office suite similar to Microsoft Office, including applications for word processing, spreadsheets, presentations and drawings. OpenOffice has been localized into 25 languages and has been downloaded by more than 16 million users. OpenOffice is an open-source variant of Sun Microsystems' StarOffice product and is localized under the organizational umbrella of Sun.

The OpenOffice localization architecture is similar to the GNU gettext architecture explained earlier. A specific localization tool called "localize.pl" is used to extract translatable strings from the source code. This list can be converted into the gettext format suitable for KBabel or into a format suitable for Trados and other translation memories.
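The conversion step can be sketched as follows. The article does not describe the layout of the extracted string list, so the three-column tab-separated input used here (source file, string identifier, English text) is purely hypothetical, as are the function names. Only the PO output conventions (`#.` and `#:` comment lines, `msgid`, `msgstr`) are standard gettext.

```python
import csv
import io


def po_escape(text):
    """Escape backslashes, double quotes and newlines for a PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rows_to_po(tsv_text):
    """Convert a hypothetical 'file<TAB>id<TAB>text' listing into PO entries."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    entries = []
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        source_file, string_id, english = row
        entries.append(
            f"#. {string_id}\n"
            f"#: {source_file}\n"
            f'msgid "{po_escape(english)}"\n'
            'msgstr ""\n'
        )
    return "\n".join(entries)


# Invented sample rows; real extracted lists will look different.
SAMPLE_ROWS = (
    "menu.src\tMENU_FILE_SAVE\tSave\n"
    'dialog.src\tDLG_CONFIRM\tDelete "%1"?\n'
)

if __name__ == "__main__":
    print(rows_to_po(SAMPLE_ROWS))
```

Keeping the string identifier and source file as comments lets a translator see where each message comes from without changing the text to be translated.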

The localization quality requirements for OpenOffice depend on each language. OpenOffice inherits the professional localization of the ten languages under the responsibility of Sun's G11N Engineering Group (French, Italian, German, Spanish, Swedish, Brazilian Portuguese, Japanese, Korean, Simplified Chinese and Traditional Chinese). Several open-source groups consisting of volunteers handle the translation of the remaining languages.

OpenOffice is currently developing a "Localization Pilot Process" to involve the open-source community in the localization process, probably with the goal of cutting costs. This process will reduce the need for professional localization outsourcing, if successful.

Project/Translation Localization

Project/Translation is a web-based project management and workflow system specifically designed for translation and localization companies. Project/Translation is "mixed source" software because most of its modules are open source, while a company provides professional services and extension modules.



Being a typical web-based application, Project/Translation can rely on a relational database to store its localization strings. This organization allows Project/Translation to provide several localization tools via a web interface. In particular, it supports a "translation mode" (see screenshots) that allows for online translations within the application context, similar to the CATALYST and PASSOLO resource editors.
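A database-backed string store of the kind described above might look like the following minimal sketch. The table layout, column names, locale codes and function names are assumptions for illustration and are not taken from the Project/Translation source. SQLite is used only to keep the example self-contained.

```python
import sqlite3


def create_catalog():
    """Create an in-memory table holding one message per (locale, key)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        " locale TEXT NOT NULL,"
        " message_key TEXT NOT NULL,"
        " message TEXT NOT NULL,"
        " PRIMARY KEY (locale, message_key))"
    )
    return conn


def set_message(conn, locale, key, message):
    """Insert or overwrite a translation, as an online editor might do."""
    conn.execute(
        "INSERT OR REPLACE INTO messages (locale, message_key, message)"
        " VALUES (?, ?, ?)",
        (locale, key, message),
    )


def lookup(conn, locale, key, default_locale="en_US"):
    """Return the message for locale, falling back to the default locale,
    and finally to the key itself if no translation exists."""
    for candidate in (locale, default_locale):
        row = conn.execute(
            "SELECT message FROM messages"
            " WHERE locale = ? AND message_key = ?",
            (candidate, key),
        ).fetchone()
        if row is not None:
            return row[0]
    return key


if __name__ == "__main__":
    conn = create_catalog()
    set_message(conn, "en_US", "project.name", "Project name")
    set_message(conn, "de_DE", "project.name", "Projektname")
    print(lookup(conn, "de_DE", "project.name"))
```

Because translations are ordinary table rows, a web page can offer an edit field next to each string and save the change with a call like `set_message`, which is the idea behind translating within the application context.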

The quality requirements for such a mixed-source web application are in line with industry standards.

Members of the open-source community are currently carrying out most of the translation work of the open-source modules. The localization of the closed-source modules is outsourced to professional translators.

Conclusion

Open-source software localization is probably not an interesting mainstream localization market yet, and pure OSS will probably never be. However, the overall share of OSS is growing fast and mixed-source localization will become an interesting market in the near future.

Companies that are determined to enter this market will need considerable in-house technology resources. Getting involved in a particular OSS project may be a good start to investigating the new terrain.


Frank Bergmann is a localization consultant and founder of Project/Open. He can be reached at frank.bergmann@project-open.com


This article reprinted from #70 Volume 16 Issue 2 of
MultiLingual Computing & Technology published by MultiLingual Computing, Inc., 319 North First Ave., Sandpoint, Idaho, USA, 208-263-8178, Fax: 208-263-6310.