Friday, March 30, 2007

Evolving translations & terminology - the open source way

By G Karunakar, Ravishankar Shrivastava

Translation has traditionally been associated with literary text, publicity material, official documents, books, news, general communication and, to some extent, technology. For the former, existing vocabulary was sufficient or could be extended to describe new objects and subjects, but it has been lacking for technology, which evolves at a fast pace. Traditional language resources have not had the vocabulary for the new inventions and discoveries made every day. After independence there were efforts to compile terminology dictionaries. Although they were comprehensive and covered many areas of modern science, technology and society, they fell short of wide acceptance. With the advent of computers and the IT revolution, a new vocabulary has come into common use, and until recently it remained largely untouched by the translation process.

Software interfaces have traditionally been English-based because they evolved primarily in English-speaking locales. With more advanced programming techniques and an increasingly global commercial market for software products, multilingual software interfaces have become more common, driven mainly by market demand or by government mandate (requiring that software sold in a specific state or country be available in the national language). The unavailability of software in local languages has created a kind of digital divide. An important aspect of this divide is that suitable terminology for new technologies has not evolved at the same pace as the technologies themselves. For Indian languages the problem has been more acute, since there was little early concerted effort to evolve terminology for software interfaces in the IT domain.

There have been independent efforts to evolve terminology, but none inclusive enough to set a standard. Those that exist have failed to win wide acceptance for various reasons. As a result, translating IT terminology into Indian languages has been left to market players, who have produced their own versions, often differing in how they translate and coin words for new terms. Examples are the Indian-language interfaces of proprietary products such as Windows XP and Lotus Notes, and the Indian-language tools sold by various commercial vendors.

New in this domain has been Free & Open Source Software (FOSS), whose inherent philosophy allows anyone interested to modify it. Since the source code is freely available (free as in allowing modification and redistribution), open source software has been translated into many of the world's languages, including many Indian languages. This paper outlines the lessons learned from open-source-based translation efforts.

The Free & Open Source model

The FOSS model functions differently from a traditional development model: it is more open in scope and operation than a closed-source one. Without going into technical details, the important perspective offered by the FOSS model is its collaborative way of working, its peer review and its philosophy of sharing.

The Internet acts as the backbone and glue that makes it all possible. In brief, the life cycle of a FOSS project looks like this:

  • The project begins with an individual's effort and motivation.
  • An initial version is released under an open license, with the source code available for anyone to try and use.
  • More people get involved, first as users and then as contributors, because they find it useful and interesting.
  • With more contributions and testing, the project matures.
  • In time it reaches a point where it can sustain itself with a significant number of contributors.
  • The open and inclusive model means anyone with the requisite skills can join and contribute in different ways, such as:
      • writing source code
      • testing
      • writing documentation
      • advocacy
      • translations

An important aspect of the FOSS model is that contributors come from varied geographies, cultural backgrounds and languages, so FOSS software naturally has multilingual capabilities by design. The localization process is also simple enough for anyone to contribute to translation. Our interest is in the last kind of contribution listed above: how translation, and the creation of lexical resources, can benefit from the FOSS model.

FOSS translation model

FOSS localization has a well-established working model based on message catalogs: all translatable strings of a piece of software are extracted into a catalog, much like a database, which is then translated into multiple languages, with one catalog per language.
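To make the catalog model concrete, here is a minimal Python sketch. It assumes plain dictionaries stand in for the real per-language catalogs (actual FOSS projects use GNU gettext, which loads compiled catalog files). The Hindi strings come from the comparison table later in this paper; the empty entry shows what happens to a string nobody has translated yet: as with gettext, the lookup falls back to the original English.

```python
# Minimal sketch of the message-catalog model (assumption: plain dicts stand
# in for the real per-language catalogs that gettext would load from disk).
CATALOGS = {
    "hi": {
        "File": "फ़ाइल",
        "Save": "सहेजें",
        "Open": "खोलें",
        "Print preview": "",  # present in the catalog but not yet translated
    },
}


def translate(message: str, lang: str) -> str:
    """Look up `message` in the catalog for `lang`.

    As with gettext, fall back to the original English string when the
    language has no catalog, or the entry is missing or untranslated.
    """
    catalog = CATALOGS.get(lang, {})
    return catalog.get(message) or message


if __name__ == "__main__":
    for msg in ("File", "Save", "Print preview"):
        print(f"{msg!r} -> {translate(msg, 'hi')!r}")
```

Falling back to English rather than showing an empty label is what lets a partially translated catalog ship early and improve over time.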

Typically there is a translation team for each language, which coordinates the translation effort for that language. The team keeps track of translatable catalogs as they come in and completes translations based on work already done. The available tools provide rough translations and track changes across catalogs.

Message catalogs

A message catalog, stored as a Portable Object (PO) file, contains pairs of strings: the original English string and its translation. An untranslated pair has an empty translation entry. Duplication is avoided by using a single string pair for all identical occurrences of a string.

Translations can be reused across files: if terms like "File", "Edit" and "Save as" occur in multiple catalogs, an existing translation can be reused rather than creating a new one, which keeps the translations consistent. Catalogs can be updated as strings and translations change. A PO file also contains a header with meta-information such as the last translator, the file creation date, the revision date and the language.

Processes

The translation process starts with the formation of a language team, typically driven by a loose group of motivated individuals. The team begins by translating essential parts of the software, typically libraries used by other application components. If it is a new language with no existing translation history and no standardized glossary, the team has to undertake the painstaking effort of building a standard glossary through a continuous process of review and learning.

The first version of the translations may not be the best, but it gives a sense of the effort needed and exposes the problems in evolving terminology. Successive revisions can take this experience into account and improve on it. Since the translations are publicly available, it is easy to get comments on terminology and feedback from users, which can then be incorporated into future work. This cycle of feedback, learning and innovation ensures that once the effort reaches a certain maturity (say, after 1-2 years of work or a large volume of completed translations), the terminology converges on a standard and the quality becomes consistent.

Tools

The tools make all the difference: they allow collaborative work and the reuse of existing work. Broadly, the tools can be classified as:

  • Offline: the translator works on their own computer, not connected to the Internet, and updates are local.
  • Online: multiple translators work on a common pool of strings over the Internet or a local network.
  • Communication tools: email, instant messaging, forums and mailing lists, used for discussing translation issues and for distributing and coordinating work among team members.

Kbabel

It is among the most popular and powerful tools used for translation. It supports managing multiple catalogs. A simple, fast interface lets the translator add translations for new strings, with support for rough translations based on existing ones. It can build a database of existing translations to offer suggestions for untranslated entries, which the translator can choose to use. It also checks translations for common technical errors and can integrate with a spell checker. While the tool itself is used by individuals, duplicated work is avoided and consistency is achieved by coordinating the distribution of work and sharing a translation memory (also called a PO compendium) within the translation team.
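The translation-memory idea can be sketched as follows. This is an illustration under stated assumptions, not how Kbabel works internally: catalogs are plain `{msgid: msgstr}` dicts, the three application catalogs are hypothetical, and Python's `difflib.get_close_matches` stands in for the tool's own fuzzy matching. Exact matches are reused directly, while near matches are marked fuzzy so that a translator reviews them, mirroring the rough-translation step described above.

```python
import difflib
from typing import Dict, List, Tuple


def build_compendium(catalogs: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge several catalogs into one shared translation memory."""
    memory: Dict[str, str] = {}
    for catalog in catalogs:
        for msgid, msgstr in catalog.items():
            # keep only real translations; the first one seen wins
            if msgid and msgstr and msgid not in memory:
                memory[msgid] = msgstr
    return memory


def rough_translate(
    catalog: Dict[str, str], memory: Dict[str, str], cutoff: float = 0.8
) -> Dict[str, Tuple[str, bool]]:
    """Suggest translations for empty entries as {msgid: (text, is_fuzzy)}."""
    suggestions: Dict[str, Tuple[str, bool]] = {}
    known = list(memory)
    for msgid, msgstr in catalog.items():
        if not msgid or msgstr:
            continue  # skip the header and entries already translated
        if msgid in memory:
            suggestions[msgid] = (memory[msgid], False)  # exact match
        else:
            close = difflib.get_close_matches(msgid, known, n=1, cutoff=cutoff)
            if close:
                suggestions[msgid] = (memory[close[0]], True)  # fuzzy match
    return suggestions


# Hypothetical catalogs from two applications already translated into Hindi
# (terms taken from the IndLinux column of the comparison table).
EDITOR_HI = {"File": "फ़ाइल", "Save as": "ऐसे सहेजें"}
VIEWER_HI = {"Open": "खोलें", "File": "फ़ाइल"}

# A new, untranslated catalog from a third (hypothetical) application.
NEW_APP_HI = {"File": "", "Save As": "", "Zoom": ""}

if __name__ == "__main__":
    memory = build_compendium([EDITOR_HI, VIEWER_HI])
    for msgid, (text, fuzzy) in rough_translate(NEW_APP_HI, memory).items():
        print(msgid, "->", text, "(fuzzy, needs review)" if fuzzy else "")
```

"Zoom" gets no suggestion because nothing similar exists in the memory yet; it is left for a human translator. For real PO files, GNU gettext's `msgcat` can merge catalogs into a compendium and `msgmerge --compendium` can apply one.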

Entrans

An online translation tool: message catalogs are uploaded to a server, where they form a common pool of strings while still being grouped by their source. It provides the following:

  • Registration for translators and validators.
  • Anonymous visitors can browse strings, suggest translations and vote for suggestions.
  • Rough translations are generated automatically from existing translations and offered as machine suggestions.
  • New translations added by translators are included as suggestions.
  • Translators with the validator role can choose among multiple suggestions and approve a translation.
  • When translating or reviewing a string, the interface shows the available options; one can be selected and approved, or a different one provided.
  • A quick search for looking up translations of common words and phrases.

This tool is very useful when there are many interested contributors who can make small contributions regularly in their free time. It also avoids any complex setup: all that is needed is a browser and an input method for typing in one's language, both of which are now available on all platforms. A live installation is available at http://www.indlinux.org/entrans/
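A rough model of this workflow (suggestions from machines, translators and visitors, voting, and approval restricted to validators) might look like the sketch below. The class and method names are hypothetical and do not reflect Entrans's actual code; the Hindi candidates for "Bookmark" are taken from the comparison table.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Suggestion:
    text: str
    author: str  # e.g. "machine" for an automatic rough translation
    votes: int = 0


@dataclass
class StringEntry:
    """One source string in the shared pool, with its candidate translations."""
    msgid: str
    suggestions: List[Suggestion] = field(default_factory=list)
    approved: Optional[str] = None

    def suggest(self, text: str, author: str) -> Suggestion:
        """Add a suggestion; an identical one is reused, not duplicated."""
        for s in self.suggestions:
            if s.text == text:
                return s
        s = Suggestion(text, author)
        self.suggestions.append(s)
        return s

    def vote(self, text: str) -> None:
        """Record one vote (anyone, including anonymous visitors, may vote)."""
        for s in self.suggestions:
            if s.text == text:
                s.votes += 1
                return
        raise KeyError(f"no suggestion {text!r} for {self.msgid!r}")

    def ranked(self) -> List[Suggestion]:
        """Suggestions ordered by votes, most popular first (ties keep order)."""
        return sorted(self.suggestions, key=lambda s: s.votes, reverse=True)

    def approve(self, text: str, roles: Set[str]) -> None:
        """Only a validator may approve, choosing a suggestion or new text."""
        if "validator" not in roles:
            raise PermissionError("only validators can approve translations")
        self.approved = text


if __name__ == "__main__":
    entry = StringEntry("Bookmark")
    entry.suggest("पसंद", "machine")  # rough translation from existing work
    entry.suggest("पुस्तचिह्न", "anonymous")  # suggestion from a site visitor
    entry.vote("पुस्तचिह्न")
    entry.vote("पुस्तचिह्न")
    entry.vote("पसंद")
    best = entry.ranked()[0]
    entry.approve(best.text, roles={"translator", "validator"})
    print(entry.msgid, "->", entry.approved)
```

Votes only rank the candidates; the final say stays with a validator, which is how a large pool of casual contributors can coexist with consistent terminology.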

Pootle

Pootle is another online translation management tool, allowing multiple teams to work on a common pool of strings drawn from different open source projects. While it does not offer rough translation, it provides a dictionary lookup of translated terms. It also has many facilities and tools for integrating with software projects, details of which are beyond the scope of this paper. It is very easy to set up and can be used standalone or in a networked environment. It is best used for translation marathons, where a large number of people come together and translate a large amount of work in a short period. A live setup is available at http://pootle.wordforge.org/
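A dictionary lookup of this kind can be approximated by scanning a source string for known glossary terms and showing their agreed translations next to it. This is a hypothetical sketch, not Pootle's implementation; the glossary entries are picked from the IndLinux column of the comparison table, and matching is a simple case-insensitive whole-word search.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical team glossary (English term -> agreed Hindi term), with
# entries taken from the IndLinux column of the comparison table.
GLOSSARY: Dict[str, str] = {
    "file": "फ़ाइल",
    "print": "छापें",
    "print preview": "छपाई पूर्वावलोकन",
    "save": "सहेजें",
}


def lookup_terms(source: str, glossary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (term, translation) for each glossary term found in `source`.

    Matching is case-insensitive and on whole words only, so "save" does not
    match "saving"; longer terms are listed first so a phrase such as
    "print preview" appears ahead of its component word "print".
    """
    text = source.lower()
    hits: List[Tuple[str, str]] = []
    for term in sorted(glossary, key=len, reverse=True):
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            hits.append((term, glossary[term]))
    return hits


if __name__ == "__main__":
    for term, hindi in lookup_terms("Show print preview before saving the file", GLOSSARY):
        print(f"{term}: {hindi}")
```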

Case study: IndLinux experience of translations

Taking IndLinux Hindi as an example, here is a quick recap of Linux localization. Around 2002, GNOME had shown initial support for Indic Unicode characters, typically Hindi. This led to the dream of a full-fledged Linux OS in Indic languages. Things were assembled from ground zero, with bits and pieces put together to work. From fonts to rendering engines, bugs were everywhere. Work went on at a steady pace for at least two years without any visible progress. IndLinux concentrated mostly on Hindi, and later other teams started on their own, taking inputs and help from IndLinux; Ankur Bangla, PunLinux Panjabi and Utkarsh Gujarati were such teams. Since translations were done by volunteers from all across India, translated terms varied with local usage. This led to the need for translation workshops to weed out translation errors, out-of-context translations, inconsistencies and so on.

Hindi translation got a shot in the arm when Sarai sponsored an initial project to translate GNOME 2.0 into Hindi, and later KDE 3.2 and the OpenOffice 2.0 Help. Sarai also pioneered Hindi translation workshops to refine translations and train translators. A Hindi glossary was also developed as a standard for common technical terms. The workshops were an immense help in deciding what to translate and what not to, and in coining new words. Today's refined Hindi Linux desktop might not have been possible without Sarai's sponsorship.

From this experience, two things were clearly needed to get results:

1. Sponsor translators and translation teams, at least those who can deliver.

2. Organize workshops to translate and refine existing translations en masse.

Without these, progress may not be up to the mark. In the early days of localization, translators did not have many options for translation tools. They had to use plain old Yudit, the Unicode editor. Every string had to be translated and typed at every occurrence, however many times it was repeated. This made translation very slow, repetitive and dull. Things changed dramatically when translation tools such as Gtranslator, Kbabel and Poedit came into the picture. As the work grew, translation databases and machine-added auto-translations kept both productivity and consistency at a high level.

Soon, under IndLinux, a multilingual live Linux CD, Rangoli, was released, which clearly showed the potential of local-language computing. Later, online translation tools such as Rosetta and Entrans were also made available, but they failed to attract many Indic translators, perhaps because of their tedious, time-consuming and slow interfaces. Nevertheless, online translation tools like Entrans are the need of the day and will ultimately find their way; in the near future most translations will likely be done through such tools.

A comparison of terminology efforts

Below is a small compilation of terminology evolved by three independent efforts. Due to space limitations only a small set is shown, which may not fully demonstrate the benefits of the open source process: since the language is the same, translations can eventually converge to similar or identical terms through a constant process of review.

The comparison is between the user interface terminology of the Hindi version of Windows XP, the IT terms glossary provided by CSTT-CDAC, and the translations done by the IndLinux Hindi translation effort. Note: entries marked with a hyphen (-) indicate that no translation was listed, that it could not easily be searched for, or that the term did not occur in the interface reviewed.

Columns: English term, Windows XP Hindi, CSTT, IndLinux

Accessories सहायक उपकरण सहयंत्र सहायक उपकरण

Active सक्रिय सक्रिय सक्रिय

Add जोड़ें योजी संकलन जोड़ें

Address bar पता पट्टी - पता पट्टी

Address book पता पुस्तिका पता पुस्तक पता पुस्तिका

Align संरेखित संरेखण कतारबद्ध,पंक्तिबद्ध

Appearance प्रकटन - रूप, प्रकटन

Apply लागू करें - लागू करें

Apply लागू - प्रयोग करें

Arrange व्यवस्थित - जमाएँ

Automatic स्वचलित स्वचालित स्वचलित

Back वापस पश्च वापस, पिछला

Background पृष्ठ भूमि पृष्ठ भूमि पृष्ठभूमि

Bar पट्टी स्तम्भ, रेखिका पट्टी

Bookmark पसंद पृष्ठित स्मृति पुस्तचिह्न, पसंद

Cancel रद्द करें निरसन रद्द

Centred केंद्रित - -

Change बदलें - बदलें

Clear साफ करे रिक्त साफ करें

Close बंद बन्द बंद करें

Comments टिप्पणियां टिप्पणी टिप्पणियाँ

Control panel नियंत्रण कक्ष नियंत्रण पट्टिका नियंत्रण कक्ष

Copy प्रतिलिपि प्रतिलिपि प्रतिलिपि, नकल

Customize अनुकूलित करें - फ़रमाइशें

Default डिफ़ॉल्ट डिफ़ॉल्ट, तयशुदा, सुनिश्चित

Details विवरण विस्तार, विवरण विवरण

Display प्रकटन प्रदर्श प्रदर्शक, प्रदर्शन

Document दस्तावेज़ प्रलेख दस्तावेज़

Edit संपादन संपादन सम्पादन

Empty रिक्त रिक्त खाली, रिक्त

Existing मौजूदा विद्यमान मौजूदा

Exit बाहर निकलें निर्गम बाहर, निर्गम, निकास

Expand विस्तारित विस्तारित फैलाएँ, विस्तार

Favourites पसंद - पसंदीदा पसंदीदा

Find ढूंढे अन्वेषण ढूंढें

File फ़ाइल संचिका फ़ाइल

Format bar स्वरूप पट्टी संरूप -

Forward अग्रेषित करें अग्र आगे बढाएँ, अग्रेषित

Go to इस पर जाएं - यहां पर जाएँ, जाएँ

Help मदद सहायता मदद

Home मुख आमुख, निजी घर, आशियाना, शुरुआत

Home page मुख पृष्ठ आमुख पृष्ठ, मूल स्थान मुख पृष्ठ

Icons चिह्न - प्रतीक, चिह्न

Invert selection चयन पलटें - चयन पलटें, चुने हुए को उलटें

Keyboard कुंजीपटल कुंजीपटल, कीबोर्ड कीबोर्ड, कुंजीपट

List सूची सूची सूची

Lock अवरोधित पाश, लॉक ताला

Notification सूचनाएं - अधिसूचना

Ok ठीक - ठीक

Open खोलें खुला खोलें,खोलना,खुला,मुक्त

Option विकल्प विकल्प विकल्प

Page setup पृष्ठ सेटअप पृष्ठ पृष्ठ सेटअप, पृष्ठ विन्यास

Paragraph अनुच्छेद पैराग्राफ अनुच्छेद, पैरागाफ

Paste special - - विशेष चिपकायें

Print मुद्रण - छापें,छपाई,मुद्रण

Print preview मुद्रण पूर्वावलोकन मुद्रण छपाई नमूना, छपाई पूर्वावलोकन

Property गुण - गुण

Re do दोहराएं - दोहराएँ, पुनः करें, फिर से करें

Refresh ताज़ा करें पुनश्चर्या ताज़ा करें

Registered पंजीकृत पंजीकृत पंजीकृत

Remove हटाएं - मिटाएँ

Rename नाम बदलें पुन:नामकरण पुनर्नामकरण,नाम बदलें,नया नाम

Replace बदलें - बदलें

Restart पुनरारंभ - फिर प्रारंभ करें

Restore पुनर्स्थापित करें पुन:स्थापन बहाल करें, पुरानी स्थिति में लाएं,

Ruler मापनी - -

Save सहेजें सुरक्षित करो सहेजें

Save as इस रूप में सहेजें - ऐसे सहेजें

Search खोजें खोज ढूंढें, खोजें

Security सुरक्षा सुरक्षा सुरक्षा

Select all सभी का चयन करें - सभी चुनें, सबको चुने

Send to भेजें - -

Share साझा शेयर साझा, साझेदारी

Shared साझा - साझा, साझेदारी

Standard मानक मानक प्रमाणिक,मानक,प्रामाणिक

Start प्रारंभ प्रारंभ प्रारंभ,शुरू,चालू

Start menu प्रारंभ मेनू - स्टार्ट मेन्यू

Status bar स्थिति पट्टी अवस्थिति स्थिति पट्टी

Stop रूकें विराम रोकें, रुकें, बन्द

Taskbar कार्य पट्टी - कार्यपट्टी

Text wrap पाठ लपेटे - पाठ लपेटें

Tip युक्ति - युक्ति, संकेत

Tools उपकरण उपकरण उपकरण

Un do पूर्ववत करें - पहले जैसा

Untitled अनामांकित - बेनाम,अनाम

Up ऊपर - ऊपर

Up one level एक स्तर ऊपर - एक स्तर ऊपर

Urgent अत्यावश्यक - अत्यावश्यक, तत्काल

User उपयोगकर्ता उपयोगकर्ता, प्रयोक्ता उपयोक्ता, प्रयोक्ता

View दृष्य दृश्य देखें, दिखाएँ, दर्शन, नजारा

Welcome सुस्वागतम - सुस्वागतम, स्वागतम्

Window विंडो विंडो विंडो

Zip संपीडित संकुचित संपीडित

Zoom जूम जूम जूम, छोटा-बडा करें

Conclusion

While this discussion may not conclusively prove that the open source model of translation is better than a closed one, the model and the tools on offer give flexibility for innovation and for improving how translations, lexicons and similar resources are built. A case in point is wikis and content management systems, which allow anyone to post and review content, subject to moderation. Examples are Wikipedia, which has quickly become the largest online encyclopedia, and Wiktionary.org, which is built on a wiki and is evolving a multilingual lexicon and dictionary. Open source tools like Entrans and Pootle can be customized to build dictionaries or lexicons developed collaboratively, drawing in a large pool of contributors. A closed-door model may deliver good quality by having experts do the work, but an open model benefits from volume and from giving the users of the final product a say from the beginning, so the resulting lexicon gains wider acceptance and is more in tune with prevalent usage and vocabulary. It may not satisfy the purist, but ultimately a language that is not open to accepting and accommodating new vocabulary may well be on its path to oblivion.
