Language negotiation and localization of server-generated messages

Daniel Rall
Lately I've been thinking about language negotiation and localization of
server-generated messages.  Over time, there's been a good deal of
discussion around this topic on the dev list, and Erik Huelsmann was
good enough to capture the majority of it in a document named
"l10n-problems" (r13271).  Recently I started chatting with people on IRC
about this problem, reading list archives, etc., and have both added
more information to that document and laid out a set of requirements
and a possible strategy for server-side L10N.

http://svn.collab.net/repos/svn/trunk/notes/l10n-problems

I'd be interested in hearing feedback on the current "Translations on
the server" section of the document:

--- snip ---
On systems which define the LC_MESSAGES constant, setlocale() can be used
to set string translation for all (error) strings even those outside
the Subversion domain.

Windows doesn't define LC_MESSAGES.  Instead GNU gettext uses the
environment variables LANGUAGE, LC_ALL, LC_MESSAGES and LANG (in that
order) to find out what language to translate to.  If none of these are
defined, the
system and user default locales are queried.  Though setting one of
the aforementioned variables before starting the server will avoid
localization by Subversion to the default locale, messages generated
by the system itself are likely to still be in its default locale
(they are on Windows).

While systems which have the LC_MESSAGES flag (or setenv() - of which
Windows has neither) allow languages to be switched at run time, this cannot
be done portably.

Any attempt to use setlocale() in an Apache environment may conflict
with settings other modules expect to be set up (even when using a
prefork MPM).  On the svnserve side, having no portable way to change
languages dynamically means that the environment has to be set up
correctly from the start.  In addition, the svnserve protocol doesn't
yet support content negotiation.

In other words, there is no way -- programmatically -- to ensure that
messages are served in any specific language using a traditional
gettext implementation.  Current consensus is that gettext must be
replaced on the server side with a more flexible implementation.

Server requirement(s):
 - Language negotiation on a per-client session basis.
 - Avoid contamination of environment used by other code.

I18N requirement(s):
 - Cross-platform.
 - Interoperable with gettext() tools.

I18N nice-to-have(s):
 - gettext()-like API.

Possible implementation:
 - Based around a new gettext-like module with per-struct or
   per-thread locale mutator functions and storage for name/value
   pairs (a glorified apr_hash_t).  Nicolás Lichtmaier wrote something
   along these lines already
   <http://svn.haxx.se/dev/archive-2004-04/0788.shtml>.
 - Language used by httpd/mod_dav_svn derived from the Accept-Language
   HTTP header, and set up by mod_negotiation.
 - Language used by svnserve derived from additions to the protocol
   which allow for HTTP-style content negotiation on a per-session
   basis.

Investigation?: A brief canvassing of developers (on IRC) indicated that
no thorough investigation of existing solutions which might meet the
above requirements has been done.  This incomplete canvassing may not
paint an accurate picture, however.

Historical note: Original consensus indicated that messages from the
server side should stay untranslated for transmission to the client.
However, client side localization is not an option, because by then
the parameter values have been inserted into the string, meaning that
it can't be looked up in the messages catalogue anymore.  So either
localization must occur on the server, or messages must be marshalled
from the server as unlocalized/unformatted data structures and
localized on the client side using some additional wrapper APIs to
handle the unmarshalling and message formatting, which would
significantly increase complexity.
--- snip ---
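
To make the "Possible implementation" items above a little more
concrete, here is a minimal sketch in C of the per-session negotiation
step: choosing a catalog language from an Accept-Language value without
touching the process locale.  The function name pick_language() and its
parameters are hypothetical (they are not httpd, mod_negotiation, or
Subversion APIs).  The parser ignores the "*" wildcard, only looks at
the first parameter after each language tag, and assumes ASCII tags.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Parse an HTTP qvalue ("0", "0.8", "1.000") into thousandths (0..1000).
   Done by hand because strtod() depends on the process locale. */
static int
parse_qvalue(const char *s, const char *end)
{
  int q, scale = 100;

  if (s >= end || (*s != '0' && *s != '1'))
    return 0;                   /* malformed: treat as "not acceptable" */
  q = (*s++ - '0') * 1000;
  if (s < end && *s == '.')
    for (s++; s < end && scale > 0 && isdigit((unsigned char)*s); s++)
      {
        q += (*s - '0') * scale;
        scale /= 10;
      }
  return q > 1000 ? 1000 : q;
}

/* True if LANG equals the tag, or is its primary subtag ("en" vs "en-GB"). */
static int
tag_matches(const char *tag, size_t tag_len, const char *lang)
{
  size_t i, lang_len = strlen(lang);

  if (lang_len > tag_len)
    return 0;
  for (i = 0; i < lang_len; i++)
    if (tolower((unsigned char)tag[i]) != tolower((unsigned char)lang[i]))
      return 0;
  return lang_len == tag_len || tag[lang_len] == '-';
}

/* Hypothetical helper: return the entry of AVAILABLE preferred by the
   Accept-Language value, or FALLBACK if nothing acceptable matches. */
const char *
pick_language(const char *accept_language,
              const char *const *available, size_t n_available,
              const char *fallback)
{
  const char *best = fallback;
  int best_q = 0;
  const char *p = accept_language;

  assert(fallback != NULL);
  while (p != NULL && *p != '\0')
    {
      const char *end, *semi;
      size_t tag_len, i;
      int q = 1000;             /* a missing q parameter means q=1 */

      p += strspn(p, ", \t");   /* skip separators and whitespace */
      if (*p == '\0')
        break;
      end = p + strcspn(p, ",");        /* end of this list element */
      tag_len = strcspn(p, ";, \t");
      semi = memchr(p, ';', (size_t)(end - p));
      if (semi != NULL)
        {
          const char *s = semi + 1;
          s += strspn(s, " \t");
          if (end - s >= 2 && (s[0] == 'q' || s[0] == 'Q') && s[1] == '=')
            q = parse_qvalue(s + 2, end);
        }
      for (i = 0; i < n_available; i++)
        if (q > best_q && tag_matches(p, tag_len, available[i]))
          {
            best = available[i];
            best_q = q;
          }
      p = end;
    }
  return best;
}
```

With the available languages "en", "de" and "sv", a header value of
"da, en-gb;q=0.8, de;q=0.9" selects "de", and a value naming none of
them yields the supplied fallback.  The result could then be kept in
per-session state rather than in the environment.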


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Language negotiation and localization of server-generated messages (see URL!)

Daniel Rall
On Thu, 2005-06-02 at 00:26 -0700, Daniel L. Rall wrote:
...
> http://svn.collab.net/repos/svn/trunk/notes/l10n-problems
>
> I'd be interested in hearing feedback on the current "Translations on
> the server" section of the document:
...
[removed stale text]

Please see the above URL for the latest version of the text.




Re: Language negotiation and localization of server-generated messages (see URL!)

Nicolás Lichtmaier
Daniel L. Rall wrote:

>>http://svn.collab.net/repos/svn/trunk/notes/l10n-problems
>>
>>I'd be interested in hearing feedback on the current "Translations on
>>the server" section of the document:
>>    
>>
>...
>[removed stale text]
>
>Please see the above URL for the latest version of the text.
>  
>

In that document you wrote:

>Nicolás Lichtmaier wrote something along the lines of the module
>referenced in the "Possible implementation" section
><http://svn.haxx.se/dev/archive-2004-04/0788.shtml>, which has been
>committed to the server-l10n branch.  However, it depends upon the GNU
>gettext .mo format, and the GNU implementation may not be available on
>all platforms (unless re-implemented).  This module will need to be
>enhanced or replaced, ideally completely obviating the need for
>linkage against a platform's own gettext implementation.
>

You seem to imply that my code depends in some way on GNU gettext code.
That's not true. Of course there's a dependency on msgfmt for creating
the .mo files, but that's already the case in Subversion. Using an
accepted file format for message catalogs is not a downside, it's a plus!



Re: Language negotiation and localization of server-generated messages (see URL!)

Peter N. Lundblad
On Thu, 2 Jun 2005, Nicolás Lichtmaier wrote:

> Daniel L. Rall wrote:
>
> >Nicolás Lichtmaier wrote something along the lines of the module
> >referenced in the "Possible implementation" section
> ><http://svn.haxx.se/dev/archive-2004-04/0788.shtml>, which has been
> >committed to the server-l10n branch.  However, it depends upon the GNU
> >gettext .mo format, and the GNU implementation may not be available on
> >all platforms (unless re-implemented).  This module will need to be
> >enhanced or replaced, ideally completely obviating the need for
> >linkage against a platform's own gettext implementation.
> >
>
> You seem to imply that my code depends in some way on GNU gettext code.
> That's not true. Of course there's a dependency on msgfmt for creating
> the .mo files, but that's already the case in Subversion. Using an
> accepted file format for message catalogs is not a downside, it's a plus!
>
But is that file format the same in non-GNU gettext implementations? Else
we start depending on GNU gettext, and can't use the system's gettext
anymore.

Regards,
//Peter


Re: Language negotiation and localization of server-generated messages (see URL!)

Nicolás Lichtmaier

>>You seem to imply that my code depends in some way on GNU gettext code.
>>That's not true. Of course there's a dependency on msgfmt for creating
>>the .mo files, but that's already the case in Subversion. Using an
>>accepted file format for message catalogs is not a downside, it's a plus!
>>    
>>
>But is that file format the same in non-GNU gettext implementations? Else
>we start depending on GNU gettext, and can't use the system's gettext
>anymore.
>  
>

Oh, you are right. I missed that. The .mo format is gettext-specific.

But I see no other way. System implementations work with a per-process
locale. The only option I see other than forcing GNU's gettext
everywhere is to fork a child for each locale and delegate gettext
invocation to these children...



Re: Language negotiation and localization of server-generated messages (see URL!)

Peter N. Lundblad
On Thu, 2 Jun 2005, Nicolás Lichtmaier wrote:

>
> >But is that file format the same in non-GNU gettext implementations? Else
> >we start depending on GNU gettext, and can't use the system's gettext
> >anymore.
> >
> >
>
> But I see no other way. System implementations work with a per-process
> locale. The only option I see other than forcing GNU's gettext
> everywhere is to fork a child for each locale and delegate gettext
> invocation to these children...
>
Or invent our own format and write our own msgfmt (we could still use GNU
tools to validate if available). I'm not saying that this is ideal, but
it's another possibility.

I don't know how big a problem it is to depend on GNU gettext.  But it
seems fragile to depend on a proprietary .mo format.

Regards,
//Peter


Re: Language negotiation and localization of server-generated messages (see URL!)

Nicolás Lichtmaier

>>>But is that file format the same in non-GNU gettext implementations? Else
>>>we start depending on GNU gettext, and can't use the system's gettext
>>>anymore.
>>>      
>>>
>>But I see no other way. System implementations work with a per-process
>>locale. The only option I see other than forcing GNU's gettext
>>everywhere is to fork a child for each locale and delegate gettext
>>invocation to these children...
>>    
>>
>Or invent our own format and write our own msgfmt (we could still use GNU
>tools to validate if available). I'm not saying that this is ideal, but
>it's another possibility.
>  
>

Why not just include it, as many projects do, and ignore the system
gettexts? Gettext is LGPL, which is totally acceptable for Subversion.

>I don't know how big a problem it is to depend on GNU gettext.  But it
>seems fragile to depend on a proprietary .mo format.
>  
>

GNU's .mo format is not proprietary.



Re: Language negotiation and localization of server-generated messages (see URL!)

Michael Sweet
In reply to this post by Peter N. Lundblad
Peter N. Lundblad wrote:

> On Thu, 2 Jun 2005, Nicolás Lichtmaier wrote:
>
>
>>>But is that file format the same in non-GNU gettext implementations? Else
>>>we start depending on GNU gettext, and can't use the system's gettext
>>>anymore.
>>>
>>>
>>
>>But I see no other way. System implementations work with a per-process
>>locale. The only option I see other than forcing GNU's gettext
>>everywhere is to fork a child for each locale and delegate gettext
>>invocation to these children...
>>
>
> Or invent our own format and write our own msgfmt (we could still use GNU
> tools to validate if available). I'm not saying that this is ideal, but
> it's another possibility.
>
> I don't know how big a problem it is to depend on GNU gettext.  But it
> seems fragile to depend on a proprietary .mo format.

I've actually got some code that reads .po files and compiles to a
compact binary format, which you can then load and use on-the-fly,
even supporting multiple languages from the same thread.  The code
will be part of CUPS 1.2 and I've been using it in flPhoto and the
CUPS DDK for a couple years now.

Let me know if you are interested, I'll be happy to donate to the
cause!
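
Supporting several languages from one thread generally means passing
the catalog explicitly to each lookup instead of consulting the
process-wide locale.  The following is a minimal sketch of that idea in
C, not the CUPS code mentioned above: catalog_t, message_t,
catalog_gettext() and the sample German and Spanish strings are all
hypothetical, and the catalogs are small in-memory arrays kept sorted by
msgid so that bsearch() can be used.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One translated message: original string and its translation. */
typedef struct {
  const char *msgid;
  const char *msgstr;
} message_t;

/* A catalog for one language.  MESSAGES must be sorted by msgid. */
typedef struct {
  const char *language;
  const message_t *messages;
  size_t count;
} catalog_t;

static int
compare_msgid(const void *key, const void *elem)
{
  return strcmp((const char *)key, ((const message_t *)elem)->msgid);
}

/* gettext()-like lookup that takes the catalog as an argument, so no
   global or per-process locale state is read or modified.  Returns
   MSGID itself when there is no catalog or no translation. */
const char *
catalog_gettext(const catalog_t *catalog, const char *msgid)
{
  const message_t *hit;

  assert(msgid != NULL);
  if (catalog == NULL || catalog->count == 0)
    return msgid;
  hit = bsearch(msgid, catalog->messages, catalog->count,
                sizeof(message_t), compare_msgid);
  return hit != NULL ? hit->msgstr : msgid;
}

/* Illustrative sample data only. */
static const message_t de_messages[] = {
  { "File not found", "Datei nicht gefunden" },
  { "Permission denied", "Zugriff verweigert" },
};
static const message_t es_messages[] = {
  { "File not found", "Archivo no encontrado" },
  { "Permission denied", "Permiso denegado" },
};
const catalog_t catalog_de = { "de", de_messages, 2 };
const catalog_t catalog_es = { "es", es_messages, 2 };
```

A server could keep one such catalog pointer per client session and
look up the same msgid through different catalogs in the same thread,
for example catalog_gettext(&catalog_de, "File not found") followed by
catalog_gettext(&catalog_es, "File not found").  An unknown msgid, or a
NULL catalog, yields the untranslated string.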

--
______________________________________________________________________
Michael Sweet, Easy Software Products           mike at easysw dot com
Internet Printing and Document Software          http://www.easysw.com


Re: Language negotiation and localization of server-generated messages (see URL!)

Daniel Rall
In reply to this post by Peter N. Lundblad
On Thu, 2005-06-02 at 21:57 +0200, Peter N. Lundblad wrote:

>On Thu, 2 Jun 2005, Nicolás Lichtmaier wrote:
>
>> Daniel L. Rall wrote:
>>
>> >Nicolás Lichtmaier wrote something along the lines of the module
>> >referenced in the "Possible implementation" section
>> ><http://svn.haxx.se/dev/archive-2004-04/0788.shtml>, which has been
>> >committed to the server-l10n branch.  However, it depends upon the GNU
>> >gettext .mo format, and the GNU implementation may not be available on
>> >all platforms (unless re-implemented).  This module will need to be
>> >enhanced or replaced, ideally completely obviating the need for
>> >linkage against a platform's own gettext implementation.
>> >
>>
>> You seem to imply that my code depends in some way on GNU gettext code.
>> That's not true. Of course there's a dependency on msgfmt for creating
>> the .mo files, but that's already the case in Subversion. Using an
>> accepted file format for message catalogs is not a downside, it's a plus!
>>
>But is that file format the same in non-GNU gettext implementations? Else
>we start depending on GNU gettext, and can't use the system's gettext
>anymore.

Peter's clarification here is where the statement about a GNU gettext
dependency originally came from: a dependency on the package which
supplies msgfmt, rather than on the API.  Do you know whether the format
of the .mo files created by GNU msgfmt is "standard"?

Here's a somewhat relevant section from configure.in:

USE_NLS="no"
if test "$enable_nls" = "yes"; then
  dnl First, check to see if there is a working msgfmt.
  AC_PATH_PROG(MSGFMT, msgfmt, none)
  AC_PATH_PROG(MSGMERGE, msgmerge, none)
  AC_PATH_PROG(XGETTEXT, xgettext, none)
  if test "$MSGFMT" != "none"; then
    AC_SEARCH_LIBS(bindtextdomain, [intl], [],
                   [
                    AC_MSG_WARN([bindtextdomain() not found.  Disabling NLS.])
                    enable_nls="no"
                   ])
    if test "$enable_nls" = "yes"; then
      AC_DEFINE(ENABLE_NLS, 1,
                [Define to 1 if translation of program messages to the user's
                 native language is requested.])
      USE_NLS="yes"
    fi
  fi
fi




Re: Language negotiation and localization of server-generated messages (see URL!)

Daniel Rall
In reply to this post by Michael Sweet
On Thu, 2005-06-02 at 17:08 -0400, Michael Sweet wrote:
...
>I've actually got some code that reads .po files and compiles to a
>compact binary format, which you can then load and use on-the-fly,
>even supporting multiple languages from the same thread.

Does it use the same format as the .mo files, which are easily mmap'd
into shared memory?




Re: Language negotiation and localization of server-generated messages (see URL!)

Michael Sweet
Daniel Rall wrote:

> On Thu, 2005-06-02 at 17:08 -0400, Michael Sweet wrote:
> ...
>
>>I've actually got some code that reads .po files and compiles to a
>>compact binary format, which you can then load and use on-the-fly,
>>even supporting multiple languages from the same thread.
>
>
> Does it use the same format as the .mo files, which are easily mmap'd
> into shared memory?

Since the .mo format isn't documented (at least not that I've found,
anyways), my code uses its own binary format that is "compiled" from
the .po files.  It is conceivable that you could mmap() the files;
however, I've avoided any non-portable constructs and just read the
files using stdio functions.
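
For what it's worth, the GNU gettext manual does describe the .mo
layout (in a section titled "The Format of GNU MO Files"): a fixed
header of 32-bit words followed by two tables of (length, offset) pairs,
stored in the byte order of the machine that produced the file.  The
sketch below, in C, validates such a header from a byte buffer, which
could come from either mmap() or stdio reads.  The names mo_header_t
and mo_parse_header() are hypothetical, and only the header fields are
checked; this is not the GNU implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MO_MAGIC 0x950412deu    /* magic number from the gettext manual */
#define MO_HEADER_SIZE 28u      /* seven 32-bit words */

typedef struct {
  uint32_t revision;
  uint32_t n_strings;           /* number of strings (N) */
  uint32_t orig_table_offset;   /* offset of original-strings table (O) */
  uint32_t trans_table_offset;  /* offset of translations table (T) */
} mo_header_t;

/* Read a 32-bit word byte by byte, so the host's endianness is irrelevant. */
static uint32_t
read_u32(const unsigned char *p, int big_endian)
{
  return big_endian
    ? ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
      | ((uint32_t)p[2] << 8) | (uint32_t)p[3]
    : ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16)
      | ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}

/* Hypothetical helper: fill OUT from the first LEN bytes of a .mo image.
   Returns 0 on success, -1 if the buffer is too short, the magic number
   is wrong, or a string table would extend past the buffer. */
int
mo_parse_header(const unsigned char *data, size_t len, mo_header_t *out)
{
  int big_endian;
  uint64_t table_bytes;

  assert(data != NULL && out != NULL);
  if (len < MO_HEADER_SIZE)
    return -1;
  if (read_u32(data, 0) == MO_MAGIC)
    big_endian = 0;
  else if (read_u32(data, 1) == MO_MAGIC)
    big_endian = 1;
  else
    return -1;

  out->revision = read_u32(data + 4, big_endian);
  out->n_strings = read_u32(data + 8, big_endian);
  out->orig_table_offset = read_u32(data + 12, big_endian);
  out->trans_table_offset = read_u32(data + 16, big_endian);

  /* Each table holds n_strings entries of two 32-bit words. */
  table_bytes = 8u * (uint64_t)out->n_strings;
  if ((uint64_t)out->orig_table_offset + table_bytes > len
      || (uint64_t)out->trans_table_offset + table_bytes > len)
    return -1;
  return 0;
}
```

Because the function only indexes into a caller-supplied buffer, the
choice between mmap() and plain stdio reads stays with the caller.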

--
______________________________________________________________________
Michael Sweet, Easy Software Products           mike at easysw dot com
Internet Printing and Publishing Software        http://www.easysw.com
