Add multilingual support #5

Open
azerupi opened this issue Jul 29, 2015 · 63 comments · May be fixed by #1201 or #1306
Labels
A-Internal-representation Area: Internal Representation A-Localization Area: Localization, language support, etc. C-enhancement Category: Enhancement or feature request M-Discussion Meta: Discussion S-Wishlist Status: Wishlist

Comments

@azerupi
Contributor

azerupi commented Jul 29, 2015

Add support for multiple languages.

@FuGangqiang
Contributor

multiple languages for document?

@azerupi
Contributor Author

azerupi commented Aug 12, 2015

Yes, I think Gitbook does support something like that.

Instead of having the markdown files directly in the source folder you would have some sub folders like this:

src/
├── de
├── en
└── fr

And there would be an easy way to change the language in the rendered book.

It's definitely something I would like to add, but it's not the highest priority at the moment.

@azerupi
Contributor Author

azerupi commented Jan 12, 2016

Multiple designs possible:

  • One SUMMARY.md to rule them all

    pros:

    • Changes in structure are reflected in all languages immediately
    • 1 to 1 mapping from pages from one language to another, would allow changing the language of the page directly from a menu button

    cons:

    • If one language is lagging behind it's going to get ugly
  • One SUMMARY.md for every language

    pros:

    • Every language can have its own pace

    cons:

    • Does not push all languages to be up to date / coherent
    • No 1 to 1 mapping guarantee and thus not possible to toggle the language from a page without having the risk that the page does not exist in the other language

@mkpankov

I don't think one SUMMARY.md for everything is a good idea. I consider consistency within translated version more important than consistency with original. Otherwise, we can easily start having broken links because upstream renamed some chapter and translation didn't, yet. I believe a book that has no broken links is the minimum standard.

Also, I don't support the idea of "pushing" to be up-to-date. AFAIK, translations (not only ours) are done by enthusiasts and it's not always possible to keep up at all times.

Moreover, 1 to 1 mapping of pages doesn't look straightforward to me, even if there is a single SUMMARY.md. Words have different lengths in different languages, and in the Russian translation we consistently have sentences that are noticeably longer than the original. But I'd love to have it so that one click can show the same point in the text in the original language.

I think this can be handled by tracking 1-to-1 mapping of paragraphs - sections aka markdown files are too big. Paragraphs also seem a good candidate because sentences get paraphrased and reordered sometimes, but the paragraphs stay in same order and have same gist.
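The paragraph-level mapping suggested here could be sketched roughly as follows. This is only an illustration under the assumption that paragraphs are separated by blank lines and keep the same order in every translation; `split_paragraphs` and `pair_paragraphs` are hypothetical helper names, not part of mdBook.

```python
import re


def split_paragraphs(markdown):
    """Split Markdown text into paragraphs separated by blank lines."""
    blocks = re.split(r"\n\s*\n", markdown)
    return [block.strip() for block in blocks if block.strip()]


def pair_paragraphs(original, translated):
    """Pair paragraphs by position; refuse to guess if the counts differ."""
    src = split_paragraphs(original)
    dst = split_paragraphs(translated)
    if len(src) != len(dst):
        raise ValueError(f"paragraph count differs: {len(src)} vs {len(dst)}")
    return [(index, s, d) for index, (s, d) in enumerate(zip(src, dst))]


english = "Ownership is central.\n\nBorrowing comes next."
french = "La possession est centrale.\n\nL'emprunt vient ensuite."
pairs = pair_paragraphs(english, french)
```

The paragraph index could then serve as an anchor when jumping between languages. A translation that merges or splits paragraphs breaks the pairing, which is why the sketch raises an error instead of guessing.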

@azerupi
Contributor Author

azerupi commented Jan 13, 2016

Thanks for the input! I really appreciate the feedback :)

Otherwise, we can easily start having broken links because upstream renamed some chapter and translation didn't, yet. I believe a book that has no broken links is the minimum standard.

Moreover, 1 to 1 mapping of pages doesn't look straightforward to me

When I am talking about 1 to 1 mapping I am talking about page to page mapping, not sentence to sentence (that would be insane 😉).

Let's take a hypothetical situation with the Rust book. Let's say I am reading a blog post and it references some chapter in the Rust book, for example the chapter about ownership. But English is not my main language and it would be a lot easier to understand the chapter in my native language. If we have 1 to 1 mapping on page / chapter level the user could then select his language (if it is supported) from a dropdown menu and he would land on the exact same page in his chosen language.

However for this to work correctly we need a guarantee that every page in one language has an equivalent page in the other language. If you allow a different SUMMARY.md per language there is no way to know what pages are equivalent if any equivalent page even exists at all.

Also, I don't support the idea of "pushing" to be up-to-date. AFAIK, translations (not only ours) are done by enthusiasts and it's not always possible to keep up at all times.

Of course, I totally agree with you. But the SUMMARY.md is only about structure, so what order the chapters come in, not the content.

If there is one SUMMARY.md for all languages I think it will only cause trouble if:

  1. New chapters get added, as the equivalent chapters in other languages will just be blank until they are translated
  2. The markdown files get renamed. This should not happen often, and when it does it is not difficult to rename the files accordingly for every language
  3. The chapters are reordered in a way that breaks the continuity of the content. This too should not happen often, but it's more challenging to fix as it requires the translators to translate the text that changed

To be honest, once a book has its definitive structure the SUMMARY.md is not likely to change often unless there is a major rewrite being done.

I think both designs have advantages and drawbacks, we need to figure out which one we want / need the most.


Idea for Rust book workflow when translations are in tree

When / if translations are moved into the official repository we could create a more elaborate pull request process. This is only an idea, it may be flawed 😉

When a pull request is made that contains changes that need translation (i.e. more than typo fixes) we could wait to merge the pull request until translations have been made for all officially supported languages.

The pull request could track what translations have been made using a check list like this:

  • Russian
  • French
  • German

Once all the translations are ready the pull request is merged in.
Officially supported languages could be languages with a minimum number of "official" maintainers.

This would add a little / lot of overhead for the English version but it would solve the two big issues with translations.

  1. Translations would always be up to date!
  2. This is probably the easiest way to track changes

There may be organizational problems I haven't considered though. @steveklabnik

@steveklabnik
Member

The biggest problem with blocking English changes to non-English changes is that I am paid for my work, but others are not. This places a big burden on them; I'm gonna want to land changes ASAP, and that's not fair to people who can't do this as a day job.

@azerupi
Contributor Author

azerupi commented Jan 14, 2016

That's true, didn't think of that.
It could still be applied without blocking the English changes? Just for tracking. Not sure if it's worth the overhead though.

Anyways, do you have a preference for any of the two design choices (one vs. multiple SUMMARY.md)?

@steveklabnik
Member

I think I prefer a single for the reasons you've stated, but since I'm not doing the translations themselves, I don't think my opinions matter much :)

And yeah, tracking might be different/better than actually blocking on them landing.

@mkpankov

When I am talking about 1 to 1 mapping I am talking about page to page mapping, not sentence to sentence (that would be insane 😉).

Ok, I think what I was trying to say but couldn't get across is this: page-to-page mapping isn't enough for printed versions, as same pages will have different content. And if by page you meant a web page, that is not enough either. Some sections (pages) are tens of screens long, and to provide smooth transition from one version to another we should track smaller units than entire files (web pages).

I originally thought you were talking about printed pages and wrote the following, but I'm not sure now. For printed versions, depending on the length of the section and the sentence-length difference with the original, this can vary from "I see not the beginning of the paragraph that talks about Foo feature, but the end" to "I don't see the paragraph that talks about Foo feature on screen at all", when linked to "page 83 of PDF".

So let's clarify the terms before continuing as apparently I misunderstood something 😄

@azerupi
Contributor Author

azerupi commented Jan 14, 2016

Ok yes, I will try to do my best to explain what I envision:

So in this issue I am not at all talking about tracking any changes for translations, only about how to support multiple languages in the same folder / book.

Before I continue, let's explain what the SUMMARY.md does exactly.

When you render the book (mdbook build) it is going to search for the SUMMARY.md and parse it. The SUMMARY dictates

  • The order of the chapters.
  • The names of the chapters.
  • The markdown file corresponding to each chapter.

That is the "only" information we get from the SUMMARY.md
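To make those three pieces of information concrete, here is a minimal sketch of extracting them from a flat SUMMARY.md. It is not mdBook's actual parser: it assumes one `- [title](path)` item per line and ignores nested chapters; `parse_summary` is a hypothetical name.

```python
import re

# Matches list items such as "- [hello world](hello-world.md)".
ITEM = re.compile(r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<path>[^)]+)\)\s*$")


def parse_summary(text):
    """Return (title, path) pairs in the order they appear in the summary."""
    chapters = []
    for line in text.splitlines():
        match = ITEM.match(line)
        if match:
            chapters.append((match["title"], match["path"]))
    return chapters


SUMMARY = """# Summary

- [hello world](hello-world.md)
- [second chapter](second-chapter.md)
"""

chapters = parse_summary(SUMMARY)
```

The position in the returned list gives the order, the first element of each pair the chapter name, and the second the markdown file.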

If we want to support multiple languages for one book, there are two possible designs (that I thought of):

  • One SUMMARY.md at the root of the source directory that will be used for all languages.
  • SUMMARY.md for every language

Let's look at both in more detail.

One SUMMARY.md for all languages

Consider this SUMMARY.md for a book:

# Summary

- [hello world](hello-world.md)
- [second chapter](second-chapter.md)

and this directory structure:

├── book
└── src
    ├── en
    │   ├── hello-world.md
    │   └── second-chapter.md
    ├── fr
    │   ├── hello-world.md
    │   └── second-chapter.md
    ├── ru
    │   ├── hello-world.md
    │   └── second-chapter.md
    └── SUMMARY.md

As you can see here, every language has the same markdown files defined in the global SUMMARY.md. This means that the "hello world" chapter has a corresponding page in every language! (1 to 1 mapping)
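A rough sketch of how one global SUMMARY.md could fan out into per-language files, assuming the `src/<lang>/<chapter>` layout shown in the tree. The function name `chapter_files` and its parameters are made up for illustration.

```python
from pathlib import PurePosixPath


def chapter_files(chapter_paths, languages, src="src"):
    """Map each language to the files the shared SUMMARY.md expects it to have."""
    return {
        lang: [str(PurePosixPath(src) / lang / path) for path in chapter_paths]
        for lang in languages
    }


layout = chapter_files(
    ["hello-world.md", "second-chapter.md"],
    ["en", "fr", "ru"],
)
```

Because every language is derived from the same chapter list, the lists have the same length and order, which is exactly the 1 to 1 mapping.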

Advantages

Having a guarantee that every chapter in one language has a corresponding chapter in another language gives us the possibility to change the language from any chapter and land on that same chapter in the other language.

Example: I am reading the "borrowing" chapter of the Rust book. I want to see that same chapter in French. I just select "French" from the dropdown button in the menu-bar and I will land on the French version of the chapter.
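With that guarantee, switching language is little more than swapping a path prefix. The sketch below assumes rendered pages live under `<lang>/<chapter>.html`, which is an assumption about the output layout and not something decided in this issue; `switch_language` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def switch_language(page_url, target_lang, languages):
    """Rewrite 'en/borrowing.html' to the same chapter in another language."""
    parts = PurePosixPath(page_url).parts
    if not parts or parts[0] not in languages:
        raise ValueError(f"{page_url!r} does not start with a known language")
    if target_lang not in languages:
        raise ValueError(f"unsupported language: {target_lang!r}")
    return str(PurePosixPath(target_lang, *parts[1:]))


url = switch_language("en/borrowing.html", "fr", {"en", "fr", "ru"})
```

No lookup table is needed here because the shared SUMMARY.md already guarantees that the target page exists.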

Drawbacks

When the SUMMARY.md is modified it can cause some consistency problems in the translations because changes in the SUMMARY.md will be reflected immediately in all languages. However, changes in the SUMMARY.md should be relatively rare once the book has found its "final" structure.

Problems that could occur:

  • Chapter is moved: When a chapter is moved (the order of the chapters is rearranged) it could cause problems with text flow.
  • markdown file is renamed: When a markdown file is renamed it should be renamed in all languages and in all the references to it. This should not be too big of a problem.
  • New chapter is added: When a new chapter is added it will appear blank in the other languages until it's translated.
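The "new chapter appears blank" case is easy to detect mechanically. As a sketch, assuming the `src/<lang>/` layout from above, a check like this could list the chapters each language still lacks; `missing_chapters` is a hypothetical helper and the temporary directory only stands in for a real book.

```python
import tempfile
from pathlib import Path


def missing_chapters(src, languages, chapter_paths):
    """Return {language: [chapters listed in SUMMARY.md that have no file yet]}."""
    report = {}
    for lang in languages:
        missing = [p for p in chapter_paths if not (src / lang / p).is_file()]
        if missing:
            report[lang] = missing
    return report


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp)
    # Build a tiny book where French lacks the second chapter.
    for rel in ["en/hello-world.md", "en/second-chapter.md", "fr/hello-world.md"]:
        target = src / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("# placeholder\n", encoding="utf-8")
    report = missing_chapters(src, ["en", "fr"], ["hello-world.md", "second-chapter.md"])
```

A build step could print such a report as a warning, or use it to render a "not yet translated" placeholder instead of a blank page.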

Content is not modified by the SUMMARY.md so any of the designs here is not going to cause any trouble with the content if the SUMMARY.md is modified.

Another drawback is that I am not sure yet how translators would provide translated chapter titles for the sidebar, since the titles come from the shared SUMMARY.md. Maybe just take the first heading from the corresponding markdown file?
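The "first heading" idea could look something like this. It is a sketch only: it handles ATX headings (`# Title`), falls back to the title from SUMMARY.md when a file has no heading, and would be fooled by a `#` comment inside a code block. `first_heading` is a hypothetical name.

```python
import re

# An ATX heading: 1-6 '#' characters, whitespace, then the title text.
HEADING = re.compile(r"^#{1,6}\s+(.+?)\s*#*\s*$")


def first_heading(markdown, fallback):
    """Return the text of the first heading, or the fallback title."""
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            return match.group(1)
    return fallback


french_page = "# Bonjour le monde\n\nCeci est le premier chapitre.\n"
title = first_heading(french_page, "hello world")
untranslated = first_heading("", "hello world")
```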

One SUMMARY.md for EVERY language

Let's consider this directory structure:

├── book
└── src
    ├── en
    │   ├── hello-world.md
    │   ├── second-chapter.md
    │   └── SUMMARY.md
    ├── fr
    │   ├── hello-world.md
    │   └── SUMMARY.md
    └── ru
        ├── hello-world.md
        ├── second-chapter.md
        └── SUMMARY.md

As you can see here, every language has its own SUMMARY.md and thus can define the order of its chapters and the markdown files as it wishes.

There is absolutely no more guarantee that the French version contains the same chapters as the English version. No 1 to 1 mapping. Essentially every language is its own separate book, they could have exactly the same structure or they could have totally different chapters. There is no way for the program to know that.

It is thus impossible to change the language from a chapter. You would have to navigate to the French version manually and search for the chapter you were reading, if it exists in the French version at all!

Advantages

Translations have a lot more freedom, but this can also be seen as a drawback. Translations do not need to have the same structure, so when the SUMMARY.md is changed in the English version, absolutely nothing is going to change in the other languages. Every change in the translations has to be done manually.

Drawbacks

There is no guarantee that a chapter in one language has an equivalent in another language. (No 1 to 1 mapping.) The program can not know what chapters are equivalent in the different languages and it would thus be impossible to change the language from a chapter to land on the same chapter in the other language.


I hope this made it more clear, if there is still something you don't understand I can elaborate more on some specific area. 😉

EDIT: A little quote from a response I made on Rust's internals forum:

And to be honest, if you have different TOCs you essentially have different books. There is little gain to support that, other than being able to group all the translations in one directory and build them in one go.

You can already group the multiple translations in one directory as different books, each with its own SUMMARY.md and book.json, and if you configure the source and destination directories correctly there should be minimal trouble integrating with automatic deployment scripts etc.

@defuz

defuz commented Jan 14, 2016

There is no guarantee that a chapter in one language has an equivalent in another language.

Regarding the Rust Book translation process, this is not a disadvantage of one particular solution, but simply a fact. I think that other projects that use mdBook with multiple languages will have the same problem.

The program can not know what chapters are equivalent in the different languages and it would thus be impossible to change the language from a chapter to land on the same chapter in the other language.

Can we make it simple and assume that the files with the same name in different languages are the same chapter? Then we can give the opportunity to switch to another language. I think this approach will satisfy both cases:

  1. When there is complete consistency between all languages.
  2. When consistency between languages is not complete.
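The "same file name means same chapter" rule can be sketched in a few lines. The input here is a plain mapping from language to the set of file names found in its directory; the names `chapter_availability` and `files` are illustrative only.

```python
def chapter_availability(files_by_language):
    """Map each file name to the sorted list of languages that contain it."""
    availability = {}
    for lang, files in files_by_language.items():
        for name in files:
            availability.setdefault(name, []).append(lang)
    return {name: sorted(langs) for name, langs in sorted(availability.items())}


files = {
    "en": {"hello-world.md", "second-chapter.md"},
    "fr": {"hello-world.md"},
    "ru": {"hello-world.md", "second-chapter.md"},
}
availability = chapter_availability(files)
```

When all languages are consistent, every chapter lists every language (case 1). When they are not, the shorter lists show exactly where a switch is impossible (case 2).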

@defuz

defuz commented Jan 14, 2016

Also, I don't like the idea that when I read the book in Russian, I'll see TOC in English. I think we should not assume that the reader is familiar enough with the language of original to understand the chapter titles.

@azerupi
Contributor Author

azerupi commented Jan 14, 2016

When consistency between languages is not complete.

How would you handle that? On some pages you can change the language and on others not? That would be really confusing for users I think.

Also, I don't like the idea that when I read the book in Russian, I'll see TOC in English.

Of course that was not the plan, I just hadn't found a good solution for it yet so I didn't discuss it too much

@defuz

defuz commented Jan 14, 2016

How would you handle that? On some pages you can change the language and on others not? That would be really confusing for users I think.

Why not? We can clearly indicate that the translation for this chapter is not available yet. Another possible situation is that translation for some languages is available, but for other languages it's not.
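As a sketch of what "clearly indicate" might mean for the language menu, the function below returns one entry per other language, with `None` as the target when the chapter has no translation yet. The `<lang>/<file>` link shape and the name `language_menu` are assumptions made for the example.

```python
def language_menu(chapter, current, files_by_language):
    """Return (language, link or None) for every language except the current one."""
    menu = []
    for lang in sorted(files_by_language):
        if lang == current:
            continue
        if chapter in files_by_language[lang]:
            menu.append((lang, f"{lang}/{chapter}"))
        else:
            # No translation yet: the renderer can grey this entry out.
            menu.append((lang, None))
    return menu


files = {
    "en": {"hello-world.md", "second-chapter.md"},
    "fr": {"hello-world.md"},
    "ru": {"hello-world.md", "second-chapter.md"},
}
menu = language_menu("second-chapter.md", "en", files)
```

A theme could render the `None` entries as disabled items labelled "translation not available yet", so the menu stays the same on every page.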

@defuz

defuz commented Jan 14, 2016

Another example that I care about.

Let's compare the structure of the section "Getting started" in the nightly and stable books. As you can see, Steve joined 4 chapters into one. Imagine that not all the language versions supported this change yet. If we have common TOC, this means that there is no possibility to open "Installing Rust", "Hello World" and "Hello Cargo" chapters in non-English version of book, because they do not exist in the original TOC anymore.

@azerupi
Contributor Author

azerupi commented Jan 14, 2016

Yes, I totally agree with you! This would be a big problem. However, I am not sure I want to settle for the solution Gitbook proposes either. Maybe we can come up with something better that combines all the advantages and none of the drawbacks? (even if it's a little more complex)

Gitbook uses the "one SUMMARY.md per language" method and to be honest I don't think it is real multilingual support. They essentially have one book per language, with no cross-linking between the different languages except on a landing page...

I think you could already achieve something very similar with mdBook by using multiple books and configuring the source and output directories according to what you want. The only difference is that Gitbook makes it just a little bit easier to set up.

@defuz

defuz commented Jan 14, 2016

My suggestion is to have "one SUMMARY.md per language", but support page-to-page cross-linking between the different languages. The easiest way to do this is to consider that the files with the same name are the same chapters. In 99% this should work. A more complex way to do this is to add some kind of identifier to each file (something like UUID). If the identifiers of the files are identical, we can cross-link them.
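The identifier variant could be sketched like this. The `<!-- id: ... -->` marker is purely hypothetical (no such convention exists in mdBook); any per-file identifier would work the same way. Files are passed in as `{language: {file name: content}}` to keep the example self-contained.

```python
import re

# Hypothetical marker placed somewhere in each chapter file.
ID_MARKER = re.compile(r"<!--\s*id:\s*([\w-]+)\s*-->")


def chapter_id(markdown):
    """Return the identifier declared in a chapter, or None if there is none."""
    match = ID_MARKER.search(markdown)
    return match.group(1) if match else None


def cross_links(books):
    """Return {identifier: {language: file name}} for all marked chapters."""
    links = {}
    for lang, files in books.items():
        for name, content in files.items():
            ident = chapter_id(content)
            if ident is not None:
                links.setdefault(ident, {})[lang] = name
    return links


books = {
    "en": {"hello-world.md": "<!-- id: ch-hello -->\n# Hello World\n"},
    "fr": {"bonjour.md": "<!-- id: ch-hello -->\n# Bonjour le monde\n"},
}
links = cross_links(books)
```

Unlike the file-name rule, this keeps working when a translation renames its files, at the cost of maintaining the identifiers by hand.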

@azerupi
Contributor Author

azerupi commented Jan 14, 2016

Hmm yes, that might be a good compromise. At least if the translations don't diverge too much from the original. I will try to think about this a little more and see if I can come up with other ideas.

Thanks for the valuable input! :)

@mdinger
Contributor

mdinger commented Jan 1, 2017

FWIW, there are tools to handle translations which I didn't see mentioned here yet. For example, crowdin is used (or was when I was involved) over at freecad for document translation of their wiki. It was noteworthy that when an update was made to an English file, the plugin would notify you that the other translations needed to be updated for that specific section or they would be out of date. The page linked above actually lists how complete each language translation is and maintains that information.

It is possible a tool like crowdin could just be added to the build process as a plugin which has been notified of which files require translating. Then it will maintain the database itself somewhere and you could tell mdbook where the translated files are located.

A solution like this seems worth the time exploring before spending effort creating a new ground up approach to solve the same problem.


EDIT: Also note they offer free support to open source projects

mgeisler added a commit to google/comprehensive-rust that referenced this issue Jan 10, 2023
This implements a translation pipeline using the industry-standard
Gettext[1] system.

I picked Gettext for the reasons described in [2] and [3]:

* It’s widely used in open source software. This means that there are
  graphical editors which will help you in editing the `.po` files. An
  example is Poedit[4], which is available for all major platforms.

  There are also many online systems for doing translations. An
  example is Pontoon[5], which is used for the Rust website itself. We
  can consider setting up such an instance ourselves.

* It is a light-weight yet structured format. This means that nothing
  changes with regards to how you update the original English text. We
  can still accept fixes and PRs like normal.

  The structure means that translators can see exactly which part of
  the course they need to update after a change. This is completely
  lost if you simply copy over the original text and translate it
  in-place in the Markdown files.

The code here only adds support for translations. They are not yet
tested, published or used for anything. Next steps will be:

* Add support for switching languages via a bit of JavaScript on each
  page.

* Update the speaker notes feature to support translations (right now
  “Speaker Notes” is hard-coded into the generated HTML). I think we
  should turn it into a mdbook preprocessor instead.

* Add testing: We should test that the `.po` files are well-formed. We
  should also run `mdbook test` on each language since the
  translations can alter the embedded code.

Fixes #115.

[1]: https://www.gnu.org/software/gettext/manual/html_node/index.html
[2]: rust-lang/mdBook#1864
[3]: rust-lang/mdBook#5 (comment)
[4]: https://poedit.net/
[5]: https://pontoon.rust-lang.org/
mgeisler added a commit to google/comprehensive-rust that referenced this issue Jan 10, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this issue Jan 11, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this issue Jan 17, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this issue Jan 18, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this issue Jan 18, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this issue Jan 18, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this issue Jan 18, 2023
This implements a translation pipeline using the industry-standard
Gettext[1] system.

I picked Gettext for the reasons described in [2] and [3]:

* It’s widely used in open source software. This means that there are
  graphical editors which will help you in editing the `.po` files. An
  example is Poedit[4], which is available for all major platforms.

  There are also many online systems for doing translations. An
  example is Pontoon[5], which is used for the Rust website itself. We
  can consider setting up such an instance ourselves.

* It is a light-weight yet structured format. This means that nothing
  changes with regards to how you update the original English text. We
  can still accept fixes and PRs like normal.

  The structure means that translators can see exactly which part of
  the course they need to update after a change. This is completely
  lost if you simply copy over the original text and translate it
  in-place in the Markdown files.

The code here only adds support for translations. They are not yet
tested, published or used for anything. Next steps will be:

* Add support for switching languages via a bit of JavaScript on each
  page.

* Update the speaker notes feature to support translations (right now
  “Speaker Notes” is hard-coded into the generated HTML). I think we
  should turn it into a mdbook preprocessor instead.

* Add testing: We should test that the `.po` files are well-formed. We
  should also run `mdbook test` on each language since the
  translations can alter the embedded code.

Fixes #115.

[1]: https://www.gnu.org/software/gettext/manual/html_node/index.html
[2]: rust-lang/mdBook#1864
[3]:
rust-lang/mdBook#5 (comment)
[4]: https://poedit.net/
[5]: https://pontoon.rust-lang.org/
@Apkawa Apkawa mentioned this issue Mar 22, 2023
9 tasks
@mgeisler
Contributor

mgeisler commented Apr 4, 2023

Hi all, I've published the plugins for a Gettext i18n translation workflow as a separate crate! You can install it with the usual

cargo install mdbook-i18n-helpers

Please see https://crates.io/crates/mdbook-i18n-helpers and let me know what you think in https://github.com/google/mdbook-i18n-helpers.

We've been using this infrastructure for 4 months now in the Comprehensive Rust 🦀 project. People have translated the course into Korean and Brazilian Portuguese and we have a few more languages in the pipeline.

What I like about this approach is that it's a very classic approach — Gettext is more than 30 years old now and there are a lot of tools out there which can help translators wrangle the .po files it uses.
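For readers unfamiliar with the format, here is a minimal sketch of what a `.po` catalog looks like and how its entries map source strings to translations. The sample entries and the `parse_po` helper are illustrative assumptions, not part of mdbook-i18n-helpers. The parser covers only a small subset of the format (no plural forms, contexts, or fuzzy flags), so a real project would more likely rely on an established Gettext library or editor.

```python
import ast


def parse_po(text: str) -> dict:
    """Return {msgid: msgstr} for a simplified subset of the .po format."""
    catalog = {}
    fields = {"msgid": [], "msgstr": []}
    current = None

    def flush() -> None:
        msgid = "".join(fields["msgid"])
        msgstr = "".join(fields["msgstr"])
        # Skip the header entry (empty msgid) and untranslated entries.
        if msgid and msgstr:
            catalog[msgid] = msgstr
        fields["msgid"].clear()
        fields["msgstr"].clear()

    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no message text
        if line.startswith("msgid "):
            flush()  # a new msgid starts a new entry
            current = "msgid"
            line = line[len("msgid "):].strip()
        elif line.startswith("msgstr "):
            current = "msgstr"
            line = line[len("msgstr "):].strip()
        if current and line.startswith('"'):
            # PO strings use C-style escapes; literal_eval is a close
            # approximation that is good enough for this sketch.
            fields[current].append(ast.literal_eval(line))
    flush()
    return catalog


# Hypothetical catalog: a header entry, a short message, and a wrapped one.
SAMPLE = r'''
msgid ""
msgstr ""
"Language: da\n"

#: src/welcome.md:1
msgid "Welcome"
msgstr "Velkommen"

msgid ""
"Hello "
"world"
msgstr ""
"Hej "
"verden"
'''

catalog = parse_po(SAMPLE)
```

Because each translation is keyed by its source text, a changed English paragraph shows up as a new, untranslated entry, and that is what lets translators see which parts need attention.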

@wc7086

wc7086 commented May 14, 2023

> Hi all, I've published the plugins for a Gettext i18n translation workflow as a separate crate! You can install it with the usual
>
> cargo install mdbook-i18n-helpers
>
> Please see https://crates.io/crates/mdbook-i18n-helpers and let me know what you think in https://github.com/google/mdbook-i18n-helpers.

This is indeed a good idea, but unfortunately, every time you switch languages, some styles need to be reloaded, and maybe mdbook needs to make some changes for this to get a better experience.

@mgeisler
Contributor

> This is indeed a good idea, but unfortunately, every time you switch languages, some styles need to be reloaded,

Are you talking about how the different languages are completely independent books (with their own assets such as stylesheets, images, etc)? I agree that it's a bit unfortunate.

> and maybe mdbook needs to make some changes for this to get a better experience.

Yes, it could certainly be made easier! One pain point right now is that I need to copy the index.hbs file to be able to add a language picker to it.

NoahDragon pushed a commit to wnghl/comprehensive-rust that referenced this issue Jul 19, 2023
@mgeisler
Contributor

Hi again, just wanted to let people here know that I've released version 0.2 of mdbook-i18n-helpers. This version changes how the text is extracted: paragraphs are now unwrapped, headings are stripped of #, and tables are translated on a cell-by-cell basis.

A normalization tool is included to help you convert old translation files to the new format — we have ~18 translations now for Comprehensive Rust, so it's important for us to have a migration path for those files.

I would be very interested in feedback if you try it out! Thanks 🙂
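To make the extraction changes described above concrete, here is a rough sketch of what unwrapping paragraphs, stripping heading markers, and splitting table rows into cells could look like. The function names `normalize_message` and `split_table_row` are hypothetical, and this is not the crate's actual implementation; escaped pipes and other Markdown edge cases are deliberately ignored.

```python
import re


def normalize_message(block: str) -> str:
    """Unwrap a hard-wrapped block and strip a leading heading marker."""
    # Join hard-wrapped lines into one line separated by single spaces.
    text = " ".join(line.strip() for line in block.strip().splitlines())
    # Drop a leading ATX heading marker such as "## ".
    return re.sub(r"^#{1,6}\s+", "", text)


def split_table_row(row: str) -> list:
    """Split one Markdown table row into its cell texts."""
    # Assumes cells contain no escaped "|" characters.
    return [cell.strip() for cell in row.strip().strip("|").split("|")]


paragraph = normalize_message("This is a\nhard-wrapped paragraph.\n")
heading = normalize_message("## Welcome to the course")
cells = split_table_row("| Name | Type |")
```

Extracting messages this way means re-wrapping a paragraph or changing a heading level in the source no longer invalidates the existing translation.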

@simonsan

As I haven't really made a lot of progress on this front besides setting up the template, I guess it makes more sense to reinitialize the whole .po/.pot files with the new version. I'll do that when I can spend time on it. 👍🏽 Thanks for keeping us up-to-date! <3

@PhuNH

PhuNH commented Sep 5, 2023

I know you might want to use Rust here, but at KDE we have written and used a Python program to do i18n with gettext for our Hugo websites for some years. Recently I separated the Markdown stuff from the Hugo-specific stuff, so if you want to do i18n and l10n for individual Markdown files, markdown-gettext might be helpful for you. It is compliant with CommonMark and supports all core Markdown elements, as well as YAML front matter, tables, and definition lists. Support here means that only text is processed (internationalized/localized): all formatting characters (at block level) are ignored during i18n, but the file structure will be the same after l10n.

I understand this package might not be a 100% fit for mdBook; however, writing an extension for the lib behind it is not difficult. I hope that by using the package, you won't have to recreate the processing of common Markdown elements and can focus on the differences.

@mgeisler
Contributor

mgeisler commented Sep 5, 2023

Hi @PhuNH,

> It is compliant with CommonMark and supports all core Markdown elements, as well as YAML front matter, tables, and definition lists. Support here means that only text is processed (internationalized/localized): all formatting characters (at block level) are ignored during i18n, but the file structure will be the same after l10n.

That sounds nice and it sounds similar to the processing done by mdbook-i18n-helpers. An mdbook preprocessor can be written in any language; the manual has a Python example. It's probably very easy to create a wrapper around your library.

I recently found another tool for translating Markdown: https://github.com/mondeja/mdpo, also written in Python. There is also https://po4a.org/index.php.en, which handles even more formats.
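As a rough illustration of the preprocessor idea, the sketch below follows the protocol shown in the mdBook manual's Python example: mdBook first calls the program with `supports <renderer>`, then sends a JSON array `[context, book]` on stdin and reads the modified book from stdout. It assumes the mdBook 0.4-style book layout with a top-level `sections` list. The paragraph-level lookup in `translate_markdown` and the empty `catalog` are placeholders standing in for a real translation library, not how mdbook-i18n-helpers or markdown-gettext work.

```python
import json
import sys


def translate_markdown(content: str, catalog: dict) -> str:
    """Replace each blank-line-separated paragraph found in the catalog."""
    paragraphs = content.split("\n\n")
    return "\n\n".join(catalog.get(p, p) for p in paragraphs)


def translate_items(items: list, catalog: dict) -> None:
    """Recursively translate chapter content in place."""
    for item in items:
        # Separators and part titles have no "Chapter" key; skip them.
        if not isinstance(item, dict) or "Chapter" not in item:
            continue
        chapter = item["Chapter"]
        chapter["content"] = translate_markdown(chapter["content"], catalog)
        translate_items(chapter.get("sub_items", []), catalog)


def main() -> None:
    # mdBook asks `<preprocessor> supports <renderer>`; exit code 0 means yes.
    if len(sys.argv) > 1 and sys.argv[1] == "supports":
        sys.exit(0)
    # Otherwise mdBook sends a JSON array [context, book] on stdin.
    context, book = json.load(sys.stdin)
    catalog = {}  # hypothetical: would be loaded from a translation file
    translate_items(book["sections"], catalog)
    json.dump(book, sys.stdout)
```

To run this as an actual preprocessor, the script would call `main()` at the bottom and be registered in `book.toml` under a `[preprocessor.<name>]` table with a `command` entry pointing at it.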

mirrorcult added a commit to space-wizards/mdBook-spacewizards that referenced this issue Sep 28, 2023
commit 8664faea083017b1ec7c9d811be28427b8408bef
Author: Kara <[email protected]>
Date:   Thu Sep 28 10:47:37 2023 -0500

    Update for new mdbook version

commit 1b45e7a7a6521b4df6d441788a7fff105eba9240
Merge: e74fdb1 79edc75
Author: Kara <[email protected]>
Date:   Thu Sep 28 10:03:55 2023 -0500

    Merge branch 'master' into localization

    # Conflicts:
    #	Cargo.lock
    #	Cargo.toml
    #	src/book/book.rs
    #	src/book/init.rs
    #	src/book/mod.rs
    #	src/cmd/build.rs
    #	src/cmd/clean.rs
    #	src/cmd/serve.rs
    #	src/cmd/test.rs
    #	src/cmd/watch.rs
    #	src/config.rs
    #	src/preprocess/links.rs
    #	src/renderer/html_handlebars/hbs_renderer.rs
    #	src/renderer/markdown_renderer.rs
    #	src/utils/mod.rs
    #	tests/init.rs

commit e74fdb1
Author: Ruin0x11 <[email protected]>
Date:   Fri Feb 25 14:30:38 2022 -0800

    Make `chapter_titles` optional in Book

commit 7305e8c
Merge: 9d8147c 5921f59
Author: Ruin0x11 <[email protected]>
Date:   Fri Feb 25 14:13:22 2022 -0800

    Merge remote-tracking branch 'upstream/master' into localization

    # Conflicts:
    #	.gitignore
    #	guide/src/en/cli/completions.md
    #	guide/src/en/format/images/rust-logo-blk.svg
    #	guide/src/en/format/markdown.md
    #	guide/src/en/misc/introduction.md
    #	src/renderer/html_handlebars/hbs_renderer.rs
    #	src/utils/mod.rs

commit 9d8147c
Author: Ruin0x11 <[email protected]>
Date:   Wed Sep 15 21:49:58 2021 -0700

    Remove extra `localization.md`

commit 56e72a2
Author: Ruin0x11 <[email protected]>
Date:   Wed Sep 15 15:33:28 2021 -0700

    [localization] rustfmt

commit 92ec3dd
Author: Ruin0x11 <[email protected]>
Date:   Wed Sep 15 15:25:31 2021 -0700

    [localization] Fixes for latest master

commit d6c27ab
Author: Ruin0x11 <[email protected]>
Date:   Sat Aug 29 16:11:47 2020 -0700

    Implement translation fallback of files included with preprocessing

commit 5fed5e8
Author: Ruin0x11 <[email protected]>
Date:   Wed Sep 15 14:29:30 2021 -0700

    Update mdBook manual to have information about translations

commit 09a8b66
Author: Ruin0x11 <[email protected]>
Date:   Sat Aug 29 14:41:08 2020 -0700

    Improve robustness of link rewriting

commit 8d1c086
Author: Ruin0x11 <[email protected]>
Date:   Fri Aug 28 16:33:02 2020 -0700

    Fix {{#include}} directives for default language

commit 98c3a04
Author: Ruin0x11 <[email protected]>
Date:   Fri Aug 28 16:11:21 2020 -0700

    Move example book to multilingual structure

commit c72ce18
Author: Ruin0x11 <[email protected]>
Date:   Fri Aug 28 14:50:04 2020 -0700

    Rewrite links in Markdown to point to fallback if missing in translation

    It will follow relative links to other pages and embedded images.

commit ee740ac
Author: Ruin0x11 <[email protected]>
Date:   Fri Aug 28 12:26:08 2020 -0700

    Remove 'default' property on languages, use book.language instead

commit a042cfc
Author: Ruin0x11 <[email protected]>
Date:   Fri Aug 28 11:35:42 2020 -0700

    Make `mdbook init` output multilingual structure

commit 5e223e0
Author: Ruin0x11 <[email protected]>
Date:   Fri Aug 28 03:17:26 2020 -0700

    Support localizing book title/description

commit e17ce64
Author: Ruin0x11 <[email protected]>
Date:   Fri Aug 28 02:29:07 2020 -0700

    Fix test using create_missing

commit 282fdaa
Author: Ruin0x11 <[email protected]>
Date:   Fri Aug 28 02:05:21 2020 -0700

    Redirect to a 404 page when serving translated

    We can't redirect in warp based on the URL, so redirect to the default
    language's 404 page instead.

    See: seanmonstar/warp#171

commit 85ab4d3
Author: Ruin0x11 <[email protected]>
Date:   Fri Aug 28 01:36:22 2020 -0700

    Redirect to translation index page in serve command

commit 8869c2c
Author: Ruin0x11 <[email protected]>
Date:   Fri Aug 28 00:24:33 2020 -0700

    Build multiple books from localizations at once

    Changes how the `book` module loads books. Now it is possible to load
    all of the translations of a book and put them into a single output
    folder. If a book is generated this way, a menu will be created in the
    handlebars renderer for switching between languages.

commit 96d9271
Author: Ruin0x11 <[email protected]>
Date:   Thu Aug 27 19:44:24 2020 -0700

    Specify language for book in command line args

    - Add a [language] table to book.toml. Each key in the table defines a
    new language with `name` and `default` properties.
    - Changes the directory structure of localized books. If the [language]
    table exists, mdBook will now assume the src/ directory contains
    subdirectories named after the keys in [language]. The behavior is
    backwards-compatible if you don't specify [language].
    - Specify which language of book to build using the -l/--language
    argument to `mdbook build` and similar, or omit to use the default
    language.
    - Specify the default language by setting the `default` property to
    `true` in an entry in [language]. Exactly one language must have `default`
    set to `true` if the [language] table is defined.
    - Each language has its own SUMMARY.md. It can include links to files
    not in other translations. If a link in SUMMARY.md refers to a
    nonexistent file that is specified in the default language, the renderer
    will gracefully degrade the link to the default language's page. If it
    still doesn't exist, the config's `create_missing` option will be
    respected instead.

commit 3049d9f
Author: Ruin0x11 <[email protected]>
Date:   Thu Aug 27 16:35:00 2020 -0700

    Actually, don't change source root

    The book paths have to gracefully degrade to the default language if
    they aren't available.

commit 24e6d6b
Author: Ruin0x11 <[email protected]>
Date:   Thu Aug 27 16:26:07 2020 -0700

    Change book source root depending on language

commit e4b443c
Author: Ruin0x11 <[email protected]>
Date:   Thu Aug 27 13:27:47 2020 -0700

    Add language config section

    Referencing rust-lang#5 (comment).
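
The commit log above describes a per-language directory layout under `src/` in which a page missing from a translation degrades gracefully to the default language's page. A minimal sketch of that lookup, assuming the `src/<language>/<path>` layout from the commit messages, could look like the following. The function name `resolve_chapter` is hypothetical and this is not the fork's Rust implementation.

```python
from pathlib import Path
from typing import Optional


def resolve_chapter(src: Path, lang: str, default_lang: str,
                    rel_path: str) -> Optional[Path]:
    """Pick the translated file if it exists, else the default language's."""
    for candidate_lang in (lang, default_lang):
        candidate = src / candidate_lang / rel_path
        if candidate.is_file():
            return candidate
    # Missing in both places: the caller decides what to do next,
    # for example honoring a `create_missing`-style option.
    return None
```

With `src/en/guide.md` present but no `src/de/guide.md`, calling `resolve_chapter(Path("src"), "de", "en", "guide.md")` would return the English file, which mirrors the fallback the commit messages describe for links in `SUMMARY.md`.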
@lcmgh lcmgh mentioned this issue Feb 12, 2024