Help:Proofread
Proofreading at Wikisource is an activity that everyone can take part in. It is based on scanned images of the original document, which are presented side-by-side with the text version. If you are new to Wikisource, take a look at the Beginner's guide to proofreading. This page contains details that you can refer to as needed.
Looking at documents
Page, Index and main Namespaces
Namespaces are used to differentiate page types at Wikisource. For example, the page you are reading now is in the "Help:" namespace. See also Help:Namespaces.
- The Index: namespace holds indexes for documents, displaying all the pages of a text together. A sample index can be found at Index:Cinderella (1865).djvu. An index page shows publication details of the document, a link to the File, and the set of links to its Pages. The Index can be modified to show the original page numbers. The ProofreadPage extension uses this type of page to build its navigation buttons.
- Index pages on this wiki are listed in Category:Index.
- The Page: namespace is used to display page text side-by-side with individual page images, and allows transcription of the original text. A sample page can be found at Page:Cinderella (1865).djvu/5. You can zoom in on the page image by clicking and scrolling on the image in the right-hand pane.
- Page numbers for DjVu files are indicated by adding a slash followed by the page number to the file name. For example, Page:Sketch of Connecticut, Forty Years Since.djvu/27 displays the 27th page of the file. The text on the left can be modified in edit mode. Only the contents of the edit box are displayed on the 'main' page.
- Some text and formatting may be placed outside of the edit box, in 'no-include' sections. These boxes can be opened by selecting 'Proofread tools' in the upper left corner and clicking the button that shows or hides the header and footer. The header and footer are automatically placed inside <noinclude> tags, which prevent transclusion of their contents. An example is Page:Cinderella (1865).djvu/5, which hides the title (repeated on every page) and the page number.
- The 'main' namespace uses 'transclusion' to display a number of pages as a chapter or section of the work. A link at left provides access to the Page: namespace. The original page numbering, which is provided by the Index, can be displayed at these links. The text in the header and footer of the Page: namespace is not displayed.
- A mainspace page, such as Cinderella, or the Little Glass Slipper, displays the text and illustrations of that work.
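As a rough illustration of the 'no-include' sections described above, the wikitext of a page in the Page: namespace might be laid out something like the sketch below. The running header values and the body placeholder are invented for illustration, and any additional markup that the ProofreadPage extension stores alongside the header (such as the page status) is left out.

```wikitext
<noinclude><!-- header: shown in the Page: namespace only -->
{{RunningHeader|left=|center=CINDERELLA.|right=5}}</noinclude>
BODY TEXT OF THE PAGE <!-- the only part transcluded into the main namespace -->
<noinclude><!-- footer: shown in the Page: namespace only --></noinclude>
```

When the page is transcluded, everything inside the <noinclude> tags is dropped, so the running title and page number do not interrupt the text of the chapter.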
Editing pages
The following buttons appear for navigating and editing pages in the Page: namespace:
- previous page
- next page
- the Index for the page
- show/hide the interface for editing the header and footer
- zoom out on scan
- zoom in on scan
- reset zoom
Formatting conventions
The following conventions are considered best practices for pages in the Page: namespace (DjVu files and other files which use the ProofreadPage extension). For general article formatting conventions and guidelines see Wikisource:Style guide.
- A scanned page's header and footer often include the page number and titles, neither of which is needed in the body of the new page. Place this information in the 'no-include' sections, accessed by clicking the show/hide header and footer button that appears above the edit window. The {{RunningHeader}} template is useful for formatting these headers, and is used as follows:
{{RunningHeader|left=|center=|right=}}
- Text in the left, center, or right parameters will appear on the same line.
- Remove end-of-line hyphens and line breaks. To start a new paragraph, MediaWiki pages use two returns (that is, a blank line).
- When a word is hyphenated across two different pages of the DjVu scans, use {{hws}} and {{hwe}} (if you wish, you can also use {{Hyphenated word start}} and {{Hyphenated word end}}). These templates make the word appear hyphenated in the Page: namespace and remove the hyphen when the text is transcluded. For example, a word such as "pretending" that is split between a first and a second Page appears whole in the main namespace. The syntax on the first and second Page is:
{{hws|FIRST HALF OF WORD|WHOLE WORD}}
{{hwe|LAST HALF OF WORD|WHOLE WORD}}
- If a paragraph ends at the bottom of a DjVu page and a new paragraph in the same chapter starts on the next page, add {{nop}} at the bottom of the page to force a break in the text. Otherwise, when the pages are transcluded, the separation between the two pages will be treated as a single space rather than a new line. Example: (Page:Personal Recollections of Joan of Arc.djvu/476 and Personal Recollections of Joan of Arc/Book III/Chapter 14).
- Note: formerly the {{Blank line}} template was used for this purpose at the top of the following page; {{nop}} is now preferred.
- If you need to indicate a word/phrase should be in Small Caps, use the {{small-caps}} template.
- If you need to indicate a word/phrase should be in a smaller (or larger) font, use the {{smaller}} template. Similarly, the {{xx-smaller}}, {{x-smaller}}, {{larger}}, {{x-larger}}, {{xx-larger}}, and {{font-size}} templates can be used to modify text size.
- Using standard templates instead of other types of markup gives Wikisource protection from undesirable external changes. For example, use {{right}} instead of
<p align="right">
- More templates and information about them can be found at Typography templates and Category:Formatting templates.
- Published (foot)notes in the scanned print version of the DjVu book should be displayed using the <ref></ref> and {{reflist}} markup. If a note is carried over onto more than one print page, use a named reference <ref name="pageno"> on the first page, and on subsequent pages containing the footnote use the syntax <ref follow="pageno"> (with the same name). Example: (use of "name" at Page:London Journal of Botany, Volume 2 (1843).djvu/188, use of "follow" at Page:London Journal of Botany, Volume 2 (1843).djvu/189, and output at London Journal of Botany/Volume 2/Swan River Botany). Note that names in <ref name=…> must start with a letter, not a number, to work properly on a wiki.
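To make the cross-page conventions above concrete, here is a minimal sketch of the end of one page and the start of the next. The sentence text and the reference name "note1" are invented for illustration; only the templates and tags come from the conventions described above.

```wikitext
<!-- Bottom of the first Page: a footnote begins and the last word is split -->
He was<ref name="note1">The first part of a long</ref> only {{hws|pre|pretending}}

<!-- Top of the second Page: the word and the footnote both continue -->
{{hwe|tending|pretending}} to be asleep.<ref follow="note1">footnote that runs onto the next page.</ref>
```

Under these assumptions, the transcluded text should read "He was only pretending to be asleep." with one footnote combining both parts. The reference name starts with a letter, as required.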
Page status
The status of a page is reflected both in the color of its block on the index page, and by the banner on the page. The ProofreadPage Extension is used to implement the status.
Pages in the "Page:" namespace at Wikisource each have an associated "status", which refers to how thoroughly the text on that page has been proofread.
Page status is visible to all users, but it can only be changed by users who are logged into their Wikimedia user account. As such, IP editors cannot change page status.
Validation path
The validation path of the ProofreadPage extension involves five states:
           ↗ Without text
empty page → Not Proofread → Proofread → Validated
           ↘ Problematic ↗
Details
Three of the states form the normal pathway:
- Not Proofread is the default value. If you've worked on a page, but don't have time to finish, this is the status to use. (See all pages.)
- Proofread means proofread by one contributor. You have brought the page to the best condition you can: the text matches the scan, basic formatting is done, and you are reasonably happy with it. A Proofread page is ready for display to the casual reader, even if it still contains a few minor errors. (See all pages.)
- Validated means proofread by two contributors. The corresponding button is available only if the page has been already proofread by someone else: you cannot both proofread and validate the same page. (See all pages.)
In addition,
- Without text is for blank pages. It is not used on pages with published content, such as images. (See all pages.)
- Problematic indicates a problem that needs further work or discussion among contributors, such as a table, missing image, or a different alphabet. (See all pages.)
You will find the buttons that indicate these states under the edit window. If a previous contributor has already proofread the page, the green "Validate" button is included among them. If no one has proofread the page yet, that button is not shown.
In both cases, you can change the status of the page by selecting the appropriate button and saving the page. If the green "Validate" button is not available to you, it will appear for other users if you select the yellow "Proofread" button and save.
The state of each page in a document is shown on the index page, as shown in the screenshot on the right.
CSS
The template {{Page status text}} can be used to apply the status formatting.
{{page status text|without text}} : Without text
{{page status text|not proofread}} : Not Proofread
{{page status text|problematic}} : Problematic
{{page status text|proofread}} : Proofread
{{page status text|validated}} : Validated
The above status colours can also be used in CSS with the following class names (to set the background-color property):
class="quality0"
class="quality1"
class="quality2"
class="quality3"
class="quality4"
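As a small sketch of how these classes might be applied, the wikitext below colours a few table cells by status. It assumes that the class numbers follow the same order as the template list above (quality0 for Without text through quality4 for Validated), and that the wiki's stylesheet defines these classes on the page where the table is used.

```wikitext
{| class="wikitable"
|-
| class="quality1" | Not Proofread
|-
| class="quality3" | Proofread
|-
| class="quality4" | Validated
|}
```

Each cell should then pick up the background colour associated with its status, without hard-coding colour values in the page.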
Transclusion
After the text of the work is populated into each side-by-side image page, "transclusion" is used to display the text from the Page: namespace on pages in the main namespace. Transclusion displays the text of another page without having to copy it or manually update it later. The purpose of transcluding the text is to group it into logical, reasonably sized chunks, most frequently chapters or sections.
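A minimal sketch of a main-namespace page that transcludes a run of pages is shown below. It uses the <pages /> tag provided by the ProofreadPage extension with the sample index mentioned earlier; the page range 5 to 12 is an invented example, not the actual extent of the work.

```wikitext
<!-- Pull scan pages 5 through 12 of the Index into this main-namespace page -->
<pages index="Cinderella (1865).djvu" from=5 to=12 />
```

The from and to values refer to positions in the scan file (the number after the slash in the Page: title), not to the page numbers printed in the book.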
Beginning the proofreading project
Most proofreading projects use DjVu files to contain scans or photographs of each page of a document, together with an OCR layer representing the extracted text. To convert a .pdf file to .djvu format (even a "secured" .pdf file), the GPLv2 open source application pdf2djvu may be used.
Producing Page: files for side by side editing
Once a .djvu document has been produced and uploaded to Wikimedia Commons, the index is started as a new article under the name "Index:[name of Wikimedia Commons file]" using Mediawiki:Proofreadpage index template. Copy the full parameter list and fill in what you can. Set "Progress=MS" and "Pages=<pagelist />". You should include a Wikilink in the "Title" which points to the text article at Wikisource (which may have the same name as the index file, but does not contain "Index:").
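The "Pages" field can also carry attributes on the <pagelist /> tag so that the Index shows the original page numbers. The sketch below assumes a hypothetical scan whose first four images are unnumbered front matter and whose fifth image is printed page 1; the numbers would need to be adjusted to the actual file.

```wikitext
<!-- Scan images 1 to 4 are labelled with a dash; numbering starts at 1 on image 5 -->
<pagelist 1to4="–" 5=1 />
```

With a plain <pagelist />, the links on the Index simply follow the position of each image in the file.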
New files for each page in the Page: namespace are then produced for the file by using User:Phe-bot. See Help:Match and split for this procedure.
ProofreadPage Extension
Wikisource uses the ProofreadPage extension, which allows you to render text along with its corresponding scanned image.
Users new to proofreading can experiment with the concept and test their abilities with the simple introductory tests on the Distributed Proofreaders website. Working examples can be seen by finding a project in progress, such as Wikisource:Proofread of the Month.
Our active proofreading work: the current Proofread of the Month is The Silent Prince (1900). Last month completed: A Journey to Lhasa and Central Tibet.
See also
- Help:Page Status
- Help:Adding texts
- Help:Beginner's guide to Index: files
- Help:Page breaks
- Help:Digitising texts and images for Wikisource
- Help:Match and Split
- Category:Index, the category of page scan indexes of works being proofread or validated
- Wikisource:Proofreading, collaboration page
- Wikisource:Transcription Projects
- Extension:Proofread Page at MediaWiki
- Proofread Page Statistics, multi-domain stats
- oldwikisource:Wikisource:ProofreadPage is the multi-domain page concerning this extension. It is used for announcements and help to users. The discussion page is used to post bug reports, request features, etc.