Distributed Proofreaders

Last Change: March 10, 2012

Project Gutenberg

Founded in 1971, Project Gutenberg, or PG (Wikipedia page), is the world's oldest electronic library. Its holdings number about 40,000 works, almost exclusively in the public domain (under US copyright law), and are available at no cost over the Internet.

Dozens of public domain science books may be downloaded freely (libre and gratis).

Distributed Proofreaders

About half of PG's catalog is produced by Distributed Proofreaders, or DP (Wikipedia page), an all-volunteer Internet community devoted to production of high-quality free ebooks. At DP, books are scanned and run through optical character recognition (OCR) software, then proofread for errors, formatted for semantic structure and visual appearance, and assembled into finished ebooks. Proofreading (or proofing) and formatting are done one page at a time, typically over a period of days or weeks, by dozens of volunteers. The site's name derives from the distributed nature of the workflow.

Other, larger collections of ebooks exist. However, DP is devoted to producing high-quality, text-based ebooks. The Internet Archive and Google Books provide books made from scanned images. These are enormous files, tens or even hundreds of megabytes, and are effectively unavailable to people without fast Internet access, including dial-up users worldwide and most users in Africa, the Middle East, South America, and Asia. PG's collection consists of text files, tens of kilobytes to a few megabytes. The entire collection currently fits on one DVD, making it possible to distribute the catalog by ordinary post.

Ebooks created from page images suffer from whatever limitations existed when the book was scanned. If the resolution is low, the contrast is poor, or a page was stained or damaged, those defects carry into the final product. By comparison, a PG ebook is a text file, which suffers from none of these legibility issues.

Google provides OCRed text of its books, but if you've tried to read raw OCR, you've discovered that the technology has a long way to go. Typos are commonplace, and unusual elements, such as mathematics, often scan as garbage. At DP, volunteers correct the raw OCR to match the scan, focusing as needed on pages that are difficult to read, and create a marked-up file containing tags needed to display the book with good fidelity to the original.

My Activities at DP

At DP, I work almost exclusively on mathematics and physics books formatted with the typesetting language LaTeX. My long-term goal at DP is systematically to digitize selecta from the public domain mathematics literature. You can browse the list of books I've post-processed, that is, assembled into finished ebooks and uploaded to PG.

For reasons indicated above, mathematics books benefit particularly from DP's workflow. Nonetheless, digitizing mathematics is a slow process and demands extensive training. A typical mathematics book requires about an hour per page, and much of that work must be performed by volunteers familiar with or fluent in LaTeX. On the flip side, volunteering at DP can be an excellent path to learning LaTeX, provided you have the time and commitment.

Since joining DP, I've become involved with many aspects of LaTeX ebook production: formatting individual pages, helping to develop the distinctive workflow for mathematical projects, writing manuals for proofing, formatting, and post-processing LaTeX, creating interactive training materials, coordinating LaTeX-knowledgeable volunteers, and writing the software PG uses to package uploaded LaTeX projects. All of these tasks have been carried out in collaboration with volunteers from around the world, none of whom I've (yet) met in person.

Getting Involved

To work at DP, you must register as a volunteer. This takes only a minute or two, and is similar to registration at any free online site: You request a user name and provide an email address, to which your account information (a welcome message and your initial password) is sent. As a volunteer, you may visit the site, browse and work on available projects, and communicate with other volunteers in bulletin-board-style forums on a variety of topics.

Access to some activities is limited, to ensure you have been sufficiently trained and have accumulated enough knowledge of the site to carry out tasks productively. In some cases access is granted after a certain amount of time on site and a certain number of pages completed. In other cases, access is granted by evaluation of your work by a more experienced volunteer.

DP's success as a source of high-quality ebooks is testament to the potential of the Internet, which allowed a far-flung group of like-minded individuals to grow into a thriving community founded on the desire to preserve history one page at a time.

DP Books Post-Processed for PG

  1. Alexander McAulay:
    The Utility of Quaternions in Physics
  2. Amos Emerson Dolbear:
    The Machinery of the Universe
  3. Karl Weierstrass:
    Theorie der Abel'schen Functionen
  4. Ernst Leonard Lindelöf:
    Le Calcul des Résidus et ses Applications à la Théorie des Fonctions
  5. Arthur Stanley Eddington:
    Space, Time and Gravitation
  6. Vito Volterra:
    Leçons sur l'Intégration des Équations Différentielles aux Dérivées Partielles
  7. Michel A. Melkanoff et al.:
    A Fortran Program for Elastic Scattering Analyses with the Nuclear Optical Model
  8. Leonard Eugene Dickson:
    First Course in the Theory of Equations
  9. Jacques Hadamard:
    Four Lectures on Mathematics
  10. Maurice Godefroy:
    La Fonction Gamma
  11. Albert Ribaucour:
    Étude des Élassoïdes ou Surfaces A Courbure Moyenne Nulle
  12. H. E. Slaught and N. J. Lennes:
    Solid Geometry with Problems and Applications (Revised edition)
  13. Florian Cajori:
    A History of Mathematics
  14. Arthur L. Baker:
    Elliptic Functions
  15. Amos Emerson Dolbear:
    Matter, Ether, and Motion
  16. Max Planck:
    Vorlesungen über Thermodynamik
  17. Forest Ray Moulton:
    An Introduction to Astronomy
  18. Philip H. Wicksteed:
    The Alphabet of Economic Science
  19. John Maynard Keynes:
    A Treatise on Probability
  20. Rudolf Clausius:
    Die Potentialfunction und das Potential
  21. John Henry:
    Abrégé de la Théorie des Fonctions Elliptiques
  22. Richard Chace Tolman:
    The Theory of the Relativity of Motion
  23. Hugh Blackburn:
    Elements of Plane Trigonometry
  24. Hermann Laurent:
    Sur les Principes Fondamentaux de la Théorie des Nombres et de la Géométrie
  25. Thomas Roberts:
    Tappet and Dobby Looms: Their Mechanism and Management
  26. Horace Bryon Heywood and Maurice Fréchet:
    L'Équation de Fredholm
  27. Sylvanus Phillips Thompson:
    Calculus Made Easy
  28. Wallie Abraham Hurwitz:
    Randwertaufgaben bei Systemen von Linearen Partiellen Differentialgleichungen erster Ordnung
  29. Leo Koenigsberger:
    Vorlesungen über die Theorie der Hyperelliptischen Integrale
  30. Ernest Vessiot:
    Leçons de Géométrie Supérieure
  31. George Howard Darwin:
    Scientific Papers, Vol. V
  32. Felix Klein:
    The Evanston Colloquium: Lectures on Mathematics Delivered From Aug. 28 to Sept. 9, 1893 Before Members of the Congress of Mathematics Held in Connection with the World's Fair in Chicago
  33. Heinrich Emil Timerding:
    Die Analyse des Zufalls
  34. David Bierens de Haan:
    Note sur une Méthode pour la Réduction d'Intégrales Définies et sur son Application à Quelques Formules Spéciales
  35. George Albert Wentworth:
    The First Steps in Algebra
  36. Charles Briot and Jean Claude Bouquet:
    Théorie des Fonctions Elliptiques
  37. Godfrey Harold Hardy:
    Orders of Infinity: The 'Infinitärcalcül' of Paul Du Bois-Reymond
  38. George Howard Darwin:
    The Tides and Kindred Phenomena in the Solar System
  39. Godfrey Harold Hardy:
    A Course of Pure Mathematics
  40. Kurt Hensel:
    Zahlentheorie

Other Books Produced for PG

  1. Albert Einstein:
    Relativity: The Special and the General Theory, 3rd Ed.
  2. Albert Einstein:
    The Meaning of Relativity: Four Lectures Delivered at Princeton, May, 1921
  3. Joseph Louis Lagrange:
    Lectures on Elementary Mathematics
  4. Karl Friedrich Gauss:
    General Investigations of Curved Surfaces of 1827 and 1825
  5. George Boole:
    The Mathematical Analysis of Logic: Being an Essay Towards a Calculus of Deductive Reasoning
  6. Felix Klein:
    On Riemann's Theory of Algebraic Functions and their Integrals: A Supplement to the Usual Treatises
  7. Henri Poincaré:
    Science and Hypothesis
  8. Augustus De Morgan:
    Elementary Illustrations of the Differential and Integral Calculus
  9. Augustus De Morgan:
    On the Study and Difficulties of Mathematics