I decided to work with the great nineteenth-century Brazilian author Machado de Assis (author of Brás Cubas) and to analyze the results more carefully than I usually do when researching for my own study. It was not easy to find a wide variety of titles by this author, so I had to choose from a small group of titles that had full-text versions available (because most of the others were protected by copyright). On Project Gutenberg I found only two of the books Machado wrote, so I decided to look for Varias historias (Many Stories), a collection of sixteen short stories published in 1896, in Google Books, HathiTrust, and the Internet Archive. I did not know I was going to find so many problems!
The first result in Google Books offers only a snippet view, and it is actually a Spanish translation. So I went to the second result to read the book in full, and I saw that it comes from the library of the University of Texas at Austin: a 1903 edition. The text was first published in 1896, so this edition appeared just seven years later. Google Books gives only the name of the publishing house (H. Garnier), the year (1903), and the number of pages (282). The formats offered are plain text, PDF, and EPUB. You can download the text, and in the online version the table of contents links to the different parts of the book. It is also possible to read it in Google Play, a kind of digital cloud for storing books, music, etc., so you can build your own Google Books library.
As far as restrictions on the digital content are concerned, users are not allowed to sell it or to remove the watermark or any other mark identifying it as Google's. HathiTrust and the Internet Archive impose the same restrictions.
The scanned version had all the pages. But I realized that the print copy itself had a lot of problems! In one instance, the page number was reversed (175 instead of 157), and a line had been mistakenly inserted into a dialogue. Fortunately, one reader of the printed book had corrected the mistake by hand, so we can now "read it the way it should be".
The copy was full of readers' marks that made reading it really annoying. On top of that, another reader, who seems to have been learning Portuguese, tried to "help" by translating some words he did not know!
My question is: what is the advantage of having access to an edition like this? Why digitize such a poorly printed and poorly preserved copy? And why is it the first result, when Google has digitized many other copies of this book?
In the Internet Archive, the entry for the copy I was looking for lists it as written in Spanish! The site says that the publisher is Casa de las Américas, that the year of publication is 1904 (which is the first problem, because Casa de las Américas was founded after the Cuban Revolution), that the language is Spanish, and that the book belongs to the collection of an "unknown library." But when I "opened" the book, the first thing that appeared was the bookplate of Stanford University; the book is in Portuguese, and it was digitized by Google. When I searched the Stanford University catalog, the book appeared there, of course.
So why do they say they do not know its origin? Why is the information so poor? The record mixes correct information about this book (the publication year) with information about another book: its Spanish translation, published by Casa de las Américas more than sixty years later. And as if two conflicting entries were not enough, when I looked at the book's inside cover I found a third bibliographic entry on a Post-it note!
This copy was published by the same publishing house just one year after the copy I found in Google Books: the edition was corrected, and (fortunately) the copy was clean! The formats offered were PDF, EPUB, Kindle, DjVu, and metadata. But if you want to read it online, many pages display badly; some of them look like this:
It's frustrating! Beyond that, the catalog record is incorrect. And that annoys me a lot, because I see the same mistake once again: thinking that Portuguese and Spanish are the same. I discovered that the record has an "editable web page" through Open Library, so I created an account to see what options I had to correct the mistake. The page said it had four revisions, but none of them had changed the bibliographic entry. Now I had the chance to add information about the book and CHANGE the information given. So I changed the publishing house, the date, the language… and I was feeling much better after that! BUT I could not change the language of the edition… it is like a curse… Spanish is NOT Portuguese… so I just added a comment warning that this is the original Portuguese edition, not the Spanish one the record announces.
In HathiTrust, the copy I found belongs to the New York Public Library, and it was digitized by Google (even though it cannot be read in full in Google Books).
The publishing house is the same as for the others, H. Garnier, but the date of publication is unknown. It should be after 1903, because this is the corrected edition. Oddly, the date does not appear where it appeared in the other two copies. Only one format, PDF, was offered, but the book can be read online as well. However, this copy is almost illegible!
Many stories are missing one to three pages, one whole story is missing, and one page looks as if it was attacked by a cannibal or something:
HathiTrust has a feedback form for reporting problems. But for problems in books digitized by Google, the only reply is that "Google is continually improving the quality of images and OCR it delivers to HathiTrust partners." So the real answer is: wait.
The text can be read in Classic, Scroll, Flip, Thumbnails, and Plain Text views, which I found interesting and useful, but not so useful when the copy is missing pages and is at times almost illegible!
You can download the PDF only if you belong to one of the partner institutions (basically American universities, plus just one from Spain and France). You can also create a collection (private or public) and add the book to it.
Yes, digitization has a long way to go, but some things could be improved simply by paying more attention to the information that is posted. The quality of the scan is sometimes very poor, if not that of the original itself!