Thursday, July 8, 2010

Ministry of the Empire Reports are now "Searchable"

 

Many of the day-to-day affairs of the Brazilian Empire (1822-1889) were overseen by the Ministerio do Imperio (Ministry of the Empire). Once or twice a year this Ministry published an update on the Empire's state of affairs, including reports on schools, municipal elections, and the imperial family. These reports are valuable for this project because their discussion of public health became increasingly detailed after the outbreaks of yellow fever and cholera in the 1850s. By the 1860s, they included special reports written by Brazil's top health authorities. Beyond health, historians studying education or searching for details on some of the smaller provinces will also find them a rich primary source.

Ministerial Reports are one of the most accessible primary sources for the imperial period because they can be read online through the Center for Research Libraries (CRL) website. The CRL does an invaluable service by making these available without any special licence or access. The reports are difficult to use, however, because of their length (as many as 1000 pages for some in the 1870s) and the lack of a detailed index. Furthermore, the CRL holds only page images that have not, until now, been OCR processed. Without machine-readable text they cannot be searched, which makes finding answers to specific questions much like finding a needle in a haystack.

I've been working with my father for more than a year now to make part of the enormous collection of government documents at the CRL searchable. My father created a web-based program that runs on a Linux server and uses MySQL to store and retrieve text records. A separate Python program processes the OCR text and organizes it in the MySQL database. With this setup, anyone with an internet connection can search the Ministry of the Empire reports by keyword or character combination. In total, there are 19,640 pages from 60 reports covering the period 1832-1888.
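To give a rough idea of how page-level OCR text can be stored and searched by keyword, here is a minimal sketch. It is not our actual program: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, and the table name, column names, and sample sentences are all hypothetical.

```python
import sqlite3


def build_index(pages):
    """Load (report_year, page_number, text) tuples into an in-memory table.

    The schema is a hypothetical illustration, not the one used on the site.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages ("
        "report_year INTEGER, page_number INTEGER, text TEXT)"
    )
    conn.executemany("INSERT INTO pages VALUES (?, ?, ?)", pages)
    conn.commit()
    return conn


def keyword_search(conn, keyword):
    """Return (report_year, page_number) pairs whose text contains keyword.

    The keyword is passed as a bound parameter, never pasted into the SQL.
    Note that '%' and '_' inside the keyword act as LIKE wildcards.
    """
    rows = conn.execute(
        "SELECT report_year, page_number FROM pages "
        "WHERE text LIKE ? ORDER BY report_year, page_number",
        ("%" + keyword + "%",),
    )
    return rows.fetchall()


# Invented sample pages, for illustration only.
SAMPLE_PAGES = [
    (1850, 12, "A febre amarella appareceu no porto."),
    (1851, 3, "Eleicoes municipaes da provincia."),
]
```

Calling `keyword_search(build_index(SAMPLE_PAGES), "febre")` returns the year and page number of the one sample page that contains the word, which is all a results list needs in order to link back to the page image.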

One caveat: Often the image or text quality is so poor that our OCR program (ABBYY) couldn't read the text correctly. This means that there are many misspelled words and misidentified characters. To work around this problem, we included regular expression searches, a powerful way to find words using wildcard characters or other expressions. We've also included, for each returned search hit, a quick link to a specially created .pdf page and to the CRL's page image, and these are often easier to read than the OCR text.
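As an illustration of why regular expressions help with imperfect OCR, the sketch below searches a few invented page texts with a pattern that tolerates a misread letter and a variant spelling. It uses Python's standard re module; the function name, the sample pages, and the particular OCR confusions are assumptions made for the example, and the search on the site may behave differently.

```python
import re


def find_hits(pages, pattern):
    """Return the ids of pages whose text matches the regular expression.

    pages is an iterable of (page_id, text) pairs; matching ignores case.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [page_id for page_id, text in pages if regex.search(text)]


# Invented sample texts: page 2 imitates an OCR error ("b" read as "h")
# and a spelling with a single "l".
SAMPLE_TEXTS = [
    (1, "a febre amarella"),
    (2, "a fehre amarela"),
    (3, "cholera morbus"),
]

# [bh] accepts either letter, \s+ accepts any run of whitespace,
# and l+ accepts one or more "l" characters.
TOLERANT_PATTERN = r"fe[bh]re\s+ama[rn]+el+a"

hits = find_hits(SAMPLE_TEXTS, TOLERANT_PATTERN)
```

A plain keyword search for "febre amarella" would find only page 1 here, while the tolerant pattern finds pages 1 and 2.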

Eventually we would like to expand the collection to include more reports. Although this could be done for a relatively small price, the cost is prohibitive for us at this point. We are looking for sources of funds that could make this possible and are open to any ideas. The program's use could potentially be broadened a great deal, because it could serve as a search engine for other primary sources that have been OCR processed.
