TraPrInq (04_2023)

Transcribing the Portuguese Inquisitorial Proceedings: TraPrInq

By Celeste Pedro

April 2023: This month, the Journal of the Portuguese Association of Librarians, Archivists and Documentation Professionals (BAD) announced TraPrInq, a new automatic transcription project coordinated by Hervé Baudry, to be released in June 2023. Automatic transcription has become one of the most powerful AI tools for researchers working with large collections. Although accuracy improves as more data is added to transcription models, encoding texts is still time-consuming. Handwritten documents remain the hardest to process because letter shapes vary the most, especially in historical records, where legibility and readability were not a main concern.

TraPrInq is an eighteen-month exploratory project funded by FCT (Fundação para a Ciência e Tecnologia) and hosted at CHAM – Centro de Humanidades, Universidade Nova de Lisboa. It focuses on Portuguese manuscripts from the 16th to the 19th century held in the Tribunal do Santo Ofício collection of the Arquivo Nacional da Torre do Tombo. The project aims to transcribe five thousand pages. These transcriptions will feed a readable and searchable database and will be used to build a recognition model that supports the mass transcription of archival records (more than forty thousand in this case).

The technology is based on Handwritten Text Recognition (HTR), which has proven more satisfactory than Optical Character Recognition (OCR), even for printed materials. The team of more than ten experts works primarily on the Transkribus platform and has concentrated on the proceedings of the Inquisition in the kingdom of Portugal. So far the model looks promising, with an error rate of around 5% on the pages transcribed automatically from the best-preserved collection of Inquisition trials.

Transkribus is an AI platform widely used by researchers working with large historical documents and collections. It automates the recognition of the text, layout and structure of digitised pages. Training models on Transkribus, as TraPrInq is doing, also lets researchers add metadata and collaborate with others, which makes the materials in a given collection easier to find, read and share.
You can try it out on your own using Transkribus Lite.

If you’re interested in learning more about the project’s approach, take a look at one of their publications, which explains the model training step by step using a concrete example: “Ultima scripta e primeiras transcrições automatizadas. A carta testamento de Francisco Gomes Henriques (1654) submetida ao primeiro modelo de HTR (Handwritten Text Recognition) do projeto TraPrInq” (“Last writings and first automated transcriptions: the testament letter of Francisco Gomes Henriques (1654) submitted to the first HTR model of the TraPrInq project”).

Check out the presentation video the team put together.

A closing conference, the Colóquio Humanidades Digitais e Estudos Inquisitoriais (Digital Humanities and Inquisitorial Studies Colloquium), will take place in Lisbon on 22 June (online and in person) and 23 June (in-person workshop). Talks will be given in Portuguese and English. They will cover both historical research on the project’s corpus and the applicability of TraPrInq to other corpora and objectives. Finally, for more detailed updates on the development and training of the TraPrInq model, check out the short articles the project has published.