BCN y Humanidades Digitales

After the 2013 meeting of the Association for Spanish and Portuguese Historical Studies, Andrea Davis (then a sharp graduate student) kindly shared with me a list of Spanish digital resources she had prepared. I just (April 22, 2016) finished poking through it looking for new items, and I saw that the Instituto Nacional de Estadística (INE) had posted the 1930 Anuario. {I will always try to provide a complete and corrected reference: Instituto Nacional de Estadística (Spain). Anuario Estadístico de España. Año XVI 1930. Madrid: Sucesores de Rivadeneyra, 1932.}

http://www.ine.es/inebaseweb/pdfDispacher.do?td=50641&ext=.pdf — Title page 1930 annual

The root of this DH project is an experiment with tools and historical questions that center on Barcelona in the early period of the Second Republic (1931-1933). A major portion of that is creating a functional dataset from the Padrón Municipal de 1930. It is incredible that the 1930 national statistical volume is now online; I could have used it many times in the past… but there are issues.

To start, there is no way to download the entire volume at once. Instead, you have to wade through a file tree and download each table (I apologize for the poor screenshot).

Screen shot of the 1930 Anuario’s file tree structure

As you can see, each entry has branches of its own. There are seven at the top level.

But these branch out into “sub-branches,” and often into further branches below those. Ultimately each table is listed, and you have to click on it to load a separate pdf file. When you click on the link to the actual pdf, a new WINDOW opens (unless you hold down the control key while clicking, to open it in a new tab). This window then gives you the pdf, which you have to download, but without a file extension. So as I downloaded each file I had to add .pdf to the file name. The file names do not correspond to anything I can recognize; they are a series of numbers, and not in numerical order. The title page (portada in the image at the top of this page) is file 5061. What appears to be page 17 of the volume, the Demografía table labeled “II. Resultados provisionales del Censo de 1930, en las capitales de provincia,” is file 4362, a number 699 lower.
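For anyone repeating this, the renaming chore can probably be scripted. What follows is only a minimal Python sketch, not something I ran against the INE site: it rebuilds the link format of the title-page URL given earlier and adds .pdf to any extensionless file in a folder that really starts like a PDF. The function names and the folder layout are my own assumptions, and so is the idea that the `td` value in the link is the only thing that changes from table to table.

```python
from pathlib import Path
from urllib.parse import urlencode

# URL pattern copied from the title-page link earlier in this post.
BASE_URL = "http://www.ine.es/inebaseweb/pdfDispacher.do"


def table_url(table_id: int) -> str:
    """Build a download link for one table (assumes only `td` changes per table)."""
    return f"{BASE_URL}?{urlencode({'td': table_id, 'ext': '.pdf'})}"


def add_pdf_extension(folder: Path) -> list[Path]:
    """Append .pdf to extensionless files in `folder` that begin with the PDF signature."""
    renamed = []
    for path in sorted(folder.iterdir()):
        # Skip sub-folders and anything that already has an extension.
        if not path.is_file() or path.suffix:
            continue
        # Every PDF file starts with the bytes "%PDF-"; leave other files alone.
        with path.open("rb") as handle:
            if handle.read(5) != b"%PDF-":
                continue
        target = path.with_name(path.name + ".pdf")
        path.rename(target)
        renamed.append(target)
    return renamed
```

Calling `add_pdf_extension(Path("downloads"))` on a (hypothetical) downloads folder would return the list of renamed files, which makes it easy to check that nothing was missed.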

I got as far as the “Agricultura” sub-branch of “Producción, consumo, y cambio” (Agricultura has the further sub-divisions “Producción agrícola” (35 tablas), “Colonización agrícola” (2 tablas), “Producción forestal” (18 tablas), and “Ganadería” (9 tablas)). This was when I gave up trying to do it in one sitting. 127 files later, after adding .pdf to each, I used Adobe Acrobat Standard to combine them. The files changed order when I sorted by name, so I sorted by time downloaded before combining them into a single file, and then named the file.
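The sort-by-download-time trick can also be sketched in Python. This is a hedged illustration rather than what I actually did (I used Acrobat): it treats each file's modification time as a stand-in for the download time, which is an assumption that breaks if the files were touched afterwards, and it relies on the third-party pypdf package for the merge. The function names are invented for the example.

```python
from pathlib import Path


def by_download_time(folder: Path) -> list[Path]:
    """Return the PDFs in `folder`, oldest first, using modification time as a proxy for download time."""
    return sorted(folder.glob("*.pdf"), key=lambda p: p.stat().st_mtime)


def merge_pdfs(paths: list[Path], output: Path) -> None:
    """Concatenate `paths`, in the order given, into a single PDF at `output`."""
    # Imported here so the sorting helper above works even without pypdf installed.
    from pypdf import PdfWriter

    writer = PdfWriter()
    for path in paths:
        writer.append(str(path))
    with output.open("wb") as handle:
        writer.write(handle)
```

If you try this, write the combined file to a different folder than the downloads, so that a second run does not sweep the earlier combined file into the merge.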

The file name is the name of the volume, and the file currently contains 177 pages. {And now I am too tired to review the combined files to see what the resulting order is.} I saved it to my desktop so I will see it and remember to continue with the “Industria” sub-branch of “Producción, consumo, y cambio.” I also pasted the list of branches still to be done into the file using Adobe’s “Document properties” feature (accessible by control-d in the file).

And to be honest, I had to do the combining a second time, after I had deleted the files (luckily they were still recoverable), because apparently I did not save the original combined file.

So when I am done, what should I do with the resulting file, dear readers? And done in this case means several things:

adding all the files for 1930 from the INE website
making a proper table of contents in the Adobe file
trying to number the pages so these correspond to the actual printed pages (unlike Oxford’s obnoxious text numbering in Oxford Scholarship Online)
running OCR
reducing the file size

And a final note: I am sure there is a more technologically adept way to do this, by “harvesting” the files. I need to read Ian Milligan’s post on doing this on another site and take a stab at it myself.
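As a first stab at what “harvesting” might look like, here is a small Python sketch that pulls the table links out of a saved copy of one of the file tree pages. It is not Milligan's method, and it rests on assumptions I have not checked: that the tree is served as ordinary HTML with plain links (rather than being built by JavaScript), and that every table link contains the `pdfDispacher.do` address seen in the title-page URL. The class and function names are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse


class TableLinkParser(HTMLParser):
    """Collect links that point at the PDF address seen in the title-page URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[tuple[str, str]] = []  # (td value, absolute URL)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if "pdfDispacher.do" not in href:
            return
        # Resolve relative links against the page they came from.
        absolute = urljoin(self.base_url, href)
        td_values = parse_qs(urlparse(absolute).query).get("td", [])
        if td_values:
            self.links.append((td_values[0], absolute))


def find_table_links(html: str, base_url: str) -> list[tuple[str, str]]:
    """Return (td value, URL) pairs for every table link found in `html`, in page order."""
    parser = TableLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Because the links come back in the order they appear on the page, the resulting list would also preserve the order of the tables in the tree, which the numeric file names do not.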

Downloading from Spain’s Instituto Nacional de Estadística
