After the 2013 meeting of the Association for Spanish and Portuguese Historical Studies, Andrea Davis (then a sharp graduate student) kindly shared with me a list of Spanish digital resources she had prepared. I just (April 22, 2016) finished poking through it looking for new items, and I saw that the Instituto Nacional de Estadística (INE) had posted the 1930 Anuario. {I will always try to provide a complete and corrected reference: Instituto Nacional de Estadística (Spain). Anuario Estadístico de España. Año XVI 1930. Madrid: Sucesores de Rivadeneyra, 1932.}
The root of this DH project is an experiment with tools and historical questions that center on Barcelona in the early period of the Second Republic (1931-1933). A major portion of that is creating a functional dataset from the Padrón Municipal de 1930. That the 1930 national statistical volume is now online is incredible; I could have used it many times in the past… but there are issues.
To start, there is no way to download the entire volume at once. Instead, you have to wade through a file tree and download each table (I apologize for the poor screenshot).
As you can see there are branches on each. There are seven initially:
I did up to the “Agricultura” sub-branch of “Producción, consumo, y cambio” (Agricultura has the further sub-divisions “Producción agrícola” (35 tablas), “Colonización agrícola” (2 tablas), “Producción forestal” (18 tablas), and “Ganadería” (9 tablas)). This was when I gave up trying to do it in one sitting. 127 files later, after adding .pdf to each, I used Adobe Acrobat Standard to combine them. The files changed order when I sorted by name, so I sorted by time downloaded instead, combined them into a single file, and named the file.
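The two fiddly manual steps here, adding .pdf to each download and keeping the files in download order, could probably be scripted. What follows is only a minimal sketch in Python, not what I actually did: it assumes the downloads sit alone in one folder, arrive with no extension at all, and that each file's modification time reflects when it was downloaded. The function names are my own inventions.

```python
from pathlib import Path


def add_pdf_extension(folder):
    """Rename every extension-less file in `folder` so its name ends in .pdf.

    Assumption: the downloaded tables have no extension at all.
    Hidden files (names starting with a dot) are left alone.
    """
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix == "" and not path.name.startswith("."):
            target = path.with_name(path.name + ".pdf")
            path.rename(target)  # renaming keeps the modification time
            renamed.append(target)
    return renamed


def pdfs_in_download_order(folder):
    """Return the PDFs in `folder`, oldest modification time first.

    Assumption: modification time stands in for "time downloaded".
    """
    pdfs = [p for p in Path(folder).iterdir() if p.suffix.lower() == ".pdf"]
    return sorted(pdfs, key=lambda p: p.stat().st_mtime)
```

The list returned by `pdfs_in_download_order` is the order in which the files would then be handed to Acrobat (or any other PDF merging tool) for combining.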
The file name is the name of the volume, and the file currently contains 177 pages. {And now I am too tired to review the combined file to see what the resulting order is.} I saved it to my desktop so I will see it and remember to continue with the “Industria” sub-branch of “Producción, consumo, y cambio.” I also pasted the list of branches still to be done into Adobe’s “Document properties” (accessible by Ctrl+D in the file).
And to be honest, I had to do the combining a second time, after I had deleted the files (they were still recoverable), because apparently I did not save the original combined file.
So when I am done, what should I do with the resulting file, dear readers? And done in this case means several things:
- adding all the files for 1930 from the INE website
- making a proper table of contents in the Adobe file
- trying to number the pages so these correspond to the actual printed pages (unlike Oxford’s obnoxious text numbering in Oxford Scholarship Online)
- running OCR
- reducing the file size
And a final note: I am sure there is a more technologically adept way to do this, by “harvesting” the files. I need to read Ian Milligan’s post on doing this on another site and take a stab at it myself.
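To give a rough idea of what harvesting might look like, here is a hedged sketch in Python using only the standard library. It is not Milligan's method (I have not read the post yet), and it rests on assumptions I have not checked against the INE site: that the tables are reachable as ordinary links on a page, and that every link collected points to a PDF. The names `LinkCollector`, `extract_links`, and `download_all` are hypothetical, as is any URL passed to them.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlretrieve


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute link URLs in page order, without duplicates."""
    parser = LinkCollector()
    parser.feed(html)
    seen, links = set(), []
    for href in parser.hrefs:
        url = urljoin(base_url, href)
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


def download_all(urls, folder):
    """Save each URL as 0001.pdf, 0002.pdf, ... in `folder`.

    Assumption: every URL points to a PDF table. The zero-padded
    numbers mean that sorting by name matches the download order.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    saved = []
    for number, url in enumerate(urls, start=1):
        target = folder / f"{number:04d}.pdf"
        urlretrieve(url, target)
        saved.append(target)
    return saved
```

Numbering the files as they arrive would also sidestep the problem I had above, where sorting by name scrambled the order of the tables.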