We are currently developing a prototype of VisualPage, a software application that will enable humanities researchers to explore and analyze large collections of digitized printed books. Scholarly archives recognize that the historical and cultural meanings of printed documents are conveyed not only in their linguistic content but also in their bibliographic and visual elements, and thus typically provide users with digitized page images as well as extracted text. Although tools exist for large-scale analysis of text, researchers interested in the visual features of digitized printed books have been limited to what they can see and compare with their own eyes. The VisualPage tool will enable researchers to explore large document collections; to identify unique or representative items and historical trends in typography, page layout, and book design; and to make comparisons not accessible to the human eye.
For the start-up phase of the project, funded by the National Endowment for the Humanities [HD5156012], we have selected poetry as the focus of our initial work because the visual appearance of the printed page contributes to the reader’s understanding of the poem’s form and meaning through the conventions of line length, line indentation, and the distribution of white space. The initial data set for this start-up period consists of 300 digitized books of Victorian poetry (approximately 60,000 images) published between 1860 and 1880.
To learn more about this project:
- Read our paper (PDF): “VisualPage: Towards Large Scale Analysis of Nineteenth-Century Print Culture,” which will be published in the proceedings of the 2013 IEEE International Conference on Big Data as part of the Workshop on Big Humanities
- A longer project description is available here
- Our successful grant application is available from the NEH Office of Digital Humanities