An ontology for archives: the example of the APUG
Last autumn we had the great pleasure of meeting Damiana Luzzi and of getting to know Padre Moralès and Irene Pedretti, respectively head and archivist of the Historical Archives of the Pontifical Gregorian University in Rome.
Damiana and Irene have been working together since 2010 on the development of an ontology for describing the archives held by the APUG. Given the announced move towards a conceptual model for archives, and the evolution of standards and formats that should follow from it, our neighbours’ approach struck us as particularly interesting.
Indeed, for a small service holding an extremely precious and varied collection (archives, manuscripts, books, objects, etc.), the aim was to produce descriptive metadata with a view to using the most advanced Internet and Semantic Web technologies, without neglecting the fundamentals of the archival profession and while aiming at a very fine level of granularity.
Damiana offers here a summary of the approach they adopted and of the choices they made. We have chosen to keep Damiana’s text in its original version.
The Historical Archives of the Pontifical Gregorian University (APUG) hold a precious heritage that includes many of the manuscripts that belonged to the Roman College, founded by the Jesuits in 1551.
The college was closed at the time of the Suppression of the Society of Jesus in 1773. It was reopened in 1824, ten years after the restoration of the Society, and has continued its educational mission to this day as the Pontifical Gregorian University.
Because of this long history, the APUG documents educational practices over a long stretch of time, about five centuries. Its holdings comprise more than 5,000 codices attesting to the lessons given at the Roman College over two centuries, in subjects such as rhetoric, grammar, theology and philosophy, as well as the study of the Greek and Latin classics, astronomy, mathematics and physics, and of the Latin, Hebrew, Greek and Arabic languages. Along with this material, other important documents attest to the intense research and study that took place at the Roman College:
- the correspondence of Athanasius Kircher and Christopher Clavius;
- the codices used by Sforza Pallavicini to write the History of the Council of Trent;
- documents showing the relations that many Jesuits around the world maintained with the masters of the Roman College;
- the first topographic maps of China made by the Jesuit D’Elia and documents on the missions in Asia.
The material includes:
- printed texts, some of them glossed by the author and annotated by the censors;
- about 6,000 manuscripts with hand-drawn illustrations, glosses and erasures, some of them with inserted papers and fragments;
- many modern archival documents;
- ancient and modern books, both with many handwritten notes;
- graphic material, such as drawings, prints, maps and photos.
Moreover, the APUG holdings attest to the teachers’ activities outside teaching, especially in the 19th and 20th centuries: for example, they include documentation from the Second Vatican Council, in which some Jesuit professors of the Pontifical Gregorian University played an important role. Finally, the troubled vicissitudes undergone, especially in Rome, by the property of the Jesuits and of other religious orders after the unification of Italy have brought into the APUG holdings many “extraneous” documents, sometimes totally unrelated to its history.
Apart from preservation issues, this variety of resources raises at least one kind of problem connected with their cataloguing: each type of resource was described with the standard developed for it, such as ISAD(G) for archival documents, the FRBRoo model for bibliographic material, or ISAAR and FRAD for authority records. Usually each of these standards comes with its own software, developed expressly to simplify its application and search.
Seeking a different answer to these problems, at the end of 2010 the APUG contacted the Digital Renaissance Foundation, an Italian institution specialized in developing advanced solutions for cultural heritage (e.g. Semantic Web technologies), with which I was working. In close collaboration with the APUG staff, I developed a prototype ontology schema that could solve these problems without neglecting the principles of the APUG’s work, especially in terms of standardization and conceptualization, and that would allow the APUG to describe a great variety of resources at a finer level of detail. Here are two examples taken from the prototype ontology schema.
The Physical Object and document description
Detailed information about the physical aspect of the resources, in particular of manuscripts, is structured according to the FRBRoo model, a harmonization of FRBR with CIDOC CRM. FRBRoo provided a new class for this conceptualization, F4 Manifestation Singleton, designed for unique objects such as manuscripts.
The Item class provides a detailed description of all the physical elements that make up the document, e.g. binding, material (paper, parchment, etc.), damage, restorations and accompanying material: only by taking all of these features into account is it possible to give an ancient object its proper value, since it is never fully represented by the text it carries alone. Finally, the scientific topics covered in APUG documents also required specific classes for two other kinds of physical object: Astronomical, for planets, stars, etc., and Instrument, mainly for scientific instruments.
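As an illustration of the kind of physical description the Item class supports, here is a minimal Python sketch, assuming a simple record structure rather than APUG’s actual schema. The class and field names (`Item`, `Restoration`, `carries_text`, `physical_summary`) and the sample values are hypothetical; the point is only that each physical feature is recorded separately from the text the object carries.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Restoration:
    date: str          # free-text date, as recorded by the cataloguer
    description: str


@dataclass
class Item:
    """A unique physical object (e.g. a manuscript), in the spirit of
    FRBRoo's F4 Manifestation Singleton. Field names are hypothetical."""
    identifier: str                        # hypothetical local identifier
    carries_text: Optional[str] = None     # id of the text it carries, kept separate
    binding: Optional[str] = None
    materials: List[str] = field(default_factory=list)
    damages: List[str] = field(default_factory=list)
    restorations: List[Restoration] = field(default_factory=list)
    accompanying_material: List[str] = field(default_factory=list)

    def physical_summary(self) -> str:
        """Join the recorded physical features into one display line."""
        parts = []
        if self.binding:
            parts.append(f"binding: {self.binding}")
        if self.materials:
            parts.append("materials: " + ", ".join(self.materials))
        if self.damages:
            parts.append("damage: " + ", ".join(self.damages))
        for r in self.restorations:
            parts.append(f"restored {r.date}: {r.description}")
        if self.accompanying_material:
            parts.append("with: " + ", ".join(self.accompanying_material))
        return "; ".join(parts)


# Hypothetical sample record, for illustration only.
ms = Item(
    identifier="ms-example-1",
    binding="limp vellum",
    materials=["paper", "parchment"],
    damages=["water stains"],
    restorations=[Restoration("20th century", "spine rebacked")],
    accompanying_material=["loose inserted sheet"],
)
print(ms.physical_summary())
```

Keeping `carries_text` as a mere reference reflects the distinction drawn above: the object and the text it carries are described separately.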
The Actor class and its specializations
The organization of information about “actors” is inspired by the FOAF ontology, but we opted for Actor instead of Agent, since it is a more familiar term in the humanities and more generic than the term Creator used in ISAD(G). The Actor class is specialized into the classes Group and Person. The subclasses of Group are:
- Organization (e.g. institutions, organizations, agencies, public or private companies), itself specialized into the classes Family and ReligiousOrder;
- TemporaryOrganization, for bodies of a temporary nature (e.g. committees, consortia, conferences).
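The class hierarchy just described can be sketched with Python class inheritance standing in for `rdfs:subClassOf`. This only illustrates the subsumption relations named in the text; it is not APUG’s ontology file, and the helper `superclasses` is hypothetical.

```python
class Actor:
    """Top class (APUG's Actor, corresponding to FOAF's Agent)."""

    def __init__(self, label: str):
        self.label = label


class Person(Actor):
    pass


class Group(Actor):
    pass


class Organization(Group):
    """Institutions, agencies, public or private companies."""


class Family(Organization):
    pass


class ReligiousOrder(Organization):
    pass


class TemporaryOrganization(Group):
    """Committees, consortia, conferences."""


def superclasses(cls) -> list:
    """Ontology classes an instance of cls belongs to, most specific first."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


jesuits = ReligiousOrder("Society of Jesus")
print(superclasses(type(jesuits)))   # ['ReligiousOrder', 'Organization', 'Group', 'Actor']
print(isinstance(jesuits, Person))   # False
```

Single inheritance suffices here because the hierarchy described is a strict tree; an OWL ontology could additionally let one individual belong to several unrelated classes.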
The Person class is related to Group through the isMemberOf relationship, which indicates that one or more persons are part of a group. The classes Person and Group also have other relationships and attributes, used respectively to record:
- date and place of birth and death;
- date and place of foundation and termination of activities.
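A minimal sketch of the isMemberOf relationship and of the life and activity data attached to Person and Group, assuming plain Python dataclasses instead of RDF. The field names and the `members` helper are hypothetical; dates are stored as text because historical dates are often approximate. The sample values (the Roman College founded in Rome in 1551, Clavius born in Bamberg in 1538 and died in Rome in 1612) serve only as an illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    """Date and place of a life or activity event; either may be unknown."""
    date: Optional[str] = None    # text, since historical dates are often approximate
    place: Optional[str] = None


@dataclass
class Group:
    name: str
    foundation: Event = field(default_factory=Event)
    termination: Event = field(default_factory=Event)


@dataclass
class Person:
    name: str
    birth: Event = field(default_factory=Event)
    death: Event = field(default_factory=Event)
    is_member_of: List[Group] = field(default_factory=list)


def members(group: Group, people: List[Person]) -> List[str]:
    """Names of the persons linked to `group` through is_member_of (identity match)."""
    return [p.name for p in people if any(g is group for g in p.is_member_of)]


society = Group("Society of Jesus")
college = Group("Roman College", foundation=Event("1551", "Rome"))
clavius = Person(
    "Christopher Clavius",
    birth=Event("1538", "Bamberg"),
    death=Event("1612", "Rome"),
    is_member_of=[society],
)
print(members(society, [clavius]))   # ['Christopher Clavius']
```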
In cataloguing systems, the way names are structured is very important: oversimplifying them could compromise the generation of the relevant indexes. The Appellation class is used to refer to and define the “names” of people (PersonName), groups (GroupName), places (PlaceName), objects (Name) and titles (Title). Names are recorded in their normalized form, indicating where applicable the authority file reference (e.g. VIAF), together with their possible variant forms (e.g. Clavius, Cristoforo Clavio, Christophorus Clavius, Christoph Clavius, Christoph Clau), also indicating the time and place in which each variant was in use (e.g. some Jesuits lived in China and changed their names). In addition, the classes that manage names were designed to allow simple data entry while following national or international cataloguing rules. Thus the first name, last name and name particle are entered separately, so that one can decide how to build the index of names and how to display a person’s name in the search interface: first name followed by last name, last name followed by a comma and then the first name, etc.
Instead of creating many classes to describe the relationships between actors (Actor) and objects (Object), only the Role class was created. Role, linked to ActorInRole, expresses, without predetermined patterns, every possible role that an actor can play in a given period and place in relation to an object or archival resource (e.g. author, creator, publisher, sender, addressee, teacher, student, archivist, administrator, official).
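Keeping first name, particle and last name apart is what lets the index form and the display form be derived rather than typed twice. The sketch below assumes one common convention (index as “Last, First particle”); national rules differ on where particles go, which is precisely why the parts are stored separately. `PersonName`, `NameVariant` and the person “Anna von Beispiel” are hypothetical; the Clavius variants are those listed above, and no VIAF identifier is filled in.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NameVariant:
    form: str
    period: Optional[str] = None   # when the variant was in use, if known
    place: Optional[str] = None    # where it was in use, if known


@dataclass
class PersonName:
    first: str
    last: str
    particle: str = ""                    # e.g. "de", "von"; empty if none
    authority_uri: Optional[str] = None   # e.g. a VIAF URI; left empty here
    variants: List[NameVariant] = field(default_factory=list)

    def display_form(self) -> str:
        """'First particle Last', for the search interface."""
        return " ".join(p for p in (self.first, self.particle, self.last) if p)

    def index_form(self) -> str:
        """'Last, First particle', one common convention for name indexes."""
        rest = " ".join(p for p in (self.first, self.particle) if p)
        return f"{self.last}, {rest}" if rest else self.last

    def all_forms(self) -> List[str]:
        """Normalized display form followed by every recorded variant."""
        return [self.display_form()] + [v.form for v in self.variants]


clavius = PersonName(
    "Christopher", "Clavius",
    variants=[NameVariant(f) for f in (
        "Clavius", "Cristoforo Clavio", "Christophorus Clavius",
        "Christoph Clavius", "Christoph Clau")],
)
anna = PersonName("Anna", "Beispiel", particle="von")   # hypothetical person

index = sorted(n.index_form() for n in (clavius, anna))
print(index)   # ['Beispiel, Anna von', 'Clavius, Christopher']
```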
On the basis of my experience in this and other projects, I can say that an ontology and its tools have proved very useful, because they facilitate the sharing, interoperability and reuse of information. A URI can easily be assigned to any class, property and instance in the ontology schema, so that, thanks to the Linked Data model, it becomes possible to publish information and to gather information produced by other archives, libraries, museums, galleries, research centres, etc.
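To show how URIs and the Role/ActorInRole pattern can fit together, the following sketch mints URIs for one ActorInRole node and serializes it as N-Triples, a standard line-based RDF syntax. The namespace `http://example.org/apug/`, the property names (`actor`, `role`, `onResource`, `period`, `place`) and all identifiers are hypothetical placeholders, not APUG’s published vocabulary.

```python
BASE = "http://example.org/apug/"   # hypothetical namespace
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def iri(kind: str, local_id: str) -> str:
    """Mint an IRI in the hypothetical namespace, in N-Triples angle brackets."""
    return f"<{BASE}{kind}/{local_id}>"


def literal(text: str) -> str:
    """Plain N-Triples string literal with backslash, quote and newline escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def actor_in_role(node_id, actor_id, role, resource_id, period=None, place=None):
    """One ActorInRole node linking an actor, a role and a resource."""
    node = iri("actorInRole", node_id)
    triples = [
        (node, RDF_TYPE, iri("class", "ActorInRole")),
        (node, iri("prop", "actor"), iri("actor", actor_id)),
        (node, iri("prop", "role"), iri("role", role)),
        (node, iri("prop", "onResource"), iri("resource", resource_id)),
    ]
    if period is not None:
        triples.append((node, iri("prop", "period"), literal(period)))
    if place is not None:
        triples.append((node, iri("prop", "place"), literal(place)))
    # N-Triples: one "subject predicate object ." statement per line.
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


out = actor_in_role("r1", "clavius", "sender", "letter-1", place="Rome")
print(out)
```

Because each statement is a self-contained line, such output can be loaded into an RDF store and linked to external URIs (for instance an authority record) simply by adding further triples.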
At the same time, the ontology offers different views and perspectives on the resources and on the concepts they convey, and it will open new avenues for further study and analysis. Such an “enhanced” search makes it possible, thanks to a reasoner (e.g. HermiT, or Apache Jena Fuseki configured with a Jena inference engine), to deduce new knowledge from what is already available.
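The kind of inference a reasoner performs can be illustrated with a toy forward-chaining loop over plain triples. It applies only two RDFS rules (transitivity of `subClassOf`, and propagation of `type` to superclasses) plus one inverse-property rule; a real OWL reasoner such as HermiT does far more. The triples and the inverse property `hasMember` are hypothetical examples.

```python
def infer(triples):
    """Naive forward chaining: apply the rules until no new triple appears."""
    facts = set(triples)
    while True:
        sub = [(s, o) for s, p, o in facts if p == "subClassOf"]
        new = set()
        # RDFS rule rdfs11: subClassOf is transitive.
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, "subClassOf", d))
        for s, p, o in facts:
            # RDFS rule rdfs9: an instance of a class is an instance of its superclasses.
            if p == "type":
                for a, b in sub:
                    if o == a:
                        new.add((s, "type", b))
            # Inverse property (hypothetical): isMemberOf implies hasMember.
            elif p == "isMemberOf":
                new.add((o, "hasMember", s))
        if new <= facts:       # fixpoint reached: nothing new can be derived
            return facts
        facts |= new


asserted = {
    ("Person", "subClassOf", "Actor"),
    ("Group", "subClassOf", "Actor"),
    ("Organization", "subClassOf", "Group"),
    ("ReligiousOrder", "subClassOf", "Organization"),
    ("SocietyOfJesus", "type", "ReligiousOrder"),
    ("Clavius", "type", "Person"),
    ("Clavius", "isMemberOf", "SocietyOfJesus"),
}
inferred = infer(asserted) - asserted
for triple in sorted(inferred):
    print(triple)
```

Starting from the statement that the Society of Jesus is a ReligiousOrder, the loop derives that it is also an Organization, a Group and an Actor, and that Clavius is one of its members, although none of these facts was entered explicitly.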
- EGAD strategic work plan: http://www.ica.org/13845/egad-activities-and-projects/egad-strategic-work-plan.html (the French version is less explicit).
- ISAD(G): General International Standard Archival Description, Second Edition.
- FRBRoo: Functional Requirements for Bibliographic Records, object-oriented, Version 2.0.
- ISAAR(CPF): International Standard Archival Authority Record for Corporate Bodies, Persons and Families, Second Edition.
- FRAD: Functional Requirements for Authority Data.
- FOAF: Friend of a Friend, a vocabulary expressing information about persons and groups.
- VIAF: Virtual International Authority File.