New DTD for Digitized Periodicals

 

Draft of Version 1.0

 

This new DTD is based on an analysis of the current format for structuring digitized periodicals (an application of the DOBM language[1]) in the Kramerius programme and of the format described in the manual of the former European DIEPER project[2].

 

Neither DOBM nor DIEPER offered an XML DTD. DOBM was nevertheless well defined in its specification for periodicals, whereas the analysis found that the DIEPER manual, although apparently clearly written and provided with descriptions of tags, contained many inconsistencies and incompletely stated facts.

 

It was decided to offer a metadata granularity that satisfies the requirements of both approaches. With some exceptions, however, the element names were not taken from these projects, as they could cause many misunderstandings. All names are derived from English terms to make the definitions internationally understandable and to allow future users to navigate the descriptive structure unambiguously.

 

Prerequisites

 

The structure of the new format is based on several important entities of which the periodicals or their descriptions consist.

 

The DTD respects the natural physical structure of the periodical:

 

Periodical as a whole → periodical volume → periodical item → periodical page

 

A digitized periodical exists if it has at least one volume, which may, however, be only partly available. A volume can consist of items, of which the most common type is the issue. A volume or an issue consists of pages that may have various attributes (title page, list of maps, table of contents …), the most common of which is the normal page.
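
As an illustration only, the nesting described above could look roughly as follows in an instance document. The skeleton omits all descriptive children (identifiers, bibliographic description, ISSN, and so on), so it is not a valid document; it follows the prose above rather than a quoted content model.

```xml
<!-- Hypothetical skeleton: shows nesting only, mandatory children are omitted. -->
<Periodical>
  <PeriodicalVolume>
    <!-- Pages attached to an item (here an issue, the most common item type) -->
    <PeriodicalItem>
      <PeriodicalPage/>
      <PeriodicalPage/>
    </PeriodicalItem>
    <!-- A page attached directly to the volume -->
    <PeriodicalPage/>
  </PeriodicalVolume>
</Periodical>
```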

 

Items or pages can be represented visually or textually by referenced image or text files. The technical information (description) concerning these files can be written for the periodical or its volumes as a whole, or it can be attached separately to individual representations; the latter may be necessary for atypical files that fall outside the general approach.

 

In addition, the DTD can offer descriptions of internal component parts, i.e. those parts of the periodical that do not coincide with its physical component parts. Such parts are of various types, such as article, abstract, map, preface, etc. These parts can be contained in volumes or items.

 

Each part of the periodical, with the exception of the periodical page, may have its own bibliographic description; therefore, the core bibliographic description is handled as a separate entity that may be referenced from almost any structural part of the periodical. Because the requirements for describing the various parts and their types differ, the core bibliographic description is not strict about mandatory elements.

 

Another important requirement is the existence of a unique identifier for each described part of the periodical. For the time being this element is not defined as mandatory, but if it appears, it must appear at least in the form of a SICI-based identifier. Such an identifier can be generated automatically if a generator is developed for this purpose. In any case, it requires the mandatory occurrence of the ISSN.

 

 

Later corrections:

 

30 July 2003

·        Added attribute Table to the PeriodicalPage and PeriodicalInternalComponentPart elements

·        Added attribute Blank to the PeriodicalPage element

 

16 July 2003

·        Changed the structure of the elements PageRepresentation and ItemRepresentation to:

<!ELEMENT ItemRepresentation ((ItemImage | ItemText), TechnicalDescription?)>

<!ELEMENT PageRepresentation ((PageImage | PageText), TechnicalDescription?)>

 

14 July 2003

·        Removed attribute Id from the attribute lists of the elements ItemImage, ItemText, PageImage, and PageText

 

11 July 2003

·        Element Periodicity made repeatable (Periodicity*)

·        Element Title restricted to zero or one occurrence (Title?)

§         Element MainTitle made mandatory and repeatable with non-mandatory elements SubTitle and  ParallelTitle (MainTitle, SubTitle?, ParallelTitle?)+

§         Elements SortingTitle, KeyTitle, and Coden allowed max. one occurrence (SortingTitle?, KeyTitle?, Coden?)

§         The whole Title group is now written like this:

<!ELEMENT Title ((MainTitle, SubTitle?, ParallelTitle?)+, SortingTitle?, KeyTitle?, Coden?)>

<!ELEMENT MainTitle (#PCDATA)>

<!ELEMENT SubTitle (#PCDATA)>

<!ELEMENT ParallelTitle (#PCDATA)>

<!ELEMENT SortingTitle (#PCDATA)>

<!ELEMENT KeyTitle (#PCDATA)>

<!ELEMENT Coden (#PCDATA)>

(The reason for the changes is a better fit with the metadata already entered in the old structures.)

 

3 July 2003

·        Element UniqueIdentifier made non-mandatory again on every level for which it is defined, because it cannot be generated from the data available for conversion (if it appears, UniqueIdentifierSICIType is mandatory)

 

30 June 2003

·        Element ISSN made mandatory and moved from CoreBibliographicDescriptionPeriodical to become a child of Periodical

·        Element PageNumber also made a child of PeriodicalInternalComponentPart, together with the PageReference element (the latter indicates the range of pages on which the part is located)

·        Added one more Role for the Creator element: EditorInChief

 

24 June 2003

·        Element UniqueIdentifier made mandatory (it will be SICI-based) on every level for which it is defined

·        Element Accessibility no longer mandatory

·        Element ShelfNumber made mandatory and repeatable within a single occurrence of the PeriodicalOwner element (several shelf numbers allowed for one library)

 

 

 

 

 


The digitized periodical is defined by the following group of autonomous entities:

 

Periodical – defined in Periodical.dtd – root of the entity tree

PeriodicalVolume – defined in PeriodicalVolume.dtd

PeriodicalItem – defined in PeriodicalItem.dtd

PeriodicalPage – defined in PeriodicalPage.dtd

PeriodicalInternalComponentPart – defined in PeriodicalInternalComponentPart.dtd

 

UniqueIdentifier – defined in PeriodicalIdentifier.dtd

CoreBibliographicDescriptionPeriodical – defined in CoreBibliographicDescription.dtd

 

 

 

The UniqueIdentifier and CoreBibliographicDescriptionPeriodical elements are referenced as entities throughout the structure of the digitized periodical.
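
One common way to reuse declarations kept in separate DTD files is through external parameter entities. The following is a minimal sketch of that technique, not a quotation from the actual files: the entity names are hypothetical, and only the file names are taken from the list above.

```xml
<!-- Sketch: pulling shared declarations into another DTD file, e.g. Periodical.dtd. -->
<!-- The entity names are assumptions; the file names come from the list above. -->
<!ENTITY % UniqueIdentifierDecl SYSTEM "PeriodicalIdentifier.dtd">
%UniqueIdentifierDecl;

<!ENTITY % CoreBibliographicDescriptionDecl SYSTEM "CoreBibliographicDescription.dtd">
%CoreBibliographicDescriptionDecl;
```

With this approach, the same file should not be included twice along different paths, because a DTD must not declare the same element type more than once.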


 

UniqueIdentifier

 

A unique identifier (UniqueIdentifier element) is needed for each part described on the basis of the DTDs listed above, because it identifies the important component parts of the periodical, whether they coincide with components of the physical (file) structure or are internal component parts such as articles, abstracts, or photographs.

 

 

For the time being, the SICI type (UniqueIdentifierSICIType element) should be preferred. Its construction is described in detail in a separate standard[3], and generating it should be handled separately from these DTD definitions.


 

CoreBibliographicDescriptionPeriodical

 

It defines the CoreBibliographicDescriptionPeriodical element, which consists of the following child elements that form the core of the bibliographic description for the periodical:

 

 

 

 

Title

 

The Title element consists of the following elements: MainTitle, SubTitle, SortingTitle, ParallelTitle, KeyTitle, and Coden. These elements are used in conformity with the existing cataloguing rules applicable to periodicals, e.g. AACR2. The MainTitle element is mandatory.
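
For illustration, a Title fragment that conforms to the content model recorded in the corrections of 11 July 2003, ((MainTitle, SubTitle?, ParallelTitle?)+, SortingTitle?, KeyTitle?, Coden?), might look like this. All title values are invented placeholders.

```xml
<!-- Hypothetical values; element order follows the Title content model. -->
<Title>
  <MainTitle>Example Review</MainTitle>
  <SubTitle>A Monthly for Science and Art</SubTitle>
  <ParallelTitle>Revue Exemple</ParallelTitle>
  <SortingTitle>example review</SortingTitle>
  <KeyTitle>Example Review (Prague)</KeyTitle>
</Title>
```

Because the first group is repeatable, a further MainTitle (with its own optional SubTitle and ParallelTitle) could follow before SortingTitle.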

 

 

 

 

Creator and Contributor

 

These elements have analogous structures. Each of them consists of CreatorSurname and CreatorName elements. The CreatorName element is repeatable to allow several given names to be entered with one family name (CreatorSurname). If the name of a person cannot be split into two parts, it is marked entirely as CreatorSurname.

 

 

The creator and contributor may have various roles with reference to the document. The roles applicable to both are: Annotator, Artist, Assignee, Author, AuthorOfScreenplay, BibliographicAntecedent, Cartographer, Commentator, Compiler, Composer, Dedicatee, Dedicator, DubiousAuthor, Editor, EditorInChief (only for Creator), Engraver, Etcher, FilmEditor, GraphicTechnician, Honoree, Illuminator, Illustrator, Interviewee, Interviewer, Librettist, Litographer, Lyricist, MetalEngraver, Originator, Other, Performer, Photographer, Recipient, Rubricator, Scenarist, Secretary, Translator, TypeDesigner, Typographer, and WoodEngraver. The default value for the Creator element is Author, while the default value for the Contributor element is Editor. The roles have been selected from the list of relator codes in Supplement C to the UNIMARC Manual[4].

 

The creator has decisive responsibility for the creation of a document, while the contributor is another person who participated in its creation without having the core responsibility. In practice, the distinction between these two elements is left to the decision of the cataloguing staff.
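
A role list with a default value is typically expressed in a DTD as an enumerated attribute. The following sketch shows the technique for Creator only. It rests on assumptions: the attribute name Role, the occurrence indicator on CreatorName, and the shortened value list (the full list is given above) are illustrative and not quoted from the actual DTD.

```xml
<!-- Sketch under stated assumptions; not the actual declaration. -->
<!ELEMENT Creator (CreatorSurname, CreatorName*)>
<!-- Value list shortened for readability; "Author" is the default named in the text. -->
<!ATTLIST Creator
  Role (Author | Editor | EditorInChief | Illustrator | Translator | Other) "Author">
<!ELEMENT CreatorSurname (#PCDATA)>
<!ELEMENT CreatorName (#PCDATA)>
```

With such a declaration, a Creator element written without the attribute would be treated by a validating parser as having the role Author.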

 

 

 

 

GMD

 

The GMD element indicates the type of the document. It will mostly be periodical, but it can also take other values in component parts, e.g. monograph in the case of a monograph supplement.

 

 

Publisher and Printer

 

 

 

 

The structure of these two elements is similar, because they both mark place, name, and date of publishing or printing activities; therefore, they consist of child elements that express this idea: PlaceOfPublication or PlaceOfPrinting, PublisherName or PrinterName, and DateOfPublication or DateOfPrinting.

 

 

PhysicalDescription

 

This element consists of three child elements: Size (the size, e.g. 40x65 cm), Extent (the extent, e.g. 67 pp.), and Technique (the technique of publication, e.g. typewritten material, current printing, handwritten material, etc.).

 

 

Series

 

It marks the series in which the periodical title is published.

 

Language

 

It indicates the language in which the periodical is printed. The two-character language code of the ISO 639 standard is used.

 

Subject

 

It consists of two child elements: UDC and DDC.

 

 

The application of DDC, at least of its 100 main classes as listed in the DIEPER manual, is needed for international co-operation.

 

Keyword

 

Keywords are freely attributed terms for better subject retrieval in natural language.

 

Accessibility

 

Accessibility indicates access rights for various groups of users or possibly also the difference between free and copyrighted works.

 

Notes

 

Notes are used for any other information concerning the elements of the bibliographic description.

 

Annotation

 

Annotation is used for providing the context important for better understanding of cultural, historical, artistic, or other values of the digitized title.

 

 

 

 

 


 

Periodical

 

The structure of the Periodical element is shown below. The elements UniqueIdentifier, CoreBibliographicDescriptionPeriodical, and PeriodicalVolume are external entities defined by separate DTDs that are also needed by other files.

 

 

 

 

The specific description elements of the Periodical are:

 

PeriodicalOwner

 

 

It consists of the Library and ShelfNumber elements, which express where the periodical is stored and what shelf number it has. The PeriodicalOwner element can be used as many times as necessary.
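
Taken together with the correction of 24 June 2003 (ShelfNumber mandatory and repeatable), the element could be declared roughly as follows. This is a sketch: the #PCDATA content of the two children is an assumption.

```xml
<!-- Sketch: one library with one or more shelf numbers per PeriodicalOwner. -->
<!ELEMENT PeriodicalOwner (Library, ShelfNumber+)>
<!-- Plain text content for both children is an assumption. -->
<!ELEMENT Library (#PCDATA)>
<!ELEMENT ShelfNumber (#PCDATA)>
```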

 

DescriptionBasedIssue

 

It marks the issue on which the bibliographic description of the whole periodical is based.

 

ISSN

 

ISSN is indispensable for international communication. Old periodicals should receive their ISSN from the National ISSN Agency in the State Technical Library, Prague.

 

 

 

Periodicity

 

It states how often the periodical appears in a given period. Changes in periodicity during the publication of the periodical should be mentioned in the Notes element.

 

 

RegularSupplement

SpecialSupplement

 

These elements describe regular or special supplements that are relevant for the characterization of the whole periodical. Any supplement can be described separately and fully within the PeriodicalItem.dtd.

 

TechnicalDescription

 

 

It contains the following elements: ScanningDevice, ScanningParameters, and OtherImagingInformation. It can also be referenced as an entity from other parts of the periodical or from individual visual or textual representations.

 


 

 

PeriodicalVolume

 

The PeriodicalVolume also works with separately defined entities: UniqueIdentifier, CoreBibliographicDescriptionPeriodical, PeriodicalInternalComponentPart, and PeriodicalPage (each has its own DTD). The TechnicalDescription element is taken from Periodical.dtd.

 

 

The only specific element of the PeriodicalVolume.dtd is:

 

PeriodicalVolumeIdentification

 

 

It consists of the following elements:

 

PeriodicalVolumeNumber

 

It marks the number that identifies the volume.

 

PeriodicalVolumeSorting

 

It marks any artificially assigned number that is necessary for sorting.

 

PeriodicalVolumeDate

 

It marks the date of publication of the volume. It is mandatory.

 

Defects – this element serves for noting any missing parts of the volume.
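
The identification group described above could be declared along the following lines. Only the mandatory status of PeriodicalVolumeDate comes from the text; the optionality of the other two children and the #PCDATA content are assumptions, and the placement of Defects is deliberately left out because the text does not fix it.

```xml
<!-- Sketch under stated assumptions; only the mandatory date is from the text. -->
<!ELEMENT PeriodicalVolumeIdentification
  (PeriodicalVolumeNumber?, PeriodicalVolumeSorting?, PeriodicalVolumeDate)>
<!ELEMENT PeriodicalVolumeNumber (#PCDATA)>
<!ELEMENT PeriodicalVolumeSorting (#PCDATA)>
<!ELEMENT PeriodicalVolumeDate (#PCDATA)>
```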


 

PeriodicalItem

 

This element also works with separate entities: UniqueIdentifier, CoreBibliographicDescriptionPeriodical, PeriodicalInternalComponentPart, and PeriodicalPage. The periodical item may be of the following types: Appendix, PeriodicalIssue, PeriodicalSupplement, or PeriodicalOtherItem. The default value is PeriodicalIssue.

 

 

The specific elements of the PeriodicalItem are:

 

PeriodicalItemIdentification

 

It consists of the PeriodicalItemNumber, PeriodicalItemNumberSorting, and PeriodicalItemDate elements. Their meaning is analogous to that of the corresponding elements used for the identification of the periodical volume.

 

 

ItemRepresentation

 

The item can be represented by image files (ItemImage element) or by text files (ItemText element).

 

 

 

 

A whole item can be represented in particular by image formats that support multi-page files, such as PDF, DjVu, TIFF, or LDF. It can also be represented by one or more text files.


 

 

PeriodicalPage

 

The PeriodicalPage works with the UniqueIdentifier entity defined separately. It may be of the following types: Advertisement, Blank, Index, ListOfIllustrations, ListOfMaps, ListOfTables, NormalPage, Table, TableOfContents, or TitlePage. The default value is NormalPage.
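
The page types and their default can be expressed as an enumerated attribute. In the sketch below the values and the default are those listed above, while the attribute name Type is an assumption.

```xml
<!-- Sketch: the attribute name "Type" is hypothetical; values are from the text. -->
<!ATTLIST PeriodicalPage
  Type (Advertisement | Blank | Index | ListOfIllustrations | ListOfMaps |
        ListOfTables | NormalPage | Table | TableOfContents | TitlePage)
       "NormalPage">
```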

 

 

The specific child elements of the PeriodicalPage element are PageNumber, which is necessary for organizing the item structure, and PageRepresentation. The Notes element is taken from the bibliographic description; it may be needed to express some specific attributes of the page.

 

The PageRepresentation element contains either a PageImage or a PageText child element, optionally followed by a TechnicalDescription.

 

 

From here the external image entities (JPG, GIF, TIF, DjVu, SID …) and text entities (XML, HTML, SGML, TXT …) are referenced. Each of them can be accompanied by a TechnicalDescription (see above) if necessary.
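
A page with one image representation might then be encoded roughly as follows. The fragment is hypothetical in several respects: the href attribute used to reference the file, the file path, the order of PageNumber and PageRepresentation, and the omission of UniqueIdentifier are all assumptions. Only the content of PageRepresentation follows the declaration given in the corrections of 16 July 2003.

```xml
<!-- Hypothetical fragment; the href attribute and the file path are assumptions. -->
<PeriodicalPage>
  <PageNumber>1</PageNumber>
  <PageRepresentation>
    <!-- Exactly one of PageImage or PageText per PageRepresentation -->
    <PageImage href="images/0001.jpg"/>
  </PageRepresentation>
</PeriodicalPage>
```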


 

 

PeriodicalInternalComponentPart

 

The PeriodicalInternalComponentPart works with the separately defined entities UniqueIdentifier and CoreBibliographicDescriptionPeriodical. It may be of the following types: Abstract, Article, Bibliography, Chapter, Contributors, Debate, Dedication, EditorsNote, Figure, Illustration, Introduction, Map, Obituary, OtherDocStrct, OtherNote, Photograph, Preface, Remark, Reviews, Table, or TechnicalPlanScheme. The default value is Article.

 

 

The PeriodicalInternalComponentPart element consists of the following specific elements:

 

InternalComponentPartIdNumber

 

It is used in case the parts, e.g. articles, illustrations, maps, etc., are numbered in the printed periodical.

 

PageReference

 

This element marks the page range, from the first page to the last, that the described object covers.

 

PageNumber

 

The element marks the first page on which the part starts.

 

 

Adolf Knoll

16 July 2003

 

 

 

 

 

 

 



[1] Digitization of Rare Library Materials: Storage and Access to Data / Project Management by Adolf Knoll and Stanislav Psohlavec. Authors: Adolf Knoll, Stanislav Psohlavec, Jan Mottl, Jan Vomlel, Tomáš Mayer, ... Prague, National Library - Albertina icome Praha, 1999. Memoriae Mundi Series Bohemica. Available on-line from the URL http://www.nkp.cz/start/knihcin/digit/WWW/ENTER.HTM

[2] Electronic Document Format for Digitised Periodicals. Deliverable 14/17. Specification reports / Manual for fulltext capturing, text encoding and document structuring. Göttingen, Universitätsbibliothek, 2000. 57 pp. Available on-line from the URL http://gdz.sub.uni-goettingen.de/dieper/d14_17f4.pdf

 

[3] Serial Item and Contribution Identifier (SICI). ANSI/NISO Z39.56-1996 (Version 2) / developed by the National Information Standards Organization : approved August 14, 1996 by the American National Standards Institute. 37 pp. (National information standards series, ISSN 1041-5653) ISBN 1-880124-28-9. Available on-line from the URL http://sunsite.berkeley.edu/SICI/

 

[4] UNIMARC Manual. Ed. IFLA / Holt, Brian P., with assistance of McCallum, Sally H. / Long, A. B. 1987. V, 482 pp. (+ Errata UBCIM, 1992); see also the Slovak edition from 1994.