Funding for the Methods Network ended March 31st 2008. The website will be preserved in its current state.

The Historical Thesaurus of English (HTE)

The Historical Thesaurus of English is the first historical thesaurus to be compiled for any of the world's languages. It aims to include almost the entire recorded vocabulary of English from Old English to the modern period, taken from the Oxford English Dictionary and dictionaries of Old English. The distinctive, semantically structured hierarchy of the HTE data gives scholars uniquely flexible access to the material, making it an invaluable resource for historians and linguists in particular.

The Historical Thesaurus was begun in 1964 by Professor M. L. Samuels and has been funded by the AHRB, the British Academy, the Leverhulme Trust, the Carnegie Trust, and the University of Glasgow. The project is hosted by the University of Glasgow.

The Project

The Historical Thesaurus of English (HTE) offers scholars unique materials for the study of the history of the English language. It includes almost the entire recorded vocabulary of English from Old English to the modern period, arranged in chronological order in a semantically-structured hierarchy. It resembles works like Roget's Thesaurus in that words are arranged according to their meanings rather than listed alphabetically. It differs, however, from any other thesaurus so far produced by listing obsolete words and obsolete meanings of current words as well as treating contemporary English comprehensively. It also has a new system of classification suited to a large body of historical material. The Thesaurus of Old English (TOE) supplements the data of the HTE and can be used as a discrete resource in the form of an online searchable database.

Primary Aims

The creation and use of thesauri is a long-established method of organizing words in order to understand the relationships both between closely related words and among the words of the language as a whole. The creation of an English-language historical thesaurus dates back to 1964, when Professor M. L. Samuels of Glasgow University announced an initiative to produce a comprehensive study of the English vocabulary. The raw data in this archive now consists of 650,000 slips representing material taken from the Oxford English Dictionary and its supplements, and from Anglo-Saxon dictionaries. For the purposes of reviewing and illustrating technical methods, the aspects most likely to interest those developing tools and techniques are the development and refinement of the classification structure and its application to research.

The thesaurus is significant to researchers working in a number of areas, including linguistics, stylistic studies, literary history, the history of ideas and cultural studies. It offers scholars unique material that will help them determine not only the range and meaning of words that were available to writers at different periods throughout English history, but also, in a less specific but enormously valuable way, how the prominence and significance of different ideas have waxed, waned or remained consistent, from the period of usage defined as Old English (c. 700–1100 A.D.) through to the present day. Because dates are associated with particular word forms and words are related by semantic group, the number of words that relate to any particular object or idea becomes a significant figure in itself, indicating a greater or lesser societal and literary focus on that concept.
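The counting idea in the paragraph above can be illustrated with a minimal sketch. It assumes each entry is reduced to a word, a category label and a first and last year of attestation. The entries, the field layout and the function names are hypothetical placeholders, not HTE data or the project's own code.

```python
from collections import Counter

# Hypothetical toy entries: (word, category label, first year, last year).
# The words and dates are placeholders, not taken from the HTE.
ENTRIES = [
    ("word_a", "Sword", 900, 1500),
    ("word_b", "Sword", 1200, 2000),
    ("word_c", "Sword", 1600, 2000),
    ("word_d", "Pike", 1500, 1800),
]


def words_in_use(entries, year):
    """Count, per category, the words whose attested span covers `year`."""
    counts = Counter()
    for _word, category, first, last in entries:
        if first <= year <= last:
            counts[category] += 1
    return counts


def profile(entries, category, years):
    """Return the vocabulary size of one category at each sampled year."""
    return {year: words_in_use(entries, year)[category] for year in years}
```

Calling profile(ENTRIES, "Sword", [1000, 1300, 1700]) gives the size of that category's vocabulary at three sample years, which is the kind of figure the paragraph treats as a rough indicator of how much attention a concept received.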

Another related application of the thesaurus could be the creation of a probability-based method of disambiguating historical word-forms. The precise meaning of a word and how it was used in any particular historical context can be indicated by the quantity and meaning of semantically linked words that were also in use at that particular juncture. Variant spellings are a problem in historical word searches, and context will often be a more informative guide to meaning than the form of the word itself.
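One possible shape for such a probability-based approach is sketched below. It is an illustration under stated assumptions, not a method described by the project: each candidate sense of an ambiguous form is assumed to belong to one semantic group, the surrounding words are assumed to have been assigned groups already, and add-one smoothing is an arbitrary choice. All names and sample values are hypothetical.

```python
from collections import Counter


def sense_probabilities(candidate_senses, context_groups):
    """Estimate a probability for each sense from the context words.

    candidate_senses: dict mapping a sense label to its semantic group.
    context_groups: list of the semantic groups of the surrounding words.
    Add-one smoothing stops a sense with no support from scoring zero.
    """
    seen = Counter(context_groups)
    scores = {sense: seen[group] + 1 for sense, group in candidate_senses.items()}
    total = sum(scores.values())
    return {sense: score / total for sense, score in scores.items()}


# Hypothetical example: one ambiguous form with two possible senses.
senses = {"weapon": "Military equipment", "plant": "Plants"}
context = ["Military equipment", "Armed hostility", "Military equipment"]
probs = sense_probabilities(senses, context)
best = max(probs, key=probs.get)
```

Because the score depends on the groups of the neighbouring words rather than on the spelling of the ambiguous form, a sketch of this kind is unaffected by variant spellings of that form, provided the neighbouring words can themselves be assigned to groups.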

Whilst quantitative methods such as this are far from infallible, they give useful indications of trends and likelihood, especially if the methods can be applied to a rich and comprehensive data source. The complex hierarchical structure of the HTE allows sophisticated qualifiers to be introduced that give greater weight to linkages between items that lie closer together within that hierarchy. The classification system is the principal methodological component that makes the HTE useful as a case study. Its development involves assigning the data to three major divisions:

  • The World, including the physical universe, plants and animals
  • The Mind, covering man's mental activities
  • Society, which deals with social structures and artefacts

These top-level divisions are then divided into numbered hierarchical categories, which are further divided into sub-categories identified by progressively longer numerical strings. Semantic paths are thereby established that link general concepts to minute details.

Example:

03.03 Armed hostility
03.03.16 Military equipment
03.03.16.01 Weapon
03.03.16.01.01 Club/stick
03.03.16.01.02 Other blunt weapon
03.03.16.01.03 Sharp weapon
03.03.16.01.03.01 Spear/lance
03.03.16.01.03.02 Pike
03.03.16.01.03.03 Halberd
03.03.16.01.03.04 Axe
03.03.16.01.03.05 Scythe
03.03.16.01.03.06 Side-arms
03.03.16.01.03.06.01 Sword
03.03.16.01.03.06.01/01 (.broadsword)
03.03.16.01.03.06.01/02 (.scimitar)
03.03.16.01.03.06.02 Knife/dagger
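The numbering in the listing above lends itself to simple string handling. The following sketch assumes that each dot-separated number is one level of the hierarchy and that the part after "/" marks a sub-category within the category, which is an inference from the listing rather than a documented rule. The function names are hypothetical.

```python
def parse_code(code):
    """Split a code such as '03.03.16.01.03.06.01/01' into its levels.

    The part after '/' is kept as a final level and marked with the slash
    (an assumption based on the example listing).
    """
    main, _, sub = code.partition("/")
    levels = main.split(".")
    if sub:
        levels.append("/" + sub)
    return levels


def ancestors(code):
    """Return the codes of all enclosing categories, most general first."""
    main, _, sub = code.partition("/")
    parts = main.split(".")
    chain = [".".join(parts[:i]) for i in range(1, len(parts))]
    if sub:
        chain.append(main)
    return chain


def shared_depth(code_a, code_b):
    """Count how many leading levels two codes have in common."""
    depth = 0
    for a, b in zip(parse_code(code_a), parse_code(code_b)):
        if a != b:
            break
        depth += 1
    return depth
```

Under these assumptions, shared_depth gives one crude measure of closeness within the hierarchy: Sword (03.03.16.01.03.06.01) shares six levels with Knife/dagger (03.03.16.01.03.06.02) but only four with Club/stick (03.03.16.01.01), so a weighting scheme could treat the first pair as more closely linked.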

The position of each word or phrase is determined by chronological evidence of usage. This is supplemented by information on part of speech and by associated ‘style and status labels’ derived from those in the Oxford English Dictionary. These labels are two-letter designations that denote broad categorical descriptions, e.g. aeronautics (ae), prosody (pd) and rare (rr). Date information is held in a form reflecting continuity and discontinuity of usage and, along with cross-reference information, a total of 29 fields of information may be associated with each word.
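A record of this kind might be modelled as in the sketch below. This is an illustration only: the source does not list the 29 fields, so the handful of field names here are hypothetical, and storing dates as a list of (first, last) year pairs is just one assumed way to represent continuity and discontinuity of usage.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One word or phrase in a category (field names are hypothetical)."""
    word: str
    category: str
    part_of_speech: str
    # Each (first, last) pair is one unbroken period of recorded use;
    # more than one pair records a gap in the evidence.
    spans: list = field(default_factory=list)
    # Two-letter style and status labels, e.g. "rr" for rare.
    labels: list = field(default_factory=list)

    def in_use(self, year):
        """Return True if `year` falls inside any recorded period of use."""
        return any(first <= year <= last for first, last in self.spans)

    def first_recorded(self):
        """Return the earliest year in any recorded period of use."""
        return min(first for first, _ in self.spans)


# Placeholder values, not HTE data: a rare noun with a break in usage.
example = Entry("example", "03.03.16.01.03.06.01", "n",
                spans=[(1300, 1450), (1800, 1900)], labels=["rr"])
```

With this representation, a query for a given year distinguishes a word that fell out of use and was later revived from one in continuous use, which a single first and last date could not do.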

Publications/Further Reading

Thesaurus of Old English online: <http://libra.englang.arts.gla.ac.uk/oethesaurus/>

A pilot online version, prepared under an AHRC ICT Strategy grant, can be seen at: <http://libra.englang.arts.gla.ac.uk/historicalthesaurus/>

The full electronic version will be available for academic research in late 2007. Oxford University Press will publish the paper version in 2008, and is working on linking the project to the online Oxford English Dictionary, which will create an extremely powerful resource for historians of the English language.

The fact that all the material is held in a database makes it adaptable for other purposes. The team is currently working on two spin-offs for the Higher Education Academy English Subject Centre, ‘Learning and Teaching with the Thesaurus of Old English’ (completion December 2006), and ‘Word Webs: Exploring Vocabulary’ (completion November 2007).

Kay, Christian and Jane Roberts, 'Definitions for a New Age', Poetica 62, 2004, 53-68.

Kay, Christian and Jeremy Smith (eds), Categorization in the History of English, Amsterdam: Benjamins, 2004.

Kay, Christian and Irene Wotherspoon, 'Turning the Dictionary Inside Out: Some Issues in the Compilation of Historical Thesauri', A Changing World of Words: Studies in English Historical Semantics and Lexis, ed. Javier E. Diaz Vera, Amsterdam: Rodopi, 2002, 109-35.

Tools and Methods

Tools

MySQL; PHP; Apache. There is, as yet, no defined toolset to use in conjunction with this category system. The technical specification associated with this resource consists of open-source software.

Method Categories

Data Structuring and Enhancement; Data Analysis; Data Publishing and Dissemination; Communication and Collaboration

Specific Methods

data modelling; markup/text encoding; cataloguing / indexing; searching/querying

Data Formats

MySQL; HTML

Project Website

<http://www.arts.gla.ac.uk/SESLl/EngLang/thesaur/homepage.htm>

Certain areas of the Thesaurus have yet to be completed and these can be identified by clicking on the three top level category areas on the HTE home page.

Thesaurus of Old English online: <http://libra.englang.arts.gla.ac.uk/oethesaurus/>

Staff and Advisors

Principal Staff Member

  • Professor Christian J Kay, Director

Other Staff Members

  • Flora Edmonds, Database Officer
  • Lesley A Haughton, Research Assistant
  • Cerwyss O'Hare, Research Assistant
  • Professor Jane Roberts, Consultant Editor
  • Professor M. L. Samuels, Consultant Editor
  • Irené A W Wotherspoon, Senior Research Assistant

Postgraduate Assistants

  • Marc Alexander
  • Ellen Bramwell
  • Lindsey Goring
  • Johanna Green
  • Rosie Robertson
  • Sonia Tinagli-Macrae
  • Kate Wild

AHDS Methods Taxonomy Terms

This item has been catalogued using a discipline and methods taxonomy.

Disciplines

  • English Literature and Languages
  • Linguistics

Methods

  • Data Analysis - Collating
  • Data Analysis - Collocating
  • Data Analysis - Concording/Indexing
  • Data Analysis - Content analysis
  • Data Analysis - Searching/querying
  • Data Structuring and enhancement - Data modelling - flat/rectangular
  • Data Structuring and enhancement - Data modelling - network
  • Data Structuring and enhancement - Data modelling - object oriented
  • Data Structuring and enhancement - Data modelling - relational
  • Data Structuring and enhancement - Markup/text encoding - descriptive - conceptual
  • Data Structuring and enhancement - Markup/text encoding - descriptive - document structure
  • Data Structuring and enhancement - Markup/text encoding - descriptive - linguistic structure
  • Data Structuring and enhancement - Markup/text encoding - descriptive - nominal
  • Data Structuring and enhancement - Markup/text encoding - presentational
  • Data Structuring and enhancement - Markup/text encoding - referential