Funding for the Methods Network ended March 31st 2008. The website will be preserved in its current state.

Using Large-Scale XML Corpora in Language and Literature

A workshop activity organized by Lou Burnard, Oxford University Computing Services (26 November 2007)

This one-day workshop introduced the technologies needed to unlock the potential of large-scale XML-encoded language corpora, with a particular focus on the most recent version of the British National Corpus (BNC XML Edition). Participants learned how to explore this corpus using a variety of generic XML tools, focusing on (but not limited to) XAIRA, a general-purpose software architecture for the linguistic analysis of large XML corpora. They explored the kinds of language-learning activities and linguistic analyses best supported by such tools, and discussed how usable the tools are for fundamental linguistic and literary research in large text bases. The course had a strong practical component, and participants were encouraged to provide samples of their own textual materials to experiment with corpus construction and analysis.
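As a rough illustration of what exploring an XML-encoded corpus with generic XML tools can look like, the sketch below uses only Python's standard library to count headwords and list the sentences containing a given headword. It is not taken from the workshop materials and does not use XAIRA. The sample fragment, the element names (`s`, `w`, `c`), the attribute names (`hw`, `pos`) and the function names are all assumptions made for illustration, loosely modelled on word-level corpus markup.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: a tiny fragment with sentence (<s>), word (<w>) and
# punctuation (<c>) elements. Element and attribute names are assumptions.
SAMPLE = """<text>
  <s n="1"><w pos="SUBST" hw="corpus">Corpora </w><w pos="VERB" hw="be">are </w><w pos="ADJ" hw="large">large</w><c>.</c></s>
  <s n="2"><w pos="SUBST" hw="corpus">Corpora </w><w pos="VERB" hw="grow">grow</w><c>.</c></s>
</text>"""


def count_attribute(xml_text, tag="w", attr="hw"):
    """Count the values of one attribute across all elements with the given tag."""
    root = ET.fromstring(xml_text)
    return Counter(
        el.get(attr) for el in root.iter(tag) if el.get(attr) is not None
    )


def concordance(xml_text, headword, tag="w", attr="hw"):
    """Return the plain text of every sentence containing the headword."""
    root = ET.fromstring(xml_text)
    hits = []
    for sentence in root.iter("s"):
        if any(w.get(attr) == headword for w in sentence.iter(tag)):
            # itertext() joins the text of all child elements in document order
            hits.append("".join(sentence.itertext()).strip())
    return hits


lemma_counts = count_attribute(SAMPLE)             # frequency of each headword
pos_counts = count_attribute(SAMPLE, attr="pos")   # frequency of each word class
grow_lines = concordance(SAMPLE, "grow")           # sentences containing "grow"
```

For a corpus of realistic size, loading whole files into memory in this way may not be practical; a streaming parser or a dedicated indexing tool would likely be a better fit.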

The workshop was taught by Lou Burnard and Ylva Berglund Prytz (Oxford), together with Guy Aston (Forli).

Who took part

The workshop was aimed at two distinct groups of researchers. The first comprised language or literature specialists who were aware of the potential of corpus-based methods in language pedagogy or literary research and wanted to apply them, either to their own corpus material or to the BNC in its new format. The second comprised technical specialists who were aware of the demand for corpus resources and wanted to gain practical experience of using XML for corpus creation, development, and usage. The workshop aimed to stimulate dialogue between the two groups and to promote a shared understanding of common goals.

AHDS Methods Taxonomy Terms

This item has been catalogued using a discipline and methods taxonomy.

Disciplines

  • English Literature and Languages
  • European Literature and Languages
  • Linguistics
  • History

Methods

  • Data Analysis - Collating
  • Data Analysis - Collocating
  • Data Analysis - Concording/Indexing
  • Data Analysis - Content analysis
  • Data Capture - Usage of existing digital data
  • Data publishing and dissemination - Textual resource sharing
  • Data Structuring and enhancement - Markup/text encoding - descriptive - conceptual
  • Data Structuring and enhancement - Markup/text encoding - descriptive - document structure
  • Data Structuring and enhancement - Markup/text encoding - descriptive - linguistic structure
  • Data Structuring and enhancement - Markup/text encoding - descriptive - nominal
  • Data Structuring and enhancement - Markup/text encoding - presentational
  • Data Structuring and enhancement - Markup/text encoding - referential
  • Strategy and project management - Usability analysis