Documentation and User Manual
===============================

Author: Gregor Sieber, gregor_sieber(at)web.de
Date: Jan 26, 2005

1. File hierarchy

File name conventions: usually, ' ' and '/' are converted to '_'.

1.1 xml/

- contains all xml source files
- contains copyright notes in txt and xml(html) format
- contains session files in parts/
- contains the corpus file in corpus/, which reads in the files in parts/ through xml entity references. If a new file is added to the corpus, corpus.xml has to be edited manually or by a script, and a new entity loading the file has to be added.

1.2 html/

- contains styles/ with styles.css
  (at the moment, the content of styles.css is included in the head of every generated html document due to problems with display through the server)
- contains html of each session
- contains summaries for each key in topics/ and content/
- contains single event html in single/
  (file name format is sessionNameTopicTitle)

1.3 smil/

- contains smil files for all events of the videos

1.4 xsl/

- contains the xsl stylesheet(s) used to generate all html and smil files from the corpus

2. XSL Stylesheet(s)

At the moment, a single stylesheet called elisa2html.xsl generates all html and smil files in their respective directories.

Dependencies:
- xml/corpus/corpus.xml, which depends on the session files in xml/parts/ and on xml/corpus/ELISA2_DTD.xml
- html/styles/styles.css
- xml/copyright_notes.{xml|txt}
- xml/elisa_description.xml, containing the text on top of the page

Have a look at the comments inside the stylesheet for further information.

3. Software

3.1 Web browser

The html has been tested on Mozilla, Lynx and Konqueror.

3.2 XSLT Processor

To apply the XSL transformations (XSLT), a processor is needed. The stylesheet(s) have been explicitly tailored to the Apache project's Xalan-j, available from http://xml.apache.org/xalan-j/
Note that the Java version is needed for the extension functions (used: xalan:write) to work.
Please make sure all xalan components are included in your Java classpath. How to do this depends on your operating system. For unix/linux, set your CLASSPATH environment variable in the shell (e.g. in your ~/.bashrc). You may also pass the classpath of all necessary files to the java interpreter as an extra argument with the '-classpath' switch.

Xalan needs: a Java(TM) JDK or JRE, which can be downloaded from IBM or SUN (or the GNU Blackdown Java project, although this has presumably not been tested). For further dependencies, consult the Xalan-j site.

3.3 PERL Concordancer

The concordance engine needs a full PERL installation and a running Apache server with an appropriately configured CGI interface. The current configuration assumes a cgi-bin/ directory at the server root, and an elisa/txt directory containing all the text files to be examined. It treats all lines starting with '%' as comments.

Context lengths: The program does not display previous context (whether measured in words or in sentences) that extends across paragraph (that is, event/speaker) boundaries.

4. Applying XSLT

Before you start:
- make sure all parts of the corpus are loaded by entity reference in xml/corpus/corpus.xml
- check that all files (-> dependencies) are in place
- install xalan-j and set your classpath

Then:
- change to the elisa base directory (whatever/elisa/)
- execute
    java org.apache.xalan.xslt.Process -in xml/corpus/corpus.xml -xsl xsl/elisa2html.xsl
- here, java is the interpreter, org.apache.xalan.xslt.Process is the xalan-j class to run, the '-in' switch defines the input file for xalan-j, and '-xsl' defines the stylesheet to apply.

5. Changing the stylesheet(s)

If you change anything, please note that the content of a stylesheet needs to be well-formed xml. Thus, any html markup defined there must follow the xhtml DTD (xhtml is an application of XML, whereas plain HTML is an application of SGML).

6. The Elisa DTD

The DTD enforces the following structure: corpus containing sessions with metadata and events.
Events are sections in the interview that are individually viewable by use of smil files.
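
As a rough illustration of the structure described above, and of the entity mechanism from section 1.1, a corpus file might look like the sketch below. All element names, the entity name, the session file name and the way the DTD is referenced are assumptions made for illustration only; the authoritative definitions are in xml/corpus/ELISA2_DTD.xml.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch of xml/corpus/corpus.xml; names are assumed, not taken from the real DTD -->
<!DOCTYPE corpus SYSTEM "ELISA2_DTD.xml" [
  <!-- one external entity per session file in xml/parts/ (file name is made up) -->
  <!ENTITY session_one SYSTEM "../parts/session_one.xml">
]>
<corpus>
  <!-- the entity reference pulls the session file into the corpus at parse time -->
  &session_one;
</corpus>
```

The referenced session file would then hold one session with its metadata and events, again using assumed element names:

```xml
<!-- Hypothetical sketch of xml/parts/session_one.xml -->
<session>
  <metadata>
    <!-- session-level information goes here -->
  </metadata>
  <event>
    <!-- one individually viewable section of the interview -->
  </event>
</session>
```

Adding a new session under this scheme would mean adding one more ENTITY declaration and one more entity reference inside the corpus element.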