Office HTML and XML File Formats

When a Microsoft Office document is saved as a Web page, Office creates a main HTML file and a number of related supporting files. The HTML files conform to the HTML standard, so the document can be opened and viewed in a variety of Web browsers. Microsoft Excel creates HTML framesets to display workbooks that contain several worksheets, and Microsoft PowerPoint creates them to display slides and notes simultaneously.

The Main, Slide, and Worksheet HTML Files

The main HTML document file has the following structure.


<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>

The Head HTML element contains Meta elements that specify the content type, language, programmatic identifier, generator, title, and themes; a Base element that specifies the base path; and Link elements that specify the paths to supporting files, such as the stylesheet, the header and footer file, and binary edit data files. The programmatic identifier specifies the Office application used to edit the page. When a supporting file is opened in an Office application, the Link element in that file that specifies the main file name directs the application to open the main file instead.
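A supporting file can be traced back to its main file by reading its Link element whose rel attribute is Main-File, as in the tab strip example later in this section. The following Python sketch, using only the standard library's html.parser, collects Meta name/content pairs (such as ProgId) and Link rel/href pairs from a page, then resolves the Main-File href against the supporting file's folder. The helper names are illustrative, not part of any Office API, and the folder name master_files used when calling it is hypothetical.

```python
from html.parser import HTMLParser
import posixpath


class HeadInfoParser(HTMLParser):
    """Collects <meta name=... content=...> pairs and <link rel=... href=...> targets."""

    def __init__(self):
        super().__init__()
        self.meta = {}   # Meta name -> content, e.g. "ProgId" -> "Excel.Sheet"
        self.links = {}  # Link rel -> href, e.g. "Main-File" -> "../master.htm"

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attribute values keep their case.
        a = dict(attrs)
        if tag == "meta" and "name" in a:
            self.meta[a["name"]] = a.get("content")
        elif tag == "link" and "rel" in a:
            self.links[a["rel"]] = a.get("href")


def read_head(html_text):
    """Parse a page and return the populated HeadInfoParser."""
    parser = HeadInfoParser()
    parser.feed(html_text)
    parser.close()
    return parser


def main_file_path(supporting_path, html_text):
    """Resolve a supporting file's Main-File link relative to the supporting file's folder."""
    href = read_head(html_text).links.get("Main-File")
    if href is None:
        return None  # no Main-File link found in this page
    return posixpath.normpath(posixpath.join(posixpath.dirname(supporting_path), href))
```

For a tab strip stored as master_files/tabstrip.htm with href="../master.htm", main_file_path returns master.htm.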


<style>

The Style HTML element contains CSS style definitions.


<xml>

The XML element marks the start of a block of XML elements embedded in the HTML document; the block ends at the closing </xml> tag.


<o:DocumentProperties>
<o:CustomDocumentProperties>

The DocumentProperties element contains subelements that describe the data shown in the document's Properties dialog box, and the CustomDocumentProperties element contains the user-defined custom properties. The available properties differ between Office applications.
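The following sketch reads both property blocks with xml.etree.ElementTree. It assumes the XML block has already been extracted from the page and that the o: prefix is bound to urn:schemas-microsoft-com:office:office; the sample declares that binding locally so it parses on its own. The property names and values in the sample are illustrative only, and custom properties may carry extra type attributes, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI bound to the o: prefix.
O_NS = "{urn:schemas-microsoft-com:office:office}"


def read_properties(xml_text):
    """Return {"DocumentProperties": {...}, "CustomDocumentProperties": {...}}."""
    root = ET.fromstring(xml_text)
    result = {}
    for block in ("DocumentProperties", "CustomDocumentProperties"):
        element = root.find(O_NS + block)  # direct child of the <xml> root
        props = {}
        if element is not None:
            for child in element:
                local_name = child.tag.split("}", 1)[-1]  # strip "{namespace}"
                props[local_name] = (child.text or "").strip()
        result[block] = props
    return result
```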


<o:OfficeDocumentSettings>

The OfficeDocumentSettings XML element contains shared document settings and Web options.


<script>

The Script HTML element contains the script behind the navigation controls and the frameset. In supporting files, it contains the script that redirects the Web browser to the main file.


<x:ExcelWorkbook>

In Microsoft Excel documents, the ExcelWorkbook XML element describes the workbook and its properties. Its ExcelWorksheet subelement describes the worksheets and their properties. If the workbook contains a single worksheet, the WorksheetOptions subelement specifies that worksheet's options; if the workbook contains more than one worksheet, the options for each worksheet are stored in a separate worksheet HTML file.
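The single-worksheet versus multiple-worksheet rule can be expressed as a small check. This sketch assumes the x: prefix is bound to urn:schemas-microsoft-com:office:excel and that each ExcelWorksheet carries its name in a Name subelement; both are assumptions for illustration. Because it searches all descendants, it does not depend on whether a wrapper element groups the ExcelWorksheet entries.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI bound to the x: prefix.
X_NS = "{urn:schemas-microsoft-com:office:excel}"


def worksheet_options_location(xml_text):
    """Return (worksheet names, where the worksheet options are stored)."""
    root = ET.fromstring(xml_text)
    # iter() includes the root itself, so the input may be rooted at ExcelWorkbook.
    workbook = next(root.iter(X_NS + "ExcelWorkbook"), None)
    if workbook is None:
        raise ValueError("no x:ExcelWorkbook element found")
    sheets = list(workbook.iter(X_NS + "ExcelWorksheet"))
    # Name subelement is an assumption; missing names become "".
    names = [(sheet.findtext(X_NS + "Name") or "").strip() for sheet in sheets]
    if len(sheets) == 1:
        return names, "inline WorksheetOptions"
    if len(sheets) > 1:
        return names, "separate worksheet HTML files"
    return names, "no worksheets described"
```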


<p:Presentation>

In Microsoft PowerPoint documents, the Presentation XML element contains a description of the presentation.


<w:WordDocument>

In Microsoft Word documents, the WordDocument XML element contains a description of the document.


</xml>
<link>

The Link HTML element specifies the paths to supporting files such as the worksheet and slide frame HTML files.


</head>

<body>

The Body HTML element contains either the content that is displayed when the page is viewed in a Web browser, such as a single worksheet or slide, or the frameset elements that specify the supporting files.
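When the page holds a frameset, the supporting files it displays are the src values of its frame elements. The sketch below lists (name, src) pairs with html.parser; an empty result suggests the page carries its content directly rather than a frameset. The frame names frSheet and frTabs match the targets used in the tab strip example later in this section, but the frameset markup and the master_files folder in the sample are hypothetical.

```python
from html.parser import HTMLParser


class FrameLister(HTMLParser):
    """Collects (name, src) for each <frame> element in a page."""

    def __init__(self):
        super().__init__()
        self.frames = []

    def handle_starttag(self, tag, attrs):
        if tag == "frame":
            a = dict(attrs)
            self.frames.append((a.get("name"), a.get("src")))


def list_frames(html_text):
    """Return the (name, src) pairs of all frames, in document order."""
    parser = FrameLister()
    parser.feed(html_text)
    parser.close()
    return parser.frames
```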


</body>
</html>

The Header and Footer HTML File

The header and footer HTML file has the following structure.


<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>

The Head HTML element contains Meta elements that specify the content type, language, programmatic identifier, generator, title, and themes, in addition to Link elements that specify paths to the main file and supporting files such as the stylesheet and binary edit data files.


<style>

The Style HTML element contains CSS style definitions specific to headers and footers.


<script>

The Script HTML element contains script to redirect the Web browser to open the main file.


</head>
<body>

The Body HTML element contains the header and footer data. Each header and footer is contained in a Div element with a unique ID attribute that corresponds to its reference in the main HTML document.

The following Div HTML elements are examples of a section 1 primary header (h1), primary footer (f1), and even-page header (eh1).


<div id="h1">
.
.
.
</div>
<div id="f1">
.
.
.
</div>
<div id="eh1">
.
.
.
</div>

</body>
</html>
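Because each Div ID identifies a particular header or footer, a consumer of the file can map IDs to content. This Python sketch collects the text of each top-level Div that has an id attribute, and decodes only the three ID forms shown in the example above (h, f, and eh followed by a section number). Any other prefixes Office may use are deliberately left undecoded, in which case decode_id returns None. The helper names are hypothetical.

```python
from html.parser import HTMLParser
import re

# Only the ID forms shown in the example are decoded here.
KINDS = {"h": "primary header", "f": "primary footer", "eh": "even-page header"}
ID_RE = re.compile(r"(eh|h|f)(\d+)")


def decode_id(div_id):
    """Return (kind, section number) for a known ID form, else None."""
    match = ID_RE.fullmatch(div_id)
    if match is None:
        return None
    return KINDS[match.group(1)], int(match.group(2))


class HeaderFooterParser(HTMLParser):
    """Maps each top-level <div id=...> to the text it contains."""

    def __init__(self):
        super().__init__()
        self.parts = {}
        self._current = None  # id of the Div being collected
        self._depth = 0       # nesting depth of <div> inside it

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._current is None:
            div_id = dict(attrs).get("id")
            if div_id:
                self._current, self._depth = div_id, 1
                self.parts[div_id] = ""
        else:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self._current is not None:
            self._depth -= 1
            if self._depth == 0:
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.parts[self._current] += data


def collect_parts(html_text):
    parser = HeaderFooterParser()
    parser.feed(html_text)
    parser.close()
    return parser.parts
```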

The Worksheet Tab Strip and Navigation Frame HTML Files

Worksheet tabs are implemented as cells in a single-row HTML table. Each cell contains the worksheet's name as a hyperlink to the worksheet HTML file, which is displayed in the worksheet frame (frSheet). The following example shows the tabstrip.htm file for a workbook named master.htm that contains two worksheets, Sheet1 and Sheet2. The file contains Meta HTML elements that specify the content type, programmatic identifier, and generator, and a Link element that specifies the path and name of the main file. The script redirects the browser to the main file when the page is not loaded in the frame named frTabs, and the Style element defines the style of the Anchor elements that contain the table cell text.


<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=shift_jis">
<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 9">
<link id=Main-File rel=Main-File href="../master.htm">

<script language="JavaScript">
<!--
if (window.name!="frTabs")
 window.location.replace(document.all.item("Main-File").href);
//-->
</script>
<style>
<!--
A {
    text-decoration:none;
    color:#000000;
    font-size:9pt;
}
-->
</style>
</head>
<body topmargin=0 leftmargin=0 bgcolor="#808080">
<table border=0 cellspacing=1>
 <tr>
 <td bgcolor="#FFFFFF" nowrap><b>
  <small><a href="sheet001.htm" target="frSheet">
   <font face="ＭＳ Ｐゴシック"
    color="#000000">Sheet1</font>
  </a></small></b></td>
 <td bgcolor="#FFFFFF" nowrap><b>
  <small><a href="sheet002.htm" target="frSheet">
   <font face="ＭＳ Ｐゴシック"
    color="#000000">Sheet2</font>
  </a></small></b></td>
 </tr>
</table>
</body>
</html>
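The tab strip structure, a single-row table with one cell per worksheet whose hyperlink targets the frSheet frame, can be generated from a list of sheet names. This sketch assumes the sheet001.htm, sheet002.htm numbering continues for further sheets, and it omits the Meta, Script, Style, and Font elements of the example for brevity; it illustrates the structure and is not the exact markup Excel emits.

```python
from html import escape


def tab_strip_html(sheet_names, main_href="../master.htm"):
    """Build a minimal tab strip page: one table cell per worksheet."""
    cells = []
    for i, name in enumerate(sheet_names, start=1):
        href = f"sheet{i:03d}.htm"  # assumed numbering: sheet001.htm, sheet002.htm, ...
        cells.append(
            f' <td bgcolor="#FFFFFF" nowrap><b><small>'
            f'<a href="{href}" target="frSheet">{escape(name)}</a>'
            f'</small></b></td>'
        )
    return "\n".join([
        "<html>",
        "<head>",
        f'<link id=Main-File rel=Main-File href="{escape(main_href)}">',
        "</head>",
        '<body topmargin=0 leftmargin=0 bgcolor="#808080">',
        "<table border=0 cellspacing=1>",
        " <tr>",
        *cells,
        " </tr>",
        "</table>",
        "</body>",
        "</html>",
    ])
```

Sheet names are HTML-escaped so that names containing characters such as & or < do not break the generated markup.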

Navigation frame files use a similar structure and mechanism for implementing controls and hyperlinks to worksheet and slide HTML files.

The PivotTable Cache Data XML File

The structure and syntax of elements in the cachedata.xml file comply with the Specification for XML-Data. For an example of a cachedata.xml file, see the PivotCache element topic.

The File List XML File

For an example of a filelist.xml file, see the MainFile element topic.