Compressing XML with multiplexed hierarchical PPM models

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

We established a working Extensible Markup Language (XML) compression benchmark based on text compression, and found that bzip2 compresses XML best, albeit more slowly than gzip. Our experiments verified that XMILL speeds up and improves compression using gzip and bounded-context PPM by up to 15%, but found that it worsens the compression for bzip2 and PPM. We describe alternative approaches to XML compression that illustrate other tradeoffs between speed and effectiveness. We describe experiments using several text compressors and XMILL to compress a variety of XML documents. Using these as a benchmark, we describe our two main results: an online binary encoding for XML called Encoded SAX (ESAX) that compresses better and faster than existing methods; and an online, adaptive, XML-conscious encoding based on prediction by partial match (PPM) called multiplexed hierarchical modeling (MHM) that compresses up to 35% better than any existing method but is fairly slow.
Original language: English
Title of host publication: Data Compression Conference, 2001. Proceedings. DCC 2001.
Publisher: Institute of Electrical and Electronics Engineers
Pages: 163-172
Number of pages: 10
ISBN (Print): 0-7695-1031-0
Publication status: Published - 2001

Keywords / Materials (for Non-textual outputs)

  • adaptive codes
  • data compression
  • document image processing
  • hypermedia markup languages
  • multiplexing
  • prediction theory
  • PPM
  • XMILL
  • XML compression
  • XML-conscious encoding
  • adaptive encoding
  • bounded-context PPM
  • bzip2
  • encoded SAX
  • extensible markup language
  • gzip
  • multiplexed hierarchical PPM models
  • multiplexed hierarchical modeling
  • online binary encoding
  • online encoding
  • prediction by partial match
  • text compression
  • text compressors
  • Computer industry
  • Encoding
  • Entropy
  • HTML
  • Markup languages
  • SGML
  • Software systems
  • Testing
  • Tree data structures
  • XML
