element14 Community
element14 Community
    Register Log In
  • Site
  • Search
  • Log In Register
  • About Us
  • Community Hub
    Community Hub
    • What's New on element14
    • Feedback and Support
    • Benefits of Membership
    • Personal Blogs
    • Members Area
    • Achievement Levels
  • Learn
    Learn
    • Ask an Expert
    • eBooks
    • element14 presents
    • Learning Center
    • Tech Spotlight
    • STEM Academy
    • Webinars, Training and Events
    • Learning Groups
  • Technologies
    Technologies
    • 3D Printing
    • FPGA
    • Industrial Automation
    • Internet of Things
    • Power & Energy
    • Sensors
    • Technology Groups
  • Challenges & Projects
    Challenges & Projects
    • Design Challenges
    • element14 presents Projects
    • Project14
    • Arduino Projects
    • Raspberry Pi Projects
    • Project Groups
  • Products
    Products
    • Arduino
    • Avnet Boards Community
    • Dev Tools
    • Manufacturers
    • Multicomp Pro
    • Product Groups
    • Raspberry Pi
    • RoadTests & Reviews
  • Store
    Store
    • Visit Your Store
    • Choose another store...
      • Europe
      •  Austria (German)
      •  Belgium (Dutch, French)
      •  Bulgaria (Bulgarian)
      •  Czech Republic (Czech)
      •  Denmark (Danish)
      •  Estonia (Estonian)
      •  Finland (Finnish)
      •  France (French)
      •  Germany (German)
      •  Hungary (Hungarian)
      •  Ireland
      •  Israel
      •  Italy (Italian)
      •  Latvia (Latvian)
      •  
      •  Lithuania (Lithuanian)
      •  Netherlands (Dutch)
      •  Norway (Norwegian)
      •  Poland (Polish)
      •  Portugal (Portuguese)
      •  Romania (Romanian)
      •  Russia (Russian)
      •  Slovakia (Slovak)
      •  Slovenia (Slovenian)
      •  Spain (Spanish)
      •  Sweden (Swedish)
      •  Switzerland(German, French)
      •  Turkey (Turkish)
      •  United Kingdom
      • Asia Pacific
      •  Australia
      •  China
      •  Hong Kong
      •  India
      •  Korea (Korean)
      •  Malaysia
      •  New Zealand
      •  Philippines
      •  Singapore
      •  Taiwan
      •  Thailand (Thai)
      • Americas
      •  Brazil (Portuguese)
      •  Canada
      •  Mexico (Spanish)
      •  United States
      Can't find the country/region you're looking for? Visit our export site or find a local distributor.
  • Translate
  • Profile
  • Settings
Artificial Intelligence and Machine Learning
  • Technologies
  • More
Artificial Intelligence and Machine Learning
Blog Developers Created a Compression Algorithm in the 1970s By Adding a Spell Checker to a Unix System Still Used Today
  • Blog
  • Forum
  • Documents
  • Events
  • Polls
  • Files
  • Members
  • Mentions
  • Sub-Groups
  • Tags
  • More
  • Cancel
  • New
Join Artificial Intelligence and Machine Learning to participate - click to join for free!
  • Share
  • More
  • Cancel
Group Actions
  • Group RSS
  • More
  • Cancel
Engagement
  • Author Author: Catwell
  • Date Created: 6 May 2025 7:33 PM Date Created
  • Views 1887 views
  • Likes 3 likes
  • Comments 2 comments
  • pdp
  • cabeatwell
  • spellcheck
  • ai
  • unix
  • innovation
Related
Recommended

Developers Created a Compression Algorithm in the 1970s By Adding a Spell Checker to a Unix System Still Used Today

Catwell
Catwell
6 May 2025

image

A PDP-11 machine. )Image Credit: Wikipedia)

Everyone’s familiar with spellchecking, a handy feature in browsers, text editors, and other software. The tool proved extremely challenging to install as systems lacked sufficient memory to support it at the time. But, a lossless compression algorithm, which is still used today, allowed it to be integrated into a system.

In 1975, AT&T programmers explored ways to use Unix for text processing. However, this required a working spellchecking tool. The easiest way to do this involves putting the dictionary into system memory, and the computer would then look each word up. It’s a tedious task that takes time, and during those years, computers had insufficient memory to handle it.

But, through compression, we can easily load a 250 kB file into 64 kB of RAM. However, this is more challenging than we think. To do this, you would need to use a PC with a superfast multicore processor and lots of RAM. You can create a 256 kB text file with 260,000 characters, compressing it via a compressor and the Lempel-Ziv-Markov chain algorithm to shrink the size to 257 bytes.

AT&T used PDP-11 systems (16-bit microcomputers), which are weaker as they feature less memory (64 kB to a few hundred kB) and processing power. Due to hardware constraints, it would be challenging for the PDP-11 systems to search through a 256 kB text file as the system may struggle to handle that task. It can handle a compressed 257-byte file.

Computer scientist Steve Johnson developed the first Unix-based, disk-backed spellchecking prototype. While functional, the system was slow and made mistakes. Douglas McIlroy developed an algorithm that decreased the amount of memory required to store dictionary words and introduced a data structure, allowing the entire dictionary to load into a few kB of memory. This algorithm only needed 14 bits per word, which meant a 30,000-word dictionary could fit in fewer than 52 kB of memory. Theoretically, the system only required at least 13.57 bits per word.    

A big part of the solution was Golomb coding, a lossless compression still used today in Rice coding, which is used in formats like FLAC, Apple Lossless, and Lossless JPEG.

Have a story tip? Message me at: http://twitter.com/Cabe_Atwell

  • Sign in to reply
Parents
  • michaelkellett
    michaelkellett 4 months ago

    You can create a 256 kB text file with 260,000 characters, compressing it via a compressor and the Lempel-Ziv-Markov chain algorithm to shrink the size to 257 bytes.

    I do not believe that lossless compression of a normal text file by a factor of 1020 has ever been (or ever will be) demonstrated.

    Please cite your source for this assertion.

    (Are you sure that you are not quoting the maximum possible compression of a special case text file - like one consisting repetitions of  the same short phrase  )

    Typically lossless compression of music files (FLAC) achieves about 2:1 compression.

    MK

    • Cancel
    • Vote Up 0 Vote Down
    • Sign in to reply
    • More
    • Cancel
Comment
  • michaelkellett
    michaelkellett 4 months ago

    You can create a 256 kB text file with 260,000 characters, compressing it via a compressor and the Lempel-Ziv-Markov chain algorithm to shrink the size to 257 bytes.

    I do not believe that lossless compression of a normal text file by a factor of 1020 has ever been (or ever will be) demonstrated.

    Please cite your source for this assertion.

    (Are you sure that you are not quoting the maximum possible compression of a special case text file - like one consisting repetitions of  the same short phrase  )

    Typically lossless compression of music files (FLAC) achieves about 2:1 compression.

    MK

    • Cancel
    • Vote Up 0 Vote Down
    • Sign in to reply
    • More
    • Cancel
Children
No Data
element14 Community

element14 is the first online community specifically for engineers. Connect with your peers and get expert answers to your questions.

  • Members
  • Learn
  • Technologies
  • Challenges & Projects
  • Products
  • Store
  • About Us
  • Feedback & Support
  • FAQs
  • Terms of Use
  • Privacy Policy
  • Legal and Copyright Notices
  • Sitemap
  • Cookies

An Avnet Company © 2025 Premier Farnell Limited. All Rights Reserved.

Premier Farnell Ltd, registered in England and Wales (no 00876412), registered office: Farnell House, Forge Lane, Leeds LS12 2NE.

ICP 备案号 10220084.

Follow element14

  • X
  • Facebook
  • linkedin
  • YouTube