How exactly is audio encoded in video

Whether you publish your own audio and video files on the Internet or want to use existing resources, you are faced with a large number of playback programs, codecs and file formats. This article guides you through the jungle of MPEG, AVI, MKV & Co. First, it explains how the various system components relate to one another. This resolves puzzles such as why only certain AVI videos play on your computer, and shows how to solve such problems. Second, it explains the advantages and disadvantages of the different formats and methods, so that you can assess what quality to expect from an audio or video file and which files are particularly suitable for your purposes.

As an end user, you are particularly familiar with one type of program: the playback programs (or "players"). They play audio or video files and are therefore the software equivalent of CD or DVD players. The program interface resembles a remote control: there are buttons for playing, fast-forwarding, rewinding, pausing, etc. Well-known examples are Windows Media Player, VLC Player and Apple iTunes. Instead of inserting a data carrier, you open a file in the software player. A player can only open an audio or video file if it supports the file format used.

File formats

The digital data used to represent analog audio or video signals can be organized in various formats. This is easiest to explain for a single image: there are various ways of storing the individual pixels in a file. Whether the pixels are stored row by row from left to right or column by column from top to bottom is a convention that must be specified. The way in which a color value is stored must also be clearly defined. These and many other details are laid down by the respective file format. Data is always stored according to a predefined coding rule, and only a program that knows this rule can interpret the data correctly. Perhaps the differences between the individual formats are easiest to understand if you think of them as different data carriers: CDs, large and small records, tapes, etc. can all contain audio data, but you cannot put a record in a CD player! The formats MPEG, QuickTime and Matroska are just as different. These formats are also known as container formats. A container can be pictured as a box that holds audio and video streams, each of which has been encoded with a particular codec. A codec can both encode and decode data, i.e. compress a signal for storage or transport and decompress it again during playback.
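
The point about pixel order can be illustrated with a minimal sketch. The tiny image and the function names below are made up for illustration and do not correspond to any real file format; the sketch only shows that the same pixels produce different byte sequences depending on the storage convention, and that a reader assuming the wrong convention reconstructs a wrong image.

```python
# A tiny 3x2 "image" (3 pixels wide, 2 pixels high): rows of (R, G, B) tuples.
# The image and all function names are hypothetical, for illustration only.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],
]


def encode_row_major(img):
    """Store pixels left to right, one row after another."""
    return bytes(c for row in img for px in row for c in px)


def encode_column_major(img):
    """Store pixels top to bottom, one column after another."""
    height, width = len(img), len(img[0])
    return bytes(
        c for x in range(width) for y in range(height) for c in img[y][x]
    )


def decode_row_major(data, width, height):
    """Rebuild the image, assuming row-major order and 3 bytes per pixel."""
    pixels = [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


# Reading with the matching convention restores the image ...
same = decode_row_major(encode_row_major(image), 3, 2) == image
# ... reading column-major data as row-major scrambles it.
scrambled = decode_row_major(encode_column_major(image), 3, 2) != image
```

Both encodings contain exactly the same 18 bytes of color information; only the agreed order makes them interpretable.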

Many different codecs for playing audio and video data

In the living room, the various playback devices are often combined in one system so that several separate devices are not necessary. Playback programs work in the same way: they are able to read and play different formats. A separate codec is used for each format. Codecs are mini-programs that do only one job: encoding and decoding audio or video information. Each codec can write and read exactly one format. Different codecs are used for different formats; they roughly correspond to the individual components of your stereo system. But instead of a device K for playing records and a device C for playing DVDs, there is a codec M for playing audio in MP3 files and a codec W for playing video encoded according to the MPEG-4 standard in MP4 files. Most players already have a number of codecs built in and can therefore play several file formats. A player can also learn to understand further file formats when additional codecs are installed. Just as you can connect additional devices to your stereo system, such as an old turntable or a high-end CD player, players can be upgraded with plug-ins. A codec plug-in is independent of a specific player and can be used by different players. Additional codecs are required, for example, if you want to play newer or rarely used file formats with your player software.

If audio or video files cannot be played, it is usually because the codec required for the respective file format is not available. For example, you cannot play MP3 files with a WMA codec and vice versa. Unfortunately, it is not always easy to find the right codec. It is true that a codec only ever supports one file format. However, there are often several different codec programs for one file format. To stay with the stereo system comparison: a cassette player can only play cassettes, but a cassette fits into devices from different manufacturers and can also be recorded in different ways (e.g. at slow or fast speed). Codecs, too, are developed by various providers. For example, there are numerous codecs for the MP3 format, which was developed at the Fraunhofer Institute for Integrated Circuits. These differ less in how they decode the data (i.e. during playback) than in how they encode it, as explained below. Playing MP3 files is therefore usually unproblematic. Matters become more critical when there are different versions of the same file format. Like any other software, codecs are developed further, with the advantage of better quality and the disadvantage that older codec versions no longer understand the new format versions. It becomes particularly confusing when different versions of a file format use the same file extension. The QuickTime file format, for example, uses the .mov extension for both QuickTime 6 and QuickTime 7. But that is not all, because QuickTime is not just a file format but a complete multimedia architecture that includes, among other things, the QuickTime Player. The QuickTime Player can play not only QuickTime files but also other formats. The name QuickTime thus covers very different things, and files with the extension .mov can hide not only different versions but also very different formats. For this reason, one such file may play while another does not.
The only remedy is to always install the latest version of a codec, because newer versions are usually backwards compatible, i.e. they can also interpret older formats. In addition, bear in mind that the same file extension can stand not only for different versions but also for completely different encodings. In AVI files, for example, different codecs such as DivX are used for the actual encoding of the video data.
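
Because a file extension alone is not reliable, programs usually look at the first bytes of a file instead. The following is a minimal sketch, assuming the well-known signatures of a few containers (RIFF/AVI, Ogg, the EBML header used by Matroska, and the "ftyp" box of the MP4/QuickTime family). The function names are made up for illustration, the list of signatures is far from complete (older QuickTime files, for example, may not start with an "ftyp" box), and the result identifies only the container, not the codecs used inside it.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container format from the first 12 bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    if header[:4] == b"OggS":
        return "Ogg"
    if header[:4] == b"\x1a\x45\xdf\xa3":  # EBML header ID
        return "Matroska/WebM"
    if header[4:8] == b"ftyp":  # first box of ISO base media files
        return "MP4/QuickTime family"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the start of a file and guess its container format."""
    with open(path, "rb") as f:
        return sniff_container(f.read(12))
```

A call such as `sniff_file("holiday.avi")` (a hypothetical file name) would report the container actually found in the file, regardless of its extension. To find out which codecs the streams inside use, a tool has to parse the container structure further.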

So if you want to be sure that all media files can be played, you should have as many codecs as possible available to your player. Fortunately, most codecs, or at least their decoders, are available for free. Codec collections allow you to upgrade your preferred player software in one go.

Encode and compress audio and video data yourself

If you want to make media files available on the Internet yourself, it is important to know the differences between the individual formats a little better. The individual file formats and even the codecs differ in terms of playback quality and the resulting file sizes. Some formats are also suitable for so-called streaming. With this method, playback can begin as soon as the first data has been transferred, before the entire file is available locally on the computer. Some formats also allow the quality, and thus the amount of data to be transmitted, to be adjusted to the available transmission capacity: fast connections then receive better picture and sound quality.
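
How quality can be matched to the connection can be sketched in a few lines. The quality levels, the bit rates and the safety factor below are hypothetical values chosen for illustration; real streaming systems use their own tables and more elaborate measurements.

```python
# Hypothetical quality levels: (label, required bit rate in kbit/s),
# sorted from lowest to highest.
RENDITIONS = [
    ("240p", 400),
    ("480p", 1200),
    ("720p", 2500),
    ("1080p", 5000),
]


def pick_rendition(available_kbps, safety=0.8):
    """Pick the best level that fits into a share of the measured bandwidth.

    Only `safety` (here 80 %) of the bandwidth is budgeted so that small
    fluctuations do not interrupt playback. The lowest level is the fallback.
    """
    budget = available_kbps * safety
    best = RENDITIONS[0][0]
    for label, needed_kbps in RENDITIONS:
        if needed_kbps <= budget:
            best = label
    return best
```

With these assumed values, a connection measured at 3500 kbit/s would be served the "720p" level, while a very slow connection falls back to "240p".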

What all formats have in common is that the audio and video data are written to the files in compressed form. Uncompressed audio and video produce very large amounts of data, so codecs try to reduce the amount of data when encoding. The word codec, a blend of coder and decoder, is therefore often also read as short for compressor / decompressor. Compression is especially important for video signals, as it allows the enormous amount of data in films to be reduced. However, it is often associated with a loss of image quality. The compression methods differ considerably. A general distinction is made between lossy and lossless compression.
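
A rough calculation shows how large uncompressed data becomes. The figures used here are assumptions for illustration: a picture of 720 x 576 pixels with 3 bytes per pixel at 25 frames per second, and CD-quality stereo audio with 44,100 samples per second and 2 bytes per sample.

```python
def raw_video_bytes_per_second(width, height, fps, bytes_per_pixel=3):
    """Data rate of uncompressed video: every pixel of every frame is stored."""
    return width * height * bytes_per_pixel * fps


def raw_audio_bytes_per_second(sample_rate, channels, bytes_per_sample=2):
    """Data rate of uncompressed audio: every sample of every channel is stored."""
    return sample_rate * channels * bytes_per_sample


video = raw_video_bytes_per_second(720, 576, 25)
audio = raw_audio_bytes_per_second(44_100, 2)

print(f"video: {video / 1e6:.1f} MB/s, 90 minutes: {video * 90 * 60 / 1e9:.0f} GB")
print(f"audio: {audio / 1e3:.1f} kB/s, 1 minute: {audio * 60 / 1e6:.1f} MB")
```

Under these assumptions the video alone needs about 31 MB per second, or roughly 168 GB for a 90-minute film, far more than fits on a DVD. This is why video in particular is practically always stored in compressed form.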

Lossy compression

Strictly speaking, lossy compression is data reduction, i.e. the data volume shrinks because data considered irrelevant is simply discarded. It is therefore not possible to restore the original data exactly. The quality of the audio and video files depends on how accurately the encoder (the part of the codec responsible for encoding the data) classifies data as relevant or irrelevant. Findings from perceptual psychology are used here: signals that the human eye or ear perceives only weakly or not at all are removed or stored with lower precision. For example, a soft sound immediately before a loud sound is imperceptible to humans (temporal masking). Nor can the ear pick out a quiet tone that is very close in frequency to a louder one (simultaneous masking). In this way, the data can be simplified without a noticeable reduction in quality for the average listener. A trained ear can, however, hear differences in some cases. In addition, further processing of the audio or video files is problematic because the original data is no longer available. For example, soft tones can no longer be isolated and amplified if they have already been filtered out.
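
Why discarded data cannot be recovered can be shown with a deliberately simplified sketch. It uses plain quantization (storing sample values with a coarser step size), which is not the perceptual model described above: real encoders decide per frequency band how coarsely they may store a signal. The sample values and the step size are made up for illustration.

```python
def quantize(samples, step):
    """Encode: keep only which multiple of `step` each sample is closest to."""
    return [round(s / step) for s in samples]


def dequantize(codes, step):
    """Decode: map each stored code back to a sample value."""
    return [c * step for c in codes]


samples = [3, 4, 5, 120, 118, 2, -1]   # soft values around one loud peak
codes = quantize(samples, 8)           # fewer distinct values, easier to store
restored = dequantize(codes, 8)

# The loud peak survives almost unchanged; the soft values are mostly
# flattened to zero and cannot be recovered from `codes`.
largest_error = max(abs(a - b) for a, b in zip(samples, restored))
```

The decoder only ever sees `codes`. Whatever the encoder rounded away, such as the difference between the soft values 3 and 4, is gone for good, which is why later amplification of such passages no longer works.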

Lossless compression

With lossless compression, the original data is retained; it is only written to the file in a more compact form. In the case of images, for example, instead of storing each individual pixel, only the changes need to be recorded. If, for example, a blue sea is shown, the file does not have to say “blue-blue-blue-...-blue-blue” but can simply state “123 blue pixels”; the information content does not change. With video data, too, it is often not the individual images that are stored but only the differences between successive images. So if the camera shows a still landscape for a few seconds, this landscape image only needs to be written to the file once. In the case of audio data, similarities between the channels (the left and right stereo channels are often nearly identical) can be used to store the data more compactly. In all three examples, repeated values or only minor changes are the prerequisite for compression. Since these similarities are not always present in audio and video data, lossless compression is not as effective as lossy compression.
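
The "123 blue pixels" idea is known as run-length encoding, and the idea of storing only differences between images can be sketched in the same spirit. The following minimal example uses made-up function names and toy data; real image and video formats use considerably more refined methods, but the principle of exact reconstruction is the same.

```python
from itertools import groupby


def rle_encode(pixels):
    """Replace each run of equal values by a (value, run length) pair."""
    return [(value, len(list(group))) for value, group in groupby(pixels)]


def rle_decode(runs):
    """Expand (value, run length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


def frame_delta(previous, current):
    """Store only the positions whose value changed since the last frame.

    Assumes both frames have the same number of pixels.
    """
    return {
        i: new
        for i, (old, new) in enumerate(zip(previous, current))
        if old != new
    }


def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame and the changes."""
    return [delta.get(i, old) for i, old in enumerate(previous)]


row = ["blue"] * 123 + ["white"] * 2 + ["blue"] * 5
runs = rle_encode(row)  # three pairs instead of 130 pixels

frame1 = ["sky", "sky", "tree", "grass"]
frame2 = ["sky", "bird", "tree", "grass"]
delta = frame_delta(frame1, frame2)  # only one entry: position 1 changed
```

In both cases decoding restores the data exactly: `rle_decode(runs)` equals `row`, and `apply_delta(frame1, delta)` equals `frame2`. If the row consisted of constantly changing colors, however, the list of runs would be no shorter than the row itself, which illustrates why lossless methods depend on repetition in the data.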

Common formats

MPEG (Moving Picture Experts Group)

The Moving Picture Experts Group (MPEG) develops standards in the field of video and audio compression. The first standard, MPEG-1, was adopted in 1993. The aim was to compress films so much that they could be played back at the data rate of the CD drives of the time. The picture quality of the resulting Video CDs was poor.

Video compression was significantly improved with the new MPEG-2 standard introduced in 1994/95. This standard is widespread today because it is used on commercially available DVDs.

An MPEG-3 standard was never adopted. There were plans to specify such a standard for HDTV (high-definition television). However, when it turned out that the MPEG-2 standard was sufficient for this purpose, the plan was dropped. Incidentally, the widespread MP3 format has nothing to do with the (non-existent) MPEG-3. MP3 was already developed as part of the audio specification of MPEG-1. The "3" stands for the audio layer used: MP3 is the abbreviation for MPEG-1 Audio Layer 3, not for MPEG-3.

The MPEG-4 standard has existed since 1999. This standard improves compression once again. Above all, however, it is much more flexible, as it can accommodate a wide variety of audio and video formats. In addition, MPEG-4 supports not only rectangular videos but also audiovisual objects that are combined with one another to create scenes. The focus is on the user's interaction with these objects.

H.264 / MPEG-4 AVC

H.264, also known as MPEG-4 AVC (Advanced Video Coding), is a newer generation of MPEG-4 video compression. H.264 is not a file format but a standard for highly efficient video compression, and thus a codec. The standardized container format is MPEG-4 with the file extension .mp4, but H.264 is not tied to a specific container format: H.264 video can also be stored in AVI, Matroska or Ogg files. This compression method is increasingly used in HDTV, Blu-ray films and mobile applications. H.264 is characterized by good video quality: compared with other methods, it delivers a higher level of detail and richer colors at a reduced file size. However, encoding and playback require relatively high computational effort. In addition, the video platform YouTube recommends this codec for compressing videos that you have created and want to publish there.

Player: Windows Media Player, Real Player, VLC Player

AVI (Audio Video Interleave)

The AVI format developed by Microsoft is a so-called container format, i.e. it can hold audio and video data streams encoded with various methods. It is therefore possible that playback software understands the AVI format but not the video data it contains. A matching codec is required for encoding or decoding each data stream. As a rule, however, AVI files can be played back with most players without any problems.

Player: Windows Media Player, VLC Player, KMPlayer


DivX

DivX is a very widespread video codec because it allows extremely high compression with acceptable image quality. Small display errors occur only when the image changes very quickly. With DivX it is possible to shrink films so much that the content of a DVD fits on a CD. Due to the relatively small file size, this format has established itself especially on the Internet for exchanging longer films. Due to its widespread use, many DVD players can now interpret this format. The codec is based on the MPEG-4 video standard (see above), and the video is usually stored in the AVI container, so the files usually have the file extension .avi.

DivX has become so successful in the home sector in recent years that even low-cost DVD players support it. What MP3 is to the music industry, DivX is to the film industry: with this technology, video files can be compressed to such an extent that they can be sent relatively conveniently over the Internet. Films encoded in this way (usually AVI files) can be edited and displayed in common video editing and playback programs with the help of a suitable codec.

Player: Windows Media Player, Real Player, VLC Player


Ogg

The Ogg container file format is a free alternative that is not restricted by software patents. Audio, video and text data can be combined in this file format. The Writ codec for text data, the Vorbis or FLAC audio codecs and the Theora video codec are usually used to compress this data. According to its developers, the Ogg container is particularly well suited for streaming, as it can be streamed without additional adjustments. Ogg is not very common in the video sector, but this file format is becoming increasingly popular in the open source scene.

File extension: .ogg, .oga, .ogv, .ogx


Matroska

Matroska is a container format for audio and video data as well as text streams (e.g. subtitles) and is also one of the free container formats. Its specifications are publicly available and can be used freely in software applications. With this format it is possible, as with DVDs, to divide files into chapters and to include several audio tracks, e.g. for different languages. There is also the option of using a menu function for user guidance. This container is more common than Ogg. The Matroska format is being used more and more in the HD video sector in particular.

File extension: .mkv, .mk3d, .mka, .mks

(2015). Codecs. Last changed on October 29, 2015. Leibniz Institute for Knowledge Media: Accessed on May 24th, 2021