Video Signal Representation
Chapter 2: Basics
• Audio Technology
• Images and Graphics
• Video and Animation
(Subsequent chapters: Chapter 3: Multimedia Systems - Communication Aspects and Services; Chapter 4: Multimedia Systems - Storage Aspects; Chapter 5: Multimedia Usage)

Chapter 2.3: Video and Animation
• Basic concepts
• Television standards
• MPEG
• Digital Video Broadcasting
• Computer-based animation

Video signal representation includes:
• Visual representation
• Transmission
• Digitization

Visual Representation

Several important measures characterize the visual representation:

1. Vertical Detail and Viewing Distance
The smallest detail that can be reproduced is a pixel.
– Ideally: one pixel for every detail of a scene
– In practice: some details fall between scanning lines
Kell factor: only about 70% of the vertical details are represented, because some details of the scene fall between the scanning lines (determined by experience and measurements). The Kell factor of ≈ 0.7 is independent of the way of scanning, i.e. whether
• the scanning is progressive (sequential scanning of lines) or
• the scanning is interlaced (alternate scanning: line 1, line 3, ..., line n-1, line 2, line 4, ...).

2. Horizontal Detail and Picture Width
The geometry of the television image is based on the aspect ratio, the ratio of picture width W to picture height H (W:H). The picture width for conventional television service is 4:3 · picture height.
• The conventional aspect ratio for television is 4:3 ≈ 1.33
• Modern systems use 16:9 ≈ 1.78
[Figure: scan lines crossing a "detail" of which only 2 out of 3 components can be represented; 700:525 = 4:3 (NTSC standard)]
The viewing distance D determines the angle α subtended by the picture height H: tan(α) = H/D.

3. Total Detail Content of the Image
Total number of picture elements = number of vertical elements · number of horizontal elements = vertical resolution² · aspect ratio = 525² · 4/3 (for NTSC).

4. Perception of Depth (3D Impression)
• In natural vision: angular separation of the images received by the two eyes
• In the television image: perspective appearance of objects, choice of the focal length of the camera lens, changes in the depth of camera focus

5. Luminance and Chrominance
Usually not RGB, but YUV (or a variant of it) is used.

6. Temporal Aspects of Illumination
Motion is represented by a rapid succession of slightly different still pictures (frames). Between frames, the light is cut off briefly; a discrete sequence of pictures is nevertheless perceived as continuous, thanks to a convenient "weakness" of the human visual system (persistence of vision). For a realistic presentation, two conditions must hold:
– The repetition rate must be high enough to guarantee smooth motion
– The persistence of vision must extend over the interval between flashes
7. Continuity of Motion
To perceive continuous motion, the frame rate must be higher than 15 frames/sec; for smooth motion, the frame rate should be 24-30 frames/sec.

8. Flickering
A slow refresh causes a periodic fluctuation of perceived brightness, the flicker effect. How can this disturbing effect be avoided?
• A first trick: display each picture several times. E.g. 16 pictures per second give a very disturbing flicker effect → display every picture 3 times, i.e. with a refresh rate of 3 · 16 = 48 Hz
• To avoid flicker, a refresh rate of at least 50 Hz is needed
• Computer displays achieve a refresh rate of 70 Hz by the use of a refresh buffer
• The TV picture is divided into two half-pictures by line interlacing; a refresh rate of 25 Hz (PAL) for the full TV picture then requires a scan rate of 2 · 25 Hz = 50 Hz

9. Temporal Aspect of Video Bandwidth
The eye requires a video frame to be scanned every 1/25 second. Scan rate and resolution determine the video bandwidth needed for transmission: during one cycle of video frequency (i.e. 1 Hz), at most two horizontally adjacent pixels can be scanned.
• Vertical resolution and frame rate give the horizontal (line) scan frequency: scan lines per frame (b) · frame rate (c) = horizontal scan frequency
• Horizontal resolution and scan frequency give the video bandwidth: pixels per line (a) · scan frequency / 2 = video bandwidth
→ video bandwidth = a · b · c / 2 (since 2 horizontally adjacent pixels can be represented during one cycle of video frequency)

Example: a computer system with a = 1312 and b = 800 pixels, out of which 1024 × 768 are visible, and a frame rate c = 100 Hz needs:
• a horizontal scan frequency of 800 · 100 Hz = 80 kHz
• a video bandwidth of 1312 · 80 kHz / 2 = 52.48 MHz
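As a plausibility check, a minimal Python sketch that reproduces the numbers of this example, using the slide's notation a, b, c:

```python
def video_bandwidth_hz(a: int, b: int, c: float) -> float:
    """Video bandwidth = a * b * c / 2.

    a: pixels per scan line (including non-visible ones)
    b: scan lines per frame
    c: frame rate in Hz
    The division by 2 reflects that one cycle of video frequency
    can carry two horizontally adjacent pixels.
    """
    return a * b * c / 2

# Example from the text: a 1312 x 800 raster at 100 Hz
a, b, c = 1312, 800, 100
print(f"horizontal scan frequency: {b * c / 1e3:.1f} kHz")                        # 80.0 kHz
print(f"video bandwidth: {video_bandwidth_hz(a, b, c) / 1e6:.2f} MHz")            # 52.48 MHz
```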
Digitization

For image processing or transmission, the analog picture or video must be converted to a digital representation. Digitization consists of:
• Sampling: the color/grey level of the picture is measured at an M×N array of pixels
• Quantizing: the values from a continuous range are mapped to k discrete intervals
For a satisfactory reconstruction of a picture from quantized samples,
– 100 or more quantizing levels might be needed;
– very often 256 levels are used, which are representable within 8 bits.
A digital picture consists of an array of integer values representing pixels (as in a simple bitmap image format).

Video Controller Standards

Standard                        | Resolution (A)        | Color presentation (B)            | Storage capacity (C = A · B)
Colour Graphics Adapter (CGA)   | 320 × 200             | 2 bits/pixel                      | 16 KBytes
Enhanced Graphics Adapter (EGA) | 640 × 350             | 4 bits/pixel (16 colours)         | 112 KBytes
Video Graphics Array (VGA)      | 640 × 480             | 8 bits/pixel (256 colours)        | 307 KBytes
8514/A Display Adapter Mode     | 1024 × 768            | 8 bits/pixel                      | 786 KBytes
Extended Graphics Array (XGA)   | 1024 × 768 / 640 × 480| 8 bits/pixel / 16 bits/pixel      | 786 KBytes / 614 KBytes
Super VGA (SVGA)                | 1024 × 768            | 24 bits/pixel                     | 2359 KBytes
Super XGA (SXGA)                | 1280 × 1024           | 24 bits/pixel                     | 3932 KBytes
Ultra XGA (UXGA)                | 1600 × 1200           | 24 bits/pixel                     | 5760 KBytes

Television - NTSC

Video format standard for conventional television systems in the USA (since 1954): NTSC (National Television Systems Committee).
• Picture size: 525 rows, aspect ratio 4:3, refresh rate of 30 frames/sec
• Uses a YIQ signal (in principle nothing but a slight variation of the YUV scheme):
  Y = 0.30 · R + 0.59 · G + 0.11 · B
  I = 0.60 · R − 0.28 · G − 0.32 · B
  Q = 0.21 · R − 0.52 · G + 0.31 · B

Composite signal for transmission to receivers:
• The individual components (YIQ) are composed into one signal
• The basic information consists of luminance information and chrominance difference signals
• Appropriate modulation methods eliminate interference between luminance and chrominance signals

[Figure: a typical NTSC encoder: R, G, B are matrixed into Y (4.2 MHz low-pass filter), I (1.3 MHz low-pass filter) and Q (0.6 MHz low-pass filter); I and Q are quadrature-modulated onto the subcarrier and added to Y, yielding the composite NTSC output]

NTSC signal details:
• The required bandwidth to transmit NTSC signals is 4.2 MHz, 6 MHz including sound
• The luminance (Y) or "monochrome" signal occupies the band around the picture carrier; the chromatic subcarrier lies 3.58 MHz above it
• The I-signal uses 1.5 MHz of bandwidth, the Q-signal 0.5 MHz
• The I-signal is in phase ("In-phase") with the 3.58 MHz carrier wave; the Q-signal is 90 degrees out of phase with it ("Quadrature")

[Figure: spectrum of the composite NTSC signal: picture carrier (luminance), I and Q sidebands around the chromatic subcarrier, and the sound carrier, on a 0-5 MHz scale]
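The YIQ matrixing above is a plain matrix-vector product. A small sketch (NumPy; coefficients rounded to two digits as in the text):

```python
import numpy as np

# Rows: Y, I, Q; coefficients as in the formulas above
RGB_TO_YIQ = np.array([
    [0.30,  0.59,  0.11],   # Y: luminance
    [0.60, -0.28, -0.32],   # I: in-phase chrominance
    [0.21, -0.52,  0.31],   # Q: quadrature chrominance
])

def rgb_to_yiq(rgb):
    """Convert an (..., 3) array of RGB values in [0, 1] to YIQ."""
    return rgb @ RGB_TO_YIQ.T

print(rgb_to_yiq(np.array([1.0, 1.0, 1.0])))  # white: Y = 1, I = 0, Q = 0
```

For pure white, the luminance coefficients sum to 1 and both chrominance rows sum to 0, which is exactly the point of the difference-signal representation.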
Television - PAL

PAL (Phase Alternating Line, invented by W. Bruch / Telefunken, 1963):
• Frame rate of 25 Hz; delay between frames: 1000 ms / 25 frames per sec = 40 ms
• 625 lines, aspect ratio 4:3
• Quadrature amplitude modulation similar to NTSC
• Bandwidth: 5.5 MHz
• The phase of the R−Y (V) signal is reversed by 180 degrees from line to line, to reduce color errors that arise from amplitude and phase distortion during transmission
• The chrominance signal C for PAL transmission, with U = B − Y, V = R − Y and colour subcarrier frequency ωc, can be represented as:
  C = (U / 2.03) · sin(ωc·t) ± (V / 1.14) · cos(ωc·t)
    = 0.493 · (B − Y) · sin(ωc·t) ± 0.877 · (R − Y) · cos(ωc·t)

Television Standards

System | Total Lines | Visible Lines | Vertical Resolution | Horizontal Resolution | Video Bandwidth | Aspect Ratio | Optimal Viewing Distance
NTSC-i | 525         | 484           | 242                 | 330                   | 4.2 MHz         | 4:3          | 7 m
NTSC-p | 525         | 484           | 340                 | 330                   | 4.2 MHz         | 4:3          | 5 m
PAL-i  | 625         | 575           | 290                 | 425                   | 5.5 MHz         | 4:3          | 6 m
PAL-p  | 625         | 575           | 400                 | 425                   | 5.5 MHz         | 4:3          | 4.3 m
(-i interlaced, -p progressive, i.e. non-interlaced)

"More modern" television standards:
• SDTV (Standard Definition TV): low resolution, aspect ratio not specified
• EDTV (Enhanced Definition TV): minimum of 480 lines, aspect ratio not specified
• HDTV (High Definition TV): minimum of 720 lines, aspect ratio of 16:9

Enhanced Definition TV (EDTV)

EDTV systems are conventional systems which offer improved vertical and/or horizontal resolution by some "tricks":
• Comb filters improve horizontal resolution by more than 30%, according to the literature
• Separating black-and-white from color information eliminates rainbow effects while extending resolution
• Progressive (non-interlaced) scanning improves vertical resolution
• Insertion of "blank lines" between "active lines", filled with information from the line above, the line below, or the same line in the previous picture

Another EDTV development is IDTV (Improved Definition Television):
• An intermediate level between NTSC and HDTV (High-Definition Television) in the U.S.
• Improves the NTSC image by using digital memory to double the scanning lines from 525 to 1050
• One 1050-line image is displayed in 1/60 sec (60 frames/sec)
• Digital separation of the chrominance and luminance signals prevents cross-interference
High Definition Systems (HDTV)

HDTV is characterized by:
• Higher resolution: approx. twice as many horizontal and vertical pixels as conventional systems (1024×768, up to 1920×1080)
• 24-bit pixels
• Bandwidth 5-8 times larger than for NTSC/PAL
• Aspect ratio: 16:9 ≈ 1.78
• Preferred viewing distance: between 2.4 and 3.3 meters

Digital coding is essential in the design and implementation of HDTV:
• Composite coding (sampling of the composite analog video signal, i.e. all signal components are converted together into a digital form) is the straightforward and easiest alternative, but:
– cross-talk between luminance and chrominance remains in the composite signal
– composite coding depends on the television standard
– the sampling frequency cannot be adapted to the bandwidth requirements of the individual components
– the sampling frequency is not coupled with the color carrier frequency
• Alternative: component coding (separate digitization of the individual image components):
– the more important luminance signal is sampled with 13.5 MHz
– the chrominance signals (R−Y, B−Y) are sampled with 6.75 MHz
– luminance and chrominance signals are quantized uniformly with 8 bits
– total sampling bandwidth: 13.5 MHz + 6.75 MHz > 19 MHz

Worldwide, 3 different HDTV systems have been developed:

United States
• Full-digital solution with 1050 lines (960 visible) and a scan rate of 59.94 Hz
• Compatible with NTSC through IDTV

Europe
• HD-MAC (High Definition Multiplexed Analog Components)
• 1250 lines (1000 visible), scan rate of 50 Hz
• Halving the lines (625 of 1250) and the full-picture motion allows simple conversion to PAL
• The HD-MAC receiver uses digital image storage to show full resolution and motion

Japan
• MUSE is a modification of the first NHK (Japan Broadcasting Company) HDTV standard
• MUSE is a Direct Broadcast from Satellite (DBS) system, where the 20 MHz bandwidth is reduced by compression to the 8.15 MHz available on the satellite channel
• The full detail of the 1125-line image is retained for stationary scenes; with motion, the definition is reduced by approx. 50%

System   | Total Lines | Visible Lines | Vertical Resolution | Horizontal Resolution | Total Bandwidth | Scan Rate (Camera) | Scan Rate (Display)
HDTV USA | 1050        | 960           | 675                 | 600                   | 9.0 MHz         | 59.94 p            | 59.94 p
HD-MAC   | 1250        | 1000          | 700                 | 700                   | 12.0 MHz        | 50 p               | 100 p
MUSE     | 1125        | 1080          | 540                 | 600                   | 30.0 MHz        | 60 i               | 60 i
NTSC     | 525         | 484           | 242                 | 330                   | 6 MHz           | 59.94 i            | 59.94 i
PAL      | 625         | 575           | 290                 | 425                   | 8 MHz           | 50 i               | 50 i
(i interlaced, p progressive, i.e. non-interlaced)

Television - Transmission (Substandards)

Due to the high data rates (1050 lines · 600 pixels/line · 30 frames/sec, i.e. a bandwidth of approx. 19 MHz), substandards (systems which need a lower data rate) have been defined for transmission.

HDTV data rates for transmission:
• Total picture elements = horizontal resolution · vertical resolution
• USA: 720,000 pixels · 24 bits/pixel · 60 frames/sec = 1036.8 Mbit/sec!
• Europe: 870,000 pixels · 24 bits/pixel · 50 frames/sec = 1044 Mbit/sec!
• HDTV with 1920×1080 pixels, 24 bits/pixel, 30 frames/sec: ≈ 1.5 Gbit/sec!
• A reduction of the data rate is unavoidable, since the required rates do not fit the "standard capacities" provided by broadband networks (e.g. 155 or 34 Mbit/sec)
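A quick sketch reproducing these raw data rates (plain Python; 1 Mbit = 10^6 bit, as in the slide's arithmetic):

```python
def raw_video_rate_mbit(pixels: int, bits_per_pixel: int, fps: float) -> float:
    """Uncompressed video data rate in Mbit/s."""
    return pixels * bits_per_pixel * fps / 1e6

print(raw_video_rate_mbit(720_000, 24, 60))      # USA:    1036.8 Mbit/s
print(raw_video_rate_mbit(870_000, 24, 50))      # Europe: 1044.0 Mbit/s
print(raw_video_rate_mbit(1920 * 1080, 24, 30))  # HDTV:   1492.992, i.e. about 1.5 Gbit/s
```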
Different substandards for data reduction are defined:

Substandard   | Luminance Sampling Rate (MHz) | Chrominance Sampling Rate (MHz) | Data Rate (Mbit/sec)
Substandard 1 | 11.25 = 5/6 · 13.5            | 5.625 = 5/6 · 6.75              | 180
Substandard 2 | 10.125 = 3/4 · 13.5           | 3.375 = 1/2 · 6.75              | 125
Substandard 3 | 9.0 = 2/3 · 13.5              | 2.25 = 1/3 · 6.75               | 108

Television: Transmission - Compression Techniques

A further reduction of the data rate is required for picture transmission.

Leaving out the sampling gaps (only visible areas are coded):
• Luminance has 648 sample values per line, but only 540 of them are visible
• Chrominance has 216 sample values per line, but only 180 are visible
• 575 visible lines: (540 + 180 + 180) samples/line · 575 lines/frame = 517,500 samples/frame
• 517,500 samples/frame · 8 bits/sample · 25 frames/sec = 103.5 Mbit/sec

Reduction of the vertical chrominance resolution:
• Only the chrominance signals of every second line are transmitted
• 575 visible lines: (540 + 90 + 90) samples/line · 575 lines/frame = 414,000 samples/frame
• 414,000 samples/frame · 8 bits/sample · 25 frames/sec = 82.8 Mbit/sec

Different source coding:
• Using an intra-frame ADPCM with 3 instead of 8 bits/sample:
• 414,000 samples/frame · 3 bits/sample · 25 frames/sec = 31.05 Mbit/sec
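The three reduction steps can be checked with a few lines of Python (the tuple holds the visible Y, Cr and Cb samples per line):

```python
def rate_mbit(samples_per_line, lines, bits_per_sample, fps):
    """Data rate in Mbit/s from (Y, Cr, Cb) samples per visible line."""
    samples_per_frame = sum(samples_per_line) * lines
    return samples_per_frame * bits_per_sample * fps / 1e6

print(rate_mbit((540, 180, 180), 575, 8, 25))  # 103.5  (visible areas only)
print(rate_mbit((540,  90,  90), 575, 8, 25))  #  82.8  (halved vertical chrominance)
print(rate_mbit((540,  90,  90), 575, 3, 25))  #  31.05 (3-bit intra-frame ADPCM)
```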
This is still a very high data rate:
• Video should take about 1.5 Mbit/sec to fit within CD technology
• Audio should take about 64-192 kbit/sec per channel
→ need for further compression techniques, e.g. MPEG for video and audio

Classification of Applications

Dialogue Mode Applications
• Interaction among human users via multimedia information
• Requirements for compression and decompression:
– End-to-end delay lower than 150 ms
– End-to-end delay of 50 ms for face-to-face dialogue applications

Retrieval Mode Applications
• A human user retrieves information from a multimedia database
• Requirements:
– Fast forward and backward data retrieval with simultaneous display
– Fast search for information in multimedia databases
– Random access to single images and audio frames, with an access time of less than 0.5 seconds
– Decompression should be possible without a link to other data units, in order to allow random access and editing

Requirements for both dialogue and retrieval mode:
• Synchronization of audio and video data (lip synchronization)
• Support of scalable video in different systems: the format must be independent of frame size and video frame rate
• Support of various audio and video data rates: different rates lead to different quality, thus data rates should be adjustable
• Economy (i.e. reasonably cheap solutions):
– Software realization: cheap, but low speed and low quality
– Hardware realization (VLSI chips): more expensive (at first), but high quality
• Compatibility:
– It should be possible to generate multimedia data on one system and to reproduce it on another
– Programs available on CD can be read on different systems

Encoding Mechanisms for Video

Basic encoding techniques as used in JPEG.

Differential encoding for video:
1. For newscasts, video telephony, and soap operas:
• The background often remains the same for a long time
• Very small differences between subsequent images
→ run-length coding can be used
2. Motion compensation:
• Blocks of N×N pixels are compared in subsequent images
• Useful for objects moving in one direction, e.g. from left to right

Other basic compression techniques:
• Color Look-Up Tables (CLUT) for data reduction in video streams, often used in distributed multimedia systems
• Silence suppression for audio: data are only encoded if the volume level exceeds a certain threshold; this can be interpreted as a special case of run-length encoding
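A minimal sketch of silence suppression as just described (NumPy; the threshold value and the segment representation are illustrative assumptions, not part of any standard):

```python
import numpy as np

def suppress_silence(samples: np.ndarray, threshold: float):
    """Encode only segments whose amplitude exceeds the threshold.

    Returns (start_index, segment) pairs; the silent stretches are
    represented implicitly by the gaps between segments, which is the
    run-length-encoding view mentioned in the text.
    """
    loud = np.abs(samples) > threshold
    segments, start = [], None
    for i, flag in enumerate(loud):
        if flag and start is None:
            start = i                                  # a loud run begins
        elif not flag and start is not None:
            segments.append((start, samples[start:i])) # a loud run ends
            start = None
    if start is not None:
        segments.append((start, samples[start:]))
    return segments

signal = np.array([0.0, 0.01, 0.5, 0.7, 0.02, 0.0, 0.9, 0.0])
print([(s, seg.tolist()) for s, seg in suppress_silence(signal, 0.1)])
# [(2, [0.5, 0.7]), (6, [0.9])]
```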
MPEG

There is a growing need for a common format for representing compressed video and audio at data rates up to 1.5 Mbit/sec (typical rate of CD-ROM transfer: 1.2 Mbit/sec) → Moving Pictures Expert Group (MPEG).
• Generic approach → can be used widely
• The maximum data rate for video in MPEG is very high: 1,856,000 bit/sec
• Data rates for audio between 32 and 448 kbit/sec
→ video and audio compression of acceptable quality
• Suitable for symmetric as well as asymmetric compression:
– Asymmetric compression: more effort for coding (done once) than for decoding (done often)
– Symmetric compression: equal effort for compression and decompression, restricted end-to-end delay (e.g. interactive dialogue applications)

MPEG Today

• MPEG-1: coding with VCD (Video CD) quality; data rate of 0.9-2 Mbit/sec
• MPEG-2: super-set of MPEG-1, rates up to 8 Mbit/sec; can do HDTV (DVD, HDTV)
• MPEG-4: coding of objects, not frames; lower bandwidth (multimedia for the web and mobility)
• MPEG-7: multimedia content description (ease of searching)
• MPEG-21: content identification and management
• MP3: coding of audio only (MPEG Layer-3)

[Figure: the MPEG family: MPEG-1 (VCD) and MPEG-2 (DVD, HDTV) for content presentation; MPEG-4 for coding of audio-visual objects; MPEG-7 (Multimedia Content Description Interface, related to XML, RDF, Dublin Core and SMPTE data models) for content description; MPEG-21 (Multimedia Framework) for content identification and usage]

First: MPEG Compression Steps

The exact image format of MPEG is defined in the image preparation phase (which is similar to JPEG):
• Video is seen as a sequence of images (video frames)
• Each image consists of 3 components (YUV format, called Y, CB, CR); the luminance component has twice as many samples in the horizontal and vertical axes as the two chrominance components (4:2:0 scheme)
• Resolution of the luminance component: at most 768×576 pixels (8 bits per pixel)
• The data stream includes further information, e.g.:
– Aspect ratio of a pixel (14 different pixel aspect ratios are provided), e.g. 1:1 (square pixel), 16:9 (European and US HDTV), etc.
– Image refresh frequency (number of images per second; 8 frequencies between 23.976 Hz and 60 Hz are defined, among them the European standard of 25 Hz)
• The encoding basically works as in JPEG (DCT, quantization, entropy encoding), but considers several "images"

The coding techniques, as in JPEG:
• Subsampling of chrominance information: the human visual system is less sensitive to chrominance than to luminance information → only 1 chrominance pixel is kept for each 2×2 neighborhood of luminance pixels (see the sketch after this list)
• Image preparation: form blocks of 8×8 pixels and group them into macro blocks
• Frequency transformation: the discrete cosine transform converts an 8×8 block of pixel values into an 8×8 matrix of horizontal and vertical spatial frequency coefficients → most of the energy is concentrated in the low-frequency coefficients, especially in the DC coefficient
• Quantization: suppresses high frequencies
• Variable-length coding: assigns codewords to the values to be encoded

In addition to these JPEG-like techniques, the following is used before performing the DCT:
• Predictive coding: code a frame as a prediction based on the previous / the following frame
• Motion compensation: predict the values of a pixel block by relocating a block from a known picture
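The chrominance subsampling step ("1 chrominance pixel per 2×2 luminance neighborhood") can be sketched as follows. Averaging the 2×2 block is an assumption; keeping one representative sample would equally match the slide (NumPy):

```python
import numpy as np

def subsample_420(chroma: np.ndarray) -> np.ndarray:
    """Keep one chrominance value per 2x2 neighborhood (4:2:0 style).

    chroma: a 2D chrominance plane with even dimensions.
    """
    h, w = chroma.shape
    # Reshape so each 2x2 neighborhood becomes its own axis pair, then average
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

cb = np.arange(16, dtype=float).reshape(4, 4)
print(subsample_420(cb))   # 2x2 plane: the mean of each 2x2 block
# [[ 2.5  4.5]
#  [10.5 12.5]]
```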
Macro Blocks

For still images, temporal prediction yields a considerable compression ratio. For moving images, non-translational motion patterns (e.g. rotations, waves, ...) would require storing a large amount of information due to their irregularity.
• Therefore: predictive coding makes sense only for parts of the image → division of each image into areas called macro blocks
• Macro blocks turn out to be suitable for compression based on motion estimation
• A macro block is partitioned into 4 blocks for luminance and one block for each chrominance component; each block consists of 8×8 pixels
• The size of a macro block is a compromise between the (storage) cost for prediction and the resulting compression

[Figure: a macro block: four 8×8 luminance blocks (Y) and one 8×8 block each for the chrominance components CB and CR]

Motion Compensation Prediction

• Motion compensation prediction is made between successive frames
• Idea: coding a frame purely as a prediction from the previous frame is useless for fast-changing sequences
• Thus: account for moving objects by searching for the new position of a macro block from the previous frame
• Code the prediction relative to the macro block of the previous frame together with the motion vector

How to find the best-fitting position in the new image?
• Search only a given window around the old position, not the whole image!
• Consider only the average of all pixel values, not the detailed values!
• Set a fault threshold: stop searching once a found macro block fits "well enough"
• The search pattern can be a spiral: this procedure in general does not give the best result, but it is fast

An alternative search procedure:
• Store the motion vector for a macro block of the previous image
• Move the search window for the next search by the old vector
• Start searching the window with a coarse pattern
• Refine the search around the "best" patterns found

[Figure: left: the lighter grey blocks are the blocks searched on the coarse scale; the best-matching fields are examined in more detail by also searching the 8 neighboring blocks. Right: if all blocks were refined, such a picture would result: the lighter a block, the better the match; the "lightest" block is taken for motion prediction. Note: in reality not all blocks are refined, only the "lightest" ones, so the best block may be missed.]
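A sketch of the block-matching idea behind this search. The slides do not fix an error measure, so the common sum of absolute differences (SAD) is assumed here; the spiral scan, the coarse-to-fine refinement and the early-exit threshold described above are omitted for brevity:

```python
import numpy as np

def best_match(prev: np.ndarray, curr: np.ndarray,
               top: int, left: int, n: int = 16, radius: int = 8):
    """Motion vector for the n x n block of `curr` at (top, left).

    Exhaustively scans a (2*radius+1)^2 window in the previous frame
    and minimizes the sum of absolute differences (SAD).
    """
    block = curr[top:top + n, left:left + n].astype(int)
    best_sad, best_vec = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > prev.shape[0] or x + n > prev.shape[1]:
                continue   # candidate block would leave the image
            sad = np.abs(prev[y:y + n, x:x + n].astype(int) - block).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_vec = sad, (dy, dx)
    return best_vec, best_sad

# Toy example: frame content moved 2 pixels to the right
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, (64, 64), dtype=np.uint8)
curr = np.roll(prev, shift=2, axis=1)
print(best_match(prev, curr, top=16, left=16))   # ((0, -2), 0)
```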
Types of Frames

4 different types of frame coding are used in MPEG for efficient coding with fast random access:
• I-frames: intra-coded frames (moderate compression, but fast random access)
• P-frames: predictive-coded frames (with motion compensation; prediction from the previous I- or P-frame)
• B-frames: bi-directionally predictive-coded frames (referencing both the previous and the following I- or P-frame)
• D-frames: DC frames (of limited use: they encode only the DC components of intra-frame coding)

[Figure: frame sequence I B B P B B P ... with prediction arrows over time; P-frames are predicted from the preceding I-/P-frame, B-frames bi-directionally from the surrounding I-/P-frames; a typical proportion is I : P : B = 1 : 2 : 6]

A B-frame can be decoded only after the subsequent P-frame has been decoded.

I-Frames

• I-frames are self-contained, i.e. they represent a full image → coded without reference to other images
• Treated as still images → use of JPEG, but with compression in real time
• I-frames may serve as points of random access in MPEG streams
• DCT on the 8×8 blocks within the macro blocks, plus DPCM coding of the DC coefficients
• Typically, an I-frame occurs 3 times per second to give reasonably fast random access
• Typical data allocation: I-frames allocate up to 3 times as many bits as P-frames; P-frames allocate 2-5 times as many bits as B-frames
→ In case of little motion in the video, a greater proportion of the bits should be assigned to I-frames, since the P- and B-frames then need only a very low number of bits

P-Frames and B-Frames

P-frame:
• Requires information about the previous I-frame and/or the previous P-frame
• Motion estimation is done for the macro blocks of the coded frame: the motion vector (the difference between the locations of the macro blocks) is specified, and the (typically small) difference in content between the two macro blocks is computed and DCT/entropy encoded
• That means: P-frames consist of I-frame macro blocks (where no prediction is possible) and predictive macro blocks

B-frame:
• Requires information about the previous and the following I- and/or P-frame
→ B-frame = difference from a prediction based on the previous image and the following P-/I-frame
• Quantization and entropy encoding of the macro blocks are very efficient on such doubly predicted frames → the highest compression rate is obtained
• Decoding is possible only after the following I- or P-frame has been received

Group of Pictures (GOP)

MPEG gives no instruction in which order to code the different frame types; the order can be specified by a user parameter. But each stream of MPEG frames shows a fixed pattern, the group of pictures:
• A GOP typically starts with an I-frame and ends with the frame right before the next I-frame
• An "open" GOP ends in a B-frame, a "closed" GOP in a P-frame
• Very flexible: GOPs can be decoded independently, but they may also reference the next GOP
• Typical patterns: IBBPBBPBBI, IBBPBBPBBPBBI (the final I starting the next GOP)
• Why not use only P- and B-frames? → After the loss of one frame, a new full image (I-frame) is needed to allow the receiver to recover from the information loss

D-Frames

Intra-frame encoded, but only the lowest frequencies of the image (the DC coefficients) are encoded.
• Used (only) for the fast-forward and fast-rewind modes
• Could also be realized by a suitable ordering of I-frames
• Slow rewind playback requires a lot of storage capacity: all images of a group of pictures are decoded in forward mode and stored; after that, rewind playback is possible

Typical compression performance:

Type | Size   | Compression
I    | 18 KB  | 7:1
P    | 6 KB   | 20:1
B    | 2.5 KB | 50:1
Avg  | 4.8 KB | 27:1
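The "Avg" row follows from the frame-type mix within a GOP. A quick check with the pattern IBBPBBPBB (one I-, two P- and six B-frames, matching the 1:2:6 proportion above; the trailing I of the pattern belongs to the next GOP):

```python
sizes_kb = {"I": 18, "P": 6, "B": 2.5}
gop = "IBBPBBPBB"   # one I-, two P-, six B-frames

avg_kb = sum(sizes_kb[t] for t in gop) / len(gop)
print(avg_kb)       # 5.0 KB per frame for this pattern
```

This yields 5.0 KB per frame, close to the 4.8 KB quoted in the table; the exact average depends on the GOP pattern assumed.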
Coding Process

[Figure: MPEG video encoder: the input passes preprocessing and a frame memory; the difference to the motion-compensated predictive frame is DCT-transformed, quantized (Q, controlled by a regulator) and VLC-encoded into the output buffer. An inverse quantizer (Q⁻¹) and IDCT reconstruct the frame for the frame memory, from which motion estimation derives the motion vectors used for motion compensation.]

Layers of MPEG Data Streams

1. Sequence layer: sequence header + one or more groups of pictures; the header contains parameters like picture size, data rate, aspect ratio and the DCT quantization matrices
2. Group of pictures layer: contains at least one I-frame for random access, plus timing information and user data
3. Picture layer: an I-, P-, B- or D-frame, with synchronization information, resolution and the range of the motion vectors
4. Slice layer: a subdivision of a picture, providing a certain immunity to data corruption
5. Macro block layer: the basic unit for motion compensation and for quantizer scale changes
6. Block layer: the basic coding unit (8×8 pixels); the DCT is applied at this block level

Hierarchical Structure of the Data

[Figure: a sequence (start code; sequence parameters: picture dimensions, aspect ratio, picture rate, buffer size, optional quantization matrices) contains GOPs 1..g; a GOP (start code) contains pictures 1..p; a picture (start code, picture flags) contains slices 1..s; a slice (start code, slice address, optional quantization value) contains macro blocks 1..m; a macro block (address, mode, quantization value, motion vectors) contains the four 8×8 luminance blocks (Y) and the 8×8 chrominance blocks CB and CR]

Audio Encoding

Audio encoding within MPEG:
• The picture-encoding principles can be adapted for use with audio as well
• Transformation into the frequency domain by a Fast Fourier Transform (FFT), similar to the technique used for video
• The audio spectrum is split into 32 non-interleaved subbands (for each subband, the audio amplitude is calculated); the noise level per subband is determined by a psychoacoustic model
– "Psychoacoustic model" means modeling human hearing, e.g. perceiving only a single tone when two similar tones are played very close together, or not perceiving the quieter tone when two tones of very different loudness are played simultaneously
• Each subband has its own quantization granularity: a higher noise level allows rougher quantization (and vice versa)
• "Single channel", "two independent channels" or "stereo" are possible (in the case of stereo, the redundancy between the two signals is exploited for a higher compression ratio)

Audio encoding parameters:
• Sampling rates of 32, 44.1 and 48 kHz
• Each subband corresponds to a frequency-transform coefficient
• 3 different layers of encoder and decoder complexity are used:
– The quantized spectral portions of layers 1 and 2 are PCM-encoded
– The quantized spectral portions of layer 3 are Huffman-encoded; MPEG layer 3 is known as "mp3"
• 14 fixed bit rates for the encoded audio data stream on each layer; minimal rate: 32 kbit/sec for each layer; maximal rate: 448 kbit/sec (layer 1), 384 kbit/sec (layer 2), 320 kbit/sec (layer 3)
• Variable bit-rate support is possible only on layer 3

[Figure: MPEG audio encoder: uncompressed audio data pass through filter banks into 32 subbands; a psychoacoustic model controls the quantization; the quantized values are (Huffman-)coded and multiplexed into the compressed audio data stream]

Audio Data Stream

• MPEG specifies a syntax for interleaved audio and video streams, e.g. synchronization information
• The audio data stream consists of frames, divided into audio access units composed of slots
• Slots consist of 4 bytes (layer 1, lowest complexity) or 1 byte (other layers)
• A frame always consists of a fixed number of samples
• Audio access unit: the smallest audio sequence of compressed data that can be decoded independently of all other data
• The audio access units of one frame give a playing time between 8 ms (at 48 kHz) and 12 ms (at 32 kHz)
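The 8 ms and 12 ms playing times correspond to a fixed number of samples per frame. Assuming the 384 samples of an MPEG-1 layer-1 audio frame (this number is not stated on the slide), the endpoints can be reproduced:

```python
SAMPLES_PER_FRAME = 384   # MPEG-1 audio layer 1; an assumption, not from the slide

for rate_hz in (48_000, 44_100, 32_000):
    print(f"{rate_hz} Hz: {1000 * SAMPLES_PER_FRAME / rate_hz:.1f} ms")
# 48000 Hz: 8.0 ms
# 44100 Hz: 8.7 ms
# 32000 Hz: 12.0 ms
```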
MPEG-2

Why another MPEG standard?
• Higher data rates than MPEG-1, but a compatible extension of MPEG-1; target rate of 40 Mbit/sec
• Higher resolution, as needed for HDTV
• Support for a larger number of applications: MPEG-2 is defined in terms of extensible profiles and levels for each important application class, e.g.
– Main Profile: for digital video transmission (2 to 80 Mbit/sec) over cable, satellite and other broadcast channels, digital storage, HDTV, etc.
– High Profile: HDTV
– Scalable Profiles: compatible with terrestrial TV/HDTV and packet-network video systems; backward compatibility with MPEG-1 and other standards, e.g. H.261
• The encoding standard should be a toolkit rather than a flat procedure:
– Interlaced and non-interlaced frames
– Different color subsampling modes, e.g. 4:2:2 and 4:2:0
– Flexible quantization schemes (can be changed at the picture level)
– Scalable bit-streams

MPEG-2: Profiles and Levels

Profile properties:
• Simple Profile: no B-frames, 4:2:0, not scalable
• Main Profile: 4:2:0, not scalable
• SNR Scalable Profile: 4:2:0, SNR scalable
• Spatially Scalable Profile: 4:2:0, SNR or spatially scalable
• High Profile: 4:2:0 or 4:2:2, SNR or spatially scalable

Maximum data rates per profile and level:

Level (resolution)                       | Simple      | Main        | SNR Scalable | Spatially Scalable | High
Low (352 pixels/line, 288 lines)         | -           | ≤ 4 Mbit/s  | ≤ 4 Mbit/s   | -                  | -
Main (720 pixels/line, 576 lines)        | ≤ 15 Mbit/s | ≤ 15 Mbit/s | ≤ 15 Mbit/s  | -                  | ≤ 20 Mbit/s
High-1440 (1440 pixels/line, 1152 lines) | -           | ≤ 60 Mbit/s | -            | ≤ 60 Mbit/s        | ≤ 80 Mbit/s
High (1920 pixels/line, 1152 lines)      | -           | ≤ 80 Mbit/s | -            | -                  | ≤ 100 Mbit/s

Scalable bit-streams: a signal is composed of several streams (layers):
• The base (lower) layer is a fully decodable image
• An enhancement (upper) layer gives additional information: better resolution, a higher frame rate, or better quality
→ This corresponds to the JPEG hierarchical mode

Scalable Profiles

Scaling can be done on different parameters:
• Spatial scaling: frames are given in different resolutions. Base-layer frames are used in any case; upper-layer frames are stored as predictions from the base-layer frames → a single data stream can include different image formats (CIF, CCIR 601, HDTV, ...)
• SNR scaling: the error introduced on the lower layer by quantization is encoded and sent on the upper layer
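A toy illustration of the SNR-scaling idea (this is not MPEG-2 syntax; step size and values are made up): the base layer carries coarsely quantized coefficients, the enhancement layer carries the quantization error:

```python
import numpy as np

coeffs = np.array([12.7, -3.2, 0.8, 5.1])   # some transform coefficients

# Base layer: coarse quantization (step 4), fully decodable on its own
step = 4
base = np.round(coeffs / step) * step        # [12. -4.  0.  4.]

# Enhancement layer: the quantization error of the base layer
enhancement = coeffs - base                  # [0.7  0.8  0.8  1.1]

# A decoder receiving both layers reconstructs the higher-quality signal
print(base + enhancement)                    # [12.7 -3.2  0.8  5.1]
```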
MPEG-2: Effects of Interlacing

Prediction modes and motion compensation:
• Frame prediction: the current frame is predicted from the previous frame
• Dual-prime motion compensation:
– The top field of the current frame is predicted with two motion vectors coming from the top and bottom fields of the reference frame
– The bottom field of the previous frame and the top field of the current frame predict the bottom field of the current frame
• 16×8 motion compensation mode: a macro block may have two such partitions; a B-image macro block may have four

MPEG-2 Audio Standard

• Low-bit-rate coding of multi-channel audio
• Up to five full-bandwidth channels (left, right, center, 2 surround), plus an additional low-frequency enhancement channel and/or up to 7 commentary/multilingual channels
• Extension of MPEG-1 stereo and mono coding to half sampling rates (16-24 kHz) → improved quality for bit rates of 64 kbit/sec per channel
• The MPEG-2 multi-channel audio coding standard is backward compatible with the existing MPEG-1 audio standard; formal testing was organized for the proposed MPEG-2 multi-channel audio codecs and for non-backward-compatible codecs (rates 256-448 kbit/sec)

MPEG-2 Streams

• The MPEG-2 system defines how to combine audio, video and other data into single or multiple streams suitable for storage and transmission → syntactic and semantic rules for synchronizing the decoding and presentation of video and audio information while avoiding buffer over- or underflow
• Streams include timestamps for decoding, presentation and delivery
• Basic multiplexing step: system-level information is added to each stream, which is then packetized → Packetized Elementary Stream (PES)
• PESs are combined into a Program Stream or a Transport Stream (supporting a large number of applications):
– Program Stream: similar to the MPEG-1 stream → for error-free environments; variable packet lengths, constant end-to-end delay
– Transport Stream: combines PESs and independent time bases into a single stream → for use in lossy or noisy media; packet length 188 bytes including header → suited for digital TV and videophony over fiber, satellite, cable, ISDN, ATM
• Conversion between Program and Transport Stream is possible (and sometimes reasonable)

MPEG-4

• Originally an MPEG-3 standard for HDTV was planned → but MPEG-2 scaling proved sufficient, and the development of MPEG-3 was cancelled
• The MPEG-4 initiative started in September 1993 → very low bit-rate coding of audio-visual programs
• February 1997: the description of requirements for the MPEG-4 standard was approved
• Idea: development of fundamentally new algorithmic techniques
→ new sorts of interactivity (dynamic instead of static objects)
→ integration of natural and synthetic audio and video material
→ simultaneous use of material coming from different sources
→ model-based image coding of human interaction with multimedia environments
→ low-bit-rate speech coding, e.g. for use in GSM
• Basic elements:
– Coding tools for audio-visual objects: efficient compression, support of object-based interactivity, scalability and error robustness
– Formal methods for the syntactic description of coded audio-visual objects

Core Idea of MPEG-4: Object-based Representation

The representation of the video scene is understood as a composition of video objects with respect to their spatial and temporal relationship (the same holds for audio) → individual objects in a scene can be coded with different parameters, at different quality levels, and with different coding algorithms.

MPEG-4: Objects and Scenes

Primitive A/V objects, each coded independently, e.g.:
• A video object within a scene
• The background
• An instrument or a voice
(Video Objects (VO) and Audio Objects (AO), natural or synthetic)

A/V scene:
• A mixture of objects; the individual bitstreams are multiplexed and transmitted over one or more channels
• Each channel may have its own quality of service
• Synchronization information

Scene graph:
• A graph without cycles that embeds the objects in a coordinate system (including synchronization information)
• MPEG-4 provides a language for describing objects (oriented towards VRML, the Virtual Reality Modeling Language)
• Usable for video as well as for animated objects

[Figure: MPEG-4 stream composition and delivery: the compressed scene description, object descriptors and primitive media objects pass through the Delivery Interface (DAI), are decompressed at the receiver, and are assembled by the compositor]
[Figure: an example MPEG-4 scene composed of audio-visual objects]
MPEG-7

• Objectives:
– A flexible, extensible, multi-level standard framework for describing (not coding!) multimedia and for synchronizing content and descriptions
– Enable fast and efficient content searching, filtering and identification
– Define low-level features, structure, semantics, models, collections, creation, etc.
– Goal: to search, identify, filter and browse audiovisual content
• Description of contents:
– Descriptors describe basic characteristics of audiovisual content; examples: shape, color, texture, ...
– Description schemes describe combinations of descriptors; example: spoken content

Linking streams into the scene: each object carries an ObjectDescriptorID that refers to an object descriptor, roughly of the form:

  ObjectDescriptor {
    OD_ID_1
    List of { Elementary Stream Descriptors }
  }
  ObjectDescriptor {
    OD_ID_2
    List of { Elementary ...

[Figure: a simple MPEG-7 description example]

MPEG-21

• A solution for the access to and the management of digital media, e.g. offering, searching, buying, Digital Rights Management, ...

Digital Video Broadcasting

• 1991: foundation of the ELG (European Launching Group); goal: development of digital television in Europe
• 1993: renamed to DVB (Digital Video Broadcasting); goal: introduction of digital television based on
– satellite transmission (DVB-S)
– cable network technology (DVB-C)
– later also terrestrial transmission (DVB-T)

[Figure: DVB distribution: an MPEG-2/DVB container carrying SDTV, EDTV and HDTV content reaches integrated receiver-decoders, terrestrial receivers, set-top boxes and multimedia PCs via satellite (DVB-S), cable (DVB-C), terrestrial transmission (DVB-T), multipoint distribution systems, B-ISDN, ADSL, DVD, etc.]

DVB Container

DVB transmits an MPEG-2 container:
• High flexibility for the transmission of digital data; no restrictions regarding the type of information
• DVB Service Information specifies the content of a container:
– NIT (Network Information Table): lists the services of a provider; contains additional information for set-top boxes
– SDT (Service Description Table): list of names and parameters for each service within an MPEG multiplex channel
– EIT (Event Information Table): status information about the current transmission; additional information for set-top boxes
– TDT (Time and Date Table): update information for set-top boxes

[Figure: one MPEG-2/DVB container can carry a single HDTV channel, multiple EDTV channels (enhanced definition), multiple SDTV channels (standard definition), or multimedia data broadcasting]
DVB Worldwide

[Figure: DVB deployment worldwide]

Computer-based Animation

To animate = "to bring to life". Animation covers changes in:
• time-varying positions (motion dynamics)
• shape, color, transparency, structure and texture of an object (update dynamics)
as well as lighting, camera position, camera orientation and focus.

Basic concepts of animation:
• Input process: key frames, where animated objects are at extreme or characteristic positions, must be digitized from drawings; often post-processing by a computer is required
• Composition stage
• Inbetween process
• Changing colors

Composition Stage

• Foreground and background figures are combined to generate an individual frame
• Placing several low-resolution frames of an animation in an array leads to a trial film (pencil test) by use of the pan-zoom feature (available for some frame buffers)
• The frame buffer can take a part of an image (pan) and enlarge it to full screen (zoom)
• Continuity is achieved by repeating the pan-zoom process fast enough

Inbetween Process

• Composition of the intermediate frames between key frames
• Performed by linear interpolation (lerping) between start and end positions
• Lerping yields rather unrealistic motion in most cases; to achieve more realistic results, cubic spline interpolation can be used

[Figure: key frames with linearly interpolated frames in between]

Calculation of successive cubic splines (more realistic motion is achieved by cubic splines):

A function s is called a cubic interpolating spline for the points a = x0 < x1 < ... < xn+1 = b if
1. s is twice continuously differentiable, and
2. on each interval [xi, xi+1], i = 0, ..., n, it is a polynomial of degree 3.

The resulting curve is smooth because the polynomials have equal first and second derivatives at the points x0, x1, ..., xn+1.

Construction of the successive spline segments si(x):
• Let s0(x) be given
• Then s1(x) = a3·x³ + a2·x² + a1·x + a0 is constructed from the four conditions
  s1(x1) = a3·x1³ + a2·x1² + a1·x1 + a0 = f(x1)
  s1(x2) = a3·x2³ + a2·x2² + a1·x2 + a0 = f(x2)
  s1′(x1) = 3·a3·x1² + 2·a2·x1 + a1 = s0′(x1)
  s1″(x1) = 6·a3·x1 + 2·a2 = s0″(x1)
→ 4 equations for the 4 coefficients a3, a2, a1, a0

[Figure: spline segments S0(x), S1(x), S2(x), S3(x) through the points (x0, f(x0)), ..., (x4, f(x4))]
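A sketch of both inbetweening variants (NumPy/SciPy; the key-frame times and positions are made up for illustration):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Key frames: times and one coordinate of an animated object
key_t = np.array([0.0, 1.0, 2.0, 3.0])
key_x = np.array([0.0, 4.0, 1.0, 5.0])

frames_t = np.linspace(0.0, 3.0, 31)   # 10 inbetweens per key-frame interval

# Variant 1: linear interpolation (lerping) between the key frames
lerp_x = np.interp(frames_t, key_t, key_x)

# Variant 2: cubic spline, twice continuously differentiable,
# hence the smoother, more realistic motion described above
spline_x = CubicSpline(key_t, key_x)(frames_t)

print(lerp_x[:5])    # piecewise-linear positions
print(spline_x[:5])  # smooth positions through the same key frames
```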
Changing Colors

Two techniques are possible:
1. CLUT animation: changing the Color Look-Up Table (CLUT) of the frame buffer changes the colors of the image
2. New color information for each frame. Frame buffer: 640 × 512 pixels · 8 bits/pixel · 30 frames per sec = 78.6 Mbit/sec data rate for a complete update
The first technique is much faster than the second, since changing the CLUT requires the transmission of only 0.3-3 KBytes (here technique 1 is more than 300 times faster than technique 2).

Animation Languages

Categories of animation languages:
• Linear-list notations: events are described by a starting frame number, an ending frame number and an action (event). Example: 17, 31, C, ROTATE "HOUSE", 1, 45 means: between frames 17 and 31, rotate the object HOUSE around axis 1 by 45 degrees, determining the amount of rotation at each frame from table C
• General-purpose languages: embed animation capability within a programming language; the values of variables serve as parameters to the routines that perform the animation. E.g. ASAS, which is built on top of LISP: (grasp my-cube) makes the cube the current object; (cw 0.05) spins it clockwise by a small amount
• Graphical languages: describe animation in a more visual way than textual languages; they express, edit and make comprehensible the changes in an animation; explicit descriptions of actions are replaced by a picture of the action

Controlling of Animation

Techniques for controlling animations (independent of the language which describes the animation):

• Full explicit control: the most complete form of control, because all aspects are defined. Simple changes (scaling, translation, rotation) are specified, or key frames and interpolation methods are provided (either explicitly or by direct manipulation with mouse, joystick or data glove)

• Procedural control: communication between objects determines their properties:
– Physically-based systems: the position of one object may influence the motion of another (a ball cannot pass through a wall)
– Actor-based systems: actors pass their positions to other actors to affect their behavior (actor A stays behind actor B)

• Constraint-based systems: the "natural" way of moving from A to B is a straight line, i.e. linear motion. Very often, however, the motion is more complicated: the movement of objects is determined by other objects they are in contact with, and such compound motion may not be linear; it is modeled by constraints (a ball follows a pathway)

• Tracking live action: the trajectories of animated objects are generated by tracking live action:
– Rotoscoping: a film with real actors serves as a template; designers draw over the film, change the background and replace the human actors with their animated counterparts
– Indicators attached to key points of an actor's body: tracking the indicator positions provides the key points of the animation model
– Another example: a data glove measures the position and orientation of the hand and the flexion and extension of the fingers and finger parts; from this information, actions (e.g. movements) can be calculated

• Kinematics: description using the positions and velocities of objects. E.g.: at time t = 0 the CUBE is at the origin (0, 0, 0); it moves with a constant acceleration of 0.5 m/s² for 2 sec in the direction (1, 1, 4) - a kinematic description of the motion of a cube (see the sketch below)

• Dynamics: takes into consideration the physical laws that underlie the kinematics. E.g.: at time t = 0 the CUBE is at position (0 meters, 100 meters, 0 meters) and has a mass of 5 kg; the force of gravity acts on the cube (result in this case: the cube falls down)
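The kinematic example as a few lines of Python (starting at rest, so s = 1/2 · a · t²):

```python
import numpy as np

# Constant acceleration of 0.5 m/s^2 for 2 s from the origin in direction (1, 1, 4)
a_mag, t = 0.5, 2.0
direction = np.array([1.0, 1.0, 4.0])
direction /= np.linalg.norm(direction)       # unit vector of the motion

position = 0.5 * a_mag * t**2 * direction    # s = 1/2 * a * t^2
print(position)   # ≈ [0.236 0.236 0.943]: 1 m travelled along (1, 1, 4)
```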
Display of Animation

For the display of animations with raster systems, the animated objects have to be scan-converted to their pixmap in the frame buffer. This has to be done at least 10 (better: 20) times per second in order to give a reasonably smooth effect.

Problem: a frame rate of 20 pictures/sec requires the manipulation, scan conversion and display of an object within only 50 ms. Scan conversion should use only a small fraction of these 50 ms, since other operations (erasing, redrawing, etc.) have to be done as well.

Solution: double buffering. The frame buffer is divided into two images, each with half of the bits of the overall frame buffer ("pipeline"): while the operation (like rotating) and the scan conversion are processed for the second half of the pixmap, the first half is displayed, and vice versa.

[Figure: double-buffering timeline: while picture n is displayed, picture n+1 is prepared]

Transmission of Animation

Symbolic representation:
• Graphical descriptions (e.g. a circle) of an animated object (a ball) plus operations (roll)
• The animation is displayed at the receiver by scan-converting the objects to a pixmap
• The transmission rate is context dependent: it depends on the size of the symbolic representation structure, the size of the operation structure, and the number of animated objects and commands

Pixmap representation:
• Longer transmission times than with the symbolic representation, because of the large data size of the pixmaps
• Shorter display times, because no scan conversion is necessary at the receiver side
• Transmission rate = size of the pixmap · frame rate (a fixed transmission rate)

Conclusions

NTSC and PAL as television standards:
• Widespread, but they only belong to the Enhanced Definition TV systems
• For better quality, High Definition TV (HDTV) is needed
• Problem: compression is needed for HDTV systems

MPEG as the standard for video and audio compression:
• High-quality video/audio compression based on JPEG techniques
• Additionally: motion prediction between video frames
• Newer versions (MPEG-4) achieve further compression by considering objects

Video transmission:
• DVB as one standard for broadcasting SDTV, EDTV, HDTV or any MPEG content to the customer

Animation:
• A technique for artificially creating "videos"