Rob Brown (ice hockey) and Audio time stretching and pitch scaling: Difference between pages

No edit summary
 
Otey (talk | contribs)
Added Sinusoidal Modeling section. Moved Time Domain section to be before Speed Reading section. Added brief description of how pitch shifting may be accomplished.
 
'''Time stretching''' is the process of changing the speed or duration of an [[audio signal processing|audio signal]] without affecting its [[pitch (music)|pitch]].
{{Infobox Ice Hockey Player
'''Pitch scaling''' or '''pitch shifting''' is the reverse: the process of changing the pitch without affecting the speed. There are also more advanced methods used to change speed, pitch, or both at once, as a function of time.
| image =
| image_size =
| position = [[Winger (ice hockey)|Right Wing]]
| shoots = Left
| nickname =
| height_ft = 5
| height_in = 11
| weight_lb = 185
| played_for = '''[[American Hockey League|AHL]]'''<br>&nbsp;[[Chicago Wolves]]<br>'''[[International Hockey League (1945–2001)|IHL]]'''<br>&nbsp;Chicago Wolves<br>&nbsp;[[Kalamazoo Wings (1974–2000)|Kalamazoo Wings]]<br>&nbsp;[[Indianapolis Ice]]<br>&nbsp;[[Phoenix Roadrunners (IHL)|Phoenix Roadrunners]]<br>'''[[National Hockey League|NHL]]'''<br>&nbsp;[[Chicago Blackhawks]]<br>&nbsp;[[Dallas Stars]]<br>&nbsp;[[Los Angeles Kings]]<br>&nbsp;[[Hartford Whalers]]<br>&nbsp;[[Pittsburgh Penguins]]
| nationality = CAN
| birth_date = {{birth date and age|1968|4|10}}
| birth_place = [[Kingston, Ontario|Kingston]], [[Ontario|ON]], [[Canada|CAN]]
| draft = 67th overall
| draft_year = 1986
| draft_team = [[Pittsburgh Penguins]]
| career_start = [[1987–88 NHL season|1987]]
| career_end = [[2002–03 AHL season|2003]]
}}
'''Rob Brown''' (born on [[April 10]], 1968) is a [[retired]] [[professional]] [[ice hockey]] [[Winger (ice hockey)|right winger]] who played in the [[National Hockey League]] for eleven seasons between [[1987–88 NHL season|1987]] and [[1999–00 NHL season|2000]].


These processes are used, for instance, to match the pitches and tempos of two pre-recorded clips for mixing when the clips cannot be reperformed or resampled. (A drum track could be moderately resampled for tempo without adverse effects, but a pitched track could not). They are also used to create effects such as increasing the range of an instrument (like pitch shifting a guitar down an octave).
Brown was drafted 67th overall by the [[Pittsburgh Penguins]] in the [[1986 NHL Entry Draft]]. His best statistical NHL season was the [[1988–89 NHL season|1988–89 season]], when he played on a line with [[Mario Lemieux]]; he set career highs with 49 goals, 66 assists, 115 points, 24 power play goals, 6 game-winning goals, and a +27 plus/minus rating. Brown currently serves as color commentator for the Edmonton Oilers Pay-Per-View.


==Resampling==
The simplest way to change the duration or pitch of a [[digital signal|digital]] audio clip is to [[resampling|resample]] it. This is a mathematical operation that effectively rebuilds a continuous waveform from its samples and then samples that waveform again at a different rate. When the new samples are played at the original sampling frequency, the audio clip sounds faster or slower. Unfortunately, the frequencies in the sample are always scaled at the same rate as the speed. In other words, slowing down the recording lowers the pitch, speeding it up raises the pitch, and the two effects cannot be separated. This is analogous to speeding up or slowing down an [[analog signal|analog]] recording, like a [[phonograph record]] or [[Sound recording#Magnetic Recording|tape]], creating [[The Chipmunks#Recording technique|the chipmunk effect]].


== Phase vocoder ==
==Career statistics==
{{main|Phase vocoder}}
{| BORDER="0" CELLPADDING="1" CELLSPACING="0" width="75%" style="text-align:center"
One way of stretching the length of a signal without affecting the pitch is to build a [[phase vocoder]] after Flanagan, Golden, and Portnoff.
|- bgcolor="#e0e0e0"
! colspan="3" bgcolor="#ffffff" | &nbsp;
! rowspan="99" bgcolor="#ffffff" | &nbsp;
! colspan="5" | Regular&nbsp;Season
! rowspan="99" bgcolor="#ffffff" | &nbsp;
! colspan="5" | Playoffs
|- bgcolor="#e0e0e0"
! Season
! Team
! League
! GP
! G
! A
! Pts
! PIM
! GP
! G
! A
! Pts
! PIM
|-
| 1982–83
| St. Albert Sabres
| [[Alberta Midget Hockey League|AMHL]]
| 61
| 137
| 122
| 259
| 200
| --
| --
| --
| --
| --
|- bgcolor="#f0f0f0"
| 1983–84
| [[St. Albert Saints]]
| [[Alberta Junior Hockey League|AJHL]]
| 1
| 0
| 0
| 0
| 0
| --
| --
| --
| --
| --
|-
| [[1983–84 WHL season|1983–84]]
| [[Kamloops Blazers|Kamloops Jr. Oilers]]
| [[Western Hockey League|WHL]]
| 50
| 16
| 42
| 58
| 80
| 15
| 1
| 2
| 3
| 17
|- bgcolor="#f0f0f0"
| [[1984–85 WHL season|1984–85]]
| [[Kamloops Blazers]]
| WHL
| 60
| 29
| 50
| 79
| 95
| 15
| 8
| 8
| 26
| 28
|-
| [[1985–86 WHL season|1985–86]]
| Kamloops Blazers
| WHL
| 69
| 58
| 115
| 173
| 171
| 16
| 18
| 28
| 46
| 14
|- bgcolor="#f0f0f0"
| [[1986–87 WHL season|1986–87]]
| Kamloops Blazers
| WHL
| 63
| 76
| 136
| 212
| 101
| 5
| 6
| 5
| 11
| 6
|-
| [[1987–88 NHL season|1987–88]]
| [[Pittsburgh Penguins]]
| [[National Hockey League|NHL]]
| 51
| 24
| 20
| 44
| 56
| --
| --
| --
| --
| --
|- bgcolor="#f0f0f0"
| [[1988–89 NHL season|1988–89]]
| Pittsburgh Penguins
| NHL
| 68
| 49
| 66
| 115
| 118
| 11
| 5
| 3
| 8
| 22
|-
| [[1989–90 NHL season|1989–90]]
| Pittsburgh Penguins
| NHL
| 80
| 33
| 47
| 80
| 102
| --
| --
| --
| --
| --
|- bgcolor="#f0f0f0"
| [[1990–91 NHL season|1990–91]]
| Pittsburgh Penguins
| NHL
| 25
| 6
| 10
| 16
| 31
| --
| --
| --
| --
| --
|- bgcolor="#f0f0f0"
| 1990–91
| [[Hartford Whalers]]
| NHL
| 44
| 18
| 24
| 42
| 101
| 5
| 1
| 0
| 1
| 7
|-
| [[1991–92 NHL season|1991–92]]
| Hartford Whalers
| NHL
| 42
| 16
| 15
| 31
| 39
| --
| --
| --
| --
| --
|-
| 1991–92
| [[Chicago Blackhawks]]
| NHL
| 25
| 5
| 11
| 16
| 34
| 8
| 2
| 4
| 6
| 4
|- bgcolor="#f0f0f0"
| 1992–93
| [[Indianapolis Ice]]
| [[International Hockey League (1945–2001)|IHL]]
| 19
| 14
| 19
| 33
| 32
| 2
| 0
| 1
| 1
| 2
|-
| [[1992–93 NHL season|1992–93]]
| Chicago Blackhawks
| NHL
| 15
| 1
| 6
| 7
| 33
| --
| --
| --
| --
| --
|- bgcolor="#f0f0f0"
| 1993–94
| [[Kalamazoo Wings (1974–2000)|Kalamazoo Wings]]
| IHL
| 79
| 42
| 113
| 155
| 188
| 5
| 1
| 3
| 4
| 6
|-
| [[1993–94 NHL season|1993–94]]
| [[Dallas Stars]]
| NHL
| 1
| 0
| 0
| 0
| 0
| --
| --
| --
| --
| --
|- bgcolor="#f0f0f0"
| 1994–95
| [[Phoenix Roadrunners (IHL)|Phoenix Roadrunners]]
| IHL
| 69
| 34
| 73
| 107
| 135
| 9
| 4
| 12
| 16
| 0
|-
| [[1994–95 NHL season|1994–95]]
| [[Los Angeles Kings]]
| NHL
| 2
| 0
| 0
| 0
| 0
| --
| --
| --
| --
| --
|- bgcolor="#f0f0f0"
| 1995–96
| [[Chicago Wolves]]
| IHL
| 79
| 52
| 91
| 143
| 100
| 9
| 4
| 11
| 15
| 6
|-
| 1996–97
| Chicago Wolves
| IHL
| 76
| 37
| 80
| 117
| 98
| 4
| 2
| 4
| 6
| 16
|- bgcolor="#f0f0f0"
| [[1997–98 NHL season|1997–98]]
| Pittsburgh Penguins
| NHL
| 82
| 15
| 25
| 40
| 59
| 6
| 1
| 0
| 1
| 4
|-
| [[1998–99 NHL season|1998–99]]
| Pittsburgh Penguins
| NHL
| 58
| 13
| 11
| 24
| 16
| 13
| 2
| 5
| 7
| 8
|- bgcolor="#f0f0f0"
| [[1999–00 NHL season|1999–00]]
| Pittsburgh Penguins
| NHL
| 50
| 10
| 13
| 23
| 10
| 11
| 1
| 2
| 3
| 0
|-
| 2000–01
| Chicago Wolves
| IHL
| 75
| 24
| 53
| 77
| 99
| 16
| 4
| 13
| 17
| 26
|- bgcolor="#f0f0f0"
| [[2001–02 AHL season|2001–02]]
| Chicago Wolves
| [[American Hockey League|AHL]]
| 80
| 29
| 54
| 83
| 103
| 25
| 7
| 26
| 33
| 34
|-
| [[2002–03 AHL season|2002–03]]
| Chicago Wolves
| AHL
| 59
| 15
| 48
| 63
| 83
| 9
| 1
| 6
| 7
| 6
|- bgcolor="#e0e0e0"
! colspan="3" | NHL Totals
! 543
! 190
! 248
! 438
! 599
! 54
! 12
! 14
! 26
! 45
|}


Basic steps:
==International play==
#compute the instantaneous frequency/amplitude relationship of the signal using the [[Short-time Fourier transform|STFT]], which is the [[discrete Fourier transform]] of a short, overlapping and smoothly windowed block of samples;
*Played for Team Canada in the 1988 World Junior Championships.
#apply some processing to the Fourier transform magnitudes and phases (like resampling the FFT blocks); and
#perform an inverse STFT by taking the inverse Fourier transform on each chunk and adding the resulting waveform chunks.


The phase vocoder handles [[sinusoid]] components well, but early implementations introduced considerable smearing on [[transient (acoustics)|transient]] ("beat") waveforms at all non-integer compression/expansion rates, which renders the results phasey and diffuse. Recent improvements allow better quality results at all compression/expansion ratios but a residual [[smearing]] effect still remains.
'''International Statistics'''
{| BORDER="0" CELLPADDING="3" CELLSPACING="0"
|- ALIGN="center" bgcolor="#e0e0e0"
! Year
! Team
! Event
! GP
! G
! A
! Pts
! PIM
|- ALIGN="center"
| 1988
| Canada
| WJC
| 7
| 6
| 2
| 8
| 2
|}


The phase vocoder technique can also be used to perform pitch shifting, chorusing, timbre manipulation, harmonizing, and other unusual modifications, all of which can be changed as a function of time.
==External links==
*{{hockeydb|619|Rob Brown}}
*{{legendsofhockey|10181|Rob Brown}}


== Time domain ==
<br>
{{start box}}
{{s-ach|aw}}
{{succession box| before = [[Cliff Ronning]] | title = [[Four Broncos Memorial Trophy|WHL West Player of the Year]] | years = [[1985–86 WHL season|1986]], [[1986–87 WHL season|1987]] | after = [[Joe Sakic]]}}
{{succession box| before = [[Luc Robitaille]] | title = [[CHL Player of the Year]] | years = 1987 | after = [[Joe Sakic]]}}
{{succession box| before = [[Tony Hrkac]] | title = [[James Gatschene Memorial Trophy]] | years = 1994 | after = [[Tommy Salo]]}}
{{succession box| before = [[Tony Hrkac]]<br>[[Stephane Morin]] | title = [[Leo P. Lamoureux Memorial Trophy]] | years = 1994<br>1996, 1997 | after = [[Stephane Morin]]<br>[[Patrice Lefebvre]]}}
{{end box}}


[[Rabiner]] and Schafer in 1978 put forth an alternate solution that works in the [[time domain]]: attempt to find the [[periodic signal|period]] (or equivalently the [[fundamental frequency]]) of a given section of the wave using some [[pitch detection algorithm]] (commonly the peak of the signal's [[autocorrelation]], or sometimes [[cepstrum|cepstral]] processing), and [[fade (audio engineering)|crossfade]] one period into another.
{{DEFAULTSORT:Brown, Rob}}
This is called [[time domain harmonic scaling]] or the [[synchronized overlap-add method]] and performs somewhat faster than the phase vocoder on slower machines but fails when the autocorrelation mis-estimates the period of a signal with complicated harmonics (such as [[orchestra]]l pieces).
[[Category:1968 births]]
[[Adobe Audition]] (formerly Cool Edit Pro) seems to solve this by looking for the period closest to a center period that the user specifies, which should be an integer multiple of the tempo, and between 30 [[hertz|Hz]] and the lowest bass frequency. For a 120 [[beats per minute|bpm]] tune, use 48 Hz because 48 Hz = 2,880 cycles/minute = 24 cycles/beat * 120 bpm.{{Fact|date=February 2007}}
[[Category:Calder Cup champions]]
[[Category:Canadian ice hockey right wingers]]
[[Category:Canadians of British descent]]
[[Category:Chicago Blackhawks players]]
[[Category:Chicago Wolves players]]
[[Category:Dallas Stars players]]
[[Category:Hartford Whalers players]]
[[Category:Ice hockey personnel from Ontario]]
[[Category:Kamloops Blazers alumni]]
[[Category:Kamloops Junior Oilers alumni]]
[[Category:Living people]]
[[Category:Los Angeles Kings players]]
[[Category:National Hockey League All-Stars]]
[[Category:National Hockey League players with 100 point seasons]]
[[Category:People from Kingston, Ontario]]
[[Category:Pittsburgh Penguins draft picks]]
[[Category:Pittsburgh Penguins players]]


This is much more limited in scope than the phase vocoder based processing, but can be made much less processor intensive, for real-time applications. It provides the most coherent results for single-pitched sounds like voice or musically monophonic instrument recordings.
[[fr:Rob Brown (hockey sur glace)]]

High-end commercial audio processing packages either combine the two techniques (for example by separating the signal into sinusoid and transient waveforms), or use other techniques based on the [[wavelet]] transform, or artificial neural network processing, producing the highest-quality time stretching.

== Sinusoidal/Spectral Modeling ==

Another alternative method for time stretching relies on a [[Spectral_modelling_synthesis|spectral model]] of the signal. In this method, peaks are identified in frames of the [[Short-time Fourier transform|STFT]] of the signal, and sinusoidal "tracks" are created by connecting peaks in adjacent frames. The tracks are then re-synthesized at a new time scale. This method can yield good results on both polyphonic and percussive material, especially when the signal is separated into sub-bands. However, this method is more computationally demanding than other methods.

== Speed reading ==

Time stretching can be used with [[audio book]]s and recorded lectures.
Slowing down may improve comprehension of foreign languages[http://www.enounce.com/whatistsm.shtml].

While one might expect speeding up to reduce comprehension,
Herb Friedman says that "Experiments have shown that the brain works most efficiently if the information rate through the ears--via speech--is the "average" reading rate, which is about 200-300 wpm (words per minute), yet the average rate of speech is in the neighborhood of 100-150 wpm."
[http://www.atarimagazines.com/creative/v9n7/122_Variable_speech.php ]

Speeding up audio is seen as the equivalent of "speed reading"
[http://www.nevsblog.com/2006/06/23/listen-to-podcasts-in-half-the-time/ ]
[http://cid.lib.byu.edu/?p=128 ].

== Other ==

Time stretching is often used to adjust [[Radio commercial]]s
[http://www.tvtechnology.com/features/audio_notes/f_audionotes.shtml] and the audio of [[Television advertisement]]s[http://www.atarimagazines.com/creative/v9n7/122_Variable_speech.php] to fit exactly into the 30 or 60 seconds available.
(A [[telecine]] pulldown pattern adjusts the video).

== Pitch scaling ==

These techniques can also be used to [[transposition (music)|transpose]] an audio sample while holding speed or duration constant. This may be accomplished by time stretching and then resampling back to the original length. Alternatively, the frequency of the sinusoids in a sinusoidal model may be altered directly, and the signal reconstructed at the appropriate time scale.

Transposing can be called '''[[pitch (music)|pitch]] scaling''' or '''[[pitch shifting]]''', depending on perspective.

For example, one could move the frequency of every note up by a perfect fifth, keeping the tempo the same.
One can view this transposition as "pitch shifting", "shifting" each note up 7 keys on a piano keyboard, or adding a fixed amount on the [[Mel scale]], or adding a fixed amount in linear [[pitch space]].
One can view the same transposition as "pitch scaling", "scaling" (multiplying) the frequency of every note by 3/2.

Musical transposition preserves the ratios of the [[harmonic]] frequencies that determine the sound's [[timbre]], unlike the ''frequency shift'' performed by [[amplitude modulation]], which adds a fixed frequency offset to the frequency of every note. (In theory one could perform a literal ''pitch scaling'' in which the musical pitch space location is scaled [a higher note would be shifted at a greater interval in linear pitch space than a lower note], but that is highly unusual, and not musical).

Time domain processing works much better here, as smearing is less noticeable, but scaling vocal samples distorts the [[formant]]s into a sort of [[Alvin and the Chipmunks]]-like effect, which may be desirable or undesirable.
A process that preserves the formants and character of a voice involves analyzing the signal with a [[vocoder|channel vocoder]] or [[Linear predictive coding|LPC]] vocoder plus any of several [[pitch detection algorithm]]s and then resynthesizing it at a different fundamental frequency.

==See also==
{{Wikibooks|Phase vocoder and encoder in MATLAB}}
* [[Audio signal processing]]
* [[Pitch control]]
* [[Sound effect]]s

== External links ==

*[http://sourceforge.net/projects/mffmtimescale/ MFFM Time Scale Modification for Audio] Implementation of the WSOLA algorithm for time scale modification of audio without artifact introduction
*[http://www.lownorth.nl/software/products/TimeToy.html Timetoy (Mac OS X)] A fun, easy-to-use timestretcher for Mac OS X by LowNorth
*[http://www.dspdimension.com/admin/time-pitch-overview/ Time Stretching and Pitch Shifting Overview] A comprehensive overview of current time and pitch modification techniques by Stephan Bernsee (Quite old, 1999)
*[http://www.dspdimension.com/admin/pitch-shifting-using-the-ft/ Stephan Bernsee's smbPitchShift C source code] C source code for doing frequency domain pitch manipulation
*[http://www.panix.com/~jens/pvoc-dolson.par The Phase Vocoder: A Tutorial] - A good description of the phase vocoder
*[http://www.ee.columbia.edu/~dpwe/papers/LaroD99-pvoc.pdf New Phase-Vocoder Techniques for Pitch-Shifting, Harmonizing and Other Exotic Effects]
*[http://www.ircam.fr/equipes/analyse-synthese/roebel/paper/dafx2003.pdf A new Approach to Transient Processing in the Phase Vocoder]
*[http://www.fon.hum.uva.nl/praat/manual/PSOLA.html PSOLA Synthesis], [http://www.ee.columbia.edu/~dpwe/papers/HejMus91-solafs.pdf SOLAFS Synthesis] - Two specific methods of time domain [[time domain harmonic scaling|TDHS]] or [[synchronous overlap-add processing|SOLA]] processing.
*[http://www.aes.org/ Audio Engineering Society]
*Original E2 article (http://everything2.com/index.pl?node_id=1074923)
*[http://www.dspdimension.com/data/html/dirac.html DSPdimension: DIRAC library]
*[http://www.time-stretching.com zplane.development: élastique SDKs]
*http://www.bdti.com/faq/dsp_faq.htm - comp.dsp FAQ
*[http://www.surina.net/soundtouch SoundTouch library] - An open-source implementation of time/pitch scaling algorithms. SoundStretch came from here. Used in the cross-platform [[Audacity]] editor.
* [http://keizai.yokkaichi-u.ac.jp/~ikeda/research/picola.html PICOLA and TDHS]
*[http://sourceforge.net/projects/wavmasher wavMasher] - Time and pitch scaling software
*[http://hypermammut.sourceforge.net/paulstretch/ PaulStretch] A program intended only for extreme time stretching (like 50x)
*[http://substance-night.it/4BPS/pitchshift.html 4 Band Shifter] An open source VST plugin based on Bernsee's code that shifts the pitch on 4 independent, user-definable frequency bands.
*[http://sourceforge.net/projects/sbsms/ sbsms] An open source sub-band-sinusoidal modeling library/tool for time-stretching/pitch-shifting
[[Category:Audio engineering]]
[[Category:Digital signal processing]]
[[Category:Sound effects]]

[[de:Pitch Shifter]]
[[it:Timestretching]]
[[sv:Timestretch]]

Revision as of 20:41, 10 October 2008

Time stretching is the process of changing the speed or duration of an audio signal without affecting its pitch. Pitch scaling or pitch shifting is the reverse: the process of changing the pitch without affecting the speed. There are also more advanced methods used to change speed, pitch, or both at once, as a function of time.

These processes are used, for instance, to match the pitches and tempos of two pre-recorded clips for mixing when the clips cannot be reperformed or resampled. (A drum track could be moderately resampled for tempo without adverse effects, but a pitched track could not). They are also used to create effects such as increasing the range of an instrument (like pitch shifting a guitar down an octave).

Resampling

The simplest way to change the duration or pitch of a digital audio clip is to resample it. This is a mathematical operation that effectively rebuilds a continuous waveform from its samples and then samples that waveform again at a different rate. When the new samples are played at the original sampling frequency, the audio clip sounds faster or slower. Unfortunately, the frequencies in the sample are always scaled at the same rate as the speed. In other words, slowing down the recording lowers the pitch, speeding it up raises the pitch, and the two effects cannot be separated. This is analogous to speeding up or slowing down an analog recording, like a phonograph record or tape, creating the chipmunk effect.
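
The coupling of speed and pitch under resampling can be illustrated with a minimal sketch. It assumes linear interpolation in place of a proper band-limited reconstruction filter, and the function names `resample_linear` and `count_upward_zero_crossings` are hypothetical helpers chosen for this illustration.

```python
import math

def resample_linear(samples, speed):
    """Read through `samples` at `speed` times the original rate.

    Linear interpolation is an assumption made for brevity; practical
    resamplers use band-limited interpolation to avoid aliasing.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    n_out = int((len(samples) - 1) / speed) + 1
    out = []
    for i in range(n_out):
        pos = i * speed                      # fractional read position
        k = int(pos)
        frac = pos - k
        nxt = samples[min(k + 1, len(samples) - 1)]
        out.append((1.0 - frac) * samples[k] + frac * nxt)
    return out

def count_upward_zero_crossings(samples):
    """Rough frequency indicator: negative-to-non-negative transitions."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
```

Playing a 100 Hz tone through `resample_linear(tone, 2.0)` at the original sampling frequency halves its duration and roughly doubles the number of zero crossings per second, so the shorter clip is also about an octave higher.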

Phase vocoder

One way of stretching the length of a signal without affecting the pitch is to build a phase vocoder after Flanagan, Golden, and Portnoff.

Basic steps:

  1. compute the instantaneous frequency/amplitude relationship of the signal using the STFT, which is the discrete Fourier transform of a short, overlapping and smoothly windowed block of samples;
  2. apply some processing to the Fourier transform magnitudes and phases (like resampling the FFT blocks); and
  3. perform an inverse STFT by taking the inverse Fourier transform on each chunk and adding the resulting waveform chunks.
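
The three steps above can be sketched as follows. This is an illustrative, unoptimized outline rather than a reference implementation: it assumes a periodic Hann window, a naive O(N²) discrete Fourier transform instead of an FFT, and the common "phase propagation" form of step 2. The function name `phase_vocoder_stretch` and its parameters `n_fft` and `hop_a` are hypothetical.

```python
import cmath
import math

def phase_vocoder_stretch(x, stretch, n_fft=64, hop_a=8):
    """Time-stretch `x` by `stretch` (2.0 = twice as long), keeping pitch."""
    if len(x) < n_fft:
        raise ValueError("signal shorter than one analysis frame")
    hop_s = int(round(hop_a * stretch))           # synthesis hop size
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    twiddle = [cmath.exp(-2j * math.pi * m / n_fft) for m in range(n_fft)]
    omega = [2 * math.pi * k / n_fft for k in range(n_fft)]  # bin centre, rad/sample
    n_frames = (len(x) - n_fft) // hop_a + 1
    out = [0.0] * ((n_frames - 1) * hop_s + n_fft)
    norm = [0.0] * len(out)
    prev_phase = [0.0] * n_fft
    synth_phase = [0.0] * n_fft
    for m in range(n_frames):
        # Step 1: DFT of a short, overlapping, smoothly windowed block.
        frame = [x[m * hop_a + n] * window[n] for n in range(n_fft)]
        spec = [sum(frame[t] * twiddle[(k * t) % n_fft] for t in range(n_fft))
                for k in range(n_fft)]
        # Step 2: estimate each bin's instantaneous frequency from its phase
        # advance, then advance the output phase by that frequency times hop_s.
        for k in range(n_fft):
            phase = cmath.phase(spec[k])
            if m == 0:
                synth_phase[k] = phase
            else:
                dev = phase - prev_phase[k] - omega[k] * hop_a
                dev = (dev + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
                synth_phase[k] += (omega[k] + dev / hop_a) * hop_s
            prev_phase[k] = phase
            spec[k] = cmath.rect(abs(spec[k]), synth_phase[k])
        # Step 3: inverse DFT of the chunk, window again, and overlap-add.
        for t in range(n_fft):
            sample = sum(spec[k] * twiddle[(-k * t) % n_fft]
                         for k in range(n_fft)).real / n_fft
            out[m * hop_s + t] += sample * window[t]
            norm[m * hop_s + t] += window[t] ** 2
    # Divide out the accumulated squared window so the gain stays near unity.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

Because every bin's phase is advanced independently, this sketch exhibits the transient smearing described in this section; it is meant only to make the data flow of the three steps concrete.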

The phase vocoder handles sinusoid components well, but early implementations introduced considerable smearing on transient ("beat") waveforms at all non-integer compression/expansion rates, which renders the results phasey and diffuse. Recent improvements allow better quality results at all compression/expansion ratios but a residual smearing effect still remains.

The phase vocoder technique can also be used to perform pitch shifting, chorusing, timbre manipulation, harmonizing, and other unusual modifications, all of which can be changed as a function of time.

Time domain

Rabiner and Schafer in 1978 put forth an alternate solution that works in the time domain: attempt to find the period (or equivalently the fundamental frequency) of a given section of the wave using some pitch detection algorithm (commonly the peak of the signal's autocorrelation, or sometimes cepstral processing), and crossfade one period into another. This is called time domain harmonic scaling or the synchronized overlap-add method and performs somewhat faster than the phase vocoder on slower machines but fails when the autocorrelation mis-estimates the period of a signal with complicated harmonics (such as orchestral pieces). Adobe Audition (formerly Cool Edit Pro) seems to solve this by looking for the period closest to a center period that the user specifies, which should be an integer multiple of the tempo, and between 30 Hz and the lowest bass frequency. For a 120 bpm tune, use 48 Hz because 48 Hz = 2,880 cycles/minute = 24 cycles/beat * 120 bpm.[citation needed]
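
The period-finding step can be illustrated with a minimal sketch of autocorrelation-based estimation. It is not the algorithm of any particular product: `estimate_period` and `center_frequency_hz` are hypothetical names, the search is restricted to a caller-supplied lag range (loosely mirroring the idea of searching near a user-specified center period), and the second function only reproduces the arithmetic of the 120 bpm example.

```python
import math

def estimate_period(samples, min_lag, max_lag):
    """Return the lag (in samples) with the highest mean autocorrelation.

    Restricting the lag range matters: lags at integer multiples of the
    true period score almost as well and can cause octave errors.
    """
    if not 0 < min_lag <= max_lag < len(samples):
        raise ValueError("need 0 < min_lag <= max_lag < len(samples)")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        score = sum(samples[i] * samples[i + lag] for i in range(n)) / n
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def center_frequency_hz(bpm, cycles_per_beat):
    """Frequency whose period divides a beat into `cycles_per_beat` cycles."""
    return cycles_per_beat * bpm / 60.0
```

For a sine wave with a period of 50 samples, `estimate_period(tone, 20, 80)` selects a lag of 50; widening the range to include 100 makes the choice between 50 and 100 fragile, which is the kind of mis-estimation described above. `center_frequency_hz(120, 24)` evaluates the 24 cycles/beat × 120 bpm example.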

This is much more limited in scope than the phase vocoder based processing, but can be made much less processor intensive, for real-time applications. It provides the most coherent results for single-pitched sounds like voice or musically monophonic instrument recordings.

High-end commercial audio processing packages either combine the two techniques (for example by separating the signal into sinusoid and transient waveforms), or use other techniques based on the wavelet transform, or artificial neural network processing, producing the highest-quality time stretching.

Sinusoidal/Spectral Modeling

Another alternative method for time stretching relies on a spectral model of the signal. In this method, peaks are identified in frames of the STFT of the signal, and sinusoidal "tracks" are created by connecting peaks in adjacent frames. The tracks are then re-synthesized at a new time scale. This method can yield good results on both polyphonic and percussive material, especially when the signal is separated into sub-bands. However, this method is more computationally demanding than other methods.
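
The peak-picking and track-building stages can be sketched as below. This is a simplified illustration under stated assumptions: peaks are plain local maxima above a threshold, tracks are linked greedily to the nearest peak in the next frame, and resynthesis is omitted. The names `pick_peaks`, `link_tracks`, `threshold` and `max_jump` are hypothetical.

```python
def pick_peaks(magnitudes, threshold):
    """Indices of local maxima above `threshold` in one STFT magnitude frame."""
    return [k for k in range(1, len(magnitudes) - 1)
            if magnitudes[k] > threshold
            and magnitudes[k] > magnitudes[k - 1]
            and magnitudes[k] >= magnitudes[k + 1]]

def link_tracks(frames_peaks, max_jump):
    """Greedily connect peaks in adjacent frames into sinusoidal tracks.

    Each track is a list of (frame_index, bin_index) pairs. A track ends
    when no unused peak lies within `max_jump` bins of its last peak;
    unmatched peaks start new tracks.
    """
    tracks = []
    active = []
    for f, peaks in enumerate(frames_peaks):
        unused = list(peaks)
        still_active = []
        for track in active:
            last_bin = track[-1][1]
            candidates = [p for p in unused if abs(p - last_bin) <= max_jump]
            if candidates:
                best = min(candidates, key=lambda p: abs(p - last_bin))
                unused.remove(best)
                track.append((f, best))
                still_active.append(track)
        for p in unused:
            track = [(f, p)]
            tracks.append(track)
            still_active.append(track)
        active = still_active
    return tracks
```

Once tracks exist, time stretching amounts to resynthesizing each track's sinusoid with its frame times multiplied by the stretch factor, which is where most of the additional computational cost arises.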

Speed reading

Time stretching can be used with audio books and recorded lectures. Slowing down may improve comprehension of foreign languages[1].

While one might expect speeding up to reduce comprehension, Herb Friedman says that "Experiments have shown that the brain works most efficiently if the information rate through the ears--via speech--is the "average" reading rate, which is about 200-300 wpm (words per minute), yet the average rate of speech is in the neighborhood of 100-150 wpm." [2]

Speeding up audio is seen as the equivalent of "speed reading" [3] [4].

Other

Time stretching is often used to adjust radio commercials [5] and the audio of television advertisements [6] to fit exactly into the 30 or 60 seconds available. (A telecine pulldown pattern adjusts the video).

Pitch scaling

These techniques can also be used to transpose an audio sample while holding speed or duration constant. This may be accomplished by time stretching and then resampling back to the original length. Alternatively, the frequency of the sinusoids in a sinusoidal model may be altered directly, and the signal reconstructed at the appropriate time scale.

Transposing can be called pitch scaling or pitch shifting, depending on perspective.

For example, one could move the frequency of every note up by a perfect fifth, keeping the tempo the same. One can view this transposition as "pitch shifting", "shifting" each note up 7 keys on a piano keyboard, or adding a fixed amount on the Mel scale, or adding a fixed amount in linear pitch space. One can view the same transposition as "pitch scaling", "scaling" (multiplying) the frequency of every note by 3/2.
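
The two views of the same transposition can be related numerically. The sketch below assumes twelve-tone equal temperament for the keyboard steps, in which seven semitones give a ratio close to, but not exactly, the just perfect fifth of 3/2; the function names are hypothetical.

```python
import math

def semitones_to_ratio(semitones):
    """Frequency multiplier for a shift of `semitones` equal-tempered steps."""
    return 2.0 ** (semitones / 12.0)

def ratio_to_semitones(ratio):
    """Equal-tempered semitone distance corresponding to a frequency ratio."""
    return 12.0 * math.log2(ratio)
```

Here `semitones_to_ratio(7)` is approximately 1.498, slightly below 3/2, and `ratio_to_semitones(1.5)` is slightly more than 7, so "shifting up 7 keys" and "scaling by 3/2" agree only approximately on an equal-tempered keyboard.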

Musical transposition preserves the ratios of the harmonic frequencies that determine the sound's timbre, unlike the frequency shift performed by amplitude modulation, which adds a fixed frequency offset to the frequency of every note. (In theory one could perform a literal pitch scaling in which the musical pitch space location is scaled [a higher note would be shifted at a greater interval in linear pitch space than a lower note], but that is highly unusual, and not musical).
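
The difference between scaling and shifting can be checked on a small harmonic series. This is only a numerical illustration with made-up frequencies; the helper names are hypothetical.

```python
def scale_frequencies(freqs, ratio):
    """Musical transposition: multiply every frequency by the same ratio."""
    return [f * ratio for f in freqs]

def shift_frequencies(freqs, offset_hz):
    """Frequency shift: add the same offset in hertz to every frequency."""
    return [f + offset_hz for f in freqs]

def harmonic_ratios(freqs):
    """Each frequency relative to the lowest one in the list."""
    return [f / freqs[0] for f in freqs]
```

Starting from harmonics at 100, 200 and 300 Hz, scaling by 3/2 gives 150, 300 and 450 Hz, which still stand in the ratios 1:2:3. Adding 50 Hz gives 150, 250 and 350 Hz, whose ratios are no longer whole numbers, so the result is inharmonic and the timbre changes.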

Time domain processing works much better here, as smearing is less noticeable, but scaling vocal samples distorts the formants into a sort of Alvin and the Chipmunks-like effect, which may be desirable or undesirable. A process that preserves the formants and character of a voice involves analyzing the signal with a channel vocoder or LPC vocoder plus any of several pitch detection algorithms and then resynthesizing it at a different fundamental frequency.
