Tuesday, November 17, 2015

Writing Audio Files With Python, part 1

There aren't really any great Python audio libraries, at least not any that I can find that are targeted at a complete novice.  Python does at least ship with the "wave" module, which can handle the metadata needed to write audio in the WAV format.

Before we jump into an example, we should note that there are at least two main ways of describing an audio stream:

  • You can describe "these are the notes that are played at this time".  Think sheet music, or an FFT representation.  This is probably how you think about music.
  • You can describe "this is the 'pressure' of the music at this time" (we'll explain how 'pressure' works later).  This is called "sampling", and it's how basically all recordings of music work.  The main problem we face is converting from the first representation to the second.
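To make the difference concrete, here's a small sketch.  The names here (notes, notes_to_samples) are made up for illustration, not any standard format: a "note" representation is a list of (pitch, duration) pairs, while a sampled representation is just one long array of amplitude values.

```python
import math

# "Note" representation: what's played, and when -- think sheet music.
notes = [(440.0, 1.0), (261.0, 1.0)]  # (frequency in Hz, duration in seconds)

SAMPLES_PER_SECOND = 4400

def notes_to_samples(notes):
    # "Sampled" representation: the wave's amplitude at evenly spaced times.
    samples = []
    for pitch, duration in notes:
        for i in range(int(duration * SAMPLES_PER_SECOND)):
            t = float(i) / SAMPLES_PER_SECOND
            samples.append(math.sin(2 * math.pi * pitch * t))
    return samples

samples = notes_to_samples(notes)
print(len(samples))  # 8800: one amplitude value per sample, two seconds' worth
```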
Here's a "Hello World" of using the Python WAV writer:

# Copyright 2015
import wave
SAMPLES_PER_SECOND = 4400
w = wave.open("/media/sf_tl-share/test.wav", mode="w")
w.setnchannels(1)
w.setsampwidth(1)
w.setframerate(SAMPLES_PER_SECOND)
w.setnframes(SAMPLES_PER_SECOND * 10)
for i in xrange(SAMPLES_PER_SECOND * 10):
    w.writeframes("a" if (i % 10 < 5) else "A")
w.close()
Fairly straightforward.  A few notes:
  • CDs normally have 44100 samples per second.  This is enough to fully encompass the human hearing range: the Nyquist-Shannon sampling theorem says a sample rate of R can represent frequencies up to R/2, and human hearing tops out around 20kHz.  To decrease the file size, we're using only 4400 samples per second here.
  • We are setting each frame's data to be a single byte.  Characters are the easiest way to represent bytes in Python, at least for a Hello World example.  It would be better to use integers internally and the struct module when writing them.  (WAV files are little-endian, and we're currently using 8-bit sound, represented by 8-bit unsigned integers.)
  • This is a "square wave".  We would have a "smoother" sound if we used a sine wave to generate the data points.  We'll get into Fast Fourier Transforms and the explanation of why these sound different later.
  • What are these "samples" anyhow?  They're samples of "pressure": a sound wave is a fluctuation in air pressure over time, and each sample records the amplitude of that fluctuation at one instant.  A pure tone is a smooth sine wave (a sine wave at 440Hz is concert A); more complicated sounds are sums of such waves.
  • For performance reasons, we need to batch the writes that we pass to writeframes.  Here's a version that does that, and swaps the square wave for a sine wave:
import numpy as np
import struct
import wave
SAMPLES_PER_SECOND = 4410
w = wave.open("/media/sf_tl-share/test.wav", mode="w")
w.setnchannels(1)
w.setsampwidth(1)
w.setframerate(SAMPLES_PER_SECOND)
w.setnframes(SAMPLES_PER_SECOND * 10)
 
def write_one_second(pitch):
  frames = []
  # Slightly less than a second of sine wave...
  for i in xrange(SAMPLES_PER_SECOND - 30):
    pressure = 32 + 16 * np.sin(2 * np.pi * i * pitch / SAMPLES_PER_SECOND)
    # struct.pack wants an integer, so round the float sample first.
    frames.append(struct.pack("<B", int(round(pressure))))
  # ...followed by 30 samples at the midpoint (silence) to separate the notes.
  for j in xrange(30):
    frames.append(struct.pack("<B", 32))
  w.writeframes("".join(frames))
 
for i in xrange(2):
    write_one_second(440)
    write_one_second(261)
    write_one_second(261 * 1.06)
    write_one_second(261 * 1.06 * 1.06)
    write_one_second(440 / 1.06)
w.close()

We get a few artifacts, but it mostly resembles what you'd expect the notes A, C, C#, D, and G# to sound like, and it runs fairly quickly.
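Where does the 1.06 factor come from?  In equal temperament, twelve semitones make an octave (a doubling of frequency), so each semitone multiplies the frequency by the twelfth root of 2, roughly 1.0595.  A quick sanity check:

```python
# Each equal-temperament semitone scales frequency by 2**(1/12).
SEMITONE = 2 ** (1 / 12.0)
print(round(SEMITONE, 4))             # 1.0595 -- the "1.06" used above
print(round(440 / SEMITONE ** 9, 1))  # 261.6 -- middle C, nine semitones below A440
```

So 261 is (approximately) middle C, and multiplying by 1.06 repeatedly walks up the chromatic scale, one slightly-sharp semitone at a time.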

A few notes going forward:

  • It would probably be best to store the whole signal in a numpy array and convert it to bytes at the end.  That's probably overkill for this example, but it would let us apply more transforms.
  • As a general rule, using 44100 samples per second will eliminate a lot of the noise artifacts.
  • The "write_one_second" function doesn't let us have multiple notes play at once, but that's an easy change.
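The first and last points can be sketched together.  This is a hypothetical rework (the tone helper and the C-major pitches are my own choices, not code from above): build each note as a numpy array of floats, sum arrays to get a chord, and only quantize to unsigned bytes at the very end.

```python
import numpy as np

SAMPLES_PER_SECOND = 4410

def tone(pitch, seconds):
    # One sine wave, as floats in [-1, 1].
    t = np.arange(int(seconds * SAMPLES_PER_SECOND)) / float(SAMPLES_PER_SECOND)
    return np.sin(2 * np.pi * pitch * t)

# A chord is just the sum of its notes, rescaled to stay in range.
chord = tone(261.63, 1.0) + tone(329.63, 1.0) + tone(392.0, 1.0)  # C major
chord /= np.max(np.abs(chord))

# Quantize to unsigned 8-bit only at the end; these bytes could then be
# passed straight to writeframes.
frames = np.uint8(128 + 127 * chord).tobytes()
print(len(frames))  # 4410 bytes: one per sample
```

Keeping everything as floats until the final step avoids accumulating rounding error, and summing arrays is all it takes to mix simultaneous notes.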
