Sound Editing For Film
Preparing a Picture Edit for Sound Editing and Mixing
This brief guide applies to any non-linear picture editing system, including Avid, Final Cut Pro and Premiere.
To bring the sound for a given project up to broadcast or theatrical release specifications, it is necessary to spend some time in a room that is designed specifically for sound work. Such a room is acoustically tuned and fitted with specially selected and placed speaker systems. One of these rooms would also contain a variety of tools for working with sound, such as an audio editing system, a mixing board, equalizers, compressors, noise reduction units and reverbs. A room that is designed and equipped for creating the final balance of the sound is called a mix room. A room that is designed for preparing the sound for the final mix, such as editing dialogue or adding sound effects or music, is called an edit room. Depending upon the nature of a given project, one or both types of rooms will be needed.
It should be kept in mind that projects that screen in theaters will play at louder volumes than are used during editing. This means that any small problems are magnified for the viewing public. It is considered prudent practice to at least screen the film or video in a theatrical mix room to have a clear idea of how the piece will finally play.
Potential sound issues in a picture editing room
Most picture-editing systems are optimized for, well, picture editing. There are a number of practices that promote good sound that are not strictly necessary for picture editing.
The first area of concern is the listening environment. A good space for sound work is usually acoustically isolated from the rooms around it (sound-proofed) and includes some sort of acoustical treatment to the ceiling, floor and walls of the room. The speakers (monitors) that are used should be of high quality and be a part of a well-designed and implemented audio chain including a mixer and amplifiers. The room should not be noisy: items such as hard drives and other equipment with fans should be isolated from the operator, and other noisy editing stations should not be in operation in the same room.
Regarding the editing workstation itself, it must be of high quality and optimized for audio. Audio interfaces to the system should be specially designed for professional quality transfer of sound. This means the little Walkman-style jack on the back of the computer isn't going to cut it. If audio is digitized via some digital means such as FireWire, a test should be performed by some knowledgeable individual to ensure that the particular system being used is not degrading the audio in some way. Getting audio into a system digitally does not ensure that the audio is intact. At the very least, one should check the sound on the original tape against the sound that has been transferred into the workstation to see if it sounds the same. At minimum you will need a set of high-quality headphones to check this.
Once the audio is in the workstation, any processing that is done to the audio such as cutting, mixing, and volume changes as well as effects such as EQ, compression and reverb must be of high quality. Many picture editing stations are, not surprisingly, optimized for picture editing and cut corners on the audio. Many effects that are added to audio tracks cannot be undone at a later date, as they may be effectively rendered into the sound files. In some cases, as effects are added to the image and sound being edited, the performance of the computer will begin to degrade. This is especially true of real-time effects in both picture and sound. Telltale signs of system overload include dropped frames or poor rendering of effects on playback. All systems that start to perform poorly begin to do so in a particular order, and it is not uncommon for a picture editing system to let the audio crap out first. Don't feel bad; audio editing systems do the same thing with picture playback.
If your audio has not yet been prepared for mix, a sound edit will be necessary. If your picture edit room is friendly to sound work, a lot can be done here. Adding sound effects and music from CDs can usually be easily accomplished. If voice over or Foley (sound effects that are performed in sync with picture) is required, you must at least have access to a soundproofed voice over booth. Additional sounds that are recorded through a mic in proximity to hard drives and other noise-producing devices are usually not acceptable for broadcast or theatrical purposes.
Preparing Your Elements For A Sound Mix
All sound elements received, such as tapes, CDs, hard drives and files, must be well labeled. If elements such as effects or music are to be added, cue sheets (also called spot lists) should be prepared using the timecode numbers from your reference tape to indicate the location of the cues.
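As an illustration only, a cue sheet can be kept as a simple table of timecode, cue type and description. The Python sketch below assumes a made-up layout; the field order, the cue labels and the sample cues are all hypothetical and not a standard format.

```python
# Hypothetical cue sheet: (timecode on the reference tape, kind, description).
cues = [
    ("01:02:10:15", "MUSIC", "Theme enters under dialogue"),
    ("01:00:05:00", "FX", "Door slam, off screen"),
    ("01:01:30:12", "FX", "Car passes left to right"),
]

def format_cue_sheet(cues):
    """Return the cues as text lines, sorted by timecode."""
    # Zero-padded HH:MM:SS:FF strings sort correctly as plain text.
    rows = sorted(cues, key=lambda cue: cue[0])
    return [f"{tc}  {kind:<6}{text}" for tc, kind, text in rows]

if __name__ == "__main__":
    print("\n".join(format_cue_sheet(cues)))
```

Sorting by the reference-tape timecode keeps the list in program order, which is how it will be read in the edit or mix room.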
There are a few basic rules in sound from which all operations in audio post-production flow. Here they are:
1. The sound edit & mix will take place on a completely separate system from the picture editing system.
Sound that has been moved out of your picture editing system and into an audio mixing system such as Pro Tools may find itself running at a different speed than your picture, or other problems such as missing or distorted audio may occur. Without going into the many scenarios that could cause such mayhem, there is a useful tool for figuring out problems when they occur: a reference tape. This is an output of picture and sound from your picture editing system (Avid, Final Cut, etc.) to a tape with timecode that matches the timeline in your picture edit. The tape must represent your work at the time of output, and must have a clear head and tail beep to ensure sound and picture synchronization.
Reference Tape Preparation
On your picture editing system, you must provide an Academy leader (that string of numbers starting with 8 that counts down to 2) and a "2 beep" (a 1-frame-long chunk of 1 kHz test tone that sits on the timeline with the picture of the number 2, which is also 1 frame long). The timecode and the 2 beep act as a means of synchronizing the sound on a sound editing system with the picture that has been edited on your system. There should be at least 2 minutes of timecode rolling on tape prior to the First Frame Of Action (FFOA), and the first frame of action should begin promptly at 01:00:00:00 timecode. You should leave at least a minute of rolling timecode after the Last Frame Of Action (LFOA) unless you have not yet added final credits, in which case leave several minutes of rolling timecode to allow for credits to be inserted. Don't be cheap with the tape; you will pay in mix time.
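If your editing system has no tone generator handy, a one-frame 1 kHz tone can be made outside it. The following is a minimal Python sketch, not a prescribed method: the 48 kHz sample rate, the 29.97 frame rate, the half-scale level and the file name two_beep.wav are all assumptions chosen for illustration.

```python
import math
import struct
import wave

def make_two_beep(sample_rate=48000, fps=30000 / 1001, freq=1000.0, level=0.5):
    """Return 16-bit samples for one frame of sine tone (the "2 beep")."""
    # One video frame expressed in audio samples, rounded to a whole sample.
    n = round(sample_rate / fps)
    peak = int(level * 32767)  # assumed level: half of full scale
    return [int(peak * math.sin(2 * math.pi * freq * i / sample_rate))
            for i in range(n)]

def write_wav(path, samples, sample_rate=48000):
    """Write mono 16-bit samples to a WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))

if __name__ == "__main__":
    # Hypothetical output name; import the file and place it under the "2".
    write_wav("two_beep.wav", make_two_beep())
```

At 29.97 frames per second a frame is not a whole number of 48 kHz samples (1601.6), so the sketch rounds to the nearest sample; at 24 frames per second the beep comes out at exactly 2000 samples.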
The timecode on your reference tape must be contiguous and unbroken, that is to say, it does not stop or pause or skip or stutter or freeze at any point in your program, not even for a fraction of a frame. The picture on your reference tape must also be utterly continuous, with no dropped or stuck frames. The most common cause of such problems on less expensive systems is that the operator is asking too much of the system. Try eliminating any real-time effects, including audio processing such as reverbs, EQs, etc. Most of these kinds of effects can be re-created quickly and better in the mix. If you must have these effects, then render them as media for playback.
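What "contiguous and unbroken" means can be stated precisely: every frame's timecode is exactly one frame later than the one before it. The Python sketch below assumes you already have a list of per-frame timecode readings from somewhere (the list here is hypothetical) and that the code is non-drop at a nominal 30 frames per second.

```python
def tc_to_frames(tc, fps=30):
    """Convert non-drop HH:MM:SS:FF timecode to a frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

def find_breaks(timecodes, fps=30):
    """Return (previous, current) pairs where timecode fails to advance by one frame."""
    breaks = []
    for prev, cur in zip(timecodes, timecodes[1:]):
        if tc_to_frames(cur, fps) - tc_to_frames(prev, fps) != 1:
            breaks.append((prev, cur))
    return breaks

if __name__ == "__main__":
    # Hypothetical readings: one repeated (frozen) frame, then a skipped frame.
    readings = ["00:59:59:28", "00:59:59:29", "01:00:00:00",
                "01:00:00:00", "01:00:00:02"]
    print(find_breaks(readings))
```

A repeated value indicates a frozen or stuck frame and a jump indicates a skip; either one is enough to cause sync trouble later.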
The type of timecode (29.97 non-drop, 29.97 drop frame, etc.) must match the type of timecode used in your editing system. If you have unusual timecode and sync needs (e.g., you shot it in 25fps PAL, you're editing in 29.97 NTSC and you're going to shoot a tape to film at 24fps), don't be a hero: call a professional.
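To see why the two 29.97 flavors cannot be mixed, it helps to label the same frame count both ways. The Python sketch below uses the usual drop-frame counting rule (frame numbers 00 and 01 are skipped at the start of every minute except each tenth minute); the function name is hypothetical, and the semicolon before the frames field is a common display convention for drop frame rather than something this guide specifies.

```python
def frames_to_timecode(frame, drop_frame=False):
    """Label a frame count as 29.97 timecode (30 frame numbers per second)."""
    if drop_frame:
        # Drop frame skips frame NUMBERS, never picture frames.
        tens, rem = divmod(frame, 17982)      # 17982 frames per 10 minutes
        frame += 18 * tens                    # 9 skipping minutes x 2 numbers
        if rem >= 2:
            frame += 2 * ((rem - 2) // 1798)  # 1798 frames per skipping minute
    ff = frame % 30
    ss = frame // 30 % 60
    mm = frame // 1800 % 60
    hh = frame // 108000
    sep = ";" if drop_frame else ":"
    return f"{hh:02d}:{mm:02d}:{ss:02d}{sep}{ff:02d}"

if __name__ == "__main__":
    one_hour = 107892  # frames in one hour of running time at 29.97
    print(frames_to_timecode(one_hour, drop_frame=True))
    print(frames_to_timecode(one_hour))
```

After an hour of running time the drop-frame label reads 01:00:00;00 while the non-drop label for the very same frame reads 00:59:56:12, so a cue logged in one type will land seconds away if it is read as the other.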
2. Dialogue should be intelligible.
If you (or, preferably, someone who is unfamiliar with your project) cannot understand dialogue that has been recorded, the audio from that scene (the "track") should be cleaned up or replaced.
Cleaning up Dialogue
Audio professionals are frequently asked to "clean up" dialogue. It is commonly assumed by the inexperienced movie maker that there are some cool magic devices out there that will take badly recorded material and correct it quickly and simply. The truth is that while there are many tools available to adjust for specific kinds of shortcomings in a recording, few of these remedies are without a price, in both a monetary and sonic sense. In most cases, nothing beats a great production recording as a place to start. In order that any given audio may be made to sound like a movie, your various recorded tracks will be treated in varying ways depending upon what has already been recorded and where it wants to end up.
3. Sound within a scene that does not match should be smoothed out.
This is done by overlapping audio from adjacent shots within a scene and adding small fades between the overlaps. This should be done with all shots in the project. When the action moves from one scene to another, the tone may be permitted to shift, sometimes drastically. This operation is considered part of your dialogue edit.
If you do nothing else for the sound on your project, you should complete a dialogue edit. This is usually done after the picture has been completely edited. The dialogue edit in its simplest form does not need to be especially complex or time-consuming, and will make any subsequent sound work proceed more quickly. A serviceable dialogue edit can often be done on your video editing system.
The graphics included here are from Pro Tools, but the concepts described here can be applied to any non-linear editing system with multiple audio tracks.
The first step of your dialogue edit will be to split your tracks. This is also sometimes referred to as checkerboarding. The audio regions of each shot should occupy alternating tracks.
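The alternating layout can be pictured with a tiny Python sketch. It is only a model of the idea: the track names A1 and A2 and the shot labels are hypothetical, and in practice you do this by dragging regions in your editing system.

```python
def checkerboard(regions, tracks=("A1", "A2")):
    """Assign the audio regions of consecutive shots to alternating tracks."""
    return [(tracks[i % len(tracks)], region)
            for i, region in enumerate(regions)]

if __name__ == "__main__":
    for track, region in checkerboard(["shot 1", "shot 2", "shot 3", "shot 4"]):
        print(track, region)
```

Because neighboring shots never share a track, each region has free space on either side of it on its own track, which is what makes the later overlaps possible.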
If your sound is stereo, or recorded with more than one microphone (like a boom on channel 1 and a wireless on channel 2), keep all audio tracks grouped together side by side and split as before. The grouped pairs of tracks should not be mixed or combined sonically.
Splitting stereo tracks.
The mix will move faster if audio is logically grouped by the way it sounds, the way it was recorded, or the way it should sound. For example, let’s say you recorded a scene with two people having a conversation. A boom is used on each actor individually. This means the mic will be pointing at each actor directly, but at opposing angles in the environment for the differing shots. Place the boom from shot #1 on one track and the boom from shot #2 on another track.
Boom audio on top track, lavalier on bottom.
For a sound that needs special treatment applied in the mix, such as poorly recorded dialogue with heavy background noise or a voice that is to receive an effect like a telephone filter or off camera dialogue, place the audio to be treated on its own track. The audio regions should be cut and placed to match what is happening on the picture.
Boom on top track, lavalier on middle track, sound for special treatment on bottom.
Once tracks have been split, it is useful to create overlaps between them. This allows room tone to be faded from one shot to another in an effort to create smooth transitions between them. Overlaps should not extend to dialogue or other sounds that are not to be included in the film. This part of the edit requires the best sound reproduction that can be had. You may need to listen at a higher than usual volume and you may need to use headphones as well. At this point you are working on the room tone between the dialogue, not the dialogue itself.
Once overlaps have been added, some editing systems allow short fades to be rendered at the transition points.
Adding fades at transitions.
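What a fade across an overlap does to the audio can be sketched in a few lines of Python. This assumes a straight-line (linear) fade shape and plain lists of sample values; editing systems offer other fade shapes, and the function name is hypothetical.

```python
def crossfade(outgoing, incoming):
    """Linear crossfade across the overlap of two shots (equal-length sample lists)."""
    if len(outgoing) != len(incoming):
        raise ValueError("overlap regions must be the same length")
    n = len(outgoing)
    mixed = []
    for i in range(n):
        gain_in = (i + 1) / (n + 1)  # rises toward 1 across the overlap
        gain_out = 1.0 - gain_in     # falls toward 0 across the overlap
        mixed.append(outgoing[i] * gain_out + incoming[i] * gain_in)
    return mixed

if __name__ == "__main__":
    # Tone at level 1.0 fading out against silence fading in.
    print(crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))  # [0.75, 0.5, 0.25]
```

The two gains always add up to 1, so room tone that is the same on both sides passes through the transition at a steady level instead of dipping or bumping.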
How do you know if the overlaps and fades smooth out the dialogue sufficiently? You listen, that’s how. If you don’t have an edit room that is properly set up for sound (an inferior sound setup would include cheap speakers, lots of machine noise from computers and hard drives going), try listening through a pair of headphones at a volume loud enough to hear the transitions between shots. Don’t add equalization or other effects to make it sound good, though you can adjust volumes between regions. If the transitions seem overly noticeable, you will need to add some tone on an additional track.
Bottom track is added room tone.
If usable room tone was recorded on location, that can be used. The tone needs to be similar to the tone of the audio that is actually being used in the shots. Once again, this is determined by listening. If wild tone recorded on location does not match, the tone will have to be harvested from the existing production, from the little spaces between the words or at the head and tail of shots.
Looping available room tone.
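Looping a short piece of harvested tone amounts to repeating it and blending each seam so the repeat is not heard as a click. The Python sketch below is one assumed way to do that on a plain list of samples; the function name is hypothetical, and the default fade of 4 samples is only there to keep the example small (a real seam fade would be far longer).

```python
def loop_tone(tone, length, fade=4):
    """Repeat a short room-tone clip to `length` samples, crossfading each seam."""
    if not 0 < fade < len(tone):
        raise ValueError("fade must be shorter than the tone clip")
    out = list(tone)
    while len(out) < length:
        # Blend the tail of what we have with the head of the next repeat.
        for i in range(fade):
            g = (i + 1) / (fade + 1)
            j = len(out) - fade + i
            out[j] = out[j] * (1.0 - g) + tone[i] * g
        # Append the rest of the repeat after the blended head.
        out.extend(tone[fade:])
    return out[:length]

if __name__ == "__main__":
    filled = loop_tone([0.5] * 10, 25)
    print(len(filled))  # 25
```

Whether the loop is actually inaudible is still decided by listening; tone with any recognizable event in it (a cough, a car) will announce itself every time it comes around.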
It can be a challenge to make the dialogue tracks sound smooth; after all, the project was likely shot on different days or maybe even in different years. Assembling an apparently seamless stream of sound from a wide variety of recordings is the core of the craft of film sound.
Generally speaking, the dialogue tracks like to play the loudest of all of your tracks in the final mix. If additional sound effects or music are used at the same time as the dialogue, some problems can be concealed to some extent, but it is considered good practice to treat the dialogue track as if it will play alone.
4. Small problems with sound may be hidden.
When audio from adjacent shots cannot be smoothed out sufficiently, additional "room tone" (the sound that the recording makes in between the words) should be added on a track to help hide the inconsistency. The room tone should be similar in character to the production audio. This is also part of your dialogue edit. In some cases, addition of effects or music might help to mitigate audio problems.
5. That which cannot be made intelligible, smoothed out or hidden should be replaced.
This means recording dialogue over again in the post-production facility. When dialogue is replaced, all other sound (room tone, actor movements, other effects) must be replaced as well. The best source for replacement dialogue is usually agreed to be the sound recorded on set, such as from alternate takes of the scene. This audio usually has the best match to the sound and energy that can only be generated on set. One simple and very effective technique that is often overlooked by filmmakers is to record some takes of the scene wild on set without the camera rolling and with the microphone(s) in optimum position. This wild sound can often be used in place of the actual sync audio to avoid costly and stressful ADR, which rarely seems as good from a performance point of view as original takes.
For greatest efficiency, the picture edit should be completed ("picture locked") before the sound work begins. It is possible to begin sound work before picture lock, but additional expense will be incurred in re-conforming your audio edit to your new picture. Once the picture is locked, you need to output your picture and existing sound to a time-coded video tape, preferably a Beta SP. Other acceptable formats include 3/4" U-matic and Digi Beta. DVCAM and Mini DV tape can be used in a pinch, but the timecode tracks on these formats can be troublesome for higher end equipment. If there is any audio in your picture edit other than production audio (effects, music, etc.), output your production audio to channel 1 of the reference tape and everything else to channel 2. This tape is to be used for sound edit and mix purposes only.
For some projects the sound may be prepared for mix by the picture editor. These projects would be those where extensive sound editing and additional sound effects are not required. These projects tend to be documentaries and dialogue-heavy pieces. Since most projects consist at the very least of production sound (the audio recorded on location), the preparation of this audio presents the minimum amount of work to be done to prepare for a mix.
Moving the sound from the picture edit to the sound edit
Once the picture has been locked and all possible sound work is done on the picture editing system, the sound elements must be transferred from the picture editing system (Avid, Final Cut Pro, etc.) into the sound editing system (in our case, Pro Tools). It is most preferable to transfer the audio regions as they are placed on the timeline on their various tracks intact into Pro Tools. That is to say if there are 4 tracks of dialogue, 2 tracks of effects and 2 tracks of music in your edit, the edit would be transformed into a Pro Tools session with the same arrangement of audio on 8 tracks. This can currently only be accomplished with an OMF export, supported on Avids and Final Cut Pro version 2 and higher. Naturally, the procedure for accomplishing this export is not perfectly consistent from system to system, so consult your manual. OMF translations can be picky at times, so make sure that all sound files are a single sample rate, either 44.1 or 48 kHz, and remove any real-time audio effects from your edit for the purpose of the export.
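The single-sample-rate requirement is easy to check before exporting. The Python sketch below uses the standard library's wave module, so it is an assumption-laden shortcut: it reads PCM WAV files only (not AIFF or SD2), and the function names are hypothetical.

```python
import wave

def sample_rates(paths):
    """Map each WAV file path to its sample rate in Hz."""
    rates = {}
    for path in paths:
        with wave.open(path, "rb") as w:
            rates[path] = w.getframerate()
    return rates

def is_consistent(rates):
    """True when every file shares one rate and that rate is 44100 or 48000 Hz."""
    unique = set(rates.values())
    return len(unique) == 1 and unique <= {44100, 48000}
```

Running sample_rates over the media folder and printing the result shows at a glance which files would need to be converted before the export.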
If you are cutting on Premiere or some other system that does not support OMF exports, you will need to bounce out your audio tracks from your picture edit as sound files. This process consists of creating a sound file of each individual track of sound that is in your picture edit. If you had a sixty-minute piece with eight tracks of audio, then you would end up with eight sixty-minute sound files. These files should be AIFF, SD2 or WAV files. There must be a two beep at the head of each file that corresponds to the Academy leader that you will have included in the picture edit and reference tape. How you will generate these files will be particular to the system you are using. Consult your manuals.
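Once the files are bounced, a quick sanity check is that the two beep sits at the same position in each file that carries it. The Python sketch below works on plain lists of sample values scaled from -1.0 to 1.0; it assumes nothing louder than the threshold comes before the beep, and the function names and the 0.1 threshold are hypothetical.

```python
def first_loud_sample(samples, threshold=0.1):
    """Index of the first sample whose magnitude exceeds threshold, or None."""
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            return i
    return None

def beeps_aligned(tracks, threshold=0.1):
    """True when every track's first loud sample (the head beep) is at the same index."""
    starts = {first_loud_sample(t, threshold) for t in tracks}
    return len(starts) == 1 and None not in starts
```

If the check fails, one of the bounces started from a different point on the timeline and should be redone before anything is handed over.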