CLAN files

Processing

Audio files were processed using LENA software and exported as .csv (5-minute detail), .cha (an export that also created a .wav file of the original audio from the recorder), and .its files.

After creating those exports, we converted the LENA-exported .its file to a .cha file containing the LENA tiers, using the lena2chat command in CLAN. This file was saved unmodified as "XX_XX_lena.cha" for each child at each month's recording. We then copied the .cha file to create a version to hold annotations.

Silent periods in the file (when the child is asleep) were identified using Audacity and a Python script (audiowords.py). More details on audio processing from LENA and silence finding can be found here.

Annotations

Files were annotated using CLAN software (https://dali.talkbank.org/clan/; note that on macOS, files were annotated using CLAN for macOS 10.14 and below, not CLANc).

The process for annotators was as follows:

Open the .cha file in CLAN and link the media file (.wav).
Begin playing the audio using the continuous playback function in CLAN (play from the beginning of the file for months 06 and 07; play from the beginning of the top-ranked subregion for months 08 and later).
Listen for a codeable noun.
Type the noun and its accompanying codes on the tier in which it is uttered, using a space and "&=" to separate the noun from the utterance type code, object presence code, and speaker code (the annotation ID is added at a later time).
Immediately after being added by the coder, an annotation sits at the beginning of a line that begins with *XXX:, which signifies a CLAN speaker tier, as opposed to the dependent tier lines that begin with %xxx:.

Additional codeable nouns uttered in the same tier were added as they occurred chronologically in the recording, with one space between each noun.
The coder would continue through each subregion (or the entire recording for months 06 and 07).
Coders would take note of anything remarkable and write a short synopsis of each region of the file in a Word document called Audio Coding Issues.
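
The annotation format described in the steps above can be illustrated with a small Python sketch that pulls annotations off a single speaker tier line. This is an illustration only, not the lab's tooling: the underscore separator between the three codes, the example code values (d, q, y, n, MOT), and the function name find_annotations are all assumptions made for the example.

```python
import re

# Assumed shape of one annotation: "<noun> &=<utterance type>_<object presence>_<speaker>".
# The source only specifies the space and "&=" after the noun; the underscore
# separator between the codes is an assumption for this sketch.
ANNOTATION = re.compile(
    r"(?P<noun>\S+) &="
    r"(?P<utterance_type>[^_\s]+)_"
    r"(?P<object_present>[^_\s]+)_"
    r"(?P<speaker>[^_\s]+)"
)


def find_annotations(tier_line):
    """Return one dict per annotation on a speaker tier line, in order of occurrence."""
    # Speaker tiers begin with "*"; dependent tiers ("%xxx:") are skipped.
    if not tier_line.startswith("*"):
        return []
    return [match.groupdict() for match in ANNOTATION.finditer(tier_line)]


# Hypothetical tier line holding two annotations separated by one space.
line = "*FAN:\tball &=d_y_MOT dog &=q_n_MOT ."
print([a["noun"] for a in find_annotations(line)])  # ['ball', 'dog']
```

Because annotations are matched left to right, the returned list preserves the chronological order in which the nouns were typed on the tier.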

Post Annotation Processing

The coder would use CLAN check (Esc + L) to locate formatting errors in the file. Common CLAN check errors are described here.

Then, the file would be processed into .csv format. The coder would run a Python script called parse_clan2, which would output a .csv file with one row for each annotation in the file and columns for the codes and for metadata from the CLAN file (the tier on which the annotation occurs and the timestamp from that tier).
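
The kind of output described above can be sketched in Python as follows. This is not the actual parse_clan2 script. It is a minimal illustration that assumes the three codes are joined by underscores and that each tier's timestamp is an onset_offset pair in milliseconds wrapped in \x15 characters. The column names and the function annotations_to_csv are hypothetical.

```python
import csv
import io
import re

# Assumptions for this sketch (not taken from the source): codes are joined by
# underscores, and the tier timestamp looks like "\x15<onset>_<offset>\x15".
ANNOTATION = re.compile(r"(\S+) &=([^_\s]+)_([^_\s]+)_([^_\s]+)")
TIMESTAMP = re.compile(r"\x15(\d+)_(\d+)\x15")

# Hypothetical column names: codes first, then tier metadata.
COLUMNS = ["tier", "word", "utterance_type", "object_present",
           "speaker", "onset_ms", "offset_ms"]


def annotations_to_csv(cha_text):
    """Return CSV text with one row per annotation found on speaker tiers."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    for line in cha_text.splitlines():
        # Only speaker tiers ("*XXX:") are read; headers, dependent tiers,
        # and continuation lines are ignored in this simplified sketch.
        if not line.startswith("*"):
            continue
        tier = line.split(":", 1)[0].lstrip("*")
        stamp = TIMESTAMP.search(line)
        onset, offset = stamp.groups() if stamp else ("", "")
        for word, utt, obj, spk in ANNOTATION.findall(line):
            writer.writerow([tier, word, utt, obj, spk, onset, offset])
    return out.getvalue()


# Hypothetical file contents with two annotations on one tier.
sample = "@Begin\n*FAN:\tball &=d_y_MOT dog &=q_n_MOT . \x151000_2500\x15\n@End"
print(annotations_to_csv(sample))
```

Each annotation on a tier inherits that tier's name and timestamp, so two nouns typed on the same line produce two rows with identical metadata.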
