# CLAN files

## Processing

Audio files were processed with LENA software and exported as .csv (5-minute detail), .cha (an export that also created a .wav file of the original audio from the recorder), and .its files.

After creating those exports, we converted the LENA-exported .its file to a .cha file containing the LENA tiers, using the `lena2chat` command in CLAN. This file was saved unmodified as "XX\_XX\_lena.cha" for each child at each month's recording. We then copied the .cha file to create a version to hold annotations.
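
The copy step can be sketched in Python as follows. This is only an illustration of keeping the LENA export untouched while creating a separate working file. The function name `make_annotation_copy` and the `_annotations.cha` suffix are hypothetical; the source does not say what the annotation copy was called.

```python
import shutil
from pathlib import Path

LENA_SUFFIX = "_lena.cha"


def make_annotation_copy(lena_cha: Path, suffix: str = "_annotations.cha") -> Path:
    """Copy an untouched *_lena.cha export to a separate file for annotation.

    The default suffix is a hypothetical naming choice, not the lab's convention.
    """
    if not lena_cha.name.endswith(LENA_SUFFIX):
        raise ValueError(f"expected a file ending in {LENA_SUFFIX}: {lena_cha}")
    stem = lena_cha.name[: -len(LENA_SUFFIX)]
    target = lena_cha.with_name(stem + suffix)
    if target.exists():
        # Refuse to overwrite a copy that may already contain annotations.
        raise FileExistsError(target)
    shutil.copy2(lena_cha, target)
    return target
```

Refusing to overwrite an existing target protects annotations that a coder may already have added to the working copy.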

Silent periods in the file (when the child is asleep) were identified using Audacity and a Python script ([audiowords.py](https://github.com/SeedlingsBabylab/audiowords)). More details on audio processing from LENA and silence finding can be found [here](https://gitbook.bergelsonlab.com/data-pipeline/audio-processing).

## Annotations

Files were annotated using CLAN software (<https://dali.talkbank.org/clan/>; note that on macOS, files were annotated using CLAN for OS 10.14 and below, not CLANc).

The process for annotators was as follows:

* Open the .cha file in CLAN and link the media file (.wav).
* Begin playing the audio using the continuous playback function in CLAN (from the beginning of the file for months 06 and 07; from the beginning of the top-ranked subregion for months 08+).
* Listen for a [codeable noun](/noun-annotations.md#noun).
* Type the noun and the [accompanying utterance codes](/noun-annotations.md#utterance-type) on the tier in which it is uttered, using a space and "&=" to separate the noun from the utterance type code, object presence code, and speaker code (the annotation ID is added at a later time).
* Here is an example of how an annotation looks in a .cha file immediately after being added by the coder. The annotation sits at the beginning of a line that starts with `*XXX:`, which signifies a CLAN speaker tier, as opposed to lines beginning with `%xxx:`.

<figure><img src="/files/D4vnngECmTHwwMmTDUsJ" alt=""><figcaption><p>In this example, everything that is outside of the red box is original to the exported lena.cha file (created from the .its file)</p></figcaption></figure>

* Additional codeable nouns uttered in the same tier were added in the order they occurred in the recording, with one space between each noun.
* The coder would continue through each subregion (or the entire recording for months 06 and 07).
* Coders would take note of anything remarkable and write a short synopsis of each region of the file in a Word document called Audio Coding Issues.
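
The annotation format described in the steps above can be sketched in Python. This is a minimal illustration, not the lab's tooling. The text states only that a space and "&=" separate the noun from the utterance type, object presence, and speaker codes. The underscore between the three codes, the example code values, and the name `parse_annotations` are assumptions made for the example.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    noun: str
    utterance_type: str
    object_present: str
    speaker: str


# Assumed layout: "<noun> &=<utterance type>_<object presence>_<speaker>".
# Only the space and "&=" come from the text; the underscores are an assumption.
ANNOTATION_RE = re.compile(r"(\S+) &=([A-Za-z]+)_([A-Za-z]+)_([A-Za-z0-9]+)")


def parse_annotations(tier_text: str) -> list[Annotation]:
    """Return every annotation found in one speaker-tier line, in order."""
    return [Annotation(*m.groups()) for m in ANNOTATION_RE.finditer(tier_text)]


# Hypothetical tier line with two annotations separated by one space.
example = "*XXX:\tball &=d_y_MOT dog &=q_n_FAT 0 ."
for annotation in parse_annotations(example):
    print(annotation)
```

Because the pattern matches repeatedly along the line, several nouns on the same tier are returned in the order they were typed.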

## Post Annotation Processing

The coder would use CLAN check (esc + L) to locate formatting errors in the file. Common CLAN check errors are described [here](https://gitbook.bergelsonlab.com/data-pipeline/audio/clan-check).

Then, the file would be processed into .csv format. The coder would run a Python script called [parse\_clan2](https://github.com/SeedlingsBabylab/parse_clan2), which would output a CSV file with one row for each annotation in the file and columns for the codes and for metadata from the CLAN file (the tier on which the annotation occurs and the timestamp from that tier).
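
The shape of that output can be sketched as follows. This is not the parse\_clan2 implementation, only an illustration of producing one row per annotation with tier and timestamp columns. The column names, the underscore-separated codes, the example tier line, and the assumption that each tier fits on one line with its timestamp stored as a `\x15<onset>_<offset>\x15` media bullet are all choices made for this example.

```python
import csv
import io
import re

# Assumed annotation layout: "<noun> &=<utterance type>_<object presence>_<speaker>".
ANNOTATION_RE = re.compile(r"(\S+) &=([A-Za-z]+)_([A-Za-z]+)_([A-Za-z0-9]+)")
# Speaker tiers start with "*XXX:" followed by a tab.
TIER_RE = re.compile(r"^\*(\w+):\t")
# Assumed timestamp form: onset and offset in ms between two \x15 characters.
BULLET_RE = re.compile(r"\x15(\d+)_(\d+)\x15")

# Hypothetical column names, not those written by parse_clan2.
HEADER = ["tier", "word", "utterance_type", "object_present", "speaker",
          "onset", "offset"]


def annotations_to_csv(cha_lines):
    """Return CSV text with one row per annotation found on speaker tiers."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for line in cha_lines:
        tier = TIER_RE.match(line)
        bullet = BULLET_RE.search(line)
        if not tier or not bullet:
            continue  # skip headers, %xxx: tiers, and lines without a timestamp
        for m in ANNOTATION_RE.finditer(line):
            writer.writerow([tier.group(1), *m.groups(), *bullet.groups()])
    return out.getvalue()


lines = ["@Begin", "*CHN:\tball &=d_y_MOT 0 . \x15100_2500\x15", "@End"]
print(annotations_to_csv(lines))
```

Each annotation on a tier inherits that tier's name and timestamp, so a line with several nouns yields several rows that share the same onset and offset.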

