Praat: Extract data

Here is a very basic script that I use to extract data from given (stress on given) Praat data.

It runs on selected files or on folders (directories), and it can accumulate data across runs. It can save the output in different formats (only two for now: gnuplot and raw text), with a few small controls over the output. It can be scripted to process large amounts of data.

For newcomers: Praat is a linguistic tool; more on the Praat site (you can also try praat.org, which links back to the same site).

The script requires that you have segmented the data or, better, that the data is very carefully labelled.

#! /bin/praat
########################################################################
# Script by: Fabio Mariotti & Marion Braendle
# License: GPL (Latest GPL version: http://www.gnu.org/copyleft/gpl.html)
# CopyRight: Fabio Mariotti & Marion Braendle
# $version: 1.0
########################################################################
#
# This script is under development. Use at your own risk.
#
# The script runs on:
# - Depending on input option:
#   - All selected pairs (TextGrid / wav) files
#   - All files in a directory
#   - All Files in the given directory and its sub directories
#     (less tested)
#
# - It assumes you have tiers with labels
#
# - It will try to get:
#
#   - Labels on the given tier (empty labels are ignored)
#
#   - You can input regex to select the labels
#     (not suggested as you can always post-process it)
#
#   - Timing information on the current interval
#     - namely: start and end points, duration
#     - including previous, next labels on the tier.
#
#   - Formant information
#     - Here there are some hardcoded parameters:
#       - time_step = 0.0025
#       - max_number_of_formants = 5
#       - window_length = 0.025
#       - n_of_points = 10
#     - Formant data are given at those points only; there is no mid-point
#       evaluation (for that, please use the script: get_mean_format.sh).
#       If you need more control over the settings, you should be able to
#       edit this script and set the parameters yourself.
#
#   - Output:
#     - The script will output 2 files for each pair (TextGrid/wav), named
#       after the pair. In particular these are called:
#         <basefilename>.formants.data
#         <basefilename>.timing.data
#       All fields in these files are <space> separated.
#       The script does not use any fancy Praat tools or parameters; the
#       output should be as close as possible to standard values.
#
#   - post processing:
#     - We do use some scripts of our own to extract the relevant data.
#     - Check get_mean_format.sh in the next post.
#
########################################################################
# Processing notes
########################################################################
#
# The directories are processed first and saved in a list as sound
#   files to be processed.
#
# I have a new version which can check the labels on other tiers, but it
#   is still experimental. For one thing, it needs some standards: what is
#   in tier 1? Tier names would be nice, but again they are not standardised.
#
#  Anything which is not recognised, or is assumed not to be relevant, is
#    marked as "_".
#    Within these lines:
#     ">" marks a beginning in general
#     "<" marks an ending in general
#     Please note that these symbols are in principle "space-less".
########################################################################
#
# Clear info box
clearinfo
#
########################################################################
# Intro form
########################################################################
form Produce data files for plotting and Analysis
comment Files work directory (All files are created here)
text workdir /home/<USER>/<DATA>/
sentence logfile createdataplot.log
optionmenu vlogfile 2
option log only activity
option create a full log file
optionmenu remfile 1
option Remove Files
option Append to files
optionmenu doplot 1
option Just produce data files
option Do draw tracks
optionmenu runon 1
option Run on Selected
option Run on sound files (and TextGrid) on a directory
option Run on sound files (and TextGrid) on a directory and subdirs
optionmenu numtier 1
option 1
option 2
option 3
option 4

comment Directory to search for Sound files (if required)
text run_dir /home/<USER>/<DATA>/
comment Term in the name of the segment to analyse or "all" for all
sentence terms all
boolean Regex 0
## Use this default for generic vowels match
## sentence terms "^'?(a|e|i|o|u|A|E|I|O|U):?"
## boolean Regex 0
comment Formant data (F:5500/M:5000)
positive Maximum_formant_female_male 5500
comment Output format
optionmenu data_format 1
option gnuplot
option raw text
comment advanced options (Use 1.0 now)
real segment_margin 1.0
endform
#
########################################################################
# Initialization
########################################################################
#
printline ### Parameters:
#
# Window margins for selection
wmargin = segment_margin
printline Script: Segment window margin: 'wmargin'
# Number of steps per interval: the output has n_of_points + 1 rows,
# because nstep runs from 0 to n_of_points
n_of_points = 10
printline Script: Number of points on the output data files: 'n_of_points'
#
# Formants data
time_step = 0.0025
printline Formants: Fixed time step: 'time_step'
max_number_of_formants = 5
printline Formants: max number of formants: 'max_number_of_formants'
maximum_formant = maximum_formant_female_male
printline Formants: Maximum formant in Hertz (ScriptInput): 'maximum_formant'
window_length = 0.025
printline Formants: Window length: 'window_length'
preemphasis_from = 50
printline Formants: Preemphasis from: 'preemphasis_from'
#
# Others
soundext$ = "wav"
printline Sound File Extension defined in the script is: 'soundext$'
textgridext$ = "TextGrid"
printline Text Grid File Extension defined in the script is: 'textgridext$'
tiernum = numtier
printline Tier Number is: 'tiernum'
#
# Form parameters
#
# report initialization parameters
printline ### Form Parameters:
printline Work Directory 'workdir$'
printline Log file 'workdir$''logfile$'
printline ### EndParameters
#
########################################################################
# Start code
########################################################################
#
# Remove the log file if asked to
if remfile == 1
  filedelete 'workdir$''logfile$'
endif
#
########################################################################
# Check if the log file exists:
########################################################################
#
if fileReadable (workdir$+logfile$)
  pause The result file 'logfile$' already exists in 'workdir$'! Do you want to overwrite it?
  filedelete 'workdir$''logfile$'
  printline Logfile 'workdir$''logfile$' deleted.
endif
#
fileappend "'workdir$''logfile$'" # Log file for CreateDataPlot procedure 'newline$'
#
########################################################################
# Process directory selection
########################################################################
#
if runon <> 1
  # This lns variable tracks the number of files loaded.
  lns = 0
  mdirectory$ = run_dir$
  printline processing directories: start from 'run_dir$'
#
  if runon = 2
    Create Strings as file list... list 'mdirectory$'*.'soundext$'
    numberOfFiles = Get number of strings
    printline Number of files in directory 'mdirectory$' is 'numberOfFiles'

    for soundfile to numberOfFiles
      select Strings list
      soundfilename$ = Get string... soundfile
      Read from file... 'mdirectory$''soundfilename$'
      lns = lns + 1
      filename$ = left$ (soundfilename$, (length(soundfilename$) - length(soundext$) -1 ))
      printline FileName 'filename$'
      soundname$ = filename$
      select Sound 'filename$'
      lsound'lns' = selected("Sound")
      Read from file... 'mdirectory$''soundname$'.'textgridext$'
      printline Sound selected 'soundname$'
    endfor

    select Strings list
    Remove
  else
#
    Create Strings as directory list... dlist 'mdirectory$'*
    numberOfDirs = Get number of strings
    realnd = numberOfDirs - 2
    printline number of dirs 'numberOfDirs'
    printline number of Data dirs 'realnd'
#
    for nd from 3 to numberOfDirs
      select Strings dlist
      directory$ = Get string... nd
      printline processing directory 'mdirectory$''directory$'

      Create Strings as file list... list 'mdirectory$''directory$'/*.'soundext$'
      numberOfFiles = Get number of strings
      printline Number of files in directory 'directory$' is 'numberOfFiles'

      for soundfile to numberOfFiles
        select Strings list
        soundfilename$ = Get string... soundfile
        Read from file... 'mdirectory$''directory$'/'soundfilename$'
        lns = lns + 1
        filename$ = left$ (soundfilename$, (length(soundfilename$) - length(soundext$) -1 ))
        printline FileName 'filename$'
        soundname$ = filename$
        select Sound 'filename$'
        lsound'lns' = selected("Sound")
        Read from file... 'mdirectory$''directory$'/'soundname$'.'textgridext$'
        printline Sound selected 'soundname$'
      endfor

      select Strings list
      Remove

    endfor
#
    select Strings dlist
    Remove
  endif
#
# Here we select again all the sound files loaded
  for ls to lns
    plus lsound'ls'
  endfor
endif
#
#
########################################################################
# Store selection
########################################################################
# Get number of sounds
ns = numberOfSelected ("Sound")
printline Number of selected sounds: 'ns'

# Store the selected sounds in a vector
for i to ns
  sound'i' = selected ("Sound", i)
endfor

printline Start running on selected sound files
########################################################################
# Main loop on sound files
########################################################################
printline Start running 'ns'
for i to ns

  # Select i Sound
  select sound'i'
  soundname$ = selected$ ("Sound")
  printline Operating on sound file 'soundname$'
  # Select i TextGrid EXPECTED to be there
  printline Requesting TextGrid file 'soundname$'
  select TextGrid 'soundname$'
  printline TextGrid file 'soundname$' Selected

  # Get and process intervals in the chosen tier (tiernum)
  numberOfIntervals = Get number of intervals... tiernum

  printline Number of intervals for TextGrid file 'soundname$' are: 'numberOfIntervals'

  if remfile == 1
    printline Removing 'workdir$''soundname$'.timing.data
    printline Removing 'workdir$''soundname$'.formants.data
    filedelete 'workdir$''soundname$'.timing.data
    filedelete 'workdir$''soundname$'.formants.data
  endif
  fileappend "'workdir$''soundname$'.timing.data" #TimeInfo SoundFile  label   duration start end prelabel postlabel 'newline$'

########################################################################
# Secondary loop on intervals
########################################################################
  for interval to numberOfIntervals
    #Select TextGrid to allow finding labels
    select TextGrid 'soundname$'
    printline processing interval number 'interval'
    label$ = Get label of interval... tiernum interval
#
    preint=interval-1
    postint=interval+1
    if preint = 0
      prelabel$=">"
    else
      prelabel$ = Get label of interval... tiernum preint
      prelabel$ = replace$(prelabel$," ","_",0)
    endif
    if length(prelabel$) = 0
      prelabel$ = "_"
    endif
    if postint > numberOfIntervals
      postlabel$="<"
    else
      postlabel$ = Get label of interval... tiernum postint
      postlabel$ = replace$(postlabel$," ","_",0)
    endif
    if length(postlabel$) = 0
      postlabel$ = "_"
    endif
#
########################################################################
# Interval label selection
########################################################################
    # Process only if label is not empty
    doterm =0
    if regex = 0
      if label$ <> "" and terms$ == "all" or terms$ <> "all" and startsWith(label$,terms$)==1
        doterm = 1
      endif
    else
      ireg = index_regex(label$,terms$)
      printline Query for regex 'terms$' gives 'ireg'
      if label$ <> "" and terms$ == "all" or terms$ <> "all" and index_regex(label$,terms$)>0
        doterm = 1
      endif
    endif
    #
    if doterm == 1
      printline Processing not empty label 'label$'

      # if the interval has a not empty label, get its start and end:
      start = Get starting point... tiernum interval
      end = Get end point... tiernum interval
      duration = end - start
      # Add the window margins to both ends of the interval
      wstart = start - wmargin
      wend = end + wmargin
      wduration = wend - wstart

      # Write timing information
      if vlogfile > 1
        fileappend "'workdir$''logfile$'" #TimeInfo SoundFile  label   duration start end 'newline$'
        fileappend "'workdir$''logfile$'" Timing 'soundname$' 'label$' 'duration' 'start' 'end' 'newline$'
      else
        fileappend "'workdir$''logfile$'" #Writing 'soundname$' timing  data for 'label$' at 'start''newline$'
      endif
      fileappend "'workdir$''soundname$'.timing.data" 'soundname$' 'label$' 'duration' 'start' 'end' 'prelabel$' 'postlabel$' 'newline$'
      #fileappend "'workdir$''soundname$'.timing.data" 'soundname$' WWW 'duration' 'start' 'end' pWWW poWWW 'newline$'

      # Select the sound file and produce formants for the selected window
      select Sound 'soundname$'
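      # Extract part arguments: start time, end time, window shape,
      # relative width, preserve times ("yes" keeps the original time axis,
      # which the formant queries below rely on)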
      Extract part... wstart wend Rectangular 1 yes
      Rename... window

      ### Resample... 10000 50

      # Formants analysis on the window
      select Sound window
      To Formant (burg)... time_step max_number_of_formants maximum_formant window_length preemphasis_from
      Rename... formants
      ### TEST HERE ###
        #printline TEST IT!!!!!!!!!!!!
	#Track... 3 550 1650 2750 3850 4950 1 1 1
	#Rename... formanttracks

      #
########################################################################
# Check for plot: We do it here in one line
########################################################################
      if doplot > 1
        Draw tracks... start end 3200 yes
      endif
      #

      # Determine time step size
      tstepsize = duration / n_of_points
      ptime = start
      if vlogfile > 1
        fileappend "'workdir$''logfile$'" #FormantsData SoundFile label nstep wtime FormatsFreqs 'newline$'
      else
        fileappend "'workdir$''logfile$'" #Writing 'soundname$' formant data for 'label$' at 'start''newline$'
      endif
      if data_format == 1
        fileappend "'workdir$''soundname$'.formants.data" #FormantsData SoundFile label nstep wtime FormatsFreqs  'soundname$' 'label$' 'newline$'
      endif
########################################################################
# Here 2nd inner loops to collect formants data
########################################################################
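      for nstep from 0 to n_of_points
        wtime = nstep * tstepsize
        # Sample at the same offset that is written out as wtime, so that
        # nstep 0 falls on the interval start and the last step on its end
        ptime = start + wtime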
        if vlogfile > 1
          fileappend "'workdir$''logfile$'" Formants 'soundname$' 'label$' 'nstep' 'wtime'
        endif
        fileappend "'workdir$''soundname$'.formants.data" 'soundname$' 'label$' 'nstep' 'wtime'
        for nf from 1 to max_number_of_formants
          # COMMENT: consider using the Track... command here for diphthongs.
          #          This will strongly depend on the analysed sound; put it in a future version with options
          vf = Get value at time... 'nf' ptime Hertz linear
          if vf = undefined
            if vlogfile > 1
              fileappend "'workdir$''logfile$'"  0.0
            endif
            fileappend "'workdir$''soundname$'.formants.data"  0.0
          else
            if vlogfile > 1
              fileappend "'workdir$''logfile$'"  'vf'
            endif
            fileappend "'workdir$''soundname$'.formants.data"  'vf'
          endif
        endfor
        if vlogfile > 1
          fileappend "'workdir$''logfile$'" 'newline$'
        endif
        fileappend "'workdir$''soundname$'.formants.data" 'newline$'
########################################################################
# 2 inner loops END HERE
########################################################################
      endfor
#
      Remove
      select Sound window
      Remove
        # Gnuplot requires 2 newlines to separate data blocks
        if data_format == 1
          fileappend "'workdir$''soundname$'.formants.data" 'newline$' 'newline$'
        endif
#
    endif
########################################################################
# End loop on intervals
########################################################################

  endfor  

########################################################################
# End loop on sound files
########################################################################

endfor
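
A side note on the sampling in the inner loop above: with n_of_points = 10 the loop index nstep runs from 0 to 10, so each interval gets 11 rows, and the wtime column holds offsets relative to the start of the interval. Here is a minimal Python sketch of that grid. The function name sample_offsets is made up for illustration and is not part of the script.

```python
def sample_offsets(duration, n_of_points=10):
    """Relative times (the wtime column) written for one interval.

    Mirrors the Praat loop: nstep runs from 0 to n_of_points inclusive,
    and wtime = nstep * (duration / n_of_points).
    """
    step = duration / n_of_points
    return [nstep * step for nstep in range(n_of_points + 1)]


# Example: a 0.2 s interval sampled with the script's default of 10 steps
offsets = sample_offsets(0.2)
print(len(offsets), offsets[0], offsets[-1])
```

So a shorter interval is sampled more densely in absolute time, which is worth remembering when comparing intervals of different durations.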

Yep!
How to use it.

Next I will post a few more scripts and some examples. (I will need to record myself to have properly copyleft sample data. Sorry, I was not prepared for this.)
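
Until then, here is a rough idea of what post-processing a .formants.data file could look like. This is only a sketch in Python, not the get_mean_format.sh script mentioned in the header. It assumes the column order the script writes (sound name, label, nstep, wtime, then one value per formant), that labels contain no spaces, and that 0.0 marks an undefined value that should be left out of the mean. The function name mean_formants and the sample rows are invented for illustration.

```python
from collections import defaultdict


def mean_formants(lines, n_formants=5):
    """Average each formant per label from .formants.data lines.

    Comment lines (starting with '#') and blank lines are skipped.
    Values of 0.0 are treated as "undefined" and excluded from the mean.
    """
    sums = defaultdict(lambda: [0.0] * n_formants)
    counts = defaultdict(lambda: [0] * n_formants)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Assumed layout: soundname label nstep wtime F1 ... Fn
        if len(fields) < 4 + n_formants:
            continue
        label = fields[1]
        for k in range(n_formants):
            value = float(fields[4 + k])
            if value > 0.0:
                sums[label][k] += value
                counts[label][k] += 1
    # None means no defined value was found for that formant
    return {
        label: [s / c if c else None for s, c in zip(sums[label], counts[label])]
        for label in sums
    }


# Made-up sample rows in the assumed layout
sample = """#FormantsData SoundFile label nstep wtime FormatsFreqs speaker1 a
speaker1 a 0 0 700.0 1200.0 2500.0 3500.0 0.0
speaker1 a 1 0.01 720.0 1180.0 2600.0 3600.0 0.0

"""
means = mean_formants(sample.splitlines())
print(means["a"])
```

For a real file you would pass an open file handle instead of the sample string, since the function only needs something it can iterate over line by line.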
