Praat: Extract data
Here is a very basic script that I use to extract data from some given (stress on "given") Praat data.
It runs on files and folders (directories), and it can accumulate data. It has options to save in different formats (only two for now), with a few small controls on the output. It can be scripted for large amounts of data.
For newcomers: Praat is a linguistic tool; more at praat (you can also try praat.org, which links back here).
The script requires that your data be segmented or, better, very nicely labeled.
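Since everything depends on the labels, here is a small Python sketch of how the script below decides whether an interval label is processed. It mirrors the condition in the "Interval label selection" part of the script; the function name `selects` is hypothetical, and Python's `re` module is assumed to treat this simple pattern the same way Praat's `index_regex` does.

```python
import re


def selects(label: str, terms: str, use_regex: bool) -> bool:
    """Sketch of the script's label test (not the Praat code itself)."""
    if terms == "all":
        # "all" keeps every interval with a non-empty label
        return label != ""
    if use_regex:
        # index_regex(...) > 0 in Praat means "a match was found"
        return re.search(terms, label) is not None
    # plain mode: the label must start with the given term
    return label.startswith(terms)


# Pattern taken from the commented-out default in the script's form
VOWELS = r"^'?(a|e|i|o|u|A|E|I|O|U):?"

for label in ["a:", "'E", "t", ""]:
    print(repr(label), selects(label, VOWELS, use_regex=True))
```

Trying a few of your own labels this way before a long run is a cheap check that the term or regex picks what you expect.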
#! /bin/praat
########################################################################
# Script by: Fabio Mariotti & Marion Braendle
# License: GPL (Latest GPL version: http://www.gnu.org/copyleft/gpl.html)
# CopyRight: Fabio Mariotti & Marion Braendle
# $version: 1.0
########################################################################
#
# This script is under development. Use at your own risk.
#
# The script run on:
#  - Depending on input option:
#     - All selected pairs (TextGrid / wav) files
#     - All files in a directory
#     - All Files in the given directory and its sub directories
#       (less tested)
#
#  - It assumes you have tiers with labels
#
#  - It will try to get:
#
#     - Labels on the given tier (empty labels are ignored)
#
#     - You can input regex to select the labels
#       (not suggested as you can always post-process it)
#
#     - Timing information on the current interval
#        - namely: start and end points, duration
#        - including previous, next labels on the tier.
#
#     - Formant information
#        - Here there are some hardcoded parameters:
#           - time_step = 0.0025
#           - max_number_of_formants = 5
#           - window_length = 0.025
#           - n_of_points = 10
#        - Formants data are given in that points and there is no mid point
#          evaluation. (please use the script: get_mean_format.sh)
#          If you need a more controlled setting you should be able to edit this
#          script and setting the parameters.
#
#  - Output:
#     - The script will output 2 files for each pair (TextGrid/Wav) with the
#       same name. In particular these are called:
#          <basefilename>.formants.data
#          <basefilename>.timing.data
#       Any information there is <space> separate
#       The script does not use any fancy praat tools or parameter, The output, should be
#       as close as to STD values.
#
#  - post processing:
#     - Indeed we do use some scripts to get relevant data.
#     - Check get_mean_format.sh in the next post.
#
########################################################################
# Processing notes
########################################################################
#
# The directories are processed first and saved in a list as sound
# files to be processed.
#
# I have a new version wich can check other tiers labels but it is still
# experimental. On one side it needs some standards. What is in tier 1?
# As names are nice but again not standardasied.
#
# Anything which is not recognised or assumed to be not relevant is
# marked as "_".
# Within this lines
# ">" marks a beginning in general
# "<" marks an ending in general
# Please note that theese simbols are in principle "space-less".
########################################################################
#
# Clear info box
clearinfo
#
########################################################################
# Intro form
########################################################################
form Produce data files for plotting and Analysis
    comment Files work directory (All files are created here)
    text workdir /home/<USER>/<DATA>/
    sentence logfile createdataplot.log
    optionmenu vlogfile 2
        option log only activity
        option create a full log file
    optionmenu remfile 1
        option Remove Files
        option Append to files
    optionmenu doplot 1
        option Just produce data files
        option Do draw tracks
    optionmenu runon 1
        option Run on Selected
        option Run on sound files (and TextGrid) on a directory
        option Run on sound files (and TextGrid) on a directory and subdirs
    optionmenu numtier 1
        option 1
        option 2
        option 3
        option 4
    comment Direcory to search for Sound files (if required)
    text run_dir /home/<USER>/<DATA>/
    comment Term in the name of the segment to analyse or "all" for all
    sentence terms all
    boolean Regex 0
    ## Use this default for generic vowels match
    ## sentence terms "^'?(a|e|i|o|u|A|E|I|O|U):?"
    ## boolean Regex 0
    comment Formant data (F:5500/M:5000)
    positive Maximum_formant_female_male 5500
    comment Output format
    optionmenu data_format 1
        option gnuplot
        option raw text
    comment advanced options (Use 1.0 now)
    real segment_margin 1.0
endform
#
########################################################################
# Initialization
########################################################################
#
printline ### Parameters:
#
# Window margins for selection
wmargin = segment_margin
printline Script: Segment window margin: 'wmargin'
# Number of points for the output data: actually it is -1
n_of_points = 10
printline Script: Number of points on the output data files: 'n_of_points'
#
# Formants data
time_step = 0.0025
printline Formants: Fixed time step: 'time_step'
max_number_of_formants = 5
printline Formants: max number of formants: 'max_number_of_formants'
maximum_formant = maximum_formant_female_male
printline Formants: Maximun (Hertz) Format (ScriptInput): 'maximum_formant'
window_length = 0.025
printline Formants: Window length: 'window_length'
preemphasis_from = 50
printline Formants: Preemphasis from: 'preemphasis_from'
#
# Others
soundext$ = "wav"
printline Sound File Extension defined in the script is: 'soundext$'
textgridext$ = "TextGrid"
printline Text Grid File Extension defined in the script is: 'textgridext$'
tiernum = numtier
printline Tier Number is: 'tiernum'
#
# Form parameters
#
# report initialization parameters
printline ### Form Parameters:
printline Work Directory 'workdir$'
printline Log file 'workdir$''logfile$'
printline ### EndParameters
#
########################################################################
# Start code
########################################################################
#
# If asked for remove log file
if remfile == 1
    filedelete 'workdir$''logfile$'
endif
#
########################################################################
# Check if the log file exists:
########################################################################
#
if fileReadable (workdir$+logfile$)
    pause The result file 'logfile$' already exists in 'workdir$'! Do you want to overwrite it?
    filedelete 'workdir$''logfile$'
    printline Logfile 'workdir$''logfile$' deleted.
endif
#
fileappend "'workdir$''logfile$'" # Log file for CreateDataPlot procedure 'newline$'
#
########################################################################
# Process directory selection
########################################################################
#
if runon <> 1
    # This lns variable tracks the number of files loaded.
    lns = 0
    mdirectory$ = run_dir$
    printline processing directories: start from 'run_dir$'
    #
    if runon = 2
        Create Strings as file list... list 'mdirectory$'*.'soundext$'
        numberOfFiles = Get number of strings
        printline Number of files in directory 'mdirectory$' is 'numberOfFiles'
        for soundfile to numberOfFiles
            select Strings list
            soundfilename$ = Get string... soundfile
            Read from file... 'mdirectory$''soundfilename$'
            lns = lns + 1
            filename$ = left$ (soundfilename$, (length(soundfilename$) - length(soundext$) -1 ))
            printline FileName 'filename$'
            soundname$ = filename$
            select Sound 'filename$'
            lsound'lns' = selected("Sound")
            Read from file... 'mdirectory$''soundname$'.'textgridext$'
            printline Sound selected 'soundname$'
        endfor
        select Strings list
        Remove
    else
        #
        Create Strings as directory list... dlist 'mdirectory$'*
        numberOfDirs = Get number of strings
        realnd = numberOfDirs - 2
        printline number of dirs 'numberOfDirs'
        printline number of Data dirs 'realnd'
        #
        for nd from 3 to numberOfDirs
            select Strings dlist
            directory$ = Get string... nd
            printline processing directory 'mdirectory$''directory$'
            Create Strings as file list... list 'mdirectory$''directory$'/*.'soundext$'
            numberOfFiles = Get number of strings
            printline Number of files in directory 'directory$' is 'numberOfFiles'
            for soundfile to numberOfFiles
                select Strings list
                soundfilename$ = Get string... soundfile
                Read from file... 'mdirectory$''directory$'/'soundfilename$'
                lns = lns + 1
                filename$ = left$ (soundfilename$, (length(soundfilename$) - length(soundext$) -1 ))
                printline FileName 'filename$'
                soundname$ = filename$
                select Sound 'filename$'
                lsound'lns' = selected("Sound")
                Read from file... 'mdirectory$''directory$'/'soundname$'.'textgridext$'
                printline Sound selected 'soundname$'
            endfor
            select Strings list
            Remove
        endfor
        #
        select Strings dlist
        Remove
    endif
    #
    # Here we select again all the sound files loaded
    for ls to lns
        plus lsound'ls'
    endfor
endif
#
#
########################################################################
# Store selection
########################################################################
# Get number of sounds
ns = numberOfSelected ("Sound")
printline Number of selected sounds: 'ns'
# Store on a vector selected sounds
for i to ns
    sound'i' = selected ("Sound", i)
endfor
printline Start running on selected sound files
########################################################################
# Main loop on sound files
########################################################################
printline Start running 'ns'
for i to ns
    # Select i Sound
    select sound'i'
    soundname$ = selected$ ("Sound")
    printline Operating on sound file 'soundname$'
    # Select i TextGrid EXPECTED to be there
    printline Requesting TextGrid file 'soundname$'
    select TextGrid 'soundname$'
    printline TextGrid file 'soundname$' Selected
    # Get and process intervals in tier 1
    numberOfIntervals = Get number of intervals... tiernum
    printline Number of intervals for TextGrid file 'soundname$' are: 'numberOfIntervals'
    if remfile == 1
        printline Removing 'workdir$''soundname$'.timing.data
        printline Removing 'workdir$''soundname$'.formants.data
        filedelete 'workdir$''soundname$'.timing.data
        filedelete 'workdir$''soundname$'.formants.data
    endif
    fileappend "'workdir$''soundname$'.timing.data" #TimeInfo SoundFile label duration start end 'newline$'
    ########################################################################
    # Secondary loop on intervals
    ########################################################################
    for interval to numberOfIntervals
        #Select TextGrid to allow finding labels
        select TextGrid 'soundname$'
        printline processing interval number 'interval'
        label$ = Get label of interval... tiernum interval
        #
        preint=interval-1
        postint=interval+1
        if preint = 0
            prelabel$=">"
        else
            prelabel$ = Get label of interval... tiernum preint
            prelabel$ = replace$(prelabel$," ","_",0)
        endif
        if length(prelabel$) = 0
            prelabel$ = "_"
        endif
        if postint > numberOfIntervals
            postlabel$="<"
        else
            postlabel$ = Get label of interval... tiernum postint
            postlabel$ = replace$(postlabel$," ","_",0)
        endif
        if length(postlabel$) = 0
            postlabel$ = "_"
        endif
        #
        ########################################################################
        # Interval label selection
        ########################################################################
        # Process only if label is not empty
        doterm =0
        if regex = 0
            if label$ <> "" and terms$ == "all" or terms$ <> "all" and startsWith(label$,terms$)==1
                doterm = 1
            endif
        else
            ireg = index_regex(label$,terms$)
            printline Query for regex 'terms$' gives 'ireg'
            if label$ <> "" and terms$ == "all" or terms$ <> "all" and index_regex(label$,terms$)>0
                doterm = 1
            endif
        endif
        #
        if doterm == 1
            printline Processing not empty label 'label$'
            # if the interval has a not empty label, get its start and end:
            start = Get starting point... tiernum interval
            end = Get end point... tiernum interval
            duration = end - start
            # Add the Window Margins to
            wstart = start - wmargin
            wend = end + wmargin
            wduration = wend - wstart
            # Write timing information
            if vlogfile > 1
                fileappend "'workdir$''logfile$'" #TimeInfo SoundFile label duration start end 'newline$'
                fileappend "'workdir$''logfile$'" Timing 'soundname$' 'label$' 'duration' 'start' 'end' 'newline$'
            else
                fileappend "'workdir$''logfile$'" #Writing 'soundname$' timing data for 'label$' at 'start''newline$'
            endif
            fileappend "'workdir$''soundname$'.timing.data" 'soundname$' 'label$' 'duration' 'start' 'end' 'prelabel$' 'postlabel$' 'newline$'
            #fileappend "'workdir$''soundname$'.timing.data" 'soundname$' WWW 'duration' 'start' 'end' pWWW poWWW 'newline$'
            # Select the sound file and produce formants for the selected window
            select Sound 'soundname$'
            # WARN: I have no info on this function I only guess what it does
            Extract part... wstart wend Rectangular 1 yes
            Rename... window
            ### Resample... 10000 50
            # Formants analysis on the window
            select Sound window
            To Formant (burg)... time_step max_number_of_formants maximum_formant window_length preemphasis_from
            Rename... formants
            ### TEST HERE ###
            #printline TEST IT!!!!!!!!!!!!
            #Track... 3 550 1650 2750 3850 4950 1 1 1
            #Rename... formanttracks
            #
            ########################################################################
            # Check for plot: We do it here in one line
            ########################################################################
            if doplot > 1
                Draw tracks... start end 3200 yes
            endif
            #
            # Determine time step size
            tstepsize = duration / n_of_points
            ptime = start
            if vlogfile > 1
                fileappend "'workdir$''logfile$'" #FormantsData SoundFile label nstep wtime FormatsFreqs 'newline$'
            else
                fileappend "'workdir$''logfile$'" #Writing 'soundname$' formant data for 'label$' at 'start''newline$'
            endif
            if data_format == 1
                fileappend "'workdir$''soundname$'.formants.data" #FormantsData SoundFile label nstep wtime FormatsFreqs 'soundname$' 'label$' 'newline$'
            endif
            ########################################################################
            # Here 2nd inner loops to collect formants data
            ########################################################################
            for nstep from 0 to n_of_points
                wtime = nstep * tstepsize
                ptime = ptime + tstepsize
                if vlogfile > 1
                    fileappend "'workdir$''logfile$'" Formants 'soundname$' 'label$' 'nstep' 'wtime'
                endif
                fileappend "'workdir$''soundname$'.formants.data" 'soundname$' 'label$' 'nstep' 'wtime'
                for nf from 1 to max_number_of_formants
                    # COMMENT: consider here to use the command track... for diphhongs
                    # This will strongly depend on the analised sound.. put it in the future version with options
                    vf = Get value at time... 'nf' ptime Hertz linear
                    if vf = undefined
                        if vlogfile > 1
                            fileappend "'workdir$''logfile$'" 0.0
                        endif
                        fileappend "'workdir$''soundname$'.formants.data" 0.0
                    else
                        if vlogfile > 1
                            fileappend "'workdir$''logfile$'" 'vf'
                        endif
                        fileappend "'workdir$''soundname$'.formants.data" 'vf'
                    endif
                endfor
                if vlogfile > 1
                    fileappend "'workdir$''logfile$'" 'newline$'
                endif
                fileappend "'workdir$''soundname$'.formants.data" 'newline$'
                ########################################################################
                # 2 inner loops END HERE
                ########################################################################
            endfor
            # Remove
            select Sound window
            Remove
            # Gnuplot require 2 newlines to separate data blocks
            if data_format == 1
                fileappend "'workdir$''soundname$'.formants.data" 'newline$' 'newline$'
            endif
            #
        endif
        ########################################################################
        # End loop on intervals
        ########################################################################
    endfor
    ########################################################################
    # End loop on sound files
    ########################################################################
endfor
Yep!
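For post-processing, here is a minimal Python sketch that averages each formant per label from a `<basefilename>.formants.data` file. It is not the `get_mean_format.sh` script mentioned in the header. It assumes the fields on each data line are whitespace-separated in the order `SoundFile label nstep wtime F1 ... F5`, that labels contain no spaces, and that `0.0` marks an undefined value (as the script writes it). The function name `mean_formants` and the sample lines are made up for illustration.

```python
from collections import defaultdict


def mean_formants(lines):
    """Average each formant per label, skipping 0.0 ("undefined" in the script)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # header comments and gnuplot block separators
        fields = line.split()
        label = fields[1]
        # fields[4:] are assumed to be the formant values F1, F2, ...
        for k, text in enumerate(fields[4:], start=1):
            value = float(text)
            if value > 0.0:
                sums[label][k] += value
                counts[label][k] += 1
    return {
        label: {k: sums[label][k] / counts[label][k] for k in sums[label]}
        for label in sums
    }


# Made-up sample in the assumed gnuplot-style layout
sample = """\
#FormantsData SoundFile label nstep wtime FormatsFreqs rec1 a
rec1 a 0 0 700 1200 2500 3500 0.0
rec1 a 1 0.01 720 1180 2600 3500 4500


""".splitlines()

means = mean_formants(sample)
for k in sorted(means["a"]):
    print(f"F{k}", means["a"][k])
```

To run it on real output, pass an open file instead of `sample`, for example `mean_formants(open("rec1.formants.data"))`, where the file name is a placeholder.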
How to use it.
Next I will post a few more scripts and some examples. (I will need to record myself to have a good copyleft case. Sorry, I was not prepared for this.)