Connecting the Leetcode Valid Parentheses Problem to Parsing MusicXML files

While exploring stacks and their connection to music, I have learned that stacks are great data structures to parse files and can be applied to parse MusicXML files. MusicXML is a standard open format for exchanging digital sheet music. MusicXML was based primarily on two academic music formats to capture how music is formatted on the printed page. MusicXML differs from MIDI, another music notation interchange format, in that MIDI is useful for performance applications, but not musical notation. MIDI does not represent stem direction, beams, repeats, slurs, measures, and many other aspects of notation.

Learning Data Structures and Algorithms with Music is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

XML is a system-neutral Internet format for representing structured data. Unlike HyperText Markup Language (HTML), XML forces users to adhere to strict data-models and provides syntactical error-checking and emphasizes elements of content. XML is a useful data model for music because it is designed to represent complex, structured data, which makes it a good fit for musical scores. The nice thing about XML is that there are XML parsers everywhere so there is no need to write a low-level parser to read the language.

XML parsers can be utilized to parse MusicXML files. And what data structure do they utilize? Stacks! XML syntax must have the following:

  • All XML elements must have a closing tag.

  • XML tags are case-sensitive.

  • All XML elements must be properly nested.

  • All XML documents must have a root element.

  • Attribute values must always be quoted.

XML parsers utilize the stack structure because of the requirement that they are properly nested and have a closing tag. In XML, the word between the angle brackets is called a “tag name” because it identifies what type of element it is. Here is an example of the MusicXML language:

<note>
  <pitch>
    <step>C</step>  ← Must close step before pitch
  </pitch>          ← Must close pitch before note
</note>             ← Must close note

Nesting and closing tags are the things that can be checked with a parser using a stack because stacks have a Last In, First Out (LIFO) property that can check for the closing elements in the order of the most recent hierarchical element. The universal stack algorithm is:

  1. If there is an opening element: push to stack

  2. If there is a closing element: pop and check if it matches

  3. At the end of input: stack should be empty

This algorithm is good for XML parsers and it’s also good for the Leetcode problem, Valid Parentheses. Parsers that utilize stacks are useful for programming languages, such as Python and HTML, and mathematical expressions. I’ll walk through the connections between the Valid Parentheses Leetcode problem and parsing MusicXML.

Subscribe now

Le’t’s start with the Valid Parentheses Leetcode problem. The problem asks, given a string containing just parentheses, brackets, and braces, determine if the input string is valid. It describes what valid means:

  1. Open items must be closed by the same type of item.

  2. Open items must be closed in the correct order.

  3. Every closed item has a corresponding open item of the same type.

The trick here is noticing the data structure that fits the structure of a valid string. Here, to close by the same type and to close in the correct order is exactly the syntax needed for the XML language, too, which is nested. The best data structure to handle this nested syntax is a stack.

In order to implement the stack, we need a dictionary that maps the opening items to the closing items. For this problem, we can define the dictionary explicitly.

pair_dict = {'(': ')', '{':'}', '[':']'}

One gotcha is that we will use the pop() function to check the mapping, but anytime you use pop(), you have to check if the list is empty before you pop() it. Python’s pop() function will throw an error if a list is empty when you try to pop() it. Given the universal stack algorithm, here is a solution to the Valid Parentheses Leetcode problem:

class Solution:
    def isValid(self, s: str) -> bool:
        stack_list = []
        pair_dict = {'(': ')', '{':'}', '[':']'}
        opening_brackets = pair_dict.keys()
    
        for char in s:
            if char in opening_brackets:
                stack_list.append(char)   
            else:
                if len(stack_list) == 0:
                    return False 

                check = stack_list.pop()
                
                if char == pair_dict[check]:
                    pass
                else:
                    return False
        
        return len(stack_list) == 0

Let’s walk through this solution. First, I initialize the stack and define the dictionary of matching items. I pull the opening items by using keys(). Then I loop through the string and if any element of the string matches the opening items, I push the element using append() to the stack. If you start out with a closing item in the string, then the item won’t be appended and the stack is empty, resulting in a False because the string is not valid. If you encounter a closing item in the string that’s not the first element, then we pop() an element from the stack. This element from the stack will be the last element because we aren’t using an index with the pop() function. I use the dictionary to check to see if the last element of the stack, which is an opening element, maps to the closing element that we have encountered in the string. If it does map, then we continue on to the next element in the string. If it is a nested structure, then it should map. If it doesn’t map to the closing element, then the string is not valid and I return False. Lastly, after looping through the string, I double check to see if the stack is empty. If the stack is empty, then the string is valid.

What I love about this Leetcode problem is how it’s connected to a real-world issue that we encounter frequently and is inherent in coding languages like Python. Let’s see how this Leetcode problem translates into practice using musical notation as an example.

Subscribe now

When I want to convert sheet music of interest to MusicXML data, I use the open source software Audiveris. Audiveris is free open-source Optical Music Recognition software that can convert scanned sheet music into editable MusicXML. Audiveris detects musical elements such as notes, clefs, rests, time signatures, key signatures, and lyrics. The MusicXML data can then be read into the music21 Python library for further processing. To demonstrate the process, I downloaded “Old McDonald” from a website that offers free sheet music, Easy Sheet Music. I fed this into Audiveris and it provided to me the corresponding XML file with the MusicXML data that I called ‘oldmcdonald.xml.’ This is the file that we can read into music21:

from music21 import *

oldmcd = converter.parse('oldmcdonald.xml')  
oldmcd.show()

This is perfect! Music21 uses a built-in XML parser in Python, which uses a stack data structure to iterate over the elements and read in the musical notation. If there are any errors, it will raise a ParseError. Some programs may still read the file, but it will be incomplete. Therefore, parsers such as these are important for catching bugs.

The only difference with this XML parser algorithm from the Leetcode solution that I shared is that we need to update the matching dictionary to tags. Since we may not know ahead of time what the tag names will be, we need to use regular expression (regex) functions to pull tag names according to a certain pattern. The goal is to check that the tag names match. Here, the main pattern that we want to match is:

  • < : we want the mandatory opening angle bracket

  • : we want to capture the slash after < indicating a closing tag, if it exists

  • we need letters indicating the tag name

  • drop the other stuff we don’t care about

  • : we want the mandatory closing angle bracket

Subscribe now

First, I read the entire XML file using Python’s built-in open() function as a string that I call ‘content’:

with open('oldmcdonald.xml', 'r') as file:
    content = file.read()

Reading the entire file as a string will make it easier to use regex on it. Then, I use the findall() function where I pass the pattern and content to find all of the patterns that I want to match.

all_tags = re.findall(r'<(/?)([a-zA-Z]+)[\s>]', content) 

This regex pattern says the following:

  1. r: tells regex to treat it as a “raw string” which is helpful for regex patterns.

  2. (/?): tells regex that the forward slash is optional, but capture it as a group as either blank or with the forward slash. We indicate that we want this group using the parentheses.

  3. ([a-zA-Z]+): tells regex that we want a group that is capturing letters (lower case and upper case). The ‘plus’ sign tells regex that we want one or more of these letters. This group is the tag name.

  4. [\s>]: tells regex to capture whitespace (\s) or the closing angle bracket (>). The brackets tell regex to match either whitespace OR the closing angle bracket.

Here are a few lines of all_tags which is a list of tuples:

For this problem, what we really care about is the presence of the optional closing bracket. Once we enumerate all of the pattern matches, we can loop through it and see if the pattern is an opening tag name or a closing tag name, by checking the optional group. Then we go through the same algorithm that we used with Leetcode, except that instead of matching the pair in the dictionary, we can simply match the tag name that we are pushing to the stack because the tag name is the same regardless of whether it is an opening tag or a closing tag.

Here is an implementation that takes all_tags as input:

def validateMusicXML(all_tags): 
    stack_list = [] 
    for i, name in enumerate(all_tags): 
        if name[0] == '': # then it's an opening tag 
            if name[1] == '': # if the opening is empty too - nothing to parse 
                raise IndexError('The parsing is off') 
            else: stack_list.append(name[1]) # push the tag name 
        else: # first double check to see if the stack is already empty before you try to pop it 
            if len(stack_list) == 0: 
                raise IndexError('The parsing is off') 
            else: # pop and check 
                check = stack_list.pop() 
                if check == name[1]: # if they are the same 
                    pass 
                else: 
                    raise IndexError('The parsing is off')
    
    if len(stack_list) != 0:
        raise IndexError('The parsing is off')
    else:   
        return 'The parsing works!'     

Here, name is the tuple. We don’t need the index i. Just like before, we check if it’s an opening item using name[0]. If name[0] is blank, then it’s an opening item. I do a second check - if the tag name is empty (name[1] == '' ), I throw an error because an empty file is not valid in this implementation. If there is a tag name with the opening item, I push it to the stack. Otherwise, if it’s a closing item (which we know because there will be a slash ‘\’ in the first element of the tuple), first, I double check to make sure the stack isn’t empty. Then, I pop() the item from the stack and check it against the tag name (name[1]). If the names match, then the loop continues forward. If not, there is an error. Lastly, after the loop is complete, I check to make sure that the stack is empty.

So there you have it! The Leetcode question, Valid Parentheses, can be related to the common problem of parsing music notation files, and nested files in general!

Learning Data Structures and Algorithms with Music is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.