
Machine Perception of Music

Justin Dong

     Music, an art form almost as ancient as humanity itself, can evoke potent emotions despite lacking a visual component. From early humans finding harmony by striking sticks against walls to the sophisticated piano compositions of Classical-era musicians, music has consistently stirred a spectrum of emotions, from joy and nostalgia to melancholy and awe. Yet despite music's unifying nature, musical preferences are not universal and vary from person to person. This paper explores the intersection of music and machine learning, investigating how technology can illuminate the patterns behind our individual music preferences. To understand the distinction between human and machine processing of music, it helps to look at the primary objectives of using computers and machine learning in this field. The three main topics - computer accompaniment, music understanding, and music synthesis - each have intricacies requiring distinct approaches, but together they demonstrate how computers process music. This paper focuses on computer accompaniment and music understanding.

          Computer accompaniment (think electronic pianos and soundboards) primarily uses an algorithmic approach to process and output music. This involves using signal processing techniques to transform a musical piece into a sequence of symbols and then estimate pitch. In practice, this approach works well for replicating acoustic music and providing accurate accompaniment. However, it also highlights the inherent simplicity of machine perception of music. While computers excel at executing predefined algorithms and processing precise musical features, their understanding of the complexities of music remains limited. They rely primarily on mathematical representations, which capture certain aspects of music but may fall short of the full depth of human musical perception.
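          To make the idea of estimating pitch and mapping it to a symbol concrete, here is a minimal sketch in Python. It is not the method of any particular accompaniment system; it assumes a single clean tone and uses plain autocorrelation, one of several common pitch-estimation techniques. The function names, the search range of 50 to 1000 Hz, and the A4 = 440 Hz tuning are all assumptions made for illustration.

```python
import math


def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of a mono signal.

    Uses unnormalized autocorrelation: the lag at which the signal best
    matches a shifted copy of itself is taken as one period.
    Returns None if no positive correlation is found (e.g. silence).
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


def nearest_note(frequency):
    """Map a frequency to the nearest equal-tempered note name (A4 = 440 Hz)."""
    names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    midi = round(69 + 12 * math.log2(frequency / 440.0))
    return f"{names[midi % 12]}{midi // 12 - 1}"


# Usage: a synthetic 440 Hz sine wave sampled at 8 kHz (hypothetical input).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440.0 * i / sample_rate) for i in range(1024)]
frequency = estimate_pitch(tone, sample_rate)
print(frequency, nearest_note(frequency))
```

          Because the lag must be a whole number of samples, the estimate is only approximate, and the sketch would likely struggle with chords, noise, or instruments with strong overtones. That fragility is a small illustration of the point above: the machine is matching a mathematical pattern, not hearing a note.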

          The simplicity of machine classification is demonstrated in a (somewhat outdated) study of machine style classification. In 1999, a team of students and professors at Carnegie Mellon University used machine learning algorithms to recognize musical styles such as “frantic” or “slow.” By decomposing musical compositions into low-level features such as note counts, pitch, and duration, the researchers trained classifiers that reached a success rate of around 98% in distinguishing four simple styles. At first glance the project may seem to demonstrate the proficiency of machine music recognition, but we must also consider the limitations this experiment revealed. The four styles recognized by the machine were very simple and consistent throughout, meaning the computer was unable to recognize tonal shifts, something humans can do with near-perfect certainty. Additionally, style is one of the simpler concepts in music categorization: how would a computer approach classifying higher-level concepts such as genre? Despite these historical limitations in classifying abstract musical concepts, modern techniques have produced far better results, which show how much potential the field holds. In a study published in the International Research Journal of Engineering and Technology, researchers used signal processing techniques to represent sound patterns as a 2D spectrogram. They then used a convolutional neural network, a model widely used for image processing, to classify these representations of sound as images. Using this method, the model achieved an accuracy of 88.54%, outperforming feature-based models such as logistic regression and simple neural networks by over 20%.
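          The spectrogram step can be sketched in a few lines of Python. This is an illustration of the general technique only, not the pipeline from the study: the frame size, hop size, and Hann window are assumptions, and the naive transform below stands in for the fast Fourier transform libraries a real system would use. Each row of the result is one slice of time and each column one frequency band, which is what allows the sound to be treated like an image.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame holds the magnitudes of
    frequency bins 0..frame_size // 2 for one windowed slice of audio."""
    # Hann window: tapers each frame's edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            # Naive discrete Fourier transform of one frame at bin k.
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            magnitudes.append(abs(coeff))
        frames.append(magnitudes)
    return frames


# Usage: a synthetic 1000 Hz tone sampled at 8 kHz (hypothetical input).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000.0 * i / sample_rate) for i in range(512)]
image = spectrogram(tone)
loudest_bin = max(range(len(image[0])), key=lambda k: image[0][k])
print(len(image), "frames;", loudest_bin * sample_rate / 64, "Hz is loudest")
```

          A steady tone produces a single bright horizontal stripe in this grid, while real music produces shifting textures. Those visual patterns are what a convolutional network would be trained to associate with genre labels.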

          Even with modern methods achieving more accurate results, it is important to acknowledge that the most complex neural networks we can construct today are still far from perceiving music with the richness that humans do. Whereas humans are inherently capable of holding complex preferences, feeling a range of emotions in response to different songs, and creating meaningful music, machines still require training and tuning to imitate human perception. The reason is that music perception is a deeply personal experience: everyone has their own preferences, shaped by an unknown number of factors. While machines can model music perception from the outside, it would be nearly impossible to build a predictor that fully generalizes human music perception, given how individual music is to each of us.

          Still, that isn’t to say machine learning will never provide interesting insights on the topic. Ongoing research in music processing has been making great strides in bridging the gap between human and machine perception. Advanced machine learning techniques such as deep learning hold great potential to give us a better understanding of music perception. By leveraging the strengths of both humans and machines, we can expect developments that push the boundaries of musical creativity and deepen our understanding of this universal art form.


Dannenberg, R. “Artificial Intelligence, Machine Learning, and Music Understanding.” Semantic Scholar, 1 Jan. 1970,


Agrawal, Raghav. “Music Genre Classification Project Using Machine Learning Techniques.” Analytics Vidhya, 6 Apr. 2022, using-machine-learning-techniques/.


Chillara, Snigdha, Kavitha A S, et al. “Music Genre Classification Using Machine Learning Algorithms: A Comparison.” https://www.irjet.net/, May 2019, V6I5174.pdf.
