This website is a research portal for piano rolls digitized from the Stanford Libraries’ collection of over 15,000 piano and organ rolls. As of November 2019, the website includes 456 Welte-Mignon T-100 rolls (colloquially called “red welte” rolls since they typically use red paper). The process of converting images of the piano rolls into MIDI files is summarized in our paper presented at the International Society for Music Information Retrieval’s (ISMIR) 2019 conference in Delft, the Netherlands, where it received the best-paper award. The homepage of this website contains a searchable list of the digitized rolls with links to various resources for each roll:
- SW: links go to SearchWorks, the online card-catalog system of Stanford University. Additional searching of the rolls (particularly rolls that have not yet been digitized) can be done there. MARC21 records for the digitized rolls are included in the SUPRA MIDI file repository.
- The images of the piano rolls are accessible in various formats. Clicking on the icons in the list on the homepage opens a viewer for the IIIF-hosted images of the rolls in the Stanford Digital Repository, along with a playback button for listening to the roll's music (integrated audio/visual interfaces are forthcoming).
- IA links go to pages giving detailed automatic image analysis of the piano roll images, including analyses of edge tears, problematic holes, and left/right drift along the length of the rolls (used for straightening rolls when extracting notes).
- There are also links for the digitized rolls in various formats:
  - The icons link directly to the raw TIFF images of the rolls (usually about 200-800 MB in size). These files contain the green channel of the full-color TIFF images that are used to extract musical information from the rolls.
  - The Mexp links point directly to the "expression" MIDI files that are used to generate the audio versions of the rolls.
  - The Mraw links point directly to the "raw" MIDI files that are used to create the expression MIDI files. The raw files contain every individual hole on the piano roll, whereas the expression MIDI files merge multiple holes into single notes and interpret the expression holes as note dynamics and pedaling.
  - The MP4 links point directly to MP4/M4A/AAC compressed audio files generated by software synthesis from the expression MIDI files.
  - The MP3 links point to MP3 versions of the same audio, made available for applications that cannot read MP4 files.
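The difference between the raw and expression MIDI files described above can be illustrated with a small sketch of the hole-merging idea: holes that follow each other closely in the same tracker-bar column are treated as one sustained note. This is only an illustration and not the method used by the midi2exp software; the function name `merge_holes`, the `(first_row, last_row)` representation, and the `max_gap` threshold are all hypothetical.

```python
from typing import List, Tuple

# A hole in one tracker-bar column, as (first_row, last_row) in pixel rows.
# This representation is an assumption made for the example.
Hole = Tuple[int, int]


def merge_holes(holes: List[Hole], max_gap: int) -> List[Hole]:
    """Merge holes separated by at most max_gap rows into single notes."""
    merged: List[Hole] = []
    for start, end in sorted(holes):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous hole: extend that note.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # Gap is too large: this hole starts a new note.
            merged.append((start, end))
    return merged


# Three raw holes; the first two are close enough to form one note.
notes = merge_holes([(0, 10), (12, 20), (50, 60)], max_gap=5)
print(notes)
```

A real conversion would also have to interpret the expression holes as dynamics and pedaling, which this sketch does not attempt.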
In addition to downloading individual digitized files on this website, there are two other ways of getting access to the files as a group:
- The SUPRA dataset is available on GitHub in the pianoroll/SUPRA repository. This repository contains all of the raw and expression MIDI files in their most recent versions.
- The SUPRA-RW dataset, version 1.1, is available in the Stanford Digital Repository. This version of the data also includes the WAVE files used to create the compressed audio files found on this website.
The audio files on this website as well as in the SUPRA-RW dataset are exactly aligned with the accompanying MIDI files, making them suitable for studies on score-to-audio alignment. Also see the webpage describing how to use the audio and MIDI files in Sonic Visualiser. In addition, the tick values in the MIDI files refer to pixel rows in the source TIFF image, so exact alignment with the scan of the piano roll is also possible. To get the exact image row, add the FIRST_HOLE metadata value found in the MIDI file to the tick value; without this offset, the first tick in the MIDI file corresponds to the first musical hole in the image.
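The tick-to-row relationship described above amounts to a single addition, sketched below. The exact form of the FIRST_HOLE metadata text inside the MIDI file is an assumption here (a text entry resembling `@FIRST_HOLE: 2283px`), and the function names `parse_first_hole` and `tick_to_image_row` are hypothetical helpers, not part of any SUPRA software.

```python
import re


def parse_first_hole(metadata_text: str) -> int:
    """Extract the FIRST_HOLE row number from a metadata text entry.

    Assumes the entry contains 'FIRST_HOLE' followed by an integer,
    for example '@FIRST_HOLE: 2283px' (format assumed, not verified).
    """
    match = re.search(r"FIRST_HOLE\D*(\d+)", metadata_text)
    if match is None:
        raise ValueError("no FIRST_HOLE value found in metadata text")
    return int(match.group(1))


def tick_to_image_row(tick: int, first_hole: int) -> int:
    """Map a MIDI tick to a pixel row in the source TIFF image."""
    return tick + first_hole


first_hole = parse_first_hole("@FIRST_HOLE: 2283px")  # made-up example value
print(tick_to_image_row(100, first_hole))
```

Reading the metadata entry out of an actual MIDI file requires a MIDI parsing library, which is left out to keep the sketch self-contained.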
Software for converting the TIFF images to raw MIDI files is available in the GitHub repository pianoroll/roll-image-parser, and the software to convert from raw MIDI to expression MIDI is available in the GitHub repository pianoroll/midi2exp. The software, the extracted MIDI files, and the audio renderings are all licensed under CC BY-NC-SA 4.0. Additional software for processing the MIDI files is available in the pianoroll/midiroll repository, including a tool for converting the Type-1 MIDI files with tempo meta-messages into Type-0 MIDI files in which the timing is embedded in the tick values rather than expressed through tempo messages.
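The idea behind embedding timing in the tick values can be sketched with standard MIDI arithmetic: a tempo meta-message gives microseconds per quarter note, and the file header gives ticks per quarter note, so each tick can be converted to an absolute time and then re-expressed on a fixed grid. The code below is a minimal sketch of that arithmetic, not the midiroll implementation; the function names and the choice of a one-millisecond grid are assumptions made for the example.

```python
from typing import List, Tuple

# (start_tick, microseconds_per_quarter_note), sorted, first entry at tick 0.
TempoMap = List[Tuple[int, int]]


def ticks_to_seconds(tick: int, tempo_changes: TempoMap,
                     ticks_per_quarter: int) -> float:
    """Convert a tick position to seconds using a list of tempo changes."""
    seconds = 0.0
    for i, (start, us_per_quarter) in enumerate(tempo_changes):
        if tick <= start:
            break
        # This tempo applies until the next tempo change or the target tick.
        if i + 1 < len(tempo_changes):
            end = min(tick, tempo_changes[i + 1][0])
        else:
            end = tick
        seconds += (end - start) * us_per_quarter / (ticks_per_quarter * 1_000_000)
    return seconds


def retime_to_milliseconds(ticks: List[int], tempo_changes: TempoMap,
                           ticks_per_quarter: int) -> List[int]:
    """Re-express tick positions on a fixed grid of one tick per millisecond."""
    return [round(ticks_to_seconds(t, tempo_changes, ticks_per_quarter) * 1000)
            for t in ticks]


# Made-up example: the tempo doubles at tick 480.
tempo_map = [(0, 500_000), (480, 250_000)]
print(retime_to_milliseconds([0, 480, 960], tempo_map, ticks_per_quarter=480))
```

After such a conversion the tempo messages are no longer needed to recover timing, since every tick represents the same duration.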