June 14, 2017

Data Is Plural — 2017.06.14 edition

Supreme Court transcripts, corporate prosecutions, business owners, antibiotic resistance, and Los Angeles pot dispensaries.

 

Supreme Court transcripts. Oyez.org bills itself as, among other things, “a complete and authoritative source for all of the [Supreme] Court’s audio since the installation of a recording system in October 1955.” The site has an API and releases all its material, including timestamped transcripts of oral arguments, under a Creative Commons license. At least two GitHub repositories have aggregated the transcripts and make them easy to bulk-download. For each segment of audio, the transcripts list the start/end time, the speaker, and the text. Related: PuppyJusticeAutomated, a YouTube channel that (a) must be seen to be understood and (b) uses the Oyez API. Previously: CourtListener (DIP 2016.04.13) and The Supreme Court Database (DIP 2016.02.23). [h/t Walker Boyle + Reddit user 21cannons]
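Because each transcript segment carries a start time, an end time, a speaker, and the text, one simple thing you could do with the data is total up how long each speaker talks. Here is a rough Python sketch of that idea. The field names (`start`, `end`, `speaker`, `text`) and the sample rows are made up for illustration; the actual Oyez and GitHub files may be structured differently.

```python
from collections import defaultdict

# Hypothetical segment records. The keys below are assumed names,
# not the real Oyez schema, and the values are invented sample data.
segments = [
    {"start": 0.0, "end": 12.5, "speaker": "Chief Justice", "text": "Opening remarks."},
    {"start": 12.5, "end": 40.0, "speaker": "Advocate", "text": "Opening argument."},
    {"start": 40.0, "end": 45.5, "speaker": "Chief Justice", "text": "A question."},
]

def speaking_time(segments):
    """Sum the duration (end minus start, in seconds) of each speaker's segments."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)

totals = speaking_time(segments)
for speaker, seconds in totals.items():
    print(f"{speaker}: {seconds:.1f} seconds")
```

The same loop could just as easily count words per speaker by swapping the duration for `len(seg["text"].split())`.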

 

Federal corporate prosecutions. Last week, the University of Virginia School of Law launched an expanded version of its Corporate Prosecution Registry. The revamped database includes “detailed information about every federal organizational prosecution since 2001, as well as deferred and non-prosecution agreements with organizations since 1990” — more than 3,000 cases so far. Previously: Good Jobs First’s Violation Tracker (DIP 2015.11.11). [h/t Tom Jackman]

 

Business owners. The Census Bureau’s Survey of Business Owners and Self-Employed Persons “provides the only comprehensive, regularly collected source of information on selected economic and demographic characteristics for businesses and business owners by gender, ethnicity, race, and veteran status.” The most recent data comes from 2012. The survey has been conducted every five years since 1972, but data from before 1992 is “available only in printed form.” Related: “30% Of The Black-Owned Businesses In New York Disappeared In 5 Years,” by my colleague Cora Lewis.

 

Antibiotic resistance. ResistoMap is an interactive visualization of antibiotic drug resistance, based on more than 1,500 bacteria genome samples from people’s intestinal tracts. The data behind the visualization is available to download. It’s partly based on two prior datasets: McMaster University’s Comprehensive Antibiotic Resistance Database (“a bioinformatic database of resistance genes, their products and associated phenotypes”) and the University of Gothenburg’s BacMet (“an easy-to-use bioinformatics resource of antibacterial biocide- and metal-resistance genes”). [h/t Carlos Somohano]

 

L.A. pot dispensaries. The Los Angeles City Controller has released a map of the city’s openly-operating medical marijuana businesses. You can access a spreadsheet of the 191 dispensaries that comply with Proposition D, which the city passed in 2013. Additionally, you can find hundreds of (active and inactive) dispensaries by filtering the city’s business registrations to those whose primary NAICS category is listed as “medical marijuana collective.” [h/t Zack Quaintance]
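If you download the business registrations as a CSV, the NAICS filter described above might look something like the following Python sketch. The column names (`business_name`, `primary_naics_description`, `status`) and the sample rows are assumptions for illustration, not the city's actual headers or data.

```python
import csv
import io

# Invented sample rows. The column names are assumed and may not match
# the headers in the city's real business-registration export.
SAMPLE_CSV = """business_name,primary_naics_description,status
Example Collective A,Medical marijuana collective,active
Example Bakery,Retail bakeries,active
Example Collective B,Medical marijuana collective,inactive
"""

def filter_by_naics(csv_text, category):
    """Return rows whose primary NAICS description matches `category` (case-insensitive)."""
    wanted = category.strip().lower()
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row["primary_naics_description"].strip().lower() == wanted
    ]

dispensaries = filter_by_naics(SAMPLE_CSV, "medical marijuana collective")
names = [row["business_name"] for row in dispensaries]
print(names)
```

Matching on the lowercased description keeps the filter from missing rows that differ only in capitalization; the `status` column is left in place so active and inactive businesses can be separated afterward.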

 

Clarification: Last week’s serious workplace injuries dataset reflects “federal OSHA states only.” It excludes “injuries in state plans,” which cover private sector employees in 21 states.

 

Dataset suggestions? Criticism? Praise? Send far-out feedback to jsvine@gmail.com, or just reply to this email. Looking for past datasets? This spreadsheet contains them all.

 

View this edition in your browser.