Wednesday, July 31, 2013

Hackathons, RStudio, and Batman Villain Word Clouds

A couple weeks ago (months ago? What year is it? Summer is weird) I attended a Humanities-focused Hackathon at the Wisconsin Institute for Discovery [my friend Andrew and I are pictured, even! We're the poster children of humanities hackers!]. It was super interesting and a lot of fun, and I made some neat word clouds that I figured I'd share with y'all.



Process: We used RStudio to turn the information we brought, or information pulled from the internet, into more concrete visualizations. This'll become clearer as I describe what I did, so I'll just jump in.

One of the skills we learned was how to aggregate information off of Wikipedia, especially the kind that's in chart form, such as episode listings. Because I am me, I decided to draw information from the List of Batman: The Animated Series episodes and the List of The Batman episodes. Since both articles list the villains that appear in each episode, and because the villains are, like, the best part of Batman, I decided to analyse this data. I didn't have any specific thesis in mind; I was just playing around with the data and seeing what I came up with!
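For anyone curious what "aggregating information that's in chart form" can look like in code, here's a minimal sketch. To be clear, this is Python rather than the R we used at the hackathon, and it is not the code from the workshop. The HTML snippet, the episode titles, and the "Villain(s)" header are all placeholders I made up so the example runs on its own; a real Wikipedia table would need more cleanup.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for an episode table; titles and villains are placeholders.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Villain(s)</th></tr>
  <tr><td>Episode A</td><td>Joker</td></tr>
  <tr><td>Episode B</td><td>Penguin, Joker</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, one list per row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def column(html, header):
    """Return the body cells of the column whose header cell matches `header`."""
    parser = TableParser()
    parser.feed(html)
    head, *body = parser.rows
    idx = head.index(header)
    return [row[idx] for row in body if len(row) > idx]


villain_cells = column(SAMPLE_HTML, "Villain(s)")
print(villain_cells)
```

The result is one string per episode, which is the raw material for the counting that a word cloud is built on.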

So I compiled the data, cleaned it up a little, and created four word clouds. A quick word on reading word clouds: The larger the word, the more often it appears on a list. Words of the same color appear roughly the same number of times. The order/shape of the cloud is randomly generated and has no meaning. Names are often broken up (Harley doesn't appear right next to Quinn because they are two separate words, and I haven't learned how to connect them yet...). The first two clouds were made with the same parameters ("show x many names", etc.).
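The "Harley doesn't appear next to Quinn" problem comes down to how the text gets split before counting. Here's a small Python sketch (again, not the R from the hackathon) showing the difference between counting individual words and counting whole names. The episode list is toy data I made up, and it assumes names in a cell are separated by commas, which may not hold for the real table.

```python
from collections import Counter

# Toy villain column: one string per episode (made up, not the Wikipedia data).
episodes = ["Joker, Harley Quinn", "Penguin", "Harley Quinn", "Joker"]


def word_counts(cells):
    # Splitting on whitespace breaks "Harley Quinn" into two separate words.
    return Counter(w for cell in cells for w in cell.replace(",", " ").split())


def name_counts(cells):
    # Splitting on commas keeps each multi-word name together.
    return Counter(n.strip() for cell in cells for n in cell.split(",") if n.strip())


by_word = word_counts(episodes)
by_name = name_counts(episodes)
print(by_word.most_common())
print(by_name.most_common())
```

In the word-level count, "Harley" and "Quinn" are separate entries with their own sizes; in the name-level count, "Harley Quinn" is a single entry.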

1) Frequency of Appearance in Batman: The Animated Series episodes
So this is a visual representation of the names that appeared most often in the Villains column of the B:TAS Wikipedia article.

No surprises: The Joker is the most common villain! I thought it was weird that the Penguin was the next most popular, but it makes sense in an industry context. B:TAS was made due to the popularity of the Tim Burton Batman movies, the second of which, of course, starred the Penguin.

Surprises: Rupert Thorne? Really? If I had to make a guess, I'd say he appears a lot because he's a mob boss who funds a lot of other villains, and he shows up often when the Gotham City PD are around, because they can take care of him.

You might notice that Catwoman isn't even ON this word cloud, because she appears in so few episodes that she didn't make the cut. I was totally surprised by this, because like the Penguin, she was part of the second Tim Burton flick, so why doesn't she appear more prominently? ALSO, my childhood recollection of the show is that she shows up all the time, although this might be a bias on my part. I mean, I loved Catwoman, so if I thought Catwoman would show up in an episode, I definitely stuck around. Whereas if I thought it was a Penguin episode, meh, maybe I'll go outside or something.

HOWEVER. This cloud is misleading; Catwoman DOES appear in episodes where she isn't LISTED on Wikipedia as a villain, such as "The Cat and the Claw, Part 2" and "Cat Scratch Fever." I wanted to point that out as being an issue with the source data. In future projects, I'd prefer to aggregate my own information, so I can control for that sort of thing.

2) Frequency of Appearance in The Batman episodes
OH HAI JOKER. Back again, eh? Well, that's not a surprise! Joker is Batman's kryptonite. Or his... Lex Luthor? And look who else is back! It's The Penguin! This is pretty insane to me: is the Penguin really that important of a Batman villain? Or is he just considered kid-friendly enough for the cartoons?

Punch and Judy appear larger too; they are the Joker's mute henchpeople. They don't HAVE voice actors because they never speak on screen, but they appear often because the Joker appears often. Maybe I should exclude them from the list, since they don't actually contribute much? If I had a thesis, maybe they'd get excluded.
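Excluding names like this is basically a filter applied before counting. A rough Python sketch of the idea, with a made-up word list and a hypothetical `filtered_counts` helper (this is not how the R tools we used do it, just the general shape):

```python
from collections import Counter


def filtered_counts(words, exclude):
    """Count words, skipping anything in `exclude` (case-insensitive)."""
    blocked = {w.lower() for w in exclude}
    return Counter(w for w in words if w.lower() not in blocked)


# Toy word list (made up), including a filler word and the two henchpeople.
words = ["The", "Joker", "Punch", "Judy", "The", "Penguin", "Joker", "Punch", "Judy"]
counts = filtered_counts(words, exclude={"the", "Punch", "Judy"})
print(counts.most_common())
```

The same kind of exclusion list handles filler words like "the" as well as names I'd decide not to count.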

Hugo Strange looms large, that's weird. Catwoman warrants an appearance, but I'm not familiar enough with The Batman to make a judgement about that.

An interesting note: so TB was on air at the same time as the first two Nolan-Batman films, and yet there doesn't seem to be the same kind of cross-pollination of villains between the two. The first film focused on some plain-clothes villain, with Ra's al Ghul appearing, and the second film focused on the Joker and Two-Face. However, neither Ra's al Ghul NOR Two-Face shows up on the cloud. Bizarre!

Future Dream Project: create a thorough listing of ALL the Batman villains appearing across media (comic books, movies, cartoons, video games) based on the time period they were created, and compare the rise and fall of each.

A note on the two word clouds: B:TAS had 109 episodes. The Batman had 65 episodes. Yet if you look at the word clouds, B:TAS has a small number of very prominent villains, while The Batman has a large number of villains that appear multiple times. Given the number of episodes they had, I would assume it'd be the opposite: B:TAS would have a wider variety of villains and TB would be more limited. What I think is happening is that TB re-used its villains more often, while B:TAS had a stock stable of very popular villains, supplemented with a lot of one- or two-shot villains, which diluted the numbers.

3) Compared Frequency of Appearance in both B:TAS and TB combined
As stated, this word cloud concatenates the villain frequency from both shows into one list. As expected, The Joker and Penguin are the most popular! Oh, and Poison Ivy's looking pretty good! She wasn't the most popular on either list, but with the two combined, she makes a pretty decent showing. Clayface, Killer Croc, the Riddler, and Harley Quinn show up a decent number of times.

And then there are a couple weird names on the list: Maxie Zeus appears, like, three times total between the two. BUT he does appear in both shows, which is why he gets listed. Same for Tony Zucco, I believe. It doesn't really mean they're popular, just that they happen to appear in both shows, so they get an honorable mention.
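Here's one way the "combined" logic could work, sketched in Python. I'm assuming the cloud keeps only names that show up in both shows and sizes them by their total count; I don't know exactly what the R function does under the hood, so treat this as an illustration of the idea, with made-up numbers and a hypothetical `combined_counts` helper.

```python
from collections import Counter

# Made-up counts for illustration only; not the real Wikipedia tallies.
btas = Counter({"Joker": 5, "Penguin": 3, "Rupert Thorne": 3, "Maxie Zeus": 2})
tb = Counter({"Joker": 4, "Penguin": 4, "Punch": 3, "Maxie Zeus": 1})


def combined_counts(a, b):
    """Keep only names present in both counters and add their counts."""
    shared = a.keys() & b.keys()
    return Counter({name: a[name] + b[name] for name in shared})


combined = combined_counts(btas, tb)
print(combined.most_common())
```

Under this assumption, a name with a small total still makes the list as long as it appears in both shows, while a frequent name that belongs to only one show drops out.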

4) Contrasted Frequency of Appearance in both B:TAS and TB combined



This word cloud is a little weirder. What is gathered here is a contrast of appearances. The larger the font of the names, the more often they appeared in one show but not the other. FOR EXAMPLE, Punch and Judy are HUGE on this cloud because they appear frequently in TB, but NOT AT ALL in B:TAS. Rupert Thorne showed up a lot in B:TAS, but only once in TB (I think he's killed off? By gravity or something, not by Batman).
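A simple way to think about the contrast is "difference in counts between the two shows." The Python sketch below uses that simplification; the actual R function may weight things differently, and the numbers and the `contrast_counts` helper are made up for illustration.

```python
from collections import Counter

# Made-up counts for illustration only; not the real Wikipedia tallies.
btas = Counter({"Joker": 5, "Rupert Thorne": 4, "Harley Quinn": 3})
tb = Counter({"Joker": 5, "Rupert Thorne": 1, "Punch": 3, "Judy": 3})


def contrast_counts(a, b, a_label="B:TAS", b_label="TB"):
    """Map each name to (size of the gap, which show it leans toward)."""
    result = {}
    for name in a.keys() | b.keys():
        diff = a[name] - b[name]  # a Counter returns 0 for missing names
        if diff != 0:
            result[name] = (abs(diff), a_label if diff > 0 else b_label)
    return result


contrast = contrast_counts(btas, tb)
for name, (gap, show) in sorted(contrast.items()):
    print(name, gap, show)
```

A villain who appears equally often in both shows has no gap and disappears from this view, which is why a big name on the combined cloud can be tiny or missing here.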

Lex Luthor and Mercy Graves both appear a couple times in TB as a donation from Superman's rogues gallery. Harley Quinn appears far more often in B:TAS, which kind of makes sense, since she was created for it. Two-Face appears MUCH more often in B:TAS, which points back to what I said re: TB/Nolan-Batman crossovers.

Really, there's not much more to say about this chart. It's pretty straightforward.


SO, that's a sampling of some of the super cool things I can now do with R. And this is just, like, the most basic stuff that they taught us. It took me about 10 minutes to make all of these word clouds (not counting "how do I get rid of 'the'"-based troubleshooting). It's important to note, of course, that though this is an amazing tool for creating more scientific, numbers-based analysis of cultural objects, it still requires a lot of knowledge of the object itself, and a lot of brain-powered analysis to make sense of it.

BUT it's cool because I got to make sweet graphs!
