Ross talks about using Google's AI suite of natural language processing and machine learning APIs to read every article a journalist has ever created and to write content they care about, all using a robot to help us run programmatic PR campaigns at scale.
[Announcer] A big applause please for Ross!
So, my name is Ross Tavendale. I'm the managing director of Type A Media and we're known in the industry for doing four-day work weeks. Because we do four-day work weeks, we need to find a lot of efficiencies and start documenting all of those processes.
So let's get into how we do that for our week. So, quick caveat about today's talk. I saw this great tweet by a guy who said, "If it's in Python, it's machine learning; if it's in a deck, it's probably AI." Obviously he's being facetious, but he's saying that it's not actually AI, it's machine learning and natural language processing.
And reading from the top, it would be, "Using natural language processing and machine learning to extract entities, classify content and craft better pitches to programmatically reach out to journalists at scale."
However, that's a bit of a mouthful, so we just say we use AI to make sense of journalists. Typically we've been doing quite well over the last few years. I've interviewed about 42 PR people and hired about 10 of them. And it's brought me to a place where I've got a lot of opinions about the PR industry as a whole.
The biggest one is I think PR people are kinda full of shit to a degree. Any PR people in the room? Fuck, okay, sorry. You're good shit, not bad shit. That's the PR response to that one. And I'll tell you why: I never hear this in an interview. No one will ever come up to me and say, "I can get you results regardless of the vertical I'm working in." Never happens. But what they do say is that they know journalists and they're great at relationships. That's all they say, every single one of them. But we are link builders, not really PR.
Does anyone say they're a link builder, and not PR?
One, thank you for that. We're a dying breed and it's kinda sad. But listen, from a PR point of view, right, so we take the fifty contacts you've got amazing relationships with, and let's say we pitch every single one of them, and all of them come back and say, "We love you, you're amazing, have the link." And this is the BBC, The Journal and The Guardian. What do we do now? We've just burned your entire list and all your contacts.
Month two, can't get another campaign started so it's kinda useless. I really don't think that traditional PR is actually set up for link building, I don't really think digital PR is set up for it either.
Here's why. So Gorkana is a media database, so it's essentially giving pitch preferences for journalists, but it also gives you the updates on the jobs that they are moving to week to week. And in a year there are about five thousand new ones just in the UK alone.
And what we're seeing is more and more people are going freelance and they're working multiple beats. And when we say multiple beats we mean, perhaps you were assigned as a fashion journalist, but now people freelance more to broaden their scope. You're going into lifestyle, and you speak about fashion, travel, food, etc. Also, I was looking at a couple of journalists at the Metro, and anyone here work for the Metro? Good! They're more topical copywriters than they are journalists at the Metro.
So, if you were looking at this woman's page, it was like, "Oh, what's happening in Corrie this week?" And it's top ten ways to do home beauty, and then it's the political situation in the Gaza Strip. And I'm like, "That's quite a broad amount of copy to do." But not really journalism.
One of them actually came back to us. So we give them some data, and we give them a fresh lead, so they're all ready to go. And I sent him a follow-up, and he said, "I'm not here to do your job for you, write me that article." This is an editor at a mainstream magazine. And I'm like, "PR doesn't write the news, you write the news." I thought that was really weird.
But we hire journalists and turn them into PRs, and when they come to talk to us they say like, "Yeah man, I used to do like six to ten articles on a daily basis. And I'd get pitched about one hundred times every single week." These aren't editors by the way, these are just general writers for these places.
So, the biggest complaint that they had was, "All these pitches that we're getting, none of them fit our beat." And their beat is the topic that they actually write about.
But with all these changes it's actually pretty much impossible to get a beat for freelancers. So there are tools like Gorkana that try to bridge the gap. But this is what Gorkana looks like. It's a time warp to 1990. It's absolutely disgusting. So we've got here 25,000 journalists we'd consider in the UK alone. And we need to deep dive into that to find the perfect one to hit the right beat. Even if we drill into that and look at fashion, there are still 4,000 people working in consumer fashion in the UK. And after we drill in, we've got fashion: is it male fashion, female fashion? Female fashion, okay, is it high street, is it runway, what is it? And then we split that into different age groups. To actually find the right person is absolutely difficult. So we've got some big problems!
Relationship depth, don't have it. We can't really do outreach at scale, and we don't have good pitch accuracy. Welcome to... no, never mind. So we're looking around in a room with everyone in the company, and we're kinda racking our brains. We've got a data scientist, we've got a developer, we've got our PRs, and we're like, "There must be an easy solution to matching this sort of stuff."
So Jack, our data scientist, pipes up, and he's like, "I've got it, it's actually really simple. All you need to do is compare our press release to every article that's ever been created in the history of the universe." And I'm like, "Ludicrous!" Just like, what a fucking ridiculous thing to say to someone, but he's like, "Oh no, actually, Google can already do it."
There's something called RADAR, which is Reporters And Data And Robots; data reporters, you'd say. And what they're doing is using natural language processing and machine learning, taking data sets, and making news for local press, all with robots. They're using something called Cloud Natural Language in order to do that. And that's something I'm going to be talking to you about today. The cool thing with Cloud Natural Language is it extracts entities for you.
And now, if you're not super technical with APIs, imagine you've got a press release: it's just a bunch of topics and things we could actually pull out and categorize. It also gives a salience score, which is essentially a ranking. So if I'm releasing pitches about BrightonSEO, it would be Brighton, ranking. SEO, ranking. It would be Kelvin, ranking. And so on and so forth. So let's have a little example here.
A gentleman with a beard. I can pick one, any beards, the guy in the fourth row. Dude, you're looking like my person. What's your name, with the beard? Name? Come on dude, I'm just gonna call you Pete the Pirate. I'm calling you Pete the Pirate. So, Pete the Pirate, at the back there, he would write about treasure, he'd write about wenches, he would write about beards, and he would write about rum. Good solid pirate-based stuff. And he'd write a bunch of these different articles on these topics, and then an editor would come and say, "Well actually, beards are really hot in the pirate community right now, write more about beards, lots about beards."
Any women with a hat on, no? What's your name?
[Jasmine] Jasmine.
Jasmine? Jasmine the hat lady, she writes about hats, glasses, jobs, and pay. And her editor comes to her and says, "Actually, the topic about jobs is really hot right now, you need to write more about jobs, jobs, jobs." So like, her entity starts changing toward jobs.
Any men over fifty? No, okay, we'll call him Bald Bill. So Bald Bill writes about business, he writes about millennials and complains about them. Also jobs and that's meant to say ED education or ED something. But he's over fifty so who knows? But then his editor comes to him and says, "Write more about business, business, business." So his entity becomes business.
And so on and so forth for all these people. What we can then do is we can start to stack these journalists up based on the major entities they talk about across every single article that they're writing about.
So I can get an aggregated weighted score of the major entities they talk about. Which is kinda cool. So if I was to write a press release, let's say for example it was about this event: "Study Reveals Drinking at Work Events Damages Your Job Prospects." I imagine a lot of you probably had a couple of pints at lunch, and you're about to go to the after party as well. This is a relatively apt thing to write about. If we run that through natural language processing, I'm going to get a bunch of different entities coming up. I'm gonna get alcohol, I'm gonna get rum, whiskey, eventually vomit, it's gonna go all the way through different things like line manager, we're gonna be looking at working at home, and then potentially getting married.
So, if I was to compare that press release to all these journalists, who is the best fit? Is it Pete the Pirate, who talks about rum and whiskey? Well, technically it is, right? Because alcohol is quite big in that press release. But actually, the nuance of that press release is about work and alcohol. So although the old way of categorizing and finding journalists, which is topically, would make that correct, that's a terrible pitch.
Because that's not Pete the Pirate's beat. Pete the Pirate talks about being a pirate! He just happens to mention rum and whiskey. The best person is actually the person at the end, which I'll call Becky O'vay, or this one. The reason being she talks about whiskey, and rum, and politics at work. So even though she's not directly writing about drinking in the workplace, when we take all of her different entities, with all of her different rank scores, she's the one who comes out on top.
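As a rough sketch of that comparison (not the code Type A Media ran), one way to score a press release against a journalist is to multiply the weights of the entities they share and add them up. The names, weights, and the `pitch_score` function below are all invented for illustration.

```python
def pitch_score(release, journalist):
    """Sum, over the entities both share, of release weight times journalist weight."""
    return sum(
        weight * journalist[name]
        for name, weight in release.items()
        if name in journalist
    )

# Hypothetical entity weights, made up for this example only.
release = {"alcohol": 0.30, "work": 0.25, "rum": 0.15, "whiskey": 0.15, "line manager": 0.15}

journalists = {
    "Pete the Pirate": {"pirates": 0.40, "treasure": 0.20, "beards": 0.20, "rum": 0.10, "whiskey": 0.10},
    "Becky": {"work": 0.35, "politics": 0.25, "whiskey": 0.20, "rum": 0.20},
}

scores = {name: pitch_score(release, profile) for name, profile in journalists.items()}
best = max(scores, key=scores.get)
print(best)
```

Pete only overlaps on rum and whiskey, and those are minor entities for him, so his score stays low. The journalist who also carries weight on "work" comes out ahead, which is the nuance a single topic label misses.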
Okay, that's fine. So we've run this through Google Sheets. We took a journalist from the Guardian, and we took their top fifty articles and ran them through entity analysis, and then took the aggregated score. I got kind of excited because you've got football and score, and I'm like, "Probably a football journalist." And then I'm a little bit annoyed because the next ones are like Midlands and stuff like that, and I'm thinking the technology is probably just not there yet.
But then we realized the journalist is actually a football writer who covers the Midlands, and I got really fucking excited. I'm like, "Oh my God, this is actually a perfect viewpoint of what this person is writing about." So, in terms of actually getting entities for the journalists, that's easy.
The hard bit is to do it with every single article that's ever been written. So, we tried it anyway, because, you know, what else are you gonna do when you've got your Friday off, right? So, we tried to download ten years of the Guardian.
Now, I'm an ex-SEO, so I started with Screaming Frog and custom extraction. We tried to use Python to pull it out but it was proving to be very slow. But because the Guardian are absolutely lovely, they have an open API. You can download ten years of Guardian content completely for free, and it pipes out with beautiful structured data. So we literally downloaded ten years of the Guardian in about a week. So that's pretty cool, right?
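For anyone who wants to try the same thing, here is a minimal sketch of building a paged request to the Guardian's Open Platform content search endpoint. It only constructs the URL and makes no network call. The function name, the date range, and the `YOUR_KEY` placeholder are assumptions for illustration; you need to register for your own API key.

```python
from urllib.parse import urlencode

GUARDIAN_SEARCH = "https://content.guardianapis.com/search"

def guardian_search_url(api_key, from_date, to_date, page=1, page_size=50):
    """Build a search URL that asks for the byline and plain-text body of each article."""
    params = {
        "api-key": api_key,
        "from-date": from_date,      # YYYY-MM-DD
        "to-date": to_date,          # YYYY-MM-DD
        "page": page,                # results are paged; step through pages to get everything
        "page-size": page_size,
        "show-fields": "byline,bodyText",
    }
    return f"{GUARDIAN_SEARCH}?{urlencode(params)}"

# Placeholder key and dates, for illustration only.
url = guardian_search_url("YOUR_KEY", "2009-01-01", "2009-01-31", page=2)
print(url)
```

Stepping `page` upward over one date window at a time is one simple way to walk through a long archive without asking for everything in a single request.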
You can use news APIs to get all of this content out. So there must be more APIs for news. So I was looking at the Globe and Mail, and the Sun, and all that sort of stuff. Then I came across something called News API, and I might as well tell you what News API does. It's an API that one can use.
News API is an API which gets me 30,000 news publications globally. And this is what the output looks like. So ping it and pull down one of these articles, and it gets me the author, and it gets me the content. This is beautiful for me because what I can then do is I can get the author. Imagine a spreadsheet: cell one, Author; cell two, Content.
Now, I'm running natural language processing and I'm getting the first keyword and its ranking, the second keyword and its ranking, third, fourth, fifth, and I can do that 50 times for one article. Next article, all the keywords and their rankings. I can do that for every single article a journalist has ever done, and it's literally a push of a button. And then I can take an aggregated view of that and see exactly what that guy is writing about.
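The aggregated view can be sketched in a few lines. This assumes each article has already been run through entity analysis and reduced to a dictionary of entity name to salience score (Cloud Natural Language returns a name and a salience for each entity). The averaging rule, the function names, and the sample numbers are assumptions for illustration, not the talk's actual method.

```python
from collections import defaultdict

def aggregate_profile(articles):
    """Average each entity's salience over all of a journalist's articles.

    An entity missing from an article counts as zero for that article,
    so entities that recur across many pieces rise to the top.
    """
    if not articles:
        return {}
    totals = defaultdict(float)
    for entities in articles:
        for name, salience in entities.items():
            totals[name.lower()] += salience
    return {name: total / len(articles) for name, total in totals.items()}

def top_entities(profile, n=5):
    """Return the n highest-weighted entity names, strongest first."""
    return sorted(profile, key=profile.get, reverse=True)[:n]

# Made-up per-article entity scores for one journalist.
articles = [
    {"football": 0.5, "Midlands": 0.2},
    {"football": 0.3, "score": 0.1},
]
profile = aggregate_profile(articles)
print(top_entities(profile, n=3))
```

With one row per article in a spreadsheet (author in one cell, content in the next), this is the same roll-up done per author.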
So Jack was like, "We could probably download all this stuff." And I was like, "Cool, just do it." He's like, "Probably cost you about a quarter of a million pounds to do it." And I'm like, "We're doing well but we're not doing that fucking well, so probably not. Any way we can do it for kinda half of this?"
And he's like, "Ah, if we take the top 15 articles from every journalist, which is essentially just over a quarter million journalists, we could probably do it for about 15 grand."
I'm like, "Do that then." So, he did it, and we've now got a database of just over a quarter million journalists, with all of their entities and all of their rankings.
And I found out something called Event Registry does the same thing and it costs about 3 dollars a month. That hurts. I looked into it. They don't allow you to get any entities out of it, it's not really useful for journalists as such, but the raw data is kinda in there in some form or another and it's really cheap. So if you're in PR or you're into any media analysis, I'd highly recommend having a look at Event Registry.
All right so, new problem. We've got all the journalists, and all the entities, but we still need something so you can match a press release to all these guys. So, how are we gonna do it? Jack pipes up again and he's like, "Football Manager." And I'm like, "You're definitely going nuts. Football Manager, what are we talking about now?" And he's like, "Well, think about it! You have these radar diagrams like this, right? And it's like their speed, their arches, their performance, and stuff like that. And another person over here. It's essentially just the entities with the ranking. So entity one, three, four, five, we can put the press release on top of all of the journalists, we can kinda match it like that." I'm like, "Okay, that's pretty cool. How's that gonna look?"
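Laying one radar diagram on top of another is close to what cosine similarity measures: how much two weighted entity profiles point in the same direction. The sketch below uses that as a stand-in for the overlay idea. It is a swapped-in technique for illustration, not something the talk confirms was used, and the profiles are made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse entity-weight vectors."""
    dot = sum(a[name] * b[name] for name in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile matches nothing
    return dot / (norm_a * norm_b)

def rank_journalists(release, profiles):
    """Order journalist names from best to worst overlap with the release."""
    return sorted(
        profiles,
        key=lambda name: cosine_similarity(release, profiles[name]),
        reverse=True,
    )

# Hypothetical profiles, for illustration only.
release = {"alcohol": 0.4, "work": 0.4, "rum": 0.2}
profiles = {
    "pirate": {"treasure": 0.6, "rum": 0.4},
    "workplace": {"work": 0.5, "alcohol": 0.3, "politics": 0.2},
}
ranking = rank_journalists(release, profiles)
print(ranking)
```

Because the score is normalised by the size of each profile, a journalist with hundreds of articles does not win simply by having larger totals.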
We can use something called Elasticsearch. Now Elasticsearch is just normal faceted search, but you can put a weighted ranking on an entity, and in layman's terms that means you can take a keyword and give it a ranking. And we have a database filled with keywords, with rankings, connected to journalists. So we can put that together really, really easily.
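One plausible shape for that kind of query, sketched as a plain Python dictionary: a `bool` query whose `should` clauses are `term` queries, each boosted by the entity's weight in the press release. The field name `entities`, the function name, and the weights are assumptions; the talk does not show the real index mapping, and this sketch only weights the press release side, not the journalist-side rankings.

```python
def build_entity_query(release_entities, field="entities", size=10):
    """Build an Elasticsearch query body that favours journalists matching heavily weighted entities."""
    should = [
        {"term": {field: {"value": name, "boost": weight}}}
        for name, weight in release_entities.items()
    ]
    return {
        "size": size,
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
    }

# Hypothetical press release entities and weights.
query = build_entity_query({"alcohol": 0.4, "work": 0.4, "rum": 0.2})
print(query)
```

The resulting dictionary would be sent as the body of a search request against whatever index holds the journalist documents.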
I then asked him, "All right man, can I get a front end on this thing, so that my PR guys can actually use it instead of like coming to pipe into this thing all the time?" He's like, "Yeah, sure, leave it to me."
Comes back a week later and he's got a picture of a cat on it and he called it Needle. And I'm like, "This guy is definitely high or something. What's the, what's the cat with the cape about? The little thing?" And he's like, "Oh well, it's called Needle because finding a journalist," and I swear to God he said this to me, "is like finding a needle in a haystack." And I was like, "Beautiful, beautiful, amazing."
So this is not a public tool, we're not selling it or anything like that, but we do need help to train the machine learning side of it. So if you want access to our database for free to run your own campaigns, all you need to do is answer a bunch of questions for me, just so we can make it a little bit better.
Just email me, [email protected], and I'll get you access to it so you can run your own campaigns. I'll also send you all the Google Sheets as well. So, hopefully that will help you kinda get into a place where you can save a bunch of time and better match the journalists you're pitching, and one day you can also get to a place where you can work a four-day week.
Thank you very much.