Building open source tools in Pharma | Kyle Austin & Martin Brown | Data Science Hangout

Transcript

This transcript was generated automatically and may contain errors.

Hey there, welcome to the Posit Data Science Hangout. I'm Libby Heeren, and this is a recording of our weekly community call that happens every Thursday at 12 p.m. U.S. Eastern Time. If you are not joining us live, you miss out on the amazing chat that's going on. So find the link in the description where you can add our call to your calendar and come hang out with the most supportive, friendly, and funny data community you'll ever experience.

I'm so excited to move on then and introduce our featured leaders today at the Hangout. We are talking to Kyle Austin, Principal Programmer Analyst at Thermo Fisher Scientific, and Martin Brown, Statistical Science Director at PPD. They both work at PPD, which is a part of Thermo Fisher Scientific, and we will get into what that means. PPD is the CRO, or Contract Research Organization, arm of Thermo Fisher Scientific.

So I think, yeah, if you're looking at it really narrowly and you're looking at just sort of specific static tabular outputs, for example, then maybe there's not so much difference, but that's not where we're going to be. It's not where we are now, where, yeah, the sorts of ways of digesting data are getting bigger and bigger. So we need the tools to use in different situations to be able to pivot, to be able to understand and take things.

Upskilling and transitioning into pharma

To get into statistical programming, you don't necessarily have to have a background in pharma. You don't. There are definitely differences. I obviously have always lived in this world and known this world. From the programming side of it, it's not that bad, I don't think. I started as an intern here and I was brought up in the company. I was taught the company ways. I learned CROs from a CRO.

Yeah, there should be no concern about moving across from somewhere else. Because as we've been saying, like the skills, yeah, are very much the same, really. That being a good statistician, particularly, it shouldn't really matter what your data is. And being able to look at your data, get into it, understand the context, treat it properly, do all the normal stuff in terms of getting to know it initially first, and then sort of moving into sort of planning your analysis and all those sorts of things.

I suppose in terms of upskilling, I suppose the key thing is just making sure you sort of understand more about the pharma industry, understand more about clinical trials, and the phases and how it all works. I think that's far easier now than it was when I first interviewed and they gave me a little pamphlet sort of telling me about things. Yeah, it's so easy to sort of go out there and see all about clinical trials.

A lot of these things you can do completely on your own, which is amazing. You can yeah, download sort of RStudio and simulate some data and sort of get going. So yeah, use all these things that are there for you. And I think if you go into an interview and say, I've been doing this sort of on my own time and people are going to be impressed.
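As a rough illustration of the "simulate some data and get going" suggestion, here is a minimal sketch. The speaker mentions RStudio; Python is used here only for illustration, and the column names, arm labels, effect size, and sample size are all assumptions rather than anything from the call.

```python
import random
import statistics


def simulate_trial(n_per_arm=50, effect=2.0, sd=5.0, seed=42):
    """Simulate a toy two-arm trial: one dict per subject (hypothetical layout)."""
    rng = random.Random(seed)  # fixed seed so the toy data is reproducible
    rows = []
    for arm, shift in (("placebo", 0.0), ("active", effect)):
        for i in range(n_per_arm):
            rows.append({
                "subject_id": f"{arm[:1].upper()}{i + 1:03d}",
                "arm": arm,
                "age": rng.randint(18, 80),
                # change from baseline: arm-level shift plus normal noise
                "change": rng.gauss(shift, sd),
            })
    return rows


def mean_change_by_arm(rows):
    """Return the mean change from baseline for each arm."""
    by_arm = {}
    for row in rows:
        by_arm.setdefault(row["arm"], []).append(row["change"])
    return {arm: statistics.mean(values) for arm, values in by_arm.items()}


trial = simulate_trial()
summary = mean_change_by_arm(trial)
for arm, mean in summary.items():
    n = sum(r["arm"] == arm for r in trial)
    print(f"{arm}: mean change = {mean:.2f} (n = {n})")
```

From here you could practice the usual first steps on data you fully understand: explore it, summarize it by arm, and plan an analysis.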

Data governance and client separation

From a governance standpoint, our system is locked down. We have a very, very, very strict policy. Obviously, there's blinded and unblinded, for people who don't know. Blinded data is you have no idea who's on what treatment. Unblinded data is we actually know who's on treatment. And that alone is, like, totally separate silos, storage areas locked down based on access.

If I'm a lead on a study, I'm only going to have access to this area and I can grant access to folks who have access to certain study areas. We have an independent system that we have, our BTI is what we call it, but it's where all of the data is stored locally. Where all the data comes in, it gets imported, exported, all of our coding is done in there, but we do it by client, by study. Every single area is locked down.

Unblinded access is handled in a completely different way. There's something called a URAF, which is a form that specifically states these 10 people are allowed to have access to this data. And only those 10 people will be allowed to have access to the data. So it's, from a governance standpoint, that's very easy. We do that. We just lock our data down.

Data quality across the clinical trial process

Yeah, so the amazing thing about data quality is that, yeah, it's a hugely great sort of area to really sort of improve all the way through the entire clinical trial process. So you can come at it right at the beginning and sort of think to really sort of improve data quality, then your design has got to be good, instructions, the clinical team carrying out things properly. Because as the data comes in, you can do lots of exploratory analysis to see are there issues here.

This is a sort of an amazing area in clinical monitoring where trying to do, trying to look at your data, trying to understand the quality, sort of from all the way from sort of detecting fraud, if fraud is occurring, all the way to just improving your processes as the trial sort of goes on.

When I started as an intern, I used to do DM reports, which is like the basic thing of, like, data comes in, we're writing out reports, which is surfacing things. I mean, it's anything from checking for, are there any UTF-8 characters in this data? Is that going to be problematic? I used to work on things called missing pages reports, where eCRFs are something that's created, where we have this major thing. You make these missing pages reports and it's going, hey, these people didn't finish out, didn't fill out these certain forms.
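The two checks mentioned here can be sketched roughly as follows. This is a toy illustration, not the in-house reports described on the call: the record layout, form names, and function names are hypothetical, and the character check assumes the concern is non-ASCII text that might trip up downstream tooling.

```python
def non_ascii_findings(records, text_fields):
    """List (subject_id, field, value) for text values with non-ASCII characters."""
    findings = []
    for rec in records:
        for field in text_fields:
            value = rec.get(field) or ""
            if not value.isascii():
                findings.append((rec["subject_id"], field, value))
    return findings


def missing_pages(expected_forms, submitted):
    """Map each subject to the sorted expected forms that were never submitted."""
    report = {}
    for subject, forms in submitted.items():
        missing = sorted(set(expected_forms) - set(forms))
        if missing:
            report[subject] = missing
    return report


# Hypothetical example data
records = [
    {"subject_id": "001", "ae_term": "Headache"},
    {"subject_id": "002", "ae_term": "Nausée"},
]
expected = ["demographics", "vitals", "adverse_events"]
submitted = {
    "001": ["demographics", "vitals", "adverse_events"],
    "002": ["demographics"],
}

print(non_ascii_findings(records, ["ae_term"]))
print(missing_pages(expected, submitted))
```

Subject 002 is flagged by both checks: the adverse event term contains an accented character, and two of the three expected forms are missing.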

I know one of the dashboards I work on, we surface SDTM and that data is looked at by our medical monitors. And then they go back and they'll review things like, okay, cool. We're seeing this, we're seeing patterns here. We're seeing an issue with this site. We have to go back and retrain. So there's definitely lots of tools that we've developed in house that we utilize across all phases.

Now on study startup, a lot more of the team is involved when they're building out those CRFs, which I think is fantastic. I know I've been personally on studies at startup where it typically wasn't where the programmers and the statisticians were part of that conversation. And now they are, which is crucial, I think, because it helps make sure that from the outset, you're setting these studies up with the knowledge base of everybody.

What is Teal?

Yeah, when I was sort of getting into R and trying to sort of learn it more, I'd sort of heard about R Shiny and how brilliant it would be in the space of creating all of our outputs. So a lot with our outputs, you sort of create one type of table, and then you often then create a million repeats for different reasons. Say you want a different subgroup with the same table, or you want to look at serious adverse events rather than adverse events, so you end up sort of recreating lots and lots of repeats.
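The "million repeats" problem can be sketched as one summary function whose filters are parameters, which is roughly what an interactive dashboard exposes as controls. This is a toy Python illustration, not Teal or Shiny code; the event layout and the `ae_table` function are hypothetical.

```python
def ae_table(events, serious_only=False, subgroup=None):
    """Count subjects with at least one adverse event, by arm.

    subgroup is an optional (field, value) pair, e.g. ("sex", "F").
    """
    subjects_by_arm = {}
    for ev in events:
        if serious_only and not ev["serious"]:
            continue
        if subgroup is not None and ev[subgroup[0]] != subgroup[1]:
            continue
        subjects_by_arm.setdefault(ev["arm"], set()).add(ev["subject_id"])
    return {arm: len(subjects) for arm, subjects in sorted(subjects_by_arm.items())}


# Hypothetical adverse event records: one dict per event
events = [
    {"subject_id": "001", "arm": "active", "sex": "F", "serious": False},
    {"subject_id": "001", "arm": "active", "sex": "F", "serious": True},
    {"subject_id": "002", "arm": "active", "sex": "M", "serious": False},
    {"subject_id": "003", "arm": "placebo", "sex": "F", "serious": False},
    {"subject_id": "004", "arm": "placebo", "sex": "M", "serious": True},
]

print(ae_table(events))                       # all adverse events
print(ae_table(events, serious_only=True))    # serious adverse events only
print(ae_table(events, subgroup=("sex", "F")))  # same table, one subgroup
```

Each static output that would otherwise be a separate program becomes one call with different arguments.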

With R Shiny dashboards, it was that real thought that we can go interactive with this, and then we don't need to sort of create so many sort of outputs and it becomes interactive. But the joy was that sort of Roche at the time had been developing the Teal package and ecosystem to do sort of Shiny programming and doing it from a pharma perspective. Being able to use this ecosystem to then create particular dashboards that, yeah, create our clinical outputs in different ways, in a modular system, being able to pull in data, being able to filter by it.

So a lot of the work then that I was sort of doing and trying to think how do I then create my app to do it for millions of different studies, Teal had sorted out sort of for me. So it started out from Roche and it's become a great example of a cross pharma, yeah, sort of package and sets of packages. So it sort of takes a lot of the work out from somebody wanting to get started.

So if you wanted to just create a very simple application from scratch, you can, yeah, quickly load clinical data and then you can sort of take some of the modules which are prebuilt and then quickly create sort of simple summaries, simple figures, but then it's also flexible enough that you can then use the ecosystem to create your own modules. So that's what Kyle and I are sort of doing right at the moment for sort of creating a particular dashboard out of the Teal ecosystem and looking to sort of share that with clients.

Career advice

R. If you want to get into this industry now, learn R. SAS will be helpful, but I'm just saying. And I know because when I was in school 14 years ago, R was for academics. So I ignored it. I focused on the SAS path because that was the career. Now, mind you, knowing both is great. Learn R, learn Python, learn SAS for the empathetic value of being able to work with other people who know SAS as well.

I think, yeah, get excited about data and hopefully everybody on this call is excited about data and that's why they're sort of here. It's sort of the best time to be either a statistician or a data scientist or a programmer. It's an amazing time to be alive in terms of what's available to us. So it's really just throwing yourself into that.

So it's really trying to think, what am I interested by? What part of that really excites me? And throw yourself into it. Ask questions, come to things like this. Just give yourself projects. So I think, yeah, some project that you're excited to do, if you had to do it on your own time or otherwise, yeah, just throw yourself into that and be sort of taken along with the ride and you'll meet amazing people and yeah, you'll have fun.
