Transcript
This transcript was generated automatically and may contain errors.
Hey there, welcome to the Posit Data Science Hangout. I'm Libby Heeren, and this is a recording of our weekly community call that happens every Thursday at 12 p.m. U.S. Eastern Time. If you aren't joining us live, you're missing out on the amazing chat that goes on. So find the link in the description where you can add our call to your calendar, and come hang out with the most supportive, friendly, and funny data community you'll ever experience.
I'm so excited to introduce our featured leaders today at the Hangout. We are talking to Kyle Austin, Principal Programmer Analyst at Thermo Fisher Scientific, and Martin Brown, Statistical Science Director at PPD. They both work at PPD, which is part of Thermo Fisher Scientific, and we will get into what that means. PPD is the CRO, or Contract Research Organization, arm of Thermo Fisher Scientific.
Introducing Kyle Austin
Kyle, I would love it if you could introduce yourself for us. Hi, I'm Kyle Austin. I'm a Principal Programmer Analyst. I've been in the clinical research game for about 15 years and 20 days; I just hit my 15-year anniversary at PPD. I started there as an intern, and I'm now a principal. My career has grown alongside the company, which has been fantastic. I have a bachelor's degree in mathematics and statistics. I was going to get one in computer science too, but then I got offered this wonderful job, and I've been in it ever since I got out of college.
I started off doing clinical programming, and over the last five years I've transitioned into more of a tool development role. Originally I worked on clinical trial data and analysis. I did every bit of it I possibly could, from data reports at the beginning, to developing SDTM specs and SDTM datasets, to working on ADaM and on tables, listings, and figures, all that good stuff. Now I'm in a role where I'm doing more tool development, working on the backend of the tools we build for our users, in-house as well as externally.
As for fun, I am a gigantic nerd across the board. I have almost 2,000 comics. I collect those, and I have a pull box. I read everything, books and comic books. I play video games. I used to play a lot of D&D; I don't as much anymore, but I do have a board game closet behind me. Yeah, I'm just an all-around nerd.
Introducing Martin Brown
My name is Martin Brown. I'm a Statistical Science Director, based in the UK. Very similar to Kyle, I started here as a graduate, about 18 years ago, and they haven't got rid of me yet, which is great. I'm a statistician. I did maths and statistics at university and was looking for work, and fortunately this job came up at the last minute because somebody dropped out. So I interviewed on the Thursday and started the following Monday, which was crazy quick.
More recently, I've been part of Statistical Science, where we support people internally and externally on more complex stats issues. Part of my role also involves looking after the stats side of software. So I've worked with a lot of different types of software, and I've finally worked my way back to R and got involved in that.
I love Lord of the Rings, so I can't leave that out. I've read it, I've listened to it, I've watched it, and I've seen the stage show. I really like running because it gets me out of this office and away from my screen. I did the London Marathon a couple of weeks ago, which was amazing.
What is a CRO?
So the purpose of a CRO is that it's hired by pharmaceutical companies to do the number crunching. That's the simplest way to put it: do the number crunching, do all the research, and make sure the statistical methods are applied properly. It starts with something called the statistical analysis plan, or SAP, and goes from there. You build out your study, and the data is analyzed according to that plan.
When my kids ask what I do for a living, I tell them: you know when you watch commercials and they list off all those adverse events? Those lists are provided by people like me and Martin. The biggest thing we have to do is get everything together and submit it to the FDA for approval, whether it's a drug or any other product. That is the biggest part of our job.
The purpose of a CRO is that we're independently contracted by companies to do the research, analyze it, and then provide all of the statistical analysis at the end to decide whether something is safe and effective. Those are the big things: safety and efficacy. Clinical trials have phases, and depending on which phase you're working on, you may be looking for safety or for efficacy.
Kyle's gone through the biostats side of things, in terms of being contracted to do the analysis and the design, but it can be anything from looking after the sites and looking after safety to doing the data management pieces and writing the clinical study report. Anything to do with clinical trials can be contracted out to us. And the big thing about being part of Thermo Fisher is that we can do everything end to end. We can even supply all of the medicines required for the trial and all of the equipment.
Types of data and day-to-day work
This is the joy of being a CRO rather than a pharma company: we've got data coming in all the time. On any given day, you could be jumping between many different studies, particularly in my role, where I'm consulting on the stats for these different studies. In terms of data types, there are the basic datasets for any clinical trial, as Kyle mentioned: adverse events, lab data, demographics. So you get a standard set of data across all of those studies.
But depending on the indication and the therapeutic area, your other datasets might look different. For asthma, for example, you might be looking at how much air people can breathe out, versus something like oncology, where you're looking at tumour growth and all sorts of bits and pieces. So we have the great joy of never being bored. There's always something different to look at and think, how do we best analyse this particular data?
I work on dashboard development for two different dashboards we have in-house, two different products we offer, one with Martin and one without. Both use the same base, SDTM, the Study Data Tabulation Model, which is important. Because it's a standard, we can use it to build dashboards that work across all of our studies.
I helped build one of the dashboards we have for that, which is intense. It's a lot because there are multiple layers; we have this large lasagna plot we were building. Another application Martin and I are currently building uses R Shiny. So I've definitely seen two different build styles. At this point I'm up to six or seven programming languages I work in within a given week: SAS, R, VB.NET, IronPython, SQL, et cetera.
At this point in my career, I have so many tools in different languages, and syntax is fun. I like to tell my kids: languages are fun, but don't worry about memorizing syntax; worry about knowing how the structure flows through that language. That's the advice I'd give anybody who's into programming. Languages are more or less the same, but you always have to remember whether counting starts at zero or one, which is a fun piece.
Making the case for open source
I always think the big positive really is having more options, isn't it? SAS is great, and it has a load of great parts to it. It's been around a long, long time, and you can see that a lot has gone into it, particularly in terms of the statistical procs. It's very solid and really, really good. But there are going to be things SAS isn't going to do so well, where newer languages are going to do better and be more flexible.
So as a CRO, taking on different types of projects, and as things get wider and go out in different directions, being multilingual is totally important. I've seen cases where I've created things that are far better in R than in SAS, and other cases where it's the other way around and SAS is the best thing to use.
So if you're looking at it really narrowly, at just specific static tabular outputs, for example, then maybe there's not much difference. But that's not where we're going, and it's not where we are now: the ways of digesting data are getting bigger and bigger. So we need tools for different situations, to be able to pivot and to understand things.
Upskilling and transitioning into pharma
To get into statistical programming, you don't necessarily need a background in pharma. You don't. There are definitely differences, and I've obviously always lived in this world and known it. But from the programming side, I don't think it's that bad. I started as an intern here and was brought up in the company. I was taught the company ways; I learned CROs from a CRO.
Yeah, there should be no concern about moving across from somewhere else, because, as we've been saying, the skills are very much the same. Particularly for a good statistician, it shouldn't really matter what your data is. It's about being able to look at your data, get into it, understand the context, treat it properly, do all the normal stuff of getting to know it first, and then move on to planning your analysis and all those sorts of things.
In terms of upskilling, I suppose the key thing is making sure you understand more about the pharma industry, about clinical trials, the phases, and how it all works. I think that's far easier now than when I first interviewed and they gave me a little pamphlet telling me about things. It's so easy now to go out there and learn all about clinical trials.
A lot of this you can do completely on your own, which is amazing. You can download RStudio, simulate some data, and get going. So use all these things that are there for you. And if you go into an interview and say you've been doing this in your own time, people are going to be impressed.
Data governance and client separation
From a governance standpoint, our system is locked down. We have a very, very strict policy. Obviously, there are blinded and unblinded people. Blinded data means you have no idea who's on which treatment; unblinded data means we actually know who's on which treatment. Those are kept in totally separate silos, storage areas locked down based on access.
If I'm a lead on a study, I'm only going to have access to that study's area, and I can grant other folks access to specific study areas. We have an independent system, our BTI as we call it, where all of the data is stored locally. All the data comes in there, gets imported and exported, and all of our coding is done in there, but it's organized by client and by study. Every single area is locked down.
Unblinded access is handled in a completely different way. There's something called a URAF, a form that specifically states, for example, that these 10 people are allowed access to this data, and only those 10 people will be allowed access to it. So from a governance standpoint, it's very easy: we just lock our data down.
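To make the blinded/unblinded split concrete, here is a minimal Python sketch. It assumes a per-study allowlist standing in for the URAF and a single treatment column named "ARM"; both are hypothetical simplifications. As described above, the real separation happens at the storage level, with separate locked-down areas, not inside analysis code.

```python
"""Hypothetical sketch of blinded vs. unblinded views of the same data.

Assumptions: a per-study allowlist stands in for the URAF, and the
treatment assignment lives in one column ("ARM"). Real systems keep
these in separate, access-controlled storage areas.
"""

# Hypothetical set of columns that would reveal treatment assignment.
TREATMENT_COLUMNS = {"ARM"}


def view_for(user, rows, unblinded_allowlist):
    """Return copies of the rows as this user is allowed to see them."""
    if user in unblinded_allowlist:
        # Named on the (hypothetical) URAF: full, unblinded records.
        return [dict(r) for r in rows]
    # Everyone else gets a blinded view with treatment columns dropped.
    return [{k: v for k, v in r.items() if k not in TREATMENT_COLUMNS} for r in rows]


if __name__ == "__main__":
    rows = [{"USUBJID": "01", "ARM": "Drug A", "AGE": 40}]
    print(view_for("alice", rows, {"bob"}))  # blinded
    print(view_for("bob", rows, {"bob"}))    # unblinded
```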
Data quality across the clinical trial process
The amazing thing about data quality is that it's a great area to improve all the way through the clinical trial process. You can come at it right from the beginning: to really improve data quality, your design has got to be good, the instructions have to be clear, and the clinical team has to carry things out properly. Then, as the data comes in, you can do lots of exploratory analysis to see whether there are issues.
When I started as an intern, I used to do DM (data management) reports, which are the basic thing: data comes in, and we write reports that surface issues. That could be anything from checking whether there are special UTF-8 characters in the data and whether they're going to be problematic, to what we called missing pages reports. We build eCRFs (electronic case report forms), and a missing pages report flags, hey, these people didn't fill out certain forms.
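The two DM checks mentioned here can be sketched in a few lines of Python. This is a hypothetical illustration, not PPD's tooling: records are plain dicts, "special characters" is interpreted as any non-ASCII character, and the list of expected eCRF pages (EXPECTED_FORMS) and the (subject, form) input format are assumptions.

```python
"""Hypothetical sketch of two data-management (DM) checks.

Assumptions (not from the transcript): records are plain dicts, every
subject should complete each form in EXPECTED_FORMS, and received forms
arrive as (subject_id, form_name) pairs.
"""
from collections import defaultdict

# Hypothetical list of eCRF pages every subject is expected to complete.
EXPECTED_FORMS = ["DEMOGRAPHICS", "VITALS", "ADVERSE_EVENTS"]


def non_ascii_findings(records, id_field="subject_id"):
    """Return (subject, field, value) for text values containing non-ASCII characters."""
    findings = []
    for rec in records:
        for field, value in rec.items():
            if isinstance(value, str) and not value.isascii():
                findings.append((rec[id_field], field, value))
    return findings


def missing_pages_report(subjects, received, expected=EXPECTED_FORMS):
    """Return {subject: [forms not yet received]} for subjects with gaps."""
    got = defaultdict(set)
    for subject, form in received:
        got[subject].add(form)
    report = {}
    for subject in subjects:
        missing = [f for f in expected if f not in got[subject]]
        if missing:
            report[subject] = missing
    return report


if __name__ == "__main__":
    records = [
        {"subject_id": "001", "aeterm": "Headache"},
        {"subject_id": "002", "aeterm": "Naus\u00e9a"},  # contains an accented character
    ]
    print(non_ascii_findings(records))
    received = [("001", "DEMOGRAPHICS"), ("001", "VITALS"),
                ("001", "ADVERSE_EVENTS"), ("002", "DEMOGRAPHICS")]
    print(missing_pages_report(["001", "002"], received))
```

Keeping the expected pages as a parameter means the same report could, in principle, be reused across studies with different eCRF designs.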
One of the dashboards I work on surfaces SDTM data, which is reviewed by our medical monitors. They'll go back and say, okay, we're seeing patterns here; we're seeing an issue with this site; we need to go back and retrain. So there are lots of tools we've developed in-house that we use across all phases.
Now, at study startup, much more of the team is involved when the CRFs are being built, which I think is fantastic. I've personally been on studies at startup where programmers and statisticians typically weren't part of that conversation. Now they are, which I think is crucial, because it helps make sure that from the outset you're setting these studies up with everybody's knowledge.
What is Teal?
When I was getting into R and trying to learn it more, I'd heard about R Shiny and how brilliant it could be for creating all of our outputs. With our outputs, you often create one type of table and then a million repeats for different reasons. Say you want the same table for a different subgroup, or you want to look at serious adverse events rather than all adverse events; you end up recreating lots and lots of repeats.
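The "one table, a million repeats" problem can be illustrated with a single parametrized summary. This is a hypothetical Python sketch, not the Teal approach itself: rows are dicts with SDTM-style keys (USUBJID, ARM, AEDECOD, AESER, SEX), and the function counts distinct subjects per treatment arm and preferred term. The serious-only toggle and the subgroup filter replace what would otherwise be separate near-duplicate programs.

```python
"""Hypothetical sketch: one parametrized AE table instead of many repeats.

Assumptions: rows are dicts with SDTM-style keys (USUBJID, ARM, AEDECOD,
AESER, SEX); real datasets, derivations, and table shells will differ.
"""
from collections import defaultdict


def ae_summary(rows, serious_only=False, subgroup=None):
    """Count distinct subjects per (ARM, AEDECOD).

    serious_only -- keep only rows where AESER == "Y"
    subgroup     -- optional (column, value) filter, e.g. ("SEX", "F")
    """
    subjects = defaultdict(set)
    for row in rows:
        if serious_only and row["AESER"] != "Y":
            continue
        if subgroup is not None and row[subgroup[0]] != subgroup[1]:
            continue
        # A set, so a subject with repeated events is counted once per term.
        subjects[(row["ARM"], row["AEDECOD"])].add(row["USUBJID"])
    return {key: len(ids) for key, ids in sorted(subjects.items())}


if __name__ == "__main__":
    rows = [
        {"USUBJID": "01", "ARM": "A", "AEDECOD": "HEADACHE", "AESER": "N", "SEX": "F"},
        {"USUBJID": "02", "ARM": "A", "AEDECOD": "HEADACHE", "AESER": "Y", "SEX": "M"},
        {"USUBJID": "03", "ARM": "B", "AEDECOD": "NAUSEA", "AESER": "Y", "SEX": "F"},
    ]
    print(ae_summary(rows))                          # all AEs
    print(ae_summary(rows, serious_only=True))       # serious AEs only
    print(ae_summary(rows, subgroup=("SEX", "F")))   # female subgroup
```

In an interactive dashboard, those arguments would become filters the reviewer sets on screen rather than separate outputs someone has to program and rerun.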
With R Shiny dashboards, the thought was that we could go interactive, and then we wouldn't need to create so many outputs. The joy was that Roche had been developing the Teal package and ecosystem for Shiny programming from a pharma perspective. We could use this ecosystem to create dashboards that produce our clinical outputs in different ways, in a modular system, pulling in data and filtering by it.
So a lot of the work I was doing, trying to think how to create an app that would work for millions of different studies, Teal had already sorted out for me. It started out at Roche and has become a great example of a cross-pharma package, or set of packages. It takes a lot of the work out for somebody wanting to get started.
If you want to create a very simple application from scratch, you can quickly load clinical data, take some of the prebuilt modules, and quickly create simple summaries and figures. But it's also flexible enough that you can use the ecosystem to create your own modules. That's what Kyle and I are doing right now: creating a particular dashboard from the Teal ecosystem, and we're looking to share that with clients.
Career advice
R. If you want to get into this industry now, learn R. SAS will be helpful, but I'm just saying. I know because when I was in school 14 years ago, R was for academics, so I ignored it. I focused on the SAS path because that was the career. Now, mind you, knowing both is great. Learn R, learn Python, and learn SAS for the empathetic value of being able to work with other people who know SAS.
I think, get excited about data. Hopefully everybody on this call is excited about data, and that's why they're here. It's the best time to be a statistician, a data scientist, or a programmer. It's an amazing time to be alive in terms of what's available to us. So it's really about throwing yourself into it.
So think: what am I interested in? What part of this really excites me? And throw yourself into it. Ask questions, come to things like this, and give yourself projects. Pick a project you're excited to do, whether in your own time or otherwise, throw yourself into it, go along for the ride, and you'll meet amazing people and have fun.