# transcript4.txt

In this video, we are going to be talking about sampling. Sampling is extremely important for gathering data. Throughout this video, we are going to talk about sample planning, sampling methods, and sampling distributions. These three are very important for gathering reliable data, because our data is only as good as our methods: how we plan to gather our sources and what our distribution looks like. If you've ever heard the expression that statistics can say whatever you want them to say, that is due to poor sample planning, poor methods, and a poor distribution of the data. So we're going to cover a couple of issues and possible ways around them.

Before we move forward, we need to go through some basic vocabulary terms. A population is the set of individuals or other entities to which we wish to be able to generalize our findings. A target population is a set of elements larger than or different from the population that was sampled, and to which the researcher would like to generalize any study findings. A sample is a subset of elements from the population. Elements are just the people within the population, or the units of whatever type (they don't have to be people) that we are trying to investigate. We also have what we call a sampling frame. This is the list from which the elements of the population are selected. A sampling unit is any single unit sampled from the population. We have primary sampling units and secondary sampling units: primary sampling units are the units selected in the first stage of sampling, and secondary sampling units are the units selected in the second stage. So with this, we might pick a preliminary group first, and then we would pick a second group out of that particular group, and those would be the units that we will be pulling from. With sample generalizability, we essentially want to generalize.
When we talked about this in the z-score video, the whole point of the normal bell curve is to be able to generalize. If we have a poor sample, then we cannot meet the assumptions based on the bell curve. Sample generalizability is the ability to generalize from a subset, which is our sample, of a larger population to that population itself. Cross-population generalizability, or external validity, is the ability to generalize from findings about one group, population, or setting to other groups, populations, or settings. Our ultimate goal would be cross-population generalizability, but typically just getting sample generalizability is still very, very difficult.

In order to do this, we need a representative sample. A representative sample is a sample that looks like the population from which it was selected in all respects that are potentially relevant to the study. This could involve a number of variables, such as gender or socioeconomic status, or economic variables such as income level or whether people are employed or not. The distribution of characteristics among the elements of a representative sample is the same as the distribution of those characteristics among the total population. So we would hope to have people who are representative, in the numbers that we would expect to see in the population. If we have an overall poor population, then we would want our sample to be overall poor. If we have a population that overall holds a bachelor's degree, then we would want a sample that overall holds a bachelor's degree, in hopes of having a sample that is representative of the larger population as a whole. In an unrepresentative sample, some characteristics are overrepresented or underrepresented.
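
One way to make "looks like the population" concrete is to compare the share of each category in the sample with its share in the population. The sketch below is only an illustration: the function names and the degree data are made up for the example and do not come from the lecture.

```python
from collections import Counter

def proportions(values):
    """Return the share of each category in a list of values."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

def representation_gap(population, sample):
    """Sample share minus population share for each population category.

    Positive values mean the category is overrepresented in the sample,
    negative values mean it is underrepresented.
    """
    pop_share = proportions(population)
    sample_share = proportions(sample)
    return {c: sample_share.get(c, 0.0) - pop_share[c] for c in pop_share}

# Hypothetical data: 60% of the population holds a bachelor's degree,
# but 80% of the sample does.
population = ["bachelor"] * 60 + ["no_degree"] * 40
sample = ["bachelor"] * 8 + ["no_degree"] * 2
gaps = representation_gap(population, sample)
print(gaps)
```

Gaps close to zero on every characteristic that matters for the study are what a representative sample would show.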
And so this is something that we do typically see, and we are able to deal with it with statistics, but at much more advanced levels than we'll probably get to in this class. Our goal for this class, though, is to learn how to get representative samples, in hopes that we don't have to worry about doing more complicated statistics.

We will almost always have sampling error. Sampling error is any difference between the characteristics of a sample and the characteristics of the population from which it was drawn. The larger the sampling error, the less representative the sample is of the population. Sampling error is very important because if our error is too large, then our data is useless to us. We're not able to make any generalizations whatsoever. So the method for collecting is crucial: in order to make generalizations, we need very small sampling errors.

We have two different methods for sampling. First, we have probability sampling. This is a method that allows us to know in advance how likely it is that any element of a population will be selected for the sample. The probability of selection is the likelihood that an element will be selected from the population for inclusion in the sample. In a census, the probability of selection is one, because everyone will be selected. Now, the census does try to reach everybody, but it's not necessarily able to. Probability selection can be done with replacement or without replacement. An example would be that I had a teacher one time who had everybody put their names in a cup, and whoever was drawn from the cup had to actually teach the lecture for the day. So everybody had to be prepared to teach the lecture on that particular day if their name was pulled. And that person's name went back into the cup every single time.
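
The cup example can be sketched in a few lines. This is a minimal illustration using Python's standard `random` module; the function name `draw_names` and the student names are hypothetical.

```python
import random

def draw_names(names, draws, replace, seed=None):
    """Draw names from the 'cup', with or without putting them back."""
    rng = random.Random(seed)
    if replace:
        # The name goes back into the cup, so it can be drawn again.
        return [rng.choice(names) for _ in range(draws)]
    # The name stays out of the cup, so each name appears at most once.
    return rng.sample(names, draws)

names = ["Ana", "Ben", "Cara", "Dev", "Eli"]  # made-up class roster
with_rep = draw_names(names, 10, replace=True, seed=1)
without_rep = draw_names(names, 5, replace=False, seed=1)
print(with_rep)
print(without_rep)
```

With replacement, ten draws from five names must repeat someone; without replacement, five draws simply return everyone once, in random order.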
So we did actually have one student who was pulled three times. You also have to consider: are you going to allow the same person to be counted twice or not, and what would that do to your sample?

Random selection means every element of the population has a known and independent chance of being selected. There are a number of forms of non-probability sampling as well, which we will talk about after we finish talking about probability sampling. Probability sampling allows researchers to select study subjects to be statistically representative of the populations they want to learn about and generalize to. The larger the sample and the more homogeneous the population, the more confidence we can have that a sample is representative. There are methods for figuring out how many cases are needed in your study. Even though the books say that the larger the sample, the better, that is not always true, because if we have an extremely large sample, then the odds that we are going to find something significant are increased. We also want to have a homogeneous population, because then we are able to narrow down the elements that we want to study, so that other variables are less likely to be influencing whatever element it is that we're trying to study. So we just have to be very careful to make sure that we have the right amount of homogeneity and the size of sample that we need to be effective.

The major types are simple random sampling, systematic random sampling, cluster sampling, and stratified random sampling. Simple random sampling identifies cases strictly on the basis of chance, for example with a random number table or random digit dialing. There are actually programs out there that we can tell to pick, say, four random numbers.
If we know the area that we want to study, and we know the area code and the other information, then we would dial that information plus whatever random numbers come out. That will give us the general location of where those people are, but it would be random. Another example of this is in Excel, which has a random number function; if you use that function five or six times, you get a random sample. There's some talk that if you do it many times it might not be truly random, because of the formula it uses for its randomness, but typically it's a way of getting a random number.

Simple random sampling can be done with or without replacement, similar to what I was talking about with the cup. Eventually, after her third time of being picked, that student's name was no longer put in the cup, and so she was saved from being sampled again. Replacement sampling is where her name was put back into the cup each time, which allowed her to be picked multiple times. So in replacement sampling, each element is returned to the sampling frame from which it is selected so that it may be sampled again.

Next is systematic random sampling. The first element is selected randomly from a list, and then every nth element thereafter is selected. To calculate the sampling interval, you take the total number of cases in the population and divide it by the number of cases required for the sample. Let's say that we have a population of 750 students and I want to randomly sample 150 of them. That gives a sampling interval of five. Let's say I start with number one: I would pick the first student, and then I would pick the sixth, and then the 11th and the 16th, and so on and so forth. You have to be careful with this: you would want the list of 750 to be randomized before you start your systematic random sampling. Otherwise, you may not really have a true random sample.
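
The 750-student example can be sketched as follows. This is only a minimal illustration, assuming the sampling frame is a plain Python list and the starting point is drawn at random from the first interval; the function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Pick a random start within the first interval, then every nth case."""
    if not 1 <= sample_size <= len(frame):
        raise ValueError("sample_size must be between 1 and len(frame)")
    rng = random.Random(seed)
    interval = len(frame) // sample_size  # 750 // 150 = 5
    start = rng.randrange(interval)       # random position 0..interval-1
    picked = [frame[i] for i in range(start, len(frame), interval)]
    return picked[:sample_size]

students = list(range(1, 751))            # students numbered 1 to 750
picked = systematic_sample(students, 150, seed=1)
print(picked[:4])
```

If the list might be ordered in some regular pattern, shuffling a copy of the frame first (for example with `random.shuffle`) is one way to follow the advice about randomizing the list before sampling.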
To identify the first case to be selected, a number within the sampling interval is randomly selected, and you use that to select the first case. You may want to start in the middle, or you may want to start at the end or somewhere in between. Essentially, you just have to really consider how big your population is and how representative you want it to be when picking that first case. For the selection of subsequent cases, every nth case is selected, where n is the sampling interval. It's the same as what we were talking about with the 750 students where we want to sample 150: you then pick every fifth student to be in the sample.

In almost all sampling situations, systematic random sampling yields what is essentially a simple random sample, except in populations or sampling frames characterized by periodicity. This simply means that the sequence of elements in the population varies in some regular, periodic pattern. What that's talking about would be if we had, say, a whole lot of females and males and we put them in an order that alternates female and male; then we may end up with more females or more males than would be represented within the population. Or say we put them in order of GPA and then we want to sample GPA randomly. Again, if we put them in a systematic order, then the sample would not be representative; it would no longer be a random sample. So you have to make sure that they're not in a regular periodic pattern, and that you randomize them. Even if you do randomly select a starting point, you still need to worry about randomization.

Stratified random sampling uses information known about the total population prior to sampling to make the process more efficient. It is a two-step process. First, we need to distinguish all elements in the population, that is, the sampling frame, according to their value on some relevant characteristic. That characteristic forms the sampling strata.
Each element must belong to one and only one stratum. Then, from there, we sample elements randomly from within each stratum, or level, or layer of groups. An example of this would be if, again, I wanted to pick my 150 students, and I wanted to make sure that I only had each student one time. I would look at the 750 students and see what time they had a class. Maybe I can get the number that I need from the eight o'clock classes, and then I would randomly pick one person from each eight o'clock class to be part of the study. From there, maybe we end up with more students than we need from the eight o'clock classes. Then we would break them up into a stratified version, and then we can go back to systematic random sampling and pick those people based on just the number. And so it would be pretty random.

Stratified random sampling is often used when we are trying to study schools and classrooms in particular. The big question here would be: are students who take 8:00 AM classes statistically different from students who take, say, 8:00 PM classes? You would still have to consider that when you're doing your random sampling. Or maybe you would have a layer of criminal justice students, and from there you could pick one student from every criminal justice class that is taught. That would be a pretty random level, particularly if you did it without replacement, which means that a student can only be drawn one time and therefore is not represented again. That's what this is talking about here: each element must belong to one and only one stratum. So when we pick a student from each class, it is with the understanding that they can only be represented one time. If we represented them multiple times, then they would have extra influence on our sampling process, and then it would no longer necessarily follow the bell curve.
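
The two steps of stratified random sampling (assign every element to exactly one stratum, then draw randomly within each stratum) can be sketched as below. This is a minimal illustration, not a prescribed procedure: the function name, the `per_stratum` parameter, and the made-up class roster are all assumptions for the example.

```python
import random
from collections import defaultdict

def stratified_sample(elements, stratum_of, per_stratum=1, seed=None):
    """Group elements into strata, then sample randomly within each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in elements:
        # Step 1: each element lands in one and only one stratum.
        strata[stratum_of(element)].append(element)
    sample = []
    for members in strata.values():
        # Step 2: draw without replacement inside the stratum.
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical roster: 50 students spread over 5 classes.
students = [(f"student_{i}", f"class_{i % 5}") for i in range(50)]
picked = stratified_sample(students, stratum_of=lambda s: s[1], seed=0)
print(picked)
```

Because `random.sample` draws without replacement, no student can appear twice, which matches the point that each person should be represented only one time.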
So we just have to be very careful with that. In proportionate stratified sampling, each stratum would be represented exactly in proportion to its size in the population. What this is talking about is, let's say with the criminal justice students we know that the population is 50/50 male and female; then we would want to make sure that our sample consists of 50 percent females and 50 percent males. And if we know that maybe 30 percent are freshmen and 20 percent are sophomores, then we would want to make sure that 30 percent of our sample are freshmen and 20 percent are sophomores. That way we would have the proportions the same. We could go even further and make sure that, given the 50/50 split and the 30 percent, we would have 15 females and 15 males in the freshman group, if we're going for a sample of 100. That is essentially what we're talking about with proportionate stratified sampling.

In disproportionate stratified sampling, the proportion of each stratum that is included in the sample is intentionally varied from what it is in the population. The probability of selection of every case is known but unequal between strata. This is useful when researchers need to make sure that small groups are included. Here, if we use disproportionate stratified sampling, we might look at race among criminal justice students. We have very few minorities within the criminal justice population, and we would want to make sure that minorities are included in order to get a more diverse sample. But with very few, we almost have to include each of them in the sample to get it to include people of different groups. An example would be my criminology classes. I taught two criminology classes last semester, and of the 70 students, only one was African American.
If I wanted to include an African American student in my sample, then I would probably have to use disproportionate stratified sampling. Otherwise, the odds of having an African American student in the sample would be very slim.

Next is multistage cluster sampling: sampling in which elements are selected in two or more stages. A cluster is a mixed group of elements of the population, and each element appears in one and only one cluster at a time. First, a random sample of clusters is selected. Then, within each selected cluster, a random sample of elements is selected. An example of this may be that when we do assessments, we will take every student who is in a particular class, such as my criminology class. That is a sample of criminal justice students. From that sample, they are tasked with writing an end-of-term paper. For those students who actually complete the paper, I am able to use that data to assess their progress within the program. The second set of clusters would be only the ones who have followed through with the program. If they transferred later or something along those lines, then they will not be in the second cluster. And we would only start tracking that data when that class starts.

Multistage cluster sampling maximizes the number of clusters selected and minimizes the number of individuals within each cluster. It is useful when a sampling frame is not available or is too expensive to cover. Here, again, it would be very expensive to reach out and try to get every criminal justice major to take a survey or, like we were just talking about, to do an assessment.
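
The two stages (a random sample of clusters, then a random sample of elements within each selected cluster) can be sketched as below. This is a minimal illustration; the function name and the made-up classes are assumptions for the example, not the assessment procedure described in the lecture.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample elements inside each one."""
    rng = random.Random(seed)
    # Stage 1: randomly choose which clusters (e.g. classes) to include.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        # Stage 2: randomly choose elements within the chosen cluster.
        k = min(n_per_cluster, len(members))
        sample.extend((name, member) for member in rng.sample(members, k))
    return sample

# Hypothetical frame: 8 classes of 20 students; each student is in one class.
clusters = {
    f"class_{c}": [f"student_{c}_{i}" for i in range(20)] for c in range(8)
}
picked = two_stage_cluster_sample(clusters, n_clusters=4, n_per_cluster=3, seed=0)
print(picked)
```

Only a list of clusters is needed up front; a full list of every student is required only for the clusters that were actually selected, which is why this approach helps when a complete sampling frame is unavailable or too expensive.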
This is a random assessment that we do with multistage clustering. We also have it divided so that students are assessed at the freshman, sophomore, junior, and senior levels. It would be hard to get everybody who is a freshman, everyone who is a sophomore, and everyone who is a junior or senior to actually do these particular activities. So multistage clustering allows us to get an idea of how students are progressing without having to reach the whole population, because it would just be too difficult to have everybody do the same assessment every single year.

In non-probability sampling, each member of a population has an unknown probability of being selected. This is useful when you cannot obtain a probability sample, when you have a small population, or when it is an exploratory study. The four types are availability, quota, judgment, and snowball, and we're going to talk about each of these.

In availability sampling, elements are selected because they are available or easy to find. It is also known as haphazard, accidental, or convenience sampling. When I did my dissertation, I used convenience sampling. (Sorry, my dissertation chair always got onto me about the way I pronounce that word.) Essentially what that means is that teachers have access to students, and so oftentimes we will interview or survey students because they're readily available to teachers. Even though they may not be representative of the general population, they are a population that we have easy access to.

In quota sampling, elements are selected to ensure that the sample represents certain characteristics in proportion to their prevalence in the population. So this would be a quota. This is similar to stratified probability sampling but is less rigorous and precise in its selection procedures. The quota sample is representative of the particular characteristics for which quotas have been set.
However, it is not known if the sample is representative in terms of any other characteristics. We might not know about males versus females, but we might know about minority status or things along those lines. So we might say that we're going to have five females and five African Americans, but we won't break it down further like we did before, to say that we need three females who are African American or three males. Essentially, the same person could count as both. It's like other quotas, where a minority who is a female would count toward both quotas. So we're not really sure if it is actually representative on that particular characteristic. Sometimes you might have a person in the sample who fills four or five of the quotas but is definitely not a good representation of what the overall sample should look like for the overall population.

Next is judgment, or purposive, sampling. A sample element is selected for a purpose, usually because of the unique position of the sample elements. It may involve studying the entire population of some limited group or a subset of a population. Informants should be knowledgeable about the cultural arena or experience being studied, willing to talk, and representative of the range of points of view. And interviews should be conducted until completeness and saturation occur. An example of this would be a study I did a few years ago on college student definitions of hazing. I wanted students who were in very particular subsets of the population. I wanted to make sure that I had students who came from the band, students who came from sports, and students who came from sororities and fraternities and things along those lines.
Before I was able to pick these populations, I had to do a lot of research on hazing to understand the different characteristics of what could be considered hazing and what the traditional methods were, so that I would know which groups to pick. I also had to be able to find students who were willing to talk about their definitions of hazing. This is a consideration, because are the students who are willing to talk about it different from the students who are not? Again, this limits our ability to generalize. To try to account for this, I decided to include a subset of students who were in absolutely no organizations, to see if there was a difference between the ones in no organization and the ones in organizations. By doing that, I was able to represent a range of viewpoints.

Interviews should be conducted until completeness and saturation occur. In that particular study, I interviewed 20 students, but after I was able to transcribe and analyze the first 15, I had actually hit the point of saturation pretty quickly. So I did not transcribe the rest of them, because I knew that they were going to say the same thing. Essentially, there really wasn't any reason to keep going, because I was hearing the same thing over and over again based on whether the student was male or female and whether they were in an organization or not. Those were the two factors that were constantly playing a role. The other ones didn't matter; it was just males versus females, and in an organization versus not in an organization. Those hit saturation very, very quickly.

In snowball sampling, sample elements are selected as they are identified by successive informants. This is useful for hard-to-reach or hard-to-identify populations for which there is no sampling frame, but the members of which are somewhat interconnected.
One problem with this technique is that the initial contacts may shape the entire sample and foreclose access to some members of the population of interest. Respondent-driven sampling involves giving gratuities to respondents to recruit peers. This can be very dangerous for researchers. An example of that is where researchers have actually studied armed robbers. In order to study armed robbers, we have to first find an armed robber who is willing to talk to us. Most of them are not going to refer another armed robber to us without some kind of incentive. So let's say we have money involved, so that one of their friends would come in and talk to you. Then maybe that friend has more friends who would be willing to do it for 20 bucks or whatever it is that you're offering, and then maybe two of their friends would as well. Eventually it snowballs into a larger sample.

Another group that we often use snowball sampling with would be prostitutes. Again, it is a subgroup where it's very hard to get people to talk to you. But also, depending on who you start with, you might have a wide variety or a very narrow variety of prostitutes. We know that there are differences between street walkers and call girls. If we start with a street walker, then our whole study is probably going to be about street walkers, and we would not have the range of the population that we would need to cover prostitution in general. That is one of the things that we have to be extremely wary of: with snowball sampling, it's hard to reach generalizability. But it is good for studying hard-to-reach and hard-to-identify populations. It can also be very dangerous, particularly with giving gratuities, because sometimes people become dependent on that money. In one case, a particular person actually tried to rob a researcher at gunpoint.
The researcher did not have money on them, and the person had not actually brought them anyone to speak to, so he was not entitled to the money either. And so the researcher was robbed at gunpoint for the money. Again, if you're dealing with these kinds of populations, you've got to be extremely careful.

Random sampling error refers to the difference between a sample statistic and the true value in the population that is due solely to chance. A sampling distribution is a hypothetical distribution of a statistic across the random samples that could be drawn from a population. Sampling error, or bias, could be random or systematic. Inferential statistics estimate how likely it is that a sample result is representative of the population, using sampling distributions. So again, we are looking at whether our sample is similar to the bell curve. If it is similar to the bell curve, then we are more likely to be able to say it is representative of the population as a whole.

Random versus systematic also comes down to whether the error comes from something we did on purpose. Let's say that we wanted to study people's views on abortion, and we decided to ask people right as they come out of church. Then we would have a huge sampling error, because people are probably not going to be honest with us about their views on abortion right after they walk out of church. That type of error would be pretty major for our ability to generalize.

The sampling distribution is used to estimate the margin of error and the confidence intervals around the sample statistic. The sampling distribution for many statistics is normal, or bell shaped, and that's why we want to have the bell shape within our sampling characteristics. Confidence limits are the upper and lower bounds of the confidence interval and indicate how much confidence can be placed in that estimate. Commonly used confidence intervals are 95 percent, 99 percent, and 99.9 percent.
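
As a rough illustration of confidence limits, the sketch below computes a confidence interval for a sample mean using the normal (bell curve) approximation. The function name and the scores are made up for the example, and the normal approximation is an assumption; with small samples, a t distribution is the more usual choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, level=0.95):
    """Normal-approximation confidence limits for the mean of a sample."""
    # z value that leaves (1 - level) / 2 in each tail of the bell curve.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    center = mean(sample)
    # Margin of error: z times the estimated standard error of the mean.
    margin = z * stdev(sample) / sqrt(len(sample))
    return center - margin, center + margin

scores = [72, 85, 90, 66, 78, 88, 94, 70, 81, 76]  # hypothetical data
low95, high95 = confidence_interval(scores, 0.95)
low99, high99 = confidence_interval(scores, 0.99)
print(low95, high95)
print(low99, high99)
```

Asking for more confidence (99 percent instead of 95 percent) widens the interval, which is the trade-off behind choosing a confidence level.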
In the social sciences, we recognize that it is very hard for us to hit 99 percent or 99.9 percent. So we typically go for 95 percent, and that tells us that we can be 95 percent sure that X did make a difference in Y, and we would only be wrong about 5 percent of the time. For our types of studies, that would be okay. In a more scientific study, such as a study of a medicine, we would want a higher percentage of times that they are correct and a lower percentage of times that they are incorrect. These, though, are just the typical levels. In the social sciences, we recognize that there are just too many intervening variables that come into play that we can't really control for. So we accept being wrong 5 percent of the time as an acceptable level of error.

Systematic sampling error refers to the difference between the sample and the population due to some problem with the sampling method. Here we have to consider the return rate. If we do a survey electronically, then we know that we're probably going to have a very low return rate, which opens up the question of what is different between the people who did return the surveys and the people who did not. And how do we know that they completed the surveys in an honest manner? Major sources of systematic error include poor sampling frames. Here you also want to know how many emails you sent out or how many forms you sent out, in order to know your non-response rate. There are formulas that allow us to decide if we have too high of a non-response rate, and there are advanced ways of accounting for non-response. But if we have a whole lot of non-response, we may not be able to use the data at all. And if we can't use the data at all, or if we have to use a lot more complicated math, then it's going to be really difficult.
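
The basic bookkeeping behind a non-response rate can be sketched as follows. This only computes the rate itself; the function name and the survey counts are hypothetical, and no cutoff for "too high" is implied.

```python
def nonresponse_rate(sent, returned):
    """Share of surveys sent out that were never returned."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= returned <= sent:
        raise ValueError("returned must be between 0 and sent")
    return 1 - returned / sent

# Hypothetical survey: 500 emails sent, 125 completed surveys returned.
rate = nonresponse_rate(sent=500, returned=125)
print(rate)
```

This is why the number sent out has to be recorded: without it, the rate cannot be calculated at all.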
So sampling is extremely critical, and following sampling methods matters for being able to do statistics, because of the sampling error and, like we just talked about, the sampling distribution: if we cannot generalize to the normal curve and we violate too many assumptions, then the statistical output that we have is useless. Sampling is vital to being able to use statistics successfully.

Units of analysis are the level of social life on which the research question is focused, such as individuals or groups. Related to these are units of observation: the cases about which measures actually are obtained in a sample. Essentially, we just have to be very careful about who, what, when, where, and why we pick the people within our sample, or the objects, or whatever it happens to be that we are trying to study. Two common fallacies are the ecological fallacy and reductionism. In the first one, a researcher draws conclusions about individual-level processes from group-level data. In the other one, the researcher draws conclusions about group-level processes from individual-level data. Essentially, either one of them could be incorrect, particularly if we do not have a symmetrical bell curve for our population. So it could go wrong either way, from the group-level data or from the individual-level data, in how we are trying to draw conclusions and in our generalizability. And that covers sampling. Thank you.