Sampling Strategies For Monitoring HIV Risk
As described in the preceding chapter, "Uses of Behavioral Data for Program Evaluation," using repeated surveys to monitor trends in behaviors that put people at risk of contracting or passing on HIV infection and other sexually transmitted infections (STIs) is an important aspect of evaluation strategies for HIV/AIDS prevention and care programs.
Consensus has coalesced around the idea that monitoring risk behaviors ("behavioral surveillance") should be undertaken both for the general population and for subpopulations selected either because their behaviors or life circumstances put them at risk for HIV transmission (sex workers, men who have sex with men, injecting drug users, mobile populations) or because of their vulnerability to risk (youth)1. Guidance about undertaking target group surveys and discussions of relevant indicators, questionnaire design, and validity and reliability issues are presented in other chapters of this book. Procedures for undertaking general population HIV risk behavior surveys are described elsewhere2.
The method used to choose respondents or subjects is a crucial aspect of any survey effort. Indeed, the credibility of efforts to monitor trends in HIV/AIDS risk and protective behaviors on the basis of repeated surveys will depend as much as anything on the method(s) used to choose survey subjects. Because most of the groups of interest for HIV target group surveys are difficult to locate or enumerate, sampling presents a formidable challenge.
This chapter provides guidance on sampling for repeated surveys designed to monitor trends in HIV risk-related behaviors in key subpopulations. The chapter begins with a discussion of major sampling issues and problems for target group surveys. Two prototype designs that should cover most target group survey needs are described, followed by illustrative applications of the prototype designs. Sample size requirements for repeated target group surveys are then considered. The final section considers several related survey and sample design issues for subpopulation surveys.
Major Issues
An important challenge in conducting meaningful target group surveys is to devise sampling plans that are both feasible and capable of producing unbiased estimates (or, more realistically, estimates with acceptably small levels of bias) for population subgroups that are not easily captured in conventional household surveys. First generation HIV risk behavioral surveys in target groups by and large resorted to informal or non-probability sampling approaches. However, recent efforts have attempted to put risk behavior monitoring on a more solid scientific footing by adopting more rigorous sampling methods.
Perhaps the central sampling issue to be addressed is the desirability and feasibility of using probability sampling methods in such undertakings. The major advantages of probability over non-probability sampling are twofold: It is less prone to bias, and it permits the application of statistical theory to estimate sampling error from the survey data themselves3. It is these features that make probability sampling methods the preferred choice whenever feasible. The major disadvantage is that a list or sampling frame is needed. While there are ways to make the task of developing sampling frames less costly and time consuming, it will nevertheless involve greater time and expense than would the adoption of a sampling approach not requiring a sampling frame.
The primary attraction of non-probability sampling methods is that they are less time consuming and costly to implement. However, there are several important drawbacks. The first is the risk of sampling bias. Where a list of sampling units is not available from which to select a sample following fixed rules, there is the danger that certain types of subjects will be disproportionately included in and others disproportionately excluded from the sample. The second is the issue of replicability, which is of key importance for surveys intended to monitor behavioral trends over time. Where sample selection criteria are not defined in operationally precise terms such that they may be replicated in subsequent survey rounds, there is the danger that measured trends may be confounded by changes in sampling methodology. Finally, there is the problem that because such methods are not driven by statistical theory, there is no objective basis for assessing the precision or reliability of survey estimates.
In the final analysis, the issue reduces to one of the relative importance of "defensible" survey findings for the purposes for which the data are sought. In the event of unexpected findings, the use of non-probability sampling methods may leave a program vulnerable to questions about the "representativeness" or "unbiasedness" of the data. This is not to say that in any particular undertaking, a survey based upon non-probability sampling methods will not produce the same results as a probability survey. There is, however, greater credibility risk associated with the use of non-probability sampling methods.
In view of the need for accurate information on behavioral trends for HIV/AIDS prevention programs, a case may be made for moving from non-probability to probability sampling methods to the extent feasible. However, is probability sampling feasible for the population subgroups of interest for HIV/AIDS programs? Although probability sampling is more demanding, recent experience indicates that with modest levels of technical support, several national HIV/AIDS programs have been able to make the transition to the use of more rigorous sampling.
As will be demonstrated in this chapter, the basic ideas of probability sampling may be extended in a fairly straightforward manner to cover most of the population subgroups of interest for HIV risk behavior surveys. However, the use of probability sampling methods will not be feasible for some types of target groups, notably those whose members do not tend to congregate in fixed locations and for whom it is thus difficult to develop a sampling frame. For such groups, the use of non-probability sampling methods is the only alternative.
As a practical matter, sampling for target group surveys will require:
- the use of different sampling strategies for different target groups;
- the collection of data in non-household settings for most target groups;
- the use of conventional probability sampling approaches in non-conventional ways; and
- the occasional use of non-probability sampling methods (in situations where probability methods are infeasible).
Prototype Sampling Schemes
Two prototype sampling schemes for HIV target group surveys are described in this section. The first is an extension of conventional cluster sampling to groups who are difficult to enumerate4. The second is a more rigorous form of snowball sampling known as targeted sampling5.
Cluster Sample Design
The prototype probability sample design for subpopulation surveys is a two-stage cluster design. Adaptations of the basic design should satisfy the sampling requirements for a majority of target group surveys.
Defining Clusters
Central to the extension of cluster sampling methods to surveys of difficult-to-enumerate population subgroups is a flexible definition of a "cluster." A cluster, or more precisely a primary sampling unit or PSU, is any aggregation of elements of interest (such as persons, households, or target group members) that can be unambiguously defined and used as a sampling unit from which to select a sample of elements of interest. Many readers will be familiar with the use of geographic areas as clusters or PSUs in household surveys.
For the purposes of target group surveys, PSUs may be defined as any identifiable site or location where target group members congregate or may be found. Box 9-1 provides some illustrative examples of possible operational definitions of PSUs for some of the target groups of interest for repeated HIV risk behavior surveys.
Developing Sample Frames
A sampling frame is simply a list of subjects of interest for a particular survey or, in the event that such a list does not exist, a list of sites or locations where members of a target group of interest are known to congregate. Sampling frames are an integral part of probability sampling. Indeed, the applicability of probability sampling methods to HIV risk behavior surveys hinges upon it being possible to construct meaningful sampling frames.
Except for the case of youth (who may be covered through household surveys), sampling frame development for subpopulation surveys will require preliminary fieldwork to identify for use as PSUs the locations where members of target groups tend to gather. The process of gathering this information is known as ethnographic mapping, which simply means that basic ethnographic techniques are used to create the maps: specifically, participant observation, key informant interviews, and time spent "walking the community." Creating the sampling frame may only involve creating lists of sites. In other instances, it may be necessary to prepare sketch maps. The maps need not have precise dimensions and distances; rough drawings including such things as main streets, main features of the landscape or other identifiable features, and most importantly, main places where target group members "hang out" will suffice. These techniques are explained in Chapter 12, "The Role of Qualitative Data in Evaluating HIV Programs," in Hogle and Sweat6, and in Brown and colleagues7.
Occasionally, lists and maps of locations of key gathering points for target group members, such as brothels, bars, massage parlors, truck stops, hotels, recreation sites, schools, or other locations, may already exist. Sometimes, non-governmental organizations (NGOs) who have been working with a subpopulation may have already created maps of their catchment areas.
It is important that sampling frames cover the entire geographic universe defined for a given survey effort and include the large majority of sites or locations where target group members congregate. If not, the resulting survey estimates will be prone to bias if the behaviors of target group members excluded from the possibility of selection for the survey differ from those who were surveyed. Where the creation of a sampling frame is infeasible for the intended universe for a survey effort, the only alternative under probability sampling is to restrict the universe to that for which it is possible to create a sampling frame.
Selecting Sample Clusters
Once a sampling frame of relevant PSUs has been created, unless the number of sites or locations is sufficiently small that they may all be covered in a given survey, a sample will need to be chosen. The recommended procedure for doing so will depend upon whether any information on the size of clusters (in other words, the number of target group members associated with each site or cluster) is available before the selection of sample clusters.
Statistically, the most efficient procedure is one in which PSUs or clusters are selected using systematic sampling with probability-proportional-to-size (PPS) at the first stage and a constant number of target group members chosen from each PSU at the second stage. Such a design results in a sample in which each target group member has the same overall probability of selection. This is known as a self-weighted sample. In addition to being relatively efficient in terms of sampling precision, this design eliminates the need to weight the data during analysis.
PPS selection should be used when establishments vary significantly in terms of numbers of sex workers associated with them; for example, when some establishments have five times or so as many sex workers as other establishments. Where the numbers of sex workers associated with establishments are roughly comparable, selecting clusters with equal probability will suffice.
However, PPS selection requires that a sampling frame with measures of size (MOS) be available or developed before the sample is selected. A measure of size is simply a count or estimate of the number of elements, or target group members, associated with each PSU. Exact counts are not necessary for use as measures of size; rough approximations will suffice. Unless the errors in measures of size are quite large, the bias introduced into the survey estimates generally will be modest.
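As an illustration only, systematic PPS selection and the self-weighting property can be sketched in Python. The function names, the layout of the frame, and the example figures are assumptions made for this sketch; they do not come from any standard survey package.

```python
import random

def systematic_pps(frame, n_clusters, start=None, rng=None):
    """Select PSUs by systematic sampling with probability proportional to size.

    frame: ordered list of (psu_id, measure_of_size) pairs (hypothetical layout).
    start: optional random start in [0, interval); drawn at random if omitted.
    A PSU whose size exceeds the interval can be selected more than once.
    """
    total = sum(size for _, size in frame)
    interval = total / n_clusters
    if start is None:
        start = (rng or random.Random()).random() * interval
    targets = [start + k * interval for k in range(n_clusters)]

    selected, cumulative, idx = [], 0, 0
    for psu_id, size in frame:
        cumulative += size
        # Every target that falls inside this PSU's cumulative range selects it.
        while idx < n_clusters and targets[idx] < cumulative:
            selected.append(psu_id)
            idx += 1
    return selected

def overall_probability(size, total_size, n_clusters, per_cluster):
    """Overall selection probability of one member under PPS plus a fixed take."""
    first_stage = n_clusters * size / total_size   # PPS selection of the PSU
    second_stage = per_cluster / size              # fixed number chosen within the PSU
    return first_stage * second_stage

# Hypothetical frame of four sites with rough measures of size.
frame = [("A", 10), ("B", 30), ("C", 20), ("D", 40)]
picked = systematic_pps(frame, n_clusters=2, start=15)
```

With a total size of 100 and two clusters, the sampling interval is 50, so a start of 15 gives selection points 15 and 65, which fall in sites B and D. The measure of size cancels in `overall_probability`, which is why every member ends up with the same chance of selection regardless of the size of the site.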
When measures of size for clusters are not available, sample clusters will have to be chosen with equal probability. Depending upon how sample elements are chosen at the second stage of sample selection, selecting PSUs with equal probability may result in sample elements having different probabilities of selection. In other words, the sample may be non-self-weighting, and it will be necessary to apply sampling weights to the data at the analysis stage if unbiased survey estimates are to be obtained.
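A minimal sketch of how such sampling weights might be computed is given below. It assumes that clusters were chosen with equal probability and that the number of target group members associated with each sample site was counted or estimated during fieldwork; the function names and figures are hypothetical.

```python
def design_weight(psus_in_frame, psus_sampled, members_at_psu, members_interviewed):
    """Inverse of the overall selection probability for members of one PSU."""
    p_first = psus_sampled / psus_in_frame            # equal-probability cluster selection
    p_second = members_interviewed / members_at_psu   # share of members interviewed
    return 1.0 / (p_first * p_second)

def weighted_proportion(records):
    """records: iterable of (weight, outcome) pairs, with outcome coded 0 or 1."""
    records = list(records)
    total_weight = sum(w for w, _ in records)
    return sum(w * y for w, y in records) / total_weight

# Two hypothetical sites from a frame of 40, of which 10 were sampled.
# Fifteen members were interviewed at each, but the sites differ in size.
w_small = design_weight(40, 10, members_at_psu=30, members_interviewed=15)
w_large = design_weight(40, 10, members_at_psu=60, members_interviewed=15)

records = [(w_small, 1), (w_small, 0), (w_large, 1), (w_large, 1)]
estimate = weighted_proportion(records)
```

Members of the larger site receive twice the weight of those at the smaller site because each interviewed person there stands in for twice as many target group members.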
Selecting Target Group Members Within Sample PSUs
In conventional cluster sampling, sample elements are chosen from a list of elements associated with each sample cluster using either simple random or systematic sampling. For many target groups, however, developing a relatively complete list of elements associated with each sample site is likely to be problematic.
Two alternative approaches to second stage sample selection are proposed. The first option, a quota sampling approach, entails interviewing target group members as they come into contact with sample sites or locations until a target sample size has been achieved. For example, a sample of truck drivers might be selected by interviewing all truck drivers who happen to appear at the truck stop until the target sample size has been achieved. Note that under this strategy, the length of time required to achieve the target sample size will vary depending upon the volume of contacts with sample sites.
Alternatively, a "take-all" strategy could be adopted in which all target group members who come into contact with a sample site during a specified data collection interval (for example, on a particular day or night) would be included in the sample irrespective of their number. The keys to the "take-all" strategy are that (1) data be collected for the same amount of time at each sample site, and (2) data be obtained from all target group members who come into contact with each sample site during the designated data collection period. Thus, the "take-all" strategy is not recommended when large numbers of target group members congregate at the sites to be used as PSUs or when it is not possible for other reasons to capture all target group members who appear at sample sites during a specified data collection period. In such situations, the quota sampling approach is instead recommended.
For the "take-all" strategy to be workable, it will be necessary to have at least rough estimates of the typical number of target group members associated with each site. This information is needed to determine both how many sites need to be included in the sample and how many interviewers need to be assigned to each site in order to "capture" all of the target group members who come into contact with each site on the randomly chosen day.
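The planning arithmetic implied here can be sketched as follows. The function names, the target sample size, and the assumed interviewer workload are illustrative assumptions, not recommended values.

```python
import math

def sites_needed(target_sample_size, expected_members_per_site):
    """Number of sample sites required to reach the target under take-all."""
    return math.ceil(target_sample_size / expected_members_per_site)

def interviewers_needed(expected_members_at_site, interviews_per_interviewer):
    """Interviewers required to capture everyone at one site in one session."""
    return math.ceil(expected_members_at_site / interviews_per_interviewer)

# Hypothetical planning figures: a target of 400 interviews, about 25 target
# group members per site visit, and 8 interviews per interviewer per session.
n_sites = sites_needed(400, 25)
n_interviewers = interviewers_needed(25, 8)
```

Rounding up in both cases errs on the side of reaching the target sample size and of not missing target group members at a site.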
Targeted Snowball Sampling
The primary role envisioned for non-probability sampling in target group surveys is as a substitute for probability methods in situations where the latter prove to be infeasible. This occurs largely in instances where constructing an adequate sampling frame of sites or locations where target group members congregate is not possible. Target groups for which non-probability sampling methods may have to be used include injecting drug users (IDU), some types of sex workers, and possibly men who have sex with men (MSM).
The basic form of non-probability sampling recommended for target group survey efforts is a modified snowball sampling referred to as targeted sampling5. The basic idea in snowball sampling is to compensate for the lack of a sampling frame by learning the identities and/or locations of members of a given network of target group members through interviews with informants and other target group members themselves. Snowball sampling is an inherently iterative process. Typically, the data collection process begins by interviewing informants and target group members known to the researchers in order to learn the identities of other target group members. The researchers then contact these persons, collect the data, and obtain information on where additional target group members might be found. Leads from each wave of referrals are followed up until a sample of pre-determined size has been achieved. An important limitation of snowball sampling is that sample target group members are likely to provide information only on other target group members who are in their own social, economic, and/or sexual network.
To the extent that risk-taking and/or protective behaviors differ across networks, this poses a potential bias problem for target group surveys. Research in San Francisco, California, for example, revealed the existence of social networks that differ in terms of race, ethnicity, and type of drug used, even in relatively compact geographic areas. Thus, in order for the snowball sampling approach to yield meaningful monitoring data, it is necessary to ensure that target group members from different networks in a given setting are included in the sample.
Targeted sampling is a combination of street ethnography, stratified sampling, quota sampling, and snowball or chain referral sampling. Watters and Biernacki5 describe this approach as being "a purposeful, systematic method by which controlled lists of specified populations within geographic districts are developed and detailed plans are designed to recruit adequate numbers of cases within each of the targets. While they are not random samples, it is particularly important to emphasize that targeted samples are not convenience samples. They entail, rather, a strategy to obtain systematic information when true random sampling is not feasible and when convenience sampling is not rigorous enough to meet the assumptions of the research design."
Three basic steps are involved in taking a targeted sample:
- initial geographic mapping;
- ethnographic mapping and stratification; and
- recruitment of quotas of target group members in specified subcategories through snowball sampling.
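The third step can be illustrated with a simplified sketch of chain referral recruitment under quotas. This is an assumption-laden illustration rather than the procedure of Watters and Biernacki: the data structures, the function name, and the rule of not following leads from persons whose subcategory quota is already full are all choices made for this example.

```python
from collections import deque

def targeted_snowball(seeds, referrals, stratum_of, quotas):
    """Recruit by chain referral until every subcategory quota is filled.

    seeds:      initial contacts known to the researchers
    referrals:  dict mapping a person to the persons he or she names
    stratum_of: dict mapping a person to a subcategory (stratum)
    quotas:     dict mapping a subcategory to its recruitment quota
    """
    counts = {stratum: 0 for stratum in quotas}
    recruited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and any(counts[s] < quotas[s] for s in quotas):
        person = queue.popleft()
        stratum = stratum_of[person]
        if counts.get(stratum, 0) >= quotas.get(stratum, 0):
            continue  # quota already met (or stratum not targeted): skip
        counts[stratum] += 1
        recruited.append(person)
        # Follow up each new lead obtained from this respondent.
        for lead in referrals.get(person, []):
            if lead not in seen:
                seen.add(lead)
                queue.append(lead)
    return recruited, counts

# Hypothetical networks "A" and "B", with a quota of two persons each.
seeds = ["a1", "b1"]
referrals = {"a1": ["a2", "a3"], "b1": ["b2"], "a2": ["a4"]}
stratum_of = {"a1": "A", "a2": "A", "a3": "A", "a4": "A", "b1": "B", "b2": "B"}
recruited, counts = targeted_snowball(seeds, referrals, stratum_of, {"A": 2, "B": 2})
```

Setting a quota for each network keeps a single well-connected network from dominating the sample, which is the bias that targeted sampling is meant to limit.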
Applications Of The Prototype Designs To Selected Target Groups
The following sections contain examples of how the various sampling designs described above may be used with sex workers, men who have sex with men, injecting drug users, and youth.
Sex Workers
Domains and Stratification
An initial issue to be addressed in undertaking surveys of sex workers is whether different types of sex workers in a given setting differ with regard to risk-taking and protective behaviors. For example, in Senegal a distinction is made between registered and clandestine sex workers; in India, between brothel-based and freelance sex workers; in Kenya, between high- and low-paid sex workers; and in Thailand, between "direct" and "indirect" sex workers (namely, sex workers working in massage parlors or brothels versus those working as bartenders or waitresses in bars or restaurants who also engage in commercial sex). If behaviors are thought or known to differ, it would be advisable to treat the types of sex workers as separate survey domains. If not, then they can be treated as a single domain (although as separate sampling strata).
Sampling Frame Development
In most settings, at least some sex workers will work from fixed establishments, such as brothels, massage parlors, or bars. For sex workers who do not work from fixed establishments, city blocks, public parks, and other locations where sex workers congregate may be used as sample sites.
Once a list of sites has been created, it can be used to construct a sampling frame consisting of time-location segments, which are used as PSUs. To illustrate, suppose that preliminary research in a given setting revealed 20 commercial sex establishments. If establishments were open 7 days per week, a total of 140 PSUs would be formed (20 sites x 7 days). If sample PSUs are to be chosen with probability proportional to size, the listing or sampling frame of establishments should also include a measure of size for each PSU. The appropriate measure of size is the expected number of sex workers at a given site on a given day.
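The construction of time-location PSUs can be sketched as follows, using the figures from the example above. The site labels, the dictionary of expected counts, and the default measure of size are hypothetical.

```python
from itertools import product

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def build_time_location_frame(sites, days, expected_workers, default_size=1):
    """Cross every site with every day; each pair becomes one PSU.

    expected_workers maps (site, day) to the expected number of sex workers,
    which serves as the measure of size when PPS selection is planned.
    """
    return [
        {"site": site, "day": day,
         "mos": expected_workers.get((site, day), default_size)}
        for site, day in product(sites, days)
    ]

sites = [f"site_{i:02d}" for i in range(1, 21)]   # 20 establishments
expected = {("site_01", "Sat"): 12, ("site_01", "Mon"): 4}  # illustrative counts
frame = build_time_location_frame(sites, DAYS, expected)
```

The resulting frame has 20 x 7 = 140 entries, one per site-day combination, and the measure of size may differ across days for the same establishment.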
The rationale for doing this is to try to spread out the sample over different times/days of the week in the event that sex workers with differing behaviors work on different days of the week. For example, it might be the case that "part-time" sex workers whose behaviors differ from "full-time" sex workers work only on weekends.
To ensure an adequate distribution of sample PSUs with respect to such characteristics as geographic location and type of establishment, researchers typically order the sampling frame according to such factors. For example, commercial sex establishments might be ordered by first listing establishments located in the northwest quadrant of the city, followed by establishments in the southwest quadrant, and so on. Within each quadrant, establishments would be ordered by type of establishment. If two or more cities are included in a target group survey, geographic stratification could be accomplished by listing all establishments in the first city, then those in the second city, and so on. The sampling frame development process for a target group survey of sex workers is illustrated in Box 9-2.
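The effect of ordering the frame before systematic selection can be sketched as follows. The establishment records, the ordering of quadrants and establishment types, and the function names are assumptions for illustration; equal-probability selection is shown for simplicity.

```python
import random

def order_frame(establishments, quadrant_order, type_order):
    """Sort by geographic quadrant first, then by type within each quadrant."""
    return sorted(
        establishments,
        key=lambda e: (quadrant_order.index(e["quadrant"]),
                       type_order.index(e["type"])),
    )

def systematic_sample(ordered_frame, n, rng=None):
    """Equal-probability systematic sample of n entries from an ordered frame."""
    rng = rng or random.Random()
    interval = len(ordered_frame) / n
    start = rng.random() * interval
    last = len(ordered_frame) - 1
    return [ordered_frame[min(int(start + k * interval), last)] for k in range(n)]

establishments = [
    {"name": "E1", "quadrant": "SW", "type": "bar"},
    {"name": "E2", "quadrant": "NW", "type": "massage parlor"},
    {"name": "E3", "quadrant": "NW", "type": "brothel"},
    {"name": "E4", "quadrant": "SW", "type": "brothel"},
]
ordered = order_frame(establishments,
                      quadrant_order=["NW", "SW", "SE", "NE"],
                      type_order=["brothel", "massage parlor", "bar"])
sample = systematic_sample(ordered, n=2, rng=random.Random(0))
```

Because selection proceeds through the ordered list at a fixed interval, the sample is spread across quadrants and establishment types roughly in proportion to their share of the frame. In this small example, one establishment is always drawn from each of the two quadrants.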
Selection of Sample Clusters and Sex Workers
Once the sampling frame has been developed, a sample of PSUs can be chosen either with probability-proportional-to-size or with equal probability, and a sample of sex workers using either a quota sampling or a "take-all" approach.
Box 9-3 provides an illustrative application of this two-stage cluster sampling scheme for establishment-based sex workers.
"Broker"-based Sex Workers
In some settings, sex workers who are not based in establishments may not congregate in public places, and thus, the cluster sampling approach described above will be infeasible. In India, for example, encounters with sex workers are sometimes arranged through "brokers." In other settings, arrangements are made by telephone.
If a significant portion of the commercial sex trade operates in this fashion in a given setting, then probability sampling methods will not be feasible and the targeted snowball sampling approach will be necessary.
Men Who Have Sex With Men (MSM)
Men who have sex with men (MSM) are difficult to enumerate in sample surveys. However, in many settings, MSM tend to congregate in certain types of establishments or locations (for example, certain bars, nightclubs, parks, or neighborhoods) in sufficient numbers that such locations may be used as PSUs for cluster sampling. In many settings, this may be the only feasible means of gathering behavioral data on MSM. It should be recognized, however, that because not all MSM frequent such locations, this approach is prone to bias to the extent that the behaviors of MSM who frequent such locations differ from those of MSM who do not. Alternatively, the targeted snowball sampling approach could be used.
The proposed cluster sampling approach for MSM is quite similar to that used for sex workers who are not based in establishments. The initial step is the development of a sampling frame of locations where MSM congregate. In compiling the list of establishments, attention should be paid to ensuring that the frame covers all geographic parts of the survey universe and that all relevant networks are included, such as those defined by specific ethnic or socioeconomic characteristics.
Once a list of establishments/locations has been developed, time-location sampling units should be created for use as PSUs. For example, if 10 establishments or locations were identified and establishments were open 7 days per week, a total of 70 PSUs would be created. Note, however, that if preliminary research indicated that MSM tended to frequent such establishments only on certain nights, the sampling frame might be limited to such nights. The list of PSUs should be ordered geographically and by establishment type before sample selection.
Box 9-4 provides an illustrative application of cluster sampling to behavioral surveys of MSM.
Injecting Drug Users (IDUs)
Of the groups to be covered by target group surveys, IDUs may well be the most difficult to survey. Among the problems likely to be encountered are difficulties in locating sufficient numbers of IDUs and in obtaining cooperation in responding to the survey. There is an absolute need to safeguard the identity, location, and confidentiality of anyone cooperating in the effort to obtain data from potential informants, as well as IDUs themselves.
With regard to sampling, IDUs may not congregate in sufficient numbers for a cluster sampling approach to be effective. However, in some settings it may be possible to identify areas of cities where higher than average concentrations of IDUs may be found. For example, in HIV/AIDS-related research in San Francisco, it has been feasible to use key informant interviews and consultations with police and medical authorities to identify neighborhoods or districts with significant numbers of IDUs. Even if a sufficient number of such areas can be identified, it will still be necessary to identify the different social networks operating. Accordingly, the targeted snowball sampling approach is likely to be the most feasible alternative in most settings. Box 9-5 provides an illustrative example of the use of the targeted snowball sampling approach for collecting survey data on IDUs.
Youth
Youth differ from the other groups that might be covered in target group surveys in that household surveys may be the preferred way to go about monitoring behavioral trends. Only youth who reside at school, who are institutionalized, or who have no fixed place of residence (for example, homeless or street children) would be excluded from the universe of a household survey. However, in some settings it may not be acceptable to survey youth at their place of residence about sensitive topics. If so, it will be necessary to identify segments of the general population of youth whom it is feasible to locate and interview outside of their homes. For example, one might consider for inclusion as proxy groups youth in schools, youth working in the informal sector of the economy (such as street hawkers), and youth working in low-skill occupations in the formal sector (such as domestic workers or apprentices). Finally, special categories of youth, such as homeless or street children, might be considered.
Household Surveys of Youth
When household surveys are to be used to enumerate youth, the conventional two-stage cluster sample design proposed for general population surveys by WHO is the recommended sampling approach2. As this sampling scheme is well documented elsewhere, it will not be discussed here. However, a comment on the procedure used to select a sample of youth within sample PSUs is in order.
The preferred procedure is to first create a list or sampling frame of all households containing one or more youth located within each sample PSU, and then choose a sample of households using either simple random or systematic sampling. However, because creating complete lists of households with youth tends to be costly and time consuming, shortcut procedures are often used, which sometimes introduce substantial bias. A more robust shortcut method, referred to as the segmentation method, has seen increasing use in recent years. The basic approach is to divide sample clusters into smaller segments of approximately equal size, choose one segment at random from each cluster, and interview all youth found in households in the chosen segment. The advantages of this approach are twofold in that it (1) avoids the household listing operation, and (2) results in a self-weighting probability sample. The method is described in detail elsewhere9,10.
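A rough sketch of why the segmentation method can yield a self-weighting sample is given below. It assumes that clusters were selected with probability proportional to size and that each cluster is divided into a number of segments proportional to its measure of size; neither assumption is spelled out in the text above, and the function names and figures are hypothetical.

```python
import random

def n_segments(cluster_size, segment_size):
    """Number of roughly equal segments into which a sample cluster is divided."""
    return max(1, round(cluster_size / segment_size))

def choose_segment(cluster_size, segment_size, rng=None):
    """Pick one segment at random; returns (segment_index, number_of_segments)."""
    rng = rng or random.Random()
    k = n_segments(cluster_size, segment_size)
    return rng.randrange(k), k

def segment_selection_probability(cluster_size, total_size, n_clusters, segment_size):
    """Overall probability that a given household falls in the interviewed segment."""
    p_cluster = n_clusters * cluster_size / total_size   # assumed PPS first stage
    p_segment = 1 / n_segments(cluster_size, segment_size)
    return p_cluster * p_segment

# Hypothetical universe of 10,000 households, 20 sample clusters, and
# segments of about 50 households each.
p_small = segment_selection_probability(200, 10_000, 20, 50)
p_large = segment_selection_probability(500, 10_000, 20, 50)
index, k = choose_segment(200, 50, rng=random.Random(3))
```

A larger cluster is more likely to be selected but is split into more segments, so the two effects offset each other and the overall probability is the same in both clusters. Where the number of segments has to be rounded, the equality holds only approximately.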
A key issue in household surveys of youth concerns the way in which sample youth who are not available to be interviewed should be handled. In some surveys, fieldworkers are instructed to merely substitute other respondents, such as in a neighboring household. For target group surveys of youth, this practice should be discouraged because of the potential bias that may be introduced. For example, youth who engage in high-risk behaviors may be more likely to live in single-parent households and/or to be at home less regularly, thus making it more difficult to locate them for a survey interview.
If such persons are systematically excluded from target group surveys, the survey data will underestimate the extent of risk behavior. The recommended course of action is to require return visits ("call-backs") to each sample household in order to obtain an interview from each sample respondent.
School Surveys of Youth
In settings where a sizeable proportion of youth remain in school at the intermediate and secondary levels, conducting surveys in schools represents a cost-effective way of reaching youth. Two cluster sampling schemes for undertaking school surveys are described below. The first is for use when surveys can be conducted in school classrooms using self-administered questionnaires; the second is used when data collection has to take place outside of classroom settings.
The logistically simplest approach is to have students complete self-administered questionnaires during class sessions. In addition, the low cost of self-administered questionnaires might enable data to be obtained for larger samples of students than would be feasible if personal interviews were used to collect the data. When "in-class" data collection is possible, a two-stage cluster sample design should satisfy most target group survey needs. Under this design, a sample of schools is first chosen from an ordered list of schools, then a sample of classes is chosen from an ordered list of classes in the identified sample schools, and data are gathered from all students in sample classes. Because measures of size (in this case, the number of school enrollees) are likely to be available before sample selection in most settings, schools should be chosen using systematic sampling with probability-proportional-to-size. Box 9-6 provides an illustrative example of a school survey with in-class data collection.
If in-class data collection in schools is not possible, it will be necessary to obtain data from students in non-classroom settings. Although it may be possible to schedule appointments with individuals or groups of students to be interviewed either before or after school (see Box 9-7 for an example), it also may be necessary to conduct intercept interviews with individual students at strategically chosen locations, such as outside of classrooms or in cafeterias, lunch rooms, or other common areas where students congregate.
Irrespective of the strategy used, it is important that steps be taken to ensure that the sample is sufficiently well spread out across students of different grades or levels. If students are to be interviewed as they enter or leave class, the classes or sections from which sample students are to be drawn should be chosen using a systematic-random selection procedure similar to that used in selecting classes or sections for in-class data collection.
Workplace Surveys of Youth
In order to obtain behavioral survey data on out-of-school youth, it is first necessary to determine where such youth may be found. One possibility is to interview youth at business establishments that typically employ youth. Examples of workplace sampling frames for youth in the informal sector include businesses employing apprentices, helpers of truck/bus/van drivers, and motorcycle taxi drivers.
As the types of businesses or occupations with significant numbers of youth are likely to vary from setting to setting, a generic sampling approach is proposed here. The recommended approach is a cluster sample design, with business establishments employing youth being chosen at the first stage of sample selection. As with most target group surveys, the sampling frame development process will begin with consultations with key informants and target group members themselves. The purpose of these consultations is to determine businesses that employ youth and the number of youth who are typically found at such businesses.
Once the sampling frame has been developed, a sample of workplaces can be chosen. If measures of size are available, workplaces should be chosen with probability-proportional-to-size and a fixed number of workers chosen per workplace using systematic sampling at the second stage of selection. However, if the number of workers present at workplaces varies significantly from day to day, it is instead recommended that workplaces be chosen with equal probability and the take-all strategy for selecting sample subjects within sample sites be used. Under this strategy, all youth workers present on the day and time each sample site is visited should be included in the sample. This approach eliminates the need to sub-sample workers in the event that the number present exceeds the target sample size for a given site, or conversely having to return to the site on another occasion in the event that the sample size quota is not met on a single visit. It also results in a self-weighting sample.
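Why both options yield a self-weighting sample can be seen by writing out the overall selection probability of an individual youth worker. The notation here is introduced only for illustration and is not taken from the chapter: m is the number of workplaces selected, N the number of workplaces on the frame, M_i the measure of size of workplace i, M the sum of the measures of size, and k the fixed number of workers taken per workplace.

$$ \text{PPS with fixed take:}\quad \pi_{ij} = \frac{m\,M_i}{M} \times \frac{k}{M_i} = \frac{m\,k}{M} $$

$$ \text{Equal probability with take-all:}\quad \pi_{ij} = \frac{m}{N} \times 1 = \frac{m}{N} $$

In both cases the workplace-specific term drops out, so every youth worker has the same chance of selection. This holds only to the extent that the measures of size are accurate in the first case, and that each youth is present at a single site at the time of the visit in the second.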
Surveys of Youth with No Fixed Residence
For youth who do not have a fixed place of residence, a modified cluster sampling approach is recommended in which neighborhoods, city blocks, public parks, and other locations where such youth are known to congregate are used as PSUs. The number of sites or PSUs to be chosen will depend upon how many youth are expected to be found per PSU per interval of fieldwork. If only small numbers of youth are typically found on a given day or night, more PSUs or clusters will need to be included in the sample to reach the target sample size. Alternatively, the same sites could be visited on more than one night, although this may well result in many duplicate interviews. Note, however, that if this strategy is followed, the number of nights that each site was visited needs to be documented so that the sampling weights can be adjusted accordingly. Additionally, if the number of sites where street youth congregate is small (for example, fewer than 10), it may be necessary to include all sites in the sample.
Mobile Populations
Individuals in mobile populations are of concern for HIV/AIDS programs because they spend considerable periods of time away from home, and in many settings and cultures they tend to engage in casual sexual relationships and use the services of sex workers on a more frequent basis than is observed in the general population. In some cases, mobility may involve crossing national borders. Examples of mobile populations include transportation workers, merchants, and migrant laborers.
The basic cluster sampling approach described above is easily extendable to these groups. The major variation lies in the nature of the sites or clusters to be used in cluster sampling. Possible sites for constructing sampling frames for the various subgroups within the broad heading of mobile populations are indicated in Box 9-8.
Determining Sample Size Requirements
The primary objective of repeated behavioral surveys is to measure and compare changes in behavioral indicators over time. The size of the sample is a key design parameter in any survey because it is crucial in ensuring sufficient statistical power to detect and measure such changes.
The sample size required per survey round to measure change on a given indicator will depend upon five factors:
- the initial or starting level of the indicator;
- the magnitude of change that evaluators wish to be able to reliably detect;
- the probability with which evaluators wish to be certain that an observed change of the magnitude specified did not occur by chance (that is, the level of significance);
- the probability with which evaluators wish to be certain that an actual change of the magnitude specified will be detected (that is, the power); and
- the relative frequency with which persons with the characteristics specified in a given indicator may be found in the target group population.
An expression for the required sample size (n) for an indicator measured as a proportion for a given target group in each survey round is given by:

$$ n = D \, \frac{\left[ Z_{1-\alpha} \sqrt{2P(1-P)} + Z_{1-\beta} \sqrt{P_1(1-P_1) + P_2(1-P_2)} \right]^2}{(P_2 - P_1)^2} $$

where:
D = design effect;
P1 = the estimated proportion at the time of the first survey;
P2 = the proportion at some future date such that the quantity (P2 - P1) is the magnitude of change that one wishes to be able to reliably detect;
P = (P1 + P2) / 2;
Z1-α = the z-score corresponding to the probability with which one can be certain that an observed change of size (P2 - P1) did not occur by chance (that is, the level of significance);
Z1-β = the z-score corresponding to the probability with which one wishes to be certain that a change of size (P2 - P1) will be detected (that is, the power of the survey).
The design effect (D) is the factor by which the sample size of a cluster sample must be increased in order to produce survey estimates with the same precision as a simple random sample. It reflects how homogeneous individuals are within clusters relative to how much the clusters differ from one another. In short, the greater the differences between the clusters compared to within the clusters, the greater the sample size must be to compensate for these differences. Provided that cluster sample sizes are kept moderately small in a given survey (not more than 20-25 individuals per cluster), the use of a standard value of D = 2.0 should adequately compensate for the loss of precision resulting from two-stage sampling designs.
The use of this formula is illustrated in Box 9-9.
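For readers who prefer to compute sample sizes directly, the calculation can be sketched in Python. The sketch assumes the expression takes the standard two-proportion form implied by the symbol definitions above, n = D [Z1-α √(2P(1-P)) + Z1-β √(P1(1-P1) + P2(1-P2))]² / (P2 - P1)²; the function name and default values are illustrative only.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_sample_size(p1, p2, alpha=0.05, power=0.90, deff=2.0,
                         two_sided=False):
    """Sample size per survey round to detect a change from p1 to p2.

    Assumes the standard two-proportion formula with a design effect;
    alpha is the significance level and power is 1 - beta.
    """
    z = NormalDist().inv_cdf
    # One-tailed z-score by default; two-tailed uses alpha / 2.
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    p_bar = (p1 + p2) / 2
    bracket = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
               + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil(deff * bracket ** 2 / (p2 - p1) ** 2)


# Hypothetical example: indicator expected to rise from 10 to 20 percent.
print(required_sample_size(0.10, 0.20))                  # one-tailed, 95/90
print(required_sample_size(0.10, 0.20, two_sided=True))  # more cautious
```

Results computed this way may differ by a unit or two from published tables, which typically use rounded z-scores.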
A table based upon this formula that permits readers to determine final sample sizes without having to perform calculations is provided in a Technical Appendix at the end of this chapter. The table provides sample sizes needed to measure changes in behavioral indicators of a magnitude of 10 and 15 percentage points for different initial values of a given indicator and for different combinations of significance (α) and power (1-β). The sample sizes provided in the Appendix table are based on one-tailed values of Z1-α (one-sided significance test), assuming a rationale exists for anticipating the direction of change in behavioral indicators in settings where HIV/AIDS prevention interventions have been introduced. This will result in smaller sample sizes than if corresponding two-tailed z-score values of Z1-α/2 (two-sided significance test) were to be used. Two-tailed z-score values are appropriate when the direction of change cannot reasonably be predicted and/or if programs wish to take a more cautious stance with regard to sample size requirements.
For some indicators, a second sample size computation step will be needed. Take, for example, the indicator "proportion of male vocational students who used a condom during their last encounter with a female sex worker." In this case, the first step in calculating the sample size required would be to determine how many students would be needed to measure a change in the proportion who used a condom during an encounter with a female sex worker during the previous year as described above; for illustrative purposes, say n = 200. However, because only students who had an encounter with a female sex worker in the last year are to be considered in this indicator, it will be necessary to determine how many students would have to be interviewed in order to find the required number of respondents who had sex with a female sex worker during the prior year.
Computationally, the procedure is simple. One merely divides the required sample size calculated as described above by the estimated proportion of the target group that exhibited the required "qualifying" behavior. For example, if 40 percent of male vocational students in a given setting are thought to have had sex with a sex worker in the last year, it would be necessary to interview n = 500 ( = 200/.4) students to find n = 200 subjects needed to measure the desired indicator. Additional illustrative computations are provided in Box 9-10.
The more difficult part is anticipating what the appropriate underlying proportion would be. Here, other surveys or anecdotal information might be consulted for guidance. As there may be considerable uncertainty concerning these parameters, the general guidance is to err toward underestimating the proportion engaging in a given behavior, as this will ensure a sufficient sample size for the main survey effort. For example, if it were thought that between 20 percent and 30 percent of students typically engage in sex with sex workers on an annual basis in a given setting, the 20 percent figure should be used in determining sample size requirements for target group surveys.
The sample size requirements for any given target group survey will be the largest of the sample sizes calculated for the key indicators measured by the survey.
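The second computation step, the conservative choice of proportion, and the "largest indicator wins" rule can be sketched together as follows. The function name, indicator labels, and figures other than the 200/.4 example are hypothetical and serve only to illustrate the arithmetic.

```python
from math import ceil


def interviews_needed(n_required, qualifying_proportion):
    """Number of interviews needed to yield n_required qualifying respondents."""
    if not 0 < qualifying_proportion <= 1:
        raise ValueError("qualifying_proportion must be in (0, 1]")
    # round() guards against floating-point noise before rounding up.
    return ceil(round(n_required / qualifying_proportion, 9))


# Example from the text: 200 qualifying students, 40 percent qualify.
print(interviews_needed(200, 0.40))

# When the proportion is uncertain (say 20 to 30 percent), use the lower
# bound so that the sample is not too small.
print(interviews_needed(200, min(0.20, 0.30)))

# Hypothetical key indicators: (base sample size, qualifying proportion).
indicators = {
    "condom use at last sex with sex worker": (200, 0.20),
    "knowledge of two prevention methods": (450, 1.00),
}
survey_n = max(interviews_needed(n, p) for n, p in indicators.values())
print("Sample size for the survey:", survey_n)
```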
Determining the Magnitude of Change to Measure
One of the more important considerations in determining sample size requirements is the magnitude of change to be measured. The quantity (P2-P1) is the minimum change in a given indicator that successive target group surveys aspire to be able to measure accurately. Sample size requirements vary inversely with the magnitude of (P2-P1). For small values of (P2-P1), the required sample size may be quite large. For practical reasons, it is thus recommended that risk behavior surveys not attempt to measure changes in behavioral indicators smaller than 10-15 percentage points. Attempts to measure smaller changes will likely exceed the resources available to most such efforts.
It should be emphasized that the magnitude of change in a parameter specified in sample size determination calculations may or may not correspond to program targets with regard to the indicator in question. In some cases, a program might have to accept measuring changes of larger magnitude than it expects to achieve in a given period of time, because measuring changes of smaller magnitude may not be feasible. For example, where condom use in a given setting is only 5 percent, a program might wish to measure an increase of 5 percentage points in a 1-year period. However, it may have to be satisfied with measuring an increase of 10 percentage points over a 2-3 year period instead, because the sample size required to detect a change of 5 percentage points may be larger than the available resources for survey-taking can support. In this case, the change parameter (P2-P1) might be set to 10 or 15 percentage points in determining sample size requirements, simply because this is all that is feasible. Even though the program target of increasing condom use by 5 percentage points within 1 year may have been reached, it will not be possible to conclude statistically that the indicator has changed until a change of 10-15 percentage points has been realized, unless, of course, additional resources can be found to support surveys with larger sample sizes. In other words, the sample size is too small to give the survey enough power to detect such a small change as statistically significant and to demonstrate that the program actually had an effect on condom use.
Considering Statistical Power
A second key sample size consideration is that of statistical power. Unless sample sizes are sufficient to be able to detect changes of a specified size, the utility of repeated surveys as a monitoring tool is compromised. To illustrate, suppose we desire to be able to measure a change of 10 percentage points in the proportion of sex workers who always use a condom with their clients. We compare two pairs of hypothetical surveys taken 2 years apart: one with a sample size of n = 500 in each survey round and the other with a sample size of n = 200 per survey round. While both pairs of surveys might indicate the expected increase of 10 percentage points, this change may well not be statistically significant at a given level of significance based upon the surveys with sample sizes of n = 200. Thus, we would be forced to conclude that no meaningful change in this behavior could be demonstrated over the study period, when in fact there was a real increase that the surveys were too small to detect. To ensure sufficient power, a minimum power (1-β) of .80 should be used (Z1-β = 0.84); .90 (Z1-β = 1.28) would be preferable where resources permit.
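The comparison can be made concrete with a small Python sketch of a one-sided two-proportion z-test. The starting level of 40 percent, the design effect of 2.0, and the practice of inflating the variance by the design effect are assumptions made for illustration; they are not taken from the hypothetical surveys described above.

```python
from math import sqrt
from statistics import NormalDist


def change_z_statistic(p1, p2, n_per_round, deff=2.0):
    """z-statistic for an observed change from p1 to p2 between two rounds.

    Uses the pooled proportion and inflates the variance by the design
    effect (an assumption for cluster samples).
    """
    p_bar = (p1 + p2) / 2
    std_error = sqrt(deff * 2 * p_bar * (1 - p_bar) / n_per_round)
    return (p2 - p1) / std_error


critical = NormalDist().inv_cdf(0.95)  # one-sided test at the 5 percent level

for n in (500, 200):
    z = change_z_statistic(0.40, 0.50, n)
    verdict = "significant" if z > critical else "not significant"
    print(f"n = {n}: z = {z:.2f} ({verdict})")
```

Under these assumptions the same 10-point increase gives a z-statistic of roughly 2.25 with n = 500 but only about 1.42 with n = 200, which falls short of the one-sided critical value of about 1.645.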
Further guidance on determining sample size requirements for target group surveys is provided in Family Health International's Guidelines for Repeated Behavioral Surveys in Populations at Risk of HIV12.
Other Sample And Survey Design Issues
Retaining or Replacing Sample Sites or Clusters in Each Survey Round
One of the key design issues in repeated surveys is whether to retain the same PSUs or clusters or choose a new sample of sites in each survey round. There are two advantages to retaining the same sample of sites or clusters. The first is that background characteristics and behaviors of individuals associated with particular sites tend to be correlated over time, and this factor can increase the statistical precision with which changes are measured. For example, sites such as brothels, bars, or truck stops may attract certain types of target group members and/or may encourage or discourage certain types of behaviors. The effect of this correlation is to reduce the standard error of survey estimates of change between the two survey rounds. Secondly, retaining the same sites reduces the sampling frame development work that needs to be done at the beginning of each survey round.
Balanced against this are several important disadvantages. Among these, the problem of resistance by site "gatekeepers" to repeated visits to the same sites and the loss of sites due to business failures loom especially large. In some settings, the sites where members of certain types of target groups congregate might change so rapidly over time that there is no choice but to construct a new sampling frame and select a new sample of sites in each survey round. Finally, retaining the same sites over an extended period of time does not allow for new sites or "pockets" of risk behavior to be reflected in the behavioral survey monitoring data.
While a compromise strategy of retaining a fixed proportion of sites between any two successive survey rounds and replacing the remaining sample of sites with a new sample might be considered, the advantages of retaining even some sample sites over time in target group surveys are debatable. For one thing, it is unclear that the correlations on characteristics and behaviors over time will be as large for the types of sites used in target group surveys as is often found when residential areas are used as clusters in household surveys. Thus, the magnitude of gains to be realized by maintaining the same clusters is uncertain. The general recommendation, therefore, is to choose a new sample of sites in each survey round.
Dealing with Duplicate Observations
Irrespective of which sampling method is used, one problem that may occur is that of duplicate observations. Duplicate observations may arise because some target group members are associated with more than one PSU. For example, sex workers may work at more than one location, or truck drivers may use more than one truck stop during the course of fieldwork for a survey. There is thus a possibility that the same target group member might be interviewed twice or possibly more.
The statistically correct way to deal with this problem is to adjust the sampling weights to account for the fact that some target group members had more than one opportunity to be included in the sample for a given survey round. However, the recordkeeping, statistical, and data processing requirements of doing so are likely to be beyond the resource capacity in most applications.
A more feasible, but less technically satisfactory solution would be to screen out potential duplicate observations by inquiring whether sample target group members had already been interviewed during the period of survey fieldwork, and not conducting interviews with respondents answering yes. If this approach were adopted, appropriate screening questions would have to be added at the beginning of the survey questionnaires used.
A third option is to do nothing. Except when the total population of a target group of interest is very small, the probability of encountering enough duplicate observations in a given survey round to introduce serious bias is unlikely to be large enough to worry about.
Dealing with Insufficient Numbers of Target Group Members at Sample Sites
In many of the sampling schemes described above, a number of decisions are driven by an expected number of target group members at each site during a specified time interval. What should be done if, during the course of fieldwork, researchers find that the actual number of target group members is substantially lower than expected?
Two options are available. The first is to return to sample sites for additional intervals of data collection. The second is to select a supplementary sample of PSUs. Returning to sample sites for an additional interval of data collection is the less desirable option for two reasons. First, if the expected daily volume of target group members were a serious overestimate, returning to the same sites would be an inefficient way of increasing the sample size. Secondly, sampling additional cases per PSU would increase the precision of survey estimates less than sampling additional PSUs would.
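The second reason can be illustrated with the common approximation that the design effect equals 1 + (b - 1)ρ, where b is the number of respondents per PSU and ρ is the intracluster correlation. Both the approximation and the value ρ = 0.05 are assumptions introduced here for illustration; the actual correlation will vary by indicator and setting.

```python
def effective_sample_size(n_psus, per_psu, rho):
    """Approximate simple-random-sample equivalent of a cluster sample.

    Uses the approximation deff = 1 + (per_psu - 1) * rho, where rho is an
    assumed intracluster correlation.
    """
    deff = 1 + (per_psu - 1) * rho
    return n_psus * per_psu / deff


rho = 0.05  # hypothetical intracluster correlation

baseline = effective_sample_size(20, 20, rho)       # 20 PSUs x 20 respondents
more_per_psu = effective_sample_size(20, 40, rho)   # double the take per PSU
more_psus = effective_sample_size(40, 20, rho)      # double the number of PSUs

print(round(baseline), round(more_per_psu), round(more_psus))
```

Both alternatives double the nominal sample to 800, but under these assumptions doubling the number of PSUs roughly doubles the effective sample size, whereas doubling the take per PSU raises it by only about a third.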
What should be done in cases where all PSUs are already included in the sample, and it is thus impossible to choose more PSUs? In this situation, the only alternative is to visit sample PSUs for longer intervals than had been originally planned.
What should be done if even after repeated visits to all PSUs, it is still not possible to reach the target sample size? The answer to this question depends upon the reason why it was not possible to reach the target sample size. One possible cause is that the sampling frame was incomplete. In such a case, one option would be to update the sampling frame and choose a supplementary sample of PSUs of sufficient size to enable the target sample size to be reached. Alternatively, a lower-than-planned sample size could be accepted for the current survey round, and more resources could be put into sampling frame development in subsequent survey rounds in which larger sample sizes would be used (larger sample size in subsequent survey rounds can offset the effects of sample size deficit in earlier rounds).
In Nepal, one approach to dealing with this problem was to inquire from successfully interviewed establishment-based sex workers about any friends who worked at the same establishment but had not been present during the times that data were being collected for the target group survey. These leads were then followed up and included in the sample as having been sampled from the "referring" establishment. Such an approach should be used cautiously, however. In the Nepal case, researchers found that a number of the leads were in fact not sex workers, but friends of the sex workers who were nominated so that the sex worker could collect the incentive offered for identifying other sex workers. In the final analysis, it may have been preferable to accept a lower than expected sample size.
A final note on the problem is that in some instances, there may simply not be enough target group members in the population. In such cases, the key issue is whether there is sufficient justification for doing surveys for the target group.
Ensuring Replicability Through Documentation
Given the difficult sampling problems posed by HIV/AIDS target group surveys, it is important that steps be taken to make the resulting data as unbiased and sampling plans as replicable as possible. Of crucial importance is that a thorough documentation of sampling plans and adopted selection criteria is prepared to enhance the replicability of data collection efforts over time. This is especially important where probability sampling methods are not used, as the credibility of estimated trends in behaviors over time depends very heavily upon whether a convincing case can be made that identical sampling and survey methods were used across repeated survey rounds. Being able to demonstrate that constant sampling procedures were used adds considerably to the credibility of such estimates.
Conclusion
Monitoring trends in HIV risk behaviors through periodic surveys presents some formidable sampling challenges. At the heart of these problems is the fact that many of the population subgroups or target groups that may be of interest for behavioral surveillance or monitoring are difficult to capture in conventional household surveys.
In this chapter, we have described in some detail two approaches to sampling for risk behavior surveys in key HIV target groups. The first approach extends the basic principles of cluster sampling in ways that should be both applicable and feasible for most target groups. The second approach, a more rigorous form of snowball or chain referral sampling, is recommended for use when the development of any type of meaningful sampling frame of sites where target group members congregate is infeasible. Applications of these two strategies to the key HIV target groups were presented in the chapter.
By providing guidance on more rigorous sampling methods, we hope the validity and quality of risk behavior surveillance data can be greatly improved. We acknowledge, however, that the use of these more rigorous sampling approaches in undertaking repeated risk behavior surveys is still in the testing stage. While recent experience suggests that the recommended approaches are feasible, further verification is required. Applications planned in a wide variety of settings over the next few years should provide further guidance on how relatively rigorous sampling methods for such survey undertakings might be adapted to meet field realities.
References
1. Family Health International/IMPACT and Joint United Nations Programme on HIV/AIDS (UNAIDS). Meeting the behavioural data collection needs of national HIV/AIDS/STD programmes. Proceedings from a joint IMPACT/FHI/UNAIDS workshop. Arlington (VA) and Geneva: Family Health International/IMPACT and UNAIDS; 1998.
2. WHO/GPA/TCO/SEF/94.1. Evaluation of a national AIDS programme: a methods package. Geneva: World Health Organization; 1994.
3. Kalton G. Introduction to survey sampling. Newbury Park (CA): Sage Publications; 1989.
4. Kalton G. Sampling rare and elusive populations. New York: United Nations, Department for Economic and Social Information and Policy Analysis; 1993.
5. Watters J, Biernacki P. Targeted sampling: options for the study of hidden populations. Soc Prob 1989;36(4):416-430.
6. Hogle J, Sweat M. Qualitative methods for evaluation research in HIV/AIDS prevention programming. AIDSCAP evaluation tools, Module 5. Arlington (VA): Family Health International; 1996.
7. Brown T, Sittitrai W, Carl G, et al. Geographic and social mapping of commercial sex: a manual of procedures. Honolulu and Bangkok: Program on Population, East-West Center, and Thai Red Cross AIDS Research Center; 2000.
8. Lemp GF, Hirozawa A, Givertz D, et al. Seroprevalence of HIV and risk behaviors among young homosexual and bisexual men: the San Francisco/Berkeley Young Men's Survey. JAMA 1994;272(6):449-454.
9. Turner A, Magnani R, Shuaib M. A not quite as quick but much cleaner alternative to the Expanded Programme on Immunization (EPI) cluster survey design. Int J Epidemiol 1996;25:198-203.
10. United Nations Children's Fund (UNICEF). Monitoring progress toward the goals of the world summit for children: Practical handbook for multiple-indicator surveys. New York: UNICEF; 1996.
11. FOCUS on Young Adults Program. Impact of the Peruvian National Family Life Education Program on reproductive health knowledge, attitudes, and behaviors. Washington (DC): FOCUS Program and Ministry of Education, Republic of Peru; in press.
12. Family Health International (FHI). Guidelines for repeated behavioral surveillance surveys in populations at risk of HIV. Arlington (VA): Family Health International; 2000.
Sample size requirements for selected combinations of P1, P2, Z1-α and Z1-β

Column headings 95/90 through 90/80 are combinations of Z1-α / Z1-β (significance level / percent power).

| P1 | P2 | 95/90 | 95/80 | 90/90 | 90/80 |
|---|---|---|---|---|---|
| .10 | .20 | 432 | 312 | 331 | 228 |
| .10 | .25 | 216 | 156 | 165 | 114 |
| .20 | .30 | 636 | 460 | 485 | 336 |
| .20 | .35 | 299 | 216 | 229 | 158 |
| .30 | .40 | 776 | 558 | 594 | 408 |
| .30 | .45 | 352 | 255 | 270 | 186 |
| .40 | .50 | 841 | 607 | 646 | 444 |
| .40 | .55 | 375 | 271 | 288 | 198 |
| .50 | .60 | 841 | 607 | 646 | 444 |
| .50 | .65 | 367 | 266 | 282 | 194 |
| .60 | .70 | 773 | 558 | 594 | 408 |
| .60 | .75 | 329 | 238 | 253 | 174 |
| .70 | .80 | 636 | 460 | 485 | 336 |
| .70 | .85 | 261 | 189 | 200 | 138 |
| .80 | .90 | 432 | 312 | 331 | 228 |
| .80 | .95 | 163 | 118 | 125 | 86 |

Note: sample sizes shown assume a design effect of 2.0 and are based on one-tailed values of Z1-α (one-sided significance test).