This project uses nationally representative survey data to examine patterns of substance use among U.S. adults. The analysis focuses on alcohol, cigarette, marijuana, cocaine, and heroin use, with particular attention to the occurrence of polysubstance use, defined as the use of two or more substances. Overall, the project provides descriptive insight into how substance use is distributed across age, education, employment, and income groups, and evaluates which sociodemographic factors are most strongly associated with multiple substance involvement. These findings contribute to a better understanding of population-level substance use behaviors and highlight groups at elevated risk of polysubstance use.
Substance use remains a major public health concern in the United States, with tobacco, alcohol, cannabis, and other drugs contributing significantly to morbidity, mortality, and healthcare costs. Increasingly, attention has shifted toward polysubstance use, where individuals use more than one substance, often simultaneously or within the same timeframe. Polysubstance use is associated with higher risks of dependence, overdose, and poorer health and social outcomes compared to single-substance use. Understanding the social and demographic patterns of substance use, as well as the distribution of polysubstance behaviors, is therefore essential for informing prevention and intervention strategies.
To address these issues, this project draws on complementary datasets derived from the National Survey on Drug Use and Health (NSDUH). The dataset provides nationally representative information on the U.S. population, including demographic characteristics and indicators of tobacco, alcohol, cannabis, and other substance use. These data will be used to characterize the study population and identify patterns of substance use across sociodemographic groups. The dataset also includes detailed measures of substance use behaviors, which will be used to estimate the prevalence of single- and polysubstance use and to examine how patterns of use vary across key demographic characteristics.
By leveraging complementary perspectives from NSDUH—the broad population survey with demographics and the substance-focused dataset—this project provides a more comprehensive understanding of substance use behaviors in the U.S. and the sociodemographic factors that shape them.
The dataset analyzed was extracted from the 2019 National Survey on Drug Use and Health (NSDUH), a nationally representative survey that emphasizes behavioral health in the U.S. population. NSDUH is an annual survey (starting from 1971) that collects data on drug use and mental health in the United States. It is run and maintained by the Substance Abuse and Mental Health Services Administration (SAMHSA), a federal government agency that specializes in behavioral health and research. The dataset includes demographic variables, namely age (coded 1–6), education (1–5), employment (1–4), and income (1–4), as well as indicators of tobacco, alcohol, cannabis, and other drug use. These data will be used to provide a descriptive overview of the population and to characterize patterns of substance use across sociodemographic groups. The detailed measures of tobacco, alcohol, and cannabis use also provide the basis for estimating the prevalence of single- and polysubstance use within the study population.
Accordingly, this project seeks to address the following objectives:
This project will analyze publicly available datasets from Kaggle that provide complementary perspectives on substance use:
The analytic sample consists of individuals represented in the NSDUH-based dataset who provided responses on both demographic and substance use measures. Demographic information includes age, education, employment, and household income, while substance use indicators capture the use of alcohol, cigarettes, marijuana, cocaine, and heroin. For this study, the inclusion criterion is the availability of complete data on these measures, ensuring that all analyses are based on consistent and reliable information.
The dataset contains two broad categories of measures: demographic characteristics and substance use indicators. Demographic variables include age (coded 1–6), education (1–5), employment (1–4), and income (1–4). Substance use is captured through indicators of alcohol, cigarette, marijuana, cocaine, and heroin use, which will be recoded as binary variables (1 = any reported use, 0 = no use). From these variables, a derived measure of polysubstance use will be constructed, defined as the use of two or more substances.
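The recoding described above can be sketched as follows. The project itself plans to use R; this Python version is only an illustration of the logic. The column names and the rule that any raw value above zero counts as use are assumptions, since the raw coding of the substance variables is not given here.

```python
# Hypothetical column names; the real dataset's names and raw codings may differ.
SUBSTANCES = ["alcohol", "cigarette", "marijuana", "cocaine", "heroin"]


def to_binary(raw_value):
    """Recode a raw value: 1 = any reported use, 0 = no use, None = missing.

    Assumes (hypothetically) that any raw value above zero means use.
    """
    if raw_value is None:
        return None
    return 1 if raw_value > 0 else 0


def polysubstance(record):
    """Return 1 if two or more substances are used, 0 otherwise.

    Returns None when any indicator is missing, matching a complete-case rule.
    """
    flags = [to_binary(record.get(name)) for name in SUBSTANCES]
    if any(flag is None for flag in flags):
        return None
    return 1 if sum(flags) >= 2 else 0


# Three made-up respondents, for illustration only.
respondents = [
    {"alcohol": 1, "cigarette": 0, "marijuana": 0, "cocaine": 0, "heroin": 0},
    {"alcohol": 3, "cigarette": 1, "marijuana": 0, "cocaine": 0, "heroin": 0},
    {"alcohol": 0, "cigarette": 0, "marijuana": 0, "cocaine": 0, "heroin": 0},
]
poly_flags = [polysubstance(r) for r in respondents]
print(poly_flags)  # [0, 1, 0]
```

Only the second respondent reports two substances, so only that record is flagged as polysubstance use.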
Data management for this dataset will involve creating binary indicators for each substance (1 = use, 0 = no use) and constructing a composite measure of polysubstance use, defined as the use of two or more substances. This preparation will allow for a consistent approach to estimating prevalence and facilitate meaningful comparisons with the demographic patterns identified in the dataset. The dataset contains demographic variables including age, education, employment, and income, along with indicators of tobacco, alcohol, cannabis, and other drug use. These measures will first be used to describe the study population and characterize substance use patterns across sociodemographic groups. Beyond descriptive analyses, the dataset will also allow for an evaluation of how demographic factors are related to polysubstance use, making it possible to assess whether certain groups (defined by age, education, employment, or income) are more likely to use multiple substances.
Data preparation will involve several key steps to ensure analytic clarity and consistency. First, the dataset will be imported into R from its external file format. Variables will be renamed to give them clear and descriptive labels. Demographic variables will be recoded into categorical factors with meaningful level names, facilitating interpretation of results. Substance use indicators will be recoded into binary variables to allow for prevalence estimation and regression analysis. A composite variable for polysubstance use will then be created, identifying individuals who report use of two or more substances. Finally, missing values will be translated from their original codes into the standard missing data designation (NA in R), ensuring accurate analysis and reproducibility.
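A minimal sketch of the missing-value and labeling steps is shown below, in Python for illustration only (the project plans to use R, where the missing marker would be NA rather than `None`). The special codes in `MISSING_CODES` and the income labels are hypothetical placeholders; the actual values should be taken from the dataset's codebook.

```python
# Hypothetical special codes standing in for "refused", "don't know", and similar.
MISSING_CODES = {94, 97, 98, 99}

# Hypothetical labels for the income codes 1-4; confirm against the codebook.
INCOME_LABELS = {
    1: "Less than $20,000",
    2: "$20,000-$49,999",
    3: "$50,000-$74,999",
    4: "$75,000 or more",
}


def clean_code(value, valid_codes):
    """Return the code if it is valid, otherwise None (the analogue of NA in R)."""
    if value is None or value in MISSING_CODES or value not in valid_codes:
        return None
    return value


def label_income(value):
    """Map a raw income code to its label, or None when the code is missing."""
    code = clean_code(value, INCOME_LABELS)
    return None if code is None else INCOME_LABELS[code]


def complete_cases(rows, required):
    """Keep only rows with non-missing values on every required variable."""
    return [row for row in rows if all(row.get(name) is not None for name in required)]


# Made-up raw rows for illustration.
raw_rows = [
    {"income": 2, "alcohol": 1},
    {"income": 99, "alcohol": 0},
    {"income": 4, "alcohol": 97},
]
cleaned = [
    {"income": label_income(r["income"]), "alcohol": clean_code(r["alcohol"], {0, 1})}
    for r in raw_rows
]
analytic_sample = complete_cases(cleaned, ["income", "alcohol"])
print(analytic_sample)
```

Here only the first row survives, because the other two carry a special code on one of the required variables.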
All analyses will be conducted using R. Data preparation will include recoding demographic variables into meaningful categories, generating binary indicators for substance use, and constructing a composite measure of polysubstance use defined as the use of two or more substances.
Descriptive statistics (frequencies, proportions, means, and standard deviations) will be used to characterize the study population (age, education, employment, income) and summarize substance use patterns (tobacco, alcohol, cannabis, and other drugs). Graphical displays, including bar charts and histograms, will be used to illustrate patterns of substance use across sociodemographic groups.
Demographic Comparisons: To address the first aim, bivariate analyses will examine how demographic variables are associated with substance use. Substance use prevalence will be compared across demographic groups using chi-square tests for categorical variables and t-tests or one-way ANOVA for continuous measures, where appropriate.
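As an illustration of the chi-square comparison, the sketch below computes the Pearson chi-square statistic and its degrees of freedom for a contingency table of demographic group by substance use. It is written in Python purely as a worked example (the project plans to use R), the counts are invented, and survey weights are ignored.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    `table` is a list of rows of observed counts, for example one row per
    demographic group and one column per outcome (use, no use).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Invented counts: rows are two groups, columns are (use, no use).
table = [[30, 70], [50, 50]]
statistic, df = chi_square_independence(table)
print(round(statistic, 3), df)  # 8.333 1
```

The p-value then comes from the upper tail of a chi-square distribution with `df` degrees of freedom. In R, `chisq.test` would report it directly.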
Prevalence Estimation: To address the second aim, prevalence estimates of single-substance and polysubstance use will be calculated with corresponding 95% confidence intervals. Stratified analyses will be conducted across demographic subgroups to identify populations at elevated risk of polysubstance involvement.
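A prevalence estimate with a 95% confidence interval, stratified by a demographic variable, could be sketched as below. The text does not say which interval method will be used; this example assumes a Wilson score interval and ignores survey weights, and the field names `age_group` and `poly` are hypothetical. It is in Python for illustration only, since the project plans to use R.

```python
import math
from collections import defaultdict
from statistics import NormalDist


def wilson_interval(successes, n, level=0.95):
    """Return (proportion, lower, upper) using the Wilson score interval."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, center - half, center + half


def stratified_prevalence(records, group_key, outcome_key):
    """Prevalence and interval of a 0/1 outcome within each level of group_key."""
    counts = defaultdict(lambda: [0, 0])  # group -> [number with outcome, total]
    for record in records:
        counts[record[group_key]][0] += record[outcome_key]
        counts[record[group_key]][1] += 1
    return {group: wilson_interval(k, n) for group, (k, n) in sorted(counts.items())}


# Invented data: 20 of 100 in group 1 and 5 of 50 in group 2 report polysubstance use.
records = (
    [{"age_group": 1, "poly": 1}] * 20 + [{"age_group": 1, "poly": 0}] * 80
    + [{"age_group": 2, "poly": 1}] * 5 + [{"age_group": 2, "poly": 0}] * 45
)
for group, (p, low, high) in stratified_prevalence(records, "age_group", "poly").items():
    print(group, round(p, 3), round(low, 3), round(high, 3))
```

Because NSDUH is a complex survey, design-based intervals that account for weights and clustering may be more appropriate than this unweighted sketch if that information is available in the dataset.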
To evaluate associations between demographic variables and polysubstance use, logistic regression models will be employed. These models will estimate adjusted odds ratios (ORs) and 95% confidence intervals, providing an assessment of how factors such as age, education, employment, and income are related to the likelihood of engaging in polysubstance use. Model fit will be evaluated, and sensitivity analyses may be conducted to ensure robustness of results.
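To make the regression step concrete, the sketch below fits a logistic regression by Newton-Raphson and converts the coefficients to odds ratios with Wald 95% confidence intervals. It is a standard-library Python illustration, not the project's implementation (in R this would normally be done with `glm` and `family = binomial`). The two binary predictors and all counts are invented, and survey weights are ignored.

```python
import math
from statistics import NormalDist


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def score_and_information(X, y, beta):
    """Gradient of the log-likelihood and the Fisher information at beta."""
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
        for a in range(k):
            score[a] += (yi - p) * xi[a]
            for c in range(k):
                info[a][c] += p * (1.0 - p) * xi[a] * xi[c]
    return score, info


def fit_logistic(X, y, iterations=25):
    """Return (coefficients, standard errors); X rows start with 1 for the intercept."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        score, info = score_and_information(X, y, beta)
        beta = [b + s for b, s in zip(beta, solve_linear(info, score))]
    _, info = score_and_information(X, y, beta)
    # Standard errors are square roots of the diagonal of the inverse information.
    std_errors = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        std_errors.append(math.sqrt(solve_linear(info, unit)[j]))
    return beta, std_errors


def odds_ratios(beta, std_errors, level=0.95):
    """Odds ratio with Wald interval for each coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return [(math.exp(b), math.exp(b - z * s), math.exp(b + z * s))
            for b, s in zip(beta, std_errors)]


# Invented cell counts: (unemployed, young, polysubstance yes, polysubstance no).
cells = [(0, 0, 20, 80), (1, 0, 30, 60), (0, 1, 50, 50), (1, 1, 60, 30)]
X, y = [], []
for unemployed, young, n_yes, n_no in cells:
    X += [[1.0, unemployed, young]] * (n_yes + n_no)
    y += [1] * n_yes + [0] * n_no

beta, std_errors = fit_logistic(X, y)
for name, (ratio, low, high) in zip(["intercept", "unemployed", "young"],
                                    odds_ratios(beta, std_errors)):
    print(name, round(ratio, 2), round(low, 2), round(high, 2))
```

The invented counts were chosen so that the cell odds follow an additive model exactly, which means the fitted adjusted odds ratios come out at 2 for `unemployed` and 4 for `young`. Each ratio is adjusted for the other predictor because both enter the model together.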
You must analyse an NSDUH-based, nationally representative dataset to describe and model U.S. adult substance-use patterns, with a special focus on polysubstance use (use of two or more of: alcohol, cigarettes, marijuana, cocaine, heroin). Deliverables typically include a cleaned analytic dataset, descriptive tables and figures, prevalence estimates (with 95% CIs), bivariate comparisons across sociodemographic groups (age, education, employment, income), logistic regression(s) predicting polysubstance use, model diagnostics/sensitivity checks, and a written report that interprets findings and notes limitations.