Claims-based algorithms for common chronic conditions were investigated using regularly collected data in Japan
Abstract
1 Introduction
A growing body of research using medical and pharmacy claims data has been conducted in various fields, including epidemiology, health services research, and health economics. Nevertheless, claims data are subject to limitations due to potential imprecision in the identification of medical conditions. Because claims are issued primarily for reimbursement to health care institutions, (1) information that is unnecessary for processing payments may not be collected or registered precisely in the claims forms; and (2) a diagnosis registered on a claim may reflect testing for a disease rather than a confirmed disease. The resulting misclassification of diagnoses can engender substantial bias and undermine the credibility of findings. To address these concerns, many studies have proposed claims-based algorithms (CBAs) for identifying patients with a target condition and computed association measures to assess the usability of the algorithms.
However, the CBA literature still has two aspects in need of refinement: one concerns the source of the gold standard; the other concerns the procedure for constructing the CBA. In this study, I clarified the obstacles these two aspects pose to advancing CBA research. I reviewed existing methods in the CBA literature and proposed a promising alternative that has so far received little attention. Using three common chronic conditions, hypertension, diabetes, and dyslipidemia, as cases, I examined and discussed why these proposals are superior to existing methods.
The first aspect limits the population to which a CBA can be applied, and the second makes the CBA construction procedure overly complicated and cumbersome. Moreover, the burden of reviewing charts and searching for a fine-tuned CBA discourages researchers from undertaking CBA studies and thus slows the establishment of acceptable CBAs. This sluggish establishment of usable CBAs is a serious issue because the codes recorded in claims to transmit information about patients change periodically. The dissertation (1) demonstrated the usefulness of health screening results as a source of the gold standard; (2) showed the power of statistical learning methods to make the CBA construction procedure efficient; and (3) proposed a course of action for efficient CBA research.
2 Methods
2.1 Setting
Medical and pharmacy claims data combined with annual health screening results were obtained from the Japan Medical Data Center. The baseline study population for condition X (hypertension, diabetes, or dyslipidemia) was defined as beneficiaries (1) who were enrolled in the claims database from 1 April 2016 to 31 March 2018 and who underwent the health screening in both fiscal year (FY) 2016 and FY2017, (2) with complete data on self-reported use of blood pressure-lowering drugs, hypoglycemic drugs, and lipid-lowering drugs for FY2016 and FY2017, (3) who in FY2017 visited a clinic/hospital that mainly specializes in internal medicine, and (4) with complete data on the examination results required for the gold standard of condition X, defined below, for FY2016 and FY2017 (hypertension, n = 631,289; diabetes, n = 152,368; dyslipidemia, n = 614,434). I constructed the gold standard from two consecutive FYs (FY2016 and FY2017) of the annual health screening results, consulting with experts to define a gold-standard diagnosis for each condition in compliance with Japanese guidelines.
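A screening-based gold standard of this kind can be computed mechanically. The sketch below applies a hypertension-style rule (elevated blood pressure or self-reported drug use, required in both FYs) to toy records; the thresholds (SBP >= 140, DBP >= 90) are common guideline values used purely for illustration and are not the study's exact expert-defined criteria, and the study's own code was written in R.

```python
# Toy screening records: (patient_id, fiscal_year, sbp, dbp, on_bp_drug).
# Illustrative data only, not from the study's database.
rows = [
    (1, 2016, 150, 95, False), (1, 2017, 145, 92, False),
    (2, 2016, 118, 76, False), (2, 2017, 122, 80, False),
]

def hypertensive(sbp, dbp, on_drug):
    # Hypothetical single-year criterion: elevated BP or drug use.
    return sbp >= 140 or dbp >= 90 or on_drug

# Require the criterion to hold in every observed FY (here, both years).
by_patient = {}
for pid, fy, sbp, dbp, drug in rows:
    by_patient.setdefault(pid, []).append(hypertensive(sbp, dbp, drug))
gold = {pid: all(flags) for pid, flags in by_patient.items()}
print(gold)  # {1: True, 2: False}
```

Because the rule is a pure function of the screening fields, the gold standard scales to hundreds of thousands of enrollees without any manual review.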
2.2 Claims-based algorithm
The CBA was compared with the gold standard. I used FY2017 claims data as the source of the CBA and compared its classification with the diagnosis derived from the gold standard based on the FY2016-FY2017 health screening results.
Conventionally, researchers have selected input variables and decided how to incorporate them into the CBA by hand. Hence, I first developed three conventional case-finding algorithms for each condition as baseline CBAs. Patients meeting the following selection rule were classified as “test-positive” for condition X (hypertension, diabetes, or dyslipidemia): (1) the diagnostic code corresponding to condition X appears in the claims at least once (diagnostic code-based CBA); (2) the medication code corresponding to condition X appears in the claims at least once (medication code-based CBA); and (3) both the diagnostic code and the medication code corresponding to condition X appear in the claims at least once (combined CBA). The diagnostic codes corresponding to hypertension, diabetes, and dyslipidemia were defined as ICD-10 codes I10-I15, E10-E14, and E78, respectively; the medication codes were defined as WHO-ATC codes C08 and C09, A10, and C10, respectively. To evaluate to what extent the baseline CBAs are applicable to a wider range of populations, the following study populations were considered in place of the baseline study population: (1) enrollees who had visited any clinic/hospital at least once in FY2017; and (2) all enrollees, including those who had not visited any clinic/hospital in FY2017.
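The three baseline rules reduce to simple code-matching predicates. The snippet below (a Python illustration; the study's analysis was done in R) encodes the hypertension case using the ICD-10 range I10-I15 and ATC classes C08/C09 given above, assuming each patient's FY2017 claims are summarized as sets of codes.

```python
import re

# Code ranges for hypertension, as defined in the text.
DIAG_HTN = re.compile(r"^I1[0-5]")   # ICD-10 I10-I15
MED_HTN = re.compile(r"^C0[89]")     # WHO-ATC C08 and C09

def diagnostic_cba(icd10_codes):
    # Rule (1): any matching diagnostic code appears at least once.
    return any(DIAG_HTN.match(c) for c in icd10_codes)

def medication_cba(atc_codes):
    # Rule (2): any matching medication code appears at least once.
    return any(MED_HTN.match(c) for c in atc_codes)

def combined_cba(icd10_codes, atc_codes):
    # Rule (3): both a diagnostic and a medication code appear.
    return diagnostic_cba(icd10_codes) and medication_cba(atc_codes)

# Example: essential hypertension (I10) plus a C08 antihypertensive.
print(combined_cba({"I10", "E11"}, {"C08CA"}))  # True
```

The analogous predicates for diabetes (E10-E14, A10) and dyslipidemia (E78, C10) differ only in the two regular expressions, which foreshadows why a condition-invariant procedure is attractive.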
Statistical methods such as regression and statistical learning methods can foster the development of CBAs. Accordingly, I next applied (1) a regression model, (2) discriminant analysis, and (3) a generalized additive model (GAM) to a dataset whose input variables were selected according to each condition. To bypass the somewhat cumbersome task of selecting variables likely to be associated with each target condition and constructing a satisfactory CBA from them, I devised methods by which a CBA is fine-tuned regardless of the level of prior knowledge and without modifying the CBA construction procedure across conditions. Consequently, I lastly applied (1) logistic regression, (2) k-nearest neighbor (kNN), (3) support vector machine (SVM), (4) penalized regression, (5) tree-based models, and (6) a neural network to a dataset whose input variables were chosen to be common to all target conditions. Although regression methods can be used when the number of input variables is smaller than the sample size and input variables with perfect collinearity have been trimmed in advance, their predictive performance is expected to be poor; to examine this point, I included logistic regression among the models. The statistical learning methods selected are capable of handling sparse, high-dimensional input variables.
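As a rough illustration of the condition-invariant approach, the sketch below fits an L1-penalized (lasso) logistic regression to sparse indicator variables for claims codes plus age, using synthetic data and scikit-learn. The study itself used R, so the library, data, and dimensions here are assumptions made for illustration only.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, p = 2000, 200  # patients, truncated-code indicator variables

# Sparse 0/1 indicators: "code j appeared in patient i's claims".
X_codes = csr_matrix((rng.random((n, p)) < 0.05).astype(float))
age = rng.integers(20, 75, size=(n, 1)).astype(float)
X = hstack([X_codes, csr_matrix(age / 100.0)]).tocsr()

# Synthetic outcome driven by a handful of "disease-related" codes.
signal = np.asarray(X_codes[:, :5].sum(axis=1)).ravel()
y = rng.random(n) < 1.0 / (1.0 + np.exp(-(2.0 * signal - 2.0)))

# The lasso penalty handles the sparse, high-dimensional input and
# automatically zeroes out coefficients of irrelevant codes.
model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
model.fit(X, y)
auc = roc_auc_score(y, model.decision_function(X))
```

The same call, with the same input design, works unchanged for any target condition; only the gold-standard labels `y` differ.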
2.3 Statistical analysis
I quantified the goodness of the CBAs by the following association measures: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC). The dataset was randomly divided into three parts: a training set (50%), a validation set (25%), and a test set (25%). Association measures of the CBA were assessed using the test set. For the CBAs based on statistical methods, a prediction function must be derived before association measures can be calculated. As the current problem is a two-class classification problem, I estimated a prediction function that outputs a score for the propensity of having the disease given a set of input variables; the outcome variable is a binary indicator of having the disease as assessed by the gold standard. If the model involves a hyperparameter to be tuned, the training and validation sets were used for the tuning: for each candidate value of the hyperparameter, the model parameters are estimated on the training set, the AUC of the fitted model is computed on the validation set, and the hyperparameter is chosen as the value that maximizes this AUC. When computationally feasible, tenfold cross-validation on the combined training and validation sets (simply the “combined training set” in what follows) was used instead to estimate the expected AUC. After the hyperparameter was determined, the combined training set was used to estimate the parameters of the prediction function; when no hyperparameter tuning was required, the combined training set was used from the beginning. As the computational burden of some statistical methods without condition-specific variable selection was prohibitive for large sample sizes, I randomly drew 25% of the enrollees for the analyses of hypertension and dyslipidemia, except for the conventional methods.
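The tuning protocol above can be sketched as follows. The 50/25/25 split, the validation-AUC-based hyperparameter choice, and the refit on the combined training set mirror the description; the synthetic data and the ridge-penalized linear score are stand-ins invented for illustration, not the study's models.

```python
import numpy as np

def auc(y, score):
    # Rank-based AUC: probability that a random positive outscores
    # a random negative (ties ignored for brevity).
    pos, neg = score[y == 1], score[y == 0]
    return (pos[:, None] > neg[None, :]).mean()

rng = np.random.default_rng(1)
n, p = 800, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = 1.5
y = (X @ beta + rng.normal(size=n) > 0).astype(int)

# 50% training / 25% validation / 25% test split.
idx = rng.permutation(n)
tr, va, te = idx[: n // 2], idx[n // 2 : 3 * n // 4], idx[3 * n // 4 :]

def fit_ridge(X, y, lam):
    # Closed-form ridge-penalized least squares as a stand-in model;
    # lam plays the role of the hyperparameter to be tuned.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Pick the penalty that maximizes validation-set AUC...
lams = [0.01, 1.0, 100.0]
best = max(lams, key=lambda l: auc(y[va], X[va] @ fit_ridge(X[tr], y[tr], l)))

# ...then refit on the combined training set and score the test set.
comb = np.concatenate([tr, va])
w = fit_ridge(X[comb], y[comb], best)
test_auc = auc(y[te], X[te] @ w)
```

Reporting only `test_auc`, computed on data untouched during tuning and fitting, keeps the assessment of the CBA honest.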
All statistical analyses were conducted using R version 3.5.1. The R code will be available at https://github.com/harakonan/research-public/tree/master/cba after publication of the study.
3 Results
As the test set was employed for the calculation, the sample sizes were 157,822, 38,092, and 153,608 for hypertension, diabetes, and dyslipidemia, respectively. The prevalence determined by the gold standard was 25.4%, 8.3%, and 38.7% for hypertension, diabetes, and dyslipidemia, respectively.
In the baseline diagnostic code-based (combined) CBA, the sensitivity, the specificity, PPV, and NPV were 80.4%, 95.1%, 84.9%, and 93.4% (74.4%, 98.1%, 93.1%, and 91.8%) for hypertension, 91.1%, 92.8%, 53.4%, and 99.1% (79.2%, 99.6%, 94.7%, and 98.2%) for diabetes, and 49.2%, 90.1%, 75.8%, and 73.7% (35.8%, 97.0%, 88.2%, and 70.5%) for dyslipidemia. The sensitivity decreased when the study population was expanded to include all people (hypertension, 66%-71%; diabetes, 74%-85%; dyslipidemia, 28%-38%).
The AUC of the regression model (discriminant analysis and GAM) with a dataset whose input variables were selected according to each condition was .924-.925 (.925-.929) for hypertension, .958-.962 (.962-.963) for diabetes, and .738-.739 (.739-.758) for dyslipidemia.
The AUC of the models with a dataset whose input variables were chosen to be common to all target conditions was as follows (hypertension, diabetes, dyslipidemia, in that order): logistic regression, .915, .936, .743; kNN with raw (standardized) input variables, .914-.915 (.855-.856), .942 (.888-.889), .739 (.677-.680); SVM, .914-.919, .944-.950, .724-.749; logistic ridge (logistic lasso and elastic-net), .893 (.923-.924), .930 (.961), .725 (.748-.753); random forest (ISLE), .923 (.928-.930), .958-.960 (.963-.965), .760-.761 (.767-.772); neural network, .910-.914, .919-.939, .739-.745.
4 Discussion
As I expanded the study population from the baseline study population to all enrollees, the sensitivity decreased. The decrease in sensitivity was mild for hypertension (74%-80% to 66%-71%) and diabetes (79%-91% to 74%-85%), while it was sizable for dyslipidemia despite the low starting point (36%-49% to 28%-38%).
The penalized regressions other than ridge and the tree-based models, which are the leading statistical learning methods, achieved AUCs comparable to the logistic regression with a knowledge-based condition-specific variable selection, and the level of the AUC was satisfactory for hypertension and diabetes.
I propose a two-step course of action for efficient CBA research. The first step is to prepare an efficient gold standard construction environment that sidesteps chart review. This can be achieved with regularly collected data such as the annual health screening results used in this study; EHRs and disease registries are other candidates along this line. The second step is to use a condition-invariant procedure for CBA construction. Based on this study, I recommend using the penalized regressions other than ridge, or the tree-based models, with age, gender, and all ICD-10/WHO-ATC codes truncated to a letter followed by two digits as input variables, to generate a prediction function that outputs a score for the propensity of having the disease. This procedure is expected to yield an AUC comparable to that of a logistic regression with knowledge-based, condition-specific variable selection. Once a broad set of input variables is selected, researchers can uniformly apply the procedure to construct a prediction function for each of their target conditions and compare it against a gold standard constructed from the regularly collected data. Every coordinate on the ROC curve can be realized by the CBA induced by the prediction function. This course of action should considerably encourage the implementation of CBA research.
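The claim that every ROC coordinate is realized by some CBA follows directly from thresholding the score: each cutoff induces a binary classification rule, and sweeping the cutoff traces the ROC curve. A minimal sketch, with toy labels and scores rather than study data:

```python
import numpy as np

def operating_point(y, s, cutoff):
    # Sensitivity and specificity of the CBA "score >= cutoff".
    pred = s >= cutoff
    sens = (pred & (y == 1)).sum() / (y == 1).sum()
    spec = (~pred & (y == 0)).sum() / (y == 0).sum()
    return round(float(sens), 3), round(float(spec), 3)

# Toy gold-standard labels and prediction scores.
y = np.array([1, 1, 1, 0, 0, 0, 0])
s = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1])

# A lenient cutoff favours sensitivity; a strict one favours specificity.
print(operating_point(y, s, 0.35))  # (1.0, 0.75)
print(operating_point(y, s, 0.75))  # (0.667, 1.0)
```

Researchers can thus pick the cutoff that matches the sensitivity/specificity trade-off their application demands, rather than being locked into a single rule-based CBA.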
The use of regularly collected data such as routine health screening results as the source of the gold standard is a novel approach in the CBA literature. Adopting health screening results has several advantages over the standard approach of chart review. First, once the gold standard for the target condition is defined, one can systematically obtain the gold-standard diagnosis of enrollees without relying on chart reviewers' decisions. Second, running a computer program on health screening results takes far less time than reviewing charts. Third, whereas chart review misses relevant information held in the charts of medical institutions that are not on the review list, health screening captures all the information required for the three conditions studied here.
The use of statistical learning methods in the CBA construction procedure is an innovative strategy in the literature. Researchers have had to select input variables and decide how to incorporate them into the CBA using existing knowledge on a case-by-case basis. They may not be confident that the resulting CBA sufficiently captures the features of the target condition, especially if it fails to attain satisfactory performance, and allaying this uneasiness requires a tedious comparison of a large collection of knowledge-based candidate CBAs. An appropriate statistical learning method overcomes these issues proficiently: researchers only need to select a set of variables that can be uniformly applied to all conditions, and the variables crucially related to the target condition are incorporated into the model automatically.
5 Conclusion
The dissertation showed that one can (1) construct fine-tuned CBAs using statistical learning methods, without condition-specific knowledge or condition-specific modifications of the CBA construction procedure, and (2) efficiently assess the usability of CBAs in a large population when regularly collected data are available as a source of the gold standard. I believe the series of techniques evaluated in this study should become essential in future CBA research.