Integrating multi-modal clinical data, such as electronic health records (EHR) and chest X-ray images (CXR), is particularly beneficial for clinical prediction tasks. However, in a temporal setting, multi-modal data are often inherently asynchronous. EHR can be continuously collected but CXR is generally taken with a much longer interval due to its high cost and radiation dose. When clinical prediction is needed, the last available CXR image might have been outdated, leading to suboptimal predictions. To address this challenge, we propose DDL-CXR, a method that dynamically generates an up-to-date latent representation of the individualized CXR images. Our approach leverages latent diffusion models for patient-specific generation strategically conditioned on a previous CXR image and EHR time series, providing information regarding anatomical structures and disease progressions, respectively. In this way, the interaction across modalities could be better captured by the latent CXR generation process, ultimately improving the prediction performance. Experiments using MIMIC datasets show that the proposed model could effectively address asynchronicity in multimodal fusion and consistently outperform existing methods.
IEEE CAI
An End-to-end Learning Approach for Counterfactual Generation and Individual Treatment Effect Estimation
F. Wu, K. Yin, and W. K. Cheung
In 2024 IEEE Conference on Artificial Intelligence (CAI), 2024
Estimating the causal effect due to an intervention is important for many applications, such as healthcare. Unobserved counterfactuals make unbiased treatment effect estimation non-trivial. Among existing approaches, counterfactual generation which augments observational data with generated pseudo counterfactuals has been found promising for reducing the bias. These methods typically take a two-stage approach for the counterfactual generation and treatment effect estimation. Therefore, the counterfactual generation could be sub-optimal. To this end, we propose to jointly optimize the auxiliary models for generating the counterfactuals and the outcome estimation models. In particular, we demonstrate the viability by first connecting a counterfactual outcome generator with a reparameterized VAE model, and then learning them in an end-to-end fashion using the EM algorithm. Our evaluation results based on synthetic and semi-synthetic datasets show that a simple causal effect VAE model learned together with the counterfactual outcome generator can outperform a number of SOTA models for treatment effect estimation.
AAAI-24
DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency
W. Yao*, K. Yin*, W. K. Cheung, J. Liu, and J. Qin
In Proceedings of the AAAI Conference on Artificial Intelligence, 2024
The combination of electronic health records (EHR) and medical images is crucial for clinicians in making diagnoses and forecasting prognoses. Strategically fusing these two data modalities has great potential to improve the accuracy of machine learning models in clinical prediction tasks. However, the asynchronous and complementary nature of EHR and medical images presents unique challenges. Missing modalities due to clinical and administrative factors are inevitable in practice, and the significance of each data modality varies depending on the patient and the prediction target, resulting in inconsistent predictions and suboptimal model performance. To address these challenges, we propose DrFuse to achieve effective clinical multi-modal fusion. It tackles the missing modality issue by disentangling the features shared across modalities and those unique within each modality. Furthermore, we address the modal inconsistency issue via a disease-wise attention layer that produces the patient- and disease-wise weighting for each modality to make the final prediction. We validate the proposed method using real-world large-scale datasets, MIMIC-IV and MIMIC-CXR. Experimental results show that the proposed method significantly outperforms the state-of-the-art models.
IEEE JBHI
DNA-T: Deformable Neighborhood Attention Transformer for Irregular Medical Time Series
J. Huang, B. Yang, K. Yin, and J. Xu
IEEE Journal of Biomedical and Health Informatics, 2024
The real-world Electronic Health Records (EHRs) present irregularities due to changes in the patient’s health status, resulting in various time intervals between observations and different physiological variables examined at each observation point. There have been recent applications of Transformer-based models in the field of irregular time series. However, the full attention mechanism in Transformer overly focuses on distant information, ignoring the short-term correlations of the condition. Thereby, the model is not able to capture localized changes or short-term fluctuations in patients’ conditions. Therefore, we propose a novel end-to-end Deformable Neighborhood Attention Transformer (DNA-T) for irregular medical time series. The DNA-T captures local features by dynamically adjusting the receptive field of attention and aggregating relevant deformable neighborhoods in irregular time series. Specifically, we design a Deformable Neighborhood Attention (DNA) module that enables the network to attend to relevant neighborhoods by drifting the receiving field of neighborhood attention. The DNA enhances the model’s sensitivity to local information and representation of local features, thereby capturing the correlation of localized changes in patients’ conditions. We conduct extensive experiments to validate the effectiveness of DNA-T, outperforming existing state-of-the-art methods in predicting the mortality risk of patients. Moreover, we visualize an example to validate the effectiveness of the proposed DNA. Our code is available at https://github.com/Nekumiya-x/DNA-T.
2023
ACM TIST
Adaptive Integration of Categorical and Multi-relational Ontologies with EHR Data for Medical Concept Embedding
C. W. Cheong, K. Yin, W. K. Cheung, B. C. Fung, and J. Poon
ACM Transactions on Intelligent Systems and Technology, 2023
Representation learning has been applied to Electronic Health Records (EHR) for medical concept embedding and the downstream predictive analytics tasks with promising results. Medical ontologies can also be integrated to guide the learning so the embedding space can better align with existing medical knowledge. Yet, properly carrying out the integration is non-trivial. Medical concepts that are similar according to a medical ontology may not be necessarily close in the embedding space learned from the EHR data, as medical ontologies organize medical concepts for their own specific objectives. Any integration methodology without considering the underlying inconsistency will result in sub-optimal medical concept embedding and, in turn, degrade the performance of the downstream tasks. In this article, we propose a novel representation learning framework called ADORE (ADaptive Ontological REpresentations) that allows the medical ontologies to adapt their structures for more robust integrating with the EHR data. ADORE first learns multiple embeddings for each category in the ontology via an attention mechanism. At the same time, it supports an adaptive integration of categorical and multi-relational ontologies in the embedding space using a category-aware graph attention network. We evaluate the performance of ADORE on a number of predictive analytics tasks using two EHR datasets. Our experimental results show that the medical concept embeddings obtained by ADORE can outperform the state-of-the-art methods for all the tasks. More importantly, it can result in clinically meaningful sub-categorization of the existing ontological categories and yield attention values that can further enhance the model interpretability.
IEEE TKDE
PATNet: Propensity-Adjusted Temporal Network for Joint Imputation and Prediction Using Binary EHRs With Observation Bias
K. Yin, D. Qian, and W. K. Cheung
IEEE Transactions on Knowledge and Data Engineering, 2023
Predictive analysis of electronic health records (EHR) is a fundamental task that could provide actionable insights to help clinicians improve the efficiency and quality of care. EHR are commonly recorded in binary format and contain inevitable missing data. The nature of missingness may vary by patients, clinical features, and time, which incurs observation bias. It is essential to account for the binary missingness and observation bias or the predictive performance could be substantially compromised. In this paper, we develop a propensity-adjusted temporal network (PATNet) to conduct data imputation and predictive analysis simultaneously. PATNet contains three subnetworks: 1) an imputation subnetwork that generates the initial imputation based on historical observations, 2) a propensity subnetwork that infers the patient-, feature-, and time-dependent propensity scores, and 3) a prediction subnetwork that produces the missing-informative prediction using the propensity-adjusted imputations and the missing probabilities. To allow the propensity scores to be inferred from data, we use the expectation-maximization (EM) algorithm to learn the imputation and propensity subnetworks and incorporate a low-rank constraint via PARAFAC2 approximation. Extensive evaluation using the MIMIC-III and eICU datasets demonstrates that PATNet outperforms the state-of-the-art methods in terms of binary data imputation, disease progression modeling, and mortality prediction tasks.
2022
IEEE TKDE
Learning Inter-Modal Correspondence and Phenotypes From Multi-Modal Electronic Health Records
K. Yin, W. K. Cheung, B. C. Fung, and J. Poon
IEEE Transactions on Knowledge and Data Engineering, 2022
Non-negative tensor factorization has been shown to be a practical solution for automatically discovering phenotypes from electronic health records (EHR) with minimal human supervision. Such methods generally require an input tensor describing the inter-modal interactions to be pre-established; however, the correspondence between different modalities (e.g., correspondence between medications and diagnoses) can often be missing in practice. Although heuristic methods can be applied to estimate it, they inevitably introduce errors and lead to sub-optimal phenotype quality. This is particularly important for patients with complex health conditions (e.g., in critical care), as multiple diagnoses and medications are simultaneously present in the records. To alleviate this problem and discover phenotypes from EHR with unobserved inter-modal correspondence, we propose the collective hidden interaction tensor factorization (cHITF) to infer the correspondence between multiple modalities jointly with the phenotype discovery. We assume that the observed matrix for each modality is a marginalization of the unobserved inter-modal correspondence, which is reconstructed by maximizing the likelihood of the observed matrices. Extensive experiments conducted on the real-world MIMIC-III dataset demonstrate that cHITF effectively infers clinically meaningful inter-modal correspondence, discovers phenotypes that are more clinically relevant and diverse, and achieves better predictive performance compared with a number of state-of-the-art computational phenotyping models.
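The marginalization assumption in this abstract can be illustrated with a toy example. The sketch below is not the cHITF model itself; it only shows, under assumed toy data, how a hidden patient-by-diagnosis-by-medication tensor would give rise to the two matrices that are actually observed. The tensor `T`, its values, and both function names are hypothetical.

```python
# Hypothetical toy interaction tensor T[p][d][m]: interaction counts for
# patient p, diagnosis d, and medication m. In the setting described above,
# this tensor is unobserved; only its marginals are recorded.
T = [
    [[1, 0], [0, 2], [0, 0]],  # patient 0: 3 diagnoses x 2 medications
    [[0, 1], [0, 0], [3, 0]],  # patient 1
]


def marginalize_over_medications(tensor):
    """Sum out the medication axis, leaving a patient-by-diagnosis matrix."""
    return [[sum(row) for row in patient] for patient in tensor]


def marginalize_over_diagnoses(tensor):
    """Sum out the diagnosis axis, leaving a patient-by-medication matrix."""
    return [[sum(col) for col in zip(*patient)] for patient in tensor]


diagnosis_matrix = marginalize_over_medications(T)
medication_matrix = marginalize_over_diagnoses(T)
print(diagnosis_matrix)   # patient-by-diagnosis counts
print(medication_matrix)  # patient-by-medication counts
```

Many different tensors share the same pair of marginals, which is why the correspondence has to be inferred rather than read off directly.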
2021
AAAI-21
SWIFT: Scalable Wasserstein factorization for sparse nonnegative tensors
A. Afshar, K. Yin, S. Yan, C. Qian, J. Ho, H. Park, and J. Sun
In Proceedings of the AAAI Conference on Artificial Intelligence, 2021
Existing tensor factorization methods assume that the input tensor follows some specific distribution (e.g., Poisson, Bernoulli, or Gaussian), and solve the factorization by minimizing some empirical loss functions defined based on the corresponding distribution. However, this approach suffers from several drawbacks: 1) In reality, the underlying distributions are complicated and unknown, making them infeasible to approximate with a simple distribution. 2) The correlation across dimensions of the input tensor is not well utilized, leading to sub-optimal performance. Although heuristics were proposed to incorporate such correlation as side information under Gaussian distribution, they cannot easily be generalized to other distributions. Thus, a more principled way of utilizing the correlation in tensor factorization models is still an open challenge. Without assuming any explicit distribution, we formulate the tensor factorization as an optimal transport problem with Wasserstein distance, which can handle non-negative inputs. We introduce SWIFT, which minimizes the Wasserstein distance between the input tensor and its reconstruction. In particular, we define the N-th order tensor Wasserstein loss for the widely used tensor CP factorization and derive the optimization algorithm that minimizes it. By leveraging sparsity structure and different equivalent formulations for optimizing computational efficiency, SWIFT is as scalable as other well-known CP algorithms. Using the factor matrices as features, SWIFT achieves up to 9.65% and 11.31% relative improvement over baselines for downstream prediction tasks. Under noisy conditions, SWIFT achieves up to 15% and 17% relative improvements over the best competitors for the prediction tasks.
SDM-21
TedPar: Temporally dependent PARAFAC2 factorization for phenotype-based disease progression modeling
K. Yin, W. K. Cheung, B. C. Fung, and J. Poon
In Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), 2021
PARAFAC2 factorization provides a practical solution to map the temporally irregular electronic health records (EHR) to clinically relevant and interpretable phenotypes. Existing methods ignore the effect of interdependency of diseases over clinical history. Consequently, the crucial temporal information contained in the EHR data cannot be fully utilized and the learned phenotypes can be sub-optimal to characterize patients with progressive conditions. To address this issue, we propose a novel temporally dependent PARAFAC2 (TedPar) factorization in which the temporal dependency among the phenotypes is explicitly modeled. TedPar learns a set of target phenotypes to capture the clinical features relevant to the diseases of interest and a set of background phenotypes to capture irrelevant but frequently co-occurring clinical features. By effectively modeling the temporal dependency and separating relevant and irrelevant features, the discovered target phenotypes can be used to model the progression of the diseases of interest. Empirical evaluations show that TedPar obtains up to 32.4% relative improvement in reconstruction accuracy over the test set, suggesting significantly better generalizability than the baselines for both noise-free and heavily noisy input data. Qualitative analysis also shows that TedPar is capable of discovering clinically meaningful phenotypes and capturing the temporal dependency between them.
2020
KDD-20
LogPar: Logistic PARAFAC2 factorization for temporal binary data with missing values
K. Yin, A. Afshar, J. C. Ho, W. K. Cheung, C. Zhang, and J. Sun
In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020
Research Track; acceptance ratio: 216/1279 = 16.9%
Binary data with one-class missing values are ubiquitous in real-world applications. They can be represented by irregular tensors with varying sizes in one dimension, where value one means presence of a feature while zero means unknown (i.e., either presence or absence of a feature). Learning accurate low-rank approximations from such binary irregular tensors is a challenging task. However, none of the existing models developed for factorizing irregular tensors take the missing values into account, and they assume Gaussian distributions, resulting in a distribution mismatch when applied to binary data. In this paper, we propose Logistic PARAFAC2 (LogPar) by modeling the binary irregular tensor with Bernoulli distribution parameterized by an underlying real-valued tensor. Then we approximate the underlying tensor with a positive-unlabeled learning loss function to account for the missing values. We also incorporate uniqueness and temporal smoothness regularization to enhance the interpretability. Extensive experiments using large-scale real-world datasets show that LogPar outperforms all baselines in both irregular tensor completion and downstream predictive tasks. For the irregular tensor completion, LogPar achieves up to 26% relative improvement compared to the best baseline. Besides, LogPar obtains relative improvement of 13.2% for heart failure prediction and 14% for mortality prediction on average compared to the state-of-the-art PARAFAC2 models.
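The idea of modeling binary entries with a Bernoulli distribution parameterized by an underlying real-valued tensor can be sketched as follows. This is only an illustration of the plain Bernoulli (logistic) likelihood, not the LogPar objective: it treats every zero as a true negative, which is exactly the assumption the positive-unlabeled loss in the paper is meant to relax. The function names and the toy data are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bernoulli_nll(binary, logits):
    """Negative log-likelihood of binary entries under Bernoulli(sigmoid(logit)).

    For one entry b with logit x, the NLL is softplus(x) - b * x.
    Assumption for illustration only: zeros are treated as observed negatives.
    """
    total = 0.0
    for b_row, x_row in zip(binary, logits):
        for b, x in zip(b_row, x_row):
            total += softplus(x) - b * x
    return total


# Toy slice: rows are time steps, columns are features (hypothetical values).
observed = [[1, 0], [0, 1]]
good_logits = [[3.0, -3.0], [-3.0, 3.0]]   # agrees with the observations
bad_logits = [[-3.0, 3.0], [3.0, -3.0]]    # disagrees with the observations

print(bernoulli_nll(observed, good_logits))
print(bernoulli_nll(observed, bad_logits))
```

In a factorization setting the logits would come from the low-rank reconstruction rather than being fixed by hand; the likelihood then rewards reconstructions whose signs agree with the observed entries.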
JHIR
Context-aware time series imputation for multi-analyte clinical data
K. Yin, L. Feng, and W. K. Cheung
Journal of Healthcare Informatics Research, 2020
This is an extension of our previous two-page abstract, which appeared at ICHI-19.
Clinical time series imputation is recognized as an essential task in clinical data analytics. Most models rely either on strong assumptions regarding the underlying data-generation process or on preservation of only local properties without effective consideration of global dependencies. To advance the state of the art in clinical time series imputation, we participated in the 2019 ICHI Data Analytics Challenge on Missing Data Imputation (DACMI). In this paper, we present our proposed model: Context-Aware Time Series Imputation (CATSI), a novel framework based on a bidirectional LSTM in which patients’ health states are explicitly captured by learning a “global context vector” from the entire clinical time series. The imputations are then produced with reference to the global context vector. We also incorporate a cross-feature imputation component to explore the complex feature correlations. Empirical evaluations demonstrate that CATSI obtains a normalized root mean square deviation (nRMSD) of 0.1998, which is 10.6% better than that of state-of-the-art models. Further experiments on consecutive missing datasets also illustrate the effectiveness of incorporating the global context in the generation of accurate imputations.
2019
AAAI-19
Learning phenotypes and dynamic patient representations via RNN regularized collective non-negative tensor factorization
K. Yin, D. Qian, W. K. Cheung, B. C. Fung, and J. Poon
In Proceedings of the AAAI Conference on Artificial Intelligence, 2019
Non-negative Tensor Factorization (NTF) has been shown to be effective for discovering clinically relevant and interpretable phenotypes from Electronic Health Records (EHR). Existing NTF-based computational phenotyping models aggregate data over the observation window, resulting in the learned phenotypes being mixtures of disease states appearing at different times. We argue that by separating the clinical events happening at different times in the input tensor, the temporal dynamics and the disease progression within the observation window could be modeled and the learned phenotypes will correspond to more specific disease states. Yet how to construct the tensor for data samples with different temporal lengths and properly capture the temporal relationship specific to each individual data sample remains an open challenge. In this paper, we propose a novel Collective Non-negative Tensor Factorization (CNTF) model where each patient is represented by a temporal tensor, and all of the temporal tensors are factorized collectively with the phenotype definitions being shared across all patients. The proposed CNTF model is also flexible enough to incorporate non-temporal data modalities and RNN-based temporal regularization. We validate the proposed model using the MIMIC-III dataset, and the empirical results show that the learned phenotypes are clinically interpretable. Moreover, the proposed CNTF model outperforms the state-of-the-art computational phenotyping models for the mortality prediction task.
IJCAI-19
Medical Concept Embedding with Multiple Ontological Representations
L. Song, C. W. Cheong, K. Yin, W. K. Cheung, B. C. M. Fung, and J. Poon
In Proceedings of the 28th International Joint Conference on Artificial Intelligence, 2019
Learning representations of medical concepts from the Electronic Health Records (EHR) has been shown effective for predictive analytics in healthcare. Incorporation of medical ontologies has also been explored to further enhance the accuracy and to ensure better alignment with the known medical knowledge. Most of the existing work assumes that medical concepts under the same ontological category should share similar representations, which however does not always hold. In particular, the categorizations in medical ontologies were established with various factors being considered. Medical concepts even under the same ontological category may not follow similar occurrence patterns in the EHR data, leading to contradicting objectives for the representation learning. In this paper, we propose a deep learning model called MMORE which alleviates this conflicting objective issue by allowing multiple representations to be inferred for each ontological category via an attention mechanism. We apply MMORE to diagnosis prediction and our experimental results show that the representations obtained by MMORE can achieve better predictive accuracy and result in clinically meaningful sub-categorization of the existing ontological categories.
ICHI-19
Context-aware imputation for clinical time series
K. Yin and W. K. Cheung
In 2019 IEEE International Conference on Healthcare Informatics (ICHI), 2019
Non-negative tensor factorization has been shown effective for discovering phenotypes from the EHR data with minimal human supervision. In most cases, an interaction tensor of the elements in the EHR (e.g., diagnoses and medications) has to be first established before the factorization can be applied. Such correspondence information however is often missing. While different heuristics can be used to estimate the missing correspondence, any errors introduced will in turn cause inaccuracy for the subsequent phenotype discovery task. This is especially true for patients with multiple diseases diagnosed (e.g., under critical care). To alleviate this limitation, we propose the hidden interaction tensor factorization (HITF) where the diagnosis-medication correspondence and the underlying phenotypes are inferred simultaneously. We formulate it under a Poisson non-negative tensor factorization framework and learn the HITF model via maximum likelihood estimation. For performance evaluation, we applied HITF to the MIMIC III dataset. Our empirical results show that both the phenotypes and the correspondence inferred are clinically meaningful. In addition, the inferred HITF model outperforms a number of state-of-the-art methods for mortality prediction.
JAAS
Identifying laser-induced plasma emission spectra of particles in a gas–solid flow based on the standard deviation of intensity across an emission line
S. Yao, L. Zhang, K. Yin, K. Bai, J. Xu, Z. Lu, and J. Lu
Journal of Analytical Atomic Spectrometry, 2018
This is an extension of my undergraduate final-semester project.