[Oxford] Privacy-preserving construction of generalized linear mixed model for biomedical computation


The generalized linear mixed model (GLMM) is an extension of the generalized linear model (GLM) in which the linear predictor includes random effects in addition to fixed effects. Because it can accurately model the effects of random variation from multiple sources, the method is widely used in genome-wide association studies (GWASs), which aim to identify genetic variants significantly associated with human diseases. A major obstacle to collaborative genetic studies of large patient cohorts spread across many institutions is the widespread reluctance to share sensitive personal data. To address these concerns, this article presents a privacy-preserving Expectation–Maximization (EM) algorithm that jointly builds a GLMM when the input data are distributed among multiple participants and cannot be transferred to a central server. The data are assumed to be horizontally partitioned across the participating organizations: each party holds its own subset of the records, and all records are governed by the same set of fixed and random effects.
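Horizontal partitioning works well for a GLMM because, given the fixed effects β and the random effects u, the log-likelihood is a sum over records. Each party can therefore evaluate its own share locally. The sketch below illustrates this with a logistic GLMM that has a random intercept per group. It is a minimal illustration under assumed settings, not the article's implementation. The model choice (logistic link, random intercepts with standard deviation 0.5), the three-way split, and the function names `simulate_logistic_glmm` and `conditional_loglik` are all hypothetical.

```python
import numpy as np

def simulate_logistic_glmm(n, p, n_groups, rng):
    """Simulate a logistic GLMM with one random intercept per group (hypothetical setup)."""
    X = rng.normal(size=(n, p))                 # fixed-effect covariates
    beta = rng.normal(size=p)                   # fixed effects
    groups = rng.integers(0, n_groups, size=n)  # group membership of each record
    u = rng.normal(scale=0.5, size=n_groups)    # random intercepts u_k ~ N(0, 0.5^2)
    eta = X @ beta + u[groups]                  # linear predictor: fixed part + random part
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    return X, y, groups, beta, u

def conditional_loglik(X, y, groups, beta, u):
    """Bernoulli log-likelihood of y given fixed effects beta and random effects u."""
    eta = X @ beta + u[groups]
    # log p(y | eta) = y * eta - log(1 + exp(eta)), computed stably via logaddexp
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

rng = np.random.default_rng(0)
X, y, groups, beta, u = simulate_logistic_glmm(n=600, p=3, n_groups=10, rng=rng)

# Horizontal partition: three parties each hold a disjoint subset of the records (rows).
parts = np.array_split(np.arange(len(y)), 3)
local = [conditional_loglik(X[idx], y[idx], groups[idx], beta, u) for idx in parts]

full = conditional_loglik(X, y, groups, beta, u)
print(np.isclose(sum(local), full))  # the parties' partial values add up to the pooled value
```

All parties share the same β and u, which matches the assumption that every record is governed by the same fixed and random effects. A single group may also have records at several parties.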

The EM algorithm runs as a distributed computation in which the input data are divided among several collaborating institutions. Each participant keeps its own data on its own server, and only intermediate results (such as partial proposal probabilities and the partial Hessian matrix) need to be transmitted to the other participants. A central server is still required, however, and data must be exchanged between it and each participant's server. Even for a large GLMM that requires many thousands of rounds of communication, the total data transferred between servers remains modest (a few hundred megabytes). For this reason, we believe the privacy-preserving method is well suited to collaborative GLMM construction. We acknowledge, though, that the algorithm runs for many iterations and therefore involves a large volume of data transmission, which may limit its use. Approximations to the Metropolis-Hastings (MH) algorithm will be explored to determine whether the number of communication rounds can be reduced without compromising accuracy.
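The following sketch shows why exchanging partial Hessians is enough to update the fixed effects. Each party computes the gradient and Hessian of its own records' log-likelihood, and a central server sums them and takes a Newton step. It is a simplified illustration, not the article's protocol, and it rests on three assumptions. First, the random effects are held at given values, standing in for what the MH-based E-step would supply. Second, the model is logistic. Third, no cryptographic protection is applied to the exchanged summaries. The helper names `local_summaries` and `server_newton_step` are hypothetical.

```python
import numpy as np

def local_summaries(X, y, offset, beta):
    """Run at each party: partial gradient and Hessian of the Bernoulli log-likelihood
    w.r.t. beta, with the (assumed given) random effects entering as a fixed offset."""
    eta = X @ beta + offset
    mu = 1.0 / (1.0 + np.exp(-eta))      # fitted probabilities
    grad = X.T @ (y - mu)                 # partial gradient, a p-vector
    w = mu * (1.0 - mu)                   # Bernoulli variance weights
    hess = -(X * w[:, None]).T @ X        # partial Hessian, a p-by-p matrix
    return grad, hess

def server_newton_step(beta, summaries):
    """Run at the central server: add up the partial summaries, take one Newton step."""
    grad = sum(g for g, _ in summaries)
    hess = sum(h for _, h in summaries)
    return beta - np.linalg.solve(hess, grad)

rng = np.random.default_rng(1)
n, p, n_groups = 900, 3, 15
X = rng.normal(size=(n, p))
groups = rng.integers(0, n_groups, size=n)
u = rng.normal(scale=0.5, size=n_groups)   # current random-effect values (assumption)
offset = u[groups]
true_beta = np.array([0.8, -0.5, 0.3])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta + offset))))

parties = np.array_split(np.arange(n), 3)  # horizontal partition of the records

beta_dist = np.zeros(p)
beta_pool = np.zeros(p)
for _ in range(25):
    # Each party sends only p + p*p numbers per round, whatever its record count.
    summaries = [local_summaries(X[i], y[i], offset[i], beta_dist) for i in parties]
    beta_dist = server_newton_step(beta_dist, summaries)
    # Pooled fit on centralized data, used here only as a reference.
    beta_pool = server_newton_step(beta_pool, [local_summaries(X, y, offset, beta_pool)])

print(np.allclose(beta_dist, beta_pool))  # distributed and pooled fits agree
```

In this sketch, each round's message size depends only on the number of fixed effects and not on the number of patients. This is consistent with total traffic staying modest even over many rounds. The number of rounds, however, still grows with the number of iterations.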

The method for building a generalized linear mixed model (cGLMM) on horizontally partitioned data assumes that each participating institution holds the complete records of its own subjects. Note, however, that under certain conditions the data may instead be partitioned vertically. A typical scenario is two medical institutions that hold clinical data on an overlapping subset of the same patients and want to combine their partial data for a collaborative analysis.
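The difference between the two partitioning schemes can be illustrated on a small feature matrix. Horizontal partitioning splits the rows (patients), so each site's records are self-contained. Vertical partitioning splits the columns (variables) for the same patients, so each patient's linear predictor needs contributions from both sites. This is a minimal sketch. The matrix, coefficients and two-site split are hypothetical, and it assumes the shared patients' records are already aligned row by row.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 4))             # 5 shared patients, 4 variables (hypothetical)
beta = np.array([0.4, -1.0, 0.7, 0.2])  # hypothetical fixed effects

# Horizontal partition: split rows (patients); each site holds all 4 variables.
rows_a, rows_b = X[:3], X[3:]

# Vertical partition: split columns (variables) for the same, aligned patients.
cols_a, cols_b = X[:, :2], X[:, 2:]
eta_a = cols_a @ beta[:2]               # partial linear predictor computed at site A
eta_b = cols_b @ beta[2:]               # partial linear predictor computed at site B
eta = eta_a + eta_b                     # every patient's predictor combines both sites

print(np.allclose(eta, X @ beta))       # matches the predictor on the joined data
```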


Categories: Clinical