Welcome! The following statistics provide some visual insights into the LEOSS Public Data Set. The Public Data Set consists of patient data from the LEOSS cohort after a data cleaning process and includes data from patients documented until December 17, 2020. The LEOSS Public Data Set originates from the LEOSS Initiative. The data preprocessing pipeline is described by Jakob et al. in "Design and evaluation of a data anonymization pipeline to promote Open Science on COVID-19".
Copyright: This work is licensed under the Creative Commons Attribution Non-Commercial 4.0 License. With the use of this data you agree to include a proper acknowledgement of the LEOSS study group in any work based on the data set. By working with this notebook you agree to maintain the confidentiality of the data set at all times and to not attempt to compromise or otherwise violate the privacy of the patients described. To view a copy of the license, visit https://creativecommons.org/licenses/by-nc/4.0/.
Acknowledgements: These analyses are based on voluntary work by PROCON IT, which we are truly grateful for.
If you have any comments on the notebook, please drop us a message at analysis@leoss.net.
Here we provide information on the basic structure of the LEOSS Public Data Set.
The data set consists of more than 7000 patients and 16 variables, although this analysis is restricted to the first 4802 patients. Each row represents the anonymized data of a single patient.
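As a minimal sketch of this row-per-patient layout, the snippet below reads a tiny stand-in for the data file and keeps only the first rows, which mirrors the restriction to the first 4802 patients. The CSV content, the `Age` column, and the helper name `first_patients` are assumptions for illustration; only the variable `Sex` is taken from this notebook.

```python
import csv
import io
from itertools import islice

# Toy stand-in for the data file; every column except "Sex" is an assumption.
raw = io.StringIO(
    "Sex,Age\n"
    "Male,46-65\n"
    "Female,26-45\n"
    "Female,66-85\n"
)

def first_patients(handle, limit):
    """Return the first `limit` patient rows as dictionaries (one row = one patient)."""
    return list(islice(csv.DictReader(handle), limit))

patients = first_patients(raw, limit=2)  # the notebook's limit would be 4802
n_patients = len(patients)
n_variables = len(patients[0])
print(f"{n_patients} patients, {n_variables} variables")
```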
The columns are described by the variables:
*The Clinical Phases are defined according to the LEOSS criteria on https://leoss.net/statistics/:
Uncomplicated Phase:
OR
Complicated Phase:
Critical Phase:
Recovery Phase:
AND
AND
To get to know the Public Data Set better, the values of each variable as they occur in the data set used here are shown below. Please be aware that the Public Data Set is only a part of the complete LEOSS data set. Anonymization may lead to variables having fewer distinct values than in the complete LEOSS data set. For example, the variable 'Sex' can also take the value 'Diverse', but there is no patient with this value in the Public Data Set.
n/a: If a patient was never in the phase that a variable refers to, the variable has been given the value 'Not applicable (N/a)'. For example, if a patient has never been in the Critical Phase, the variable 'Vasopressors.in.critical.phase' is not applicable to this patient.
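When tallying a phase-specific variable, it can help to count the 'Not applicable (N/a)' entries separately from the real answers. The following is a sketch under assumptions: the example values `Yes` and `No` are hypothetical, and the exact spelling of the not-applicable label in the file may differ from the one quoted here.

```python
from collections import Counter

# Label as quoted in the text above; the exact spelling in the file is an assumption.
NOT_APPLICABLE = "Not applicable (N/a)"

# Hypothetical values of 'Vasopressors.in.critical.phase' for five patients.
values = ["Yes", NOT_APPLICABLE, "No", NOT_APPLICABLE, "No"]

def tally_applicable(column):
    """Count the applicable values and report how many entries were not applicable."""
    applicable = [v for v in column if v != NOT_APPLICABLE]
    return Counter(applicable), len(column) - len(applicable)

counts, n_not_applicable = tally_applicable(values)
print(dict(counts), "not applicable:", n_not_applicable)
```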
These are the first 50 patients in the Public Data Set:
The following descriptive statistics are computed in this section:
The total number of patients is 4802.
The following descriptive statistics on the health status at the end of medical consultation are computed in this section:
Note that we will use a filtered data set for computing the rates, which we describe below.
The total number of patients is 4802.
For the COVID-19 mortality and recovery rate computations, we exclude patients with a documented health status at the end of medical consultation of 'unknown/missing', 'not recovered', and 'dead from other causes'. Please note that this influences the following computations and plots.
The number of patients in the filtered data set is 4285.
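The filtering step and the rate computation can be sketched as follows. This is an illustration, not the notebook's own code: the three excluded labels come from the text, while the remaining labels `recovered` and `dead from COVID-19` and the toy status list are assumptions.

```python
# Status labels excluded from the rate computations, as listed in the text.
EXCLUDED = {"unknown/missing", "not recovered", "dead from other causes"}

# Hypothetical health statuses at the end of medical consultation.
statuses = [
    "recovered",
    "recovered",
    "dead from COVID-19",
    "unknown/missing",
    "not recovered",
    "recovered",
    "dead from other causes",
]

def filter_statuses(all_statuses):
    """Drop every status that is excluded from the rate computations."""
    return [s for s in all_statuses if s not in EXCLUDED]

def rate(filtered, label):
    """Share of patients in the filtered data set with the given status."""
    return filtered.count(label) / len(filtered)

filtered = filter_statuses(statuses)
mortality_rate = rate(filtered, "dead from COVID-19")
recovery_rate = rate(filtered, "recovered")
print(len(filtered), mortality_rate, recovery_rate)
```

Because the denominator is the filtered data set, both rates are larger than they would be if the excluded patients were kept in the total.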
The number of patients in the filtered data set is 4285. Patients with a documented health status at the end of medical consultation of 'unknown/missing', 'not recovered', and 'dead from other causes' are excluded in the filtered data set.
From here on we will indicate the four clinical phases as
In the following we will plot the:
The Baseline/diagnosis is defined as the day on which the sample for the first positive SARS-CoV-2 test result was taken.
The disease courses are denoted as compositions of the above phase abbreviations. For example, 'UC_RC' denotes a patient who was in the Uncomplicated Phase at Baseline and then entered the Recovery Phase without severe disease progression (Complicated or Critical Phase).
Since there might be patients who have no phase documented at all, we need to proceed with a filtered data set in which those patients are dropped.
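A disease course value can be split on the underscore to recover the sequence of phases. The sketch below is an assumption about how one might decode such values; it maps only the abbreviations that appear in this text (UC, CO, RC) and passes any other code through unchanged. The value 'UC_CO_RC' is a hypothetical composition used for illustration.

```python
# Abbreviations that appear in this notebook's text; other codes are left as-is.
PHASE_NAMES = {"UC": "Uncomplicated", "CO": "Complicated", "RC": "Recovery"}

def parse_course(course):
    """Split a disease course such as 'UC_RC' into its ordered phase names."""
    return [PHASE_NAMES.get(code, code) for code in course.split("_")]

def baseline_code(course):
    """The first abbreviation is the phase at Baseline."""
    return course.split("_")[0]

example = parse_course("UC_RC")
longer = parse_course("UC_CO_RC")  # hypothetical course with a Complicated Phase
print(example, longer, baseline_code("UC_RC"))
```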
The number of patients in this filtered data set is 4797.
Please note that these numbers add up to more than the total number of patients as each patient can be in different phases during the course of disease.
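To see why the per-phase counts exceed the number of patients, consider a small sketch that counts each patient once for every phase in their disease course. The three course values are hypothetical and follow the abbreviation scheme described above.

```python
from collections import Counter

# Hypothetical disease courses, one per patient.
courses = ["UC_RC", "UC_CO_RC", "CO_RC"]

# Count each patient once per distinct phase in their course.
phase_counts = Counter()
for course in courses:
    phase_counts.update(set(course.split("_")))

total_phase_entries = sum(phase_counts.values())
print(dict(phase_counts), total_phase_entries, len(courses))
```

Here three patients produce seven phase entries, because every patient passes through at least two phases.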
Patients in the Uncomplicated Phase at Baseline are indicated by a value name starting with UC_.
Of the 4797 patients, 4102 were in the Uncomplicated Phase at Baseline. Among male patients, 2357 of 2786 were in the Uncomplicated Phase at Baseline; among female patients, 1745 of 2011.
Patients in the Complicated Phase at Baseline are indicated by a value name starting with CO_.
Of the 4797 patients, 536 were in the Complicated Phase at Baseline. Among male patients, 326 of 2786 were in the Complicated Phase at Baseline; among female patients, 210 of 2011.
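The baseline counts reported for the Uncomplicated and Complicated Phases can be turned into percentages with a few lines of arithmetic. The numbers below are copied from the text; the dictionary layout and the helper `share` are only an illustrative choice.

```python
# (count, total) pairs as reported in the text for each phase at Baseline.
baseline_counts = {
    "Uncomplicated": {"all": (4102, 4797), "male": (2357, 2786), "female": (1745, 2011)},
    "Complicated": {"all": (536, 4797), "male": (326, 2786), "female": (210, 2011)},
}

def share(count, total):
    """Percentage of `total` represented by `count`."""
    return 100.0 * count / total

for phase, groups in baseline_counts.items():
    for group, (count, total) in groups.items():
        print(f"{phase:13s} {group:6s} {count:4d}/{total:4d} = {share(count, total):.1f}%")
```

The male and female counts sum to the overall count in each phase, which is a quick consistency check on the reported figures.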
In LEOSS, superinfections are recorded as 'Proven bacterial infection', 'Probable or suspected bacterial infection', 'Proven fungal infection' or 'Probable or suspected fungal infection'.
The number of patients with any superinfection that is at least probable or suspected is 1718 of the 4797 patients.
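Counting patients with any superinfection amounts to checking each recorded value against the four categories listed above. The sketch below assumes one recorded value per patient and uses a hypothetical label `None` for patients without a superinfection; the actual data layout may differ.

```python
# The four superinfection categories as listed in the text.
SUPERINFECTION_LABELS = {
    "Proven bacterial infection",
    "Probable or suspected bacterial infection",
    "Proven fungal infection",
    "Probable or suspected fungal infection",
}

# Hypothetical recorded values, one per patient; "None" is an assumed label.
recorded = [
    "Proven bacterial infection",
    "None",
    "Probable or suspected fungal infection",
    "None",
]

def count_superinfections(values):
    """Number of patients whose recorded value is one of the four categories."""
    return sum(1 for value in values if value in SUPERINFECTION_LABELS)

n_superinfected = count_superinfections(recorded)
print(f"{n_superinfected} of {len(recorded)} patients")
```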