Autoimmune Disease Phenotype

Autoimmune diseases (AID) are destructive conditions involving aberrant, chronic activation of the adaptive immune system, in which immune cells mistakenly attack the body's own healthy cells instead of producing antibodies against foreign invaders. Although autoimmune diseases are heterogeneous in symptoms, lesion types, and prognosis, and are usually studied individually or in groups based on organ system, many of them share similar immune effector mechanisms. Recent genetic studies suggest that many autoimmune and chronic autoinflammatory conditions share common pathogenic mechanisms, which provides a rationale for combining different AID subtypes into a single comprehensive electronic phenotype. An algorithm that defines the spectrum of autoimmune and autoinflammatory conditions from electronic health records is therefore well suited to studies of pleiotropy and shared genetic susceptibility.

The phenotype developed for this study covers a group of autoimmune diseases within a single algorithm and is built by combining a number of knowledge sources. Despite the heterogeneity of the diseases, this phenotype demonstrates that a commonality can be derived from the individual diseases, allowing them to be classified under an “umbrella” phenotype.



(July 27, 2017)

Phenotype ID: 
Do Not List on the Collaboration Phenotypes List
Type of Phenotype: 
Maurine Tong, Krzysztof Kiryluk, Lynn Petukhova, Runsheng (Bridget) Wang, Benjamin Lebwohl, Ali G. Gharavi, Chunhua Weng, George M. Hripcsak
Contact Author: 
Date Created: 
Wednesday, July 26, 2017
Network Associations: 
Owner Phenotyping Groups: 
Data Model: 

Suggested Citation

Maurine Tong, Krzysztof Kiryluk, Lynn Petukhova, Runsheng (Bridget) Wang, Benjamin Lebwohl, Ali G. Gharavi, Chunhua Weng, George M. Hripcsak. Columbia University. Autoimmune Disease Phenotype. PheKB; 2017. Available from:



We cannot find a document with your pseudocode, e.g., a description of the steps the programmer should take to define potential cases and potential controls for chart review. Some items outlined in the figure would be clearer with a more detailed set of written steps. For example, when you say "at least 3 diagnosis codes", does this apply to each line of the AIDalgorithm_V1_coding cases.cvs file (column C)? Or should we consider a disease group as in column A, so that the rule is at least 3 instances of any code for any condition in that category? Thanks, Beth

The same diagnosis code does not have to be repeated; any code for that disease counts. The patient must have at least one such code on each of at least three different days, and the first and last of those days must be at least 7 days apart.
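The rule in this answer can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name meets_code_rule is hypothetical, and "separated by at least 7 days" is read here as the last coded date minus the first coded date being 7 days or more.

```python
from datetime import date, timedelta


def meets_code_rule(code_dates, min_days=3, min_span_days=7):
    """Check the diagnosis-code rule for one disease.

    code_dates: dates on which the patient had any code mapped to the
    disease; the codes do not have to be identical.
    """
    # Several codes on the same day count as one coded day.
    days = sorted(set(code_dates))
    if len(days) < min_days:
        return False
    # The first and last coded days must be at least min_span_days apart.
    return days[-1] - days[0] >= timedelta(days=min_span_days)


# Example: four codes on three distinct days, 7 days from first to last.
visits = [date(2015, 1, 5), date(2015, 1, 5), date(2015, 1, 9), date(2015, 1, 12)]
print(meets_code_rule(visits))  # True
```

Collapsing codes to distinct days first is what makes two codes recorded at the same visit count only once toward the three-day minimum.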

Can you add short names in your data dictionary for the following serology lab tests:

Serology: Endomysium IgG + IgA

Serology: GAD (Glutamate decarboxylase)

Serology: Gliadin IgA + IgG

Serology: ICA + IA-2 + ICA 512 (Islet cell)

Serology: Tissue Transglutaminase IgA


The lab codes have been updated in AIDalgorithm_coding_V3.txt, and the labs above were removed because they are not part of the algorithm.

Thank you. 

Is it possible to provide this in a columnar format, e.g., as an .xls file? It's challenging to work with in .txt format. Thanks.

The algorithm code file has been updated to AIDalgorithm_coding_V3.txt. To generate an .xls file, create a new .xls file, format its cells with the "Text" category, and then copy the contents of AIDalgorithm_coding_V3.txt directly into it.

Thank you. 

This file is listed in the algorithm document but is not one of the files included in the phenotype. There may be other files named in the algorithm document that are not included. Thanks.

In the algorithm document I find 25 distinct lab tests called out in Table 3 and Table 4B1. This does not seem to correspond exactly to the lab codes in the codes file (codecategory=lab.serology, lab.complement, lab.immunoglobulin).

Can you reconcile the lists of lab tests in the two sources?


We have just updated the algorithm coding file to AIDalgorithm_coding_V3.txt; the codes it provides should now map to the main documentation tables (Table 3 and Table 4B1). Thank you.

The change in AutoimmuneDiseaseAlgorithm_V4 is the coding file name, which was updated from AIDalgorithm_coding_v2.txt to AIDalgorithm_coding_v3.txt.
Relative to AIDalgorithm_coding_v2.txt, the changes concern serology tests: (1) serology names were made consistent with the Lab table in Autoimmunediseasealgorithm_v3/v4, (2) serology tests that are not in the Lab table were removed, and (3) serology LOINC codes were updated.

Thank you.

Sorry, I omitted the 'dx.autoinflammatoryAutoimmune' codes from AIDalgorithm_coding_V3.txt; they are available in AIDalgorithm_coding_v2.txt. AIDalgorithm_coding_V3b.txt contains all diagnosis codes and the correct lab codes.

Page 2 of the algorithm (AutoimmuneDiseaseAlgorithm_V4.pdf) states that controls must not have any autoimmune or autoinflammatory diagnosis codes. In the AIDalgorithm_coding_V3b.txt file included with the algorithm, there are 26794 unique ICD9/ICD10 codes under the code category dx.autoinflammatoryAutoimmune. If I exclude potential control subjects who have any of these 26794 diagnoses, I end up with only a few controls. Am I missing something here?

Submitted by Xinnan Niu on

I parsed the file "AIDalgorithm_coding_V3b.txt" and found 27326 unique ICD9/ICD10 codes used to define AI cases. However, AutoimmuneDiseaseAlgorithm_V4.pdf states that 1528 diagnosis codes were used to determine case status. In addition, Jim Linneman reported 26794 unique ICD9/ICD10 codes. So I need confirmation of the number of ICD9/ICD10 codes. If possible, please provide a list of the ICD9 and ICD10 codes.

Thanks!

Thank you very much for catching the inconsistencies.

The information in "AutoimmuneDiseaseAlgorithm_V4.pdf" is outdated and will be removed. Please use "AIDalgorithm_coding_V3b.txt" as the correct coding source. You are correct: there are 27326 distinct ICD9CM/ICD10CM codes used in AID (for both the case and control definitions).

Specifically, there are 1443 ICD9CM/ICD10CM/SNOMED codes that define AID subtypes and type 2 diabetes (used for cases), 39497 ICD9CM/ICD10CM/SNOMED codes that define autoinflammatory disease (used for AID controls), and 177 LOINC codes for labs. In total, there are 40940 distinct ICD9CM/ICD10CM/SNOMED diagnosis codes (including 27326 distinct ICD9CM/ICD10CM codes).
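Counts like these depend on whether a code is counted per category or across the whole file, and on whether the vocabulary is part of the key (the same string can appear in more than one vocabulary). A minimal sketch for reproducing such counts, assuming a pipe-delimited file with a header row and columns named codecategory, vocabulary and code; only codecategory is attested in this thread, the other two column names, the delimiter, and the toy rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def distinct_codes_by_category(text, delimiter="|"):
    """Count distinct (vocabulary, code) pairs per code category.

    Assumes a header row with columns codecategory, vocabulary and code
    (the last two names are hypothetical).
    """
    per_category = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        # Codes stay strings, so values such as "579.0" are not altered.
        key = (row["vocabulary"].strip(), row["code"].strip())
        per_category[row["codecategory"].strip()].add(key)
    counts = {cat: len(keys) for cat, keys in per_category.items()}
    # Distinct codes across all categories: a code listed under two
    # categories is counted once here.
    counts["ALL"] = len(set().union(*per_category.values()))
    return counts


# Toy rows only; categories and codes are illustrative.
sample = (
    "codecategory|vocabulary|code\n"
    "dx.exampleA|ICD9CM|579.0\n"
    "dx.exampleA|ICD10CM|K90.0\n"
    "dx.exampleA|ICD9CM|579.0\n"
    "dx.exampleB|ICD10CM|K90.0\n"
)
print(distinct_codes_by_category(sample))
# {'dx.exampleA': 2, 'dx.exampleB': 1, 'ALL': 2}
```

Because the per-category sets can overlap, the per-category counts only add up to the overall total when no code appears in more than one category.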


Submitted by Xinnan Niu on

Could you please let us know when you will upload the new "AutoimmuneDiseaseAlgorithm" V5? Thanks!

Hello Xinnan, 

There will be no V5 documentation for now, since the only change is the removal of the part stating the number of codes. I am documenting all of the minor changes together with the feedback from the group.

AutoimmuneDiseaseAlgorithm_V4.pdf: please remove "A total of 1528 diagnosis codes were used to determine case status, and 49960 diagnosis codes and measurement codes were used for controls."

AutoimmuneDiseaseAlgorithm_V4.pdf Table 3:

Jo 1 Autoantibody (J01): "J01" should be "Jo1". Scleroderma Autoantibody (Scl 70): "Scl 70" should be "Scl70".

Thanks to Jane Grafton for pointing this out.


Most of the lab results for potential controls do not indicate whether they are positive or negative. Are the units relevant, or are they common across all institutions?

For example:

"Anti-Phospholipid Profile"|"CARDIOLIPIN IGM ANTIBODY"|"9.8"|"MPL"|"<=9.2"

9.8 is the 'result value'

<=9.2 is the 'reference value'

-Does that mean that this particular lab is positive?


-Therefore, should patients be excluded from controls if they have just ONE positive lab?

-Should all listed serology labs be included for every case in the DD?

Yes, 9.8 means positive in this lab. 

Patients should be excluded if they have any positive lab.

Yes, please include all serology tests for both cases and controls. Thank you.
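The positivity check and the "any positive lab" exclusion described in these answers can be sketched as follows. This assumes numeric results and reference values shaped like the "<=9.2" example above or a simple "low-high" range; the function names are hypothetical, other reference formats and categorical results are not interpreted, and units are not compared.

```python
import re

# Reference strings such as "<=9.2" or "0-20"; other formats are not handled.
_BOUND = re.compile(r"^\s*(<=|>=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$")
_RANGE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*-\s*(-?\d+(?:\.\d+)?)\s*$")


def is_positive(result, reference):
    """True if a numeric result lies outside its reference interval,
    False if inside, None if either value cannot be parsed."""
    try:
        value = float(result)
    except ValueError:
        return None  # e.g. categorical results such as "Positive"
    bound = _BOUND.match(reference)
    if bound:
        op, limit = bound.group(1), float(bound.group(2))
        inside = {"<=": value <= limit, "<": value < limit,
                  ">=": value >= limit, ">": value > limit}[op]
        return not inside
    span = _RANGE.match(reference)
    if span:
        low, high = float(span.group(1)), float(span.group(2))
        return not (low <= value <= high)
    return None


def excluded_from_controls(labs):
    """Exclude a potential control if ANY (result, reference) lab is positive.

    Unparseable results (None) are not treated as positive here; they may
    need manual review instead.
    """
    return any(is_positive(r, ref) is True for r, ref in labs)


print(is_positive("9.8", "<=9.2"))  # True, matching the CARDIOLIPIN IGM example
```

Returning None rather than False for unparseable values keeps "could not tell" separate from "within range", so those labs can be routed to review instead of silently counting as negative.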

If there is an overlap of subphenotypes, what is the rule to decide which to list?

If there is no subphenotype and a case has only "Auto-inflammatory + Autoimmune (SNOMED)" codes, do we leave them as null in the subphenotype DD?

If there is overlap of subphenotypes, please label both. 

"Auto-inflammatory + Autoimmune (SNOMED)" codes are used only for controls, not for cases. So if a patient has no subphenotype, the patient is not a case.
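The labeling rules in these two answers (label every overlapping subphenotype; no subphenotype means not a case) can be sketched as below. It is a minimal illustration under stated assumptions: it reuses the three-day, 7-day-span code rule from earlier in the thread, covers only the diagnosis-code part of the case definition (lab criteria from the algorithm document are not modeled), and the subtype names and helper names are hypothetical.

```python
from datetime import date, timedelta


def meets_rule(days, min_days=3, min_span_days=7):
    # Per-disease code rule: >= 3 distinct coded days, first-to-last >= 7 days.
    days = sorted(set(days))
    return len(days) >= min_days and days[-1] - days[0] >= timedelta(days=min_span_days)


def label_subphenotypes(code_days_by_subtype):
    """Return every AID subtype whose code rule the patient meets.

    code_days_by_subtype maps a subtype name to the dates of the patient's
    codes for that subtype. Autoinflammatory SNOMED codes are deliberately
    left out, because they are used only for control selection.
    """
    return sorted(s for s, days in code_days_by_subtype.items() if meets_rule(days))


patient = {
    "subtypeA": [date(2016, 3, 1), date(2016, 3, 8), date(2016, 4, 2)],
    "subtypeB": [date(2016, 5, 1), date(2016, 5, 20), date(2016, 6, 1)],
    "subtypeC": [date(2016, 7, 1)],
}
labels = label_subphenotypes(patient)
is_case = bool(labels)  # no subphenotype means not a case
print(labels, is_case)  # ['subtypeA', 'subtypeB'] True
```

Returning a list rather than a single label is what lets overlapping subphenotypes all be recorded.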

The only ANCA labs in our cohort appear to be nonspecific (not C or P). Should we include these?

"Cytoplasmic Neutrophil Ab w/reflex to Titer/MPOPR3"
"Cytoplasmic Neutrophil Ab"
"ANCA  (Cytoplasmic Neutrophil Ab w/reflex to Titer/MPOPR3)"

It looks to us that the file AIDalgorithm_coding_V3b.txt has either ICD9 or ICD10 codes, but not both, for diseases Goodpasture, Dermatomyositis, Guillain-Barre, and Reiter's. Could you please check to see if that is correct or if we are in some way misreading the file? Thanks.

The file is correct, and you are not misreading it. Each of the four diseases above has either ICD9 or ICD10 codes, but not both. If you find ICD codes related to these diseases that are not in the file, please let us know. Thank you.

Could you share an OMOP SQL version of the algorithm? Thanks.

Hello Ken, because the autoimmune phenotype uses many categorical labs that are not populated in our current OMOP data, we did not implement an OMOP version.


Some of the ICD9 codes are just the 3-digit base code, for example 725 for arthritis polymyalgia. Are these valid codes, or do we need to do a wildcard search?
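The two readings in this question differ in how a listed code is compared with a patient's code. A minimal sketch of both, assuming ICD-9-CM diagnosis codes compared without the decimal point; the helper names are hypothetical, and this does not settle which reading the algorithm intends. Note that a prefix search on a 3-digit stem also pulls in every subcode beneath it, which can widen the code set.

```python
def normalize(code):
    # Compare ICD-9-CM diagnosis codes without the decimal point.
    return code.replace(".", "").strip().upper()


def code_matches(patient_code, listed_code, prefix=False):
    """Exact match by default; prefix=True treats the listed code as a wildcard stem."""
    p_code, l_code = normalize(patient_code), normalize(listed_code)
    return p_code.startswith(l_code) if prefix else p_code == l_code


print(code_matches("725", "725"))                 # True under either reading
print(code_matches("714.0", "714"))               # False: exact match only
print(code_matches("714.0", "714", prefix=True))  # True: "714*" wildcard
```

Whether a given 3-digit code is complete on its own can be checked against the ICD-9-CM code set before choosing between the two.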