Medical Specialty Classification Using Patient Queries
I have been having pain around my knee for more than a month. Which medical specialist should I see?
Recently, one of my friends came down with pain around his wrist. Because I work at a hospital, he asked me to pick a medical specialist for him to consult. Since I am a good friend, I suggested he visit the best hand surgeon in town.
Being a good friend, I followed up on his consult. To my surprise, he said he had to see a urologist because it could be a kidney stone.
WHAT??! Kidney stones. You must be kidding.
Apparently, that is a possibility. The wrist pain could be a result of gout, which in turn could be a result of kidney stones. For more info on joint pain, gout and kidney stones, see this link.
That is when the idea for this blog popped up.
In this blog we ask the question: can we build an AI chatbot that takes in a patient's query and outputs the appropriate specialty the patient should see?
Such an AI assistant could come in handy for healthcare providers to guide patients to the right specialty.
- It can be used by telephone operators (who might not have adequate medical knowledge to answer such questions) at hospitals to better serve patients.
- It can also be integrated with a hospital's telemedicine app to guide patients.
Now that we have defined our problem statement, let's spell out the plan to achieve this goal.
For this we will loosely follow this paper as our guide. The paper uses NLP to tackle a similar problem.
Briefly, the following is our plan:
- Building the dataset
  - extract patient queries from online medical forums
  - the queries are organised under different topics, such as diabetes and thyroid issues
  - manually map the topics to specialties
  - clean the queries
- Training with fastai's ULMFiT
  - build a language model using the dataset we have prepared
  - build a text classifier using the dataset and the encoder we built during the language model training
- Inference and understanding what we have built
This is the raw dataset after scraping the data from medical forums.
df_raw.head(5)
There are about 1368 unique topics. Let's take a look at some of them.
df_raw['topic'].unique()[random.choices(range(1368), k=50)]
As we can see, some of the topics are names of medicines. I decided to remove these during preprocessing.
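The post doesn't include a code cell for this step; below is a minimal, hypothetical sketch of how such topics could be filtered out. The medicine names listed are placeholders, not the actual ones removed, and the topic-to-specialty mapping described next would be applied on top of this.

# Hypothetical examples of topics that are medicine names; the real list
# was curated manually by eyeballing df_raw['topic'].unique().
medicine_topics = {'metformin', 'lisinopril', 'atorvastatin'}

# Keep only rows whose topic is a condition, not a medicine name.
df_processed = df_raw[~df_raw['topic'].str.lower().isin(medicine_topics)].copy()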
df_processed.head(5)
I mapped the topics to specialties to the best of my knowledge. Let's take a look at some of these mappings.
# renamed to avoid shadowing Python's built-in map
topic_map = df_processed[['topic', 'specialty']].drop_duplicates('topic').set_index('topic')
topic_map.head(5)
topic_map['specialty'].unique()
Apart from the mapping, some basic preprocessing was also done (sketched after the list):
- removing white spaces
- removing unnecessary words
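The cleaning code isn't shown in the post; here is a minimal sketch, assuming the queries live in the text column and that "unnecessary words" means common forum filler. The filler list is a hypothetical example.

import re

def clean_query(text):
    # Collapse runs of white space into single spaces and trim the ends.
    text = re.sub(r'\s+', ' ', text).strip()
    # Hypothetical examples of the "unnecessary words" that were removed.
    filler = {'hi', 'hello', 'thanks', 'regards'}
    return ' '.join(w for w in text.split() if w.lower() not in filler)

df_processed['text'] = df_processed['text'].apply(clean_query)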
Next, on to training the language model.
dls_lm = TextDataLoaders.from_df(df_processed,
is_lm=True,
valid_pct=0.1)
learn = language_model_learner(dls_lm,
AWD_LSTM,
metrics=[accuracy, Perplexity()],
wd=0.1).to_fp16()
We have defined the language model dataloader and learner as above.
Let's take a look at some of the vocab. As expected, fastai sets up additional tokens, such as xxunk, which is used when a word is not part of the vocab, and xxbos, which indicates the beginning of a sentence.
dls_lm.vocab[:30]
That is a sufficient number of words in the vocab.
The training for the language model is defined below.
learn.fit_one_cycle(1, 1e-2)
learn.unfreeze()
learn.fit_one_cycle(10, 1e-3)
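The classification section below loads a vocab from models/vocabv2.pkl and reuses the encoder, so both need to be saved at this point. The post doesn't show that step; a sketch, assuming the encoder filename 'finetuned' and fastcore's save_pickle helper (re-exported by fastai):

# Save the fine-tuned encoder so the classifier can reuse its weights.
# The filename 'finetuned' is an assumption; any name works as long as
# load_encoder later uses the same one.
learn.save_encoder('finetuned')

# Save the language model's vocab for the classification dataloader.
save_pickle('models/vocabv2.pkl', dls_lm.vocab)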
For classification, we first load the vocab that we built during the language model training.
vocab = load_pickle('models/vocabv2.pkl')
Then, define our classification dataloader.
dls_clas = TextDataLoaders.from_df(df_processed,
text_col='text',
label_col='specialty',
text_vocab=vocab,
valid_pct=0.1,
seq_len=100,
bs=64,
is_lm=False,
y_block=CategoryBlock())
Let's take a look at some samples from the dls.
dls_clas.show_batch()
Let's define our learner to do the job.
learn = text_classifier_learner(dls_clas,
AWD_LSTM,
drop_mult=0.6,
metrics=accuracy)
We also need to load the encoder from the language model training. But first, let's take a look at the model we are using.
learn.model
The model contains the encoder module and the classification module. Next, we load the encoder weights from the language model training.
learn.model[0]
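The load call itself isn't shown in the post; assuming the encoder was saved as 'finetuned' earlier (as in the save_encoder sketch above), loading it with fastai's load_encoder looks like this:

# Load the fine-tuned encoder weights from the language model training.
learn = learn.load_encoder('finetuned')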
Our training setup is as below. Following the ULMFiT recipe, we train the classifier head first, then gradually unfreeze more layer groups with discriminative learning rates: each slice spreads rates from lr/2.6**4 for the earliest layer group up to lr for the head.
learn.fit_one_cycle(3, 2e-2)
learn.freeze_to(-2)
learn.fit_one_cycle(1, slice(1e-2/(2.6**4),1e-2))
learn.freeze_to(-3)
learn.fit_one_cycle(1, slice(5e-3/(2.6**4),5e-3))
learn.unfreeze()
learn.fit_one_cycle(20, slice(1e-3/(2.6**4), 1e-3),
                    cbs=[GradientAccumulation(16),
                         SaveModelCallback(fname='classi_v2'),
                         EarlyStoppingCallback(comp=np.less, patience=3)])
Let's try to interpret what we have achieved so far.
interp = ClassificationInterpretation.from_learner(learn)
Let's take a look at the confusion matrix.
interp.plot_confusion_matrix(figsize=(15,15))
A weighted average F1-score of 0.87 looks decent. The metrics look fine for most classes, except perhaps O&G, oncologist, and plastic surgeon; our dataset might not contain enough examples for these specialties.
interp.print_classification_report()
Let's take a look at one of the queries that our model got wrong. Honestly, even I would have been confused by it. Perhaps this shows a limitation of the dataset preparation method we used.
interp.plot_top_losses(1)
Let's make some queries and see what our model says.
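get_prediction is a small helper that isn't defined in the post; a plausible reconstruction, wrapping fastai's learn.predict (which returns the decoded label, the class index, and the per-class probabilities):

def get_prediction(query, learn):
    # learn.predict returns (decoded label, class index, probabilities).
    pred_class, pred_idx, probs = learn.predict(query)
    confidence = probs[pred_idx].item() * 100
    print(f"Query '{query}' outputs {pred_class} with a confidence of {confidence:.2f}%")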
get_prediction('lump under my armpit', learn)
The query lump under my armpit outputs general surgeon with a confidence of 70.66%. That is a good start.
Below we have a few more examples. The model prefers descriptive queries, which is understandable as that is how it was trained. A bare seizure query gives a wrong prediction, while a descriptive query that includes seizure predicts correctly. The word feeling strongly predicts psychiatrist; a more descriptive query around feeling does help with better prediction.
get_prediction('seizure', learn)
get_prediction('i have had seizure 3 times in the past 2 days', learn)
get_prediction('i am feeling cold', learn)
get_prediction('i am feeling cold at night with slight cough and fever', learn)
get_prediction('knee pain with slight swelling', learn)
get_prediction('my ldl levels are high', learn)
We have come to the end of the blog. What is next?
- increase the size of the dataset
- use a transformer-based model