Cassava Sciences: Of Posters and Spaghetti Plots

In a previous blog post, I took a look at Western blots in papers from the lab of Dr. Hoau-Yan Wang at City University of New York (CUNY), mostly related to Cassava Sciences (Nasdaq: $SAVA), its predecessor Pain Therapeutics, Inc., and its flagship Alzheimer’s disease drug candidate Simufilam.

While those papers were mainly about the preclinical Simufilam data, here I will review a conference poster reporting on Phase 2 data obtained by Cassava Sciences.

The AAIC conference poster

The poster was presented at the Alzheimer’s Association International Conference (AAIC) held in Denver, CO, on July 26, 2021. In several press releases around that time, Cassava announced that Simufilam treatment reduced biomarkers [poster presentation] in Alzheimer’s patients while improving their cognition in Phase 2 trials [oral presentation held at the same conference].

Poster presentation at the Alzheimer’s Association International Conference, July 2021, by Cassava Sciences. Source: https://www.cassavasciences.com/static-files/0854aec6-59b3-4e2b-ac20-c32b7c307b08. Archived version: https://archive.is/NpL7Q

The AAIC poster is of interest because it represents data from the Phase 2 clinical trial. Some folks commented on social media that one should not criticize old preclinical data if clinical data proves that the candidate drug works (see e.g. discussion here). So let’s take a look at the published clinical data in this conference poster.

P-tau181 as an Alzheimer’s disease biomarker

The poster reports on 64 Alzheimer’s disease (AD) patients, who were randomized into three treatment groups. One group received a placebo, the second group got 50 mg of oral simufilam, and the third group got 100 mg of oral simufilam, for 28 days.

Plasma levels of P-tau181 were determined on day 1 (start of treatment) and day 28. High P-tau181 plasma levels are correlated with the development of AD dementia, and are considered a diagnostic and prognostic biomarker of AD.

Figure 5 of the Cassava poster shows the day 1 and day 28 plasma P-tau181 levels of the patients in each of the three groups, with a line connecting the two values. Although this is a rather busy figure, one can see that P-tau181 levels went up for some patients (suggestive of worsening AD), while they went down for other patients (suggestive of an improvement of AD). This figure was called a spaghetti plot, because it shows a lot of individual lines that all go in slightly different directions.

Figure 4 looks like it is showing the same data in a different way, one that is supposed to be easier to interpret. The Y-axis is labeled as “Percent CFB”, the change-from-baseline. I will assume here it is the percentage of change as explained in this Wikipedia article. Say a patient’s P-tau181 was 5 units on day 1 and 10 units on day 28: that is an increase of 5 points, so it would count as a 100% increase (+100%). Similarly, if a patient’s measurement was 5 on day 1 and 2.5 on day 28, it went down 2.5 points, which would count as a 50% decrease (-50%). If the value did not change at all, that would be plotted as 0% change.
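As a minimal sketch, assuming “Percent CFB” is the ordinary percent change relative to the day 1 value (the poster does not spell out the formula), the three examples above can be reproduced in a few lines of Python. The function name `percent_cfb` is hypothetical and used only for illustration.

```python
def percent_cfb(baseline: float, followup: float) -> float:
    """Percent change from baseline: 100 * (followup - baseline) / baseline."""
    if baseline == 0:
        # A percent change relative to zero is undefined.
        raise ValueError("percent change is undefined for a baseline of 0")
    return 100.0 * (followup - baseline) / baseline


# The three worked examples from the text (day 1 value, day 28 value).
examples = [(5.0, 10.0), (5.0, 2.5), (5.0, 5.0)]
for day1, day28 in examples:
    print(f"day 1 = {day1}, day 28 = {day28}: {percent_cfb(day1, day28):+.0f}%")
```

Note that the result is always relative to the starting value, so the same absolute change of 2.5 units would be a larger percentage for a patient who started at a lower level.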

Each colored dot is the percent change between day 1 and day 28 for an individual patient, shown for each of the three groups. The small horizontal lines in each group represent the mean increase/decrease for each of the three groups, while the dotted horizontal line highlights the 0% change value.

As you can see in Figure 4, the average plasma P-tau181 values in the patients in the placebo group increased a bit after 28 days (above the dotted line), suggesting their disease got a bit worse. In contrast, the two groups of patients who received Simufilam – either 50 mg or 100 mg – ended up with average P-tau181 values a bit under the dotted line, suggesting they improved a bit.

Concerns about Figures 4 and 5

Figures 4 and 5 are presumably two figures showing the same data points plotted in two alternative ways. However, there appear to be some unexpected differences between the data points in the two figures. These discrepancies between Figures 4 and 5 have been pointed out in the Labaton Sucharow (LS) Statement of Concern report filed to the FDA on August 18, 2021, and were further analyzed by Luosheng Peng on Twitter. I agree with those concerns, and will reword them here, and also look at them in more detail.

Figures 4 and 5 from the AAIC poster. Numbers in red/italics were added by me. Source: https://www.cassavasciences.com/static-files/0854aec6-59b3-4e2b-ac20-c32b7c307b08

Discrepancies between Figures 4 and 5

  1. The poster states that 64 patients were included in the study, and randomly assigned to three groups. There are no details given about how many patients were assigned to each group, but we would expect roughly 20 people per group. Yet, Figure 4 only shows 20, 15, and 17 data points for the placebo, 50 mg, and 100 mg Simufilam groups, respectively. This is a total of 52 patients. What happened to the 12 other patients? If fewer data points from the treatment groups are reported, does this mean that data points from the treatment groups have been left out? It would be nice if Cassava scientists could explain what happened with those 12 patients who appear to have dropped out of the study.
  2. While Figure 4 shows 20/15/17 (sum=52) data points for the three groups, Figure 5 appears to show 18/15/18 data points (sum=51). The numbers of the placebo and 100 mg treatment groups therefore do not match.
  3. This is a minor issue, but the color scheme confusingly changes from Figure 4 to Figure 5. While the 50 mg Simufilam remains blue, the placebo group switches from red to green and the 100 mg Simufilam group switches from green to red.
  4. The biggest concern is that some of the data points do not seem to match up between the two figures. Of particular note, in Figure 5, the P-tau181 value from a patient in the 100 mg treatment group went up from ~2.1 to ~5.2 pg/ml, which is an increase of roughly 150%. In other words, one of the patients in the 100 mg treatment group showed a large increase of the Alzheimer’s disease biomarker. Yet, this particular data point is missing in the 100 mg treatment group in Figure 4. Instead, the 150% increase data point is included in the placebo group.
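The bookkeeping behind points 1, 2, and 4 can be sketched as follows. The per-group counts are visual counts of dots and lines in the poster figures, and the two pg/ml values are approximate readings from Figure 5, so everything here is an estimate rather than official trial data. The variable names are made up for this illustration.

```python
# Number of patients reported on the poster.
REPORTED_PATIENTS = 64

# Dots (Figure 4) and lines (Figure 5) per group, counted by eye.
fig4_counts = {"placebo": 20, "50 mg": 15, "100 mg": 17}
fig5_counts = {"placebo": 18, "50 mg": 15, "100 mg": 18}

fig4_total = sum(fig4_counts.values())
fig5_total = sum(fig5_counts.values())

# Patients reported on the poster but not visible in Figure 4.
unaccounted = REPORTED_PATIENTS - fig4_total

# Groups whose counts differ between the two figures.
mismatched = [g for g in fig4_counts if fig4_counts[g] != fig5_counts[g]]

# Approximate day 1 and day 28 readings (pg/ml) for the 100 mg outlier.
day1, day28 = 2.1, 5.2
outlier_pct = 100.0 * (day28 - day1) / day1

print(f"Figure 4 total: {fig4_total}, Figure 5 total: {fig5_total}")
print(f"Not shown in Figure 4: {unaccounted}")
print(f"Groups with different counts: {mismatched}")
print(f"Outlier change from baseline: about {outlier_pct:.0f}%")
```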

Update: The Clinical Trials page belonging to this study has more information on the numbers of patients in each of the groups who completed the study: 22/20/21. The patient numbers of CSF P-tau181 were 22/19/18 respectively, but no numbers are provided for the serum P-tau181 levels.

Update 2: The LS report stated that the missing data point in the 100 mg group in Figure 4 should have been around 235% – based on an estimated increase from 1.5 to 5 pg/ml. Using WebPlotDigitizer (see below), this estimate appears too high. The LS report did not notice that the placebo group had an additional data point of 150% that does not match Figure 5’s data. Luosheng Peng correctly estimated in a Twitter thread that the missing data point should be around 150% and also found it in the placebo group.

A more precise comparison of Figures 4 and 5

I took a more precise look at the estimated CFB percentages, by extracting the values in Figure 5 using WebPlotDigitizer. Since the spaghetti plot was a bit busy, I did not include all data points, but I focused on the lines that showed the biggest changes between day 1 and day 28, leaving out some lines that were horizontal-ish.

Looking at the estimated change-percentages, Figure 5’s estimated values appear to line up nicely with the position of the data points in Figure 4. Except, that is, for the 150% data point, which seems to have jumped from the 100 mg treatment group to the Placebo group. If this one outlier data point, specifically mentioned in the text of the poster, had been included in the 100 mg treatment group, the average change-from-baseline would change from -17% to around -3%, which is a much less spectacular reduction of plasma P-tau181 levels than claimed by the company.
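How strongly a single outlier can pull the mean of a small group can be sketched with the incremental mean formula. The numbers below are hypothetical and chosen only to keep the arithmetic obvious; they are not the poster's per-patient values, which have not been published. The helper name `mean_after_adding` is likewise made up for illustration.

```python
def mean_after_adding(current_mean: float, n: int, new_value: float) -> float:
    """Mean of n + 1 values, given the mean of the first n and one new value."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return (current_mean * n + new_value) / (n + 1)


# Hypothetical example: nine patients averaging a 10% decrease,
# plus one additional patient with a 90% increase.
shifted = mean_after_adding(current_mean=-10.0, n=9, new_value=90.0)
print(f"mean before: -10.0%, mean after: {shifted:.1f}%")
```

In a group of fewer than twenty patients, one large positive value is enough to move the group mean by many percentage points, which is why the placement of a single data point matters here.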

Please note that I am not suggesting here that moving this data point from the treatment group to the placebo group was done on purpose. Mistakes can happen, in particular if there is a rush to publish, and I do not know exactly what has happened here.

I hope Cassava Sciences can show more details and exact measurements of all 64 patients included here.

Comparison of estimated change-from-baseline from Figure 5 to data points given in Figure 4. Source of right part of figure: https://www.cassavasciences.com/static-files/0854aec6-59b3-4e2b-ac20-c32b7c307b08

Disclosures

  • I do not own any Cassava Sciences shorts or stock, nor stock from other pharmaceutical companies that might be working on competing drugs
  • I was not involved in the publication of the Labaton Sucharow report, and only heard about this case on August 25, after it was discussed on Twitter here and here.
  • I was not paid by any person or organization to investigate these allegations, to analyze these papers, or to write this post.
  • I get donations through Patreon for my ongoing work on science integrity, but no one asked me to work on the analysis in this blog post as a condition for their donation.
  • This post is no accusation of misconduct, just a summary of image problems, some of which could be resolved if the researchers can show the original data.
  • This post is not meant to be financial advice. I am a scientist specializing in photographic scientific figures, and I know almost nothing about the stock market.

30 thoughts on “Cassava Sciences: Of Posters and Spaghetti Plots”

  1. On clinicaltrials.gov, under “study results,” it clearly states how the patients were randomized, how many were included in each group, and why certain patients were excluded (non-compliance, inability to follow directions, lost to follow up, etc.). You’re telling me you don’t know how to look up this data?
    You showed the same incompetence when you needed to ask for original western blots and information on the compounds. Both were easily and publicly accessible.
    Using this to insinuate that they haven’t provided this data is irresponsible and reeks of malintent. What’s your motive here? I’ll assume there is an under-the-table payment vs. notoriety vs. plain incompetence. None of those are healthy for your career.

      1. Here are the original western blots

        No, those are low-resolution third-generation photocopies of bands that purport to be taken from the original western blots.

      1. The poster is referring to the study itself. The results of the study are integral to interpreting the poster if you feel the need to take a deeper dive.

        Do you really think that the fact you were referring to the poster is an excuse for your ignorance?

        You have a platform with thousands of followers and you make statements insinuating fraud about a poster of P2b trials without looking at the P2b trial results???

        Your lack of basic professional courtesy is astounding.

        Why don’t you do yourself a favor. Stop talking, go to clinicaltrials.gov and start reading.

        Next pick up the phone and ask to speak with Dr. Wang. That would be considered the absolute minimum.

  2. Imran, you make a great point about seeing what $SAVA reported to trials.gov.

    https://clinicaltrials.gov/ct2/show/results/NCT04079803?term=simufilam&draw=2&rank=5

    Looks like they collected tau data from the CSF of 22 (Placebo), 19 (50mg), and 18 (100mg) patients. That is 59(!) enrolled patients on whom they told the NIH that they have tau data. But when you look at the poster, they only included 52 (fig 4) or 51 (fig 5) patients. Some one has some splainin’ to do!!! The SEC (and NIH and FDA) don’t like it when you hide data from them to tell a better story, especially when you are putting a synthetic chemical into humans. I am glad you helped us discover additional evidence of carelessness/fraud!!!!

      1. Elisabeth, is it possible that there’s some misunderstanding in regards to what’s depicted in the Figures? Please help me understand this:

        “Yet, Figure 4 only shows 20, 15, and 17 data points for the placebo, 50 mg, and 100 mg Simufilam groups, respectively. This is a total of 52 patients. ”

        But I believe this is not quite right… 20, 15, 17 doesn’t correspond to the number of patients/data points… it corresponds to % increase/decrease in the groups: 20% increase in placebo vs. 15% and 17% decrease in 50mg and 100mg? So I feel like Fig. 4 has been interpreted by you incorrectly… Am I wrong?

      2. By some bizarre coincidence, the reported average increase / decrease percentage per patient group is the same as the number of dots shown in Figure 4 for each of the groups. But if you count the number of red, blue, and green dots, they are 20/15/17. Each dot represents one set of patient measurements. The reported average percentages are shown by the horizontal black stripe that is shown roughly in the middle of each data point ‘cloud’.

  3. This figure was called a spaghetti plot on social media, because it shows a lot of individual lines that all go in slightly different directions

    They’re a horrible way of showing information IMHO. What’s wrong with a simple X/Y plot of “Before / After”, with colored points to distinguish the different sub-groups? [/rant]

  4. Hi,

    I understand that you have discovered discrepancies in the Open Trial Phase 2 presentation (Figs 4,5) of data in the context of SavaDx. I appreciate you doing this work and uncovering these errors. I hope we can get a better understanding about why this happened from the company.

    Have you also been able to discover discrepancies in the presentation of Open Trial Phase 2 data in the context of their presentation on Simufilam here?
    https://www.cassavasciences.com/static-files/5f96b2d4-46e8-4936-a6cf-8332a56f19b1

    Thanks

  5. It is an extremely bizarre coincidence for sure… I just don’t understand how it can possibly be 20/15/17… it clearly doesn’t align with the total number of patients in the trial… To me, it seems like those 20/15/17 in the chart are actually the labels for the percentages…while a few data points are indeed missing from the chart… Is it possible that they just messed something up while putting this chart together, and omitted a few data points, but that omission didn’t actually affect the calculation of actual percentages? (20%, 15%, 17%). Or would it have impacted the calculation of the % for sure? Also, is there a possibility some patients were excluded from the analysis for various reasons? (although I would agree they should’ve included a footnote with a detailed explanation of what was excluded and why)… Also, in regards to that clear outlier – by looking at the clinicaltrials.gov website it looks like one patient withdrew from the 100mg group. Could it be that this is the outlier that we are seeing? Meaning the person was initially in the 100mg group but withdrew and ended up in the placebo group? Or is it not possible and if that person withdrew s/he would’ve been excluded from the analysis entirely rather than moved from one group to another? Thanks for taking the time to respond, greatly appreciated!

    1. Once a patient begins a study you are required to track them, even if they withdraw. An excess of patients withdrawing from medication as opposed to placebo is a strong indicator of problematic side effects of the medication.

      You can never move a patient from treatment arm to placebo arm, even if they refuse to take further doses of the medication.

      Doing a clinical trial correctly is very finicky. It is easy to get it wrong. For example, if you only count people who completed the treatment, you will enrich your sample for those who were less ill to begin with. There is a pretty good discussion on Wikipedia under “intention-to-treat” and links from that page if you want to read more about this.

      We have seen some awful examples of trials violating these rules: the most egregious one I’ve seen dropped patients who died from the analysis. It’s much easier to show that your medication makes people get better if you remove all the people who died!

  6. Dear Mr Khan,

    You seem to have some interesting points to make, but you risk losing all goodwill from the audience to your exchange if you continue to treat Dr Bik in an aggressive and uncivil manner. If you are correct, there is no benefit in being condescending and sarcastic, as you will only undermine your position. You yourself have commented about basic professional courtesy.

    I think Dr Bik is showing incredible grace in her exchange with you.

    1. Thank you for your advice. My comments are aggressive, as they convey my frustrated and perplexed tone. As an academic MD, I find it hard to believe that Elisabeth has publicly voiced concerns against researchers without basic due diligence.

      Unfortunately, Elisabeth is showing no grace to either Dr. Wang / Dr. Burns, or me. She is, however, showing plenty of grace to Jordan Thomas and his anonymous whistleblowers. As a physician, I have direct experience with AD, and AD patients are one of the most underserved secondary to lack of treatment options. For that reason, Cassava Sciences, Avonex, and others have my full support, as long as there haven’t been safety concerns in human trials.

      With that said, I am certainly open to opposing viewpoints from real experts. However, I expect those experts do at least a rudimentary analysis before raising concerns. This goes without saying and is even more essential when experts have large followings on social media. I actually stopped reading her article halfway through. It’s poorly researched and screaming of ingenuity / ulterior motives.

      Again there is way too much on the line here for said experts to potentially cause the halting of P3 trials. I’ve seen scientists do much worse with their popularity.

      1. If you’re still around, Imran Khan, can you expand on your claim (in the previous thread) that thousands of scientists had signed a petition denouncing Dr Bik?
        I would hate to think that you just made it up, or even worse, that you misunderstood the actual petition supporting Dr Bik.

  7. I would also point out that the number of patients analyzed was also subject to a verification procedure. They clearly mention in the poster that each sample was run twice, and if the values were off by more than 11% they were rerun, and discarded if the discordance was above 15%. That may account for the enrolled/reported differences.

  8. Dear Mr. Imran Khan,

    I don’t understand how you can say you ‘stopped reading halfway through’ due to poorly researched??

    I’m genuinely suspecting Ms. Bik to be willing to look at ALL of your ‘scientific feedback’ , if you can be so kind to substantiate them for her in more detail…

    So for myself I appreciated your first contribution to point everyone to the correct files on trials.gov.
    But then….her feedback afterwards to you did not resonate?

    She checked it out I believe, but judged it ‘not uncropped /raw’ enough to falsify or adjust her findings in those publications so far?
    (“low-resolution third-generation photocopies”)

    –>I think science is most served by us all helping each other’s work

  9. Dr. Bik, aren’t you concerned that the petitioners against Cassava are using your research to ask for a halt to the trials of their drug? (See https://www.regulations.gov/document/FDA-2021-P-0930-0023). Your posts by themselves are not clear-cut proof of anything and are requests for original images and more data. There have been instances where you have been corrected, as seen in your Twitter account, and you haven’t updated your blog after receiving these updates.
    You also haven’t commented on the noticeable differences when blowing up the questionable western blots and haven’t looked at the high resolution western blots in the patent application.

    Should you be stating your position more clearly?

  10. Dr. Bik,
    Can you take a look and comment on this blog post pertaining to the AAIC poster discussed in your review?
    The blog writers seem to have also done a thorough review of the poster data and their conclusion differs from yours.

    Thanks for the work you do!

    1. Thanks for the link 🙂
      That blog post looked at the new, corrected data, in which data points have been added to or removed from some spaghetti plots. In this new plot, the data in Figure 5 now matches that in Figure 4. My blog post is on the data as originally published on the poster, in which the spaghetti data in Figure 5 does not match that in Figure 4. So the reason we come to a different conclusion is that we analyzed different data. I analyzed the original data, while the Ad Science blog post analyzed the corrected data. I hope this helps you better understand why they reached a different conclusion.

  11. Dr. Bik

    Like Imran Khan, I’m an MD clinician who has attempted to take care of Alzheimer patients for decades. I would be very interested in any comments you choose to make on the following post from my blog. No one has ever come to see me complaining of an abnormal biomarker. What is far more important to the patient and their family is their cognition and behavior. I’d love it if you’d comment on the clinical data they have presented (which I think is actually better than Cassava realizes).

    Here is a link to my blog — https://luysii.wordpress.com/2021/08/25/cassava-sciences-9-month-data-is-probably-better-than-they-realize/

    Here is the post itself.

    Cassava Sciences 9 month data is probably better than they realize

    My own analysis of the Cassava Sciences 9 month data shows that it is probably even better than they realize.

    Here is a link to what they released — keep it handy https://www.cassavasciences.com/static-files/13794384-53b3-452c-ae6c-7a09828ad389.

    I was unable to listen to Lindsay Burns’s presentation at the Alzheimer’s Association International Conference in July as I wasn’t signed up. I have been unable to find either a video or a transcript, so perhaps Lindsay did realize what I’m about to say.

    Apparently today 25 August there was another bear attack on the company and its data. I’ve not read it or even seen what the stock did. In what follows I am assuming that everything they’ve said about their data is true and that their data is what they say it is.

    So the other day I had a look at what Cassava released at the time of Lindsay’s talk.

    First some background on their study. It is a report on the first 50 patients who had received Simufilam for 9 months. It is very important to understand how they were measuring cognition. It is something called ADAS-Cog11.

    Here it is and how it is scored and my source — https://www.verywellhealth.com/alzheimers-disease-assessment-scale-98625

    The original version of the ADAS-Cog consists of 11 items, including:

    1. Word Recall Task: You are given three chances to recall as many words as possible from a list of 10 words that you were shown. This tests short-term memory.

    2. Naming Objects and Fingers: Several real objects are shown to you, such as a flower, pencil and a comb, and you are asked to name them. You then have to state the name of each of the fingers on the hand, such as pinky, thumb, etc. This is similar to the Boston Naming Test in that it tests for naming ability, although the BNT uses pictures instead of real objects, to prompt a reply.

    3. Following Commands: You are asked to follow a series of simple but sometimes multi-step directions, such as, “Make a fist” and “Place the pencil on top of the card.”

    4. Constructional Praxis: This task involves showing you four different shapes, progressively more difficult such as overlapping rectangles, and then you will be asked to draw each one. Visuospatial abilities become impaired as dementia progresses and this task can help measure these skills.

    5. Ideational Praxis: In this section, the test administrator asks you to pretend you have written a letter to yourself, fold it, place it in the envelope, seal the envelope, address it and demonstrate where to place the stamp. (While this task is still appropriate now, this could become less relevant as people write and send fewer letters through the mail.)

    6. Orientation: Your orientation is measured by asking you what your first and last name are, the day of the week, date, month, year, season, time of day, and location. This will determine whether you are oriented x 1, 2, 3 or 4.

    7. Word Recognition Task: In this section, you are asked to read and try to remember a list of twelve words. You are then presented with those words along with several other words and asked if each word is one that you saw earlier or not. This task is similar to the first task, with the exception that it measures your ability to recognize information, instead of recall it.

    8. Remembering Test Directions: Your ability to remember directions without reminders or with a limited amount of reminders is assessed.

    9. Spoken Language: The ability to use language to make yourself understood is evaluated throughout the duration of the test.

    10. Comprehension: Your ability to understand the meaning of words and language over the course of the test is assessed by the test administrator.

    11. Word-Finding Difficulty: Throughout the test, the test administrator assesses your word-finding ability throughout spontaneous conversation.

    What the ADAS-Cog Assesses

    The ADAS-Cog helps evaluate cognition and differentiates between normal cognitive functioning and impaired cognitive functioning. It is especially useful for determining the extent of cognitive decline and can help evaluate which stage of Alzheimer’s disease a person is in, based on his answers and score. The ADAS-Cog is often used in clinical trials because it can determine incremental improvements or declines in cognitive functioning.

    Scoring

    The test administrator adds up points for the errors in each task of the ADAS-Cog for a total score ranging from 0 to 70. The greater the dysfunction, the greater the score. A score of 70 represents the most severe impairment and 0 represents the least impairment.

    The average score of the 50 individuals entering was 17 with a standard deviation of 8, meaning that (if the scores were roughly normally distributed) about 2/3 of the group entering had scores of 9 to 25 and about 95% had scores of 1 to 33 (but I doubt that anyone would have entered the study with a score of 1, so I’m assuming that the lowest score on entry was 9 and the highest was 25). Cassava Sciences has this data but I don’t know what it is.

    Now follow the link to Individual Patient Changes in ADAS-Cog (N = 50) and you will see 50 dots, some red, some yellow, some green.

    Look at the 5 individuals who fall between -10 and -15 and think about what this means. -10 means that an individual made 10 fewer errors at 9 months than on entry into the study. Again, I have no idea what the scores of the 5 were on entry.

    So assume the worst and that the 5 all had scores of 25 on entry. The group still showed a 50% improvement from baseline as they look like they either made 12, 13, or 14 fewer errors. If you assume that the 5 had the average impairment of 17 on entry, they were nearly normal after 9 months of treatment. That doesn’t happen in Alzheimer’s and is a tremendous result. Lindsay may have pointed this out in her talk, but I don’t know although I’ve tried to find out.

    Is there another neurologic disease with responses like this? Yes there is, and I’ve seen it.

    I was one of the first neurologists in the USA to use L-DOPA for Parkinsonism. All patients improved, and I actually saw one or two wheelchair bound Parkinsonians walk again (without going to Lourdes). They were far from normal, but ever so much better.

    However, treated mildly impaired Parkinsonians became indistinguishable from normal, to the extent that I wondered if I’d misdiagnosed them.

    12 to 14 fewer errors is a big deal, an average decrease of 3 errors, not so much, but still unprecedented in Alzheimer’s disease. Whether this is clinically meaningful is hard to tell. However, 12 month data on the 50 will be available in the fourth quarter of ’21, and if the group as a whole continues to improve over baseline it will be a very big deal as it will tell us a lot about Alzheimer’s.

    Cassava Sciences has all sorts of data we’ve not seen (not that they are hiding it). Each of the 50 has 4 data points (entry, 3, 6 and 9 months) and it would be interesting to see the actual scores rather than the changes between them in all 50. Were the 5 patients with the 12 – 14 fewer errors more impaired (high ADAS-Cog11 score on entry) or less?

    Was the marked improvement in the 5 slow and steady or sudden? Ditto for the ones who deteriorated or who got much worse or who slightly improved.

    Even if such dramatic improvement is confined to 10% of those receiving therapy it is worth a shot to give it to all. Immune checkpoint blockade has dramatically helped some patients with cancer (far from all), yet it is tried in many.

    Disclaimer: My wife and I have known Lindsay since she was a teenager and we were friendly with her parents. However, everything in this post is on the basis of public information available to anyone (and of course my decades of experience as a clinical neurologist)

  12. I feel I should respond to such a terrific post (if that is what it is called). When you parenthetically commented …“which is actually better…” I thought for sure you were going to comment on what I noticed and feel strongly is overwhelmingly more important than the cognitive findings. Can you guess what I have in mind? I am preparing to “post” it (whatever that means) but I want to be thorough before I do so. As for seemingly small changes producing enormous mental effects and being cured by simple molecules, think UTIs and/or dehydration in the elderly where both run rampant, as well as naloxone and PKU (which both might have relevance to simufilam). More later if you are interested.
