Great idea! I think we should have some examples to compare some of the higher end tornadoes to, just to give us an idea of how strong each rating is.
Some "textbook" examples of a TF4 would be stuff like Barnsdall, Elkhorn, and the 2021 Tri-State tornado. I also think that the 2023 Amory/Wren tornado would fall under mid-high end TF4.
Some textbook examples of a TF5 would be anything like Tuscaloosa-Birmingham, Rochelle-Fairdale, and Joplin. I do think that Mayfield and Diaz would fall under this category, with Diaz being on the lower end and Mayfield being mid-high end TF5.
(SKIP THIS IF YOU DON'T WANT TO READ COGNITIVE LINGUISTICS STUFF)
This is somewhat related, but in cognitive linguistics there are two theories that are somewhat similar, prototype theory and exemplar theory. They both pertain to how we mentally organize categories (I'll use BIRD as an example - I promise this relates to your comment, bear with me). Prototype theory would have a prototype containing abstracted features relevant for BIRD, and any members would have some combination of these - wings, feathers, beak, etc. Some birds are closer to the prototype than others (a robin is much closer to the BIRD prototype than a penguin or ostrich). Exemplar theory would instead store specific instances of birds in memory (robin, penguin, etc.). If a new thing is close enough to these exemplars, it's categorized as BIRD. There are also theories that combine these two in some form.
(SKIP TO HERE IF YOU WANT)
Tornadoes. I have wondered if applying these concepts to tornado ratings could be of some use. Here's what I was thinking (broad steps):
- For each EF rating, determine how diagnostic each damage feature is (not based just on frequency, but on how distinct it is for that EF rating). The better a feature can distinguish EF ratings, the higher the weight it is assigned (e.g. maybe extreme ground scouring is rare, but it is highly EF5 specific, so it is given a very high weight)
- For each EF rating, score how well each tornado matches the weighted feature profile for that rating, and keep the most internally consistent ones (say the top x% of that category). These are the core exemplars for that rating.
- Average the feature profiles of those exemplars to make a prototype for each rating (which serves as the idealized "what an EFX looks like")
- Calculate similarity for each tornado to all prototypes. If its rating doesn't match the prototype it's closest to, and another rating is significantly closer, flag it as a potential misclassification (e.g. New Wren's official rating is EF3; after calculating similarity scores, it turns out to be much closer to the EF5 prototype than the EF3 prototype, so it gets flagged and can be reassigned)
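The steps above can be sketched in code. Everything here is a toy, not real data: the tornado names, feature values, and the specific choices of "spread of per-rating means" as the diagnostic weight and weighted absolute distance as the similarity are all placeholder assumptions just to show the pipeline's shape.

```python
from statistics import mean, pstdev

# Hypothetical, hand-made feature profiles (debarking, scouring, granulation),
# normalized to [0, 1]. Names and numbers are illustrative only.
tornadoes = {
    "A": ("EF3", [0.50, 0.40, 0.30]),
    "B": ("EF3", [0.55, 0.35, 0.25]),
    "C": ("EF5", [0.95, 0.85, 0.80]),
    "D": ("EF5", [0.90, 0.80, 0.85]),
    "E": ("EF3", [0.90, 0.80, 0.75]),  # rated EF3, but profiles like the EF5s
}

ratings = sorted({r for r, _ in tornadoes.values()})
n_features = 3

# Step 1: diagnostic weight = spread of per-rating means for each feature.
# A feature whose mean differs a lot across ratings separates them better.
def rating_mean_profile(rating):
    profiles = [p for r, p in tornadoes.values() if r == rating]
    return [mean(p[i] for p in profiles) for i in range(n_features)]

per_rating = {r: rating_mean_profile(r) for r in ratings}
weights = [pstdev([per_rating[r][i] for r in ratings]) for i in range(n_features)]

# Steps 2-3: here the prototype is just the per-rating mean profile.
# A fuller version would first keep only the top-x% most consistent
# exemplars for each rating, then average those.
prototypes = per_rating

# Step 4: similarity = negative weighted absolute distance to a prototype.
def similarity(profile, proto):
    return -sum(w * abs(a - b) for w, a, b in zip(weights, profile, proto))

for name, (rated, profile) in tornadoes.items():
    best = max(ratings, key=lambda r: similarity(profile, prototypes[r]))
    if best != rated:
        # -> only E gets flagged, as closer to the EF5 prototype
        print(f"{name}: rated {rated} but closest to {best} prototype -> flag")
```

Note that in this sketch the mislabeled tornado (E) still contaminates the EF3 prototype it was averaged into; that's exactly why the exemplar-filtering step (keeping only the most internally consistent members before averaging) matters in the full version.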
Very simplified example:
Features are tree debarking severity, ground scouring severity, debris granulation severity.
Smithville has normalized values of [1.0, 0.9, 0.8]
The diagnostic weights for these are [0.7, 0.8, 0.95] (aka "how useful are these for distinguishing categories")
The EF5 core exemplars are something like [0.89, 0.7, 0.8], [ . . .], and the prototype avg based on those is say [0.7, 0.7, 0.5].
Compare Smithville's values to all prototypes (EF0 similarity = 0.01, . . . EF4 similarity = 0.1, EF5 similarity = 0.99). Smithville most closely matches the EF5 prototype and is therefore classified as an EF5.
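That comparison step, using the numbers from the example above, could look like this. The EF4 prototype and the choice of "1 minus weighted mean absolute difference" as the similarity function are my own placeholder assumptions, so the scores come out different from the illustrative 0.01/0.1/0.99 in the text, but the ranking is the same.

```python
smithville = [1.0, 0.9, 0.8]   # debarking, scouring, granulation (normalized)
weights = [0.7, 0.8, 0.95]     # diagnostic weights from the example

prototypes = {
    "EF4": [0.55, 0.50, 0.35],  # made-up EF4 prototype for contrast
    "EF5": [0.70, 0.70, 0.50],  # the EF5 prototype from the example
}

def similarity(profile, proto, weights):
    # 1 minus the weighted mean absolute difference: 1.0 = identical profiles
    dist = sum(w * abs(a - b) for w, a, b in zip(weights, profile, proto))
    return 1 - dist / sum(weights)

scores = {r: similarity(smithville, p, weights) for r, p in prototypes.items()}
print(max(scores, key=scores.get))  # -> EF5
```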
tl;dr find out most diagnostic features > pick tornadoes that are closest to that combination > those are averaged into a prototype > tornadoes are assigned to the category whose prototype most closely resembles them