Friday, July 31, 2015

Fun decks to play in the Nekroz age

Nekroz is one of those decks that you expect to get hit hard, yet it comes out of each banlist almost unscathed.  With that in mind, here are some fun decks to play that won't win nearly as often as Nekroz, but are less generic.

1. Chicken Game OTK
The Chicken Game OTK is based on thinning your deck and dumping a lot of spells into the grave, then activating Life Equalizer, which should drop your opponent's life points to 3000, and finally Magical Explosion to finish them off.

Sample Deck List:

2. Slifer OTK:

Yes, Slifer the Sky Dragon is back, and there is a neat way to consistently get his ATK above 24000 (that is, by drawing until you're holding 24 cards).  The draw loop is based on Tethys, Goddess of Light, and constantly reloading your hand to draw extra cards until you've drawn through most of your deck.  Then, by Special Summoning Mozarta, Hyperion, and Watapon and tributing them, you can get Slifer out with insane ATK.

Sample Deck List:

3. Hero S0 Lock:

(To be updated)

Wednesday, July 22, 2015

The Problem with Science (technical)

Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies to the most modern molecular research. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims. However, this should not be surprising. It can be proven that most claimed research findings are false. Here I will examine the key factors that influence this problem and some corollaries thereof.
Modeling the Framework for False Positive Findings
Several methodologists have pointed out that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. “Negative” research is also very useful. “Negative” is actually a misnomer, and the misinterpretation is widespread. However, here we will target relationships that investigators claim exist, rather than null findings.
It can be proven that most claimed research findings are false
As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance. Consider a 2 × 2 table in which research findings are compared against the gold standard of true relationships in a scientific field. In a research field both true and false hypotheses can be made about the presence of relationships. Let R be the ratio of the number of “true relationships” to “no relationships” among those tested in the field. R is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is R/(R + 1). The probability of a study finding a true relationship reflects the power 1 − β (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, α. Assuming that c relationships are being probed in the field, the expected values of the 2 × 2 table follow directly. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV. The PPV is also the complementary probability of what Wacholder et al. have called the false positive report probability. One can derive that PPV = (1 − β)R/(R − βR + α). A research finding is thus more likely true than false if (1 − β)R > α. Since usually the vast majority of investigators depend on α = 0.05, this means that a research finding is more likely true than false if (1 − β)R > 0.05.
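To make the formula concrete, here is a minimal Python sketch; the 1:1 odds and 80% power below are illustrative assumptions, not values from the text:

# Post-study probability (PPV) that a claimed finding is true
def ppv(R, beta, alpha=0.05):
    return (1 - beta) * R / (R - beta * R + alpha)

# Illustrative assumption: 1:1 pre-study odds (R = 1), 80% power (beta = 0.2)
print(round(ppv(R=1.0, beta=0.2), 2))   # ~0.94
# A finding is more likely true than false when (1 - beta) * R > alpha
print((1 - 0.2) * 1.0 > 0.05)           # True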
What is less well appreciated is that bias and the extent of repeated independent testing by different teams of investigators around the globe may further distort this picture and may lead to even smaller probabilities of the research findings being indeed true. We will try to model these two factors in the context of similar 2 × 2 tables.

Bias

First, let us define bias as the combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced. Let u be the proportion of probed analyses that would not have been “research findings,” but nevertheless end up presented and reported as such, because of bias. Bias should not be confused with chance variability that causes some findings to be false by chance even though the study design, data, analysis, and presentation are perfect. Bias can entail manipulation in the analysis or reporting of findings. Selective or distorted reporting is a typical form of such bias. We may assume that u does not depend on whether a true relationship exists or not. This is not an unreasonable assumption, since typically it is impossible to know which relationships are indeed true. In the presence of bias, one gets PPV = ([1 − β]R + uβR)/(R + α − βR + u − uα + uβR), and PPV decreases with increasing u, unless 1 − β ≤ α, i.e., 1 − β ≤ 0.05 for most situations. Thus, with increasing bias, the chances that a research finding is true diminish considerably. This holds across different levels of power and different pre-study odds. Conversely, true research findings may occasionally be annulled because of reverse bias. For example, with large measurement errors relationships are lost in noise, or investigators use data inefficiently or fail to notice statistically significant relationships, or there may be conflicts of interest that tend to “bury” significant findings. There is no good large-scale empirical evidence on how frequently such reverse bias may occur across diverse research fields. However, it is probably fair to say that reverse bias is not as common. Moreover, measurement errors and inefficient use of data are probably becoming less frequent problems, since measurement error has decreased with technological advances in the molecular era and investigators are becoming increasingly sophisticated about their data. Regardless, reverse bias may be modeled in the same way as bias above. Also, reverse bias should not be confused with chance variability that may lead to missing a true relationship because of chance.
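A small sketch of the bias-adjusted PPV, using the same illustrative parameters as before; the u values are assumptions chosen only to show the trend:

# PPV in the presence of bias u (share of analyses reported as findings because of bias)
def ppv_bias(R, beta, alpha=0.05, u=0.0):
    num = (1 - beta) * R + u * beta * R
    den = R + alpha - beta * R + u - u * alpha + u * beta * R
    return num / den

# Illustrative assumption: 1:1 pre-study odds, 80% power; PPV falls as bias grows
for u in (0.0, 0.1, 0.3):
    print(u, round(ppv_bias(R=1.0, beta=0.2, u=u), 2))   # 0.94, 0.85, 0.72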

Testing by Several Independent Teams

Several independent teams may be addressing the same sets of research questions. As research efforts are globalized, it is practically the rule that several research teams, often dozens of them, may probe the same or similar questions. Unfortunately, in some areas, the prevailing mentality until now has been to focus on isolated discoveries by single teams and interpret research experiments in isolation. An increasing number of questions have at least one study claiming a research finding, and this receives unilateral attention. The probability that at least one study, among several done on the same question, claims a statistically significant research finding is easy to estimate. For n independent studies of equal power, PPV = R(1 − βⁿ)/(R + 1 − [1 − α]ⁿ − Rβⁿ) (not considering bias). With increasing number of independent studies, PPV tends to decrease, unless 1 − β < α, i.e., typically 1 − β < 0.05. For n studies of different power, the term βⁿ is replaced by the product of the terms βᵢ for i = 1 to n, but inferences are similar.
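And a sketch of the multiple-teams case, again with illustrative parameters:

# PPV when n independent teams probe the same question (no bias considered)
def ppv_n_teams(R, beta, alpha=0.05, n=1):
    num = R * (1 - beta ** n)
    den = R + 1 - (1 - alpha) ** n - R * beta ** n
    return num / den

# Illustrative assumption: 1:1 pre-study odds, 80% power; PPV falls as n grows
for n in (1, 5, 10):
    print(n, round(ppv_n_teams(R=1.0, beta=0.2, n=n), 2))   # 0.94, 0.82, 0.71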

Corollaries

A practical example is shown in Box 1. Based on the above considerations, one may deduce several interesting corollaries about the probability that a research finding is indeed true.

Box 1. An Example: Science at Low Pre-Study Odds

Let us assume that a team of investigators performs a whole genome association study to test whether any of 100,000 gene polymorphisms are associated with susceptibility to schizophrenia. Based on what we know about the extent of heritability of the disease, it is reasonable to expect that probably around ten gene polymorphisms among those tested would be truly associated with schizophrenia, with relatively similar odds ratios around 1.3 for the ten or so polymorphisms and with a fairly similar power to identify any of them. Then R = 10/100,000 = 10⁻⁴, and the pre-study probability for any polymorphism to be associated with schizophrenia is also R/(R + 1) = 10⁻⁴. Let us also suppose that the study has 60% power to find an association with an odds ratio of 1.3 at α = 0.05. Then it can be estimated that if a statistically significant association is found with the p-value barely crossing the 0.05 threshold, the post-study probability that this is true increases about 12-fold compared with the pre-study probability, but it is still only 12 × 10⁻⁴.
Now let us suppose that the investigators manipulate their design, analyses, and reporting so as to make more relationships cross the p = 0.05 threshold even though this would not have been crossed with a perfectly adhered to design and analysis and with perfect comprehensive reporting of the results, strictly according to the original study plan. Such manipulation could be done, for example, with serendipitous inclusion or exclusion of certain patients or controls, post hoc subgroup analyses, investigation of genetic contrasts that were not originally specified, changes in the disease or control definitions, and various combinations of selective or distorted reporting of the results. Commercially available “data mining” packages actually are proud of their ability to yield statistically significant results through data dredging. In the presence of bias with u = 0.10, the post-study probability that a research finding is true is only 4.4 × 10⁻⁴. Furthermore, even in the absence of any bias, when ten independent research teams perform similar experiments around the world, if one of them finds a formally statistically significant association, the probability that the research finding is true is only 1.5 × 10⁻⁴, hardly any higher than the probability we had before any of this extensive research was undertaken!
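As a rough check, the first two figures quoted in Box 1 drop straight out of the formulas above:

R = 1e-4      # 10 true associations among 100,000 polymorphisms tested
beta = 0.4    # 60% power
alpha = 0.05
u = 0.10      # bias level assumed in Box 1

ppv_no_bias = (1 - beta) * R / (R - beta * R + alpha)
print(ppv_no_bias)    # ~0.0012, i.e. about 12 x 10^-4

ppv_with_bias = ((1 - beta) * R + u * beta * R) / (R + alpha - beta * R + u - u * alpha + u * beta * R)
print(ppv_with_bias)  # ~0.00044, i.e. about 4.4 x 10^-4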
Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true. Small sample size means smaller power and, for all functions above, the PPV for a true research finding decreases as power decreases towards 1 − β = 0.05. Thus, other factors being equal, research findings are more likely true in scientific fields that undertake large studies, such as randomized controlled trials in cardiology (several thousand subjects randomized), than in scientific fields with small studies, such as most research of molecular predictors (sample sizes 100-fold smaller).
Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. Power is also related to the effect size. Thus research findings are more likely true in scientific fields with large effects, such as the impact of smoking on cancer or cardiovascular disease (relative risks 3–20), than in scientific fields where postulated effects are small, such as genetic risk factors for multigenetic diseases (relative risks 1.1–1.5). Modern epidemiology is increasingly obliged to target smaller effect sizes. Consequently, the proportion of true research findings is expected to decrease. In the same line of thinking, if the true effect sizes are very small in a scientific field, this field is likely to be plagued by almost ubiquitous false positive claims. For example, if the majority of true genetic or nutritional determinants of complex diseases confer relative risks less than 1.05, genetic or nutritional epidemiology would be largely utopian endeavors.
Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true. As shown above, the post-study probability that a finding is true (PPV) depends a lot on the pre-study odds (R). Thus, research findings are more likely true in confirmatory designs, such as large phase III randomized controlled trials, or meta-analyses thereof, than in hypothesis-generating experiments. Fields considered highly informative and creative given the wealth of the assembled and tested information, such as microarrays and other high-throughput discovery-oriented research, should have extremely low PPV.
Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true. Flexibility increases the potential for transforming what would be “negative” results into “positive” results, i.e., bias, u. For several research designs, e.g., randomized controlled trials or meta-analyses, there have been efforts to standardize their conduct and reporting. Adherence to common standards is likely to increase the proportion of true findings. The same applies to outcomes. True findings may be more common when outcomes are unequivocal and universally agreed (e.g., death) rather than when multifarious outcomes are devised (e.g., scales for schizophrenia outcomes). Similarly, fields that use commonly agreed, stereotyped analytical methods (e.g., Kaplan-Meier plots and the log-rank test) may yield a larger proportion of true findings than fields where analytical methods are still under experimentation (e.g., artificial intelligence methods) and only “best” results are reported. Regardless, even in the most stringent research designs, bias seems to be a major problem. For example, there is strong evidence that selective outcome reporting, with manipulation of the outcomes and analyses reported, is a common problem even for randomized trials. Simply abolishing selective publication would not make this problem go away.
Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true. Conflicts of interest and prejudice may increase bias, u. Conflicts of interest are very common in biomedical research, and typically they are inadequately and sparsely reported. Prejudice may not necessarily have financial roots. Scientists in a given field may be prejudiced purely because of their belief in a scientific theory or commitment to their own findings. Many otherwise seemingly independent, university-based studies may be conducted for no other reason than to give physicians and researchers qualifications for promotion or tenure. Such nonfinancial conflicts may also lead to distorted reported results and interpretations. Prestigious investigators may suppress via the peer review process the appearance and dissemination of findings that refute their findings, thus condemning their field to perpetuate false dogma. Empirical evidence on expert opinion shows that it is extremely unreliable.
Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true. This seemingly paradoxical corollary follows because, as stated above, the PPV of isolated findings decreases when many teams of investigators are involved in the same field. This may explain why we occasionally see major excitement followed rapidly by severe disappointments in fields that draw wide attention. With many teams working on the same field and with massive experimental data being produced, timing is of the essence in beating competition. Thus, each team may prioritize on pursuing and disseminating its most impressive “positive” results. “Negative” results may become attractive for dissemination only if some other team has found a “positive” association on the same question. In that case, it may be attractive to refute a claim made in some prestigious journal. The term Proteus phenomenon has been coined to describe this phenomenon of rapidly alternating extreme research claims and extremely opposite refutations. Empirical evidence suggests that this sequence of extreme opposites is very common in molecular genetics.
These corollaries consider each factor separately, but these factors often influence each other. For example, investigators working in fields where true effect sizes are perceived to be small may be more likely to perform large studies than investigators working in fields where true effect sizes are perceived to be large. Or prejudice may prevail in a hot scientific field, further undermining the predictive value of its research findings. Highly prejudiced stakeholders may even create a barrier that aborts efforts at obtaining and disseminating opposing results. Conversely, the fact that a field is hot or has strong invested interests may sometimes promote larger studies and improved standards of research, enhancing the predictive value of its research findings. Or massive discovery-oriented testing may result in such a large yield of significant relationships that investigators have enough to report and search further and thus refrain from data dredging and manipulation.

Most Research Findings Are False for Most Research Designs and for Most Fields

In the described framework, a PPV exceeding 50% is quite difficult to get. A finding from a well-conducted, adequately powered randomized controlled trial starting with a 50% pre-study chance that the intervention is effective is eventually true about 85% of the time. A fairly similar performance is expected of a confirmatory meta-analysis of good-quality randomized trials: potential bias probably increases, but power and pre-test chances are higher compared to a single randomized trial. Conversely, a meta-analytic finding from inconclusive studies where pooling is used to “correct” the low power of single studies is probably false if R ≤ 1:3. Research findings from underpowered, early-phase clinical trials would be true about one in four times, or even less frequently if bias is present. Epidemiological studies of an exploratory nature perform even worse, especially when underpowered, but even well-powered epidemiological studies may have only a one in five chance of being true, if R = 1:10. Finally, in discovery-oriented research with massive testing, where tested relationships exceed true ones 1,000-fold (e.g., 30,000 genes tested, of which 30 may be the true culprits), PPV for each claimed relationship is extremely low, even with considerable standardization of laboratory and statistical methods, outcomes, and reporting thereof to minimize bias.
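These ballpark figures can be reproduced from the bias-adjusted formula. The power and bias values below are my assumptions (the text states only the pre-study odds and the resulting chances), chosen as plausible values that land close to the quoted numbers:

# Bias-adjusted PPV, as defined earlier
def ppv_bias(R, beta, alpha=0.05, u=0.0):
    num = (1 - beta) * R + u * beta * R
    den = R + alpha - beta * R + u - u * alpha + u * beta * R
    return num / den

# Adequately powered RCT, 1:1 pre-study odds, modest bias (assumed: power 0.80, u = 0.1)
print(round(ppv_bias(R=1.0, beta=0.2, u=0.1), 2))   # ~0.85
# Well-powered exploratory epidemiological study, R = 1:10, heavier bias (assumed: u = 0.3)
print(round(ppv_bias(R=0.1, beta=0.2, u=0.3), 2))   # ~0.20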

How Can We Improve the Situation?

Is it unavoidable that most research findings are false, or can we improve the situation? A major problem is that it is impossible to know with 100% certainty what the truth is in any research question. In this regard, the pure “gold” standard is unattainable. However, there are several approaches to improve the post-study probability.
Better powered evidence, e.g., large studies or low-bias meta-analyses, may help, as it comes closer to the unknown “gold” standard. However, large studies may still have biases, and these should be acknowledged and avoided. Moreover, large-scale evidence is impossible to obtain for all of the millions and trillions of research questions posed in current research. Large-scale evidence should be targeted for research questions where the pre-study probability is already considerably high, so that a significant research finding will lead to a post-test probability that would be considered quite definitive. Large-scale evidence is also particularly indicated when it can test major concepts rather than narrow, specific questions. A negative finding can then refute not only a specific proposed claim, but a whole field or considerable portion thereof. Selecting the performance of large-scale studies based on narrow-minded criteria, such as the marketing promotion of a specific drug, is largely wasted research. Moreover, one should be cautious that extremely large studies may be more likely to find a formally statistically significant difference for a trivial effect that is not really meaningfully different from the null.
Second, most research questions are addressed by many teams, and it is misleading to emphasize the statistically significant findings of any single team. What matters is the totality of the evidence. Diminishing bias through enhanced research standards and curtailing of prejudices may also help. However, this may require a change in scientific mentality that might be difficult to achieve. In some research designs, efforts may also be more successful with upfront registration of studies, e.g., randomized trials. Registration would pose a challenge for hypothesis-generating research. Some kind of registration or networking of data collections or investigators within fields may be more feasible than registration of each and every hypothesis-generating experiment. Regardless, even if we do not see a great deal of progress with registration of studies in other fields, the principles of developing and adhering to a protocol could be more widely borrowed from randomized controlled trials.
Finally, instead of chasing statistical significance, we should improve our understanding of the range of R values—the pre-study odds—where research efforts operate. Before running an experiment, investigators should consider what they believe the chances are that they are testing a true rather than a non-true relationship. Speculated high R values may sometimes then be ascertained. As described above, whenever ethically acceptable, large studies with minimal bias should be performed on research findings that are considered relatively established, to see how often they are indeed confirmed. I suspect several established “classics” will fail the test.

Nevertheless, most new discoveries will continue to stem from hypothesis-generating research with low or very low pre-study odds. We should then acknowledge that statistical significance testing in the report of a single study gives only a partial picture, without knowing how much testing has been done outside the report and in the relevant field at large. Despite a large statistical literature on multiple testing corrections, it is usually impossible to decipher how much data dredging by the reporting authors or other research teams has preceded a reported research finding. Even if determining this were feasible, this would not inform us about the pre-study odds. Thus, it is unavoidable that one should make approximate assumptions on how many relationships are expected to be true among those probed across the relevant research fields and research designs. The wider field may yield some guidance for estimating this probability for the isolated research project. Experiences from biases detected in other neighboring fields would also be useful to draw upon. Even though these assumptions would be considerably subjective, they would still be very useful in interpreting research claims and putting them in context.

The problem with science (non-technical)

A SIMPLE idea underpins science: “trust, but verify”. Results should always be subject to challenge from experiment. That simple but powerful idea has generated a vast body of knowledge. Since its birth in the 17th century, modern science has changed the world beyond recognition, and overwhelmingly for the better.
But success can breed complacency. Modern scientists are doing too much trusting and not enough verifying—to the detriment of the whole of science, and of humanity.

What a load of rubbish
Too many of the findings that fill the academic ether are the result of shoddy experiments or poor analysis. A rule of thumb among biotechnology venture-capitalists is that half of published research cannot be replicated. Even that may be optimistic. Last year researchers at one biotech firm, Amgen, found they could reproduce just six of 53 “landmark” studies in cancer research. Earlier, a group at Bayer, a drug company, managed to repeat just a quarter of 67 similarly important papers. A leading computer scientist frets that three-quarters of papers in his subfield are bunk. In 2000-10 roughly 80,000 patients took part in clinical trials based on research that was later retracted because of mistakes or improprieties.
Even when flawed research does not put people’s lives at risk—and much of it is too far from the market to do so—it squanders money and the efforts of some of the world’s best minds. The opportunity costs of stymied progress are hard to quantify, but they are likely to be vast. And they could be rising.
One reason is the competitiveness of science. In the 1950s, when modern academic research took shape after its successes in the second world war, it was still a rarefied pastime. The entire club of scientists numbered a few hundred thousand. As their ranks have swelled, to 6m-7m active researchers on the latest reckoning, scientists have lost their taste for self-policing and quality control. The obligation to “publish or perish” has come to rule over academic life. Competition for jobs is cut-throat. Full professors in America earned on average $135,000 in 2012—more than judges did. Every year six freshly minted PhDs vie for every academic post. Nowadays verification (the replication of other people’s results) does little to advance a researcher’s career. And without verification, dubious findings live on to mislead.
Careerism also encourages exaggeration and the cherry-picking of results. In order to safeguard their exclusivity, the leading journals impose high rejection rates: in excess of 90% of submitted manuscripts. The most striking findings have the greatest chance of making it onto the page. Little wonder that one in three researchers knows of a colleague who has pepped up a paper by, say, excluding inconvenient data from results “based on a gut feeling”. And as more research teams around the world work on a problem, the odds shorten that at least one will fall prey to an honest confusion between the sweet signal of a genuine discovery and a freak of the statistical noise. Such spurious correlations are often recorded in journals eager for startling papers. If they touch on drinking wine, going senile or letting children play video games, they may well command the front pages of newspapers, too.
Conversely, failures to prove a hypothesis are rarely even offered for publication, let alone accepted. “Negative results” now account for only 14% of published papers, down from 30% in 1990. Yet knowing what is false is as important to science as knowing what is true. The failure to report failures means that researchers waste money and effort exploring blind alleys already investigated by other scientists.
The hallowed process of peer review is not all it is cracked up to be, either. When a prominent medical journal ran research past other experts in the field, it found that most of the reviewers failed to spot mistakes it had deliberately inserted into papers, even after being told they were being tested.
If it’s broke, fix it
All this makes a shaky foundation for an enterprise dedicated to discovering the truth about the world. What might be done to shore it up? One priority should be for all disciplines to follow the example of those that have done most to tighten standards. A start would be getting to grips with statistics, especially in the growing number of fields that sift through untold oodles of data looking for patterns. Geneticists have done this, and turned an early torrent of specious results from genome sequencing into a trickle of truly significant ones.

Ideally, research protocols should be registered in advance and monitored in virtual notebooks. This would curb the temptation to fiddle with the experiment’s design midstream so as to make the results look more substantial than they are. (It is already meant to happen in clinical trials of drugs, but compliance is patchy.) Where possible, trial data also should be open for other researchers to inspect and test.

Science still commands enormous—if sometimes bemused—respect. But its privileged status is founded on the capacity to be right most of the time and to correct its mistakes when it gets things wrong. And it is not as if the universe is short of genuine mysteries to keep generations of scientists hard at work. The false trails laid down by shoddy research are an unforgivable barrier to understanding. The most enlightened journals are already becoming less averse to humdrum papers. Some government funding agencies, including America’s National Institutes of Health, which dish out $30 billion on research each year, are working out how best to encourage replication. And growing numbers of scientists, especially young ones, understand statistics. But these trends need to go much further. Journals should allocate space for “uninteresting” work, and grant-givers should set aside money to pay for it. Peer review should be tightened—or perhaps dispensed with altogether, in favour of post-publication evaluation in the form of appended comments. That system has worked well in recent years in physics and mathematics. Lastly, policymakers should ensure that institutions using public money also respect the rules.

Friday, July 17, 2015

Upstart Goblin: When to use it

Upstart Goblin: the card that took the Yugioh world by storm.  Here's a peek:

Now, I'm not sure when Upstart Goblin released officially. But unofficially, Upstart Goblin released when it was advertised by Patrick Hoban. Hoban started off with some controversial claims which have divided the Yugioh world. Here are a few:
-Upstart Goblin is a great card
-Upstart Goblin gives decks a great deal of consistency
-Upstart Goblin should be used in every deck
The first statement is pure opinion, which many individuals share. The second statement is a fact, which I will be focusing on soon. The third statement sent ripples around the Yugiverse. Now let's examine the latter two statements. Considering how controversial the third statement is, I'll cover it extensively. The second one will follow suit.
Upstart Goblin should be used in every deck. Not an entirely false statement, and it in fact applies to many meta decks. I’m not looking to increase the intensity of arguments, so I’ll keep my fan-enraging thoughts to a minimum.
What is the basic function of Upstart Goblin? It grants your opponent a thousand life points, but lets you draw a card. The pros and cons of this card are obvious. But considering the sheer number of people using this card, we can come to one conclusion: 1000 life points is worth a card. Upstart Goblin is used in threes, so the conclusion is that it's worth letting your opponent start with 11000 life points, as long as you get to play with an effective 37 card deck.
Now, it’s not possible to come to a conclusion by looking at such a scenario. Many people will continue to argue that the level of consistency achieved in their deck is more than enough, that stacking a deck with quality cards and searchers is more than enough. This is true for certain decks. But consistency in Yugioh is never a bad thing. So, does Upstart Goblin really make a difference when it comes to drawing into your best cards? Does it improve your consistency? Time to do some number-crunching to get our statistical answers.
NOTE: NEVER, and I mean NEVER, use 3 Upstart Goblins in a deck that exceeds the 40 card count. Why? Because in a 41 card deck, using three Upstart Goblins basically gives your opponent a 1000 lifepoint advantage for free. By removing the Upstarts, you don't lose consistency (fewer cards in your deck), nor do you gift your opponent needless life points. In decks that exceed 42 cards, Upstarts basically become pointless fillers.
Goblin Mathematics
NOTE: All statistics are calculated in a 5 card draw, or when you go first. All the statistics increase proportionally if you go second.
Let's say you run 3 copies of an unsearchable card in your 40 card deck. Let's call this card 'x.'
Now, the odds of opening with card x are approximately 34%. That's about one duel in three, or roughly once per match.
Now let's look at a searchable card. In this case, you run 2 searchers for a card you play at three in your 40 card deck. Technically, you have 5 cards that'll let you see card x.
The probability of opening with either card x or one of its searchers is now 51%. Yes, now you open with the card about once a match for sure, and will even open with it twice in a match far more often. This is a case where Upstart doesn't make much of a difference: 37 cards or not, if you have 5 outs to a card in your deck, you're bound to draw into it. In fact, Upstart only increases this number by approximately 3-4%. In other words, you open with it one extra time every 8 matches or so. Not much difference in this case.
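For anyone who wants to check these numbers, here's a quick hypergeometric sketch in Python (the card counts are the ones used above; "effective 37 card deck" treats three Upstarts as pure thinning):

from math import comb

# Chance of seeing at least one of `copies` outs in a 5-card opening hand
def p_open(copies, deck_size, hand_size=5):
    return 1 - comb(deck_size - copies, hand_size) / comb(deck_size, hand_size)

print(round(p_open(3, 40), 2))   # ~0.34: a 3-of in a 40 card deck
print(round(p_open(5, 40), 2))   # ~0.51: the 3-of plus 2 searchers
print(round(p_open(5, 37), 2))   # ~0.54: same 5 outs in an effective 37 card deck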
What if you're looking to draw into doubles, but of two different sets of cards? If you're using three Destiny Draws and three Destiny HERO Dogmas, what are the chances of drawing into one of each in an opening hand?
The probability of this is just about 12%. Upstart’s influence here is limited, increasing this number by around 1.5%.
Up till now, the numbers haven't completely justified the case. Opening with your key cards gets more likely by about one game in a bunch of matches. Not magic, and some find it too negligible. Upstart improves the chances of drawing into cards you already run a lot of, and the more the merrier. But packing multiples of cards leads to the dreaded dead draw. As I dig deeper, though, Upstart starts proving its value.
However, the true case for Upstart is made when looking at another interesting situation. Up till now, Upstart has only seemed to slightly increase the chances of drawing into all your cards in an opening hand. In other words, Upstart only marginally improves your opening hand.
Where it matters
Now let's look at a situation that is present in many decks, the situation where Upstart makes the real difference. Many meta decks and archetypes tend to have intricate combinations which require the presence of 2-3 cards. These combinations, while hard to draw into, provide a winning platform for the deck. It's like opening with a Summoner's Art, Genome, and Vanity's with Qliphorts. A relishing prospect, but a rare one.
My Dark World 'golden combination' revolves around drawing into three cards: Mask Change Second, Grapha, and Tour Guide of the Underworld. This combination, when played without any interference, can churn out a Grapha and a Dark Law on the field, along with a Broww in the hand. What are the chances of this? In my 37 card deck, the chances of opening with it are just about 3%, with a negligible difference from Upstart.
But Upstart Does Help
But what do these statistics tell you about drawing into these cards later in the duel? Upstart increases the chances of opening with cards, but what about the chances of drawing into specific cards as the game goes on? The math behind this takes more number crunching, but Upstart substantially boosts the probability of drawing into certain cards as the game wears on. In my Dark World deck, as I activate Dealings and Allure of Darkness and draw by the handful, the thinning effect takes its toll. Indeed, my strongest moves not only generate ridiculous card advantage, but can thin the deck by 10+ cards in a turn. Once, going second, I thinned down to the point of having 20 cards left in my deck. No, I didn't use a mill engine either. Not every deck is like this, but most competitive decks that search actively once per turn (or use draw power) will feel a positive effect.
In any case, Upstart significantly improves the chances of drawing into cards on later turns, by a few percent I believe. This is all the more important when you consider searchers. My Tour Guide and Grapha are searchable, which leaves Mask Change Second. Considering the rapid deck thinning, I should draw into a Mask Change Second within my first 3-5 turns. What does this mean? Consistent Dark Law. The rate at which the average deck searches means that unsearchable cards are seen far more frequently in 37 card decks, because Upstart substantially increases the odds of seeing a specific card as we draw further into our deck (see the rough sketch below). I could always have used 3 Mask Change Seconds and dropped a searcher, some will say. This brings us to my next point.
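Here's a rough sketch of that later-turn effect. The copy count (2) and the number of cards seen (15) are assumptions purely for illustration:

from math import comb

# Chance that at least one copy appears among the first `cards_seen` cards of the deck
def p_seen(copies, deck_size, cards_seen):
    return 1 - comb(deck_size - copies, cards_seen) / comb(deck_size, cards_seen)

print(round(p_seen(2, 40, 15), 3))   # ~0.615 in a 40 card deck
print(round(p_seen(2, 37, 15), 3))   # ~0.653 in an effective 37 card deck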
Why not use more searchers rather than upstarts?
The fact is, many people would rather fill their deck with searchers than with Upstart Goblin. Why should Malefic decks use Upstarts when they can use Terraforming to grab their field spell? Why should Noble Knights use Upstarts when they can cram more equip spells into their deck? All valid but reckless questions. One only needs to look at the dead draw. Direct searchers are good, but above all, they are used primarily for important cards. You won't see players use 3 searchers for a card they don't necessarily need more than one copy of at a time. Upstart provides consistency by increasing the probability of drawing into your cards, without forcing skilled deck-builders into dead draws.
The best example of this can be seen in the plight of Odd-Eyes Pendulum Dragon. Considering how important Scout is to Qliphorts, it had 3 searchers in Summoner's Art, and many people doubled up with the pendulum dragon itself. The result? Burning Abyss massacred Qliphorts at many consecutive YCS and regional events. Odd-Eyes was slow, and it led to various dead draws and average hands. The Yugioh community realized this, slowly cutting Odd-Eyes to 1 or no copies. In its place, Upstart Goblins joined the ever-present Pot of Dualities as deck thinners, leaving an effective 34 card Qliphort deck. Daniele Stella famously won YCS Milan with such a deck, beating a BA player in the final. The case to be made here is that Upstarts can increase the consistency and playability of your deck without forcing you into poor card choices. That's why I opted for three Upstarts over extra Mask Change Seconds and searchers.
The fact is, with Upstart, each card you pick up from your deck is more and more likely to be the Raigeki you need. Another benefit of running 'triple Upstart Goblin' is that instead of doubling up on Dark Hole and Raigeki, you can use fewer copies, knowing they'll reach your greedy hands when you need them to. I found Upstarts particularly useful, making my Mask Change Second all the more effective when I needed it.

Comparing Sports Legends

When Novak Djokovic beat Roger Federer to win the Wimbledon men’s singles championship on July 12th, he gave his supporters fresh ammunition to argue that he is playing better tennis than anyone in history. It was his 14th victory in his past 21 matches against the Swiss maestro.
Younger fans might presume that only Mr. Federer’s superlative run from 2004-09 could compete with Mr. Djokovic’s dominance. But those with longer memories could make a compelling claim for Rod Laver, who won a record 200 tournaments from 1956-76, or even Bill Tilden, who dominated the 1920s. Mr. Federer’s oft-cited status as the best player ever, and Mr. Djokovic’s as the heir apparent, rest on a widely held but hard-to-prove assumption: because the quality of play has increased so much over time, today’s finest sportsmen must be superior to their predecessors.



Cross-era comparisons are easiest in sports like running, jumping and weightlifting, which are measured in units like time, distance or mass. In general, performance in such contests has improved substantially over the years: the average top-ten finisher in the men’s 100-metre sprint has cut his time from 11.2 seconds in 1900 to just under ten now, and in the marathon from around two hours and 35 minutes in 1939 to two hours and five minutes today. The gains have been greater still in events that require complex equipment or techniques: the current pole-vault world record, at 6.16 metres, is over 50% higher than the best height a century ago.



However, the pace of progress has tended to slow. Most events—with the men’s 100 metres an exception—have settled into a plateau, where new world records are set less often and surpass the old marks by smaller margins. For example, the best men’s 800-metre time has shrunk by a mere 0.82 seconds since 1981, versus almost four seconds in the 26 years before that. And in a few disciplines, improvement has ground to a halt completely. The average times for female short- and middle-distance runners have not budged in 30 years (though some 1980s records by eastern European competitors may have been aided by performance-enhancing drugs). Some “speed limit” is inevitable—humans will never run as fast as an aeroplane, or jump into outer space—and athletes may be approaching it much faster than is widely believed. Mark Denny of Stanford University calculates that most human race times are within 3% of their potential best.



Outside athletics, performance is harder to measure. In bowling, for example, the number of perfect 300 games per year in America rose nearly 40 times during the 30 years to 1999. But connoisseurs attribute much of this to strategically oiled lanes that guide the ball towards its target, rather than any broad-based gain in skill. Golf has demonstrated the opposite pattern: in response to better players wielding better clubs, designers have built longer golf courses with more hazards, such as lakes and bunkers.



Yet even these measurement difficulties pale in comparison with those in interactive sports, in which opponents affect each other. If players improve at the same rate, scoring levels will remain flat. The challenge of comparing players from different eras in games like football—Pelé, Maradona or Messi?—has fuelled many a bar-room brawl. But analysts have devised a few statistical methods to resolve these debates, and estimate how the greats of the past might fare against modern competition.



In a 1985 essay Stephen Jay Gould, a Harvard biologist, proposed using variance among athletes to measure quality of play. If a sport draws on a small population of potential players, mediocre ones will be able to get jobs. Facing inconsistent opposition, the best will produce outstanding results. In contrast, in a sport with a large talent pool, everyone who plays professionally will be reasonably excellent. As a result, the best players will be closer to the average. Gould concluded that the more individual performances in a league differ from each other, the weaker it is likely to be. 



This principle underlies a study by Charles Davis, an Australian researcher. He calculated the standard deviation—a measure of how closely clumped together or spread out players’ performances are—of cricketers’ batting averages in different time periods. He found that variance among batsmen was indeed about 25% lower in 2000 than during the 1930s, when Don Bradman, widely regarded as the sport’s greatest player, was at his peak. However, Bradman exceeded the average of his peers by an unparalleled 4.4 standard deviations, making him a one-in-100,000 outlier. That suggests that he would still be in a class of his own today, though his Test-match average might be in the 70s or 80s rather than his actual 99.94.
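Assuming batting averages are roughly normally distributed (an assumption the article does not spell out), the rarity of a 4.4 standard-deviation outlier can be sketched like this:

from math import erfc, sqrt

# One-sided tail probability of a normal distribution beyond 4.4 standard deviations
z = 4.4
tail = 0.5 * erfc(z / sqrt(2))
print(tail)   # ~5.4e-06, roughly a 1-in-185,000 event

That is the same order of magnitude as the one-in-100,000 description above, though the exact figure depends on the distributional assumption.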



Another approach is to look for natural experiments buried within interactive sports. Perhaps the best one can be found in baseball. During World War II, most of the best baseball players were sent off to fight. In order to replenish their rosters, teams had to hire new players. These new players were much worse than the ones they replaced--at the skills they were hired to perform. So the replacement hitters were much worse at hitting than the hitters who went off to war, and the replacement pitchers were much worse at pitching than the pitchers who went off to war.



However, pitchers are only selected for their ability to pitch. They are not selected for their ability to hit. As a result, the replacement pitchers were probably just as good at *hitting* as the pitchers who went off to war were.



This creates a natural experiment. The group of pitchers who played in MLB during the war (the "guinea pigs") were equivalent hitters to the group of pitchers who played in MLB before and after the war. But they were batting against a group of pitchers who were much worse at pitching (the "laggards") than the group of pitchers who played in MLB before and after the war. As a result, we would expect pitchers that played during the war to produce better hitting numbers than pitchers who played before and after the war did, because they were of equal quality as hitters but faced far weaker competition. Sure enough, that's exactly what happened. Pitchers hit about 60% as well as non-pitchers in 1942 and 1946, but 65-66% as well as non-pitchers from 1943-45. This is pretty strong evidence that hitting by pitchers is a good measure of overall quality of play: during the wartime years, when we *know* there was a sharp decline in quality of play, pitchers hit much better than they did in the surrounding seasons.
With the hypothesis that pitcher hitting is a good measure of quality of play supported by the war years, we can then zoom out and examine how pitchers have hit over the entirety of baseball history. Their hitting has gotten steadily worse over time. In the early 1870s, when the game was in its infancy, they hit 90% as well as non-pitchers. By 2015, they hit only about 45% as well, a gap of 45 percentage points. That suggests that a league-average hitter in 1871 would hit about 55% as well as an average hitter today.



Next, we apply this method to Babe Ruth. In the 1920s, pitchers hit about 65% as well as non-pitchers. Today, they hit 45% as well. So we can subtract 20 percentage points from Ruth's numbers to get an estimate of how well he would perform today. He hit about 45% better than average during his time, so that equates to about 25% better than average today. The best modern players, like Mike Trout or Miguel Cabrera, hit right around 25% better than average.
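The adjustment above is plain percentage-point subtraction, spelled out in this small sketch (the numbers are the ones quoted in the paragraph):

pitcher_ratio_1920s = 0.65   # pitchers hit ~65% as well as non-pitchers in the 1920s
pitcher_ratio_today = 0.45   # ~45% today
era_gap = pitcher_ratio_1920s - pitcher_ratio_today   # 20 percentage points

ruth_vs_average_1920s = 1.45   # Ruth hit ~45% better than the average hitter of his era
ruth_vs_average_today = ruth_vs_average_1920s - era_gap
print(round(ruth_vs_average_today, 2))   # ~1.25: about 25% better than a modern average hitter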



Finally, we can use this measure to estimate an overall effect on wins and losses. A league-average team will by definition win half of its games. If you subtract 20 percentage points of offense--so they score 80% as many runs as average, while allowing an average total--that team would be expected to win about 38% of its games.
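The article does not say how a 20-point drop in run scoring becomes a winning percentage; one common rule of thumb that lands close to the quoted figure is the Pythagorean expectation, used here purely as an assumed stand-in:

runs_scored_ratio = 0.80    # the team scores 80% as many runs as an average team
runs_allowed_ratio = 1.00   # and allows an average number of runs

# Pythagorean expectation with exponent 2 (an assumed rule of thumb, not the article's stated method)
win_pct = runs_scored_ratio ** 2 / (runs_scored_ratio ** 2 + runs_allowed_ratio ** 2)
print(round(win_pct, 3))    # ~0.39, close to the ~38% quoted above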



Fortunately for fans of Mr. Djokovic, tennis seems to have improved faster than bat-and-ball games. In 2014 Jeff Sackmann, a statistical analyst, examined the performances of players since 1970 who were ranked in the top 50 for two consecutive years. He found that they scored an average of 2.2% fewer return points against other top-50 opponents in the second season than the first, because the players who entered the group in the second year were better than the ones they had replaced. Compounded over 44 years, that pace of improvement suggests that Mr. Laver would struggle to win a single game, let alone a set or match, against Mr. Federer or almost any other modern opponent. And unlike the plateaus seen in many forms of racing, the rate of progress has slowed only modestly to 1.5% in recent years. Even Mr. Djokovic will probably pale in comparison to future talent.
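As a back-of-the-envelope sketch, compounding that 2.2% annual decline over the 44 years from 1970 to 2014 gives a sense of the gap (this ignores how return points convert into games and sets):

annual_decline = 0.022   # a fixed top-50 player wins ~2.2% fewer return points against each new year's field
years = 2014 - 1970

relative_level_1970 = (1 - annual_decline) ** years
print(round(relative_level_1970, 2))   # ~0.38: a 1970-level player would win only ~38% as many return points against a modern field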