Published Articles

UNFORGETTABLE PEOPLE IN AN UNFORGETTABLE TIME

The article below, published in „International Gymnasts“ Magazine, is very important to me because it was written by Debbie Poe, an amazing American journalist who provided me with a number of very professional photos for my book „One Coach’s Journey from East to West…“

I am so thankful to this talented journalist, who gave me the feeling that I was doing important work in writing the „Book of my Life“ during the hardest time of my coaching career.

Below is Debbie Poe’s article for IG Magazine about one of my best students from Gymnastics World of Georgia, Samantha Stadmiller:

I have uploaded this article and placed it among the „Published Articles“ in a very important section of my website called „Clear Circle Gymnastics Magazine“. As stated in my introduction to this section:

„Everything needs to be clear, as it is clear in the Gymnastics Free Hip Circle …“

This is why all I would like to add is a photo of myself, Mrs. Debbie Poe and my daughter Olesya Zaglada during our unforgettable trip to New York.

You are always in my mind, UNFORGETTABLE Debbie Poe…

DOUBLE „A“ INTERVIEW: WHO IS THE WINNER AND WHAT NEEDS TO CHANGE!?

I think the photo that I borrowed from the official website of the Russian Gymnastics Federation is the best addition to the interview, which you will surely enjoy reading.

Below is a link to the interview that VTB Media conducted with Alexander Alexandrov, former Head Coach of the Russian Women’s Team, after the 2012 Olympic Games in London, Great Britain.

I am publishing this interview in my „Clear Circle“ Magazine for a very important reason: everything must be clear, which is the main requirement of the Gymnastics Free Hip Circle. I was very excited and satisfied reading this interview, because everything Double „A“ said was the Truth! He shared his very professional thoughts about American WAG and made a brief (but very serious) analysis of the steps the Russian Women’s Team needs to take in order to be the BEST!

This interview was published on September 12, 2012. Then, to my great surprise, I got the news that Double „A“ had decided to quit his work with the Russian National Team!? Was I really disappointed!? Of course… This is why I am adding to the interview below my own thoughts on why the fact that Russia is losing its Best Coaches still remains in the shade.

https://vtbrussia.ru/sport/gymnastic/kto-na-samom-dele-vyiigral-olimpiadu/

The comments below were originally written in Russian. I wrote them after I got the news that Coach Alex Alexandrov was working as National Coach for the Brazilian Team and was really worried about the future of Aliya Mustafina. I immediately sent him a personal email and talked to him on the phone …

„I did my best for Aliya and for the Russian Team!“ That is what „Double A“ said at the end of our discussion.

ALEXANDER ALEXANDROV: A COACH WHO KNOWS HOW TO WIN

Let me make a short excursion into what is now the history of the new Russian gymnastics and quote an excerpt from my book, published in the USA at the end of 2010. I hope that many of those who have read my book remember well that “С Гимнастикой по Жизни: Годы, События, Люди” (“Through Life with Gymnastics: Years, Events, People”) was published just a few months after the World Championships in the Dutch city of Rotterdam in the autumn of 2010, which ended in victory for the Russian women’s team:

«… I firmly believe that despite the obvious decline of Russian gymnastics, it cannot be written off as a contender at top international competitions and the Olympics. Believe me: when someone is fated to be strong and has indeed already been so, temporary setbacks can be deceiving. Russian gymnastics, with its tradition and history, will do all that it can to return to the ranks of the leaders of world gymnastics and challenge the American powerhouse. This is the law the great live by: to be strong among equals and to achieve victory over the mighty.»

But believe me, it was not only my knowledge of the former strength and history of Russian women’s gymnastics, nor only the triumphant victory at the 2010 World Championships of the Russian women’s team, led by its mentor Alexander Alexandrov, that made me believe that Russia would return to the ranks of the leaders of world gymnastics, and made me say so out loud, and from the pages of a book published in two languages in the USA at that.

Something much stronger than a premonition was telling me inside: if the changes have not yet happened, they certainly will, and very soon. After all, I understood that the team victory and the title of all-around World Champion won by Aliya Mustafina did not yet guarantee real winning results at the 2012 Olympic Games in London. London was then only two years away and, as it seemed to me, the cherished dream of Russian Gymnastics, to win at the London Olympics, could still come true…

What influenced me so strongly, and what gave me such a high degree of confidence that the series of failures of the Russian gymnasts could finally end and that a golden rain of Olympic medals would become a reality?! How much in our lives depends on chance?! And such a chance, a lucky one for our Russian gymnasts and coaches, seemed to have come! The team was headed by Alexander Alexandrov, formerly the personal coach of the legendary Dmitry Bilozerchev and a former head coach of the USSR women’s national team, with which he had worked for a number of years.

But I believed mostly because it was he, Sasha Alexandrov, widely known among foreign specialists as „Double A“ (Alex Alexandrov), who had shown the highest results while working in the USA for a long time. And although in America this top professional, who had given almost his entire life to Soviet and Russian gymnastics, worked mainly with elite gymnasts, his knowledge of the American gymnastics system grew so much that he became practically the only candidate for the role of LEADER OF RUSSIAN GYMNASTICS capable of beating the Americans!

And he was able to lead the Russian team to victory at the 2010 World Championships by uniting the coaching staff of the Russian national team into a single fist against a very powerful American team. And he did this despite the defeats and failures at a number of previous World Championships.

It seemed simply impossible to fight the Americans, who had gained real strength. But what many, like me, still consider a miracle did happen: two leaders met, two partners equal in strength and spirit, a coach and a gymnast! They found each other because they were made for each other, in order to WIN and return the Russian team to the leading positions in world gymnastics.

Having read the article about the fate of Aliya Mustafina, told on the pages of „Sport Express“ by her former mentor Alexander Alexandrov, I consider it necessary to ask all the specialists who built gymnastics in Russia and brought Russia glory in the international arena not to ignore this sincere account by an outstanding Russian coach who, «not of his own will, but by the “tsar’s” command», found himself in faraway Brazil and is worried about the fate of his pupil.

In publishing this post, I invite Coaches & Gymnasts and all people who love Artistic Gymnastics to take part in a discussion on the theme “The Coach & Gymnast Are Partners Forever!“

 

NEW PUBLICATION IN ENGLISH AND IN RUSSIAN…

As I promised earlier in one of my publications on this site, today I am presenting one of the most interesting books on the judging of gymnastics competitions, written by Yuri Mikhailovich Obodovsky, a Judge of the All-Union Category in artistic gymnastics, who devoted all of his „time free from his main job“ to judging the largest All-Union, All-Russian and International artistic gymnastics competitions.

An excellent knowledge of mathematical analysis and an academic degree allowed Yuri Mikhailovich Obodovsky, a graduate of an elite faculty of the Moscow Aviation Institute, to analyze the judging of a large number of gymnastics competitions from the standpoint of its objectivity.

With the incredible increase in the difficulty of gymnastics exercises on the one hand, and the appearance in a number of countries of a large number of gymnasts who are practically identical in their preparation and very close in terms of their scores on the other, determining the winners and medalists of the largest world events, including the Olympic Games, has become, to put it mildly, more complicated. Often even the smallest difference in the final score for an exercise, resulting from incompetent and sometimes simply biased judging, led to undesirable consequences, and sometimes simply deprived of medals those gymnasts who had really earned them.

It is well known that the objectivity of judging has always been, and remains, under the close attention of the leadership of the International Gymnastics Federation. It was precisely Yuri Mikhailovich Obodovsky’s close professional contacts with the FIG, the key organization in gymnastics, that became a serious basis for his development of a comprehensive methodology for evaluating judges when selecting candidates to judge the largest international competitions.

And Yuri Mikhailovich Obodovsky proposed such a methodology, in effect developing a convenient form of certification of panels for judging gymnastics competitions of any level.

The main basis of such judge certification is not only the personal data of prospective judges and formal confirmation of their judging categories, but also statistically substantiated results of the evaluation of their „judging creativity“, presented in detail in electronic format.

Taking into account the relevance and timeliness of this methodology, I obtained the author’s consent to publish the full text of his unique book on my website. I will note that I am placing this publication on the pages of the virtual magazine „ПЕРЕШМЫГ“, in which, as is known, everything must be clean, honest and extremely interesting for everyone.

So, the time has come to talk about the cleanliness and transparency of judging in all seriousness, as they say! So, WHO ARE THE JUDGES!?

 

Yu. M. OBODOVSKY

BATTLES OF JUDGES AT GYMNASTIC PLATFORMS  (abridged version).

 

Problems of the quality of judging in sports in which the winner is determined not by a direct result (time, weight, length or height) but by the decision of a group of people called the judging panel have long troubled sports fans. So-called “subjective judging” has always caused many disputes and complaints.

In order to offer specialists and gymnastics enthusiasts ways of solving the judging problem, I published a book, “Battles of judges at gymnastic platforms”, in which I propose strict, scientifically grounded criteria for the quality of judging. It shows many judging “tricks” that spectators do not know about, participants do not suspect, and sports officials do not want to notice. Moreover, many judges and gymnastics officials do not even realize that all of this can be found directly in the judges’ protocols. One only needs to analyze them correctly. The book presents a large amount of information, with the justification for the methodology and the results of applying it.

Below I present an abridged electronic version of my book, similar to the dissertation abstract customary in our country. This version presents the foundations of the methodology and the main results of the analysis of judging quality. The information underlying the conclusions is given only selectively.

From an engineering point of view, the work of judges at artistic gymnastics competitions is the work of a measurement system. The judging panels on the individual apparatus are measuring complexes, and each judge is a sensing element of the measuring complex, or a sensor. Comparing the competition process with the operation of a complex measurement system allowed me to make full use of the apparatus of mathematical statistics to create a methodology for assessing the quality of judging.

I first actively used the methodology, which I developed at the suggestion of L.Ya. Arkaev, in analyzing the quality of judging at the 1998 European Championship (EC-98) and the 1999 World Championship (WC-99). I reported the results of this analysis on March 18, 2000 at a meeting of the heads of the gymnastics federations of the CIS member countries and the Baltic countries, attended by FIG President B. Grandi, FIG Secretary General N. Busch and FIG Honorary President Yu.E. Titov. After this report I analyzed the quality of judging at EC-2000 and the 2000 Olympic Games (OG-2000).

 

  1.   Main provisions of the methodology for analyzing the quality of judging. 1998-2000.

 

1.1. On the basis of the research carried out while developing the methodology, I can state that the methods of mathematical statistics are adequate for analyzing the quality of the judging of execution (B panels) in artistic gymnastics. The judging of difficulty (A panels) is not amenable to the methods of mathematical statistics.

1.2. The research showed that the deviations of judges’ scores from the average scores follow the normal distribution law. I can state the same for rhythmic gymnastics.

At the same time, I have reason to believe that no one before me has studied the distribution law of the deviations of judges’ scores.

1.3. It is shown that the standard deviation of the average execution score makes it possible to monitor the quality of judging during a competition.

1.4. A methodology has been developed for monitoring “joint judging”, i.e. judges giving scores on the basis of conversations or by peeking at the scores of the judge sitting next to them.

1.5. The process of manipulating scores, which is a convenient way of masking biased judging, has been analyzed.

1.6. A set of indicators has been compiled for a comprehensive assessment of the quality of judging:

M – the mean of the deviations of a judge’s scores from the average scores, with a constant composition of the judging panel;

m – the mean of the deviations of a judge’s scores from the average scores for each controlled group;

m’ = m – M – the mean of the deviations of a judge’s scores from the average scores for each team, corrected for the judge’s tendency to underscore or overscore;

S – the standard deviation of the whole set of deviations of a judge’s scores from the average scores, with a constant composition of the judging panel (a measure of the accuracy of the judge’s judging);

s – the root-mean-square deviation of the scores of the judges in the panel from the average score (an exhaustive measure of the accuracy of the average score);

σ – the standard deviation of the deviations of a judge’s scores from the average scores in the controlled set;

r – the correlation coefficient between the deviations of judges’ scores from the average scores, characterizing the degree of “joint judging” of a pair of judges in the panel.

Cases in which the correlation coefficient exceeds the admissible positive value should be regarded as cases of “joint judging”. The larger this coefficient, the more closely a judge’s judging is tied to that of another judge in the panel. Cases in which the correlation coefficient is below the admissible negative value are caused by the presence of elevated positive coefficients in the table.

t – a dimensionless coefficient used as an indicator of the objectivity or bias of an individual judge. The value of t is calculated for the controlled group on the basis of the indicators above. If t falls within the range we have set, the judging is considered objective. If t is below the lower limit of the range, the conclusion is that the judge statistically significantly underscores the gymnasts of the selected group. If t is above the upper limit of the range, the conclusion is that the judge statistically significantly overscores the gymnasts of the selected group.

R – the range, defined as the difference between the maximum and minimum deviation of the judge’s score from the average score in the controlled set of scores, divided by S. If the range exceeds the upper admissible value, this indicates manipulation of scores by the judge.

A value of R above the upper admissible value may be a sign of manipulation of scores in order to hide bias.

Sometimes it is the reaction of a judge who has noticed some systematic pattern in his scores and found it necessary to change the character of his judging to eliminate it.

A value of R below the lower admissible value for several judges in a panel may be a sign of collusion.

I consider this set of indicators exhaustive; however, it may need correction if the competition rules change.

1.7. It has been established that the absence of a comprehensive approach, with the use of simplified methods, leads to serious distortion of the results of the analysis of judging quality. Situations may arise in which judges who are not the best are rewarded and judges who are not the worst are punished.

1.8. The methodology additionally makes it possible to identify “group indicators of judging”:

  • biased judging by judges from one country of participants from another country;

  • the emergence of judging alliances.

1.9. A system of individual cards has been developed for each judge, making it possible to monitor the judge’s activity throughout the period of his participation in judging major competitions. These cards can be used successfully to monitor the quality of judging, and they can be used to predict the quality of individual judges’ judging at upcoming competitions. A sample of such a card, with detailed comments, can be found in my book.

 

Note: the full text of the book in Russian, kindly provided to me by the author, can be read at the link below:

Abridged version

Below is the link to the English Version of the book

Dr. Yu. OBODOVSKY

BATTLES OF JUDGES AT GYMNASTIC PLATFORMS  (abridged version)

Problems with the quality of judging in sports in which the winner is determined not by a direct result (time, weight, length or height), but by the decision of a group of persons called the panel of judges, have long concerned sports fans. So-called “subjective judging” has always caused many disputes and complaints.

In order to offer professionals and gymnastics enthusiasts ways to solve the problem of judging, I published a book, “Battles of judges at gymnastic platforms”, in which I offer strict, scientifically based criteria for the quality of judging.

I hope to show in it many judges’ “tricks” that the viewer does not know about, that participants do not suspect, and that sports functionaries do not want to notice. Most judges and gymnastics managers do not even realize that all of this is recorded directly in the judges’ protocols. It is only necessary to analyze them correctly.

Below I present an abridged electronic version of my book, similar to the dissertation abstract accepted in our country. This version presents the basis of the methodology and the main results of the analysis of the quality of judging. The information underlying the findings is presented only selectively.

From an engineering point of view, a gymnastics competition is a process of testing submitted samples, and the work of the judges is the work of a measurement system.

The judging panels on the individual apparatus are measuring complexes, and each judge is a sensitive element of the measuring complex, or a sensor. Comparing the competition process with the operation of a complex measuring system allowed me to make full use of the apparatus of mathematical statistics to create a methodology for assessing the quality of judging.

I first actively applied the methodology to analyze the quality of judging at the 1998 European Championship (EC-98) and the 1999 World Championship (WC-99).

On March 18, 2000, at a meeting of the leaders of the gymnastics federations of the CIS member countries and the Baltic countries, I reported the results of this analysis. FIG President B. Grandi, FIG Secretary General N. Busch and FIG Honorary President Yu.E. Titov attended this meeting.

After this report I analyzed the quality of judging at EC-2000 and OG-2000.

 

  1. The main principles of the methodology of judging quality analysis. 1998-2000.

 

1.1. Based on the research conducted in the development of the methodology, I can state that the methods of mathematical statistics are quite adequate for analyzing the quality of the judging of execution (B panels) in gymnastics. The work of the A panels (judging of difficulty) cannot be evaluated by mathematical statistics.

1.2. Studies have shown that the deviations of judges’ scores from the average scores follow the normal (Gaussian) distribution law. I can say the same about rhythmic gymnastics.

At the same time, I have reason to believe that no one before me has investigated the distribution law of the deviations of judges’ scores.

1.3. It is shown that the standard deviation of the average execution score makes it possible to monitor the quality of judging during the competition.

1.4. The concept of “joint judging” has been introduced. “Joint judging” means that judges give scores on the basis of negotiations or by peeking at the scores of the judge sitting next to them. A technique for monitoring such judging has been developed.

1.5. The process of manipulating scores, which is a convenient way to mask biased judging, is analyzed.

1.6. A set of indicators for a comprehensive assessment of the quality of judging has been compiled:

M – the average value of the deviations of a judge’s scores from the average scores, with a constant composition of the panel of judges;

m – the average value of the deviations of a judge’s scores from the average scores for each controlled group;

m’ = m – M – the average deviation of the judge’s scores from the average scores for each controlled group, adjusted for the judge’s tendency M to underscore or overscore;

S – the standard deviation of the whole set of deviations of the judge’s scores from the average scores, with a constant composition of the panel of judges (a measure of the accuracy of the judge’s judging);

s – the root-mean-square deviation of the scores of the judges in the panel from the average score (a comprehensive measure of the accuracy of the average score);

σ – the standard deviation of the deviations of the judge’s scores from the average scores within the controlled group;

r – the correlation coefficient between the deviations of judges’ scores from the average scores, characterizing the degree of “joint judging” of a pair of judges in the panel.

Cases in which the correlation coefficient exceeds the permissible positive value should be considered cases of “joint judging”. The greater this coefficient, the more closely a judge’s judging is tied to that of another judge in the panel. Cases in which the correlation coefficient is less than the permissible negative value are due to the presence of increased positive coefficients in the table.

t – a dimensionless coefficient used as an indicator of the objectivity or bias of an individual judge. The value of t is calculated for the controlled group based on the above indicators. If t falls within our designated range, the judging is considered objective. If t is less than the lower limit of the range, the conclusion is that the judge statistically significantly underscores the gymnasts of the selected group. If t is greater than the upper limit of the range, the conclusion is that the judge statistically significantly overscores the gymnasts of the selected group.

R – the range, defined as the difference between the maximum and minimum deviation of the judge’s score from the average score in a controlled set of scores, divided by S.

A value of R above the upper allowable value may be a sign of manipulation of scores in order to hide bias.

Sometimes it is the reaction of a judge who has noticed some systematic pattern in his judging and found it necessary to change the nature of his judging to eliminate it.

A value of R below the lower allowable value for several judges in the panel may be a sign of collusion.

I consider this set of indicators comprehensive; however, it may require correction if the rules of the competition change.

1.7. It is established that the lack of an integrated approach, with the use of simplified methods, leads to serious distortion of the results of the analysis of the quality of judging. Situations may arise in which judges who are not the best are rewarded and judges who are not the worst are punished.

1.8. The developed methodology additionally makes it possible to identify „group judging indicators“:

  • biased judging by judges from one country of participants from another country;

  • the emergence of judges’ alliances.

1.9. A system of individual cards has been developed for each judge, making it possible to monitor the activity of the judge during the entire period of his participation in the judging of important competitions. These cards can be used successfully to monitor the quality of judging, and from them one can predict the quality of judging of individual judges at upcoming competitions. A sample of such a card, with detailed comments, can be found in my book.
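As a rough illustration of the indicators listed in item 1.6, the sketch below computes M, S, m, m’, σ and R for one judge from a list of deviations. The data, the function name `judge_indicators` and the use of the sample standard deviation (n − 1 in the denominator) are assumptions made for this example. The text gives neither the formula for t nor the admissible limits, so neither is computed here.

```python
from statistics import mean, stdev

def judge_indicators(all_devs, group_devs):
    """Indicators for one judge.

    all_devs   -- deviations of the judge's scores from the panel average,
                  over every routine judged by the same panel
    group_devs -- the subset of those deviations for one controlled group
                  (for example, the gymnasts of one country)
    """
    M = mean(all_devs)        # overall tendency to underscore or overscore
    S = stdev(all_devs)       # overall scatter (sample std, an assumption)
    m = mean(group_devs)      # mean deviation within the controlled group
    sigma = stdev(group_devs) # scatter within the controlled group
    R = (max(group_devs) - min(group_devs)) / S  # range in units of S
    return {"M": M, "S": S, "m": m, "m_prime": m - M, "sigma": sigma, "R": R}

# Hypothetical deviations, not taken from any real protocol.
all_devs = [0.05, -0.10, 0.00, 0.10, -0.05, 0.15, -0.15, 0.00, 0.10, -0.10]
group_devs = [0.10, 0.15, 0.10]

ind = judge_indicators(all_devs, group_devs)
for name, value in ind.items():
    print(f"{name}: {value:.3f}")
```

In this invented case M is zero, so the judge has no overall tendency, while m’ is clearly positive for the controlled group. Whether such a value is significant depends on the t limits, which are set out in the book rather than here.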

 

  2. The results of applying the methodology for analyzing the quality of judging for the period 1998-2000

 

With the technique I propose, unfair or unqualified judging can be identified objectively and with a high degree of reliability, and I believe the technique can easily be adapted not only to gymnastics but also to a number of other sports in which the result is determined by judges’ scores.

For rhythmic gymnastics, I checked this on the 2001 European Championship in Geneva and the “Champions Cup 2001”, held in Moscow. A total of 1544 scores given by ten judging panels were processed. The results fully confirmed the applicability of the technique to rhythmic gymnastics.

When monitoring the largest competitions of 1998–2000 (EC-98, WC-99, EC-2000, OG-2000), I processed:

  • more than 40,000 judges’ scores;

  • 140 B panels, in which 235 judges were examined;

  • more than 7000 team results (by a team result I mean the result of one team on one apparatus in one type of competition);

  • 1200 statistically representative groups of scores.

As a rule, I chose the participants from one country as such a group. Less commonly, these were groups of scores within a certain range. Sometimes I tried to single out groups depending on the competition schedule.

Processing this volume of information made it possible to reliably obtain quantitative indicators of the quality of judging for each judge at each competition.

Let us dwell separately on the Olympic Games.

At OG-2000 the judges of the B panels gave 10008 scores.

The B panels were composed of the following judges:

  • men – 43 judges, of whom 38 represented participating countries;
  • women – 26 judges, all representing participating countries.

Judges who statistically significantly overscored their compatriots:

  • men: 66% of judges (25 of 38)
      – competition I: 54% (19 of 35)
      – competition II: 57% (13 of 23)
      – competition IV: 89% (8 of 9)

  • women: 65% of judges (17 of 26)
      – competition I: 50% (12 of 24)
      – competition II: 32% (7 of 22)
      – competition IV: 45% (6 of 13)

Joint judging was recorded in 43 cases in 15 judging panels:

  • men:
      – competition I: 33% (12 judges)
      – competition II: 22% (8 judges)
      – competition IV: 40% (14 judges)

  • women:
      – competition I: 17% (4 judges)
      – competition II: 13% (3 judges)
      – competition IV: 8% (2 judges)

In two judging panels only three judges judged independently, and in three panels only four did.

 

                 2.1.1. Average score accuracy

 

Let us turn to an example: the chair of the WTC, Jackie Fie, intervened in determining the final vault scores of two gymnasts from Russia, S. Khorkina and E. Produnova, during the women’s competition IV of WC-99.

 

Tab.  2.1.  Scores of individual participants on vault.  World Championship 99.  Comp. IV.

 

Participant | Fed | Final score | Score of J. Fie | Average score | s
Karpenko | UKR | 9.4125 | | 9.4175 | 0.1429
Produnova | RUS | 9.4375 | 9.2000 | 9.4500 | 0.1449
Dong | CHN | 8.725 | | 8.7175 | 0.1472
Slater | AUS | 9.350 | | 9.3425 | 0.1530
Khorkina | RUS | 9.2625 | 9.0000 | 9.275 | 0.1754

 

– final score – the final score, obtained in accordance with the Code of Points of the competition;

– score of J. Fie – the score obtained after the correction by the chair of the WTC;

– average score – the average score determined from the scores of all 6 judges;

– s – the root-mean-square deviation of the scores of the individual judges B1-B6 from the average score (a measure of the accuracy of the average score).

In Table 2.1, Khorkina’s score has the worst s indicator on the apparatus, but the accuracy of Produnova’s score is better than that of two participants, Dong and Slater. Yet Ms. J. Fie did not correct the scores of those participants.

The US gymnast White had an even worse score accuracy on beam (s = 0.1800), but Ms. J. Fie did not correct that score either.

Such actions by the head of the WTC, taken without mathematical justification, could arouse suspicion of bias.

The question may be asked: “Why are deviations considered relative to the average scores rather than the final ones?” The point is that the analysis concerns the quality of judging, in which all judges take part. Excluding individual scores from consideration would mean losing the most important part of the information: erroneous or unfair scores would be left out of consideration.

The accuracy characteristic s of an individual score makes it possible to reliably identify gross judging errors, and these errors can be detected directly during the competition.
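To make the distinction between the final score, the average score and s concrete, here is a small sketch. It assumes a six-judge panel whose final score drops the highest and lowest scores and averages the remaining four, and it takes s with n − 1 in the denominator. The scores are invented for illustration and are not taken from the WC-99 protocols.

```python
import math

def final_score(scores):
    """Assumed rule: drop the highest and lowest score, average the rest."""
    counted = sorted(scores)[1:-1]
    return sum(counted) / len(counted)

def average_score(scores):
    """Average over all judges, with no scores excluded."""
    return sum(scores) / len(scores)

def accuracy_s(scores):
    """Root-mean-square deviation of the scores from their average (n - 1)."""
    avg = average_score(scores)
    return math.sqrt(sum((x - avg) ** 2 for x in scores) / (len(scores) - 1))

# Hypothetical scores of judges B1-B6 for one routine.
panel = [9.30, 9.40, 9.45, 9.45, 9.50, 9.10]

final = final_score(panel)
average = average_score(panel)
s = accuracy_s(panel)
print(f"final={final:.4f} average={average:.4f} s={s:.4f}")
```

The outlying 9.10 is dropped from the final score, so it leaves no trace there, but it lowers the average and inflates s. This is the information that would be lost if deviations were measured from the final score.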

 

              2.2.  Estimation of judging quality of an individual judge 

 

Again, let us look at an example: the judging of the judge from Malaysia at WC-99.

His judging is presented in Table 2.2.1.

 

 Tab.  2.2.1.  Judging of the judge from Malaysia.  WC-99.  Comp. III.  Floor exercise

 

Gymnast | PUR | VEN | UKR | CUB | MAS | KOR | s
Nemov, RUS | 0.050 | 0.050 | 0.100 | 0.050 | -0.250 | 0.000 | 0.127
Deferr, ESP | 0.042 | 0.092 | -0.058 | 0.042 | -0.058 | -0.058 | 0.067
Xing, CHN | 0.008 | -0.042 | -0.042 | 0.008 | 0.008 | 0.06 | 0.038
Rudnitski, BLR | 0.067 | 0.017 | 0.067 | 0.067 | -0.383 | 0.167 | 0.194
Yang, CHN | 0.017 | 0.017 | -0.083 | 0.017 | -0.033 | 0.067 | 0.052
Melissanidis, GRE | 0.108 | 0.058 | 0.108 | 0.008 | -0.39 | 0.108 | 0.196
Karbanenko, FRA | 0.025 | 0.125 | 0.025 | 0.125 | -0.225 | -0.075 | 0.133
Bondarenko, RUS | 0.125 | 0.075 | 0.125 | 0.025 | -0.375 | 0.025 | 0.189
m | 0.055 | 0.049 | 0.030 | 0.043 | -0.214 | 0.036 |
σ | 0.042 | 0.052 | 0.082 | 0.039 | 0.166 | 0.081 |

He did not take part in judging comp. I, but comp. IV and II fully revealed that this judge did not meet the level of the competition.

Nevertheless, he was included in the panel of judges for comp. III. Of his 8 scores, 5 were excluded, 2 fell on the boundary together with other scores, and only 1, for a participant from China, “fell” into the counted set.

The judging of the judge from Malaysia was a very revealing, but rare, case in which insufficient qualification was exposed without serious research.
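The m, σ and s values in Table 2.2.1 can be recomputed from the deviations printed in the table. The sketch below does this for the MAS column and for the first row (Nemov, RUS), assuming the sample standard deviation (n − 1 in the denominator). Under that assumption the results agree with the printed values to within the rounding of the table entries. The variable names are illustrative.

```python
import math
from statistics import mean, stdev

judges = ["PUR", "VEN", "UKR", "CUB", "MAS", "KOR"]

# Deviations from the average score, one row per gymnast, as printed
# in Table 2.2.1 (columns in the order of `judges`).
rows = [
    [0.050, 0.050, 0.100, 0.050, -0.250, 0.000],
    [0.042, 0.092, -0.058, 0.042, -0.058, -0.058],
    [0.008, -0.042, -0.042, 0.008, 0.008, 0.06],
    [0.067, 0.017, 0.067, 0.067, -0.383, 0.167],
    [0.017, 0.017, -0.083, 0.017, -0.033, 0.067],
    [0.108, 0.058, 0.108, 0.008, -0.39, 0.108],
    [0.025, 0.125, 0.025, 0.125, -0.225, -0.075],
    [0.125, 0.075, 0.125, 0.025, -0.375, 0.025],
]

# Column statistics for the judge from Malaysia.
mas = [row[judges.index("MAS")] for row in rows]
m_mas = mean(mas)        # table prints -0.214
sigma_mas = stdev(mas)   # table prints 0.166

# Row statistic s for the first routine: the entries are already
# deviations from the average, so s is their root mean square (n - 1).
first = rows[0]
s_first = math.sqrt(sum(d * d for d in first) / (len(first) - 1))  # table: 0.127

print(f"m(MAS)={m_mas:.4f} sigma(MAS)={sigma_mas:.4f} s(row 1)={s_first:.4f}")
```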

To determine the quality of each judge's refereeing, it is necessary to make full use of the apparatus of mathematical statistics and to establish which statistical characteristics of a judge's refereeing should be considered acceptable. The tolerances I have set are given in Table 2.2.2.

 

Table 2.2.2. Tolerances established for the statistical characteristics of judging at WC-99

     Men                                                Women
     Comp. I         Comp. II        Comp. IV          Comp. I         Comp. II        Comp. IV
M    ±0,051          ±0,078          ±0,071            ±0,036          ±0,052          ±0,033
S    0,066 ÷ 0,158   0,054 ÷ 0,131   0,042 ÷ 0,122     0,077 ÷ 0,161   0,042 ÷ 0,124   0,045 ÷ 0,105

 

A detailed list of the judges whose refereeing indicators went beyond the established tolerances is given in my book. Here I give only the number of such judges:

Men: Comp. I – 3; Comp. II – 4; Comp. IV – 2. Moreover, the judge from Malaysia has the worst characteristics in both Comp. II and Comp. IV.

Women: Comp. I – 3; Comp. II – 1; Comp. IV – 1.

The calculated characteristics make it possible, as a first approximation, to determine the quality of judging for each judge. In the next section I will explain how reliable this determination is, what “pitfalls” it contains, and why I say “as a first approximation”.

 

2.3.  Indicators of „joint judging“ in the judges panels 

 

The calculated indicators M and S for all judges of the period 1998-2000 are presented in the tables in my book.

As usual, let us turn to the most illustrative example: the correlation matrix in Table 2.3.1.

 

Tab.  2.3.1.  WC-99.  Men.  Comp. I.  Bars.  Correlation matrix

Fed. of judge   JPN    KOR      GEO      PUR      LAT      ROM      radm
JPN             1,0    -0,256   -0,157   -0,133   -0,278   -0,012   -0,29 ÷ -0,04
KOR                    1,0      0,577    -0,351   -0,280   -0,311   (252 participants,
GEO                             1,0      -0,278   -0,190   -0,311   6 judges)
PUR                                      1,0      0,006    -0,080
LAT                                               1,0      -0,132
ROM                                                        1,0

Suspected pair: GEO-KOR

The numbers inside the table are the correlation coefficients mentioned above. The value radm gives the acceptable range of the correlation coefficient for 6 judges and 252 participants.

I discovered the facts of “joint judging” while analyzing the correlations between the scores within judging panels. This absolutely routine procedure of mathematical statistics allowed me to identify phenomena that were quite unexpected for me. The number of cases of joint refereeing was so great that I doubted the results. An additional verification, described in my book, confirmed them.

The matrix shows that the correlation coefficient of the judges from Georgia and Korea significantly exceeds the tolerance indicated in the right column of the table, which is convincing evidence of joint judging. I took the trouble to count the identical scores given by these two judges: out of 252 scores, 175 were the same!
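The check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure from the book: it uses the ordinary Pearson correlation coefficient, the judge labels and deviations are invented for the example, and the upper tolerance of -0.04 is borrowed from Tab. 2.3.1 purely for demonstration (the real tolerance depends on the number of judges and scores).

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def suspected_pairs(deviations, r_max):
    """Return judge pairs whose correlation exceeds the upper tolerance r_max."""
    flagged = []
    for a, b in combinations(sorted(deviations), 2):
        r = pearson(deviations[a], deviations[b])
        if r > r_max:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical deviations from the average score for three judges
# over six exercises (invented data, not taken from any competition).
deviations = {
    "J1": [0.05, -0.10, 0.00, 0.10, -0.05, 0.05],
    "J2": [0.05, -0.10, 0.05, 0.10, -0.05, 0.00],
    "J3": [-0.10, 0.15, -0.05, -0.10, 0.10, -0.05],
}

flagged = suspected_pairs(deviations, r_max=-0.04)
print(flagged)
```

Note that deviations from a common panel average are negatively correlated even for fully independent judges (about -1/(n - 1), that is -0.2, for six judges of equal scatter), which is consistent with the tolerance interval in Tab. 2.3.1 lying below zero.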

One of these judges (let us call him G.) was present at my report when I mentioned this fact. After the report, he came up and thanked me for not mentioning his name. At EC-2000 he “improved” his way of refereeing: his scores began to differ from those of the judge sitting next to him by 0.05 points. My technique easily revealed this “improvement”. He did not change his method when judging at OG-2000.

 

Tab. 2.3.2. Features of the judging of judge G.

Competition   Position of judge G.   Fed. of the neighbouring judge   Number of scores   r tolerance   r actual value
EC-98/1       B1-Fl                  LAT                              66                 0,08          0,15
EC-98/2       B2-H                   LAT                              21                 0,24          -0,29
WC-99/1       B3-PB                  KOR                              253                -0,05         0,58
WC-99/4       B4-H                   THA                              30                 0,19          0,07
WC-99/2       B3-PB                  CUB                              30                 0,19          0,14
----------------------------------------------------------------------------------------------------------------
EC-00/1       B1-PB                  TUR                              87                 0,04          0,07
EC-00/2       B3-R                   TUR                              24                 0,23          0,50
OG-00/1*      B4-V                   GER                              79                 0,05          0,44
OG-00/1*      B4-V                   BLR                              79                 0,05          0,12
OG-00/4       B4-Fl                  ARG                              30                 0,19          0,65
OG-00/2       B2-PB                  LAT                              36                 0,16          0,61

* Judge G. has an increased correlation coefficient with both neighbours.

The dividing line in the table separates the judging before and after my report.

It can be seen from Table 2.3.2 that after my report judge G., having made sure that the FIG leadership did not react to my message, began to judge only in pairs, without fear of any sanctions. I fear that it was precisely my report and the lack of reaction from the FIG leadership that prompted judge G. to maintain the “stability” of his judging system.

Now I will show the reader that a reliable assessment of a judge's work cannot be based on only part of the information.

 

Table 2.3.3. On assessing the quality of refereeing from the simplest statistical characteristics. OG-2000. Comp. I. Men

 

Conditional rating of judge Fed. of judge Position of judge M S Correlation coefficient r
1 GEO B4-V -0,015 0,043 0,438 – GER
2 GER B5-V -0,022 0,048 0,438 – GEO
3 ARG B1-R 0,008 0,059 0,268 – ESP
4 JPN B3-HB  0,011 0,060 Acceptable*
5 UKR B5-HB -0,002 0,062 0,343 – KAZ
6 SUI B3-V -0,009 0,063 Acceptable
7 KAZ B4-HB 0,004 0,064 0,343 – UKR
8 CAN B4-Fl   -0,017 0,062 Acceptable
….. ……. …. ……..
14 LAT B3-Fl   0,006 0,068 Acceptable
15 ESP B2-R 0,015 0,069 0,268 – ARG
…. ……. …. ……..
36 BUL B4-R -0,048 0,108 Acceptable

* For Comp. I of OG-2000 the tolerance for the correlation coefficient is -0.40 ÷ 0.05.

Table 2.3.3 shows the statistical characteristics and correlation coefficients of some of the men's judges in Comp. I of OG-2000. It would seem that the absolute leadership of the judges from GEO and GER follows clearly from this table.

However, as soon as the reader looks at the right column of the table, it becomes clear why I called the ratings of the judges conditional.

The judges with conditional ranks 1 and 2, 3 and 15, and 5 and 7 sat next to each other. In addition, judges 3 and 15 are from Spanish-speaking countries. The excellent statistical indicators of the judges in “conditional” places 1 and 2, as well as the high places 3, 5 and 7, are explained by the fact that they simply worked in pairs. That is why I designated their high ratings as “conditional”.

As I said above, “joint judging” distorts the results of the analysis of judging quality. It reduces the scatter of scores in the judging panel and creates a false impression of high-quality refereeing.

The value of the correlation coefficient is a reliable indicator of a judge's independence or of “joint judging”. From the material in Table 2.3.3 it follows that reliable assessments of refereeing quality cannot be obtained using only the “understandable” part of the information.
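The point can be illustrated with the first seven rows of Table 2.3.3. The sketch below ranks the judges by S alone and then separates those whose correlation with a neighbour exceeds the upper tolerance of 0.05 quoted under the table. The data structure and the function name are assumptions made for this example; only the numbers come from the table.

```python
# (federation, position, S, r with neighbour or None if within tolerance),
# copied from the first seven rows of Table 2.3.3.
judges = [
    ("GEO", "B4-V", 0.043, 0.438),
    ("GER", "B5-V", 0.048, 0.438),
    ("ARG", "B1-R", 0.059, 0.268),
    ("JPN", "B3-HB", 0.060, None),
    ("UKR", "B5-HB", 0.062, 0.343),
    ("SUI", "B3-V", 0.063, None),
    ("KAZ", "B4-HB", 0.064, 0.343),
]
R_UPPER = 0.05  # upper tolerance for r quoted under Table 2.3.3

def split_ranking(rows, r_upper):
    """Rank by S, then separate independent judges from those judging in pairs."""
    ranked = sorted(rows, key=lambda row: row[2])
    independent = [row[0] for row in ranked if row[3] is None or row[3] <= r_upper]
    paired = [row[0] for row in ranked if row[3] is not None and row[3] > r_upper]
    return independent, paired

independent, paired = split_ranking(judges, R_UPPER)
print("independent:", independent)
print("worked in pairs:", paired)
```

Ranked by S alone, GEO and GER lead; once the correlation coefficient is taken into account, only JPN and SUI remain among these seven as independently judging.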

Recognizing as the best those judges whose characteristics head Table 2.3.3 and the tables in the appendix to my book can lead to an absolutely false choice of the best and a complete distortion of the assessment of refereeing quality. Let me remind the reader that at OG-2000 alone more than 40 judges had correlation coefficients indicating joint refereeing. If the correlation matrices are converted into diagonal ones, the accuracy of the judges with significant correlation coefficients will deteriorate and, accordingly, the individual results of participants may change.

2.4.  Individual judge objectivity indicators

 

The bias of individual judges is the most painful issue in refereeing.

A significant share of cases of biased refereeing can be identified by means of mathematical statistics. The only source material for the analysis is the score sheet.

To begin with, let us consider the example in Table 2.4.1.

The meaning of all indicators in the left column of the table is described in section 1.6. It only needs to be added that n denotes the number of scores in the controlled group.

The table contains samples of 3 scores for Comp. I (team composition) or 2 scores for Comp. II, when two participants from one country reached the all-around final.

 

Table 2.4.1.  Examples of biased refereeing at OG-2000.  Women

 

Comp.I Comp.II
Vault UB Beam Floor Vault UB Beam
Fed. of judge AUS CHN UKR CAN ROM CAN ITA ESP
Position of judge B3 B4 B4 B4 B3 B2 B3 B6
Fed. of team GRE JPN GRE CAN RUS CAN CAN ESP
n 4 5 5 5 6 2 3
m' -0,089 0,109 0,091 0,072 -0,061 0,074 -0,087 0,091
σ 0,028 0,006 0,012 0,019 0,023 0,012 0,024 0,033
t -4,9 8,6 5,7 5,5 -4,4 4,1 -3,0 3,0
R 0,69 0,10 0,23 0,72 1,09 0,32 0,45 0,75

 

Let us focus on the indicators t in the table: the judges from Canada act quite effectively in the interests of their own participants (t = 5.5 and t = 4.1, respectively); the judge from Spain also “does not offend” his own participants (t = 3.0); the judge from China “favors” Japan (t = 8.6); Ukraine supports the Greeks (t = 5.7); the judge from Romania actively opposes the Russian team (t = -4.4); the Australians have fallen out with the Greeks over something (t = -4.9); and Italy opposes the Canadian team (t = -3.0).

Table 2.4.1 shows the already processed results of objectivity control. The FIG leadership usually devotes the greatest attention to the problem of judging objectivity. Below I give another table illustrating the information used to analyze the quality of refereeing. It presents the scores of the participants in the floor exercise at OG-2000 (Comp. IV, Men). This table is the foundation of the proposed methodology and largely repeats the form of the regular score sheet, so most people involved in gymnastics will easily understand its contents. The scores A for difficulty and the scores B1-B6 for execution are presented in the left half of the table.

In the following columns:

– final («окнч.») – final scores;

– average («средн.») – average scores, defined as the arithmetic mean of scores B1-B6;

– s – mean square deviation of the judges' scores in the panel from the average scores;

– ded – average deductions, defined as the difference between 10 points and the average scores.

The columns in the middle of the table, namely “final”, “average”, “s” and “ded”, are given for information only, in case any reader wants to conduct an independent analysis.

The deviations of the scores of judges B1-B6 from the average scores are presented in the right half of the table, in the columns from “B1 – avg.” to “B6 – avg.”. These columns contain all the information necessary for the analysis.

The processing results are summarized in the rows denoted M, m, m', S, s, σ, t and R. Their meaning is described in section 1.6.

We now consider this table, focusing on the indicators t: the judge from Canada obviously favors the Chinese team (t = 4.2), the judge from the UK favors the Romanian team (t = 4.4), the judge from Georgia is opposed to the Japanese team (t = -3.3), while the judges from Japan (t = 4.4) and Russia (t = 3.0) can confidently be accused of helping their own teams.

Because the present material is an extract from my book, I will not explain here the procedure for calculating the indicator t, but will limit myself to informing the reader that I have set the allowable range of this indicator to ±2.7.

I chose the data for Table 2.4.1 at random from a large number of similar tables that indicate statistically significant deviations of the scores.

The values of m' and σ in this table are given so that the reader can get an idea of the actual values of these indicators in various groups of scores; the range R is given for completeness.

Based on the analysis of WC-99 (which offers the largest amount of information), the existence of alliances among judges was revealed:

  – Spanish-speaking alliances: Argentina – Venezuela; Argentina – Spain; Puerto Rico – Cuba; Spain – Venezuela; Spain – Cuba.

  – Far-Eastern alliances: PRK – Korea; China – Taipei; Korea – China.

  – post-Soviet alliances: Russia – Belarus; Russia – Ukraine; Georgia – Belarus; Kazakhstan – Ukraine; Belarus – Russia.

Here is another example.

Judge Mo from China, judging the vault in Comp. II of WC-99:

  – 2 participants from France: 4 scores; m = 0.149; s = 0.054; t = 2.6 (probability of bias > 99%);

  – 3 participants from China: 6 scores; m = 0.209; s = 0.162; t = 4.5 (probability of bias > 99.99%).

Judge Keisel from France, judging the bars in Comp. I of WC-99:

  – 5 participants from France: 5 scores; m = 0.156; s = 0.098; t = 2.7 (probability of bias > 99.3%);

  – 5 participants from China: 5 scores; m = 0.109; s = 0.077; t = 2.3 (probability of bias > 98.9%).

When I showed these results to L. Arkaev, he was not surprised and explained to me that there is a married couple among the delegations of China and France. The ability to identify family relationships through an analysis of judging amused me.

                                                                                    

Both here and in my book, I cite far from everything that I managed to identify in the course of the analysis. Nevertheless, the results obtained seem to me very convincing evidence of the effectiveness of the methodology.

 

2.5.  Manipulation of scores by judges

In section 1.5, I already pointed out that manipulating the scores is the simplest and most effective way to avoid an accusation of bias.

For this purpose, a judge who has given the participants of one team 3 scores with significant positive deviations from the average scores gives one score with a significant negative deviation. And vice versa: a judge who has given the participants of one team 3 scores with significant negative deviations gives one score with a significant positive deviation. As a result, the coefficient t falls into the range we have designated.

I have called the actions by which judges change the statistical characteristics of their refereeing in order to conceal its dishonesty “manipulation of scores”. Manipulation of scores is effectively controlled by the range R: the value of R is an indicator of manipulation.

 

Tab. 2.5.1. Allowable values of the range R depending on the number n of scores in a controlled set of scores

n             2         3         4         5         6
Rmin ÷ Rmax   0,0÷2,8   0,4÷3,3   0,7÷3,7   1,0÷3,9   1,2÷4,1
n             7         8         9         10        12
Rmin ÷ Rmax   1,4÷4,2   1,6÷4,3   1,7÷4,4   1,8÷4,5   1,8÷4,7

 

If the value of R exceeds the upper allowable value, this indicates that the judge has manipulated the scores. An increased R value does not always indicate a desire to conceal bias. Perhaps the judge noticed a systematic tendency in his scores and found it necessary to change the character of his refereeing to eliminate it. But even in this case, the judge judges not according to the performance of the participant but according to the circumstances.

If two or more judges in a panel have R values below the lower allowable value, this may be a reason to suspect the panel, or part of it, of collusion.

 

Tab. 2.5.2. Illustration of manipulation of scores. OG-2000

                                Women. Comp. II. Vault        Men. Comp. I. Floor
Fed. of judge                   ROM                           CAN
Fed. of team                    BLR                           KOR
Quantity of scores              4                             5
                                all scores    one excluded    all scores    one excluded
Deviation of the scores of      0,0750        –               -0,0250       -0,0250
the judge from the average      -0,1417       -0,1417         0,0833        0,0833
scores                          -0,0583       -0,0583         -0,0667       –
                                -0,1917       -0,1917         0,1833        0,1833
                                                              0,1250        0,1250
m                               -0,079        -0,131          0,060         0,092
m'                              -0,087        -0,138          0,077         0,108
S                               0,061         0,061           0,061         0,061
t                               -1,48         4,01            1,63          2,71
R                               4,36          2,18            4,02          3,35

 

Consider the table above.

The coefficients t (-1.48 and 1.63), which are reliable indicators of objectivity, unambiguously reject the suspicion of bias in both examples. However, the values of the range R exceed the allowable values for 4 and 5 scores (see Tab. 2.5.1).

In the third and fifth columns of Tab. 2.5.2, the deviations 0.0750 and -0.0667 respectively, the largest ones opposite in sign to the majority of the judge's scores, are excluded. The values of the range R then decrease, but the indicators t exceed the allowable values, which is a serious basis for suspecting these judges of bias. Thus, with the help of a simple trick, the judges avoided the suspicion of bias, which is the worst accusation for a judge.
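The range check can be sketched in Python. This sketch assumes that R is the spread of the deviations (largest minus smallest) divided by the standard deviation S; that assumption reproduces the values 4,36 and 2,18 of Tab. 2.5.2 to within rounding, but the exact definition is the one given in section 1.6 and in the book. The function names and verdict strings are illustrative.

```python
# Allowable limits (Rmin, Rmax) by number of scores n, copied from Tab. 2.5.1.
R_LIMITS = {2: (0.0, 2.8), 3: (0.4, 3.3), 4: (0.7, 3.7), 5: (1.0, 3.9),
            6: (1.2, 4.1), 7: (1.4, 4.2), 8: (1.6, 4.3), 9: (1.7, 4.4),
            10: (1.8, 4.5), 12: (1.8, 4.7)}

def normalized_range(deviations, s):
    """Spread of the deviations expressed in units of S (assumed definition of R)."""
    return (max(deviations) - min(deviations)) / s

def check_range(deviations, s):
    """Compare R with the tolerance for the given number of scores."""
    r = normalized_range(deviations, s)
    r_min, r_max = R_LIMITS[len(deviations)]
    if r > r_max:
        verdict = "above upper limit: possible manipulation"
    elif r < r_min:
        verdict = "below lower limit"
    else:
        verdict = "within tolerance"
    return r, verdict

# Judge ROM, team BLR, women's vault (first example of Tab. 2.5.2).
rom_blr = [0.0750, -0.1417, -0.0583, -0.1917]

r_all, verdict_all = check_range(rom_blr, s=0.061)        # all four scores
r_cut, verdict_cut = check_range(rom_blr[1:], s=0.061)    # 0.0750 excluded

print(round(r_all, 2), verdict_all)
print(round(r_cut, 2), verdict_cut)
```

With all four scores the range exceeds the limit of 3,7 for n = 4; with the single counter-directed score removed it falls back inside the tolerance for n = 3.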

The value of R does not always make it possible to unambiguously indicate which violation the judge committed. The application of the range R is considered in more detail in my book.

 

2.6.  Some additional analysis results

 

Comprehensive analysis yields many additional interesting results.

Example: processing the scores given by one judge to one participant. One judge can give up to 40 scores to one participant per Olympic cycle. This is a significant sample. I took the trouble to make some assessments and found an extremely interesting case. One judge (not from Russia) statistically significantly overestimated the scores of Russian participants. Yet against the background of all the scores he gave to Russian gymnasts, the scores he gave to A. Nemov were statistically significantly underestimated. I cannot explain the reason for this phenomenon, but I consider the fact extremely interesting.

While working on the methodology and analyzing the results, I identified a case in which the US gymnast B. Wilson was deprived of the bronze medal in the men's all-around at WC-99. Later I discovered that at WC-05 in Melbourne the US gymnast Anastasia Liukin was incorrectly deprived of the gold medal in the women's all-around. The cause of such errors is incorrect application of the rounding rules. This is described in detail in my book.

I also analyzed the decision taken by the FIG WTC to reward or punish the judges of OG-2000. A detailed analysis of this decision can be found in my book.

The resulting set of scores makes it possible not only to analyze the quality of refereeing, but also to obtain data that are important, in my opinion, and can serve to improve the competition rules and the procedure for conducting competitions:

– an integrated assessment of the accuracy of refereeing each major competition;

– assessment of the accuracy of refereeing on each apparatus;

– assessment of the accuracy of refereeing the first apparatus according to the schedule of competitions;

– assessment of the accuracy of judging the first participant on the apparatus at the start of the competition, (competition day, shift);

– assessment of the accuracy of refereeing depending on the shift.

Apparently, the large number of comments on the quality of refereeing that I made during the report and in subsequent correspondence did not make the FIG leadership wish to take advantage of my suggestions. No wonder L. Arkaev called my report “a bomb”. In 2004, I stopped my efforts to introduce a full-fledged methodology for analyzing the quality of refereeing. I had studied the period 1998-2000 in full and considered the methodology almost complete.

But in the period 2005-2011 there were a large number of fundamental changes in the competition rules:

– transition to closed refereeing (2005): the scores of individual judges became secret;

– a cardinal change in the structure of the score: it could now reach 16 or even 17 points;

– a change in the rules for judging execution (2006): the minimum deduction became 0.1 points, and the next one 0.3 points;

– a reduction in the number of judges evaluating execution from 6 to 5; the panel evaluating execution was designated by the symbol E;

– introduction of reference judges (R) in 2011, to provide an automatic and quick correction system in case of problems with execution scores;

– clarification of the procedure for sharing ranks in the case of equal scores or equal point totals.

I became interested in checking whether my methodology would work under the new rules, and I returned to the study. Limited access to information greatly impeded my work, but with the help of old contacts I managed to obtain enough information for this check.

 

 

3. Features of the methodology for analyzing the quality of refereeing for the period 2015-2017

3.1.  The checks carried out confirmed that the deviations of the judges' scores from the average scores follow the normal (Gaussian) distribution law.

3.2.  It is shown that the standard deviation of the average execution score makes it possible to control the quality of refereeing during the competition.

3.3.  The effectiveness of the technique for controlling “joint judging” has been confirmed.

3.4.  The process of manipulating scores is analyzed.

3.5.  The set of indicators has been supplemented (two additional indicators, m" and t', have been introduced) for a comprehensive assessment of the quality of refereeing given the increase in the minimum execution deduction.

3.6.  The effect of the appearance of the R panel is considered.

 Sections 3.7, 3.8, 3.19 are identical to paragraphs Sections respectively. 

The laws of mathematical statistics do not change when the rules change. But applying the methodology without taking rule changes into account can lead to completely meaningless estimates and conclusions. There is no doubt that the rules will change again.

Recently (in 2014, as far as I know), highly qualified specialists joined the work on judging: Professor Pascal Felber and Dr. Hugues Mercier from the Institute of Computer Science of the University of Neuchâtel (Switzerland). I was able to study the materials of their presentation of 05/05/2015 and satisfy myself that their approach to the problem is well qualified.

In his presentation, Dr. Hugues Mercier said:

“The first objective is to understand exactly what the FIG wants. This includes the FIG itself . 

The second objective is to analyse if the requests and constraints of the FIG are feasible, and if not to consider alternatives 

I will use as simple statistics as possible, but no simpler.”

These words instantly made me an ally of Dr. Hugues Mercier. After becoming acquainted with his position, it seemed to me that I was too late in deciding to return to the analysis of refereeing.

Nevertheless, I analyzed the competitions of WC-15, OG-16 (women) and WC-17, and this analysis showed that a lot of work remained for me.

 

4. Results of applying the methods for analyzing the quality of refereeing for the period 2015-2017

For the analysis of judging at the competitions of 2015-2017, I had the official data for WC-15, OG-16 (women only) and WC-17; the report on WC-15 contained no list of judges for Comp. IV.

The first thing I checked was the distribution law of the deviations of the judges' scores from the average ones.

I have processed:

  • more than 30,000 judges' scores;
  • more than 60 combined judging panels E + R, in which 220 judges were examined;
  • almost 5,000 team results.

In section 3.1 I confirmed that the previously developed methodology can be applied. But the change of the minimum deduction to 0.1 points required a slight change in the procedure for analyzing objectivity. This is discussed in section 4.5.

 

       4.1. Average score accuracy

 

To control the accuracy of the average score, s should be calculated from the five scores E1-E5. Intervention in the final score should then be allowed not when one of the R judges considers the final score incorrect, but only when the indicator s turns out to be outside the allowable range. The tolerance for the values of this indicator can easily be determined from the results of previous competitions of a similar level.

 

 

4.2.  Estimation of judging quality of an individual judge

Table 4.2.1. Tolerances established for the statistical characteristics of judging at WC-15, OG-16, WC-17

        Men                                                Women
        Comp. I         Comp. II        Comp. IV          Comp. I         Comp. II        Comp. IV
WC-15
M       ±0,053          ±0,091          ±0,086            ±0,059          ±0,084          ±0,069
S       0,087 ÷ 0,287   0,070 ÷ 0,234   0,082 ÷ 0,230     0,077 ÷ 0,305   0,062 ÷ 0,218   0,070 ÷ 0,226
OG-16
M       I have no information                             ±0,088          ±0,086          ±0,096
S                                                         0,086 ÷ 0,260   0,061 ÷ 0,257   0,069 ÷ 0,201
WC-17
M       ±0,067          ±0,092          Not held          ±0,067          ±0,114          Not held
S       0,094 ÷ 0,318   0,061 ÷ 0,289                     0,081 ÷ 0,349   0,049 ÷ 0,317

 

I believe that the reader is more interested in the results of recent years than in those of 20 years ago. Therefore, in my book I present the tables of statistical characteristics of refereeing for 2015-2017 in significantly greater volume than for previous years. Here I limit myself to listing the number of E-panel judges whose statistical characteristics exceeded the tolerances at WC-15.

Men: Comp. I – 3; Comp. II – 3; Comp. IV – 5.

Women: Comp. I – 2; Comp. II – 3; Comp. IV – 3.

The reader can compare these data with the corresponding data for WC-99.

Men: Comp. I – 3; Comp. II – 4; Comp. IV – 2.

Women: Comp. I – 3; Comp. II – 1; Comp. IV – 1.

My book contains tables with the judging characteristics of all judges for whom data were available to me.

 

 

4.3.  Indicators of „joint judging“ in the judges panels

The calculated indicators M and S for all judges of the period 1998-2000 are presented in the tables in my book. Here I present the results on joint refereeing for the period 2015-2017.

In the book I have included the 62 correlation matrices available to me; here I give the generalized results for this section.

Number of judges suspected of joint judging:

Women. WC-15. Comp. I: 93% (26 judges!); Comp. II: 21% (6 judges); Comp. IV: 39% (11 judges).

Women. OG-16. Comp. I: 28% (judges); Comp. II: 7% (2 judges); Comp. IV: 14% (4 judges).

Women. WC-17. Comp. I: 64% (18 judges); Comp. II: 36% (10 judges).

Men. WC-15. Comp. I: 86% (36 judges!); Comp. II: 60% (25 judges); Comp. IV: 55% (23 judges).

Men. WC-17. Comp. I: 55% (23 judges); Comp. II: 48% (20 judges).

Table 4.3.1. On assessing the quality of refereeing from the simplest statistical characteristics. OG-16. Comp. I. Women

 

Conditional rating of judge Fed. of judge Position of judge |M| S Correlation coefficient r
1 SLO R1-V 0,007 0,091 Acceptable*
2 UKR R2-V 0,004 0,105 Acceptable
3 PAN E4-V 0,004 0,114 Acceptable
4 BUL E5-V 0,042 0,116 Acceptable
5 FIN E1-V 0,020 0,121 Acceptable
6 IRL E2-V 0,022 0,122 Acceptable
7 RSA E3-V 0,047 0,137 Acceptable
8 AUT R2-Fl 0,019 0,143 Acceptable
9 ITA E2-Fl 0,041 0,159 Acceptable
10 SUI R1-Fl 0,036 0,161 0,138 – GER
11 PER E2-B 0,036 0,174 0,194 – ISR
12 GER E1-Fl 0,054 0,182 0,138 – SUI
13 VEN E4-UB 0,021 0,184 Acceptable
14 NOR E4-B 0,015 0,189 0,183 – GBR
15 LTU E4-Fl 0,005 0,194 Acceptable
16 GBR E5-B 0,015 0,195 0,183 – NOR
17 BRA E3-UB 0,035 0,197 0,167 – URU
…….
23 ISR E1-B 0,036 0,213 0,194 – PER

          * tolerance for the correlation coefficient is -0.365 ÷ 0.079.

 

Table 4.3.1 is similar to Table 2.3.3 and presents the statistical characteristics and correlation coefficients of some of the women's judges in Comp. I of OG-16. It was compiled to confirm the fallacy of choosing the best judges by the simplest statistical characteristics. But the results turned out to be quite unexpected: the scatter characteristics of vault judging proved to be significantly smaller than on the other apparatus (see also Table 4.3.2).

If this feature is disregarded, the best judging characteristics will again belong to judges who used the “joint judging” method.

 

   Table 4.3.2.  Integral characteristics of refereeing on the apparatus.  Summary table

 

MEN                 Fl      H       R       V       PB      HB
WC-15   Comp.I      0,22    0,23    0,18    0,12    0,18    0,22
        Comp.II     0,15    0,18    0,18    0,11    0,15    0,19
        Comp.IV     0,15    0,18    0,15    0,13    0,14    0,20
WC-17   Comp.I      0,21    0,27    0,18    0,14    0,20    0,26
        Comp.II     0,19    0,22    0,18    0,10    0,20    0,21
        Mean        0,184   0,216   0,174   0,120   0,174   0,216

WOMEN
WC-15 Comp.I 0,19 0,21 0,12 0,27
Comp.II 0,15 0,15 0,10 0,19
Comp.IV 0,14 0,17 0,14 0,18
OG-16 Comp.I 0,14 0,21 0,12 0,21
Comp.II 0,18 0,19 0,11 0,20
Comp.IV 0,17 0,16 0,10 0,14
WC-17 Comp.I 0,20 0,28 0,13 0,26
Comp.II 0,21 0,24 0,10 0,22
Mean 0,179 0,201 0,115 0,209

 

I repeat: “joint judging” distorts the results of the analysis of the quality of refereeing and may distort the results of competitions. 

 

                 4.4.  The benefits and harms of the R panel

 

The presence of the R panel does indeed make it possible to change quickly (I doubt that “correct” is the right word) the score of the E panel when there is a significant discrepancy between the scores of the E and R panels. This problem is very relevant. According to the technical committee, correcting an erroneous score takes a lot of time. This delays the competition, throws the participants off their rhythm, irritates the audience, and interferes with the television broadcast.

It is absolutely obvious that correcting a score given by 5 judges on the basis of a score given by 2 judges is justified only if those 5 judges of the E panel are completely incompetent. Fortunately, gymnastics has not yet reached such a point.

My book provides a detailed analysis of the use of the R panel. Here I confine myself to the categorical assertion that this is an absolutely pointless way to improve the quality of refereeing: a score of an R judge that differs greatly from the scores of the other judges necessarily has a decisive influence on the final score. The bottom line is this: a strongly deviating score of an E-panel judge is discarded, whereas a strongly deviating score of an R judge is not discarded and has a decisive influence on the final score.

Imagine the scope for judging arbitrariness!

Even the Longines programmers were confused by the difference between the E-jury scores and the E-scores: while analyzing the judging, I found a large number of errors in the official reporting materials. Copies of the relevant sheets from the reports, with the necessary comments, can be found in my book.

Here I will describe a “pearl” that I “dug up” in the official report on WC-17.

The book provides a copy of page 337 of this report with the results of Comp. III in the men's vault.

The E-panel score of Shirai (JPN) is 9,600; his score according to the R panel is 9,450. The E-panel score of Lopez (GUA) is 9,566; his score according to the R panel is 9,450.

The difference between the E score and the ER score is even smaller for Lopez than for Shirai. But the computer, for some unintelligible reason, found it necessary to adjust the final score of Vega Lopez downward. As a result, Lopez finished in fifth place instead of a well-deserved fourth.

R panels appeared in 2011. Six years is more than enough time to debug such a simple program. I am not ready to pass judgment on the qualifications of programmers who make such mistakes, but it is obvious to me that the program was accepted without competent control: now the program fails to round the scores properly, now it confuses the scores of the E panel and the R panel. This raises the question: can one trust results calculated with untested mathematical software? The use of computer systems requires correct and very rigorous development of the mathematical software.

It is not for nothing that the computer is sometimes called an “amplifier of incompetence”.

I believe that the appearance of the R panel did not benefit gymnastics.

It is almost impossible to evaluate the quality of the work of the R panel from the scores given by the judges of this panel alone.

For example, suppose the difference between the two scores in the R panel is 1 point. Which of the judges overestimated or underestimated the score? Or perhaps both judges gave dubious scores? The analysis is possible only by combining the scores of the E and R panels. A separate analysis of the E panel is possible in full, but the analysis will be completely effective only when the scores of all the judges (E + R) are used.

 

 

 

  • Individual judge objectivity indicators

 

 

The analysis of judging presented in this chapter was performed in the same way as the analysis of judging in 1998-2000, with one significant difference: the minimum deduction has increased to 0.10 points.

In his presentation “Evaluating the performance of international gymnastics judges”, given in Lausanne on 02.24.2017, Dr. Hugues Mercier granted all judges an allowance of 0.1 points: «…a judge giving 9.8 to an athlete deserving 9.7 is never an outlier». I cannot agree with the word „never“. The statement is valid for a single score. But now imagine that a judge gave scores with such a deviation to a whole team of 4 or 5 gymnasts; a team of 3 gymnasts can receive 6 scores on vault in comp. I. Is it necessary to react to this? And if so, how?

To solve this problem I introduced the additional indicators m“ and t‘.

These indicators account for the possibility that a judge's score repeatedly differs by 0.1 points from the deserved one.
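The definitions of m“ and t‘ are not spelled out in this abridged text. The values printed in Table 4.5.1 are, however, consistent with a simple relation: m“ equals m‘ moved toward zero by 0.1/n, and t‘ equals t scaled by the ratio m“/m‘. The sketch below encodes that inferred relation; it is an assumption read off the table, not a formula confirmed by the author, and the function names are hypothetical.

```python
def adjusted_mean_deviation(m_prime: float, n: int, allowance: float = 0.1) -> float:
    """Move the mean deviation m' toward zero by allowance / n (inferred rule)."""
    shrink = allowance / n
    if abs(m_prime) <= shrink:
        return 0.0
    return m_prime - shrink if m_prime > 0 else m_prime + shrink


def adjusted_t(t: float, m_prime: float, n: int, allowance: float = 0.1) -> float:
    """Scale t by the ratio m''/m' (inferred rule)."""
    if m_prime == 0:
        return 0.0
    return t * adjusted_mean_deviation(m_prime, n, allowance) / m_prime


# Two columns of Table 4.5.1: (m', t, n) for the judges from UKR and FIN.
examples = {"UKR": (-0.097, -4.52, 4), "FIN": (-0.192, -6.56, 2)}

for fed, (m_prime, t, n) in examples.items():
    m_adj = adjusted_mean_deviation(m_prime, n)
    t_adj = adjusted_t(t, m_prime, n)
    print(f"{fed}: m'' = {m_adj:.3f}, t' = {t_adj:.2f}")
```

For these two columns the sketch agrees with the table to within rounding (m“ of -0,072 and -0,142, t‘ of -3,36 and -4,85).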

 

        Table 4.5.1.  Examples of refereeing at OG-16. Women

 

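Comp. I
Apparatus: V, UB, B, FX

| Fed. of judge | UKR | URU | FRA | CRO | KAZ | LTU | RUS | GER |
|---|---|---|---|---|---|---|---|---|
| Position of judge | ER2 | ER1 | E1 | ER1 | E3 | E4 | E5 | E1 |
| Fed. of team | FRA | CAN | ITA | USA | NED | GBR | NED | RUS |
| n | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| m‘ | -0,097 | -0,144 | -0,207 | 0,126 | 0,242 | 0,178 | -0,152 | 0,166 |
| m“ | -0,072 | -0,119 | -0,182 | 0,101 | 0,217 | 0,153 | -0,127 | 0,141 |
| σ | 0,036 | 0,018 | 0,119 | 0,055 | 0,095 | 0,031 | 0,068 | 0,066 |
| t | -4,52 | -6,15 | -3,28 | 3,46 | 4,50 | 6,75 | -3,71 | 4,37 |
| t‘ | -3,36 | -5,07 | -2,87 | 2,77 | 4,03 | 5,79 | -3,10 | 3,71 |
| R | 0,82 | 0,22 | 1,30 | 0,60 | 0,95 | 0,32 | 0,73 | 0,82 |

Comp. II, Comp. IV
Apparatus: V, UB, V, B, FX

| Fed. of judge | FIN | CZE | RSA | ISR | FRA | URU |
|---|---|---|---|---|---|---|
| Position of judge | E5 | E3 | E5 | E2 | E2 | ER1 |
| Fed. of team | CHN | ITA | RUS | BRA | USA | JPN |
| n | 2 | 2 | 3 | 3 | 3 | 3 |
| m‘ | -0,192 | -0,171 | -0,124 | 0,107 | -0,120 | 0,211 |
| m“ | -0,142 | -0,121 | -0,091 | 0,074 | -0,087 | 0,178 |
| σ | 0,010 | 0,010 | 0,036 | 0,014 | 0,008 | 0,087 |
| t | -6,56 | -7,08 | -4,10 | 5,51 | -3,54 | 3,51 |
| t‘ | -4,85 | -5,01 | -3,00 | 3,81 | -2,57 | 2,96 |
| R | 0,10 | 0,13 | 0,66 | 0,33 | 0,09 | 0,97 |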

The t‘ indicators in Table 4.5.1 show the following: the judge from Ukraine scored against the French team and the judge from Uruguay against Canada; the judges from France and the Czech Republic did not wish the Italian team success; the judge from Croatia supported the US team; the Kazakh judge was friendly to the Netherlands and the Lithuanian judge to the British team; the German judge helped the Russian team; the Russian judge did not want to help the Netherlands; the Finnish judge was dissatisfied with China; the Israeli judge supported Brazil; the French judge was dissatisfied with the US team; the judge from South Africa scored against the Russian team; and the judge from Uruguay wanted Japan to succeed.

In Table 4.5.1 I have limited myself to a few examples of judging, only at OG-16 and only among women.

In Table 4.5.2 I give all cases in which judges significantly underestimated or overestimated the scores of their own teams, in all competitions available to me from 2015-2017.

The judge from Iceland should probably be excluded from this table, as the scatter of scores for the Icelandic team requires a separate examination.

 

Table 4.5.2. Judges' scoring of their own teams

 

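WOMEN, MEN
WC-15, OG-16, WC-17, WC-15, WC-17
Comp. I, Comp. II, Comp. II, Comp. IV, Comp. I, Comp. I, Comp. II, Comp. I

| Apparatus | UB | Fl | Fl | Fl | B | UB | V | Fl | Fl | PB | H | V | Fl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Fed. of judge | DEN | CZE | LAT | SUI | CHN | RUS | RUS | COL | DEN | SWE | FRA | CHN | ISL |
| Position of judge | E4 | E2 | E1 | ER2 | E3 | E4 | E3 | E1 | E3 | E2 | E3 | E5 | E4 |
| n (12 values in the source) | 3 | 3 | 3 | 5 | 2 | 2 | 3 | 2 | 3 | 2 | 5 | 2 | |
| m‘ | -0,486 | -0,295 | -0,282 | 0,160 | 0,270 | 0,246 | 0,130 | -0,277 | -0,324 | -0,129 | 0,197 | 0,129 | -0,479 |
| m“ | -0,453 | -0,262 | -0,249 | 0,140 | 0,220 | 0,196 | 0,097 | -0,227 | -0,291 | -0,078 | 0,177 | 0,089 | -0,429 |
| σ | 0,12 | 0,06 | 0,13 | 0,02 | 0,03 | 0,07 | 0,03 | 0,07 | 0,12 | 0,04 | 0,13 | 0,00 | 0,162 |
| t | -7,09 | -8,09 | -3,65 | 9,83 | 7,35 | 3,80 | 5,04 | -6,07 | -4,41 | -4,27 | 3,32 | 6,90 | -4,14 |
| t‘ | -6,61 | -7,17 | -3,22 | 8,58 | 5,98 | 3,02 | 3,77 | -4,16 | -3,97 | -2,74 | 3,03 | 4,28 | -3,70 |
| R | 0,83 | 0,50 | 1,50 | 0,20 | 0,29 | 0,49 | 0,59 | 1,01 | 1,01 | 0,37 | 1,26 | 0,00 | 1,10 |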

 

Thus, according to how they scored their own countries' teams, the judges were distributed as follows:

– 7 judges statistically significantly underestimated the scores;

– 97 judges statistically insignificantly underestimated the scores;   

– 81 judges statistically insignificantly overestimated the scores;

– 6 judges statistically significantly overestimated the scores.

Compare these results with how judges scored their own countries' teams at EC-98, WC-99, EC-2000 and OG-2000:

– 6 judges statistically insignificantly underestimated the scores;

– 83 judges statistically insignificantly overestimated the scores;

– 41 judges statistically significantly overestimated the scores.
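The two tallies are easier to compare as shares of all judges counted. The short sketch below does only that arithmetic on the counts listed above; the dictionary keys and the helper name `share` are illustrative.

```python
# Counts of judges by how they scored their own countries' teams, as listed above.
counts_2015_2017 = {"sig_under": 7, "insig_under": 97, "insig_over": 81, "sig_over": 6}
counts_1998_2000 = {"insig_under": 6, "insig_over": 83, "sig_over": 41}


def share(counts: dict, key: str) -> float:
    """Fraction of all counted judges that fall into the given category."""
    return counts.get(key, 0) / sum(counts.values())


for label, counts in (("1998-2000", counts_1998_2000), ("2015-2017", counts_2015_2017)):
    total = sum(counts.values())
    print(f"{label}: {total} judges, "
          f"significantly overestimated: {share(counts, 'sig_over'):.1%}")
```

On these counts the share of judges who significantly overestimated their own teams falls from about 32% (41 of 130) to about 3% (6 of 191).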

I believe that, as a result of the transition to closed refereeing, judges have come to fear sanctions from the FIG leadership more than reprisals from the leadership of their national federations.

I consider the way a judge from Taiwan scored gymnasts from China to be an example of this. As a judge of comp. II at WC-17, this judge statistically significantly overestimated the gymnasts from China: his indicator t‘ for China is 3.94. Apparently, he believed that the objectivity of judging is controlled only on the basis of nationality.

In section 2.4 I could speak with confidence about the bias of refereeing.

A significant share of the judges gave scores that were too high for their own teams. Now the picture is fundamentally different: the judges have become afraid to help their participants. I even considered it necessary to exclude the word “bias” from the title of Table 4.5.2.

The significant reduction in the number of judges who “help” their own has been achieved, in my opinion, by the total intimidation of judges.

However, it is almost impossible to imagine that the judges decided to completely neglect the interests of their teams: a lot of factors affect the judges‘ desire to “help their participants”.

In section 2.4, I clearly identified judges‘ alliances. It was not too difficult, because  the judges almost uncontrollably “helped their participants” and did not look for a large number of allies.  

After the rules changed, most of these alliances should have survived, but identifying them requires appropriate software. This does not present any particular difficulty: in comp. I of the men's WC-17 alone I identified more than 40 cases of overstated scores for “their own” and almost 50 cases of understated scores for “other participants”.

 

                           4.6. Manipulation of judges

Manipulation of scores to conceal the overestimation of scores for gymnasts from a judge's own country has declined significantly, because cases of help to “their” participants have dropped almost to zero. The problem of judges manipulating scores has therefore lost some of its interest. Perhaps it will show up in a more detailed analysis of the judging procedure.

 

  1. Conclusion

 

  1. After the introduction of closed refereeing, the number of cases in which judges overestimated scores for gymnasts from their own country decreased to almost zero. I suppose that most judges were intimidated by the FIG leadership.
  2. There is reason to believe that judges have replaced overestimating the scores of their own participants with arrangements with judges from neutral countries.
  3. The high correlation of judges' scores within one panel remains. Nearly twenty years have passed since my report in March 2000, where the correlation of judges' scores was first mentioned, but judges still have the opportunity to coordinate scores with adjacent judges, and sometimes with judges seated at a distance.
  4. Creating and using R panels contradicts the rules of mathematical statistics and also increases the scope for judges' arbitrariness, which can distort the results of competitions. I believe that the appearance of the R panel did not benefit gymnastics.
  5. Dr. Hugues Mercier declared: «a judge giving 9.8 to an athlete deserving 9.7 is never an outlier». This declaration is valid for a single score. But a team can consist of 3, 4 or 5 athletes, and a team can perform up to 6 vaults. A score that repeatedly differs by 0.1 from the deserved one may be a sign of biased refereeing. I substantiated my position on this issue in section 5, where I introduced the additional indicators m“ and t‘.
  6. The accepted change in the values of deductions for execution errors means that the S indicators characterizing the accuracy of refereeing on vault have become statistically significantly smaller than the S indicators for the other apparatuses. I believe that additional attention should be paid to this issue when new rules are developed.
  7. Because of the small sample size, using methods of mathematical statistics for comp. III requires combining the results of these competitions with a priori information. It is necessary to draw up an individual judge's card containing the relevant a priori information. I developed the form of such a card, and a sample is included in my book.
  8. Lack of qualified control when accepting software leads to errors in official competition reports, up to the incorrect determination of participants' places (see the last three pages of my book).

 

List of Literature

 

 

  • Obodovsky Yu. M. (Ободовский Ю.М.) “Battles of judges at the gymnastic platforms” («Судейские битвы у гимнастических помостов»). Moscow: Avtorskaya Masterskaya, 2018.
  • Dr Hugues Mercier. «Evaluating the performance of international gymnastics judges». Université de Neuchâtel, 2017.

 

 

BRIEFLY ABOUT THE AUTHOR 

 

His first major competition was the 1960 championship of the Armed Forces of the USSR, where he was a vault judge.

He first headed the Secretariat of the USSR Championship in 1961.

He graduated from the Moscow Aviation Institute in 1962.

Judge of the Republican (Russia) category in gymnastics since 1962.

Member of the Presidium of the Judges' Board of Moscow in 1963-1967.

Candidate of Technical Sciences since 1971.

Senior Researcher since 1989.

Judge of the USSR category in gymnastics since 1990.

In 2000, he gave a presentation on the methodology for a comprehensive analysis of the quality of refereeing at a meeting of the heads of gymnastics federations of the CIS and Baltic countries, which was attended by the president of the International Gymnastics Federation (FIG), B. Grandi, the secretary general of the FIG, N. Bueche, and the honorary president of the FIG, Y. Titov.

His book “Battles of judges at the gymnastic platforms”, dedicated to the problems of quality control of refereeing in gymnastics, was published in 2018.

Those who wish to receive this book can contact the author by phone: +7 910 444 06 55 or by e-mail: obodovski@inbox.ru.

This is an abridged version of the book, since the full version apparently caused difficulties for many of the officials to whom it was sent. The abridged version contains some amendments and clarifications.