Showing posts with label Quality. Show all posts

Monday, May 24, 2021

@Wikimedia needs your support because what it does, what we do is not enough

An article in the "Daily Dot" insists that Wikimedia has plenty of money. This is based on the growth of Wikimedia budgets and yes, they have grown substantially over time. The English Wikipedia in particular provides a lot of content and serves some 50% of the Wikimedia traffic.

When people analyse its content, it becomes problematic. Even though its content is referenced, many of the references are old and could do with new insights that science brings on a regular basis. The content is male oriented and thanks to projects like "Women in Red" it has improved substantially but not enough. 

We know all mayors of Denver and we do not know National government ministers of African countries. Lists are to be maintained on EVERY Wikipedia, English consensus insists, and they are not properly maintained as a result. Not even on the English Wikipedia.

Money buys you things. When you donate to the WMF, you gain a sense of ownership. That is important; we may not need more money but we do need a sense of ownership in India, Colombia, Nigeria and Guinea. When the other 50% of Wikimedia traffic takes ownership away from those who had enough, we find topics with more real world relevance. Commons becomes usable in the other 299 languages and we seek out these 299 communities to make it work for them.

Given that we don't do enough for 300 languages, given that we can do much better, I will argue that Wikimedia needs more support, even money.

Thanks, GerardM

Sunday, April 12, 2020

False friends and ListeriaBot - finding a way out of an impasse

ListeriaBot is a bot that maintains lists based on information in Wikidata. In this blogpost I will explain what a Listeria list is, what it is used for. I will point out its qualitative benefits and explain how Listeria can be instrumental to limit bias, stimulate collaboration and help us share in the sum of the knowledge available for us.

The heart of a Listeria list is a query. The query defines what data is retrieved from Wikidata and the order of presentation, and the information is shown in a language depending on the availability of labels.

Listeria lists are defined only once, and every day a job run by the ListeriaBot updates all lists with the latest data from Wikidata. In this way available information is provided even when articles are still to be written. When there is an article to read, the label is shown upright; when there is not, it is shown in italics.
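
The presentation rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the shape of a result row and the wikitext output are made up here and are not taken from ListeriaBot itself.

```python
from typing import Optional


def pick_label(labels: dict[str, str], preferred: list[str], qid: str) -> str:
    """Return the first available label in order of language preference.

    Falls back to the Wikidata identifier when no label is available.
    """
    for lang in preferred:
        if lang in labels:
            return labels[lang]
    return qid


def render_entry(qid: str, labels: dict[str, str], article: Optional[str],
                 preferred: list[str]) -> str:
    """Render one list entry as wikitext (hypothetical output format).

    Upright link when a local article exists, italics when it does not.
    """
    label = pick_label(labels, preferred, qid)
    if article is not None:
        return f"[[{article}|{label}]]"
    return f"''{label}''"


if __name__ == "__main__":
    # Hypothetical rows: one with a local article, one without.
    print(render_entry("Q1", {"en": "Example"}, "Example", ["nl", "en"]))
    print(render_entry("Q2", {"en": "Other"}, None, ["nl", "en"]))
```

Because the entry is rebuilt from the data on every run, a label added in a preferred language or a newly written article shows up without anyone editing the list.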

The biggest difference between a Wikipedia list and a Listeria list? No false friends. When you seek a specific "Rebecca Cunningham", it is really powerful to know that your Prof Cunningham will always be known as Q77527827 and is also authoritatively known by other identifiers. From a qualitative point of view, particularly in lists with red links and even blue links, such disambiguation is a big thing. At this time a typical Wikipedia list has an error rate of around 4% because of disambiguation issues. I have frequently blogged about this; the Listeria list I often referred to is the one for the George Polk award.
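
A small Python sketch of why false friends happen with names and not with identifiers. Q77527827 is the identifier mentioned above; the second entry, its placeholder identifier and the descriptions are invented for the illustration.

```python
# Two different people who share a label. "Q_HYPOTHETICAL" is a made-up
# placeholder for a namesake; it is not a real Wikidata identifier.
people = [
    {"qid": "Q77527827", "label": "Rebecca Cunningham", "description": "professor"},
    {"qid": "Q_HYPOTHETICAL", "label": "Rebecca Cunningham", "description": "namesake (hypothetical)"},
]


def index_by_label(items):
    """Name-based linking: a later namesake silently replaces the earlier one."""
    return {item["label"]: item for item in items}


def index_by_qid(items):
    """Identifier-based linking: every person keeps their own entry."""
    return {item["qid"]: item for item in items}


def ambiguous_labels(items):
    """Return the labels that are shared by more than one identifier."""
    seen = {}
    for item in items:
        seen.setdefault(item["label"], set()).add(item["qid"])
    return {label: sorted(qids) for label, qids in seen.items() if len(qids) > 1}


if __name__ == "__main__":
    print(len(index_by_label(people)))  # one entry survives
    print(len(index_by_qid(people)))    # both entries survive
    print(ambiguous_labels(people))
```

A list that links by name has to guess which of the two is meant; a list that links by identifier never has to.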

Maintenance is another reason to choose Listeria lists. This was documented by Magnus: a list was maintained up to a point in time as a Listeria list and then, for all the wrong reasons, human qualities were to prevail. Magnus compared the results after some time and the human maintained list proved to be the poorly maintained list.

Categories are lists of a kind, for many categories it is defined what they contain. Consequently Wikidata is easily updated from Wikipedias and can serve as a source for updating categories as well.

Ok, the impasse. ListeriaBot is blocked because of a false friend issue. The objective is to find a resolution that will benefit us all. The false friend issue is that images can have the same name in both Wikimedia Commons and English Wikipedia. The existing algorithm for showing pictures is that local pictures take precedence. When ListeriaBot is to do things differently, it can. Thanks to the wikidatification at Commons, we can indicate with a Wikidata identifier what a picture "depicts". Wikidatification of images can also be introduced for pictures at English Wikipedia and it then becomes easy to always show what Commons has unless a preference is given to show a specific image for a particular project.

I have been told that I do not assume good faith. When I see the extent people care to go to resolve this issue I am only amused. The objective of what we do is share in the sum of all knowledge and do this in a collaborative way.

English Wikipedia fails spectacularly by assuming that their perceived consensus is in the best interest of what we aim to achieve. There is no reflection on the quality brought by Listeria, there is no reflection on how its quality can substantially be improved. I fail to understand what they achieve except for feeling safe by insisting on dated practices and dated points of view.

I wish we could be one community that is known by a best of breed effort with one common goal; sharing the sum of all the knowledge that is available to us.
Thanks,
        GerardM

Saturday, February 15, 2020

Wikipedia consensus? - It is who you ask but what are the facts

An article in VICE starts as follows: 'Wikipedia consensus is that an unedited machine translation, left as a Wikipedia article, is worse than nothing'. This article is problematic in so many ways, it starts with this premise because the Cebuano Wikipedia does not contain machine translation. It contains machine generated text and, to add insult to injury this same article states: 'the majority (generated articles) are surprisingly well constructed'.

An article like this can be sanity checked. Principles come first;
  • This is about a Wiki in contrast to the Nupedia approach. 
  • Wikipedia’s founding goal is to make knowledge freely available online in as many languages as possible.
  • There is a difference between opinions and facts
It is important how arguments are made. When "highly trusted users who specialize in combating vandalism" are introduced and comment that "many articles are created by bots", it does not follow that the quality is low nor that this is to be considered vandalism but the implication is made.

It is a fact that the Cebuano Wikipedia has 5,378,563 articles and also that there are some 16.5 million people who understand Cebuano. There is however no relation between these two facts. More relevant is that the wife of Sverker Johansson has Cebuano as her mother tongue and his two kids learn from their maternal cultural heritage also thanks to the work he does for the Cebuano Wikipedia. That is very much a classic Wiki approach.

In contrast the English Wikipedia has its bot policy preventing the use of bots for generating content. These notions should be local to the English Wikipedia and need not have relevance elsewhere. These highly trusted users can be expected to proselytise this point of view and thanks to this POV they take away a source of information without offering any credible alternative for the existing lack of information available to the rest of the world. At the same time the English Wikipedia is biased in the information it provides and does not provide the same quality of service for the domains selected for the Cebuano Wikipedia.

Sadly, it is said, the Wikimedia Foundation itself makes no effective difference in support of the "other" languages. An alternative to the LSJbot was introduced and it may be able to make a difference, but as it does not provide a public facing service it is very much a paper tiger. Even worse are the Nupedia notions in the combination of two things: "Due to its heavy reliance on Wikidata entries, the quality of content produced is heavily influenced by the quality of the Wikidata available." and "It can discredit other Wikipedia entries related to automatic creation of content or even the Wikipedia quality." These notions are problematic for several reasons.
  • No information is preferred over little information when our service to an end user is considered
  • Quality of information is framed in the light of existing Wikipedia entries. Whose Wikipedia entries are we considering? They are however irrelevant as our aim is to inform our end users; they do not cover the same subject.
  • When the quality is considered of Wikidata .. Why, it is a wiki and its quality is improving particularly as so many eyes shine their light on it.
  • We can inform, in any and all languages, and we do not even have to call it Wikipedia, we do not even have to save it in a Wikipedia when we only cache the results from the automated text generation.
  • When we cache results of automated text generation, texts can be generated again when the data is expanded or changed.
So far the critique of the VICE article, but then again does English not have its own problems?
  • Its 1,143 administrators and 137,368 active users are struggling to keep up, when you compare it with the 6 administrators and 14 active users for the Cebuano Wikipedia it is understandable that, as they grow, the English have to rely more and more on bots and artificial intelligence.
  • Magnus has demonstrated that the maintenance of lists is better served not by editors but by using the data from Wikidata
  • The Wikipedia technology has a problem with false friends. Arguably some 4% of list entries are wrong because the wrong article is linked to. When links are solidified by using Wikidata identifiers instead, this problem disappears in the same way as the problems with interwiki links disappeared.
The biggest problem "Wikipedia consensus" has is that it was formulated in the past by a tiny in-crowd making up the "accepted" big words for the rest of us and worse they can not be swayed from their POV by facts.
Thanks,
      GerardM

Tuesday, November 12, 2019

Instant gratification at @Wikidata

As I write this, it is 11:46am; at 09:26am I added papers to Prof Hafida Merzouk. The edits are picked up by Reasonator but not by Scholia. In a similar way, edits done are not picked up by Listeria.

Instant gratification is now a thing of the past, the work done at Wikidata may eventually be picked up in a Scholia or Listeria but it is not funny. Can I tweet about the things I find or have done when Wikidata no longer reflects the relevant changes?

This may sound trivial but it does mean that when I look back at my work there is no longer a timely way to do so.

Instant gratification motivates and it is a factor in maintaining quality. We are losing it.
Thanks,
      GerardM

Friday, November 08, 2019

Bias in @Wikidata and a SMART approach

When quality was presented at the WikidataCon, it was rated from 1 to 5. This approach has its own bias because it does not consider what may not be there. What is not there can be made visible using assumptions like: "a university has more than one employee" (employee includes professors) and "every country has at least one university".
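
These two assumptions can be turned into simple checks. The Python below is a sketch with made-up sample data and hypothetical function names; in practice the rows would come from a query against Wikidata.

```python
def countries_without_universities(countries, universities):
    """Assumption: every country has at least one university.

    Returns the countries for which no university is recorded.
    """
    covered = {u["country"] for u in universities}
    return sorted(c for c in countries if c not in covered)


def universities_with_too_few_employees(universities, employees, minimum=2):
    """Assumption: a university has more than one employee.

    Returns the universities with fewer than `minimum` recorded employees.
    """
    counts = {u["name"]: 0 for u in universities}
    for person in employees:
        if person["employer"] in counts:
            counts[person["employer"]] += 1
    return sorted(name for name, n in counts.items() if n < minimum)


if __name__ == "__main__":
    # Hypothetical data, only to show the shape of the result.
    countries = ["Country A", "Country B", "Country C"]
    universities = [
        {"name": "University X", "country": "Country A"},
        {"name": "University Y", "country": "Country B"},
    ]
    employees = [
        {"name": "Prof P", "employer": "University X"},
        {"name": "Prof Q", "employer": "University X"},
    ]
    print(countries_without_universities(countries, universities))
    print(universities_with_too_few_employees(universities, employees))
```

What the checks return is not wrong data but absent data, which a 1 to 5 rating of existing items cannot show.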

The bias in Wikidata starts with the way it is mostly used and consequently how it is taught. People are shown what Wikidata looks like, immediately followed up with training in the use of query and the use of tools. At every level it takes considerable skills to make a use of Wikidata. The first hurdle to overcome is to understand the data in a single item. When your language is not English you are toast. This is Cape Town in Newari and this is a useful presentation using Reasonator. With Reasonator the information is easy to digest and adding missing labels is just one click away.

The second hurdle is knowing what bias it is you want to remedy. For a known bias like the gender gap, the Women in Red have lists of missing Wikipedia articles. A Wikidata gap is expressed by the absence of data. Listeria lists are great at that.. These are all the universities of Africa.. If you do not get the extent of what we miss, you have some thinking to do. When you apply this principle to the science of Africa, you find a lot of lists and the biggest issue remains; missing lists.

When you tackle a missing subject like I did for the "Affiliates of the African Academy of Sciences", you will find a source as a reference for the group and a reference on every affiliate. To ensure that the data is relevant and actionable, I added all of them, linked them to ORCiD and/or Google Scholar enabling SourceMD to link them to their papers. I added nationality because this may trigger inclusion on the Women in Red lists and when it was obvious, I added employers so that they may be included as a scholar on African University lists..

When we as a movement want to fight bias, we have to consider the use of lists and particularly Listeria lists to show the developments of a subject. With lists available on many Wikipedias, it becomes possible to gain traction on what we miss. This approach is distinctly different as it acknowledges the need for more support for item based editing and it makes the point that missing data is a quality issue that needs to be addressed as a fundamental issue.
Thanks,
      GerardM

Thursday, November 07, 2019

@Wikipedia talks about @Wikidata

"WD is unreliable. WP:V and WP:RS are completely ignored (from any editors). International NPOV is a problem too." It is so SMART, that the best I can do is ignore it. Then again it is an open invitation to talk about Wikipedia..  There is no Wikipedia there are over 300 Wikipedia language editions.. so even the acronyms are lost on me as there is no one Wikipedia to rule them all.. 

So forget about acronyms and let's talk Wikidata and by inference raise issues particularly for the English Wikipedia where appropriate. First, Wikidata includes more items than there are subjects raised in any and all Wikipedias. Its quality can be considered in many ways and verifiability is largely ensured because of the association with other "authorities" about a subject. Thanks to the increased use of open data, it is possible to verify that specific statements are shared, increasing the likelihood that they are correct. For some information, like for scientists who are a member of the AAS Affiliates Programme, we have or may have references to the authoritative source. Such references may be on a project or on an item level; this makes verifiability easy and obvious.

Wikidata has an issue with all kinds of gaps in its coverage. For many African countries no universities are known, there are hardly any scholars associated with them. Thanks to Listeria functionality we can monitor if and when data is added. Many a Wikipedia do not have such tools because of the aversion of Wikidata by some. At the same time projects like Women in Red rely on Listeria lists and by inference Wikidata to know what to work on.

In tools like Reasonator and Listeria, lists are generated and, when you compare them with Wikipedia lists, the quality is measurably better. I published frequently in the past about the Polk award.. In its lists Wikipedia has a likely error rate of six percent. When they fudge the record by not linking at all, the quality of a Wikidata list is even better because it is much better at linking items than Wikipedia is at linking red links. There is a solution, it just requires a willingness by Wikipedians to cooperate.

I understand what is meant by "international NPOV" and it is where Wikidata is by definition better than an individual Wikipedia. By definition because Wikidata represents data from ALL Wikipedias. Thanks to the people of DBpedia, there is a potential to highlight where Wikipedias differ and it is more likely that the fruit of their labour will enrich Wikidata than Wikipedias.

So a Wikidatan walks into a bar..
Thanks,
       GerardM

Thursday, September 26, 2019

The lowest hanging fruit in #DBpedia

What I hate with a vengeance is make-work. DBpedia as a project retrieves information from all the Wikipedias, wrangles it into shape and publishes it. In one scenario they have unanimous support from one or more Wikipedias agreeing on the same fact and they all may have their own references.

We should import such agreeable data without further ado. An additional manual step to import to Wikidata is not smart because manual operations introduce new errors. Arguably when there is no unanimous support manual intervention may improve the quality but given the quantity of the data involved, it means that a lot of data will not become available. THAT in and of itself has a negative impact on the quality of available data as well.

So what to do.. Harvest all the data that is of an acceptable quality, that is the data DBpedia accepts for its own purposes. Enable an interface where people verify the data where their project is challenged.

When we truly aim to engage people, we enable them to target the data they want to work on. I will happily work on scientists but do not expect me to work on "sucker stars". More than likely there will be people who care about soccer stars but not about "crazy professors".
Thanks,
      GerardM

Wednesday, September 25, 2019

With #DBpedia to the (data) cleaners

The people at DBpedia are data wranglers. What they do is make the most of the data provided to them by the Wikipedias, Wikidata and a generous sprinkling of other sources. They are data wranglers because they take what is given to them and make the data shine.

Obviously, it takes skill and resources to get the best result and obviously, some of the data gathered does not pass the smell test. The process the data wranglers use includes a verification stage as described in this paper. They have two choices for when data that should be the same is not; they either have a preference or they go with the consensus, i.e. the result that shows most often.

For data wranglers this is a proper choice.. There is another option for another day: these discrepancies are left for the cleaners.
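
The two choices, and what is left for the cleaners, can be sketched as follows. This is a minimal Python illustration, not DBpedia's actual pipeline; the function name, the source names and the sample values are assumptions.

```python
from collections import Counter


def resolve(values_by_source, preferred_source=None):
    """Pick one value when sources that should agree do not.

    Returns (value, is_discrepancy). When a preferred source is given and has
    a value, that value wins; otherwise the value that shows most often wins.
    A tie goes to the value seen first. Discrepancies are flagged so that
    they can be left for the cleaners.
    """
    if not values_by_source:
        raise ValueError("no values to resolve")
    is_discrepancy = len(set(values_by_source.values())) > 1
    if preferred_source is not None and preferred_source in values_by_source:
        return values_by_source[preferred_source], is_discrepancy
    counts = Counter(values_by_source.values())
    value, _ = counts.most_common(1)[0]
    return value, is_discrepancy


if __name__ == "__main__":
    # Hypothetical birth years reported by three Wikipedias.
    reported = {"enwiki": "1950", "dewiki": "1950", "frwiki": "1951"}
    print(resolve(reported))                             # consensus
    print(resolve(reported, preferred_source="frwiki"))  # preference
```

Either way the flag stays set, so the disagreement is kept as a work item and not hidden by the choice.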

With the process well described, the data openly advertised as available, the cleaners will come. First people akin to the wranglers, they have the skills to build the queries, the tools to slice and dice the data. When these tools are discovered, particularly by those who care about specific subsets, they will dive in and change things where applicable. They will seek the references, make the judgments necessary to improve what is there.

The DBpedia data wranglers are part of the Wikimedia movement and do more than build something on top of what the Wikis produced; DBpedia and the Wikimedia projects work together improving our movement's qualities. With the processing data generally available this will become even more effective.
Thanks,
        GerardM

Sunday, September 22, 2019

Comparing datasets, bigger or better or it does not matter?

When Wikidata was created, it was created with a purpose. It replaced the Wikipedia based interwiki links, it did a better job and, it still does the best job at that. Since then the data has been expanded enormously, no longer can Wikidata be defined by its links to Wikipedia as it is now only a subset.

There are many ongoing efforts to extract information from the Wikipedias. The best organised project is DBpedia; it continuously improves its algorithms to get more and higher grade data and it republishes the data in a format that is both flexible and scalable. Information is also extracted from the Wikipedias by the Wikidata community. There are plenty of tools like petscan and the awarder and plenty of people working on single items one at a time.

Statistically on the scale of a Wikidata, individual efforts make little or no impression but in the subsets the effects may be massive. It is for instance Siobhan working on New Zealand butterflies and other critters. Siobhan writes Wikipedia articles as well strengthening the ties that bind Wikidata to Wikipedia. Her efforts have been noticed and Wikidata is becoming increasingly relevant to and used by entomologists.

There are many data sets; because of its wiki links every Wikipedia is one as well. The notion that one is bigger or better does not really matter. It is all in the interoperability, it is all in the usability of the data. Wikipedia wiki links are highly functional and not interoperable at all. More and more Wikipedias accept that cooperation will get them better quality information for their readers. Once the biggest accept data as a resource and help curate the shared data, the act of comparing data sets means improved quality for all.
Thanks,
      GerardM

Saturday, September 07, 2019

@Wikidata - #Quality is in the network

What amounts to quality is a recurring and controversial subject. For me quality is not so much in the individual statements for a particular Wikidata item, it is in how it links to other items.

As always, there has to be a point to it. You may want to write Wikipedia articles about chemists, artists, award winners. You may want to write to make the gender gap less in your face but who to write about?

Typically connecting to small subsets is best. However we want to know about the distribution of genders so it is very relevant to add a gender. Statistically it makes no difference in the big picture but for subsets like: the co-authors of a scientist or a profession, an award, additional data helps understand how the gender gap manifests itself.

The inflation of "professions" like "researcher" is such that it is no longer distinctive, at most it helps with the disambiguation from for instance soccer stars. When a more precise profession is known like "chemist" or "astronomer", all subclasses of researcher, it is best to remove researcher as it is implied.
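
Removing an implied occupation is mechanical once the subclass relations are known. The Python below is a sketch over a tiny, hand-made hierarchy; the mapping and the function names are assumptions for illustration, not a dump of Wikidata's class tree.

```python
# Hypothetical subclass relations: occupation -> its direct superclasses.
SUBCLASS_OF = {
    "chemist": ["researcher"],
    "astronomer": ["researcher"],
}


def ancestors(occupation, subclass_of):
    """Return every superclass reachable from an occupation."""
    found = set()
    stack = [occupation]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def drop_implied(occupations, subclass_of):
    """Keep only occupations that are not implied by a more precise one."""
    implied = set()
    for occupation in occupations:
        implied |= ancestors(occupation, subclass_of)
    return [o for o in occupations if o not in implied]


if __name__ == "__main__":
    print(drop_implied(["researcher", "chemist"], SUBCLASS_OF))
    print(drop_implied(["researcher"], SUBCLASS_OF))
```

"Researcher" stays when it is all we know and goes as soon as a subclass is present.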

Lists like members of "Young Academy of Scotland", have their value when they link as widely as possible. Considering only Wikidata misses the point, it is particularly the links to the organisations, the authorities (ORCiD, Google Scholar, VIAF) but also Twitter like for this psychologist. We may have links to all of them, the papers, the co-authors. But do we provide quality when people do not go into the rabbit hole?
Thanks,
      GerardM

Sunday, January 27, 2019

@Wikidata #quality - one example: Leonardo Quisumbing

Quality happens on many levels. Judge Leonardo Quisumbing passed away and a lot of well meant effort went into his Wikidata item.  The data is inconsistent with our current practice so in the Wikidata chat people were asked to help fix the data.

Judge Quisumbing held many positions, one of them was "Secretary of Labor and Employment". This is a cabinet position and it follows that Mr Quisumbing was also a "politician". It is one thing to include this position and occupation to a person, from a quality point of view it is best to include a "start date" a "replaces" an "end date" and a "replaced by". The problem: the predecessor and successor do not exist in Wikidata.

Many a secretary of Labor do have a Wikipedia article and they are included in a category. Using the "Petscan" tool it is easy to import all those mentioned. Typically the quality of the info is good however there is always the "six percent" error rate. Indeed one person was erroneously indicated as a "secretary of labor".  The problem is that people who only care about quality on the item level are really hostile to such imported issues. They are best ignored for their ignorance/arrogance.

A next level of quality is to complete the list with all missing secretaries. This can be done warts and all from the Wikipedia article. It results in a Reasonator page that includes all the red and black links of the article. Many new items are created in the process and having automated descriptions are vital in finding as many matches as possible.

Judge Quisumbing became an "Associate Justice of the Supreme Court of the Philippines" and became the senior associate justice in 2007. Adding associate judges from a category was obvious; adding senior associate judges is a task similar to secretaries of labor. However, a senior is the first among the many and consequently it requires a judgment call on how to express this.

Given that Wikidata is a wiki, you do the best you can to the level that has your interest. There is still a need to improve the Wikidata item for judge Quisumbing but that is for someone else.

Thanks,
       GerardM

Sunday, January 20, 2019

@Wikidata - #Quality in a #Wiki environment

What quality is, quality in a data environment, has been studied often enough. Lots of words are spent on it but one notion is always left out: what is data quality in a Wiki environment? How does that translate to Wikidata?

First of all; Wikidata serves many purposes. The initial purpose of Wikidata was to replace the in-article "interwiki" links. They were notoriously difficult to maintain, often wrong. A single Wikidata item replaced the links for a subject in all Wikipedias and this brought stability and a high level of confidence in the result. Over time the quality of the "interwiki" links went down; there are fewer people involved adding and curating these links and it is seen as a quality issue when new items are generated for new articles; they do not have statements and are often not linked. There have been protests against these new additions.

A second purpose is the use of Wikidata statements in Wikipedia templates. Assessing data quality becomes complicated as there are micro, meso and macro levels of quality at play. The micro level: is sufficient data available for one template in one Wikipedia article? The meso level: is sufficient data available for one template in Wikipedia articles on the same topic? The macro level: is the same data available for all interested Wikipedias and do we have the required labels in those languages?

Quality considerations are driven by this approach. On a micro level you want all awards for a scientist to be linked on an item. On the meso level you want all recipients of an award to be linked to their item. On the macro level you want all awards to have labels in the language of a Wikipedia and all local considerations to be met.
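
The three levels ask three different questions of the same data. A minimal Python sketch using the award example; the data layout, the sample values and the function names are hypothetical.

```python
def per_item_gaps(awards_on_item, awards_known):
    """One scientist: awards known from sources that are not yet on the item."""
    return sorted(set(awards_known) - set(awards_on_item))


def per_topic_gaps(recipients):
    """One award: recipients (name -> item identifier or None) without an item."""
    return sorted(name for name, qid in recipients.items() if qid is None)


def per_language_gaps(award_labels, language):
    """All awards (identifier -> {language: label}): no label in this language."""
    return sorted(qid for qid, labels in award_labels.items() if language not in labels)


if __name__ == "__main__":
    # Hypothetical sample data; the identifiers are placeholders.
    print(per_item_gaps(["Award A"], ["Award A", "Award B"]))
    print(per_topic_gaps({"Scientist P": "Q_P", "Scientist R": None}))
    print(per_language_gaps({"Q_A": {"en": "Award A"},
                             "Q_B": {"en": "Award B", "nl": "Prijs B"}}, "nl"))
```

An item can pass the first check and still leave a template in another Wikipedia empty because of the third.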

Standard quality considerations in a Wiki environment are not helpful; they are judgemental. People contribute to Wikidata and all have their own purposes. A Wiki is a work in progress and when quality assessments are to be performed, the question should focus on the extent to which a specific function is supported. What people seek in support also changes; as long as there was no article for professor Angela Byars-Winston it was fine only to know about her for one publication. Now that Jess Wade picked her for an article, it may be relevant that she is the first and so far only person known to Wikidata who was a "champion of change" and that more papers are identified for her.

Wikidata includes many references to scientific papers and authors. However, so far it serves no purpose. Allegedly there is a process underway that imports papers used as citations in the Wikipedias but it is not clear what papers are used in what Wikipedia article. So far it is a big stamp collection, a collection with a rapidly growing quality. A collection that highlights authors who are open about their work and who share the details of their work at ORCID. In effect, this data set indicates that the relevance of a scientist improves by being open.

Wikidata invites people to add/curate the data that is of interest to them. Particularly the esoteric data, data about subjects like African geography, Islamic history need a lot of tender loving care. It is where Wikidata and the large Wikipedias are weak. For as long as Wikidata is largely defined by the large Wikipedias it will reflect the same biases and these biases will be hard to assess and curate.
Thanks,
      GerardM

Monday, August 20, 2018

#Wikidata - Never heard of "R Andrew Moore"?

When quality in scientific papers, papers that are to be used as sources in Wikipedia, is important, it is relevant to include the papers published by the Cochrane Database of Systematic Reviews.

For one author, R Andrew Moore, I invoked the QuickStatementsBot and he added Mr Moore, added some 131 links to publications. It was a subset of what could be done but hey I already felt I was living dangerously.

When you then have Scholia look at Mr Moore for missing information .. there is a lot. But there is so much information.  In the diagram you see his co-author graph, and you will also find that Mr Moore published 123 times for Cochrane.

There are many publications and arguably we may want them all when we are to share the sum of all knowledge. Not all publications are equal and Cochrane is special because it reviews what is published elsewhere; their bottom line is not commercial, their motto is "Trusted evidence. Informed decisions. Better health." It is what aligns best with what we aim to do.

As I learn how to add authors who published for Cochrane, I will do maybe one a day. When other people take an interest, slowly but surely meta data on research gains relevance in Wikidata and with a little help of our Wikipedia friends we will provide better information.
Thanks,
      GerardM

Monday, May 14, 2018

#AfricaGap - #Wikidata; its quality as Wikidata matures

Currently there are 45 countries that I monitor for their national politicians. When I add a specific national "position", I do several things; I add existing politicians that are known in a particular category and I include a definition of what that category contains.

I give hardly any attention to details; my objective here is simple: I want to see how this (underdeveloped) data evolves. There is a huge gap in what we know about Africa and as it is, we hardly inform about Africa; we need Africans to help us get the most basic facts straight for ourselves.

As Wikidata matures, we gain subsets of data that are of varying quality. The most mature living data are our interwiki links. It is live data and it serves a purpose. Changes require attention to detail; they have an immediate effect on the discoverability of information. When data comes alive, when it serves a purpose, it has people who will invest their time to get the data right. They will give attention to detail because that serves their purpose.

For arcane subjects like the Ottoman Empire, even Africa, there are few people who find a purpose in the data. Arguably there is so little data that almost everything added is a 100% gain in quality (a person exists, he is a member of parliament of ***, I do not understand African names so it could be male or female I do not know). Sometimes there are whole lists of people like these people from the Bosnian Eyalet, it is easy enough to complete such a list. But will it serve a purpose? How to give it a purpose?

There is no uniform quality to Wikidata. There are whole areas where we are 100% off the mark as we do not have the data nor the ability to link to data elsewhere. There are other areas like in biomedical literature where our quality is such that it is actually useful. As this becomes known thanks to its evangelists, more attention is given by a wider public and more attention to detail is given in the process.

Arguably the quality of subsets of our data depends on its usefulness. When it is useful, people will come and give the attention to detail as it serves their purpose.
Thanks,
      GerardM

Sunday, February 11, 2018

#Wikidata - William Gorges, first colonial governor of the Province of Maine

Mr Gorges was born in Britain, he died in Britain. He was tasked to oversee investments for two years by a nephew and as a result he was the first colonial governor of the province of Maine. Consequently he is said to be a citizen of the USA, (he died in February 1658)..

The problem with nationality and citizenship is that we tend to adopt people as belonging to something that did not exist at the time, and consequently it is a falsehood. It is the same with all these generals and governors of the Confederacy; they did not identify with the United States of America, they had their own state they swore allegiance to, so why call them citizens of the USA? How dare we?

It is the same for people from Wales, Scotland, Northern Ireland. They may have opposed the Brits but from a nationality point of view, their behaviour was judged by the British laws.

Associating people with states / nations that may not even have existed at the time produces false facts, pure and simple.
Thanks,
       GerardM

Friday, February 09, 2018

#Wikimedia and #Cochrane - sharing resources and sharing results

Jane Falconer, a medical librarian, wrote a really interesting blog post. She writes about the importance of reliable information to front line health professionals and stresses the importance of systematic reviews that are conducted according to recognised and tested methodologies.

The big problem in projects like PRISMA and Cochrane: what to include in the systematic review, and what to exclude. This is the same problem we face when we seek sources for Wikipedia articles, and the Wikimedia solution to provide sources is "The Wikipedia Library Card Platform".

Cochrane and the Wikimedia Foundation are partners and one scenario I can see is one where this partnership is intensified. When Cochrane shares its results with Wikidata (they can have all the data of Wikidata anyway), the quality and the relevance of the Wikidata data improve. When Cochrane volunteers can share the Library Card Platform, it will be a major contribution to the volunteers at Cochrane. The relevance of the data at Wikidata will improve substantially. This in turn will help us verify the content of medical information and the quality of the sources in all our Wikipedias.
Thanks,
     GerardM

Sunday, December 10, 2017

When #Wikidata is good for something

When #Wikidata is good for something, it shines. It does not take much prodding to find people to improve on what it does so well and consequently when Wikidata is useful, quality follows easily.

The promise of a useful Wikidata was delivered at its start by having it replace the native interwiki links of Wikipedia. Within a month the quality of Wikipedia links had improved dramatically, and at this time corner cases are still being worked on, improving quality even more.

The WikiCite project is really important in many respects and it has so much more to offer. It is useful because it brings many initiatives and projects together under one roof. It is why scientific papers are included, including their authors. We find that more and more authors are included as well and they are often linked to the ORCID, VIAF and other external identifiers of this world. This has great value because it allows Wikipedia articles and information maintained elsewhere to be linked. What it can be used for is limitless. End users will find new and interesting ways to use the data and make it into information.

When Wikidata is to be good for Wikimedia projects, this information brought to Wikidata because of WikiCite has great potential. It largely reflects the citations in all the Wikipedias and consequently, through the external sources linked in this way, we could know which sources are problematic, retracted or bought by interested parties. We could, we don't. If we did, we would provide weight against propaganda and fake news.

The big thing holding us back is trust. Wikipedians need to consider a Wikidata that is not only used for links and that can be trusted for high level maintenance of its citations. Wikidata is to appreciate its use and trust that its information will be used and that this will increase its value and quality. WikiCiters have to understand that Wikidata is not a stamp collection only including publication data. It must include information about retractions, about papers considered problematic for political or scientific reasons (or both).

When Wikidata is to be good for something, we should expand our collaboration with Cochrane, Retraction Watch and organisations like it. There is everything to gain: quality, contributors and relevance.
Thanks,
      GerardM

Sunday, November 19, 2017

#Wikidata vs #Wikipedia - Rukmini Maria Callimachi

Mrs Callimachi did not only win the Polk Award; she is both a journalist and a poet and did not only win journalism awards. One of the awards, the Michael Kelly Award, is hidden in the Wikipedia article on Michael Kelly.

This article is about how Wikidata and English Wikipedia can help each other. The Wikipedia article lists seven awards and this makes it easy to add other award winners for them as well.

Thanks to Magnus' awarder, this is fairly easy, but some awards hide out as part of an article and the award has to be added in Wikidata. It may be one reason why later awards are missing. The religious award she is said to have won is a different award with a similar name. The award and the organisation that confers it had to be created.

The point: we can compare data at a Wikipedia with what we have on Wikidata. They should match. When they do not, there is an issue. Copying the data from Wikipedia is easy and it is the obvious thing to do. When Wikipedians decry the quality of Wikidata, they should reflect on why this is the case. When we collaborate, we will slowly but surely improve our quality. In the final analysis our aim is the same: share in the sum of all knowledge.
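To make the comparison concrete, here is a minimal sketch. It assumes the award names have already been extracted from a Wikipedia article and from the matching Wikidata item; the sets below, including "Award A" and "Award B", are made-up placeholders and not real data, and the `compare` function is only an illustration, not an existing tool.

```python
# Hypothetical data: award names found in a Wikipedia article and on the
# matching Wikidata item. These sets are placeholders, not real data.
wikipedia_awards = {"George Polk Award", "Michael Kelly Award", "Award A"}
wikidata_awards = {"George Polk Award", "Award B"}


def compare(wikipedia, wikidata):
    """Return what each side has that the other lacks, plus the overlap."""
    return {
        "only_in_wikipedia": sorted(wikipedia - wikidata),
        "only_in_wikidata": sorted(wikidata - wikipedia),
        "in_both": sorted(wikipedia & wikidata),
    }


report = compare(wikipedia_awards, wikidata_awards)
for side, names in report.items():
    print(side, names)
```

Everything that shows up under "only_in_wikipedia" or "only_in_wikidata" is an issue for a person to look at; the sketch does not decide which side is right.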
Thanks,
      GerardM

Saturday, November 18, 2017

#Wikipedia vs #Wikidata - the George Polk Awards

Some Wikipedians consider Wikidata inferior, so much so that they agitate towards a policy that bans Wikidata in "their" Wikipedia. They are welcome to their opinion.

I do bulk imports from Wikipedia and all the time I suffer the consequences. Some three to four percent of their data is wrong for all kinds of reasons, reasons that are manageable with proper tooling.

The George Polk Award is an award for journalism and it got my attention again because the International Consortium of Investigative Journalists received it for their work on the Panama Papers. I noticed that many of the people listed as having been awarded the Polk Award did not have articles in Wikipedia, that many of the links in the list of award winners pointed to the wrong person and that many award winners did not even have a "red link".

I am in the process of checking all the links and adding the date for the award. I found many issues, among them a civil war general and many other false friends. I am adding items for the people who do not have an English article and I have to check each of them because several do have articles in other languages. It is a lot of work and it is not as useful as it could be because Wikipedia hates Wikidata and we do not collaborate, we do not work together.

There is a Listeria list of winners and slowly but surely it will contain information that is similar to the English Wikipedia list article. Similar but not the same:
  • the false friends will not be there,
  • there will be no red or black links,
  • people who won the award twice will be missing.
Why do this, why spend so much time on one big list? Well, in this day and age of "fake news" we should celebrate journalism, but having all this information in Wikidata allows for all kinds of tools as well. We can check for false friends, we can check if the articles on the award winners include the award, and also if there are "winners" who are not known in this list and in the source available for the George Polk winners.
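One such check for false friends can be sketched as follows. The idea: a linked person who died before the award year, or was born after it, is probably the wrong person (a civil war general cannot win a journalism award a century later). The `LinkedPerson` record, the names and all the years are hypothetical; in practice the dates would come from Wikidata. A posthumous award would be flagged as well, so a hit is a reason to look, not proof of an error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkedPerson:
    """Hypothetical record: the person a list entry links to."""
    name: str
    born: Optional[int]   # year of birth, None when unknown
    died: Optional[int]   # year of death, None when unknown or still alive
    award_year: int


def is_suspect(person: LinkedPerson) -> bool:
    """Flag a link when the award year falls outside the person's lifetime."""
    if person.died is not None and person.died < person.award_year:
        return True
    if person.born is not None and person.born > person.award_year:
        return True
    return False


# Made-up entries, for illustration only.
entries = [
    LinkedPerson("Person A", born=1820, died=1890, award_year=1975),
    LinkedPerson("Person B", born=1940, died=None, award_year=1975),
    LinkedPerson("Person C", born=None, died=None, award_year=1975),
]

suspects = [p for p in entries if is_suspect(p)]
for p in suspects:
    print(p.name, "needs checking")
```

Entries without dates, like "Person C" here, pass unflagged; missing data is a separate issue that this check does not cover.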

I am not a Wikipedian and truthfully I hate the endless and senseless bickering that is going on. So let me work on the data, make it available to tools. Now you Wikipedians, you may choose not to show Wikidata data in your infoboxes but you will not make your errors go away without collaboration. Yes, you can quote a source, but when your data is not in line with what the source states, having a source does you no good; effectively you provide fake information.

My request to the reasonable people at Wikipedia and Wikidata: let us work together and see how we can improve quality. Let's link wiki links (blue, red and black) to Wikidata and improve the quality of what is on offer first.
Thanks,
       GerardM

Wednesday, November 08, 2017

#Wikidata as a Wiki versus the data consumers’ perspective

Wikidata is a Wiki. It follows that many people with many agendas add data to Wikidata. It is a continuous process and, as is usual in a Wiki, all contributions that fit the notability requirements of the project are welcome.

The consumers' perspective seen from a Wiki point of view is a bit awkward. There is nothing but active contributors that work towards any of the quality considerations. Even when there is a reasonable quality for some, it may not be enough for others.

Both Wikipedia and Wikidata are Wikis. Both have issues from a consumers' perspective. They are already explicitly integrated through the interwiki links and implicitly through the Wiki links. One of Magnus's tools makes this visible.

When you then consider George Polk and the George Polk Award it becomes obvious that Wikis have an issue from a data consumer's perspective. In some Wikipedia articles the two are conflated. In others there is a separate list of award winners. Many of the award winners do not have an article and some of the award winners refer to the wrong person. Wikidata could do with more data; the data was imported from Wikipedia and several of the wrong persons are still wrong in Wikidata.

Both Wikipedia and Wikidata consume each other's data. Both are Wikis. There is no superiority in either project but they could compare their data and curate the differences.
Thanks,
      GerardM