Abstract

This study is a systematic examination of the open access status of research in two flagship language testing and assessment journals: Language Testing and Language Assessment Quarterly. Coding and analysing 898 articles, we investigated (a) the prevalence of open access in four aspects—open manuscripts, open materials, open data, and open code, and (b) the relationship between open access and various characteristics of research, tests, and researchers. Our study revealed a positive trend in the adoption of open access over time, with open manuscripts and materials showing notable increases. Open code and data have remained scarce, though with a recent uptick from a low base. Notably, logistic regression results suggest inequitable participation in open access as authors from the Global South were less likely to have open manuscripts. Recognising the potential role of flagship journals as trend and standard setters, we call on the field to (a) shift towards more equitable open access models, (b) balance intellectual property concerns with validation needs, (c) recognise open code and open data with protected access via dedicated badges, and (d) adopt Research Transparency Statements, a new reporting structure inclusive of methodological and epistemological differences in open research practices.

语言测试与评估领域的开放获取:以旗舰期刊为例

本研究系统地考察了两本语言测试与评估领域的旗舰期刊 Language Testing 和 Language Assessment Quarterly 中研究的开放获取状况。通过对898篇文章的编码和分析,我们调查了:1)开放获取在研究手稿、材料、数据和代码四个方面的普及程度;2)开放获取与一系列研究特征、测试特征和研究者特征之间的关系。研究结果显示开放获取随时间呈上升趋势,其中开放手稿和材料有明显增长。开放代码和数据一直呈现稀缺状态,尽管近期有所上升。值得注意的是,逻辑回归分析结果表明,开放获取的参与存在不平等现象,来自“全球南方”的作者分享开放手稿的概率更低。考虑到旗舰期刊作为趋势和标准制定者的角色,我们呼吁语言测试与评估领域:1)转向更加公平的开放获取模式;2)在知识产权保护与研究验证需求之间取得平衡;3)通过专门的徽章认可开放代码和受保护访问的开放数据;4)采用一种更加包容方法论和认识论差异的新报告结构——研究透明度声明。

الوصول المفتوح في اختبار اللغة وتقييمها: حالة مجلتين رئيسيتين

هذه الدراسة عبارة عن فحص منهجي لحالة الوصول المفتوح للأبحاث في مجلتين رئيسيتين لاختبار وتقييم اللغة: اختبار اللغة وتقييم اللغة ربع سنوي. قمنا بترميز وتحليل 898 مقالة، حيث قمنا بالتحقق من أ) انتشار الوصول المفتوح في أربعة جوانب - المخطوطات المفتوحة، والمواد المفتوحة، والبيانات المفتوحة، والنص البرمجي المفتوح، و ب) العلاقة بين الوصول المفتوح والخصائص المختلفة للبحث والاختبارات والباحثين. تكشف دراستنا عن اتجاه إيجابي في تبني الوصول المفتوح بمرور الوقت، حيث تظهر المخطوطات والمواد المفتوحة زيادة ملحوظة. ظلت النصوص البرمجية والبيانات المفتوحة نادرة، على الرغم من الارتفاع الأخير من قاعدة منخفضة. ومن الجدير بالذكر أن نتائج الانحدار اللوجستي تشير إلى مشاركة غير عادلة في الوصول المفتوح حيث إن المؤلفين من الجنوب العالمي كانوا أقل احتمالاً أن يكون لديهم مخطوطات مفتوحة. إدراكًا للدور المحتمل للمجلات الرائدة كمحددين للاتجاهات والمعايير، فإننا ندعو المختصين إلى أ) التحول نحو نماذج الوصول المفتوح الأكثر إنصافًا، ب) الموازنة بين مخاوف الملكية الفكرية ومتطلبات التحقق، ج) الاعتراف بالنص البرمجي المفتوح والبيانات المفتوحة مع الوصول المحمي عبر شارات مخصصة، و د) اعتماد “بيانات الشفافية البحثية”، وهو هيكل مقترح لإعداد التقارير يشمل الاختلافات المنهجية والمعرفية في ممارسات البحث المفتوحة.

Introduction

Recent years have witnessed a burst of interest in open science within applied linguistics, with many calling for a shift towards more transparent and robust research practices. Broadly speaking, open science (also known as open research or open scholarship) is an umbrella term that refers to various movements and practices aiming to make our research and related scholarly activities more open, transparent and equitable (UNESCO, 2021). This has taken shape through various developments such as the creation of open repositories like Tromsø Repository of Language and Linguistics (UiT The Arctic University of Norway, n.d.) and IRIS (Marsden & Mackey, 2014), and initiatives like OASIS (Alferink & Marsden, 2023), TESOLgraphics (Chong & Sato, n.d.), the PostPrint Pledge (Al-Hoorie & Hiver, 2023), and the AILA research network “Open Applied Linguistics” (Liu et al., 2023). The mainstreaming of open science within applied linguistics is also evidenced by its growing presence in academic discourse, such as the edited volume Open Science in Applied Linguistics (Plonsky, 2024), and special issues in leading journals (e.g., Language Testing, Language Learning, and Studies in Second Language Acquisition).
Despite these developments, attention to different aspects of open science is unevenly distributed: While there have been increasing discussions on replications (e.g., Marsden et al., 2018; McManus, 2022; Porte & McManus, 2019), and meta-research (Al-Hoorie & Hiver, 2024) on data handling and reporting (Isbell et al., 2022), open access has received less attention, both in applied linguistics (but see Bolibaugh et al., 2021; Marsden & Plonsky, 2018, on open data) and in language testing and assessment (henceforth language testing) in particular. Nonetheless, one of the most important goals of the open science movement is making research openly available, including the manuscript, the materials and data associated with it and the code used to analyse the data.
Language testing, as a research field with substantial commercial and industry presence, is faced with unique challenges relating to open access, relative to other fields in applied linguistics (Burton, 2023; Isbell & Kim, 2023). On the one hand, commercial interests, legal considerations, and test security concerns influence and sometimes prohibit data and materials sharing; on the other hand, public transparency and accountability are particularly needed in language testing, especially when it comes to high-stakes tests and related validation research. In high-stakes contexts such as university admissions and immigration, for instance, the fairness and justice of language assessments should be subjected to public scrutiny (Kunnan, 2014, 2018), which is not possible if the research is not transparent or accessible. Recognising this need, Language Testing has worked actively to promote initiatives such as Registered Reports (Harding & Winke, 2022) to enhance transparency and reduce research waste (Isaacs & Chalmers, 2023), and to normalise conflict-of-interest reporting (Isaacs & Winke, 2024).
While language testing researchers have long been concerned with the reliability and validity of tests and their uses (e.g., Chapelle, 2020; Douglas, 2014), it is equally important to critically examine the reliability and validity of language testing research. In this sense, open access is of particular relevance to the field of language testing as it plays a key role in enabling wider scrutiny and independent verification of language testing research. Exploring the status of open access in language testing research is important because it can provide insights into which factors might influence the uptake of open access and where future support might be needed.

Literature review

Open access

Open access (OA) aims to make research freely available online, eliminating most access and copyright restrictions, while still respecting the need to attribute the work to its original author (De Silva & Vance, 2017; Suber, 2012). Over the years, multiple OA publishing models have been developed, ranging from the for-profit gold OA to the free diamond/platinum OA. Specific definitions of each publishing model can be found in Table 1.
Table 1. OA models.
| OA model | Definition |
| --- | --- |
| Green OA | Authors self-archive their work (accepted or earlier versions) in personal or institutional repositories, making it accessible without cost. |
| Bronze OA | Articles are free to read on the publisher’s website without a clear licence for reuse, typically without any publication fees. |
| Hybrid OA | Subscription-based journals allow individual articles to be openly accessible if the author pays a publication fee. |
| Gold OA | Publishers make articles freely available on their website immediately upon publication, often funded by article processing charges (APCs) paid by authors or sponsors (also referred to as the “pay-to-publish” or “author-pays” model). |
| Diamond/Platinum OA | Journals provide immediate access to articles without charging authors or readers. |
The OA movement originally emerged to counteract the high costs of subscription fees by major publishers and has expanded significantly over the past few decades (Pinfield et al., 2020), as evidenced by the surge of OA journals and institutional and discipline-specific repositories (e.g., arXiv). This expansion has been propelled by policies from research funders worldwide. Examples include mandates from organisations like the National Institutes of Health in the United States, the Horizon 2020 programme (European Commission, 2013), initiatives such as Plan S (coalition S, n.d.), and the US government’s recent policy move to require federal agencies to make taxpayer-funded research freely available by 2025 (Brainard & Kaiser, 2022). According to meta-research on OA policies (Huang et al., 2020), UK policy and funding changes in 2012 led to an increase in gold or hybrid OA, with a similar boost in green OA around 2015 tied to eligibility requirements for the 2021 Research Excellence Framework. All these examples point to the growth of incentives and influences on OA adoption.
While OA has traditionally focused primarily on the unrestricted availability of research manuscripts, in recent years it has been more widely accepted as a subcomponent of the broader open science movement that extends beyond the end-product of the research process to include open materials, open data, and open code. With these additional dimensions, OA contributes to a more open and transparent science where knowledge is not only shared but also verifiable, reproducible, and cumulative.

OA in the field of language testing

In our examination of OA in language testing, we also adopt this broadened view of OA to encompass four aspects: open manuscripts, open materials, open code, and open data (see Table 2 for definitions). Each of these aspects is closely connected with open science values and principles such as equity, transparency, and reproducibility (UNESCO, 2021). Acknowledging that these issues are all interconnected, in the following sections, we review each aspect in the context of language testing, in relation to values, benefits, and challenges that are most pronounced in that aspect.
Table 2. OA aspects examined in this study.
| OA aspect | Definition |
| --- | --- |
| Open Manuscripts | Whether an article is OA (regardless of the specific type of OA) |
| Open Materials | Whether an article contains supplementary materials (e.g., test items, survey scales, and interview protocol) |
| Open Code | Whether an article has analytical code (e.g., R scripts) being openly available |
| Open Data | Whether an article has associated data being openly available (e.g., uploaded to open repositories such as IRIS and Open Science Framework) |

Open manuscripts: Equity and accessibility

In the context of language testing, OA has the potential to play a crucial role in bridging the divide between the Global North and Global South. A core principle of OA is equal access to knowledge, as publicly stated at the outset of the OA movement (e.g., the Budapest Open Access Initiative). In addition, OA can also contribute to the enhancement of language assessment literacy, which allows diverse stakeholders to engage with research findings and supports tailored training and awareness raising (Harding & Kremmel, 2016; Kremmel & Harding, 2020).
Despite these potential benefits, OA faces challenges that could inadvertently widen the gap it seeks to close. The gold OA model, which often requires authors to pay APCs, has led to concerns about the emergence of a “pay-to-say” barrier, restricting the ability of less affluent researchers to publish their work (Šimukovič, 2018, p. 1). This has prompted criticism that OA may perpetuate a form of neo-colonialism if it primarily improves access to Northern science while marginalising research from the Global South (Piron, 2018). The reliance on deals between well-funded research institutions and for-profit publishers, or the individual researcher’s own funding, places researchers in under-resourced institutions at a disadvantage. Flagship journals in language testing, such as Language Testing and Language Assessment Quarterly, operate under the hybrid model. Another specialist journal, Language Testing in Asia, adopts the gold OA model (though with discretionary waivers up to 100% of APCs for authors based in low-income countries).
While flagship journals in the field still operate under commercialised OA models, there is a growing recognition of the need to support more inclusive and equitable access to research—through the adoption of the diamond/platinum OA model (Andringa et al., 2024). Prime examples of diamond/platinum OA journals include Studies in Language Assessment (formerly Papers in Language Testing and Assessment, which has been diamond OA since 2006), Language Education & Assessment (since 2018), and International Journal of Language Testing (since 2011). These journals represent an important step towards mitigating the risks associated with OA for marginalised communities and scholars from the Global South (Chan et al., 2020). In addition to these initiatives at the level of individual journals, it is important to note that authors can also contribute to equitable access of their own research by adopting the green OA model—sharing the accepted versions of author manuscripts on open repositories such as IRIS, Open Science Framework (OSF) and their own websites. Indeed, calls for authors to share their published manuscripts have already been reiterated in applied linguistics to promote more equitable scholarship (Al-Hoorie & Hiver, 2023).

Open materials and code: Transparency and reproducibility

In the context of language testing, the imperative for methodological transparency is especially critical due to the complexity of the constructs being measured and the implications for educational policy and individual outcomes (e.g., Burton, 2023). Depending on the specific research design, research materials can take various forms, ranging from questionnaires and test questions in quantitative research to interview protocols in qualitative research. Making these materials openly available enables the methods and outcomes of language testing research to be independently verified (e.g., through replication studies) or scrutinised by researchers or the public.
The reliability and validity of research findings are increasingly recognised as crucial in the broader field of applied linguistics (e.g., Liu & Marsden, 2024) and subfields such as TESOL (Al-Hoorie et al., 2024). Notably, a recent special issue in Language Learning showcasing replications and registered replication reports (Godfroid & Andringa, 2023) found none of the replication studies provided full support for the original findings. This highlights the urgent need for a deeper understanding of the replicability of existing research in the field. Methodological transparency via open materials and code becomes even more important as such openness is crucial not only for attempting replication but also for understanding the reasons behind divergent results (cf. Hiver & Nagle, 2024).
Journal policies to enhance research transparency are increasingly evident, with Journal of Child Language requiring full information about methods and study materials since 2018, and Language Learning and Applied Psycholinguistics adopting similar requirements in 2020 and 2022, respectively. In the field of language testing, Language Testing has taken the lead in supporting replication studies and incentivising open science practices through Open Science Badges (Harding & Winke, 2020).
In contrast to the more established tradition of sharing research materials, the emphasis on open code, which is essential for the verification of data analyses, represents a more recent innovation in open science practices in the field that is not yet widely recognised with a dedicated badge. Open code is underpinned by the idea of making the research (and analytical) process transparent and shareable, which currently often takes the form of code scripts (e.g., R or Python scripts). While still uncommon, open code extends beyond quantitative scripts to encompass transparent and shareable methodologies in qualitative research as well. It includes any documented processes such as codebooks, thematic frameworks, or analytic memos that elucidate how data are interpreted and analysed, a point to which we return in section “Discussion.” A prime example of an open data and code policy is the one issued by the Journal of Memory and Language in 2019, which requires code and data to be released upon publication. Empirically examining the effects of this policy, Laurinavichyute and colleagues (2022) conducted meta-research and found positive impacts on research transparency and reproducibility. Specifically, the presence of the analysis code was the strongest predictor of successful reproduction of published results using the same data, increasing the probability by almost 40%.
Despite the potential that open materials and code hold for the field of language testing, as illustrated above, the status quo in language testing research still presents uncertainties. For instance, in Language Testing, the exact number of authors who have shared materials or data remains unknown, and only a handful of papers have received recognition for doing so since the introduction of Open Science Badges (which currently do not include open code, as mentioned above) in 2020 (Burton, 2023).

Open data: Ethical, practical, and methodological challenges

Among all aspects of OA, open data is one of the most controversial. Various forms of data might be collected, generated, or used in the research process, ranging from responses to test questions to interview transcripts. In language testing, perhaps more so than in other subfields of applied linguistics due to its salient industry presence, there is a tension between the free flow of scientific knowledge and ethical and commercial considerations. The FAIR guidelines for open data encapsulate the ideal of data being Findable, Accessible, Interoperable, and Reusable, promoting a long-term vision of cumulative knowledge building (Wilkinson et al., 2016). More and more funders (e.g., National Institutes of Health in the United States) are now mandating not only the publication of OA articles but also the corresponding data sets (Kozlov, 2022). Similarly, journal policies have played a positive role in promoting open data, and more journals in the field of applied linguistics, as reviewed above, now implement policies that promote open science practices (section “Open materials and code: Transparency and reproducibility”). Some companies, such as Educational Testing Service, also offer restricted access to their test data, whereby interested researchers submit a formal request to be reviewed by the respective company.
Despite increasingly clear directives for open data, the move towards more data sharing remains slow. The ethical dimensions of open data are multifaceted, encompassing the need for transparency, participant privacy, and the integrity of the research process. Concerns over personal ownership, fear of intellectual property theft, lack of relevant knowledge, skills, and time, and commercial sensitivity hinder the sharing of research materials including data (Liu & De Cat, 2024; Marsden & Morgan-Short, 2023). Commercialisation of research via university–industry collaboration, which is not uncommon in language testing research, may prohibit the sharing of proprietary materials and (operational) test data (see Isbell & Kim, 2023, for a recent review of developer involvement in language testing). For instance, a research team at a university may receive funding from a test developer (e.g., Cambridge Assessment English, Duolingo, Educational Testing Service, and Pearson) to conduct research on a large-scale standardised exam, and the proprietary materials and data would typically be kept confidential. Test security, a concern specific to language assessment, may also take priority over full transparency in many contexts (Hughes & Porter, 1984). If a testing company’s intellectual property is compromised or leaked, the value of the test itself is diminished, which risks harming test takers or test score users. In addition, testers also have ethical responsibilities to keep test content secure (e.g., code of ethics by the International Language Testing Association).
Methodological concerns further complicate the issue. While the discourse on open qualitative data in language testing research (or applied linguistics) is still nascent, many concerns raised within other fields about the sharing of open data resonate with potential issues that the field may encounter. Interviews exploring test-takers’ cognitive processes or perceptions yield rich narratives that are inseparable from the specific research context, a nuance likely to be diminished once decontextualised for open sharing (Creswell & Poth, 2018). Furthermore, the microscopic detail that makes qualitative data so informative also poses re-identification risks, even when anonymised—such risks are amplified in smaller research communities where participants may be more easily recognisable (Elman et al., 2010). From a methodological standpoint, the potential behaviour modification of participants, knowing their data will be openly shared, may skew insights into their cognitive processes and pose a threat to the authenticity of findings in language testing research.
Even in cases when these barriers have been overcome, the usability of data remains less than satisfactory—open data sets are often accompanied by insufficient documentation which makes reanalysis and reuse very challenging, thereby prompting the need to have research materials and analysis code alongside the data (Laurinavichyute et al., 2022). In addition, the sustainability of repositories for open materials is a concern, often dependent on inconsistent funding or individual efforts (Liu & Marsden, 2024; Marsden & Morgan-Short, 2023).

The present study

Our literature review shows the potential of OA for enhancing the transparency of language testing research. However, it also reveals controversies and challenges in each aspect of OA. Before meaningful policies and initiatives can be implemented to promote OA, it is crucial to first develop a systematic understanding of the OA practices and trends in the field.
As an initial step, it is informative to examine the flagship journals in language testing. Albeit a small subset of the field, flagship journals such as Language Testing (LT) and Language Assessment Quarterly (LAQ) play a crucial role as standard and trend setters. Their policies and practices can have significant influence in the field. Currently, both LT and LAQ adopt the hybrid OA model, with APCs being US$3,700 for LT and US$3,300 for LAQ. LT offers Open Science Badges, accepts Registered Reports, and promotes broader engagement through initiatives such as multilingual abstracts (e.g., Harding & Winke, 2022; Isaacs & Winke, 2024; Winke, 2023). LAQ requires a data availability statement in line with Taylor and Francis’ policies and also supports multilingual abstracts. For more details on the journal policies, readers can refer to each journal’s website and our supplementary materials in OSF.
Specifically, we focused on four aspects of OA—open manuscripts, open materials, open data, and open code. We also explored the associations between OA and various factors that could potentially influence its implementation, including research characteristics (article type, year of publication, number of authors, funding), test characteristics (test type, target language), and researcher characteristics (author affiliation, socioeconomic division). This targeted approach aimed to provide a detailed understanding of the extent to which OA practices have been embraced in leading journals and how various factors predict their adoption.
Our research questions were as follows:
Research Question 1 (RQ1). To what extent is the research in the two flagship journals openly accessible (in terms of open manuscripts, open materials, open data, and open code) and has the pattern changed over time?
Research Question 2 (RQ2). To what extent are research characteristics (article type, year of publication, number of authors, funding), test characteristics (test type, target language), and researcher characteristics (author affiliation, socioeconomic division) associated with OA (i.e., open manuscripts, open materials, open data, and open code) in the two flagship journals?

Method

All of our data processing and statistical analysis was performed with R 4.3.2 (R Core Team, 2023). To maximise methodological transparency and reproducibility, we uploaded our data set, code book and R code to IRIS and OSF (Liu et al., 2024), with details concerning all sections described below.

Data acquisition

We chose Web of Science (WoS) as our data source for its comprehensive indexing of metadata to facilitate subsequent analysis (Figure 1). Both LT and LAQ were indexed from 2008. Data retrieval was conducted on 16 October 2022, and therefore articles published after this date in 2022 were not included in our data set. We excluded two types of articles, namely, “biographical item” and “correction,” which resulted in a final data set of 898 records (LT = 491, LAQ = 407). We additionally created a secondary data set composed of articles from Language Testing in Asia (LTA; k = 112), a gold OA journal, as we were interested in comparing different publishing models. As this journal was only indexed in WoS from 2019, all analyses involving this data set were treated as secondary. Aside from a substantial difference in open manuscripts (100% of all publications in LTA compared to 24.3% in the two hybrid OA journals in our secondary data set), the results of the secondary analysis revealed no significant differences in the prevalence of the other OA practices between the two flagship journals and LTA within the examined time period (2019–2022). The specific results can be found in the supplementary materials.
Figure 1. PRISMA diagram adapted from Page et al. (2021).
*Language Testing in Asia is a gold Open Access journal that was only indexed in the Web of Science from 2019. All analyses with this data set are therefore secondary analyses and are reported for interested readers in our supplementary materials.
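The exclusion step described above can be sketched as follows. This is a minimal illustration in Python rather than the authors' R code (which is available in their IRIS and OSF materials); the field names `journal` and `doc_type` and the toy records are hypothetical and do not reflect the actual WoS export schema.

```python
# Illustrative sketch only: field names and toy records are hypothetical.
EXCLUDED_TYPES = {"biographical item", "correction"}


def filter_records(records):
    """Drop records whose document type is one of the excluded types."""
    return [
        r for r in records
        if r["doc_type"].strip().lower() not in EXCLUDED_TYPES
    ]


toy_records = [
    {"journal": "LT", "doc_type": "Article"},
    {"journal": "LT", "doc_type": "Correction"},
    {"journal": "LAQ", "doc_type": "Review"},
    {"journal": "LAQ", "doc_type": "Biographical Item"},
]

kept = filter_records(toy_records)
for record in kept:
    print(record["journal"], record["doc_type"])
```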

Data coding and reliability checks

For each included article, we extracted and/or manually coded the metadata listed in Table 3, all of which we believe might potentially influence OA. While most of our coded factors were what one might consider “the usual suspects,” we also included the number of authors based on the possibility that larger author teams may navigate OA differently from single authors. We also included target language to explore how the language focus of high-stakes language tests, predominantly English, might exhibit distinct OA patterns in comparison to tests in other languages. Most of our codes were straightforward operationalisations of the corresponding variables. For variables with potentially ambiguous coding rules (e.g., commercial vs. non-commercial tests), we consulted experts with established standing in the field and editorial roles at major language testing journals to follow the conventions of the field.
Table 3. Coding scheme.
| Category | Variable | Codes | Notes |
| --- | --- | --- | --- |
| OA | Open manuscripts | Y, N | All OA categories (e.g., green, bronze, gold) as recorded in WoS metadata were aggregated as Y. |
| OA | Open code | Y, N | Full text was examined to code this. If links are provided to secondary materials on the publisher’s website or other repositories (e.g., IRIS, OSF), the content was manually inspected too. |
| OA | Open data | Y, N | Same as above. |
| OA | Open materials | Y, N | Same as above. |
| Research characteristics | Article type | empirical, non-empirical | The codes differentiate purely conceptual, theoretical, or methodological articles (coded as “non-empirical”) from applied, substantive, and other studies, e.g., those that investigate RQs with primary or secondary data in any form collected and reported (coded as “empirical”). Meta-analysis, narrative review and systematic review were coded as “empirical” too. |
| Research characteristics | Year of publication | integer ranging from 2008 to 2023 | Early access articles as recorded by WoS metadata were also coded as 2022. |
| Research characteristics | Number of authors | integer | Author names were examined to extract the count of authors for each article. |
| Research characteristics | Funding | academic, industry, mixed, unfunded | Funding agencies were extracted and manually coded. University and government funding was coded as “academic”, and all other types of funding were coded as “industry”. “Mixed” indicates the presence of both academic and industry funding. |
| Test characteristics | Test type | commercial, non-commercial, mixed, NA | We used “fee paying” as the rule for classifying whether a test is commercial or not (regardless of who administered the test). “Mixed” indicates the presence of both commercial and non-commercial tests in the study. Articles without a focus on specific language tests were coded as “NA”. |
| Test characteristics | Target language | English, LOTEs, English + LOTEs, NA | For languages other than English (LOTEs), individual language names were coded and then aggregated as “LOTEs” in subsequent analysis. Articles without a focus on specific language tests were coded as “NA”. |
| Researcher characteristics | Author affiliation | academic, industry, mixed | Author affiliations were extracted and coded. University and government affiliations were coded as “academic” and all other types of affiliations were coded as “industry”. “Mixed” indicates the presence of both academic and industry affiliations. |
| Researcher characteristics | Socioeconomic division | Global North, Global South, mixed | The list of countries by regional classification by Wikimedia Foundation (2023) was used to classify the geographical regions of author affiliations as “Global North” or “Global South”. “Mixed” indicates the presence of both Global North and South affiliations. |
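The “academic / industry / mixed” aggregation rule used for funding and author affiliation in Table 3 can be sketched as below. This is an illustrative Python sketch, not the authors' procedure (funding and affiliations were coded manually and analysed in R); the function names and the source-type labels such as `"test developer"` are hypothetical.

```python
# Illustrative sketch of the Table 3 aggregation rule; labels are hypothetical.
ACADEMIC_TYPES = {"university", "government"}


def code_sector(source_types, empty_code=None):
    """Aggregate per-source types into a single article-level code.

    University and government sources count as academic; every other
    source type counts as industry; both together yield "mixed".
    """
    if not source_types:
        return empty_code
    has_academic = any(s in ACADEMIC_TYPES for s in source_types)
    has_industry = any(s not in ACADEMIC_TYPES for s in source_types)
    if has_academic and has_industry:
        return "mixed"
    return "academic" if has_academic else "industry"


def code_funding(funder_types):
    """Funding adds a fourth code, "unfunded", when no funder is listed."""
    return code_sector(funder_types, empty_code="unfunded")


print(code_funding([]))                              # unfunded
print(code_funding(["university", "government"]))    # academic
print(code_funding(["university", "test developer"]))  # mixed
print(code_sector(["test developer"]))               # industry
```

The same helper could be reused for socioeconomic division by swapping the academic set for a set of Global North countries, though that classification list is external to this sketch.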
To ensure the quality and replicability of our coding, we went through a three-stage coding process, involving four coders in total. Coder 1 is an independent coder outside the research team. Coders 2–4 are the authors of this article. In Stage 1, the coding scheme was piloted. Coder 1 coded 40 records across the report pool. Coder 2 coded 14 of the 40 records as a reliability check. Inconsistencies between the two coders were resolved through group discussions, and the code book and coding strategies were refined accordingly. In Stage 2, Coder 1 coded the rest of the report pool following the updated coding scheme. Coders 2 and 3 coded in parallel randomly sampled records as a reliability check (Coder 2 = 70; Coder 3 = 65). In Stage 3, Coder 4 recoded all inconsistent records between Coders 1 and 2, and those between Coders 1 and 3. Uncertain cases were discussed between Coders 2, 3, and 4 to reach 100% agreement on all double-coded records (labelled as Coder234 in Table 4).
Following the recommendations by Norouzian (2021), we used the S index as the inter-rater reliability (IRR) metric. The S index (Falotico & Quatto, 2015) is a more robust measure of agreement between coders that overcomes several limitations of popular measures such as Fleiss’ Kappa and percentage agreement. Specifically, we used the meta_rater package (Norouzian, 2021) to calculate the S index for each variable (which theoretically ranges between −1 and +1 for two coders).
Table 4 shows all IRR values calculated for each coding stage. For Stages 1 and 2, the IRR was calculated between the pairs of coders that coded the same record. For Stage 3, the final IRR was calculated between Coder 1 and the combined record of Coders 2, 3 and 4 since the three coders collectively discussed all double-coded records to reach consensus. Note that Coder 1’s original records and the consensus records between Coders 2, 3, and 4 were used to calculate our final IRR, which is intended as an indication of the coding quality of Coder 1 (who coded the entire data set). As shown in Table 4, all variables have excellent or adequate IRR and therefore were all retained in subsequent analyses. Note that test type (commercial, non-commercial, mixed) has a comparatively low IRR. This may be because the adopted classification rule (i.e., fee paying) in some cases requires external information outside the full text of the article for verification, and is therefore more likely to result in differing judgements by different coders. Due to this limitation, analyses using different classification rules for commercial tests may lead to different results and interpretations.
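To give a sense of how a chance-corrected agreement coefficient of this kind behaves, the sketch below computes a two-coder S coefficient in its classic Bennett-style form, S = (q·p_o − 1)/(q − 1), where p_o is the observed proportion of agreement and q is the number of categories. This form is an assumption made for illustration: the estimator implemented in meta_rater may differ in detail, and the toy codes are hypothetical, not the study's data.

```python
# Illustrative sketch: a two-coder, Bennett-style S coefficient.
# Assumption: chance agreement is fixed at 1/q (uniform over q categories).
def s_index(codes_a, codes_b, n_categories):
    """Return S = (q * p_o - 1) / (q - 1) for two coders' parallel codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same non-empty set of records.")
    if n_categories < 2:
        raise ValueError("At least two categories are required.")
    n_agree = sum(a == b for a, b in zip(codes_a, codes_b))
    p_o = n_agree / len(codes_a)  # observed proportion of agreement
    return (n_categories * p_o - 1) / (n_categories - 1)


# Hypothetical Y/N codes for ten records; the coders disagree on one record.
coder_1 = ["Y", "N", "N", "Y", "N", "N", "N", "N", "N", "N"]
coder_2 = ["Y", "N", "N", "N", "N", "N", "N", "N", "N", "N"]

s_value = s_index(coder_1, coder_2, n_categories=2)
print(round(s_value, 2))
```

With two categories, complete disagreement gives −1 and complete agreement gives +1 under this formula, which matches the theoretical range stated above for two coders.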
Table 4. IRR values for all coding stages.
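| Variables | Stage 1: Coders 1 and 2 | Stage 2: Coders 1 and 2 | Stage 2: Coders 1 and 3 | Stage 3: Coder 1 and Coder 234 |
| --- | --- | --- | --- | --- |
| Author affiliation | NA | 0.96 | 0.97 | 0.97 |
| Article type | 1.00 | 0.65 | 0.65 | 0.70 |
| Funding | NA | 1.00 | 1.00 | 1.00 |
| Socioeconomic division | NA | 1.00 | 1.00 | 1.00 |
| Target language | 0.76 | 0.63 | 0.73 | 0.67 |
| Open code | 0.86 | 0.97 | 0.97 | 0.99 |
| Open data | 1.00 | 1.00 | 1.00 | 1.00 |
| Open materials | 1.00 | 0.80 | 0.67 | 0.79 |
| Test type | 0.66 | 0.40 | 0.30 | 0.52 |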
Note. In Stage 1 (pilot coding), author affiliation, funding and socioeconomic division were not involved as these records were already automatically coded using WoS metadata.

Analytical strategy

For RQ1, we analysed the proportion of articles with open manuscripts, open materials, open data, and open code, and explored the changes in these patterns over time. Specifically, we performed descriptive statistical analyses, calculating frequencies and percentages, and visualising the trends over time in empirical articles and non-empirical articles respectively. Note that we differentiated between empirical and non-empirical articles to obtain a more nuanced understanding of OA. Empirical articles may have different requirements and challenges in making research materials/data/code available compared to non-empirical articles, such as test reviews and theoretical papers. In addition, the OA policies for these two types of articles may differ, as exemplified by Language Testing’s (recent) policy to make all Test Review articles Bronze OA.
For RQ2, we built four logistic regression models with open manuscripts, open materials, open data and open code as the dependent variable respectively. All variables of research characteristics, test characteristics and researcher characteristics were included as independent variables to examine the extent to which these factors could predict OA.

Results

RQ1 status of OA

In Table 5, we present the frequencies and proportions of OA in terms of open manuscripts, open code, open data and open materials in empirical articles (k = 707) and in non-empirical articles (k = 191). We also visualised the trends of OA over time (Figure 2).
Table 5. Descriptive statistics of OA.
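Columns 2 to 5 refer to empirical articles (k = 707); columns 6 to 9 refer to non-empirical articles (k = 191). Cells show k (%).

| Year | Empirical: Open Manuscripts | Empirical: Open Code | Empirical: Open Data | Empirical: Open Materials | Non-empirical: Open Manuscripts | Non-empirical: Open Code | Non-empirical: Open Data | Non-empirical: Open Materials |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2008 | 0 (0%) | 0 (0%) | 0 (0%) | 6 (21.4%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (5.6%) |
| 2009 | 1 (3.4%) | 0 (0%) | 0 (0%) | 14 (48.3%) | 2 (10.5%) | 0 (0%) | 0 (0%) | 2 (10.5%) |
| 2010 | 1 (2.2%) | 0 (0%) | 0 (0%) | 15 (33.3%) | 1 (6.2%) | 0 (0%) | 0 (0%) | 0 (0%) |
| 2011 | 2 (5.1%) | 1 (2.6%) | 0 (0%) | 13 (33.3%) | 3 (18.8%) | 0 (0%) | 0 (0%) | 1 (6.2%) |
| 2012 | 5 (10.6%) | 0 (0%) | 0 (0%) | 13 (27.7%) | 0 (0%) | 0 (0%) | 0 (0%) | 2 (20%) |
| 2013 | 6 (12.2%) | 0 (0%) | 1 (2%) | 16 (32.7%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| 2014 | 4 (10.8%) | 0 (0%) | 0 (0%) | 14 (37.8%) | 3 (13.6%) | 1 (4.5%) | 0 (0%) | 1 (4.5%) |
| 2015 | 3 (6.7%) | 0 (0%) | 0 (0%) | 22 (48.9%) | 2 (25%) | 1 (12.5%) | 0 (0%) | 0 (0%) |
| 2016 | 8 (16.7%) | 0 (0%) | 0 (0%) | 15 (31.2%) | 2 (20%) | 0 (0%) | 0 (0%) | 0 (0%) |
| 2017 | 5 (9.8%) | 0 (0%) | 0 (0%) | 13 (25.5%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| 2018 | 12 (24%) | 0 (0%) | 0 (0%) | 15 (30%) | 4 (40%) | 0 (0%) | 0 (0%) | 0 (0%) |
| 2019 | 11 (19%) | 0 (0%) | 0 (0%) | 24 (41.4%) | 2 (33.3%) | 0 (0%) | 0 (0%) | 0 (0%) |
| 2020 | 18 (29.5%) | 1 (1.6%) | 1 (1.6%) | 22 (36.1%) | 2 (22.2%) | 0 (0%) | 0 (0%) | 0 (0%) |
| 2021 | 11 (20.8%) | 1 (1.9%) | 2 (3.8%) | 18 (34%) | 11 (55%) | 0 (0%) | 0 (0%) | 1 (5%) |
| 2022 | 11 (16.4%) | 5 (7.5%) | 5 (7.5%) | 26 (38.8%) | 4 (30.8%) | 0 (0%) | 0 (0%) | 2 (15.4%) |
| Total | 98 (13.9%) | 8 (1.1%) | 9 (1.3%) | 246 (34.8%) | 36 (18.8%) | 2 (1.1%) | 0 (0%) | 10 (5.2%) |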
Figure 2. OA trends over time.
The data from the two flagship journals reveal a rise in empirical articles offering open manuscripts, climbing from 0% in 2008 to 16.4% in 2022, with an average of 13.9% across the examined timeframe. This progression towards OA publishing is evident not only in empirical research but also in non-empirical work, which shows a slightly steeper upward trajectory. The trend in open materials for empirical articles is particularly noteworthy, with a substantial increase from 21.4% in 2008 to 38.8% in 2022, averaging 34.8%.
Despite the positive trend in open manuscripts and materials, the provision of open code and data in empirical articles from these flagship journals has remained relatively scarce, with only 1.1% and 1.3% of articles sharing these resources, respectively. Nonetheless, there has been a promising rise in recent years, with both open code and data reaching 7.5% in 2022.
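The overall percentages cited for empirical articles can be recomputed from the total counts in Table 5 and the denominator k = 707. The short Python sketch below does this; the counts are copied from the Total row of Table 5, while the variable and function names are illustrative only.

```python
# Total number of empirical articles, as reported in Table 5
EMPIRICAL_K = 707

# Total counts for empirical articles, copied from the Total row of Table 5
empirical_totals = {
    "open manuscripts": 98,
    "open code": 8,
    "open data": 9,
    "open materials": 246,
}


def percentage(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


for practice, count in empirical_totals.items():
    print(f"{practice}: {count}/{EMPIRICAL_K} = {percentage(count, EMPIRICAL_K)}%")
```

The recomputed values (13.9%, 1.1%, 1.3%, and 34.8%) match those reported for empirical articles.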

RQ2 factors associated with OA

Examining the factors (i.e., research, test, and researcher characteristics) associated with OA in the context of two flagship journals, we built logistic regression models, the results of which can be found in Table 6.
Table 6. Logistic regression models.
| Predictor | Open manuscripts | Open code | Open data | Open materials |
| --- | --- | --- | --- | --- |
| (Intercept) | -316.74 *** (59.09) | -509.72 * (211.62) | -976.30 * (409.50) | -8.19 (42.52) |
| Research characteristics (Reference categories: Article type: empirical; Funding: unfunded) | | | | |
| Article type: non-empirical | 1.00 *** (0.28) | 0.54 (0.85) | -17.44 (3187.22) | -1.91 *** (0.34) |
| Year of publication | 0.16 *** (0.03) | 0.25 * (0.10) | 0.48 * (0.20) | 0.00 (0.02) |
| Number of authors | 0.32 *** (0.08) | 0.22 (0.23) | -0.37 (0.46) | 0.04 (0.07) |
| Funding: academic | 0.39 (0.29) | -1.48 (1.13) | -0.29 (0.93) | 0.54 * (0.24) |
| Funding: industry | 0.38 (0.40) | 0.31 (1.21) | -18.06 (6187.40) | -0.06 (0.34) |
| Funding: mixed | 0.29 (0.85) | -16.35 (4869.47) | -18.17 (12,201.91) | 0.14 (0.73) |
| Test characteristics (Reference categories: Test type: non-commercial; Target language: English) | | | | |
| Test type: commercial | 0.38 (0.25) | -0.60 (0.81) | 0.16 (0.92) | -0.67 *** (0.19) |
| Test type: mixed | -0.11 (0.33) | -16.55 (1456.97) | -17.72 (3559.33) | -0.99 *** (0.27) |
| Target language: English + LOTEs | 0.14 (0.38) | 1.04 (0.88) | 1.55 (0.98) | -0.18 (0.31) |
| Target language: LOTEs | 0.97 ** (0.30) | 0.03 (1.12) | -17.63 (4443.17) | -0.40 (0.27) |
| Researcher characteristics (Reference categories: Author affiliation: academic; Socioeconomic division: Global North) | | | | |
| Author affiliation: industry | -0.99 * (0.50) | 0.60 (1.15) | -17.26 (4932.67) | 0.58 (0.30) |
| Author affiliation: mixed | -15.31 (534.54) | -15.87 (3741.90) | -17.30 (8919.59) | 0.15 (0.53) |
| Socioeconomic division: Global South | -1.01 * (0.44) | 0.15 (1.15) | -0.37 (1.20) | -0.52 (0.27) |
| Socioeconomic division: mixed | -0.44 (0.43) | 1.09 (0.89) | 0.91 (0.98) | 0.12 (0.34) |
| AIC | 609.38 | 119.07 | 85.06 | 947.64 |
| BIC | 679.89 | 189.58 | 155.57 | 1018.15 |
| Log Likelihood | -289.69 | -44.53 | -27.53 | -458.82 |
| Deviance | 579.38 | 89.07 | 55.06 | 917.64 |
| N | 813 | 813 | 813 | 813 |
Note. NA cases were removed from the analyses.
LOTEs = languages other than English.
Standard error is displayed in brackets. Due to data imbalance for certain variables, there were instances of quasi-complete separation resulting in large standard errors; however, we retained these variables in the model to avoid potential bias in the estimates of other variables that may arise from omitting relevant covariates.
For categorical predictors, we used the normative category (i.e., category with the largest sample size) as the reference category (see the first column in this table for the specific reference categories).
*p < .05; **p < .01; ***p < .001.
For open manuscripts, article type, year of publication, number of authors, target language, author affiliation and socioeconomic division were significant predictors. Specifically, holding all else constant, articles that were non-empirical, more recently published, written by larger author teams, or focused on tests of languages other than English were more likely to have open manuscripts than their counterparts (i.e., reference categories in Table 6), while articles by industry authors and by authors from the Global South were less likely to have open manuscripts than their counterparts.
For open materials, two significant predictors appeared beyond article type (non-empirical articles were less likely to have open materials; see Table 6): funding and test type. Holding all else constant, articles that had academic funding were more likely to have open materials than unfunded research, while articles that focused on commercial tests, or a mixture of commercial and non-commercial tests, were less likely to have open materials than their counterparts.
For open code and open data, only year of publication had a positive effect, which conforms with the visualisations in Figure 2. The null results (and unstable estimates) observed in these models were likely due to the low presence of cases that have open code and data in the data set.
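Because the coefficients in Table 6 are on the log-odds scale, it can help to convert them to odds ratios when interpreting effect sizes. The Python sketch below does so for a few coefficients copied from Table 6, and also computes an approximate two-sided Wald p-value from each coefficient and its standard error. It assumes a standard normal reference distribution and works from the rounded values in the table, so it only approximates what the original R models would report; the function and variable names are illustrative.

```python
import math


def odds_ratio(log_odds):
    """Convert a logistic regression coefficient (log-odds) to an odds ratio."""
    return math.exp(log_odds)


def wald_p_value(coefficient, standard_error):
    """Approximate two-sided p-value for the Wald z statistic (normal reference)."""
    z = coefficient / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


# (coefficient, standard error) pairs copied from the open manuscripts model in Table 6
open_manuscript_terms = {
    "Article type: non-empirical": (1.00, 0.28),
    "Year of publication": (0.16, 0.03),
    "Socioeconomic division: Global South": (-1.01, 0.44),
}

# A term affected by quasi-complete separation (open data model, article type)
separated_term = (-17.44, 3187.22)

for label, (coef, se) in open_manuscript_terms.items():
    print(f"{label}: OR = {odds_ratio(coef):.2f}, p = {wald_p_value(coef, se):.4f}")

print(f"Separated term: p = {wald_p_value(*separated_term):.4f}")
```

Under these assumptions, the Global South coefficient of -1.01 corresponds to an odds ratio of about 0.36, that is, roughly 64% lower odds of an open manuscript than for Global North authors, holding the other predictors constant. For the separated term, the very large standard error yields a p-value close to 1, which illustrates why such estimates are described as unstable.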

Discussion

OA in flagship journals: An uneven terrain

Focusing on two flagship journals in the field of language testing, we examined the status and development of OA in terms of open manuscripts, open materials, open data, and open code. Our findings suggest that even within the two leading journals, the terrain of OA is uneven, where progress in some areas is not matched in others.
Our study found that the prevalence of open manuscripts (Empirical: 13.9%; Non-empirical: 18.8%) and open materials (Empirical: 34.8%; Non-empirical: 5.2%) was much higher than that of open data (Empirical: 1.3%; Non-empirical: 0%) and open code (Empirical: 1.1%; Non-empirical: 1.1%).
The distinction between empirical and non-empirical articles revealed some nuanced differences in OA. Non-empirical articles had a higher proportion of open manuscripts compared to empirical articles. This may be attributed to Test Review articles’ being made free to read by Language Testing and SAGE. This move towards OA for the Test Review article type sets a precedent for other journals to consider similar approaches to enhance the accessibility of certain types of articles (particularly those that provide useful information to practitioners and non-specialist stakeholders) as a transitional step towards more equitable OA models.
For readers curious about how these patterns compare to other fields, it must be noted that such a comparison is not straightforward, and caution must be exercised given the variations in study design, time periods, and open science priorities across different fields. Examining a random sample of 250 psychology articles between 2014 and 2017, Hardwicke et al. (2022) found that 65% of the articles were publicly available, 14% had open materials, 2% had open data, and 1% had open code. In another study on a random sample of 149 biomedical articles between 2015 and 2017, Wallach et al. (2018) found that 30% included supplementary materials (though none allowed for a reconstruction of the full protocol), and 18% discussed publicly available data on some level. In a large-scale analysis of OA publishing in scholarly literature in general (e.g., articles with a Crossref DOI), Piwowar et al. (2018) estimated that at least 28% of the literature is OA, with the most recent year (2015) having the highest percentage of OA (45%).
Comparing the above results, the higher prevalence of open materials in our data set might be considered encouraging. However, as pointed out by Wallach et al. (2018), the presence of materials does not guarantee the reconstruction of the methodology/data collection process. Setting aside the differences in sampling and time, the percentage of open manuscripts is lower than found in psychology and general scholarly literature, which suggests room for improvement. The low presence of open data and code indicates considerable potential to advance the transparency and reproducibility standards in the journals we examined. The discrepant patterns in various aspects of OA suggest that OA has not been fully embraced, especially aspects that require a deeper cultural change or more robust infrastructural support.
In terms of developmental patterns, our descriptive analysis revealed an increase in open manuscripts and materials, particularly in empirical research. This might be taken as an indication of a cultural shift in flagship journals towards embracing OA principles. Similarly, the uptick in open data and code, albeit from a low base, might also be considered an emerging recognition of the importance of making research more verifiable and reproducible. Our inferential analysis corroborated these trends, with the year of publication being a significant predictor of all OA dimensions except open materials. This aligns with patterns in the broader scientific community where we see more journals integrating OA models and new journals listed in the Directory of Open Access Journals, as well as an increase in green OA articles (Chiarelli et al., 2019).

Open manuscripts: Inequitable participation

Our analysis also shows that the sharing of manuscripts is a complex matter linked to a variety of factors. Notably, we identified a concerning pattern: Authors from the Global South were less likely to have open manuscripts, a discrepancy perhaps rooted in the unique challenges these groups face in OA publishing. As reviewed earlier, many leading journals in the field adopt the hybrid OA model with costly APCs. This financial impediment, often referred to as the “pay-to-say” barrier (Piron, 2018; Šimukovič, 2018), places scholars from under-resourced backgrounds or institutions at a particular disadvantage (Chiware & Skelly, 2023). Granted, there have been some initiatives aiming to alleviate such inequities. For instance, some publishers have APC waiver policies (e.g., Springer Nature, the publisher of Language Testing in Asia) for authors based in low-income countries. However, the coverage and awareness of these policies are often limited. Many authors fall outside of waiver eligibility and still face the burden of unaffordable APCs.
Contrary to the dominant role of academic funding in OA publishing suggested by previous meta-research (Huang et al., 2020), our findings indicate a more nuanced reality in these journals. We found that the presence of academic funding did not predict the sharing of open manuscripts while author affiliation did (with industry authors less likely to publish OA than academic authors). One plausible explanation is the presence of transformative agreements between publishers and some institutions (e.g., libraries and library consortia) that take the financial burden off individual researchers. In contrast, test developers do not participate in these agreements and often resort to other publishing formats such as white papers, which might explain why author affiliation was more predictive of open manuscripts in our study.
In the meantime, publisher policies, such as those from SAGE (the publisher of Language Testing), offer lenient green OA options with few restrictions (e.g., no embargo period), indicating that financial considerations alone do not fully explain the adoption of open manuscripts. Personal choice may also play a crucial role in this case, as evidenced by a survey study on applied linguists’ perceptions of open science (Liu & De Cat, 2024). The study found that practical concerns, such as the time and effort required, and the lack of perceived necessity or value for sharing preprints or postprints were barriers to open manuscripts, which could further account for the low prevalence of open manuscripts in our data.

Open materials: Academic funding versus commercial interest

In terms of open materials, we observed different predictive effects of academic funding and commercial interest. Research with academic funding showed a higher likelihood of providing open materials compared to studies without such funding. This could be taken as a reflection of the increasingly commonplace funder requirement for open science practices (Pinfield et al., 2020; Suber, 2012). Unlike OA publishing, the sharing of materials usually does not constitute (direct) financial costs, which might explain why academic funding was predictive of open materials, but not open manuscripts. Conversely, articles focusing on commercial tests, or a mixture of commercial and non-commercial tests were found to be less likely to have open materials. This could be a reflection of the unique challenges to open science faced by the field of language testing (Burton, 2023; Isbell & Kim, 2023). This tension between the free flow of scientific knowledge and the commercialisation of scientific discoveries (De Silva & Vance, 2017), particularly in relation to test security (section “Open data: ethical, practical, and methodological challenges”), is a challenge that the field needs to grapple with.

Open data and code: Reflections on open questions

Our inferential analyses of open data and code suggest a positive trend over time in the adoption of open data and code as indicated by publication year. Beyond this temporal association, however, our capacity to identify significant predictors was limited due to the sparse occurrence of such practices in our sample. Extant literature offers potential explanations for the lack of open data, ranging from concerns about test security and commercial interests to methodological challenges and epistemological differences (see section “Open data: ethical, practical, and methodological challenges”).
The situation with open code is distinct from open data, primarily because it is not entangled with the same ethical considerations. Yet, the disparity between the presence of open materials and the rarity of open code implies a differential valuation within the academic ecosystem. The absence of an “Open Code” badge implicitly signals that code openness is of lesser significance. This could certainly deter researchers from sharing their code as creating sharable and reproducible code scripts often involves additional and substantial efforts, despite the recent evidence that non-shared code is a critical barrier to reproducibility (Laurinavichyute et al., 2022). Incentivising code sharing, analogous to how open materials and data are encouraged, may remedy this disparity.

Future recommendations

Shifting towards more equitable OA models

Focusing on the case of two journals, our findings suggest the necessity for a multifaceted approach to OA that considers both financial and non-financial barriers, aiming for a more accessible scholarly communication system. We call for a shift towards non-profit OA models, particularly diamond/platinum OA, to alleviate inequities in the top tier of the publishing landscape (cf. Andringa et al., 2024 for a recent call in applied linguistics). High-calibre diamond/platinum OA journals in applied linguistics (e.g., Language Learning & Technology, Studies in Second Language Learning and Teaching), linguistics (e.g., Glossa), and psychology (e.g., Collabra: Psychology) demonstrate the viability of such models. Scholarly associations in the field should take proactive roles in facilitating such transitions by providing logistical and financial support. We also recommend that high-impact authors and established scholars with tenure leverage their influence to endorse and promote OA platforms that cater to the global research community’s varied needs. Ultimately, field-wide change can only take place through coordinated efforts by various stakeholders.

Balancing proprietary concerns and research validation

Regarding the sharing of materials and data, intellectual property rights are crucial for developers, yet they must be balanced with the need for open verification of research findings. Even when unfettered public access is not feasible, sharing these resources with peer reviewers can support the verification process (Isbell & Kim, 2023). Novel editorial strategies, such as Psychological Science’s reproducibility checks by a specialised editorial team (Hardwicke & Vazire, 2023), could offer some inspiration for improving research transparency and validity in language testing.

Expanding the scope of the open science badges

We propose an expansion of the Open Science Badges framework to better represent the breadth of transparency efforts in research. First, we propose a wider adoption of the “Open Code” badge, besides the “Open Data” badge, to formally recognise the substantial effort required to share code that enables replication and verification of results. Furthermore, it is critical to acknowledge that the term “code” carries different connotations across research paradigms. In qualitative research, for example, it might pertain to analytical frameworks, thematic categorisations, or interpretive methods rather than computational scripts. We advocate for inclusive criteria that reward the sharing of diverse analysis documentation, to enhance parity in recognition across different types of research.
Moreover, we also see the relevance of the “Open Data: Protected Access” badge to the field of language testing. Such a badge would credit the transparency efforts of those handling sensitive or confidential data that cannot be fully open but is made accessible under stringent conditions, such as ethical clearance or in controlled environments.

Our proposal: Adopting a research transparency statement

Our findings point to the importance of a more comprehensive system for indexing and explicitly documenting open science practices while offering incentives that accommodate diverse methodologies and epistemological approaches (Liu, 2023). Current initiatives, such as the Open Science Badges in Language Testing and the data availability statement (a standardised statement regarding the availability of the research data) in Language Assessment Quarterly, are primarily focused on a small subset of open science practices and research methods.
To establish a more inclusive and adaptive framework for open science practices, we propose a new reporting structure named research transparency statement (RTS)—a detailed declaration in research publication where authors explicitly outline and contextualise their open science practices to ensure clarity and transparency of the entire research lifecycle. The adoption of the RTS could promote open science awareness and offer the flexibility to align with diverse research methods and philosophical foundations. Its flexibility as a narrative statement also allows for adaptive integrations of emerging open science practices without the need for frequent overhauls of evaluation or indexing systems. Furthermore, the RTS could enhance the meta-analysis of research transparency and inform the development of nuanced, evidence-based open science policies.
We acknowledge that the adoption of the RTS would be a gradual process within scholarly publication norms. As a first step, we recommend that authors start including an RTS in the appendix. An RTS should provide information regarding the transparency of the following key stages of the research lifecycle (beyond what is typically required/provided in a standard research article):
Data collection: Provide the DOI or link to the data collection/generation instrument, or a statement of why such information is not available.
Data archiving: Provide the DOI or link to the data set, or a statement of why such information is not available or may be subject to author or third-party approval.
Data analysis: Provide the DOI or link to the data analysis script/protocol/procedure document, or a statement of why such information is not available.
Research outputs: Provide the DOI or link to the preprint/postprint if the manuscript is not already OA, and/or any additional outputs associated with this article.
Additional steps taken to enhance transparency [optional]: If applicable, include DOI or links to additional materials (e.g., preregistration).
Applying the above RTS framework (see Footnote 1), we provide an example with our study:
Data collection: No special data collection instrument was required for our study as we only specified journal name (in “publication titles”) and downloaded all records of the target journals from Web of Science. Our codebook for coding the data is available at https://doi.org/10.17605/osf.io/vbjd6 (folder name: Open materials).
Data archiving: Our raw and processed data are available at the above link (folder name: Open data).
Data analysis: Our R scripts and outputs are available at the above link (folder name: Open code).
Research outputs: Our postprint is available at https://doi.org/10.31219/osf.io/aedbu.
Additional steps taken to enhance transparency [optional]: Additional details of the target journal policies are available at the above OSF link (file name: “Supplementary information on journal policies”). A dedicated folder with details on how to contribute feedback and examples of research transparency statements is available at the above OSF link (folder name: Research transparency statement [RTS] initiative).
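Although the RTS is proposed as a narrative statement, its stages could in principle also be mirrored in a simple machine-readable form to support later meta-research. The Python sketch below is one hypothetical way to do this and is not part of the proposal itself: the class name, field names, and completeness rule (each required stage needs either a link or an explanation) are assumptions made for illustration, while the links are those given in the example above.

```python
from dataclasses import dataclass
from typing import List, Optional

# The four required RTS stages listed above (the fifth stage is optional)
REQUIRED_STAGES = ("Data collection", "Data archiving", "Data analysis", "Research outputs")


@dataclass
class RTSEntry:
    """One RTS stage with a DOI/link, an explanation, or both (hypothetical structure)."""
    stage: str
    link: Optional[str] = None
    explanation: Optional[str] = None

    def is_complete(self) -> bool:
        # A stage counts as reported if it has a link or a stated reason
        return bool(self.link or self.explanation)


def missing_stages(entries: List[RTSEntry]) -> List[str]:
    """Return the required stages that have no complete entry."""
    reported = {entry.stage for entry in entries if entry.is_complete()}
    return [stage for stage in REQUIRED_STAGES if stage not in reported]


# Entries based on the example statement for this study
example_rts = [
    RTSEntry("Data collection", link="https://doi.org/10.17605/osf.io/vbjd6"),
    RTSEntry("Data archiving", link="https://doi.org/10.17605/osf.io/vbjd6"),
    RTSEntry("Data analysis", link="https://doi.org/10.17605/osf.io/vbjd6"),
    RTSEntry("Research outputs", link="https://doi.org/10.31219/osf.io/aedbu"),
]

print(missing_stages(example_rts))      # all required stages reported
print(missing_stages(example_rts[:2]))  # two stages still missing
```

A check of this kind would only confirm that each stage is addressed; judging whether a stated reason for non-availability is adequate would still require human reading of the narrative.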

Limitations

It is important to acknowledge the limitations of our study. First, as already highlighted, our data are limited to two flagship journals and therefore our findings should not be generalised as representing the entire field. Future research can verify whether the patterns observed in our study are present field wide. Second, our estimated prevalence of OA should be viewed as conservative. Identifying open data and code posed more challenges than identifying open manuscripts and materials. During the coding process, we found a lack of standardised and/or explicit labelling and statements regarding open science practices across articles and journals. It is possible that some authors did share data or code but did not explicitly state so in the manuscript, and that these cases were therefore missed by our coders. Nevertheless, we believe such instances are infrequent and would not substantively alter our conclusions. Third, our data set and methodology do not allow for the examination of the underlying causal mechanism. Future research, especially qualitative research, can further explore the attitudes and experiences of researchers to fully understand the mechanism. Fourth, as one of the first meta-research studies on OA in the field, we prioritised identifying broad patterns as opposed to more fine-grained analysis (e.g., by-country analysis). Future research could draw on our open data set to conduct additional analyses.

Conclusion

Our study set out to map the terrain of OA in two flagship journals in language testing, as OA is highly relevant to the field for enhancing transparency, validity, and accessibility in language testing research. Specifically, we investigated four aspects of OA (open manuscripts, open materials, open data, and open code) and the associations between these aspects and various research, test, and researcher characteristics. We observed a notable advancement in OA evidenced by a rise in open manuscripts and materials, as well as a more tentative increase in open data and code, although starting from a very limited base. Our findings highlight an inequity in manuscript sharing, with authors from the Global South participating less. Academic funding was identified as a positive predictor of open materials, whereas commercial interest was a negative one. While our study does not permit causal inference, we have reflected on potential challenges and offered recommendations for future efforts. Key recommendations include the endorsement of more equitable publishing models, resource sharing during peer review, the expansion of Open Science Badges to recognise code sharing and data sharing with protected access, and the adoption of the Research Transparency Statement (RTS) to be inclusive of methodological and epistemological differences. We hope our study will inspire deeper explorations of open science values and practices in the field of language testing.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Footnotes

Open practice
This article has received badges for Open Data and Open Materials. More information about the Open Practices badges can be found at https://osf.io/tvyxz/wiki/home/.
Supplemental Material
Supplemental material for this article is available on OSF through the following link: https://osf.io/czqxt/.
The supplemental file below (Supplementary Material) is a peer review report for the article, documenting the reviewers’ and editors’ comments and the authors’ responses to those comments during the peer review process. This is part of a pilot for the Special Issue on Open Science in Language Testing.
1. We welcome the engagement of the broader research community to enrich the dialogue around the research transparency statement, and to ensure that our proposal remains adaptive to evolving best practices. To this end, we have a dedicated folder in our OSF repository with details on how to contribute feedback and examples of research transparency statements (Liu et al., 2024; https://doi.org/10.17605/osf.io/vbjd6).

References

Alferink I., Marsden E. (2023). OASIS: One resource to widen the reach of research in language studies. Innovation in Language Learning and Teaching, 17(5), 946–952. https://doi.org/10.1080/17501229.2023.2204100
Al-Hoorie A. H., Cinaglia C., Hiver P., Huensch A., Isbell D. R., Leung C., Sudina E. (2024). Open science: Considerations and issues for TESOL research. TESOL Quarterly, 58, 537–556. https://doi.org/10.1002/tesq.3304
Al-Hoorie A. H., Hiver P. (2023). The postprint pledge—Toward a culture of researcher-driven initiatives: A commentary on “(why) are open research practices the future for the study of language learning?” Language Learning, 73(Suppl. 2), 388–391. https://doi.org/10.1111/lang.12577
Al-Hoorie A. H., Hiver P. (2024). Open science in applied linguistics: An introduction to metascience. In Plonsky L. (Ed.), Open science in applied linguistics. Applied Linguistics Press.
Andringa S., Mos M., Van Beuningen C., González P., Hornikx J., Steinkrauss R. (2024). Diamond is a scientist’s best friend: Counteracting systemic inequality in open access publishing. Dutch Journal of Applied Linguistics, 1–13. https://doi.org/10.51751/dujal18802
Bolibaugh C., Vanek N., Marsden E. (2021). Towards a credibility revolution in bilingualism research: Open data and materials as stepping stones to more reproducible and replicable research. Bilingualism: Language and Cognition, 24(5), 801–806. https://doi.org/10.1017/s1366728921000535
Brainard J., Kaiser J. (2022). US to require free access to papers on all research it funds. Science, 377(6610), 1026–1027. https://doi.org/10.1126/science.ade6577
Burton J. D. (2023). Reflections on the past and future of language testing and assessment: An emerging scholar’s perspective. Language Testing, 40(1), 24–30. https://doi.org/10.1177/02655322221126607
Chan L., Hall B., Piron F., Tandon R., Williams L. (2020). Open science beyond open access: For and with communities. A step towards the decolonization of knowledge (pp. 1–20). The Canadian Commission for UNESCO. https://pascalobservatory.org/sites/default/files/scribd/os_for_and_with_communities_en.pdf
Chapelle C. A. (2020). Validity in language assessment. In Winke P. M., Brunfaut T. (Eds.), The Routledge handbook of second language acquisition and language testing (pp. 11–20). Routledge. https://doi.org/10.4324/9781351034784
Chiarelli A., Johnson R., Pinfield S., Richens E. (2019). Preprints and scholarly communication: An exploratory qualitative study of adoption, practices, drivers and barriers. F1000Research, 8(971), 1–78. https://doi.org/10.12688/f1000research.19619.2
Chiware E. R. T., Skelly L. (2023). Overcoming challenges to open research practices—A perspective from the global south: A commentary on “(why) are open research practices the future for the study of language learning?” Language Learning, 73, 392–396. https://doi.org/10.1111/lang.12576
Chong S.W., Sato M. (n.d.). TESOLgraphics. https://www.tesolgraphics.com/
Creswell J. W., Poth C. N. (2018). Qualitative inquiry & research design: Choosing among five approaches (4th ed.). Sage.
De Silva P. U. K., Vance C. K. (2017). Scientific scholarly communication. Springer. https://doi.org/10.1007/978-3-319-50627-2
Douglas D. (2014). Understanding language testing. Routledge.
Elman C., Kapiszewski D., Vinuela L. (2010). Qualitative data archiving: Rewards and challenges. PS: Political Science & Politics, 43(1), 23–27. https://doi.org/10.1017/S104909651099077X
Falotico R., Quatto P. (2015). Fleiss’ kappa statistic without paradoxes. Quality & Quantity, 49, 463–470. https://doi.org/10.1007/s11135-014-0003-1
Godfroid A., Andringa S. (2023). Uncovering sampling biases, advancing inclusivity, and rethinking theoretical accounts in second language acquisition: Introduction to the special issue SLA for all? Language Learning, 73, 981–1002. https://doi.org/10.1111/lang.12620
Harding L., Kremmel B. (2016). Teacher assessment literacy and professional development. In Tsagari D., Banerjee J. (Eds.), Handbook of second language assessment (pp. 413–428). De Gruyter. https://doi.org/10.1515/9781614513827-027
Harding L., Winke P. (2020). Editorial. Language Testing, 37(1), 3–5. https://doi.org/10.1177/0265532219881822
Harding L., Winke P. (2022). Innovation and expansion in Language Testing for changing times. Language Testing, 39(1), 3–6. https://doi.org/10.1177/02655322211053212
Hardwicke T. E., Thibault R. T., Kosie J. E., Wallach J. D., Kidwell M. C., Ioannidis J. P. A. (2022). Estimating the prevalence of transparency and reproducibility-related research practices in psychology (2014–2017). Perspectives on Psychological Science, 17(1), 239–251. https://doi.org/10.1177/1745691620979806
Hardwicke T. E., Vazire S. (2023). Transparency is now the default at Psychological Science. Psychological Science, 1–4. https://doi.org/10.1177/09567976231221573
Hiver P., Nagle C. (2024). Complex adaptive interventions: The challenge ahead for instructed second language acquisition research. Annual Review of Applied Linguistics, 1–16. https://doi.org/10.1017/S0267190524000060
Huang C.-K., Neylon C., Hosking R., Montgomery L., Wilson K. S., Ozaygen A., Brookes-Kenworthy C. (2020). Evaluating the impact of open access policies on research institutions. eLife, 9, e57067. https://doi.org/10.7554/eLife.57067
Hughes A., Porter D. (1984). Editorial. Language Testing, 1(1), i–ii.
Isaacs T., Chalmers H. (2023). Reducing “avoidable research waste” in applied linguistics research: Insights from healthcare research. Language Teaching, 1–18. https://doi.org/10.1017/S0261444823000411
Isaacs T., Winke P. (2024). Purposeful turns for more equitable and transparent publishing in language testing and assessment. Language Testing, 41(1), 3–8. https://doi.org/10.1177/02655322231203234
Isbell D. R., Brown D., Chen M., Derrick D. J., Ghanem R., Arvizu M. N. G., Schnur E., Zhang M., Plonsky L. (2022). Misconduct and questionable research practices: The ethics of quantitative data handling and reporting in applied linguistics. The Modern Language Journal, 106(1), 172–195. https://doi.org/10.1111/modl.12760
Isbell D. R., Kim J. (2023). Developer involvement and COI disclosure in high-stakes English proficiency test validation research: A systematic review. Research Methods in Applied Linguistics, 2(3), 100060. https://doi.org/10.1016/j.rmal.2023.100060
Kozlov M. (2022). NIH issues a seismic mandate: Share data publicly. Nature, 602(7898), 558–559. https://doi.org/10.1038/d41586-022-00402-1
Kremmel B., Harding L. (2020). Towards a comprehensive, empirical model of language assessment literacy across stakeholder groups: Developing the language assessment literacy survey. Language Assessment Quarterly, 17(1), 100–120. https://doi.org/10.1080/15434303.2019.1674855
Kunnan A. J. (2014). Fairness and justice in language assessment: Principles and public reasoning. In Deng X., Seow R. (Eds.), Alternative pedagogies in the English language and communication classroom (pp. 36–39). Centre for English Language Communication, National University of Singapore.
Kunnan A. J. (2018). Evaluating language assessments. Routledge.
Laurinavichyute A., Yadav H., Vasishth S. (2022). Share the code, not just the data: A case study of the reproducibility of articles published in the Journal of Memory and Language under the open data policy. Journal of Memory and Language, 125, 104332. https://doi.org/10.1016/j.jml.2022.104332
Liu M. (2023). Whose open science are we talking about? From open science in psychology to open science in applied linguistics. Language Teaching, 56(4), 443–450. https://doi.org/10.1017/S0261444823000307
Liu M., Al-Hoorie A. H., Hiver P. (2024). Open data, code, and materials for “Open access in language testing and assessment.” Open Science Framework. https://doi.org/10.17605/osf.io/vbjd6
Liu M., Chong S. W., Marsden E., McManus K., Morgan-Short K., Al-Hoorie A. H., Plonsky L., Bolibaugh C., Hiver P., Winke P., Huensch A., Hui B. (2023). Open scholarship in applied linguistics: What, why, and how. Language Teaching, 56(3), 432–437. https://doi.org/10.1017/S0261444822000349
Liu M., De Cat C. (2024). Open science in applied linguistics: A preliminary survey. In Plonsky L. (Ed.), Open science in applied linguistics. Applied Linguistics Press.
Liu M., Marsden E. (2024). The open turn: Rethinking applied linguistics research through open scholarship. https://doi.org/10.31219/osf.io/9kqvf
Marsden E., Mackey A. (2014). IRIS: A new resource for second language research. Linguistic Approaches to Bilingualism, 4(1), 125–130. https://doi.org/10.1075/lab.4.1.05mar
Marsden E., Morgan-Short K. (2023). (Why) are open research practices the future for the study of language learning? Language Learning, 73, 344–387. https://doi.org/10.1111/lang.12568
Marsden E., Morgan-Short K., Thompson S., Abugaber D. (2018). Replication in second language research: Narrative and systematic reviews and recommendations for the field. Language Learning, 68(2), 321–391. https://doi.org/10.1111/lang.12286
Marsden E., Plonsky L. (2018). Data, open science, and methodological reform in second language acquisition research. In Gudmestad A., Edmonds A. (Eds.), Critical reflections on data in second language acquisition (pp. 219–228). John Benjamins.
McManus K. (2022). Are replication studies infrequent because of negative attitudes? Insights from a survey of attitudes and practices in second language research. Studies in Second Language Acquisition, 44(5), 1410–1423. https://doi.org/10.1017/S0272263121000838
Norouzian R. (2021). Interrater reliability in second language meta-analyses: The case of categorical moderators. Studies in Second Language Acquisition, 43(4), 896–915. https://doi.org/10.1017/S0272263121000061
Page M. J., McKenzie J. E., Bossuyt P. M., Boutron I., Hoffmann T. C., Mulrow C. D., Shamseer L., Tetzlaff J. M., Akl E. A., Brennan S. E., Chou R., Glanville J., Grimshaw J. M., Hróbjartsson A., Lalu M. M., Li T., Loder E. W., Mayo-Wilson E., McDonald S., Moher D. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ, 372, n71. https://doi.org/10.1136/bmj.n71
Pinfield S., Wakeling S., Bawden D., Robinson L. (2020). Open access in theory and practice: The theory-practice relationship and openness. Routledge.
Piron F. (2018). Postcolonial open access. In Herb U., Schöpfel J. (Eds.), Open divide: Critical studies on open access (pp. 1–8). Litwin Books.
Piwowar H., Priem J., Larivière V., Alperin J. P., Matthias L., Norlander B., Farley A., West J., Haustein S. (2018). The state of OA: A large-scale analysis of the prevalence and impact of Open Access articles. PeerJ, 6, 1–23. https://doi.org/10.7717/peerj.4375
Plonsky L. (2024). Open science in applied linguistics. Applied Linguistics Press.
Porte G., McManus K. (2019). Doing replication research in applied linguistics. Routledge.
R Core Team. (2023). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Šimukovič E. (2018). Open access, a new kind of emerging knowledge regime? In Herb U., Schöpfel J. (Eds.), Open divide: Critical studies on open access (pp. 31–40). Litwin Books.
Suber P. (2012). Open access. MIT Press.
UiT The Arctic University of Norway. (n.d.). TROLLing: Tromsø Repository of Language and Linguistics. https://site.uit.no/trolling
UNESCO. (2021). UNESCO recommendation on Open Science. https://unesdoc.unesco.org/ark:/48223/pf0000379949.locale=en
Wallach J. D., Boyack K. W., Ioannidis J. P. A. (2018). Reproducible research practices, transparency, and open access data in the biomedical literature, 2015–2017. PLOS Biology, 16(11), Article e2006930. https://doi.org/10.1371/journal.pbio.2006930
Wikimedia Foundation. (2023). List of countries by regional classification. https://meta.wikimedia.org/wiki/List_of_countries_by_regional_classification
Wilkinson M. D., Dumontier M., Aalbersberg I. J., Appleton G., Axton M., Baak A., Blomberg N., Boiten J.-W., da Silva Santos L. B., Bourne P. E. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 1–9.
Winke P. (2023). Forty years of Language Testing, and the changing paths of publishing. Language Testing, 40(1), 3–7. https://doi.org/10.1177/02655322221136802
