Codebook
Column definitions for the public release dataset. 66 columns across 13 groups.
Core (9 columns)
| Column | Display name | Type | Description |
|---|---|---|---|
| publisher | Publisher | string | Publisher name (e.g., Elsevier, Springer Nature, Wiley) |
| journal | Journal | string | Journal title as listed on the publisher website |
| editor | Name | string | Editor full name as scraped |
| role | Role (raw) | string | Role as listed on publisher page (unstandardized) |
| role_std | Role (standardized) | string | Standardized role: editor_in_chief, associate_editor, section_editor, reviewing_editor, editorial_board_member, deputy_editor, guest_editor, other |
| affiliation | Affiliation | string | Institutional affiliation as listed (cleaned but not canonicalized) |
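| orcid | ORCID | string | ORCID iD, if available: a 16-character identifier in four hyphen-separated blocks (e.g., 0000-0002-1825-0097); the last character is a check digit (0-9 or X) |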
| source_url | Source URL | string | URL of the editorial board page scraped |
| scraped_at | Scraped at | datetime | ISO 8601 timestamp of when this record was scraped |
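The orcid column can be sanity-checked locally, because the final character of an ORCID iD is an ISO 7064 MOD 11-2 check digit. The sketch below is not part of the release pipeline. The function name is hypothetical, and it assumes values may appear either as a bare iD or as an https://orcid.org/ URL.

```python
def orcid_checksum_ok(orcid: str) -> bool:
    """Validate an ORCID iD's final check character (ISO 7064 MOD 11-2)."""
    # Accept a bare iD or the URL form; drop hyphens; normalize a lowercase 'x'.
    chars = orcid.rsplit("/", 1)[-1].replace("-", "").upper()
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    total = 0
    for ch in chars[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15] == expected
```

For example, `orcid_checksum_ok("0000-0002-1825-0097")` returns `True`, while changing the last digit to 8 makes it return `False`. A passing checksum only shows the iD is well-formed, not that it belongs to the named editor.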
Gender (3 columns)
| Column | Display name | Type | Description |
|---|---|---|---|
| gender | Gender | string | Inferred gender: male, female, andy (androgynous), unknown |
| gender_raw | Gender (raw) | string | Raw output from gender-guesser before thresholding |
| gender_prob | Gender confidence | float | Confidence score (0-1) for gender inference |
Institution (ROR) (8 columns)
| Column | Display name | Type | Description |
|---|---|---|---|
| ror_id | ROR ID | string | Research Organization Registry identifier |
| ror_name | Institution | string | Canonical institution name from ROR |
| ror_country | Country | string | Country of the institution |
| ror_city | City | string | City of the institution from ROR/GeoNames |
| ror_state | State/Province | string | State or province of the institution from ROR/GeoNames |
| org_type | Org type | string | Organization type from ROR (Education, Healthcare, Government, etc.) |
| latitude | Latitude | float | Geographic latitude of the institution (decimal degrees) |
| longitude | Longitude | float | Geographic longitude of the institution (decimal degrees) |
Classification (OpenAlex) (4 columns)
| Column | Display name | Type | Description |
|---|---|---|---|
| scientific_domain | Domain | string | Broadest classification level (e.g., Life Sciences, Physical Sciences) |
| scientific_field | Field | string | Mid-level field (e.g., Medicine, Engineering, Psychology) |
| scientific_subfield | Subfield | string | Narrow subfield classification |
| scientific_topic | Topic | string | Most granular topic classification |
Journal identifiers (2 columns)
| Column | Display name | Type | Description |
|---|---|---|---|
| openalex_source_id | OpenAlex source ID | string | OpenAlex identifier for the journal |
| issn_l | ISSN-L | string | Linking ISSN, which groups a journal's print and electronic ISSNs under one identifier |
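The issn_l column can be checked the same way: an ISSN-L is an ordinary ISSN, whose eighth character is a mod-11 check digit computed over the first seven digits with weights 8 down to 2 (a value of 10 is written as X). A minimal sketch; the function name is illustrative only and not part of the release tooling.

```python
def issn_checksum_ok(issn: str) -> bool:
    """Validate the check digit of an ISSN or ISSN-L such as '0378-5955'."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdigit():
        return False
    # Weights 8, 7, ..., 2 applied to the first seven digits.
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[7] == expected
```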
Journal metrics (7 columns)
| Column | Display name | Type | Description |
|---|---|---|---|
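| oa_2yr_mean_citedness | Mean citedness (2yr) | float | OpenAlex 2-year mean citedness: citations received in one year by works published in the two preceding years, divided by the number of those works (an impact-factor-style ratio) |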
| oa_journal_h_index | Journal h-index | int | Journal-level h-index from OpenAlex |
| oa_journal_works_count | Journal works count | int | Total number of works published in the journal |
| oa_journal_cited_by_count | Journal citations | int | Total citations received by the journal |
| is_in_doaj | In DOAJ | bool | Listed in the Directory of Open Access Journals |
| is_oa | Open access | bool | Journal is classified as open access by OpenAlex |
| oa_impact_quartile | Impact quartile | string | Q1-Q4 computed locally from OpenAlex 2-year mean citedness. NOT the Clarivate JIF or Scopus CiteScore quartile. |
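The exact local procedure behind oa_impact_quartile is not specified here. As a hypothetical sketch, quartiles could be assigned by ranking journals on oa_2yr_mean_citedness, assuming Q1 is the top quarter, all journals are ranked together rather than within fields, and journals with no citedness value are excluded beforehand.

```python
def assign_quartiles(citedness: dict[str, float]) -> dict[str, str]:
    """Map journal ID -> 'Q1'..'Q4', with Q1 = highest 2-year mean citedness.

    Hypothetical sketch: ranks all journals together; Python's sort is stable,
    so journals with equal citedness keep their input order.
    """
    ranked = sorted(citedness, key=citedness.get, reverse=True)
    n = len(ranked)
    # rank 0 is the most cited journal; 4 * rank // n falls in 0..3.
    return {journal: f"Q{4 * rank // n + 1}" for rank, journal in enumerate(ranked)}
```

Because ties are broken by input order, journals with identical citedness that straddle a quartile boundary can end up in different quartiles.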
Editor bibliometrics (5 columns)
| Column | Display name | Type | Description |
|---|---|---|---|
| h_index | h-index | int | Author h-index from OpenAlex |
| total_publications | Publications | int | Total number of works by this author |
| total_citations | Citations | int | Total citations received by this author |
| academic_age | Academic age | int | Years since first publication |
| orcid_source | ORCID source | string | How the ORCID was obtained (scraped, openalex, orcid_api) |
Indexing (6 columns)
| Column | Display name | Type | Description |
|---|---|---|---|
| indexed_pubmed | PubMed | bool | Journal is indexed in PubMed/MEDLINE |
| indexed_scopus | Scopus | bool | Journal is indexed in Scopus |
| indexed_wos | Web of Science | bool | Journal is indexed in Web of Science |
| indexed_doaj | DOAJ | bool | Journal is in the Directory of Open Access Journals |
| indexed_cope | COPE | bool | Publisher is a member of COPE (Committee on Publication Ethics) |
| indexing_count | Index count | int | Number of major indexes the journal appears in (0-5) |
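This group has exactly five boolean flags and indexing_count is stated to range from 0 to 5, so it is presumably their sum, with COPE membership counted alongside the four true indexes. Under that assumption, which this codebook does not confirm, the count can be recomputed or cross-checked per row. The names `INDEX_FLAGS` and `indexing_count` below are illustrative.

```python
# The five flags assumed to feed indexing_count.
INDEX_FLAGS = ("indexed_pubmed", "indexed_scopus", "indexed_wos",
               "indexed_doaj", "indexed_cope")


def indexing_count(row: dict) -> int:
    """Count the truthy indexing flags in one record (missing/None counts as 0)."""
    return sum(1 for flag in INDEX_FLAGS if row.get(flag))
```

A mismatch between `indexing_count(row)` and the released `row["indexing_count"]` would indicate that the column is defined differently from this assumption.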
Norwegian Publishing Indicator (3 columns)
| Column | Display name | Type | Description |
|---|---|---|---|
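| npi_level | NPI level | string | Norwegian Publishing Indicator level (1 or 2). Level 2 is the highest level, reserved for the most selective channels, which together account for about 20% of a field's publications. |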
| npi_discipline | NPI discipline | string | Broad discipline in the Norwegian system |
| npi_field | NPI field | string | Specific field in the Norwegian system |
Funding (6 columns)
| Column | Display name | Type | Description |
|---|---|---|---|
| top_funder_1 | Top funder 1 | string | Most common funding source for articles in this journal |
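| top_funder_1_count | Funder 1 count | int | Number of articles in this journal funded by top_funder_1 |
| top_funder_2 | Top funder 2 | string | Second most common funding source for articles in this journal |
| top_funder_2_count | Funder 2 count | int | Number of articles in this journal funded by top_funder_2 |
| top_funder_3 | Top funder 3 | string | Third most common funding source for articles in this journal |
| top_funder_3_count | Funder 3 count | int | Number of articles in this journal funded by top_funder_3 |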
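How the funder ranking is built is not documented here. A plausible sketch, assuming each article carries a list of acknowledged funder names and that a funder counts once per article however often it is listed (the input shape and function name are hypothetical):

```python
from collections import Counter


def top_funders(article_funders: list[list[str]], k: int = 3) -> list[tuple[str, int]]:
    """Return the k most common funders with the number of articles citing each."""
    counts = Counter()
    for funders in article_funders:
        # dict.fromkeys drops duplicates within one article but keeps order.
        counts.update(list(dict.fromkeys(funders)))
    return counts.most_common(k)
```

Under this sketch, the first pair in the result would populate top_funder_1 and top_funder_1_count, the second pair top_funder_2 and top_funder_2_count, and so on.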
Board diversity (6 columns)
| Column | Display name | Type | Description |
|---|---|---|---|
| board_size | Board size | int | Total number of editors on this journal's board |
| board_pct_female | % female | float | Percentage of female editors on this board |
| board_country_count | Countries on board | int | Number of distinct countries represented on the board |
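| board_country_hhi | Country HHI | float | Herfindahl-Hirschman Index of country concentration: the sum of squared country shares. Equals 1 when all editors are from one country and approaches 0 as editors spread evenly across many countries (minimum 1/N for N countries) |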
| board_institution_count | Institutions on board | int | Number of distinct institutions on the board |
| board_mean_h_index | Mean board h-index | float | Average h-index of board members |
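board_country_hhi is HHI = Σ s_i², where s_i is the share of a board's editors based in country i. A minimal sketch, assuming editors with no resolved country are dropped before shares are computed (how the release treats them is not stated here); the function name is illustrative.

```python
import math
from collections import Counter


def country_hhi(countries: list) -> float:
    """Herfindahl-Hirschman Index of one board's editor countries."""
    known = [c for c in countries if c]  # assumption: skip missing countries
    if not known:
        return math.nan
    total = len(known)
    return sum((n / total) ** 2 for n in Counter(known).values())
```

Four editors from four different countries give 0.25 (the 1/N floor), while a board drawn entirely from one country gives 1.0.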
Multi-board (3 columns)
| Column | Display name | Type | Description |
|---|---|---|---|
| boards_count | Boards served | int | Number of editorial boards this editor serves on |
| publishers_count | Publishers served | int | Number of distinct publishers across all boards this editor serves on |
| is_multi_board | Multi-board | bool | True if editor serves on 2+ boards |
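The multi-board columns can be derived by grouping rows per editor. This sketch assumes a hypothetical `editor_key` that identifies the same person across boards (for example, an ORCID iD where one exists); the actual matching rule is not described in this codebook. Sets are used so that an editor holding two roles on the same journal counts as a single board.

```python
from collections import defaultdict


def multi_board_stats(records):
    """records: iterable of (editor_key, journal, publisher) tuples, one per row."""
    journals = defaultdict(set)
    publishers = defaultdict(set)
    for editor_key, journal, publisher in records:
        journals[editor_key].add(journal)
        publishers[editor_key].add(publisher)
    return {
        key: {
            "boards_count": len(journals[key]),
            "publishers_count": len(publishers[key]),
            "is_multi_board": len(journals[key]) >= 2,
        }
        for key in journals
    }
```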
Metadata (4 columns)
| Column | Display name | Type | Description |
|---|---|---|---|
| name_script | Script | string | Detected writing script of the name (Latin, CJK, Cyrillic, etc.) |
| name_script_region | Script region | string | Geographic region associated with the name script |
| data_version | Version | string | Dataset version identifier |
| enriched_at | Enriched at | datetime | ISO 8601 timestamp of when enrichment completed |
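name_script could be approximated with the standard library's Unicode character database. This is a hypothetical sketch, not the detector used for the release: it takes the first word of each letter's Unicode name (LATIN, CYRILLIC, CJK, HANGUL, ...) and returns the most frequent one. Its uppercase labels would need mapping to the release's spelling (Latin, CJK, Cyrillic), and Japanese kana come back as HIRAGANA or KATAKANA rather than CJK.

```python
import unicodedata
from collections import Counter


def detect_script(name: str) -> str:
    """Guess a name's dominant script from Unicode character names."""
    scripts = Counter()
    for ch in name:
        if ch.isalpha():  # skip spaces, hyphens, periods, digits
            scripts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    return scripts.most_common(1)[0][0] if scripts else "UNKNOWN"
```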