| Title: | Korean National Assembly Data for Political Science Education |
|---|---|
| Description: | Provides ready-to-use datasets from the Korean National Assembly (assemblies 20 through 22, 2016-2026) for teaching quantitative methods in political science. Includes legislator metadata, bill proposals, roll call votes, asset declarations, and policy seminar records. Designed as a Korean politics counterpart to packages like 'palmerpenguins', enabling students to practice regression, panel data analysis, text analysis, and network analysis with real legislative data. Roll call vote data and spatial voting models are described in Poole and Rosenthal (1985) <doi:10.2307/2111172>. Legislative data is sourced from the Korean National Assembly Open API. |
| Authors: | Kyusik Yang [aut, cre] |
| Maintainer: | Kyusik Yang <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.2 |
| Built: | 2026-05-14 09:30:17 UTC |
| Source: | https://github.com/kyusik-yang/assemblykor |
Provides ready-to-use datasets from the Korean National Assembly for teaching quantitative methods in political science. Includes seven built-in datasets covering legislator metadata, bills, asset declarations, policy seminars, committee speeches, plenary vote tallies, and member-level roll calls, plus two larger datasets that are downloaded on demand.
legislators: 947 MP records (20th-22nd assemblies)
bills: 60,925 legislative bills
wealth: 2,928 legislator-year asset declarations
seminars: 5,962 legislator-year seminar records
speeches: 15,843 speech records (22nd, Science & ICT Committee)
votes: 8,050 plenary vote tallies (20th-22nd assemblies)
roll_calls: 383,739 member-level roll call votes (22nd assembly)
get_bill_texts: 60,925 bill propose-reason texts
get_proposers: 769,773 co-sponsorship records
Nine Korean-language tutorials covering tidyverse, visualization, regression,
panel data, text analysis, network analysis, roll call analysis, bill success,
and speech patterns. Use list_tutorials to see all tutorials,
and open_tutorial to copy them to your working directory.
Maintainer: Kyusik Yang [email protected]
Useful links:
Report bugs at https://github.com/kyusik-yang/assemblykor/issues
Metadata for 60,925 legislative bills proposed during the 20th through 22nd Korean National Assembly (2016-2026).
bills
A data frame with 60,925 rows and 9 variables:
Unique bill identifier from the National Assembly system
Numeric bill number
Assembly number (20, 21, or 22)
Full bill title in Korean
Standing committee to which the bill was referred
Date the bill was formally proposed
Legislative outcome in Korean. Common values include
passed as-is, expired at term end, and incorporated into
alternative bill. See table(bills$result) for all values.
Name of the lead (primary) proposer
MONA_CD of the lead proposer (links to legislators$member_id)
The Korean National Assembly has seen a dramatic increase in bill proposals: the 21st Assembly produced 23,655 bills versus 21,594 in the 20th. Most bills expire at the end of the assembly term (term expiry); only about 5\
Use get_bill_texts() to download the full propose-reason texts
for text analysis, and get_proposers() for the complete
co-sponsorship records (769,773 rows).
Open National Assembly Information API (Republic of Korea).
data(bills)

# Bills per assembly
table(bills$assembly)

# Top 10 committees
sort(table(bills$committee), decreasing = TRUE)[1:10]

# Distribution of legislative outcomes
head(sort(table(bills$result), decreasing = TRUE))
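To compare outcomes across assemblies, raw counts can be converted to within-assembly shares. The following is a minimal sketch on a small made-up data frame, so it runs without the package installed; it assumes only the `assembly` and `result` columns used in the examples above. The English outcome labels are placeholders, since the real `result` values are Korean strings.

```r
# Toy stand-in for `bills` (hypothetical rows; real `result` values are Korean)
bills_toy <- data.frame(
  assembly = c(20, 20, 20, 21, 21, 21, 21, 22),
  result   = c("passed", "expired", "expired", "passed",
               "expired", "expired", "merged", "expired"),
  stringsAsFactors = FALSE
)

# Cross-tabulate assembly by outcome, then turn counts into row shares
counts <- table(bills_toy$assembly, bills_toy$result)
shares <- prop.table(counts, margin = 1)
round(shares, 2)

# Share of bills that expired within each assembly
expired_share <- shares[, "expired"]
expired_share
```

With the real data, the same two lines (`table()` followed by `prop.table(..., margin = 1)`) should work on `bills$assembly` and `bills$result` directly.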
Downloads the full propose-reason texts (jean-iyu) for all 60,925 bills. The file is approximately 25 MB and is cached locally after the first download. Requires the arrow package to read parquet files.
get_bill_texts(cache_dir = NULL, force_download = FALSE)
cache_dir |
Directory to cache downloaded files. Defaults to
|
force_download |
Logical. If |
A data frame with 60,925 rows and 3 variables:
Bill identifier (links to bills$bill_id)
Full text of the propose-reason statement (Korean)
Data collection status: "ok", "empty", "no_csrf", or "error"
if (requireNamespace("arrow", quietly = TRUE)) {
  texts <- get_bill_texts(cache_dir = tempdir())
  nchar_dist <- nchar(texts$propose_reason)
  hist(nchar_dist, breaks = 100, main = "Length of Propose-Reason Texts")
}
Downloads the complete proposer records (769,773 rows) listing every legislator who co-sponsored each bill. Requires the arrow package.
get_proposers(cache_dir = NULL, force_download = FALSE)
cache_dir |
Directory to cache downloaded files. Defaults to
|
force_download |
Logical. If |
A data frame with 769,773 rows and 8 variables:
Bill identifier (links to bills$bill_id)
Numeric bill number
Bill title in Korean
Proposal date
Legislator name
Party affiliation at the time of co-sponsorship
Legislator identifier (links to legislators$member_id)
Logical: TRUE if lead (primary) proposer, FALSE if co-sponsor
if (requireNamespace("arrow", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  props <- get_proposers(cache_dir = tempdir())

  # Build co-sponsorship edgelist
  leads <- dplyr::select(
    dplyr::filter(props, is_lead),
    bill_id, lead = member_id
  )
  cosponsors <- dplyr::select(
    dplyr::filter(props, !is_lead),
    bill_id, cosponsor = member_id
  )
  edges <- dplyr::inner_join(
    leads, cosponsors,
    by = "bill_id",
    relationship = "many-to-many"
  )
}
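An edgelist like the one above has one row per lead, co-sponsor, and bill. For network analysis it is often collapsed to one row per pair, with a weight counting the shared bills. The sketch below does this in base R on a made-up table that mimics the `bill_id`, `member_id`, and `is_lead` columns; the IDs are hypothetical and nothing is downloaded.

```r
# Toy stand-in for the get_proposers() output (hypothetical IDs)
props_toy <- data.frame(
  bill_id   = c("B1", "B1", "B1", "B2", "B2", "B3", "B3"),
  member_id = c("M1", "M2", "M3", "M1", "M2", "M2", "M1"),
  is_lead   = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Split into lead proposers and co-sponsors
leads      <- props_toy[props_toy$is_lead,  c("bill_id", "member_id")]
cosponsors <- props_toy[!props_toy$is_lead, c("bill_id", "member_id")]
names(leads)[2]      <- "lead"
names(cosponsors)[2] <- "cosponsor"

# One row per (bill, lead, co-sponsor)
edges <- merge(leads, cosponsors, by = "bill_id")

# Collapse to one row per directed pair; weight = number of shared bills
edges$weight <- 1
weighted <- aggregate(weight ~ lead + cosponsor, data = edges, FUN = sum)
weighted <- weighted[order(-weighted$weight), ]
weighted
```

The pairs are kept directed (co-sponsor supports lead). Whether to symmetrize them depends on the research question.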
Biographical and political metadata for 947 records of legislators who served in the 20th (2016-2020), 21st (2020-2024), or 22nd (2024-2028) Korean National Assembly. Some legislators appear in multiple assemblies.
legislators
A data frame with 947 rows and 15 variables:
Unique legislator identifier (MONA_CD from the National Assembly API)
Assembly number (20, 21, or 22)
Name in Korean (hangul)
Name in Chinese characters (hanja)
Name in English (romanized)
Party affiliation during the assembly term
Party at the time of election
Electoral district name, or party list position for proportional members
Election type: "constituency" or "proportional"
Standing committee assignments (comma-separated)
"M" (male) or "F" (female)
Date of birth
Number of terms served, including current (1 = first-term)
Total bills participated in (as lead proposer or co-sponsor)
Bills proposed as lead (primary) proposer
661 unique legislators served across the three assemblies. member_id
is consistent across assemblies, so legislators can be tracked over time.
Party names may differ between party (mid-term) and party_elected
(election day) due to party mergers and name changes, which are common
in Korean politics.
Open National Assembly Information API (Republic of Korea). License: public domain (Korean government open data).
data(legislators)

# Party composition by assembly
table(legislators$assembly, legislators$party)

# Gender gap in bill production
tapply(legislators$n_bills_lead, legislators$gender, median)

# First-term vs senior legislators
boxplot(n_bills_lead ~ seniority, data = legislators,
        xlab = "Terms served", ylab = "Bills proposed (lead)")
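Because `member_id` is consistent across assemblies, the gap between the number of records and the number of unique legislators can be checked directly. The sketch below uses a made-up table with hypothetical IDs and assumes one row per legislator per assembly.

```r
# Toy stand-in for `legislators` (hypothetical IDs, one row per member-assembly)
legislators_toy <- data.frame(
  member_id = c("A01", "A02", "A01", "A03", "A01", "A03"),
  assembly  = c(20, 20, 21, 21, 22, 22),
  stringsAsFactors = FALSE
)

# Records vs. unique people
n_records <- nrow(legislators_toy)
n_unique  <- length(unique(legislators_toy$member_id))

# Number of assemblies in which each legislator appears
terms_in_data <- table(legislators_toy$member_id)

# Legislators observed in more than one assembly
returning <- names(terms_in_data)[terms_in_data > 1]
returning
```

Applied to the real data, `n_records` and `n_unique` would be expected to reproduce the 947 records and 661 unique legislators reported above.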
Lists the tutorial R Markdown files included with the package. Tutorials are designed for classroom use in Korean political science methods courses. Each tutorial is available in two formats:
Plain Rmd for editing in RStudio (open_tutorial)
Interactive learnr format (run_tutorial)
list_tutorials()
A character vector of tutorial file names (invisibly).
list_tutorials()
Copies a tutorial R Markdown file to the specified directory (default: current working directory) so students can edit and run it in RStudio.
open_tutorial(name, dest_dir = getwd())
name |
Tutorial name (with or without .Rmd extension), or a number corresponding to the tutorial order (1-9). |
dest_dir |
Directory to copy the file to. Defaults to the current working directory. |
The path to the copied file (invisibly).
run_tutorial for the interactive browser version.
if (interactive()) {
  # Copy by name
  open_tutorial("01-tidyverse-basics")
  # Copy by number
  open_tutorial(1)
}
Returns the file path to CSV versions of the built-in datasets stored
in inst/extdata. Useful for teaching file I/O with
read.csv() or readr::read_csv().
path_to_file(file = NULL)
file |
Name of the CSV file. One of |
A character string with the full file path.
# Read data from CSV (alternative to data())
path <- path_to_file("legislators.csv")
legislators_csv <- read.csv(path, fileEncoding = "UTF-8")
head(legislators_csv)
Individual legislator voting records for all 1,286 bills that went to a recorded plenary vote in the 22nd Korean National Assembly (2024-2026). Each row represents one legislator's vote on one bill.
roll_calls
A data frame with 383,739 rows and 8 variables:
Bill identifier (links to votes$bill_id and
bills$bill_id)
Assembly number (22)
Legislator name in Korean
Legislator identifier (MONA_CD, links to
legislators$member_id)
Party affiliation at time of vote
Electoral district or proportional list position
Vote cast in Korean: one of four values meaning yes, no, abstain, or absent
Date of the vote
The member-level roll call API is only available for the 22nd
assembly. For the 20th and 21st assemblies, use the bill-level
votes dataset.
This dataset enables ideal point estimation (e.g., W-NOMINATE),
party unity scores, and analysis of legislative coalitions. Use
member_id to link with legislators for biographical
metadata.
Open National Assembly Information API (Republic of Korea),
endpoint nojepdqqaweusdfbi.
data(roll_calls)

# Vote distribution
table(roll_calls$vote)

# Votes per party
head(sort(table(roll_calls$party), decreasing = TRUE))

# Number of unique legislators
length(unique(roll_calls$member_id))
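One simple way to compute the party unity scores mentioned above is the share of cast yes/no votes that agree with the party's majority position on each bill. The sketch below is an illustration on made-up data, not the package's own method. The vote labels `"yes"`, `"no"`, and `"absent"` are English placeholders for the Korean values in `roll_calls$vote`, and dropping abstentions and absences is an assumption of this sketch.

```r
# Toy stand-in for `roll_calls` (hypothetical IDs and placeholder vote labels)
rc_toy <- data.frame(
  bill_id   = rep(c("B1", "B2"), each = 6),
  member_id = rep(c("M1", "M2", "M3", "M4", "M5", "M6"), times = 2),
  party     = rep(c("P", "P", "P", "Q", "Q", "Q"), times = 2),
  vote      = c("yes", "yes", "no",  "no", "no", "yes",     # bill B1
                "yes", "yes", "yes", "no", "no", "absent"), # bill B2
  stringsAsFactors = FALSE
)

# Keep only yes/no votes
cast <- rc_toy[rc_toy$vote %in% c("yes", "no"), ]
cast$is_yes <- cast$vote == "yes"

# Share of yes votes within each party on each bill
yes_share <- aggregate(is_yes ~ bill_id + party, data = cast, FUN = mean)
names(yes_share)[3] <- "party_yes_share"

# Party majority position per bill (an exact tie would count as "no" here)
yes_share$majority_yes <- yes_share$party_yes_share > 0.5

# A vote is loyal if it matches the party majority on that bill
cast <- merge(cast, yes_share, by = c("bill_id", "party"))
cast$loyal <- cast$is_yes == cast$majority_yes

# Party unity: mean loyalty over all cast votes of each party
unity <- tapply(cast$loyal, cast$party, mean)
round(unity, 3)
```

Many published unity scores use only votes on which party majorities oppose each other. That filter is omitted here for brevity.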
Launches a learnr interactive tutorial in the browser. Students can type and run code directly in the browser with hints and solutions. Requires the learnr package.
run_tutorial(name)
name |
Tutorial name or number (1-9). Use |
No return value, called for the side effect of launching a learnr tutorial in the browser.
open_tutorial for the plain Rmd version.
if (interactive()) {
  run_tutorial(1)
}
Annual panel of policy seminar hosting activity for legislators in the 16th through 22nd Korean National Assembly. Policy seminars (jeongchaek semina) are informal legislative events where MPs invite experts, stakeholders, and colleagues from other parties to discuss policy issues.
seminars
A data frame with 5,962 rows and 18 variables:
Legislator name in Korean
Legislator identifier (MONA_CD, links to
legislators$member_id). Available for ~95\
NA for unmatched or ambiguous (homonym) cases.
Calendar year
Assembly number (17-22)
Party affiliation
Political camp: "liberal", "conservative", "progressive", or "other" (values are in Korean)
Number of terms served
Number of policy seminars hosted that year
Number of seminars co-hosted with other-party legislators
Share of seminars that were cross-party (0-1)
Average number of co-hosts per seminar
Logical: belongs to the governing (presidential) party
Logical: female legislator
Logical: proportional-representation member
Logical: represents a Seoul district
Province/metro area of electoral district
Total assembly terms served across career
Number of bills proposed as lead proposer that year
Policy seminars are a distinctive feature of the Korean National Assembly.
Unlike floor speeches or committee hearings, seminars are voluntary and
allow legislators to signal policy expertise and build cross-party ties.
The cross_party_ratio variable captures how often a legislator
cooperates across party lines in this informal arena.
The is_governing variable enables difference-in-differences designs:
when a party transitions from opposition to governing (or vice versa),
does its members' cross-party collaboration change?
National Assembly Seminar Database, collected via API.
data(seminars)

# Cross-party collaboration by governing status
tapply(seminars$cross_party_ratio, seminars$is_governing, mean, na.rm = TRUE)

# Seminar activity over time
agg <- aggregate(n_seminars ~ year, data = seminars, FUN = sum)
plot(agg, type = "b", main = "Total Policy Seminars by Year")

# Gender gap in seminar hosting
tapply(seminars$n_seminars, seminars$is_female, median, na.rm = TRUE)
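The difference-in-differences idea described above can be sketched as a two-group, two-period comparison. The data below are invented purely for illustration: party "A" moves from opposition to governing in 2022, while a hypothetical party "B" never governs. The helper columns `treated` and `post` are not part of the dataset, and a real analysis would work at the legislator-year level with appropriate controls.

```r
# Toy party-year panel (made-up values; not taken from `seminars`)
sem_toy <- data.frame(
  party             = rep(c("A", "B"), each = 4),
  year              = rep(c(2020, 2021, 2022, 2023), times = 2),
  is_governing      = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  cross_party_ratio = c(0.40, 0.44, 0.30, 0.34, 0.20, 0.24, 0.22, 0.26)
)

# Hypothetical helper indicators: switching party, and years after the switch
sem_toy$treated <- sem_toy$party == "A"
sem_toy$post    <- sem_toy$year >= 2022

# Difference-in-differences from the four cell means
cell <- tapply(sem_toy$cross_party_ratio,
               list(sem_toy$treated, sem_toy$post), mean)
did_manual <- (cell["TRUE", "TRUE"]  - cell["TRUE", "FALSE"]) -
              (cell["FALSE", "TRUE"] - cell["FALSE", "FALSE"])

# The same estimate as the interaction term of a saturated regression
fit <- lm(cross_party_ratio ~ treated * post, data = sem_toy)
did_lm <- coef(fit)[["treatedTRUE:postTRUE"]]

c(manual = did_manual, lm = did_lm)
```

In this toy example both estimates equal -0.12: the switching party's cross-party ratio falls by 0.10 while the comparison party's rises by 0.02.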
Detects a Korean-compatible font on the current system and applies it
to all ggplot2 plots via theme_set(). Call this once at the top
of your script to avoid broken Korean text in plot titles and labels.
set_ko_font(font = NULL)
font |
Optional font family name to use directly. If |
The font family name used (invisibly).
if (interactive()) {
  library(ggplot2)
  set_ko_font()
  # Now Korean text renders correctly
  ggplot(data.frame(x = 1), aes(x, x)) +
    geom_point() +
    labs(title = "Korean Title Test")
}
Full corpus of 15,843 speech records from the Science, Technology, Information, Broadcasting and Communications Committee of the 22nd Korean National Assembly (2024). Standing committee meetings only.
speeches
A data frame with 15,843 rows and 9 variables:
Assembly number (22)
Date of the committee meeting
Committee name in Korean
Speaker label as it appears in the minutes (may include titles)
Speaker role: "legislator", "chair", "minister", "vice_minister", "senior_bureaucrat", "agency_head", "witness", "expert_witness", "nominee", "minister_nominee", "testifier", "public_corp_head", "broadcasting", "committee_staff"
Cleaned speaker name with titles removed
Legislator identifier (MONA_CD, links to
legislators$member_id). Available for all rows; however,
non-legislator speakers (ministers, witnesses, etc.) will not match
entries in legislators.
Order of the speech turn within the meeting
Full text of the speech in Korean
This dataset contains the complete standing committee speech records (no sampling) for the Science and ICT Committee of the 22nd assembly (June-December 2024). Speeches shorter than 50 characters were excluded.
The role variable distinguishes legislators from government
officials, witnesses, and other participants. Filter to
role == "legislator" for MP speeches only, or compare how
legislators and ministers discuss the same agenda items.
This committee covers AI, telecommunications, broadcasting, space policy, and R&D governance, making it suitable for keyword analysis, topic modeling, and other text analysis exercises.
National Assembly committee minutes via the Open National Assembly Information API.
data(speeches)

# Distribution of speech lengths
hist(nchar(speeches$speech), breaks = 100,
     main = "Speech Length Distribution", xlab = "Characters")

# Speaker roles
table(speeches$role)

# Most frequent legislator speakers
leg <- speeches[speeches$role == "legislator", ]
head(sort(table(leg$speaker_name), decreasing = TRUE), 10)

# Simple keyword search (example: AI-related speeches)
ai <- speeches[grepl("AI", speeches$speech), ]
nrow(ai)
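To compare how often different kinds of speakers raise a topic, the keyword search can be combined with the `role` variable. The sketch below is a minimal illustration on made-up English sentences that mimic the `role` and `speech` columns; the real speeches are in Korean.

```r
# Toy stand-in for `speeches` (invented English text)
speeches_toy <- data.frame(
  role   = c("legislator", "legislator", "minister", "minister", "witness"),
  speech = c("AI regulation needs a clear framework",
             "The budget for space policy was cut",
             "We will publish the AI safety guidelines",
             "Our AI research funding increased",
             "I cannot recall the meeting"),
  stringsAsFactors = FALSE
)

# Flag speeches containing the literal string "AI" (case-sensitive)
speeches_toy$mentions_ai <- grepl("AI", speeches_toy$speech, fixed = TRUE)

# Share of speeches mentioning the keyword, by speaker role
ai_share <- tapply(speeches_toy$mentions_ai, speeches_toy$role, mean)
ai_share
```

For the Korean corpus, a pattern such as `"AI|인공지능"` (without `fixed = TRUE`) would also catch the Korean term for artificial intelligence.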
Bill-level vote tallies from plenary sessions of the 20th through 22nd Korean National Assembly (2016-2026). Each row represents one bill that went to a recorded floor vote.
votes
A data frame with 8,050 rows and 13 variables:
Unique bill identifier (links to bills$bill_id)
Numeric bill number
Full bill title in Korean
Assembly number (20, 21, or 22)
Standing committee to which the bill was referred
Date of the plenary vote
Vote outcome in Korean (e.g., passed as-is, passed with amendments, rejected)
Type of bill (e.g., legislation, budget, resolution)
Total number of assembly members at the time
Number of members who cast a vote
Number of yes votes
Number of no votes
Number of abstentions
Not all bills go to a floor vote. Most bills are disposed of in
committee or expire at the end of the assembly term. The votes
dataset captures only those that reached the plenary floor for a
recorded vote.
About 40\
because bills only contains legislator-proposed bills while
votes also includes committee alternatives, budget bills,
and resolutions that have separate identifiers.
See roll_calls for member-level voting records
(22nd assembly), useful for ideal point estimation or party
discipline analysis.
Open National Assembly Information API (Republic of Korea),
endpoint ncocpgfiaoituanbr.
data(votes)

# Votes per assembly
table(votes$assembly)

# Pass rate
table(votes$result)

# Average yes rate
votes$yes_rate <- votes$yes / votes$voted
summary(votes$yes_rate)

# Contentious votes (yes rate < 70%)
contentious <- votes[votes$yes / votes$voted < 0.7, ]
nrow(contentious)
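Because not every `bill_id` in `votes` has a counterpart in `bills`, it helps to check the match rate before joining and to choose the join type deliberately. The sketch below uses made-up identifiers and counts; the shares it produces say nothing about the real data.

```r
# Toy stand-ins for `bills` and `votes` (hypothetical identifiers and tallies)
bills_toy <- data.frame(
  bill_id  = c("PRC_A", "PRC_B", "PRC_C", "PRC_D"),
  assembly = c(21, 21, 22, 22),
  stringsAsFactors = FALSE
)
votes_toy <- data.frame(
  bill_id = c("PRC_A", "PRC_C", "PRC_D", "PRC_X"),
  yes     = c(180, 150, 200, 170),
  voted   = c(200, 250, 210, 180),
  stringsAsFactors = FALSE
)

# Which voted bills can be found in the bills table?
votes_toy$in_bills <- votes_toy$bill_id %in% bills_toy$bill_id
match_share <- mean(votes_toy$in_bills)
match_share

# Left join: keeps every vote; bill columns are NA where there is no match
merged <- merge(votes_toy, bills_toy, by = "bill_id", all.x = TRUE)

# Inner join: keeps only votes on bills present in both tables
matched <- merge(votes_toy, bills_toy, by = "bill_id")
```

A left join is the safer default when the analysis concerns all floor votes; an inner join silently drops committee alternatives, budget bills, and resolutions.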
Panel data of asset declarations for 773 Korean National Assembly members across 13 reporting periods (2015-2025). Derived from mandatory public disclosures via the OpenWatch project.
wealth
A data frame with 2,928 rows and 14 variables:
Legislator identifier (links to legislators$member_id)
Disclosure year (2015-2025)
Legislator name in Korean
Total declared assets, in thousands of KRW
Total declared liabilities, in thousands of KRW
Net worth (assets minus debt), in thousands of KRW
Total real estate value, in thousands of KRW
Total building/structure value, in thousands of KRW
Total land value, in thousands of KRW
Total bank deposits, in thousands of KRW
Total stock holdings, in thousands of KRW
Total number of properties disclosed
Logical: owns property in Seoul
Logical: owns property in Gangnam (Seoul's wealthiest district)
All monetary values are in thousands of KRW (1 unit = 1,000 won). To convert to billions of won, divide by 1,000,000. For example, a net_worth of 1,670,000 means 1.67 billion won (approximately USD 1.2 million).
Legislators are required by law to disclose their assets annually. Not all legislators appear in every year, as the panel is unbalanced (entries correspond to active service periods).
OpenWatch (https://docs.openwatch.kr/data/national-assembly), CC BY-SA 4.0 license.
data(wealth)

# Distribution of net worth (in billions of won)
hist(wealth$net_worth / 1e6, breaks = 50,
     main = "Legislator Net Worth", xlab = "Billion KRW")

# Real estate as share of total assets
wealth$re_share <- wealth$real_estate / wealth$total_assets
summary(wealth$re_share)

# Gangnam property owners vs others
tapply(wealth$net_worth / 1e6, wealth$has_gangnam_property,
       median, na.rm = TRUE)
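The unit conversion and the panel structure described above can be combined to compute within-legislator changes in net worth. The sketch below runs on a made-up panel with hypothetical IDs and assumes the `member_id`, `year`, and `net_worth` columns; the helper columns `net_worth_bn` and `change_bn` are introduced here for illustration.

```r
# Toy stand-in for `wealth` (hypothetical IDs; values in thousands of KRW)
wealth_toy <- data.frame(
  member_id = c("M1", "M1", "M1", "M2", "M2"),
  year      = c(2020, 2021, 2022, 2021, 2022),
  net_worth = c(1670000, 1800000, 1750000, 500000, 650000),
  stringsAsFactors = FALSE
)

# Thousands of KRW -> billions of KRW (divide by 1,000,000)
wealth_toy$net_worth_bn <- wealth_toy$net_worth / 1e6

# Sort by legislator and year so differences follow time order
wealth_toy <- wealth_toy[order(wealth_toy$member_id, wealth_toy$year), ]

# Change from the previous observed disclosure, within each legislator
wealth_toy$change_bn <- ave(
  wealth_toy$net_worth_bn, wealth_toy$member_id,
  FUN = function(x) c(NA, diff(x))
)
wealth_toy
```

Because the panel is unbalanced, `diff()` compares each row with the previous observed year, which is not necessarily the previous calendar year. Checking `diff(year)` within legislator is a reasonable safeguard.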