Assigning Nationality to Names Using Machine Learning to Differentiate Emigration from Return Migration of Scholars

Faeze Ghorbanpour, Ludwig Maximilian University of Munich, Munich
Thiago Malaguth, Max Planck Institute for Demographic Research (MPIDR)
Aliakbar Akbaritabar, Max Planck Institute for Demographic Research (MPIDR)

Most digital trace data do not include the nationality of individuals, for privacy reasons. When such data are used for migration research, they can suffer from left truncation, because the migrant’s country of origin is uncertain. Identifying nationality enables a better differentiation between emigration and return migration. We infer nationality from the minimal information available, full names, and use it instead of the country of academic origin when studying the migration of scholars. We gathered 2.6 million unique name-nationality pairs from Wikipedia and categorized them into families of nationalities at three levels of granularity. We used a character-based machine learning model that reached a weighted F1-score of 80% for the highest-level and 64% for the country-level categorization. We discuss the shifts in migration rates when the country of origin is assigned based on authors’ names rather than the previously used country of first academic affiliation. Our results show that this impact is more pronounced for countries of immigration that have a more diverse academic workforce, such as the USA, Australia, and Canada.


Presented in Session 9: Innovations with Internet and Consumer Data