Lexicostatistical data (raw and derived text files) on 200 basic words in each of 95 Indoeuropean languages as collected/collated by Professor Isidore Dyen circa 1960

Charles Darwin University
Dr Joseph Kruskal (Aggregated by); Dr Paul Black (Aggregated by); Professor Isidore Dyen (Aggregated by)
Date: 2013
Type: dataset
Language: English

Subjects: Isidore Dyen; Joseph B Kruskal; lexicostatistics; Indoeuropean languages; comparative linguistics; genetic linguistics; language evolution; linguistic classification; Indo-European languages; Lexicography; LANGUAGE, COMMUNICATION AND CULTURE; LINGUISTICS

Identifier: https://researchers.cdu.edu.au/en/datasets/lexicostatistical-data-raw-and-derived-text-files-on-200-basic-wo

Related resources:
https://researchspace.auckland.ac.nz/bitstream/handle/2292/10655/nature02029.pdf%3Fsequence%3D3
http://www.jstor.org/stable/1006517
http://trove.nla.gov.au/version/43282186
http://trove.nla.gov.au/version/12156786
http://trove.nla.gov.au/version/46695089

Access the data

Please use the contact information below to request access to this data.

Contact Information

Street Address:
School of Education, Charles Darwin University, Ellengowan Dr, Casuarina NT 0810



Licence & Rights:

CC BY-NC-SA: Attribution-Noncommercial-Share Alike 3.0 AU
http://creativecommons.org/licenses/by-nc-sa/3.0/au

Copyright held by Dr Paul Black (Prof Dyen and Dr Kruskal have both passed away.)

Access:

Open access. Soon to be available at CDU eSpace. Contact Dr Paul Black, Paul.Black@cdu.edu.au

Full description

This dataset formed the basis of the seminal 1992 work 'An Indoeuropean Classification: A Lexicostatistical Experiment' by Isidore Dyen, Joseph B Kruskal and Paul Black, which tested lexicostatistical methods against what was already known about the Indoeuropean languages from more traditional methods. The dataset comprises three descriptive documents and six raw and derived data text files. The descriptive documents provide background and publication information on the dataset. The text files contain all the data as collected and collated by Isidore Dyen circa 1960. One file holds the data that was placed on punched cards circa 1970 and transferred to disc circa 1990; it gives cognation data among 95 Indoeuropean speech varieties. For each of the 200 basic meanings, this file lists the forms Dyen collected from the 95 speech varieties, together with the cognation decisions he made among those forms circa 1970. The other files contain the statistical matrices derived from the raw data, which quantify similarities and differences between languages and were used to build a tree-like structure for the evolution of the Indoeuropean languages. A virtual copy of the data is available at wordgumbo: http://www.wordgumbo.com/ie/cmp
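
The step from cognation decisions to a similarity matrix, as described above, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the variety names (A, B, C), the meaning labels (m1 to m4) and the cognate-class numbers are invented toy values, not entries from the dataset, and the data layout does not reflect the format of the actual text files. The counting rule shown (share of meanings with matching cognate classes, skipping meanings where either form is missing) is one common way to compute a lexicostatistical percentage; any special handling used in the original study is not reproduced here.

```python
from itertools import combinations

# Toy cognate-class labels, invented for illustration only (NOT from the dataset).
# cognate_classes[meaning][variety] -> class label; None marks a missing form.
cognate_classes = {
    "m1": {"A": 1, "B": 1, "C": 2},
    "m2": {"A": 1, "B": 2, "C": 2},
    "m3": {"A": 1, "B": 1, "C": 1},
    "m4": {"A": 1, "B": 1, "C": None},
}


def shared_cognate_fraction(classes, x, y):
    """Fraction of meanings attested in both varieties whose forms share a cognate class."""
    shared = 0
    compared = 0
    for by_variety in classes.values():
        cx, cy = by_variety.get(x), by_variety.get(y)
        if cx is None or cy is None:
            continue  # skip meanings with a missing form in either variety
        compared += 1
        if cx == cy:
            shared += 1
    return shared / compared if compared else float("nan")


varieties = ["A", "B", "C"]

# Upper triangle of the pairwise similarity matrix, keyed by variety pair.
similarity = {
    (x, y): shared_cognate_fraction(cognate_classes, x, y)
    for x, y in combinations(varieties, 2)
}

for pair, value in similarity.items():
    print(pair, round(value, 3))
```

A matrix of this kind (or its complement, read as a distance) is the usual input to a clustering procedure that groups the most similar varieties first and so yields a tree.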

Data time period: 1960 to 1990

This dataset is part of a larger collection

text: Europe, Central Russian Eurasia, Indian Subcontinent, Iran, Afghanistan

dcmiPoint: name=EUROPE, FR (Europe/Paris); east=3.9833; north=43.5333; projection=WGS84

dcmiPoint: name=Caucasus Region [FNC PENDING April2000], GE (Asia/Tbilisi); east=45.0; north=42.0; projection=WGS84

dcmiPoint: name=Indian Subcontinent, IN (Asia/Kolkata); east=76.97021; north=22.20775; projection=WGS84

dcmiPoint: name=Pasgah-e Marzi-ye Afghanistan, AF (Asia/Kabul); east=60.9231; north=34.2892; projection=WGS84

dcmiPoint: name=United States, US (null); east=-98.5; north=39.76; projection=WGS84

dcmiPoint: name=Latin America and the Caribbean, BR (America/Rio_Branco); east=-72.59766; north=-8.05923; projection=WGS84

dcmiPoint: name=Australia and New Zealand, AU (Australia/Brisbane); east=143.26172; north=-27.83908; projection=WGS84
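
The dcmiPoint lines above follow a simple `key=value; key=value` pattern, so they can be read programmatically. The sketch below is a minimal Python illustration under that assumption; `parse_dcmi_point` is a hypothetical helper written for this record, not part of any library, and it only handles the fields that appear here (name, east, north, projection).

```python
def parse_dcmi_point(line):
    """Parse one 'dcmiPoint: key=value; key=value' line into a dict.

    Hypothetical helper for illustration; assumes values contain no ';' or '='.
    """
    prefix = "dcmiPoint:"
    body = line[len(prefix):] if line.startswith(prefix) else line
    fields = {}
    for part in body.split(";"):
        key, sep, value = part.partition("=")
        if sep:  # ignore fragments that have no '='
            fields[key.strip()] = value.strip()
    # Coordinates are decimal degrees; convert them to floats when present.
    for coord in ("east", "north"):
        if coord in fields:
            fields[coord] = float(fields[coord])
    return fields


point = parse_dcmi_point(
    "dcmiPoint: name=Caucasus Region [FNC PENDING April2000], GE (Asia/Tbilisi); "
    "east=45.0; north=42.0; projection=WGS84"
)
print(point["name"], point["east"], point["north"], point["projection"])
```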

Identifiers