Louise Françoise Contat and Wikipedia talk:Biographical metadata: Difference between pages

From Wikipedia, the free encyclopedia
redirect to duplicate article
 
→‎Gender: new section
 
== Inspiration ==
#REDIRECT [[Louise Contat]]

After following discussions about biographical articles for several years, and noting how inefficiently biographical metadata was dealt with, I finally started this summary following a discussion about "death by age". See [http://en.wikipedia.org/w/index.php?title=Wikipedia:Deletion_review/Log/2008_October_7&diff=prev&oldid=244021361 here] for one summary of my views. I'm now going to start a subpage in my userspace to list the 126 links for "death by age" using "what links here". [[User:Carcharoth|Carcharoth]] ([[User talk:Carcharoth|talk]]) 00:09, 9 October 2008 (UTC)
:Just a suggestion, but you could make that list a subpage of this one, unless you're frowning on others editing it? - [[User:Jc37|jc37]] 00:42, 9 October 2008 (UTC)
::Others can edit it. It is at [[User:Carcharoth/People who died aged XX]]. But I want it to be tidy first. Well, and I also want to keep a record in my userspace. Maybe I'll copy it over. [[User:Carcharoth|Carcharoth]] ([[User talk:Carcharoth|talk]]) 00:52, 9 October 2008 (UTC)

== Discussion on gender metadata ==

Please see the comment I made [[Wikipedia_talk:Categorization/Gender,_race_and_sexuality#Recording_the_gender_of_a_person|here]] about obtaining and recording gender (male/female) metadata. Opinions would be welcomed. Thanks. [[User:Carcharoth|Carcharoth]] ([[User talk:Carcharoth|talk]]) 02:05, 11 October 2008 (UTC)

== The value of numbers ==
You seem to be wanting a quantity, not a category?

(Statistics, without the actual need for grouping those which are part of the statistics.)

Is there a way to do this besides using the whatlinkshere workaround, or categories? - [[User:Jc37|jc37]] 02:46, 11 October 2008 (UTC)

:Sort of. I also want to tidy things up where they have got messy. Not separating the music group articles and "double" or "group" biographies from the "single" biographies was a big mistake. Would be good if there was a way to go back and tidy up 500,000 articles and label them as "male" or "female" and "single biography" and "group biography" (even separating out the "double biography" as that is relatively common). I think some of this needs the semantic what's-its-name thingy. In general, it seems that when people want large stats, they either go to a database dump, use an API query, run a bot (or request one), or scrabble around with categories and "what links here" (I think I listed those in decreasing order of technical know-how). I may have missed a few options. There are instructions somewhere on how to extract Persondata at [[Wikipedia:Persondata]], but I've never worked that out. Instructions here on how to extract biographical data would be good. There are also ways to count numbers of articles in various places using AWB as well, I think. See the stats on my talk page ([[User_talk:Carcharoth#People_by_birth.2Fdeath|here]]): ''"there are currently 561,596 pages categorized by birth or death; 522,416 categorized by birth; 236,557 categorized by death date, excluding those categorized as living people; 307,892 categorized as living people"''. You will note the numbers don't ''quite'' add up: 236,557 plus 307,892 is 544,449 (living and dead), against 522,416 (birth categories) and 561,596 (birth and death categories). This does depend on exactly how that analysis was done and how the "unknown" and "missing" categories were treated. Compare also with the 543,994 total [[Wikipedia:Version 1.0 Editorial Team/Biography articles by quality statistics|here]]. There are various old discussions as well that have some data. It is almost certain that there are thousands of inconsistencies here that could be cleaned up.
:See the lists that [[User:Dsp13]] produced at [[User:Dsp13/Living people needing categorization by year of birth]], [[User:Dsp13/People needing categorization by year of birth]], and [[User:Dsp13/People needing categorization as living or by year of death]]. [[User:Carcharoth|Carcharoth]] ([[User talk:Carcharoth|talk]]) 03:26, 11 October 2008 (UTC)

== That Polbot stuff ==

I think this definitely qualifies as biographical metadata stuff. In no particular order:
*[[User:Carcharoth/Polbot3 trial run]]
*[[User talk:Carcharoth/Polbot3 trial run]]
*[[Wikipedia:Bots/Requests for approval/Polbot 3]]
*[[User:Polbot/ideas/defaultsort]]
I think that's most of it. [[User:Carcharoth|Carcharoth]] ([[User talk:Carcharoth|talk]]) 03:31, 11 October 2008 (UTC)

== Random other discussions ==
Did I ever say there was a lot of this sort of stuff out there? ;-)
*[[Template talk:Persondata/Removing data]]
*Archives of [[Template talk:WPBiography]]
Please add more as found or needed. [[User:Carcharoth|Carcharoth]] ([[User talk:Carcharoth|talk]]) 03:31, 11 October 2008 (UTC)

== Gender ==

Are you interested in ... the question I posed about gender metadata [[Wikipedia_talk:WikiProject_Categories#Discussion_on_gender_metadata|here]]. I was surprised to see that [[Wikipedia:Persondata]] doesn't include gender. Do the microformats include it? [...] [[User:Carcharoth|Carcharoth]] ([[User talk:Carcharoth|talk]]) 02:19, 11 October 2008 (UTC)

:[The above copied from my talk page, in order that conversation may be centralised here]

:The [[hCard]] [[microformat]] does not include a gender attribute, though [http://microformats.org/wiki/vcard-suggestions#Gender I proposed one] some time ago. I have also proposed the addition of gender to the [[vCard]] specification on which hCard is based, and [http://www.vcarddav.org/issues.xhtml#158 it seems likely that that will be adopted], though such big wheels grind with terminal slowness. If a gender property is added to vCard, it is almost certain that hCard will do the same. In the interim, one microformat parser, Cognition, has [http://buzzword.org.uk/cognition/uf-plus.html#hcard adopted a gender property] on a trial basis, and there is no technical reason why that could not be included as such in infoboxes (or elsewhere on Wikipedia), where it would be parsed by Cognition (and other parsers which might follow suit) and simply ignored by other microformat parsers, ensuring backwards compatibility. Whether editors would want to see "gender" in infoboxes is another matter.

:Bear in mind that there is a statistically small edge case: people who are trans-gender, hermaphrodite, or some such. A third, "other", classification may suffice for these. [[User:Pigsonthewing|Andy Mabbett]] (User:Pigsonthewing); [[User talk:Pigsonthewing|Andy's talk]]; [[Special:Contributions/Pigsonthewing|Andy's edits]] 10:12, 11 October 2008 (UTC)

Revision as of 10:12, 11 October 2008

Inspiration

After following discussions about biographical articles for several years, and noting how inefficiently biographical metadata was dealt with, I finally started this summary following a discussion about "death by age". See here for one summary of my views. I'm now going to start a subpage in my userspace to list the 126 links for "death by age" using "what links here". Carcharoth (talk) 00:09, 9 October 2008 (UTC)

Just a suggestion, but you could make that list a subpage of this one, unless you're frowning on others editing it? - jc37 00:42, 9 October 2008 (UTC)
Others can edit it. It is at User:Carcharoth/People who died aged XX. But I want it to be tidy first. Well, and I also want to keep a record in my userspace. Maybe I'll copy it over. Carcharoth (talk) 00:52, 9 October 2008 (UTC)

Discussion on gender metadata

Please see the comment I made here about obtaining and recording gender (male/female) metadata. Opinions would be welcomed. Thanks. Carcharoth (talk) 02:05, 11 October 2008 (UTC)

The value of numbers

You seem to be wanting a quantity, not a category?

(Statistics, without the actual need for grouping those which are part of the statistics.)

Is there a way to do this besides using the whatlinkshere workaround, or categories? - jc37 02:46, 11 October 2008 (UTC)

Sort of. I also want to tidy things up where they have got messy. Not separating the music group articles and "double" or "group" biographies from the "single" biographies was a big mistake. Would be good if there was a way to go back and tidy up 500,000 articles and label them as "male" or "female" and "single biography" and "group biography" (even separating out the "double biography" as that is relatively common). I think some of this needs the semantic what's-its-name thingy. In general, it seems that when people want large stats, they either go to a database dump, use an API query, run a bot (or request one), or scrabble around with categories and "what links here" (I think I listed those in decreasing order of technical know-how). I may have missed a few options. There are instructions somewhere on how to extract Persondata at Wikipedia:Persondata, but I've never worked that out. Instructions here on how to extract biographical data would be good. There are also ways to count numbers of articles in various places using AWB as well, I think. See the stats on my talk page (here): "there are currently 561,596 pages categorized by birth or death; 522,416 categorized by birth; 236,557 categorized by death date, excluding those categorized as living people; 307,892 categorized as living people". You will note the numbers don't quite add up: 236,557 plus 307,892 is 544,449 (living and dead), against 522,416 (birth categories) and 561,596 (birth and death categories). This does depend on exactly how that analysis was done and how the "unknown" and "missing" categories were treated. Compare also with the 543,994 total here. There are various old discussions as well that have some data. It is almost certain that there are thousands of inconsistencies here that could be cleaned up.
See the lists that User:Dsp13 produced at User:Dsp13/Living people needing categorization by year of birth, User:Dsp13/People needing categorization by year of birth, and User:Dsp13/People needing categorization as living or by year of death. Carcharoth (talk) 03:26, 11 October 2008 (UTC)
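
The mismatch between the quoted counts can be made explicit with a short Python sketch. The figures are the ones quoted in the comment above; the variable and function names are illustrative only, and what each gap actually represents is an assumption, since it depends on how the "unknown" and "missing" categories were counted.

```python
# Figures as quoted in the comment above (October 2008 snapshot).
birth_or_death = 561_596    # pages categorized by birth or death
birth = 522_416             # pages categorized by birth
death_not_living = 236_557  # categorized by death date, excluding living people
living = 307_892            # categorized as living people
assessed_total = 543_994    # total quoted from the biography quality statistics page

# "Living and dead": the sum worked out in the comment.
living_plus_dead = death_not_living + living


def discrepancies():
    """Return the gaps between the quoted totals (names are hypothetical)."""
    return {
        # Pages with a birth or death category but counted as neither living nor dead.
        "birth_or_death_vs_living_plus_dead": birth_or_death - living_plus_dead,
        # Living-or-dead pages in excess of those with a birth category.
        "living_plus_dead_vs_birth": living_plus_dead - birth,
        # Difference from the separately quoted assessment total.
        "living_plus_dead_vs_assessed": living_plus_dead - assessed_total,
    }


if __name__ == "__main__":
    print("living + dead:", living_plus_dead)
    for name, gap in discrepancies().items():
        print(name, gap)
```

Each gap is only a lower bound on the number of inconsistently categorized pages, because pages missing from one count can offset pages missing from another.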

That Polbot stuff

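I think this definitely qualifies as biographical metadata stuff. In no particular order:

User:Carcharoth/Polbot3 trial run
User talk:Carcharoth/Polbot3 trial run
Wikipedia:Bots/Requests for approval/Polbot 3
User:Polbot/ideas/defaultsort

I think that's most of it. Carcharoth (talk) 03:31, 11 October 2008 (UTC)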

Random other discussions

Did I ever say there was a lot of this sort of stuff out there? ;-)

Template talk:Persondata/Removing data
Archives of Template talk:WPBiography

Please add more as found or needed. Carcharoth (talk) 03:31, 11 October 2008 (UTC)

Gender

Are you interested in ... the question I posed about gender metadata here. I was surprised to see that Wikipedia:Persondata doesn't include gender. Do the microformats include it? [...] Carcharoth (talk) 02:19, 11 October 2008 (UTC)

[The above copied from my talk page, in order that conversation may be centralised here]
The hCard microformat does not include a gender attribute, though I proposed one some time ago. I have also proposed the addition of gender to the vCard specification on which hCard is based, and it seems likely that that will be adopted, though such big wheels grind with terminal slowness. If a gender property is added to vCard, it is almost certain that hCard will do the same. In the interim, one microformat parser, Cognition, has adopted a gender property on a trial basis, and there is no technical reason why that could not be included as such in infoboxes (or elsewhere on Wikipedia), where it would be parsed by Cognition (and other parsers which might follow suit) and simply ignored by other microformat parsers, ensuring backwards compatibility. Whether editors would want to see "gender" in infoboxes is another matter.
Bear in mind that there is a statistically small edge case: people who are trans-gender, hermaphrodite, or some such. A third, "other", classification may suffice for these. Andy Mabbett (User:Pigsonthewing); Andy's talk; Andy's edits 10:12, 11 October 2008 (UTC)