I'm working on a new feature in the SmartCopy extension that will run a consistency & plausibility check automatically against the immediate family of a focus profile on Geni. It will then inject a message box into the Geni page when issues are detected. My intent is to enable it by default and it will not require Curator authorization before working. Here is my in-work example: https://media.geni.com/p13/84/2b/00/31/53444843794dbe13/screen_shot...
If you have ideas for rules (error, warn, info), let me know. :)
Um... some interesting features would be:
- Expand the consistency check to all profiles being followed
- Stores results and decisions (for example, discard some inconsistencies and being stored, so they do not popup again and again).
- Being able to visualize the complete list of inconsistencies
- Marriage takes place at age 14 or later (just in case...)
- All places recorded match standard format.
- All dates are recorded correctly (sometimes dates are not correctly stored... I found this)
- Lexical consistency of names (this can be complicated)
Jeff,
Please keep in mind that it takes about 9 months for a child to be born, so a father could die before the child is born.
Some other checks:
Child is born after mother has died
2 children within 9 months
Child is born before or after ending of marriage
Child is born before mother is 14 years old
Child is born when mother is older than .. (45?)
Child is born before father is 12 years old
Multiple partners at the same time
Mother has more than .. (20?) children
Age greater than 115
Enough overlap in age for marriage?
Marriage age probably should be time/region dependent
Marriage between near family (may be more likely for royalty)
Little more complicated (probably also time/region dependent):
Not enough / too many generations without dates between profiles that have dates
Noticing extreme cases may be possible without looking at time/region
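A few of the date rules above can be sketched in code to show how the pregnancy term fits in. This is only an illustration in Python, not SmartCopy's actual code: the `Person` class, the function names, the 280-day gestation constant, and the two-day allowance for twins are all assumptions, and the age limits are the tentative values suggested in this thread.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

GESTATION = timedelta(days=280)  # assumed stand-in for "about 9 months"


@dataclass
class Person:  # hypothetical minimal profile, not Geni's data model
    name: str
    birth: Optional[date] = None
    death: Optional[date] = None


def years_between(start: date, end: date) -> int:
    """Whole years elapsed from start to end."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def check_child(child: Person, mother: Optional[Person] = None,
                father: Optional[Person] = None, min_mother_age: int = 14,
                max_mother_age: int = 45, min_father_age: int = 12) -> List[str]:
    """Return issue messages for one child; missing dates are skipped."""
    issues: List[str] = []
    if child.birth is None:
        return issues
    if mother is not None:
        if mother.death and child.birth > mother.death:
            issues.append("error: child born after mother's death")
        if mother.birth:
            age = years_between(mother.birth, child.birth)
            if age < min_mother_age:
                issues.append("warn: mother younger than %d" % min_mother_age)
            elif age > max_mother_age:
                issues.append("warn: mother older than %d" % max_mother_age)
    if father is not None:
        # A father may die during the pregnancy, so allow the gestation period.
        if father.death and child.birth > father.death + GESTATION:
            issues.append("error: child born too long after father's death")
        if father.birth and years_between(father.birth, child.birth) < min_father_age:
            issues.append("warn: father younger than %d" % min_father_age)
    return issues


def check_sibling_spacing(children: List[Person]) -> List[str]:
    """Flag consecutive births closer than the gestation period."""
    dated = sorted((c for c in children if c.birth), key=lambda c: c.birth)
    issues: List[str] = []
    for older, younger in zip(dated, dated[1:]):
        gap = younger.birth - older.birth
        # Gaps of up to two days are treated as twins (an assumed allowance).
        if timedelta(days=2) < gap < GESTATION:
            issues.append("warn: %s and %s born within 9 months"
                          % (older.name, younger.name))
    return issues
```

With these assumptions, a child born five months after the father's death passes, while one born two years after it is flagged.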
Dan also mentioned it as a possible separate extension, and maybe it will become one if it grows into something large enough to be independent, but for now I'm planning to just have it as part of SmartCopy.
1) SmartCopy has 944 weekly users, according to Google. That's a large user base that would make instant use of consistency checks on Geni, instead of starting over and rebuilding a separate user base.
2) It's easier to develop because I already have the core code in SmartCopy.
3) Maybe the consistency features will bring in new users that will utilize the main features of the extension to build the world tree.
Thanks Job - My in-work code is considering the pregnancy term and I'll work to include those additional rules.
Valentín, at this time, I think it's only feasible for this tool to consider the immediate family of a profile. Scanning multiple generations would involve too many queries and be too time consuming for something that is intended to be dynamically integrated into the webpage. Scanning all followed profiles would need something that could be run offline. I also think it would be infeasible to "Stores results and decisions (for example, discard some inconsistencies and being stored, so they do not popup again and again)." I don't have the storage and I don't think it would be good to store it in the browser storage.
The following are the consistency checks performed by MyHeritage Family Tree Builder.
The missing numbers are conditions that were not encountered during the check of my Geni extraction.
03 – Alive but too old
05 – Parent too young
06 – Parent too old
07 – Child born after death of parent
08 – Fact occurring after death
09 – Fact occurring before birth
12 – Siblings age (too close)
13 – Siblings with same first name
15 – Descendant – Ancestor age mismatch
17 – Large spouse age difference
19 – Married too young
20 – Too young to be spouse
21 – Inconsistent placename spelling
22 – Place name resembles date
25 – Inconsistent last name spelling
28 – Maiden name similar to married name
Maiden name 'Falvey' of Mary Falvey is similar to the last name of her spouse Thomas Falvey
29 – Double spaces in name
32 – Suffix in first name
First name of Gladys I O'Connor ends with the suffix 'I', which should be moved to the separate Suffix field
34 – Alias in first name
First name of Mervyn "Ned" Ellem includes "Ned", which should be moved to the separate Alias field
36 – Siblings have different last names
37 – Incorrect use of uppercase/lowercase
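Several of the name checks in this list (29, 32, 34, 37) are simple string tests. Here is a rough Python sketch of how they might look. It is not how Family Tree Builder or SmartCopy implements them; the function name, the suffix list, and the "all upper or all lower case" interpretation of check 37 are assumptions made for illustration.

```python
import re
from typing import List

# Assumed suffix list; a real tool would likely use a longer one.
SUFFIXES = {"Jr", "Jr.", "Sr", "Sr.", "I", "II", "III", "IV"}


def check_first_name(first_name: str) -> List[str]:
    """Return the names of the checks that a first-name field trips."""
    issues: List[str] = []
    if "  " in first_name:
        issues.append("double spaces in name")
    # A quoted nickname such as "Ned" belongs in a separate alias field.
    if re.search(r'"[^"]+"', first_name):
        issues.append("alias in first name")
    parts = first_name.split()
    if len(parts) > 1 and parts[-1] in SUFFIXES:
        issues.append("suffix in first name")
    # Treat a name written entirely in one case as a casing issue.
    if len(first_name) > 1 and (first_name.isupper() or first_name.islower()):
        issues.append("incorrect use of uppercase/lowercase")
    return issues
```

For the two examples quoted above, `check_first_name('Mervyn "Ned"')` reports the alias issue and `check_first_name("Gladys I")` reports the suffix issue.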
Private User, you can store the results locally in the user's browser (I think the API is localStorage). The user should be aware that the stored decisions only apply to one browser on one computer. With a bit of care, they can copy/paste the result.
Or... you can offer the possibility to copy/paste between browsers/computers
I placed an option to close the check with an "X" on the right. In my current version, that just hides it until you refresh or update the profile. As Valentin mentioned, I can use local storage in the browser, but this storage is limited (usually 5-10 MB) and could impact performance.
I'm wondering if I should make the close more persistent, so that when you click "X", it will no longer run the consistency check on those profiles. I guess the problem with this, though, is that if you want to undo that, I have to create some ability to clear it.
What are your thoughts on the temporary or persistent closing of the checks for profiles that indicate issues?
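To make the trade-off concrete, here is a small Python sketch of a persistent per-profile "close" that can also be undone, cleared, and copied between browsers as text. It only models the logic: in the extension the JSON string would live in browser storage, and the class and method names here are hypothetical.

```python
import json
from typing import Set


class DismissalStore:
    """Set of dismissed profile ids, kept as one JSON string.

    The string is what would be saved under a single storage key,
    and also what a user could copy/paste to another browser.
    """

    def __init__(self, serialized: str = "[]") -> None:
        self._dismissed: Set[str] = set(json.loads(serialized))

    def dismiss(self, profile_id: str) -> None:
        """Clicking the "X": stop checking this profile."""
        self._dismissed.add(profile_id)

    def restore(self, profile_id: str) -> None:
        """Undo a single dismissal."""
        self._dismissed.discard(profile_id)

    def clear(self) -> None:
        """Forget every dismissal."""
        self._dismissed.clear()

    def should_check(self, profile_id: str) -> bool:
        return profile_id not in self._dismissed

    def export(self) -> str:
        """Serialize for saving or for copy/paste."""
        return json.dumps(sorted(self._dismissed))
```

Storing only profile ids keeps the saved string small, which matters given the storage limit mentioned above.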
Would these be over-rideable (that's not a word but you must know what I mean)?
For example, previous owners of my house visited me and mentioned that their mother's maiden name was the same as her married name although, to their knowledge, the two families were not related and came from different parts of the country.
Historically, there are examples of marriages of, and indeed births to, 12-year-old girls.
Large spouse age differences will also show considerable variability.
Terry Jackson (Switzer), I am considering making the values something you can change, so that you could decrease the age warning from 105 to 100, or increase it to 115. I'm not sure that will be available in the first release, but I'm keeping it in mind. It's more a matter of reorganizing the SmartCopy configuration so that you're not lost in various options and can easily distinguish options for copying data from those that control consistency. Initially though, I'd like to figure out some good default values.
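As an illustration of defaults plus user overrides, here is a minimal Python sketch. The 105-year default comes from the post above and the other numbers are the tentative values suggested earlier in the thread; the `CheckSettings` class and its field names are hypothetical and not SmartCopy's real configuration.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class CheckSettings:
    """Default thresholds; every value is open to discussion."""
    max_age: int = 105
    min_marriage_age: int = 14
    min_mother_age: int = 14
    max_mother_age: int = 45
    min_father_age: int = 12


def with_overrides(base: CheckSettings, **overrides: int) -> CheckSettings:
    """Return a copy of base with user-chosen values applied."""
    known = {f.name for f in fields(base)}
    for key, value in overrides.items():
        if key not in known:
            raise KeyError("unknown setting: %s" % key)
        if not isinstance(value, int) or value < 0:
            raise ValueError("%s must be a non-negative integer" % key)
    return replace(base, **overrides)


def age_warning(age_at_death: int, settings: CheckSettings) -> bool:
    """True when the age exceeds the configured warning threshold."""
    return age_at_death > settings.max_age
```

Keeping the consistency thresholds in their own structure is one way to separate them from the options that control copying data.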
A further extract produced more checks from FTB...
01 - Birth after death
02 - Died too old
04 - Child older than parent
10 - Death date resembles cause of death
11 - Death place resembles cause of death
14 - Descendant older than ancestor
16 - Ancestor of himself
If using FTB ensure you use version 7. Version 8 is full of bugs and is not compatible with Geni.
Thinking a bit about the UI, if embedded in the SmartCopy extension: (which, I agree, may be a way to 'pull in' more active Tree Builders as well as creating a whole new set of "Tree Quality Auditors") ....
If there are two 'tabs' (Smart Copy vs. Consistency Checks) in the configuration panel, maybe have a top-level 'disable' tick-box for the consistency checks so it can be disabled if one temporarily is having performance issues in accessing the "SmartCopy Host".
Probably something should change on the 'icon' when the consistency checking is enabled / disabled so one doesn't forget they turned it off the day before because of performance issues.
re: persistent "closing" of consistency checking.
That's a tough issue, I think.
-- in the long run, it should be part of the Geni database, similar to a data conflict ("yes, the age is over 125, but that's known based on the best references") ... and it would probably work best as an "acknowledgement" of the potential issue (e.g. "Spouse age difference > X years").
-- barring that for the present time, maybe it should be a user-config option to store "close/acknowledge" info locally? Thus, if I'm on a brain-dead (RAM limited) machine, I might turn off the local persistence, but on another machine I'll turn it on. (Turning off, then back on, could act to "clear it".)
-- version 1 of the over-ride/inhibit probably should start at just the per-profile level, not all the detail items for each profile!
-- the 'possible issues' injection should work on Profile as well as Tree Views, I'd hope.
Hmmm ... for some "reference" pages, it'd be really nice to "flag" some inconsistencies on the Smart-Copy panel itself, so one can perhaps get a better sense of whether the "data-about-to-be-copied" might degrade the quality of the Geni profile(s).
-- these might include items such as "will reduce the detail of dates", "fewer location details fields", etc.
-- Maybe, in the SmartCopy panel, the textual description of the 'potential issue' could be in a "tool-tip" kind of hover popup?
(more later on dates ... got to think about that a bit more ...)
I don't expect consistency checks will cause performance issues with the site as it runs asynchronously. If the SmartCopy server is having issues, you just won't see any warnings. It's not a persistent message box. It only becomes visible when an issue is identified in the family.
"the 'possible issues' injection should work on Profile as well as Tree Views, I'd hope." - Yup it does.
The list of checks is already extensive enough that if we run all of them, some of them will be triggered on (I think) at least one out of 50 profiles (for example siblings with different birth names).
This means that we definitely need a way to say "no, this is OK, I've checked the weird stuff, and I don't want to be warned any more".
What's more, we need a way to tell *others* that "this is OK, don't worry". Which means server-side storage.
Checks that rarely trigger falsely (age > 125) are fine to run all the time. But we might want to delay tests that trigger frequently (siblings with differing last names) until we have server-side storage of "this is checked, don't worry" flags.
Along the lines of Harald's comments:
I was thinking that the "configuration options" might have a 'radio-button' selection of when-to-run:
-- "run always",
-- "never run",
-- and a third option of either "run delayed" or "run on demand".
The "run on demand" could be akin to the SmartCopy "Submit", in that it runs / displays the results of those evaluations when the user 'clicks' something. That group could (by default) include those items where it's not unusual to have "false positives", whereas the "run always" group could trigger the HTML insertion banner / icon change.
To add an additional layer of complexity (<smile> it's what I do, sometimes, before simplifying!) ...
We'd discussed having user-modifiable customization of the checks (e.g.: age > ZZ years) ... it could even be useful to have two or even three "columns" of those customizing values, where the selection of the appropriate "column" is done based on the birth/death year of the focus profile. In that way, one could have rather "tight" checks for, say post 1700's, somewhat looser checks between then and 600 CE, and even looser checks pre-600's (with the user being able to change those "selection years", of course).
And ... ... the associated 'when-to-run' would also be in each "column".
That way I could have a rule which is always evaluated in the modern era, but that same rule might never be run in "Biblical times" (even when I click the "run-extra checks" button).
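The "columns" idea can be sketched as a small lookup table keyed by the focus profile's year. This Python sketch is only an illustration under assumptions: the `Column` class, the function names, and the per-era age limits are placeholders, and only the 1700 and 600 CE split years and the three run modes come from the posts above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class RunMode(Enum):
    ALWAYS = "run always"
    ON_DEMAND = "run on demand"
    NEVER = "never run"


@dataclass(frozen=True)
class Column:
    label: str
    from_year: Optional[int]  # inclusive lower bound; None means open-ended
    max_age: int  # placeholder threshold for this era
    sibling_surname_mode: RunMode  # when to run "siblings have different last names"


# Ordered from oldest to newest; the split years would be user-editable.
COLUMNS: List[Column] = [
    Column("before 600", None, 125, RunMode.NEVER),
    Column("600 to 1699", 600, 115, RunMode.ON_DEMAND),
    Column("1700 onward", 1700, 105, RunMode.ALWAYS),
]


def pick_column(year: int, columns: List[Column] = COLUMNS) -> Column:
    """Choose the newest column whose lower bound the year reaches."""
    chosen = columns[0]
    for column in columns:
        if column.from_year is not None and year >= column.from_year:
            chosen = column
    return chosen


def should_run(mode: RunMode, extra_checks_clicked: bool) -> bool:
    """Decide whether a rule is evaluated for this page view."""
    if mode is RunMode.ALWAYS:
        return True
    if mode is RunMode.ON_DEMAND:
        return extra_checks_clicked
    return False
```

With this table, the sibling surname rule always runs for a profile born in 1850, runs only on demand for one born in 1200, and never runs for one born before 600, even when the extra checks are requested.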