MaNIS Georeferencing Discussion
Archive
Following are extracts of the Georeferencing Listserv discussions accumulated during the MaNIS georeferencing project. Missing postings were not relevant to georeferencing in perpetuity. Messages have been edited to protect the guilty by masking names of individuals with XXXXXX.
>>> Posting number 1, dated 17 Jul 1999 14:12:50
-----------------------------------------------------------------------------
>>> Posting number 2, dated 17 Jul 1999 14:15:23
-----------------------------------------------------------------------------
>>> Posting number 3, dated 17 Jul 1999 14:16:03
-----------------------------------------------------------------------------
>>> Posting number 4, dated 17 Jul 1999 14:19:25
-----------------------------------------------------------------------------
>>> Posting number 5, dated 17 Jul 1999 14:19:59
-----------------------------------------------------------------------------
>>> Posting number 6, dated 17 Jul 1999 14:26:41
-----------------------------------------------------------------------------
>>> Posting number 7, dated 17 Jul 1999 14:22:50
-----------------------------------------------------------------------------
>>> Posting number 8, dated 17 Jul 1999 14:23:12
-----------------------------------------------------------------------------
>>> Posting number 9, dated 19 Jul 1999 09:29:01
-----------------------------------------------------------------------------
>>> Posting number 10, dated 23 Jul 1999 16:35:41
>>> Posting number 11, dated 3 Sep 1999 16:17:55
>>> Posting number 12, dated 17 Sep 1999 15:19:38
>>> Posting number 13, dated 17 Sep 1999 13:13:14
>>> Posting number 14, dated 17 Sep 1999 14:57:30
>>> Posting number 15, dated 20 Sep 1999 09:04:17
>>> Posting number 16, dated 24 Sep 1999 17:01:21
>>> Posting number 17, dated 28 Sep 1999 12:50:27
>>> Posting number 18, dated 15 Oct 1999 19:37:37
>>> Posting number 19, dated 17 Oct 1999 16:37:27
>>> Posting number 20, dated 18 Oct 1999 16:50:30
>>> Posting number 21, dated 19 Oct 1999 11:15:26
>>> Posting number 22, dated 19 Oct 1999 16:35:19
>>> Posting number 23, dated 20 Oct 1999 15:51:18
>>> Posting number 24, dated 20 Oct 1999 11:34:55
>>> Posting number 25, dated 20 Oct 1999 16:00:18
>>> Posting number 26, dated 10 Nov 1999 10:52:01
>>> Posting number 27, dated 10 Nov 1999 13:54:04
>>> Posting number 28, dated 17 Nov 1999 15:12:19
>>> Posting number 29, dated 18 Nov 1999 12:38:15
>>> Posting number 30, dated 18 Nov 1999 10:08:56
>>> Posting number 31, dated 18 Nov 1999 13:22:25
>>> Posting number 32, dated 19 Nov 1999 14:35:52
>>> Posting number 33, dated 3 Dec 1999 10:21:24
>>> Posting number 34, dated 3 Jan 2000 11:48:10
>>> Posting number 35, dated 3 Jan 2000 16:24:25
>>> Posting number 36, dated 18 May 2000 16:51:23
>>> Posting number 37, dated 18 May 2000 19:49:29
>>> Posting number 38, dated 23 May 2000 18:41:45
>>> Posting number 39, dated 24 May 2000 09:38:19
-----------------------------------------------------------------------------
>>> Posting number 40, dated 24 May 2000 12:15:39
>>> Posting number 41, dated 12 Jun 2000 15:45:50
>>> Posting number 42, dated 13 Jun 2000 09:31:26
>>> Posting number 43, dated 13 Jun 2000 09:59:02
>>> Posting number 44, dated 13 Jun 2000 09:17:08
>>> Posting number 45, dated 13 Jun 2000 07:49:43
>>> Posting number 46, dated 13 Jun 2000 09:04:22
>>> Posting number 47, dated 13 Jun 2000 08:54:22
>>> Posting number 48, dated 13 Jun 2000 11:11:31
>>> Posting number 49, dated 13 Jun 2000 13:23:46
>>> Posting number 50, dated 30 Jun 2000 16:25:38
>>> Posting number 51, dated 30 Jun 2000 17:14:31
>>> Posting number 52, dated 30 Jun 2000 23:29:35
>>> Posting number 53, dated 1 Jul 2000 07:35:15
>>> Posting number 54, dated 4 Jul 2000 11:04:23
>>> Posting number 55, dated 4 Jul 2000 10:07:33
>>> Posting number 56, dated 6 Jul 2000 00:00:0/
>>> Posting number 57, dated 5 Jul 2000 19:40:11
>>> Posting number 58, dated 5 Aug 2000 09:24:55
>>> Posting number 59, dated 5 Aug 2000 12:31:07
>>> Posting number 60, dated 7 Aug 2000 13:45:33
>>> Posting number 61, dated 15 Aug 2000 21:54:23
>>> Posting number 62, dated 23 Aug 2000 16:24:48
>>> Posting number 63, dated 30 Aug 2000 11:20:17
>>> Posting number 64, dated 22 Sep 2000 09:36:34
>>> Posting number 65, dated 29 Sep 2000 08:51:23
>>> Posting number 66, dated 2 Oct 2000 10:35:12
>>> Posting number 67, dated 5 Oct 2000 09:40:24
>>> Posting number 68, dated 17 Oct 2000 18:13:33
>>> Posting number 69, dated 1 Nov 2000 07:48:24
>>> Posting number 70, dated 1 Nov 2000 08:06:24
>>> Posting number 71, dated 28 Nov 2000 18:26:18
>>> Posting number 72, dated 29 Nov 2000 21:09:35
>>> Posting number 73, dated 30 Nov 2000 08:31:10
>>> Posting number 74, dated 30 Nov 2000 11:33:07
>>> Posting number 75, dated 14 Dec 2000 20:41:28
>>> Posting number 76, dated 15 Dec 2000 07:59:04
>>> Posting number 77, dated 26 Apr 2001 09:00:01
>>> Posting number 78, dated 16 May 2001 18:29:45
>>> Posting number 79, dated 16 May 2001 17:36:59
>>> Posting number 80, dated 18 May 2001 08:29:49
>>> Posting number 81, dated 24 May 2001 10:19:20
>>> Posting number 82, dated 25 May 2001 09:43:37
>>> Posting number 83, dated 11 Jun 2001 12:01:03
>>> Posting number 84, dated 11 Jun 2001 15:02:51
>>> Posting number 85, dated 11 Jun 2001 15:44:56
>>> Posting number 86, dated 29 Jun 2001 21:12:37
>>> Posting number 87, dated 4 Jul 2001 14:24:24
Date: Wed, 4 Jul 2001 14:24:24 -0700
Reply-To: "Mammalogy Z39.50 Network (Private)" <MAMMAL-Z-NET@USOBI.ORG>
Sender: "Mammalogy Z39.50 Network (Private)" <MAMMAL-Z-NET@USOBI.ORG>
From: John Wieczorek <tuco@SOCRATES.BERKELEY.EDU>
Subject: Re: ROM higher geography
In-Reply-To: <sb433743.076@romfs7.rom.on.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
I'm posting the following exchange to the list because there is information
contained herein that is relevant to everyone. The basic concepts of data
cleanliness, the gazetteer, and data updates are addressed in brief.
>Once I began working on the Bukedi inconsistency (2nd in your list) I saw
>that your methodology is missing many more errors/inconsistencies that
>exist in County and Province data.
Understood. My analysis reveals only the duplicates of
ORCT+ORCRY+ORPR+ORCY
I understand that there may be many other errors and inconsistencies in the
original data, but that is not a concern for the gazetteer. In fact, the
duplicates I pointed out aren't a problem either. I just wanted to alert
you to them since they came out in my analysis.
> The errors and inconsistencies are a direct reflection of the state of
> documentation on field catalogues or specimen cards, depending on the
> source of the automated record. We did not have the resources at the
> time of automation (nor do we now for that matter) to resolve what is a
> "Province" term and what is a "County" term for all
> countries. Additionally, we are looking at historical data that may no
> longer be reflected in the current political reality of our little world
> (e.g.,
> are used routinely to manage the collection and retrieve data. Continent
> and Country should be clean. The Province field should be clean for
>
> just finished cleaning up the Province field for
> County field should be clean for
> frequency listings for Country etc. for these priority sections of the db
> (and collection) in an effort to maintain the consistency of our
> data. For all other geographic locations, Province and County are not
> used for managing the collection, so the data clean up or enhancement has
> been a low priority. This is an ongoing situation that I have discussed
> with Judith with regard to the Manis Project. My understanding is that
> funding for documentational and staffing resources will be part of this
> "mission". I am afraid your listing of 13 inconsistencies barely
> scratches the surface of the data cleaning that is required and even more
> importantly, misses all kinds of erroneous or missing data. I currently
> do not have the maps, atlases, or gazetteers nor the staff/time to
> undertake this project which from a collections' perspective is of low
> priority. To do a proper job I cannot resolve all of the problems that
> you have identified without undertaking a full review of the entire
> country's data.
There is no requirement for any standard of cleanliness. It is my hope that
errors and inconsistencies will be noted during georeferencing and
forwarded to the attention of the institutions as a part of that
process. The tools are meant to identify the inconsistencies, not to
remedy them. What the institutions do with these notes is entirely up to them.
>I am not sure what you are currently attempting to do with the data so we
>may need to further discuss our respective needs to ensure that we are not
>working at cross purposes. If work is to be globally undertaken, I would
>like our data to be the db of record - making long lists of changes for
>you to then repeat is a waste of effort and time; you will see the work
>generated by having two dbs of record by the simple changes that I have
>made this afternoon. Also, errors in interpretation or typos that are
>bound to occur should be avoided. Finally, the data you have is already
>out of date, since changes are made by me on a daily basis as errors etc.
>are encountered during the normal activities of managing the collection,
>fulfilling data requests, etc.
The institutional databases will always be the database of record. The
data I have from all of the institutions is just a snapshot, to be used for
georeferencing. I will not ask for these data again during the project, nor
will I make changes to the data I have received. When we have a network,
the gazetteer will be created and updated automatically whenever data
change and the snapshot will be obsolete. I've only created the snapshot
so that we have combined data to work with. When people begin to do
georeferencing using the gazetteer they will not change the data - they
will only make commentaries. Even the latitude and longitude are
commentaries in a sense. It is up to each institution to accept or reject
the commentaries and make changes based on them in its database.
>Regards,
>
> >>> John Wieczorek <tuco@socrates.Berkeley.EDU> 07/02/01 08:50PM >>>
>Attached is a tab-delimited file with the first row containing column
>headings. The contents of the file are combinations of higher geographic
>fields for which you have more than one interpretation in your
>database. The first field (highergeog) is a concatenation of the fields of
>higher geography that reveal duplication. The second field (geogid) is an
>identifier unique to the ROM higher geography data with one row for every
>unique combination of ORCT, ORCRY, ORPR, and ORCY. As you can see by the
>rows in the table, there are 13 places for which there are inconsistent
>placements of county vs. province, for example. It is not critical for my
>purposes to have these resolved, but since I noticed them I thought I might
>as well tell you. If you do make changes to these combinations, let me
>know which are correct and I'll do so on this end as well.
>>> Posting number 88, dated 10 Jul 2001 12:01:24
Date: Tue, 10 Jul 2001 12:01:24 -0700
Reply-To: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
Sender: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
From:
Subject: cave localities
Mime-version: 1.0
Content-type: text/plain; charset="US-ASCII"
Content-transfer-encoding: 7bit
I've noticed that the USGS GNIS web site does not give information on cave
sites. (It does give locations of variants such as Boulder Cave
Campground.) Is this a protocol we wish to follow? Are there other web
sites that do list cave localities? What do you think?
Cheers,
>>> Posting number 89, dated 10 Jul 2001 13:40:25
Date: Tue, 10 Jul 2001 13:40:25 -0700
Reply-To: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
Sender: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
From: John Wieczorek <tuco@SOCRATES.BERKELEY.EDU>
Subject: Filtering data
In-Reply-To: <sb4b0d4a.070@romfs7.rom.on.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
This message is in reply to a comment about
records for captive animals.
>I would recommend that you do not use any captive records for a
>gazetteer. Does that make sense?
In a restricted view of the utility of a gazetteer it does make sense to
exclude them. However, it is actually easier to include them, yet have them
flagged. This has the benefit that one can filter on the captive attribute.
This could be useful if you wanted to do a quick query of only captive
animals as well as for a query in which you want to leave them out. The
philosophy in general will be to have a home for all data that anyone deems
useful, yet to allow each institution to decide which data it will provide
through the filters implemented during migration.
A filter might do any one of the following:
1) exclude attributes altogether (e.g., not show a "CaptiveFlag" field)
2) exclude records based on the value of an attribute (e.g., not show
records of endangered species)
3) exclude certain values of an attribute (e.g., not show localities for
endangered species)
4) substitute a surrogate value for an attribute of a certain value (e.g.,
instead of showing locality with lat-long, show only county-level and
higher geography for endangered species)
These are just a few examples of what might be done at one institution, and
may vary between institutions. I encourage the participants to discuss
these issues, and to begin to make institutional decisions about filtering
rules when it comes time to set up the migration. The rules must be
clearly defined before I begin to create the creation scripts - I can't
afford to stay at any given institution (except maybe Hawaii, heh heh),
while the rules are being hashed out.
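The four kinds of filter listed above could be sketched roughly as follows. This is only an illustration, not the migration code: the record layout and the field names (CaptiveFlag, Endangered, Locality, County) are hypothetical, and each institution would define its own rules.

```python
# A minimal sketch of the four filter types, assuming each record is a
# plain dict. All field names used here are hypothetical.

def drop_attribute(rec, name):
    """Filter 1: exclude an attribute altogether."""
    return {k: v for k, v in rec.items() if k != name}

def drop_record(rec, name, value):
    """Filter 2: exclude the whole record when an attribute has a given value."""
    return None if rec.get(name) == value else rec

def blank_value(rec, target, when_name, when_value):
    """Filter 3: hide the value of one attribute when a condition holds."""
    if rec.get(when_name) == when_value:
        return {**rec, target: None}
    return rec

def substitute(rec, target, surrogate_from, when_name, when_value):
    """Filter 4: replace a value with a coarser surrogate when a condition holds."""
    if rec.get(when_name) == when_value:
        return {**rec, target: rec.get(surrogate_from)}
    return rec

record = {
    "CaptiveFlag": True,
    "Endangered": True,
    "Locality": "10 mi E of Bakersfield",
    "County": "Kern",
}
public_view = substitute(record, "Locality", "County", "Endangered", True)
```

Each function returns a new record and leaves the original untouched, so the institutional database stays the database of record.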
>>> Posting number 90, dated 8 Aug 2001 13:10:05
>>> Posting number 91, dated 14 Sep 2001 08:48:17
>>> Posting number 92, dated 23 Sep 2001 17:24:24
>>> Posting number 93, dated 24 Sep 2001 20:07:31
Date: Mon, 24 Sep 2001 20:07:31 -0700
Reply-To: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
Sender: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
From: John Wieczorek <tuco@SOCRATES.BERKELEY.EDU>
Subject: Georeferencing Guidelines
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Dear All,
Now that we are officially up and running I would like to provide the first
of two documents on the MaNIS collaborative georeferencing effort. This
first document is meant to open for discussion the issues associated with
turning specific locality descriptions into well-documented latitudes and
longitudes. The document does not explain what tools to use, or how to use
any of them - that will be in a forthcoming document. Instead, this
document focuses on the "theoretical aspects" of the task, our methods and
assumptions, upon which it would be helpful for us all to agree. To that
end, please read the Georeferencing Guidelines page, accessible from the
Documents page on the MaNIS website (see below). Comment by sending
messages to MAMMAL-Z-NET@USOBI.ORG. Let's try to get through this
discussion by 6 Oct.
http://dlp.cs.berkeley.edu/manis/Documents.html
Anticipating your enthusiastic participation,
John Wieczorek
>>> Posting number 94, dated 25 Sep 2001 18:30:16
Date: Tue, 25 Sep 2001 18:30:16 -0700
Reply-To: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
Sender: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
From: John Wieczorek <tuco@SOCRATES.BERKELEY.EDU>
Subject: Georeferencing text, for reference
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Dear All,
It was pointed out to me that it might be prudent to have a text-only copy
of the document, with line numbers, to which everyone can refer in
discussions. I am including the full text of the GeorefGuide.html file
below for that purpose. The page itself can be found at the following URL:
http://dlp.cs.berkeley.edu/manis/GeorefGuide.html
1 MaNIS
2 The Mammal Networked Information System
3
4 John Wieczorek
5 24 September 2001
6 _________________________________________________
7
8 Georeferencing Guidelines
9
10 This document contains information about assigning geographic
11 coordinates and maximum errors for those coordinates to specific
12 locality descriptions. This document does not attempt to
13 describe the tools and methods for finding named places on maps
14 or gazetteers. The process of assigning coordinates and errors,
15 called georeferencing, can be rather complicated. The complexity
16 of the process can be greatly reduced and the consistency of the
17 results can be greatly increased by establishing simple
18 guidelines that cover most commonly encountered locality
19 descriptions. The guidelines for assigning coordinates for named
20 places are presented with examples in the section Determining
21 Latitude & Longitude.
22
23 There are several fundamental sources of error for specific
24 locality descriptions, and these vary in magnitude. It is
25 essential during georeferencing to determine and record the
26 greatest source of error among all possible sources. There are
27 numerous ways in which the maximum error of a geographic
28 coordinate might be expressed, but the most convenient is as a
29 distance, because its size and shape are constant over any
30 geodetic surface model. The sources of error and their
31 magnitudes are discussed primarily in the section Determining
32 Error.
33
34 An Appendix containing a description of the data that should be
35 captured for each georeferenced locality, a glossary, and
36 references are appended for the convenience of the reader.
37
38 Determining Latitude & Longitude
39
40 Geographic coordinates can be expressed in a number of different
41 coordinate systems (e.g. decimal degrees, degrees minutes
42 seconds, degrees decimal minutes, UTM, etc.). Conversions can be
43 made readily between coordinate systems, but decimal degrees
44 provide the most convenient coordinates to use for
45 georeferencing for no more profound a reason than that a
46 specific locality can be described with only two attributes:
47 decimal latitude and decimal longitude.
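
As an illustration of the conversion mentioned above, here is a minimal sketch of going between degrees minutes seconds and decimal degrees. The function names are hypothetical, and the sign convention (south and west negative) follows the Appendix.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds plus N/S/E/W to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention.
    return -value if hemisphere in ("S", "W") else value

def decimal_to_dms(value, is_latitude):
    """Convert signed decimal degrees back to (degrees, minutes, seconds, hemisphere)."""
    if is_latitude:
        hemisphere = "S" if value < 0 else "N"
    else:
        hemisphere = "W" if value < 0 else "E"
    # Work in seconds and round to suppress floating-point noise.
    total_seconds = round(abs(value) * 3600, 6)
    degrees = int(total_seconds // 3600)
    minutes = int((total_seconds % 3600) // 60)
    seconds = total_seconds % 60
    return degrees, minutes, seconds, hemisphere
```

UTM conversions depend on the datum and zone and are not covered by this sketch.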
48
49 Named Places
50
51 The simplest of specific locality descriptions consist of only a
52 named place. Use the geographic center of a named place for the
53 latitude and longitude, and use the distance from that point to
54 the furthest point within that named place for the maximum error
55 distance. If the geographic center of the named place is not
56 within the confines of the shape of the named place, use the
57 point nearest to the geographic center that lies within the
58 shape.
59
60 Example: "Bakersfield"
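
The named-place rule above can be sketched as follows, under several assumptions that are not part of the guidelines: the named place is given as a list of (latitude, longitude) boundary vertices, the furthest point is taken to be one of those vertices, distances are great-circle distances on a sphere of mean Earth radius, and the shape does not cross the 180th meridian. The sketch does not handle the case where the center falls outside the shape. The coordinates in the usage lines are a made-up box, not the real extent of Bakersfield.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean radius; spherical approximation (assumption)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geographic_center(vertices):
    """Mean of the latitude extremes and mean of the longitude extremes."""
    lats = [lat for lat, _ in vertices]
    lons = [lon for _, lon in vertices]
    return (min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2

def max_error_km(vertices):
    """Distance from the geographic center to the furthest boundary vertex."""
    center_lat, center_lon = geographic_center(vertices)
    return max(haversine_km(center_lat, center_lon, lat, lon) for lat, lon in vertices)

# Hypothetical bounding box, for illustration only.
box = [(35.0, -119.0), (35.0, -118.0), (36.0, -118.0), (36.0, -119.0)]
center = geographic_center(box)
error_km = max_error_km(box)
```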
61
62 Township Range Section (TRS) descriptions are essentially no
63 different from that of any other named place. It is necessary to
64 understand how TRS descriptions work and how they describe a
65 place. See the References section, below, for links to TRS
66 information.
67
68 Example: "E of Bakersfield, T29S R29E Sec. 34 NE 1/4"
69
70 Offsets
71
72 Offsets generally consist of combinations of distances and
73 directions from a named place. Use the geographic center of the
74 named place in the direction of the offset as a starting point.
75 Unless there is contrary information in the locality
76 description, measure the distance in the offset direction to
77 find the spot for the geographic coordinates. Offsets that do
78 not explicitly say that they were measured by air or by some
79 contour (e.g., by road, river, valley, etc.) should be
80 determined as if by air in a straight line.
81
82 Example: "10 mi E (by air) Bakersfield"
83
84 Example: "10 mi E of Bakersfield"
85
86 However, if there is no mention of the mode of measurement in
87 the locality description, but the measurement includes fractions
88 (e.g., 10.2 miles) and there is a road in the vicinity, use road
89 miles. Offsets that were described in the specific locality as
90 being measured by road should be determined using the contours
91 of the road rather than using a straight line. The methods for
92 determining the maximum error distances for these types of
93 specific locality descriptions are given in the Determining
94 Error section, below.
95
96 Example: "10.2 mi E of Bakersfield"
97
98 Example: "13 mi E (by road) Bakersfield"
99
100 Vagueness
101
102 At times, specific locality descriptions are fraught with
103 vagueness. It is not the purpose here to belittle localities of
104 this type; in fact, an honest admission of the unknown is
105 preferable to masking it with unwarranted precision.
106
107 The most important type of vagueness in a specific locality
108 description is one in which the locality is in question. No such
109 locality should be georeferenced.
110
111 Example: "Bakersfield?"
112
113 Many locality descriptions imply an offset from a named place
114 without definitive directions or distances. Use the geographic
115 center of the named place for the geographic coordinates. For
116 the maximum error distance, use the greatest distance that is
117 not likely to be considered in the area of another named place.
118 Clearly there is a measure of subjectivity involved here. Let
119 common sense prevail and document the assumptions made.
120
121 Example: "near Bakersfield"
122
123 Sometimes offset information is vague either in its direction or
124 in its distance. If the direction information is vague, record
125 the geographic coordinates of the center of the named place and
126 add the offset distance to the greatest extent of the named
127 place to get the maximum error distance.
128
129 Example: "5 mi from Bakersfield"
130
131 Uncertainty in the offset distance is a fact of the business.
132 Almost no localities are recorded with error estimates,
133 therefore every offset distance is inherently uncertain. The
134 addition of a modifier in the locality description, while an
135 honest observation, should not change the determination of the
136 geographic coordinates or of the maximum error.
137
138 Example: "about 3 mi E of Bakersfield"
139
140 The worst of situations arises when a specific locality
141 description is internally inconsistent. There are numerous
142 possible causes for inconsistencies. It is the task of those
143 georeferencing to determine the part of the description most
144 likely to be in error, ignore it for the purpose of the
145 determination, and document the decision to do so. The most
146 common source of inconsistency in a locality description comes
147 from trying to match elevation information with the rest of the
148 description. If there is no reasonable way to reconcile the
149 discrepancy, ignore the elevation.
150
151 Example: "10 mi W of Bakersfield, 6000 ft"
152
153 Determining Error
154
155 The process of georeferencing includes an assessment of the
156 possible sources of error in a geographic coordinate
157 determination. Errors may arise due to the extent of a locality,
158 due to unspecified precision in original measurements (distance
159 precision and directional precision), or due to not knowing the
160 datum under which coordinates were determined. It is essential
161 to determine which of these yields the greatest error and record
162 that value as the maximum error distance. Potential error
163 sources and guidelines for determining the magnitude of each for
164 a given specific locality are given in the paragraphs below.
165
166 Error due to the shape of a locality
167
168 Named places are not single points; they have extents. If a
169 locality description is no more specific than to describe a
170 named place or an offset from a named place, then the size of
171 the named place is a source of error. The treatment of error due
172 to the extent of a locality is described under the examples of
173 determining latitude and longitude, above.
174
175 Error due to an unknown datum
176
177 Seldom have geographic coordinates been recorded for a locality
178 in a natural history collection in which the underlying datum of
179 the coordinate system was given. Even now, when GPS coordinates
180 are being taken as definitive evidence of a location, the
181 geodetic datum is being ignored. Without recording the datum
182 with the coordinates, potential accuracy is being lost. Figure 1
183 shows the magnitude of error (in meters) over North America
184 based on not knowing the datum from which the coordinates were
185 taken.
186
187 [datumerror.jpg]
188
189 Figure 1. Map of North America showing the magnitude of
190 potential error from not knowing whether coordinates were taken
191 from NAD27, NAD83, or WGS84.
192
193 This map can be used as a rough guide for determining the
194 magnitude of error due to not knowing the datum from which the
195 geographic coordinates were recorded.
196
197 Precision
198
199 Precision is difficult to gauge from specific locality
200 descriptions; it may be reflected in the locality description,
201 but it is seldom, if ever, explicitly recorded. Furthermore, a
202 database record may not reflect, or may reflect incorrectly, the
203 precision inherent in the original measurement, especially if
204 the locality description has undergone interpretation from the
205 verbatim original description. Precision issues arise from both
206 distance measurements and directions in a locality description.
207 Potential errors from each of these sources are discussed in the
208 paragraphs below.
209
210 Error associated with distance precision
211
212 Distance may be recorded in a specific locality description with
213 or without significant digits, and those digits may or may not
214 be warranted. A conservative way to ensure that distance
215 precision is not inflated is to treat distance measurements as
216 integers with fractional remainders. Thus 10.25 becomes 10 1/4,
217 10.5 becomes 10 1/2, etc. Calculate the error for these distances
218 based on the fractional part of the distance, using 1 divided by
219 the denominator of the fraction.
220
221 Example: "10.5 mi N of Bakersfield" Fraction is 1/2, error should
222 be 0.5 mi.
223
224 Example: "10.6 mi N of Bakersfield" Fraction is 1/10, error
225 should be 0.1 mi.
226
227 Example: "10.75 mi N of Bakersfield" Fraction is 3/4, error should
228 be 0.25 mi.
229
230 If the distance is an integer, use an error of one unit.
231
232 Example: "10 mi N of Bakersfield" Error should be 1 mi.
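
One possible reading of the distance precision rule, written so that it reproduces the four examples above. The list of "common fraction" denominators (halves and quarters) is an assumption made for this sketch; the guidelines do not say which fractions to recognize. Anything else falls back to one unit of the last recorded decimal place. The distance is passed as text so that the recorded digits are preserved.

```python
from fractions import Fraction

# Assumption: only halves and quarters are treated as common fractions.
COMMON_DENOMINATORS = (2, 4)

def distance_precision_error(distance_text):
    """Return the precision error for a recorded distance such as "10.5"."""
    value = Fraction(distance_text)
    fractional_part = value - (value.numerator // value.denominator)
    if fractional_part == 0:
        # Integer distance: use an error of one unit.
        return Fraction(1)
    for denominator in COMMON_DENOMINATORS:
        # The fractional part is a whole number of halves or quarters.
        if (fractional_part * denominator).denominator == 1:
            return Fraction(1, denominator)
    # Otherwise use one unit of the last decimal place recorded.
    decimals = len(distance_text.split(".")[1]) if "." in distance_text else 0
    return Fraction(1, 10 ** decimals)
```

The result is in the same units as the recorded distance, in keeping with the Maximum_Error_Units convention in the Appendix.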
233
234 Error associated with directional precision
235
236 Direction is almost always expressed in specific locality
237 descriptions using cardinal and intercardinal directions rather
238 than degree headings. A conservative interpretation of these
239 directions allows for an error of 22.5 degrees to either side of
240 the recorded direction. Thus, ENE can be any direction between E
241 and NE, while NE can be any direction between ENE and NNE.
242
243 [directionerror.jpg]
244
245 The error distance resulting from imprecision in direction
246 increases with increasing offset distance. In fact the error
247 distance due to directional imprecision is 0.4142 times the
248 offset. Note, however, that when a locality description uses two
249 offsets based on cardinal directions (e.g., 1 mi N and 3 mi E of
250 Bakersfield), the distances and directions are likely to have
251 been measured on a map. In this case, directional imprecision
252 should be ignored.
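
The factor 0.4142 quoted above matches the tangent of 22.5 degrees, so the directional error can be sketched as below. The function name and the default half angle parameter are hypothetical; the interpretation of the factor as tan(22.5 degrees) is inferred from its value rather than stated in the guidelines.

```python
import math

def directional_error(offset, half_angle_deg=22.5):
    """Error distance due to directional imprecision, in the units of the offset."""
    # tan(22.5 degrees) is approximately 0.4142, the factor given in the text.
    return offset * math.tan(math.radians(half_angle_deg))

# For "10 mi E of Bakersfield" the directional error is about 4.14 mi.
error_mi = directional_error(10)
```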
253
254 Appendix
255
256 Geographic Coordinate Data
257
258 Following are the essential attributes to be captured for each
259 locality while georeferencing.
260
261 Decimal_Latitude - the latitude coordinate (in decimal degrees) at
262 the center of a circle encompassing the whole of a specific
263 locality. Convention holds that decimal latitudes north of the
264 equator are positive numbers less than or equal to 90, while
265 those south are negative numbers greater than or equal to -90.
266 Example: -42.51 degrees (which is the same as 42d 30' 36" S).
267
268 Decimal_Longitude - the longitude coordinate (in decimal degrees)
269 at the center of a circle encompassing the whole of a specific
270 locality. Decimal longitudes west of the Greenwich Meridian are
271 considered negative and must be greater than or equal to -180,
272 while eastern longitudes are positive and less than or equal to
273 180. Example: -122.49 degrees (which is the same as 122d 29' 24"
274 W).
275
276 Maximum_Error_Distance - the upper limit of the distance from the
277 given latitude and longitude within which the described locality
278 must lie.
279
280 Maximum_Error_Units - the units of length in which the maximum
281 error is recorded (e.g., mi, km, m, and ft). Express maximum
282 error distance in the same units as the distance measurement in
283 the specific locality description.
284
285 Datum - the geometric description of a geodetic surface model
286 (e.g., NAD27, NAD83, WGS84). Datums are often recorded on maps
287 and in gazetteers, and can be specifically set for most GPS
288 devices. Use "not recorded" when the datum is not known.
289
290 Original_Coord_System - the coordinate system in which the raw
291 data are being entered. For the purpose of collaborative
292 georeferencing this value will be "decimal degrees." However,
293 existing geographic coordinates may be entered in degrees
294 minutes seconds, degrees decimal minutes, or UTM coordinates.
295
296 Reference - the reference source (e.g., map, gazetteer, or
297 software) used to determine the coordinates. Such information
298 should provide enough detail so that anyone can locate the
299 actual reference that was used (e.g., name, edition or version,
300 year). Lat_Long_Determined_By - the person or organization by
301 which the determination was made.
302
303 Lat_Long_Determined_Date - the date on which the determination was
304 made.
305
306 Remarks - comments on methods and assumptions used in determining
307 coordinates or errors when those methods or assumptions differ
308 from or expand upon the accepted guidelines.
309
310 Glossary
311
312 Datum - A geodetic datum describes the size, shape, origin, and
313 orientation of a coordinate system for mapping the surface of
314 the earth.
315
316 Decimal degrees - degrees expressed as a single real number (e.g.,
317 -22.343456) rather than as a composite of degrees, minutes,
318 seconds, and direction (e.g., 7d 54' 18.32" E).
319
320 Geodetic surface model - a geometric description of the surface of
321 the earth.
322
323 Geographic coordinates - latitude and longitude, measured in any
324 of various coordinate systems.
325
326 Geographic center - To find the geographic center of a shape,
327 first find the extremes of both latitude and longitude within
328 the shape and then take their respective means.
329
330 UTM - Universal Transverse Mercator. A grid coordinate system
331 specifying a datum, zone, and offsets from the equator and from
332 the meridian of the zone. See the References section, below, for
333 more information.
334
335 References
336
337 Township, Range Section Information:
338
339 http://www.esg.montana.edu/gl/trs-data.html
340
341 Datum Information:
342
343 http://www.colorado.edu/geography/gcraft/notes/datum/datum_f.html
344 http://164.214.2.59/GandG/tm83581/tr83581a.htm
345 http://biology.usgs.gov/geotech/documents/datum.html
346
347 UTM Information:
348
349 http://www.nps.gov/prwi/readutm.htm
350 http://www.dmap.co.uk/ll2tm.htm
351
352 Note
353
354 Specific locality descriptions are inexact and seldom give
355 estimates of error. An ideal description of a specific locality
356 has no error. One way to achieve this ideal is to describe the
357 locality by a shape within which the exact locality must
358 certainly lie. The capture of shape data is certainly possible
359 with current GIS technology, and is even demonstrably more
360 efficient than the methods described above. However, there are
361 technical challenges yet to be met in order to make the capture
362 of shape data feasible in a collaborative Internet-based
363 georeferencing environment.
364
365 An alternative to using a shape to describe a locality is to use
366 a definitive point of arbitrarily high precision with an
367 attendant maximum error. This method, described in the foregoing
368 document, is a conservative expression of the locality which
369 satisfies the requirement that the exact locality must lie
370 within the space described.
371
372
373 _________________________________________________
374
375 Rev. 24 September 2001, JRW
376
377 University of California, Berkeley, CA 94720, Copyright 2001,
378 The Regents of the University of California.
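The glossary's definition of geographic center (take the extremes of latitude and longitude within the shape, then their respective means) can be sketched in a few lines. This is only an illustration: the function name and the sample outline are hypothetical, and the sketch assumes the shape does not cross the 180-degree meridian.

```python
def geographic_center(points):
    """Return (lat, lon) midway between the extremes of a shape's vertices.

    points: iterable of (latitude, longitude) pairs in decimal degrees.
    Assumption: the shape does not cross the 180-degree meridian.
    """
    points = list(points)
    if not points:
        raise ValueError("at least one point is required")
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    # Mean of the extremes, not the mean of all vertices.
    center_lat = (min(lats) + max(lats)) / 2
    center_lon = (min(lons) + max(lons)) / 2
    return center_lat, center_lon


# Hypothetical outline of a town, given as (lat, lon) vertices.
outline = [(35.30, -119.10), (35.44, -119.05), (35.40, -118.90), (35.32, -118.95)]
print(geographic_center(outline))
```

Note that this is the center of the bounding box of the shape, which is what the definition asks for; it is not the centroid of the shape's area.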
>>> Posting number 95, dated 27 Sep 2001 10:45:45
Date: Thu, 27 Sep 2001 10:45:45 -1000
Reply-To: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
Sender: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
From:
Subject: Georeferencing document
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
John,
I went through your document this morning and find most of it clear and in
agreement with my own practices of georeferencing. I have some
observations and questions as follows:
A.
140 The worst of situations arises when a specific locality
141 description is internally inconsistent. There are numerous
142 possible causes for inconsistencies. It is the task of the those
143 georeferencing to determine the part of the description most
144 likely to be in error, ignore it for the purpose of the
145 determination, and document the decision to do so. The most
146 common source of inconsistency in a locality description comes
147 from trying to match elevation information with the rest of the
148 description. If there is no reasonable way to reconcile the
149 discrepancy, ignore the elevation.
150
151 Example: "10 mi W of Bakersfield, 6000 ft"
I have recently been through a georeferencing exercise in the herp
collection for which obtaining coordinates that agreed with the elevations
was critical. It was only through trying to match the description of the
location (distance and direction from X village) with the elevation given,
and finding that the given elevation at the described site was impossible,
that I uncovered major problems in the locality data provided for a large
number of herps on one particular collecting trip. In this case I was able
to contact the collector to ask about the inconsistencies and he determined
that his original distances were totally off because he was using miles on
a metric map. In this case the elevations were the correct piece of
information. I therefore caution against ignoring elevations out of hand.
B.
Section on Determining Latitude and Longitude does not include an example
for when coordinates are provided. For the sake of completeness, should
such an example be included, or, since they are being provided and not
determined, should this be taken up in another section? For example, when
coordinates are provided in degrees, minutes and seconds, do we translate
into decimals? How many decimal places do we go for minutes? For
seconds? Does it matter who provided the coordinates: the collector, a
previous museum person, someone else? Under what circumstances, if any,
should we recalculate coordinates when they are provided by some previous
source?
C.
210 Error associated with distance precision
211
212 Distance may be recorded in a specific locality description with
213 or without significant digits, and those digits may or may not
214 be warranted. A conservative way to insure that distance
215 precision is not inflated is to treat distance measurements as
216 integers with fractional remainders. Thus 10.25 becomes 10 1/4,
217 10.5 becomes 10 1/2, etc. Calculate the error for these distances
218 based on the fractional part of the distance, using 1 divided by
219 the denominator of the fraction.
Lines 217-219. Does this mean to "replace" the numerator with 1, and
divide by the denominator?
221 Example: "10.5 mi N of Bakersfield" Fraction is 1/2, error should
222 be 0.5 mi.
numerator is 1 to begin with, so doesn't answer the question.
224 Example: "10.6 mi N of Bakersfield" Fraction is 1/10, error
225 should be 0.1 mi.
Isn't the fraction of .6, 6/10? Did you replace the 6 with a 1 in order
to calculate the error?
227 Example: "10.75 mi N of Bakersfield" Fraction is 3/4, error should
228 be 0.25 mi.
Fraction this time is given as 3/4, not 1/4, but you could only get an
error of 0.25 by replacing the 3 with a 1 before dividing by 4.
As you can see, the examples are confusing.
All in all, it's a sound document. Thanks much.
>>> Posting number 96, dated 27 Sep 2001 20:34:47
Date: Thu, 27 Sep 2001 20:34:47 -0800
Reply-To: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
Sender: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
From: Gordon Jarrell <fnghj@AURORA.UAF.EDU>
Subject: Re: Georeferencing document
In-Reply-To: <5.0.2.1.1.20010927104434.00a2f7e0@mail.bishopmuseum.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Some good points. I've inserted my comments.
On Thu, 27 Sep 2001, XXXXXXX wrote:
> A.
> 140 The worst of situations arises when a specific locality
> 141 description is internally inconsistent. There are numerous
> 142 possible causes for inconsistencies. It is the task of the those
> 143 georeferencing to determine the part of the description most
> 144 likely to be in error, ignore it for the purpose of the
> 145 determination, and document the decision to do so. The most
> 146 common source of inconsistency in a locality description comes
> 147 from trying to match elevation information with the rest of the
> 148 description. If there is no reasonable way to reconcile the
> 149 discrepancy, ignore the elevation.
> 150
> 151 Example: "10 mi W of Bakersfield, 6000 ft"
>
> I have recently been through a georeferencing exercise in the herp
> collection for which obtaining coordinates that agreed with the elevations
> was critical. It was only through trying to match the description of the
> location (distance and direction from X village) with the elevation given,
> and finding that the given elevation at the described site was impossible,
> that I uncovered major problems in the locality data provided for a large
> number of herps on one particular collecting trip. In this case I was able
> to contact the collector to ask about the inconsistencies and he determined
> that his original distances were totally off because he was using miles on
> a metric map. In this case the elevations were the correct piece of
> information. I therefore caution against ignoring elevations out of hand.
>
The key words here are, "IF there is no way to reconcile the
discrepancy..." A possible resolution of the discrepancy might be to
treat it as "specific locality unknown." This might best be left to the
discretion of the individual collections. We have to judge individually
how bad our bad data are, i.e., whether or not we can reconcile them.
> B.
> Section on Determining Latitude and Longitude does not include an example
> for when coordinates are provided. For the sake of completeness, should
> such an example be included, or, since they are being provided and not
> determined, should this be taken up in another section? For example, when
> coordinates are provided in degrees, minutes and seconds, do we translate
> into decimals? How many decimal places do we go for minutes? For
> seconds? Does it matter who provided the coordinates: the collector, a
> previous museum person, someone else? Under what circumstances, if any,
> should we recalculate coordinates when they are provided by some previous
> source?
>
(I know John's answer to some of this one.) The coordinates define an
infinitely small point, no matter what the format. Precision is measured
with max_error, not the number of significant figures.
Nevertheless, we will have coordinates in which precision was implied by
the recorded format. We have to convert this implied imprecision into a
measure of max_error. At UAM we are using 2 km, a little over a nautical
mile, for coordinates that were recorded to the nearest whole minutes.
There are other examples, similar to the problems with distance precision:
64D 28' 30" N - What they meant to say, in terms of significant
figures, was probably 64D 28.5' N. I suppose in this example we would use
max_error= 1 km
We probably do need to develop a standard here. And yes, I'll bet we want
to be able to keep track of various determinations, re-determinations, who
did it, when, and how.
> C.
> 210 Error associated with distance precision
> 211
> 212 Distance may be recorded in a specific locality description with
> 213 or without significant digits, and those digits may or may not
> 214 be warranted. A conservative way to insure that distance
> 215 precision is not inflated is to treat distance measurements as
> 216 integers with fractional remainders. Thus 10.25 becomes 10 1/4,
> 217 10.5 becomes 10 1/2, etc. Calculate the error for these distances
> 218 based on the fractional part of the distance, using 1 divided by
> 219 the denominator of the fraction.
>
> Lines 217-219. Does this mean to "replace" the numerator with 1, and
> divide by the denominator?
>
> 221 Example: "10.5 mi N of Bakersfield" Fraction is 1/2, error should
> 222 be 0.5 mi.
>
> numerator is 1 to begin with, so doesn't answer the question.
>
> 224 Example: "10.6 mi N of Bakersfield" Fraction is 1/10, error
> 225 should be 0.1 mi.
>
> Isn't the fraction of .6, 6/10? Did you replace the 6 with a 1 in order
> to calculate the error?
>
> 227 Example: "10.75 mi N of Bakersfield" Fraction is 3/4, error should
> 228 be 0.25 mi.
>
> Fraction this time is given as 3/4, not 1/4, but you could only get an
> error of 0.25 by replacing the 3 with a 1 before dividing by 4.
>
> As you can see, the examples are confusing.
>
>
Looks like a typo in line 224.
I suggest replacing the sentence beginning in line 217 with:
The error is the resolution implied by the denominator. It can be
calculated as a distance by dividing one unit of distance by the
denominator.
Is that better? Or worse?
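For what it is worth, the rule as reworded above (one unit of distance divided by the denominator) can be sketched as a small function. This is only an illustration under stated assumptions: the function name is hypothetical, and the list of "natural" denominators is an assumption chosen so that the sketch reproduces the three examples quoted from the document (0.5, 0.1 and 0.25 mi). Whole-number distances are outside the quoted rule and are rejected here.

```python
from fractions import Fraction

# Assumed set of "natural" denominators, tried from coarsest to finest.
CANDIDATE_DENOMINATORS = (2, 4, 8, 10, 100, 1000)


def distance_precision_error(distance_text):
    """Return the precision error, in the distance's own units, as a Fraction.

    distance_text: a decimal string such as "10.5" (positive distances only).
    """
    distance = Fraction(distance_text)  # exact, avoids binary float rounding
    fractional_part = distance - int(distance)
    if fractional_part == 0:
        raise ValueError("whole-number distances are not covered by this rule")
    for denominator in CANDIDATE_DENOMINATORS:
        # The coarsest denominator that expresses the fraction exactly wins.
        if (fractional_part * denominator).denominator == 1:
            return Fraction(1, denominator)
    raise ValueError("no candidate denominator fits " + distance_text)


for text in ("10.5", "10.6", "10.75"):
    print(text, "mi ->", float(distance_precision_error(text)), "mi")
```

Under this reading, 10.75 is 10 3/4, the denominator is 4, and the error is 1/4 mi regardless of the numerator.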
>>> Posting number 97, dated 28 Sep 2001 12:53:09
Date: Fri, 28 Sep 2001 12:53:09 -0500
Reply-To: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
Sender: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
From:
Subject: Georeferencing guidelines
Mime-version: 1.0
Content-type: multipart/alternative;
boundary="MS_Mac_OE_3084526390_196216_MIME_Part"
John et al.,
The georeferencing guidelines look great to me. The only (minor) quibble I
have would be with the second item under the subheading "Offsets" (lines
86-89). Here, you suggest that a locality that contains distance fractions
(such as "10.2 mi E Bakersfield") should be assumed to be road miles rather
than air miles. I see
it the other way around. Most field workers I know are careful to state "by
road" if their mileage was actually measured along a road. Otherwise, the
mileage is assumed to be taken directly from a map (i.e., air miles). I
don't see that the inclusion of fractions in the mileage should
automatically signal that the mileage was read from an odometer...it's easy
to get that level of precision using the distance scale printed on the map.
Let's see what the others think. Well done.
>>> Posting number 98, dated 28 Sep 2001 11:33:22
Date: Fri, 28 Sep 2001 11:33:22 -0700
Reply-To: Peter Rauch <peterr@socrates.Berkeley.EDU>
Sender: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
From:
Subject: Re: Georeferencing guidelines
In-Reply-To: <OF482A362E.E38FA255-ON86256AD5.00621E6D@lsu.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
On Fri, 28 Sep 2001, XXXXXXXX wrote:
> The georeferencing guidelines look great to me. The only
> (minor) quibble I have would be with the second item under
> the subheading "Offsets" (lines 86-89). Here, you suggest
> that a locality that contains distance fractions (such as
> "10.2 mi E Bakersfield") should be assumed to be road miles
> rather than air miles. I see it the other way around. Most
> field workers I know are careful to state "by road" if their
> mileage was actually measured along a road.
On insect labels ;>) "by road" is just that much more text to
cram onto tiny labels. Maybe things are different with
vertebrate folks, especially for those who keep detailed field
notebooks. I think lots of folks keep careful track of their
odometers, and record road/track miles quite often. I suspect
that *either* assumption is likely to be wrong too often (i.e.,
when no explicit indication is given of which type of
measurement is done). Perhaps the classification should be
"Basis of measure not indicated" and let the "buyer beware"?
(I.e., the geographic analyst can then choose how she wishes to
interpret the distances --perhaps choosing to measure both ways
if a locality seems out of place under one or the other
measurement scheme.)
> Otherwise, the
> mileage is assumed to be taken directly from a map (i.e.,
> air miles). I don't see that the inclusion of fractions in
> the mileage should automatically signal that the mileage was
> read from an odometer...it's easy to get that level of
> precision using the distance scale printed on the map.
>>> Posting number 99, dated 30 Sep 2001 13:35:49
Date: Sun, 30 Sep 2001 13:35:49 -0500
Reply-To: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
Sender: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
From:
Subject: FW: Locality comment
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
John et al.:
With regard to assigning coordinates to localities, there is a convention
that has been used here at KU for at least 50 years that will help with
localities that are given with reference to towns in the US. When the town
(e.g. Lawrence) was a county seat, distances were measured from the
courthouse. Frequently this was near the center of town, but it reduces the
error in estimating the distance from town because we don't need to worry
about the distance being measured from the city limits. If the locality is
3.5 mi NW of Lawrence, we still have the uncertainty associated with the
angular component. If the town is not a county seat, the Post Office is
frequently
specified as the point of reference. We think this system was exported to
several other collections that are part of MANIS. In general, your
suggestions look quite reasonable (and conservative).
>>> Posting number 100, dated 12 Oct 2001 16:22:06
Date: Fri, 12 Oct 2001 16:22:06 -0700
Reply-To: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
Sender: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
From: John Wieczorek <tuco@SOCRATES.BERKELEY.EDU>
Subject: Georeferencing Commentary synopsis
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Hi folks,
I've been ruminating over the responses to the Georeferencing Guidelines
document, which was posted on the MaNIS website on 24 Sep 2001. That
document has generated interest in a wider community, including the
Alexandria Digital Library Project, so I feel it worthwhile to spend a
little extra effort to fill in some omissions. Below I will address the
points brought up in discussion and try to provide satisfactory solutions.
I would like to know if there are any objections to these solutions. My
next step will be to incorporate this information into the Guidelines
document and then announce the existence of that document to NHCOLL.
XXXXXXXX mentioned a convention to use the courthouse for a
point of reference for a county seat and to use a post office as a point of
reference for other towns. Since the Board on Geographic Names GNIS data
often follows this convention as well I see no conflict. Of course, this
convention applies only to the US, and only to those towns where there is a
single identifiable post office or a courthouse. For all other
determinations the current geographic center of the town, or the
coordinates given in a gazetteer, should be used. In either case it is best
to note something akin to "measured from the post office" or "measured from
the geographic center of Bakersfield" in the determination remarks.
XXXXXXXX brought up the topic of elevations as a critical part of the
determination criteria. I agree with her assessment and I propose that we
follow XXXXXXXX's advice, namely, that localities for which there are
internal inconsistencies should be deferred to the parent institution for
further investigation. I have designed the collaborative gazetteer to
allow annotations to both localities and higher geography. Through the
annotations, georeferencers can note inconsistencies for follow-up work.
Collaborators will be able to check the gazetteer for annotations that
apply to the data from their institution.
XXXXX also noted that there was no example of how to deal with existing
geographic coordinates. My original thought was that we should count these
localities as finished. Yet, there is merit in revisiting existing data,
both for validation and for edification, especially since none of the
existing coordinates have associated error. Nevertheless, we must remain
cognizant of our budgetary constraints. We were given funds to georeference
localities for which we didn't already have coordinates. All that aside,
XXXXX's point is well-taken. I will provide guidelines for existing
geographic coordinates in the forthcoming revised Georeferencing Guideline
document.
XXXXX asked whether we should translate coordinates from other coordinate
systems into decimal degrees for data entry. The gazetteer currently
accommodates the following coordinate systems:
decimal degrees
degrees, decimal minutes
degrees, minutes, decimal seconds
UTM
But that doesn't answer the question. I will endeavor to create an
interface in which the user will select the original coordinate system and
provide the data in that system. Behind the scenes the data will be stored
in that system AND will be translated to decimal degrees. There will be
decimal degrees and the original coordinates for every determination.
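The translation step described above can be sketched as follows. This is a minimal illustration, not the gazetteer's actual code: the function name, the hemisphere letters and the record layout are all hypothetical. It keeps the original values alongside the decimal degrees, as proposed.

```python
def to_decimal_degrees(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees, minutes, seconds plus a hemisphere letter to decimal degrees.

    South and West are returned as negative values.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    magnitude = abs(degrees) + minutes / 60 + seconds / 3600
    return -magnitude if hemisphere in ("S", "W") else magnitude


# Hypothetical record: the original coordinate is kept as entered, and the
# decimal-degree value is stored next to it.
original = {"system": "degrees, minutes, decimal seconds",
            "degrees": 64, "minutes": 28, "seconds": 30.0, "hemisphere": "N"}
record = {
    "original": original,
    "decimal_latitude": to_decimal_degrees(
        original["degrees"], original["minutes"],
        original["seconds"], original["hemisphere"]),
}
print(record["decimal_latitude"])
```

The same function handles degrees with decimal minutes by passing the decimal minutes and leaving seconds at zero, e.g. `to_decimal_degrees(64, 28.5, 0, "N")`.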
XXXXX's next topic was with respect to the precision stored in the
coordinate fields. There is no reason to truncate the values of coordinates
to conform to a predefined level of precision. For reasons described under
the section on Precision in the Georeferencing Guidelines document, it is
inappropriate to try to store precision information in the coordinate data.
Since the values of the coordinates do not make a statement about the
precision of the determination, keeping as many digits as your source
provides is the preferred method. Discarding digits may have an effect on
accuracy, so it is not recommended. Just for edification, a decimal degree
that records five digits to the right of the decimal can distinguish
between two places on the earth roughly one meter apart. Similarly, if you
want to maintain accuracy down to one meter, degrees and decimal minutes
should be recorded with 4 decimal places in the decimal minutes, and
degrees minutes seconds should be recorded with 2 decimal places in the
decimal seconds. Conversely, degrees minutes seconds measured to whole
seconds can introduce inaccuracies of up to 31 meters. Those measured to
whole minutes can introduce inaccuracies of up to 1.85 km. I'll make a
chart of this information for the document revision.
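As a rough preview of such a chart, the figures above can be reproduced with a few lines of arithmetic. This sketch assumes that one minute of latitude is about 1852 m (one nautical mile), which is an approximation that ignores the variation of degree length with latitude; the function and constant names are hypothetical.

```python
# Approximation: one minute of latitude is taken as one nautical mile.
METERS_PER_MINUTE_OF_LATITUDE = 1852.0

# How many of each unit make up one degree.
UNITS_PER_DEGREE = {"degrees": 1, "minutes": 60, "seconds": 3600}


def resolution_m(smallest_unit, decimal_places):
    """Ground distance, in meters, of one step in the last recorded digit."""
    meters_per_unit = (METERS_PER_MINUTE_OF_LATITUDE * 60
                       / UNITS_PER_DEGREE[smallest_unit])
    return meters_per_unit / 10 ** decimal_places


chart = [
    ("decimal degrees, 5 places", "degrees", 5),
    ("degrees, decimal minutes, 4 places", "minutes", 4),
    ("degrees, minutes, decimal seconds, 2 places", "seconds", 2),
    ("degrees, minutes, whole seconds", "seconds", 0),
    ("degrees, whole minutes", "minutes", 0),
]
for label, unit, places in chart:
    print(f"{label:45s} {resolution_m(unit, places):10.2f} m")
```

The last two rows come out at roughly 31 m and 1.85 km, matching the figures quoted above for whole seconds and whole minutes.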
XXXXX's final question has to do with recording the information about who
determined the coordinates. This should certainly be among the best
practices within museums. At the MVZ these data are recorded by making a
reference to the actual person who made the determination. Since the data
are internal to the museum we can tell whether that person was also the
collector or another person on staff. Another possibility is to record the
role of the person who made the determination (e.g., 'collector',
'curatorial assistant', 'Joe's specific locality munger', etc.). Or, if you
only care whether the collector was the one to provide the coordinates, you
could include a DeterminedByCollector field. For MaNIS I intend to use the
name of the person who determines the coordinates, this name being
determined from a login to the online georeferencing interface.
A point of clarification is in order. When determinations are made, I
intend to treat them as opinions. They will not be stored directly with the
locality record, rather, they will refer to it. This allows any number of
lat/long opinions to be registered. The individual institutions will be
able to decide which one (if there are multiple opinions) will be the
"accepted" determination when they put the data back in their databases.
All of the coordinates that were provided in the data sent to me have been
turned into opinions and are already in the gazetteer.
XXXXXX made the following observation:
"There are other examples, similar to the problems with distance precision:
64D 28' 30" N - What they meant to say, in terms of significant
figures, was probably 64D 28.5' N. I suppose in this example we would use
max_error= 1 km"
I agree with XXXXXX's assessment of significance, however, the
determination of error is more complicated. Not all degrees are created
equal. Contrary to popular opinion, the distance between 64 degrees N and
65 degrees N is not the same as the distance between 10 degrees N and 11
degrees N. This is due to the oblateness (flattening from a perfect sphere)
of the earth. This may be a minor point, but longitudinal degrees vary
greatly, being roughly 110 km at the equator and 0 km at the poles. My
point is that I need to provide an interface in which one can enter
coordinates and the digits of precision and get back an error distance
based on those criteria.
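To illustrate the kind of calculation such an interface would need, here is a minimal sketch. It is not the planned calculator: the choice of the WGS84 ellipsoid, the function names, and the decision to combine the latitude and longitude components as the diagonal of the precision cell are all assumptions made for the example.

```python
import math

# WGS84 ellipsoid parameters (assumed datum for this sketch).
WGS84_A = 6378137.0                 # semi-major axis, meters
WGS84_F = 1 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared


def meters_per_degree(latitude_deg):
    """Return (meters per degree of latitude, meters per degree of longitude)."""
    phi = math.radians(latitude_deg)
    w = 1 - WGS84_E2 * math.sin(phi) ** 2
    meridian_radius = WGS84_A * (1 - WGS84_E2) / w ** 1.5
    prime_vertical_radius = WGS84_A / math.sqrt(w)
    per_degree_lat = math.radians(1) * meridian_radius
    per_degree_lon = math.radians(1) * prime_vertical_radius * math.cos(phi)
    return per_degree_lat, per_degree_lon


def coordinate_precision_error_m(latitude_deg, precision_deg):
    """Error distance for coordinates recorded to the nearest precision_deg.

    Assumption: the error is the diagonal of a cell that is precision_deg
    on a side in both latitude and longitude.
    """
    per_lat, per_lon = meters_per_degree(latitude_deg)
    return math.hypot(per_lat * precision_deg, per_lon * precision_deg)


# Coordinates recorded to the nearest half minute at 64.475 degrees N.
print(round(coordinate_precision_error_m(64.475, 0.5 / 60)), "m")
print(meters_per_degree(0.0))
print(meters_per_degree(64.0))
```

The two printed pairs show the effect described above: a degree of latitude changes only slightly between the equator and 64 degrees N, while a degree of longitude shrinks by more than half.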
I will amend my wording and typos with respect to using fractions in the
distance precision error section.
XXXXXXXXX brought up a reasonable alternative view of how offsets should
be handled. The judgement of whether measurements are "by road" or "by air"
can be a tricky one. I want to propose a solution and see if I can get a
consensus.
Specific localities that actually say what the measurement method is (e.g.,
"2.8 mi (by road) E of Marysville") should use that method for determining
coordinates and errors. No special remark is necessary in these cases.
Specific localities that have two orthogonal measurements in them (e.g.,
"2.5 mi E and 1.5 mi N of Bakersfield") are always assumed to be "by
air." No special remark is necessary in these cases either. Furthermore,
no error due to direction imprecision should be used.
So much for the easy stuff.
Specific localities that have one linear offset measurement from a named
place, but that do not specify how that measurement was taken (e.g., "10.2
mi E of Yuma") are open for a case-by-case judgment. I propose that the
judgement itself always be documented in the remarks for the determination
(e.g., "Assumed 'by air' - no roads E out of Yuma", or "Assumed 'by road'
on Hwy. 80"). If there is no clear best choice, then use the midpoint
between the two possibilities as the geographic coordinate and assign an
error large enough to encompass the coordinates and errors of both methods.
In this case I would remark something like "Error encompasses both distance
by air and distance by road (Hwy. 80)". This is a conservative solution,
but it is relatively simple to do and to remember. This method is also
never "wrong," if by "wrong" we mean that the actual place is certainly
within our error distance from the given coordinates.
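The "no clear best choice" case can be sketched numerically. This is an illustration only: it assumes the two candidate points have already been worked out and placed on a local flat grid in kilometers, which is a simplification over short distances, and the function name and sample values are hypothetical.

```python
import math


def encompassing_determination(by_air, by_road):
    """Combine two candidate determinations into one point and error radius.

    Each argument is (x_km, y_km, error_km) on a local flat grid.
    Returns ((x_km, y_km), radius_km), where the circle is centered on the
    midpoint and is large enough to contain both candidate error circles.
    """
    ax, ay, a_err = by_air
    rx, ry, r_err = by_road
    midpoint = ((ax + rx) / 2, (ay + ry) / 2)
    half_separation = math.hypot(ax - rx, ay - ry) / 2
    # Reach each candidate point, then add the larger of the two errors.
    radius = half_separation + max(a_err, r_err)
    return midpoint, radius


# Hypothetical values for "10.2 mi E of Yuma", converted to km offsets from town.
by_air = (16.4, 0.0, 1.0)
by_road = (15.1, 2.3, 0.5)
print(encompassing_determination(by_air, by_road))
```

Adding the larger error to half the separation slightly overstates the radius when the two errors differ, which is consistent with the conservative intent described above.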
XXXXXXXXX brought up a question about what units should be used
for maximum error distance. I have set up the gazetteer so that the units
are entered (chosen actually) from a list of possible values (m, km, ft,
yds, mi). The distance and units should be chosen to make sense in the
context of the locality description. My conservative stance on translation
and recalculation issues is to "never adulterate data that can be
adulterated later." If you decide to put these data back into your
databases (and I certainly hope that you will), you can decide at that time
whether to normalize to a single unit of measure.
XXXXXXX also brought up an essential issue of whether errors propagate and
should therefore be summed rather than simply choosing the greatest single
source of error. The answer is not a simple one, so bear with me.
XXXXXXX's specific example, "3 km N + 2 km W Bakersfield" is an instance
of a type of locality description for which I did not provide an example. A
proper description of the error for this example would be a bounding box
centered on the point 3 km N and 2 km W of Bakersfield. Each side of the
box would be 2 km in length (1 km error in any direction). Since we're
using a point and radius to characterize the error, we need a circle that
will circumscribe the above-mentioned bounding box. To do this, the radius
has to be the distance from the center coordinate to a corner. This could
either be calculated by the geometry of the bounding box (in the above
example it would be the distance from the center to a side, 1 km, times
the square root of 2, or about 1.4 km) or measured on a map.
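The geometry of the bounding box can be sketched as a one-line calculation. The function name is hypothetical, and the sketch allows the east-west and north-south errors to differ, which goes slightly beyond the square box in the example.

```python
import math


def circumscribing_radius(east_west_error, north_south_error):
    """Radius of the circle through the corners of an error bounding box.

    Each argument is the error in one direction (half the side of the box),
    in any unit of distance; the result is in the same unit.
    """
    return math.hypot(east_west_error, north_south_error)


# "3 km N + 2 km W Bakersfield": 1 km of error in each direction.
print(round(circumscribing_radius(1.0, 1.0), 3), "km")
```

For the example this gives about 1.414 km, which is less than the 2 km that a simple sum of the two errors would yield.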
There remains the more general question of whether errors propagate. They
do, and they are non-linear, so to sum them is a mistake. The paragraph
above shows how a sum is not a satisfactory method of accommodating
multiple sources of error. As more sources of error come to bear, the
propagation gets even more "interesting." I'll spare you the details here,
but I'll make a point of explaining these sources and how they should be
dealt with in the Guidelines revision.
In addition to the issues brought up so far in discussion, I have a few to
add independently. First, I got the calculation for directional error
wrong. I'll update that in the revision. Second, it is probably obvious,
but I still need to state that the directional error can be ignored when
the distance is measured either "by road" or when the description gives two
orthogonal offsets (e.g., "2 mi E and 4 mi N"). Third, there is another
source of error inherent to reading maps. This error is based on the scale
and it reflects inherent errors in the maps themselves. I will quantify
these errors in the revision.
Aside from the revised georeferencing document, I'm currently working on
interfaces to do the georeferencing online. I'll send out a how-to guide
when the interface is ready to use. It is too soon to know when that will be.
So that everyone knows, my field season is about to begin. Eileen and I are
scheduled to leave for Argentina on 3 Nov and to return around New Year's day.
That's it for my update. Feel free to discourse on my proposed amendments
and thanks to everyone for the comments thus far.
John
>>> Posting number 101, dated 16 Oct 2001 12:43:55
>>> Posting number 102, dated 18 Oct 2001 19:30:33
Date: Thu, 18 Oct 2001 19:30:33 -0700
Reply-To: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
Sender: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
From: John Wieczorek <tuco@SOCRATES.BERKELEY.EDU>
Subject: Georeferencing Guideline Document Updated
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Dear All,
It took almost two weeks, but the eagerly-awaited revision to the
Georeferencing Guidelines Document is finally complete. I have replaced the
original document, so the following URL now points to the revision:
http://dlp.cs.berkeley.edu/manis/GeorefGuide.html
I'm not including the line-numbered text of the document here since we are
presumably past the heated debates. Nevertheless, commentary is
always welcome.
When you read the revised document you are likely to be struck by the
complexities of determining error properly. Don't despair. My next task is
to create an error calculator. The idea is to have a web page on which you
can enter the relevant parameters and get a maximum error distance. This
tool will be a supplement to the georeferencing tool itself, the
development of which is underway.
John
>>> Posting number 103, dated 19 Oct 2001 12:29:38
>>> Posting number 104, dated 4 Nov 2001 21:44:44
Date: Sun, 4 Nov 2001 21:44:44 -0800
Reply-To: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
Sender: Mammal Networked Information System <MAMMAL-Z-NET@USOBI.ORG>
From: "Barbara R. Stein" <bstein@OZ.NET>
Subject: MaNIS--ready, set, georeference!
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="------------24FB9C29A003860042ABE8C3"
Dear All,
This is the moment I know you have all been waiting for! You will
notice a new Gazetteer link at the bottom of the MaNIS home page
(http://dlp.cs.berkeley.edu/manis). This is your gateway to hours of
georeferencing fun. But before starting to work, please read this
message in its entirety, print it out and post it next to the computer
that will be used for georeferencing. You’ll see why you need to print
it when you get near the bottom.
To begin, please review the updated Georeferencing Guidelines.
Next, you will want to read the Georeferencing Steps document. A hot
link to it appears at the top of the gazetteer page.
You will also want to read the text below the query screen on the
gazetteer main page.
After reading all of the above, you will query the gazetteer for a
locality of interest. The "Search" button returns a list of all higher
geographies containing the term entered and indicates how many unique