Ask ten European coffee importers how their suppliers send farm geo-data, and you will get ten different answers. Some receive Excel spreadsheets. Others get KML files exported from Google Earth. A few receive GeoJSON. Many get a mix of all of the above, depending on the origin country and the supplier's technical capacity.
Each format has its own structure, its own conventions, and its own ways of going wrong. Here is what we see in practice.
CSV and Excel
These are the most common formats — and the most variable. There is no enforced schema. Every supplier can invent their own column names, their own field order, and their own conventions.
The most frequent issues:
- Decimal separator. In many countries, particularly across Latin America and Southern Europe, the standard decimal separator is a comma, not a period. A coordinate that should read 14.889199 arrives as 14,889199. Most geo-tools will either reject the file or misparse the coordinate silently. This is by far the most common error we see, and it is entirely auto-correctable.
- Swapped latitude and longitude. Longitude comes first in GeoJSON (per the spec) but latitude comes first in most human-readable tables. Suppliers who copy coordinates from Google Maps into an Excel file that expects a different order will produce a file where every farm is placed in the wrong location. A bounds check against the expected country catches this reliably.
- Empty farm ID column. Farm IDs exist in the supplier's internal system but were not exported. The file arrives with coordinate data but no way to link each farm back to a supplier record or a shipment. Flagged, not correctable.
- Inconsistent column names across suppliers. One supplier uses Latitude, another uses Lat, another uses LAT_WGS84, another uses Latitud. All mean the same thing. Fuzzy matching against known field name patterns handles most of these automatically.
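The three auto-correctable CSV issues above can be sketched in a few lines of Python. This is a minimal illustration, not TraceBean's implementation: the alias table, the function names, and the bounding box (a rough, illustrative box around Honduras) are all assumptions made for the example.

```python
import difflib

# Hypothetical alias table; a real one would grow from observed supplier files.
KNOWN_ALIASES = {
    "latitude": ["latitude", "lat", "lat_wgs84", "latitud"],
    "longitude": ["longitude", "lon", "lng", "lon_wgs84", "longitud"],
}
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in KNOWN_ALIASES.items()
    for alias in aliases
}


def canonical_column(name, cutoff=0.8):
    """Map a supplier column name to 'latitude' or 'longitude', else None."""
    cleaned = name.strip().lower()
    if cleaned in ALIAS_TO_CANONICAL:
        return ALIAS_TO_CANONICAL[cleaned]
    # Fall back to fuzzy matching for near-misses such as typos.
    close = difflib.get_close_matches(
        cleaned, list(ALIAS_TO_CANONICAL), n=1, cutoff=cutoff
    )
    return ALIAS_TO_CANONICAL[close[0]] if close else None


def parse_coordinate(raw):
    """Parse decimal degrees, accepting a single comma as decimal separator.

    Values that mix thousands and decimal separators are not handled here.
    """
    text = str(raw).strip()
    if text.count(",") == 1 and "." not in text:
        text = text.replace(",", ".")
    return float(text)


def order_by_bounds(lat, lon, bounds):
    """Check (lat, lon) against bounds = (min_lat, max_lat, min_lon, max_lon).

    Returns (lat, lon, status), un-swapping the pair when only the swapped
    order falls inside the box.
    """
    min_lat, max_lat, min_lon, max_lon = bounds

    def inside(a, b):
        return min_lat <= a <= max_lat and min_lon <= b <= max_lon

    if inside(lat, lon):
        return lat, lon, "ok"
    if inside(lon, lat):
        return lon, lat, "swapped"
    return lat, lon, "out_of_bounds"


# Illustrative box only (an assumption), not an authoritative country boundary.
EXAMPLE_BOUNDS = (12.5, 17.5, -90.0, -83.0)
```

A box check of this kind cannot tell the two orders apart when both fall inside the box, so a production check would more likely test against the country polygon itself.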
GeoJSON
GeoJSON is the closest thing to a standard in this space — it has a defined specification, a defined coordinate order (longitude first, latitude second), and a defined geometry model. When it is correct, it is the easiest format to process.
When it is not correct, the problems tend to be structural.
- Incorrect geometry type. A file arrives with Point features where Polygon features are required for farms over 4 hectares. This reflects how the data was collected in the field: a GPS point was recorded instead of a boundary walk. The file is technically valid GeoJSON, but it does not meet EUDR requirements for larger farms. Flagged, not correctable without a field survey.
- Inconsistent feature IDs. Some features have an id field, others do not. Some IDs contain spaces or special characters: "SAM -013" instead of "SAM-013". Normalisation handles the formatting. Missing IDs are flagged.
- Non-standard properties. Suppliers add custom fields (FLOID, ID finca, ID interno, SUPERFICIE) that do not map to any standard schema. These need to be interpreted and mapped to the expected output format. Usually manageable, sometimes ambiguous.
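The ID normalisation and the Point-over-4-hectares flag described above might look roughly like the following. It is a sketch under assumptions: the `area_ha` property name, the flag strings, and both function names are hypothetical, and the declared area is assumed to have already been mapped from a supplier field such as SUPERFICIE.

```python
import re

MAX_POINT_AREA_HA = 4.0  # the 4-hectare threshold mentioned in the text


def normalise_id(raw_id):
    """Tidy a feature ID: trim it and remove whitespace around hyphens."""
    if raw_id is None:
        return None
    text = str(raw_id).strip()
    text = re.sub(r"\s*-\s*", "-", text)   # "SAM -013" becomes "SAM-013"
    text = re.sub(r"\s+", " ", text)       # collapse any remaining runs
    return text or None


def review_feature(feature, area_key="area_ha"):
    """Return (normalised_id, flags) for one GeoJSON Feature dict.

    area_key is a hypothetical property holding the declared area in hectares.
    """
    flags = []
    feature_id = normalise_id(feature.get("id"))
    if feature_id is None:
        flags.append("missing_id")

    geometry_type = (feature.get("geometry") or {}).get("type")
    area = (feature.get("properties") or {}).get(area_key)
    if (
        geometry_type == "Point"
        and isinstance(area, (int, float))
        and area > MAX_POINT_AREA_HA
    ):
        # Cannot be fixed from the file alone; needs a boundary survey.
        flags.append("point_over_4ha")
    return feature_id, flags
```

Formatting problems are corrected in place, while missing IDs and undersized geometries are only reported, mirroring the split between correctable and flagged issues.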
KML
KML was designed for Google Earth, not for EUDR compliance. It is XML-based and human-readable, which makes it popular with field teams using mobile mapping apps. It is also the format most likely to contain idiosyncratic errors.
- Non-standard tags. A file arrives using <n> instead of <name>, a typo introduced somewhere in the export template that has propagated across an entire supplier dataset. Standard KML parsers do not recognise the tag and typically skip it without warning, so the farm name is silently lost. Auto-correctable once the pattern is identified.
- Mixed geometry types. Point and Polygon features in the same file: smallholder farms recorded as points, larger farms recorded as polygons. Requires splitting and processing differently. Point features are flagged if the farm area exceeds 4 hectares.
- Whitespace in coordinate strings. A coordinate sequence reads 14.889199, -85.905430 with a space after the comma instead of 14.889199,-85.905430. KML uses whitespace to separate one coordinate tuple from the next, so a space inside a tuple can split it in two. Some parsers handle this; others do not. Strippable automatically.
- Encoding issues. Field names with accented characters (Área, Café) can arrive with incorrect character encoding if the export was not set to UTF-8. Detectable and correctable in most cases.
What this means in practice
No single format is inherently better or worse than another. What matters is consistency — a supplier who always sends the same format, with the same field names and the same conventions, causes far fewer problems than one who switches formats or changes column names between submissions.
The upstream data quality problem is not really a format problem. It is a consistency problem.
TraceBean normalises all four formats (CSV, Excel, GeoJSON and KML) into a single validated output, so whatever arrives, the compliance tool receives the same clean, structured file.