Detect duplicates with JQ

The Problem

While working on a bug, I needed to go through JSON file containing a few thousand objects. Some objects had a duplicate id value causing errors. A nice use-case for JQ!

The solution with JQ

What I wanted as output was a list of ids and their frequency in the file. So I grouped the object by the id value and transformed this map into an object with properties id and length of the array assigned to this id.

I curled the url and piped the response into jq.

curl http://some.url | jq 'group_by (.id)[] | {id: .[0].id, length: length}'

This resulted in the following output:

{
  "id": 183,
  "length": 2
}
{
  "id": 192,
  "length": 2
}
{
  "id": 332,
  "length": 2
}
{
  "id": 374,
  "length": 2
}
{
  "id": 613,
  "length": 2
}
{
  "id": 2318,
  "length": 2
}
{
  "id": 2390,
  "length": 2
}
{
  "id": 2482,
  "length": 2
}
{
  "id": 2635,
  "length": 2
}

After fixing the issue, I appended an additional filter to the jq statement to only kee the objects where length was greater than 1. This was a check to see if the duplication bug was indeed resolved.

The Problem#

The solution with JQ#

The Problem

The solution with JQ