Consuming a REST API in Python
July 11, 2020
Download the code
Why Python?
Python has great libraries for data analysis
Why JSON instead of CSV?
- Most recent data
- No need to download files
- Richer data
- Supports hierarchical and relational data
- Support for queries
- Less error prone
- Scales better than CSV
Producing a Pandas DataFrame from a JSON feed
This demo shows how to consume a JSON feed in Python and return a Pandas DataFrame.
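As a rough sketch of that flow, the snippet below parses a JSON feed and hands the records to Pandas. The function names, the `sample_feed` payload, and its field names are hypothetical and not taken from the demo code. It also assumes the feed is a flat JSON array of objects, which a real API may not return.

```python
import json
from urllib.request import urlopen

import pandas as pd


def fetch_records(url):
    """Download a JSON feed and return the parsed records (hypothetical helper)."""
    with urlopen(url) as response:
        return json.load(response)


def records_to_dataframe(records):
    """Build a DataFrame from a list of dicts, one dict per row."""
    return pd.DataFrame.from_records(records)


# Offline stand-in for a real feed, so the example runs without a network call.
sample_feed = '[{"name": "apple", "count": "12123"}, {"name": "kale", "count": ""}]'

df = records_to_dataframe(json.loads(sample_feed))
print(df)
```

With a live endpoint, `records_to_dataframe(fetch_records(url))` would take the place of the `json.loads` call.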
Strings to Integers
If we loaded the data from a CSV file, or straight from JSON into a Pandas DataFrame, the data would load correctly, but the values in many columns would be strings. This means that we could not do mathematical analysis of numeric values, such as finding the mean or median of a numeric column. Graphing our ordinal values would also be impossible at this point.
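To make the string problem concrete, here is a minimal sketch that converts a column of numeric strings with `pandas.to_numeric`. The series name is made up for illustration. Filling missing values with 0 is an assumption that mirrors the target table in this post; leaving them as NaN may be the better choice for some analyses.

```python
import pandas as pd

# Values as they arrive from the feed: every entry is a string.
raw = pd.Series(["12123", "", "432", "2"], name="numeric_values")

# errors="coerce" turns anything that cannot be parsed into NaN.
numeric = pd.to_numeric(raw, errors="coerce")

# Assumption: treat missing values as 0, as in the target table.
cleaned = numeric.fillna(0).astype(int)

print(cleaned.tolist())
print(cleaned.mean(), cleaned.median())
```

Once the column is numeric, summary statistics and plotting work as expected.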
Index | Boolean Values | Numeric Values | Categorical Values | Ordinal Values |
---|---|---|---|---|
0 | "yes" | "12123" | "fruit" | "mild" |
1 | "yes" | "" | "vegetable" | "medium" |
2 | "" | "432" | "vegetable" | "strong" |
3 | "no" | "2" | "dairy" | "very mild" |
What we want is a DataFrame that looks more like the second table below, using the following mappings:
Categorical Values
- 1: fruit
- 2: vegetable
- 3: dairy
Ordinal Values
- 0: None
- 1: Very Mild
- 2: Mild
- 3: Medium
- 4: Strong
- 5: Very Strong
Index | Boolean Values | Numeric Values | Categorical Values | Ordinal Values |
---|---|---|---|---|
0 | TRUE | 12123 | 1 | 2 |
1 | TRUE | 0 | 2 | 3 |
2 | NONE | 432 | 2 | 4 |
3 | FALSE | 2 | 3 | 1 |
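The boolean, numeric, and ordinal conversions shown in the table could be sketched as below, since those mappings are known in advance. The helper names and dictionary keys are hypothetical. Lower-casing the input before the lookup is an assumption, made so that "very mild" matches "Very Mild".

```python
BOOLEAN_MAP = {"yes": True, "no": False}
ORDINAL_MAP = {
    "none": 0,
    "very mild": 1,
    "mild": 2,
    "medium": 3,
    "strong": 4,
    "very strong": 5,
}


def to_bool(value):
    """Return True/False for yes/no, or None for empty or unknown input."""
    return BOOLEAN_MAP.get(value.strip().lower())


def to_int(value, default=0):
    """Parse an integer string, falling back to `default` when it is not a number."""
    try:
        return int(value)
    except ValueError:
        return default


def to_ordinal(value):
    """Return the rank of an ordinal label, or None if it is not in the mapping."""
    return ORDINAL_MAP.get(value.strip().lower())


rows = [
    {"boolean": "yes", "numeric": "12123", "ordinal": "mild"},
    {"boolean": "yes", "numeric": "", "ordinal": "medium"},
    {"boolean": "", "numeric": "432", "ordinal": "strong"},
    {"boolean": "no", "numeric": "2", "ordinal": "very mild"},
]

converted = [
    {
        "boolean": to_bool(row["boolean"]),
        "numeric": to_int(row["numeric"]),
        "ordinal": to_ordinal(row["ordinal"]),
    }
    for row in rows
]

for row in converted:
    print(row)
```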
The problem this code tries to solve is how to map categorical values when we do not know all the possible values up front. To solve it, we build up the mapping as we encounter new values while parsing the data, and print the mapping once the feed has been completely parsed.
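One way to build such a mapping incrementally is sketched here. It illustrates the idea rather than reproducing the demo's actual code, and the function name `encode_categories` is hypothetical. Each previously unseen value receives the next integer code, starting at 1 to match the mapping above.

```python
def encode_categories(values):
    """Assign an integer code to each distinct value in order of first appearance."""
    mapping = {}
    codes = []
    for value in values:
        if value not in mapping:
            # First time this category appears: give it the next free code.
            mapping[value] = len(mapping) + 1
        codes.append(mapping[value])
    return codes, mapping


codes, mapping = encode_categories(["fruit", "vegetable", "vegetable", "dairy"])

# Print the mapping only after the whole feed has been parsed.
print(codes)
print(mapping)
```

Pandas offers `pandas.factorize` for a similar purpose, although its codes start at 0 rather than 1.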