Cryptocrawl 2023

Cryptocrawl is a lightweight ETL (extract, transform, load) pipeline, written in Python, that collects cryptocurrency market data for analysis. It automates extraction, transformation, and loading, so users can analyze the data without gathering or processing it by hand.

The pipeline starts by scraping current cryptocurrency data from Yahoo Finance. Each record includes the cryptocurrency's name, symbol, price, market capitalization, trading volume, and circulating supply, and the transformation step organizes these attributes into a structured format.
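As an illustration of the transformation step, the sketch below normalizes one scraped row into typed fields. The row layout, the key names, and the abbreviated number format (for example "1.5B") are assumptions made for this example, not details taken from the Cryptocrawl source.

```python
# Multipliers for the abbreviated figures that finance pages commonly display.
# Assumption: values arrive as strings such as "1,234.56" or "1.5B".
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def parse_number(text):
    """Convert a string such as '1,234.56' or '1.5B' to a float."""
    cleaned = text.strip().replace(",", "").replace("$", "")
    if cleaned and cleaned[-1].upper() in SUFFIXES:
        return float(cleaned[:-1]) * SUFFIXES[cleaned[-1].upper()]
    return float(cleaned)


def transform_row(raw):
    """Turn a raw scraped row (all strings) into a typed record.

    The keys are hypothetical; they mirror the attributes listed above.
    """
    return {
        "symbol": raw["symbol"].strip().upper(),
        "name": raw["name"].strip(),
        "price": parse_number(raw["price"]),
        "market_cap": parse_number(raw["market_cap"]),
        "volume": parse_number(raw["volume"]),
        "circulating_supply": parse_number(raw["circulating_supply"]),
    }
```

Keeping the parsing in small pure functions like these makes the transformation easy to test without touching the network.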

The pipeline uses Bonobo, a lightweight Python ETL framework, to move data through the extract, transform, and load stages. A single transformation function builds two separate dataframes, and each dataframe is routed down its own path for loading. This split shows that Bonobo, despite its simplicity, can handle parallel data streams.
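A graph with one transformation feeding two loaders might be wired roughly as follows. This is a minimal sketch, not the project's actual code: the function names and sample rows are hypothetical, plain lists of dicts stand in for the dataframes, and it assumes Bonobo hands a yielded tuple to each downstream node as positional arguments.

```python
def extract():
    """Yield the raw rows. Hard-coded here; the real pipeline scrapes them."""
    yield [
        {"symbol": "BTC-USD", "name": "Bitcoin USD", "price": 50000.0},
        {"symbol": "ETH-USD", "name": "Ethereum USD", "price": 3000.0},
    ]


def transform(rows):
    """Build two tables from the same rows and send both downstream."""
    coins = [{"symbol": r["symbol"], "name": r["name"]} for r in rows]
    quotes = [{"symbol": r["symbol"], "price": r["price"]} for r in rows]
    yield coins, quotes


def load_coins(coins, quotes):
    """First branch: would write the coin list to the "crypto" table."""
    print("crypto rows:", len(coins))


def load_quotes(coins, quotes):
    """Second branch: would write the quotes to the "crypto_data" table."""
    print("crypto_data rows:", len(quotes))


def build_graph():
    # Imported here so the functions above stay usable without Bonobo installed.
    import bonobo

    graph = bonobo.Graph()
    # Main chain: extract -> transform -> first loader.
    graph.add_chain(extract, transform, load_coins)
    # Second branch: attach another loader to the output of transform.
    graph.add_chain(load_quotes, _input=transform)
    return graph

# To execute the graph: bonobo.run(build_graph())
```

Both loaders receive the same output from the transformation and each picks the table it is responsible for, which is one simple way to express the split.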

In the final step, the transformed data is stored in a SQLite database, split between two related tables: "crypto" and "crypto_data". This structure allows for efficient storage and easy retrieval of the data for later analysis.
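One plausible layout for the two tables is sketched below using Python's built-in sqlite3 module. Only the table names come from the description above; the columns, the foreign key on the symbol, and the helper names are assumptions chosen to match the attributes the pipeline collects.

```python
import sqlite3

# Hypothetical schema: "crypto" holds one row per coin, and "crypto_data"
# holds one row per scraped snapshot, linked back to the coin by symbol.
SCHEMA = """
CREATE TABLE IF NOT EXISTS crypto (
    symbol TEXT PRIMARY KEY,
    name   TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS crypto_data (
    id                 INTEGER PRIMARY KEY AUTOINCREMENT,
    symbol             TEXT NOT NULL REFERENCES crypto(symbol),
    price              REAL,
    market_cap         REAL,
    volume             REAL,
    circulating_supply REAL,
    fetched_at         TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path=":memory:"):
    """Open the database and create both tables if they are missing."""
    conn = sqlite3.connect(path)
    # SQLite does not enforce foreign keys unless this is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def save_snapshot(conn, record):
    """Insert the coin once, then append a new snapshot row for it."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO crypto (symbol, name) VALUES (?, ?)",
            (record["symbol"], record["name"]),
        )
        conn.execute(
            "INSERT INTO crypto_data "
            "(symbol, price, market_cap, volume, circulating_supply) "
            "VALUES (?, ?, ?, ?, ?)",
            (
                record["symbol"],
                record["price"],
                record["market_cap"],
                record["volume"],
                record["circulating_supply"],
            ),
        )
```

With this split, descriptive fields are stored once per coin while each pipeline run adds a row to "crypto_data", so a join on the symbol recovers the full history.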

The pipeline can also be run from a task scheduler, so each user can choose how often the data is updated.
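Where no system scheduler is available, a small loop can play the same role. The helper below is a hypothetical sketch using only the standard library; `job` stands for whatever function runs the pipeline, and a system tool such as cron would serve equally well.

```python
import time


def run_periodically(job, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call job() repeatedly, waiting interval_seconds between runs.

    max_runs=None loops forever; a number stops after that many runs.
    Returns the number of completed runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        # Skip the wait after the final run.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs
```

For example, `run_periodically(run_pipeline, 3600)` would refresh the data hourly, where `run_pipeline` is a placeholder for the pipeline's entry point.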

