Creating catalogs and schemas in Databricks
Jordan Smith

When working in Databricks, a foundational understanding of catalogs, schemas, and tables is essential before moving on to advanced AI and ML use cases. The traditional database workflow of setting up a data environment scales rapidly on the Databricks platform, making database development more streamlined than ever.
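As a taste of the post, the basic setup can be sketched in a few SQL statements. This is a minimal sketch only: the catalog and schema names are illustrative, and it assumes a workspace with Unity Catalog enabled, where `spark` is the session provided by the Databricks runtime.

```python
# Minimal sketch: create a catalog and a schema in Unity Catalog via Spark SQL.
# Catalog and schema names below are examples, not requirements.
catalog = "dev_catalog"
schema = "raw_data"

def build_statements(catalog: str, schema: str) -> list[str]:
    """Return the SQL statements that set up the catalog and schema."""
    return [
        f"CREATE CATALOG IF NOT EXISTS {catalog}",
        f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}",
    ]

# On Databricks, run each statement with the provided SparkSession:
# for stmt in build_statements(catalog, schema):
#     spark.sql(stmt)
```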

Pulling GitHub data into Databricks with dbutils
Jordan Smith

In this blog, we demonstrate a method for pulling GitHub data in several formats into Databricks. This is a frequent request from Databricks users because it allows large existing GitHub datasets to be used for developing and training AI and ML models, lets Unity Catalog access GitHub repositories such as US Zip Code data, and supports working with unstructured data such as JSON logs. By linking GitHub and Databricks, you can streamline your workflows and access critical data.
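The core idea can be sketched briefly: fetch a file from GitHub's raw-content endpoint, then copy it into DBFS with `dbutils.fs`. The repository, branch, and paths below are illustrative, and `dbutils` is only available inside a Databricks runtime.

```python
# Minimal sketch: stage a raw GitHub file, then copy it into DBFS with dbutils.
# The repo ("octocat/Hello-World") and paths are examples only.
import urllib.request

def raw_github_url(owner: str, repo: str, branch: str, path: str) -> str:
    """Build the raw.githubusercontent.com URL for a file in a repo."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"

url = raw_github_url("octocat", "Hello-World", "master", "README")

# On Databricks:
# local_path = "/tmp/README"
# urllib.request.urlretrieve(url, local_path)          # download to driver disk
# dbutils.fs.cp(f"file:{local_path}", "dbfs:/data/github/README")
```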

Accessing HuggingFace ML datasets in Databricks
Jordan Smith

As a supplement to our blog on pulling GitHub datasets into Databricks, many users may find that the dataset they need for their project is hosted on HuggingFace. HuggingFace is a prominent platform in the AI and machine learning community, known for its extensive library of pre-trained models and datasets. It provides tools for natural language processing (NLP), computer vision, audio, and multimodal tasks, making it a versatile resource for developers and researchers.
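The workflow the post covers can be sketched as follows. This is a hedged outline, not the post itself: it assumes the `datasets` package is installed (for example via `%pip install datasets`), that `spark` is the Databricks-provided session, and the dataset name (`imdb`) and table names are illustrative.

```python
# Minimal sketch: pull a HuggingFace dataset into a Databricks table.
# Dataset and table names below are examples only.

def dataset_page_url(repo_id: str) -> str:
    """Build the HuggingFace Hub page URL for a dataset repo."""
    return f"https://huggingface.co/datasets/{repo_id}"

# On Databricks:
# from datasets import load_dataset
# ds = load_dataset("imdb", split="train")      # download from the Hub
# sdf = spark.createDataFrame(ds.to_pandas())   # convert via pandas
# sdf.write.saveAsTable("dev_catalog.raw_data.imdb_train")
```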
