The open source Presto SQL query engine is used by a diverse set of companies to navigate increasingly large data workflows. These companies are using Presto in support of e-commerce, cloud, security and other areas. Not only do many companies use Presto, but individuals from those companies are also active contributors to the Presto open source community.
In support of that community, Presto holds meetups around the world and has an annual conference, PrestoCon, where experts and contributors gather to exchange knowledge. This year’s PrestoCon, hosted by the Linux Foundation, takes place December 7-8 in Mountain View, CA. This blog post will explore some foundational elements of Presto and what to expect at this year’s PrestoCon.
Presto is a distributed SQL query engine for data platform teams. Presto users can run interactive ANSI SQL queries on data where it lives, across federated and diverse sources. A query engine lets data scientists and analysts focus on building dashboards and using BI tools while data engineers focus on storage and management, with both groups communicating through a unified connection layer.
In short, the scientist does not have to consider how or where data is stored, and the engineer does not have to optimize for every use case for the data sources they manage. You can learn more about Presto in a recent ELI5 video below.
Presto was developed to solve the problem of petabyte-scale, multi-source data queries taking hours or days to return. Resource and time costs on that scale make real-time analysis impossible. Presto can return results from those same queries in less than a second in most cases, allowing for interactive data exploration.
Not only is it highly scalable, but it’s also extensible, allowing you to build your own connector for any data source Presto does not already support. At a low level, Presto also supports a wide range of file types for query processing. Presto was open sourced by Meta and later donated to the Linux Foundation in September of 2019.
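To make the idea of querying data where it lives more concrete, here is a minimal sketch of a federated query that joins tables from two different sources. Presto addresses tables as catalog.schema.table, but the catalog, schema, table and column names below (hive.web.orders, mysql.crm.customers and so on) are hypothetical, as are the host, port and user. The optional run helper assumes the presto-python-client package and a reachable coordinator, so it is defined but not called here.

```python
def qualified(catalog, schema, table):
    """Build a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query():
    """Join two tables that live in different (hypothetical) catalogs."""
    orders = qualified("hive", "web", "orders")          # assumed Hive-backed table
    customers = qualified("mysql", "crm", "customers")   # assumed MySQL-backed table
    return (
        "SELECT c.region, SUM(o.total) AS revenue\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.region"
    )


def run(sql, host="localhost", port=8080, user="analyst"):
    """Submit the query via presto-python-client (assumed to be installed).

    Host, port and user are placeholders; a running coordinator is required.
    """
    import prestodb  # pip install presto-python-client

    conn = prestodb.dbapi.connect(
        host=host, port=port, user=user, catalog="hive", schema="web"
    )
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()


FEDERATED_SQL = build_federated_query()
print(FEDERATED_SQL)
```

The point of the sketch is that a single SQL statement can span both sources; which catalogs are actually available depends on the connectors configured in a given Presto deployment.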
Here are some Presto resources for those who are new to the community:
PrestoCon is held annually in the Bay Area and hosted by the Linux Foundation. This year, the event takes place December 7-8 at the Computer History Museum. You can register here. Each year at PrestoCon, you can hear about the latest major evolutions of the platform, how different organizations use Presto and what plans the Technical Steering Committee has for Presto in the coming year.
Presto’s scalability is especially apparent at the conference: every year we hear from small startups as well as industry leaders like Meta and Uber, all of whom use Presto for use cases of very different sizes. If you’re looking to contribute to open source, PrestoCon is also a great opportunity for networking and for hearing the Technical Steering Committee’s vision for the project firsthand.
Explore what’s happening at PrestoCon 2022:
Since its release in November of 2013, Presto has been used as an integral part of big data pipelines within Meta and other massive-scale companies, including Uber and Twitter.
The most common use case is connecting business intelligence tools to vast data sets within an organization. This lets crucial questions be answered faster and makes data-driven decision-making more efficient.
First, a coordinator takes your statement and parses it into a query. The internal planner generates an optimized plan as a series of stages, which are further separated into tasks. Tasks are then assigned to workers to process in parallel.
Workers then use the relevant connector to pull data from the source.
Workers return the output of each task until the stage is complete. The final worker then passes the stage’s output on to the next stage, where another series of tasks is executed.
The results of the stages are combined, and the final result of the original statement is eventually returned to the coordinator, which then returns it to the client.
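The flow described above can be illustrated with a toy model. This is not Presto code and uses no Presto APIs: it is a simplified sketch in plain Python, where a thread pool stands in for workers, in-memory lists stand in for splits of a data source, and all function names are made up for illustration. It mimics a query like SELECT region, SUM(amount) ... GROUP BY region executed as two stages.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "source": each inner list is one split of (region, amount) rows.
SPLITS = [
    [("us", 10), ("eu", 5)],
    [("us", 7), ("apac", 3)],
    [("eu", 1), ("apac", 4)],
]


def scan_and_partial_sum(split):
    """Stage 1 task: read one split and compute a partial sum per region."""
    partial = Counter()
    for region, amount in split:
        partial[region] += amount
    return partial


def final_sum(partials):
    """Stage 2 task: merge the partial sums produced by the stage 1 tasks."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values key by key
    return dict(total)


def coordinator(splits, workers=3):
    """Fan stage 1 tasks out to workers in parallel, then run stage 2."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(scan_and_partial_sum, splits))
    return final_sum(partials)


result = coordinator(SPLITS)
print(result)
```

In a real cluster the planner decides how many stages and tasks a statement needs and workers run on separate machines, but the shape is the same: parallel tasks within a stage, with each stage feeding the next until the coordinator has the final result.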
To start using Presto, go to prestodb.io and click Get Started.
We would love for you to join the Presto Slack channel if you have any questions or need help. Visit the community page on the Presto website to see all the ways you can get involved and find other users and developers interested in Presto.
If you would like to contribute, go to the GitHub repository and read over the Contributors’ Guide.
To learn more about Presto, check out its website for installation guides, user guides, conference talks and samples.
Make sure you check out previous Presto talks, and attend the annual PrestoCon event if you are able to do so.
To learn more about Meta Open Source, visit our open source site, subscribe to our YouTube channel, or follow us on Twitter, Facebook and LinkedIn.