What alternatives are there to Google BigQuery?
The cloud market remains fiercely competitive: Microsoft and Amazon are clearly still the top dogs in public cloud computing. In recent years, however, Google Cloud has established itself as a very good alternative to both. For a new big data analytics project, we chose Google Cloud for the following reasons:
- Managed Kubernetes Cluster
- Intuitive interface and product overview, especially if there are many software developers on the team
- The hidden champion: Google BigQuery
In this article we would like to share our experiences with BigQuery. First, the problem.
Have you ever tried to perform an arithmetic operation in an Excel file of a few gigabytes? It is no fun and costs valuable time. In our case, we designed an analytics platform that can evaluate terabytes of transaction data in a reasonable time. Not only Excel fails here, but also most databases, such as MySQL, MSSQL and Oracle. Of course you can use EXASOL, SAP HANA or similar MPP (massively parallel processing) solutions for such challenges, but these cost several (tens of) thousand euros for the licenses alone.
Open source and "serverless" MPP databases
In contrast to the licensed MPP solutions mentioned above, open source and “serverless” cloud MPP databases have been on the market for some time. We looked at various solutions in depth and ultimately focused on Apache Impala (Cloudera), Pivotal Greenplum and Google BigQuery.
Impala and Greenplum are convincing on price, as the software itself is free. BigQuery, a column-oriented cloud solution, uses a pay-as-you-go model: storage costs around 2 cents per gigabyte per month. In addition, Google charges for querying the database. At 5 € / TB, this should not be underestimated, although, to be fair, only the selected columns count toward the data scanned.
Accordingly, queries should be designed with great care, and unnecessary queries avoided wherever possible.
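To make the pricing above more tangible, here is a minimal sketch of the arithmetic. It uses only the two figures quoted in this article (about 2 cents per gigabyte per month, 5 € per terabyte queried) and assumes the storage price is in euro cents. The function names and the `selected_column_fraction` parameter are hypothetical simplifications for illustration, not part of any BigQuery API, and current prices may differ.

```python
# Illustrative prices taken from the figures quoted in this article.
# Assumption: "2 cents" means euro cents; real prices may differ.
STORAGE_EUR_PER_GB_MONTH = 0.02
QUERY_EUR_PER_TB = 5.0


def monthly_storage_cost(stored_gb: float) -> float:
    """Approximate monthly storage cost in euros for `stored_gb` gigabytes."""
    if stored_gb < 0:
        raise ValueError("stored_gb must be non-negative")
    return stored_gb * STORAGE_EUR_PER_GB_MONTH


def query_cost(table_tb: float, selected_column_fraction: float = 1.0) -> float:
    """Approximate cost in euros of one query over a table of `table_tb` terabytes.

    `selected_column_fraction` is a hypothetical simplification: the share of
    the table's bytes that sits in the columns the query actually selects.
    """
    if table_tb < 0:
        raise ValueError("table_tb must be non-negative")
    if not 0.0 <= selected_column_fraction <= 1.0:
        raise ValueError("selected_column_fraction must be between 0 and 1")
    return table_tb * selected_column_fraction * QUERY_EUR_PER_TB


if __name__ == "__main__":
    # Storing 1000 GB for one month.
    print(f"storage: {monthly_storage_cost(1000):.2f} EUR per month")
    # Querying every column of a 2 TB table versus only a tenth of its bytes.
    print(f"full scan: {query_cost(2.0):.2f} EUR")
    print(f"narrow select: {query_cost(2.0, 0.1):.2f} EUR")
```

The comparison between the full scan and the narrow select shows why selecting only the columns you need matters so much under this pricing model.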
Usability & documentation
Anyone who has ever managed a database knows how labor-intensive this task can be. With the open source solutions, you can of course take over the hosting yourself, with the resulting overhead and presumably frustrated employees: installation, maintenance, updates and even more maintenance. Alternatively, a third-party provider (e.g. Pivotal or Cloudera) can take care of the hosting.
Google BigQuery is only available as a "serverless" offering: you activate the API in the Google Cloud and voilà, you have an MPP database as a service. It is particularly worth mentioning that BigQuery scales automatically and hides the complexity of the infrastructure from the end user.
The documentation of all three products is very good, and there are many tutorials for each of these MPP databases. We were particularly impressed by the BigQuery documentation, mainly because it feels familiar from other Google products.
The thing about the SQL standards ...
Many providers officially state that they support certain SQL standards, and this statement is mostly true. According to its own documentation, Greenplum Database conforms to the SQL 2003 specification, and Google BigQuery to SQL 2011. It is worth mentioning, however, that you repeatedly run into limits, as data types and standard functions are sometimes missing. Our advice is to check all the features you need first (YOURSELF!). In the case of BigQuery, Decimal / Numeric, a data type used to avoid rounding errors, was not yet officially supported. After several emails with Google Support, however, our account was enabled for the beta version with Numeric.
Our performance ranking is based on two published benchmarks that used the TPC-DS and TPC-H test data. Impala and BigQuery are neck and neck. BigQuery is convincing in the “Concurrency” benchmark and therefore takes first place for our purposes. Unfortunately, Greenplum could not hold a candle to Impala and BigQuery. Our own SQL queries in BigQuery, which analyze several hundred gigabytes, were answered within a few seconds.
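To illustrate why a Decimal / Numeric type matters for transaction data, the following minimal sketch uses Python's standard `decimal` module as a stand-in. It is not BigQuery code, and the amounts are made-up example values; it only shows the kind of rounding error that binary floating point introduces and that an exact decimal type avoids.

```python
from decimal import Decimal

# Three made-up transaction amounts of 0.10 each.
amounts_float = [0.10, 0.10, 0.10]
amounts_decimal = [Decimal("0.10"), Decimal("0.10"), Decimal("0.10")]

# Binary floating point cannot represent 0.10 exactly, so the sum drifts.
float_total = sum(amounts_float)

# Decimal arithmetic keeps the exact decimal value.
decimal_total = sum(amounts_decimal)

print(float_total == 0.30)               # False: the float sum is slightly off
print(decimal_total == Decimal("0.30"))  # True: the decimal sum is exact
```

Across millions of transactions, such small deviations can add up to visible differences in aggregated totals, which is why we needed the Numeric type.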
Our recommendation: Google BigQuery
We clearly recommend BigQuery. In our eyes, its simplicity combined with this performance is currently unsurpassed. BigQuery is suitable both for rapid data prototyping and for use as a data warehouse (data vault). Of course, BigQuery still has some teething problems and missing features (the handling of roles and permissions leaves something to be desired), but we still think that BigQuery is at least worth a look. Taking that look costs nothing.