Hive is an all-in-one project management tool developed to “help teams move faster” regardless of how they work. Features are created based on users’ requests and are updated weekly, making Hive the world’s first democratic software platform. It’s best known for its capabilities in project management, time management, team collaboration, automation, and an array of integrations with third-party software. Hive is free to use for solo users and with premium versions available to teams and enterprises.
Capabilities |
|
---|---|
Segment |
|
Deployment | Cloud / SaaS / Web-Based, Mobile Android, Mobile iPad, Mobile iPhone |
Support | 24/7 (Live rep), Chat, Email/Help Desk, FAQs/Forum, Knowledge Base, Phone Support |
Training | Documentation |
Languages | English |
Hive its a data warehousing infrastructure built on top of Hadoop to provide data grouping, querying, and analysis.Apache Hive soporta el análisis de grandes conjuntos de datos almacenados bajo HDFS de Hadoop y en sistemas compatibles como el sistema de archivos Amazon.It offers a SQL-based query language called HiveQL5 with schemas to transparently read and convert queries in MapReduce, Apache Tez6, and Spark tasks. All three execution engines can run under YARN. To speed up queries, Hive provides indexes, which include bitmap indexes.
Offers many tools, has great growth potential
Possibility of storing metadata in an organized and easily accessible way.
Hive syntax is almost like sql, so for someone already familiar with sql it takes almost no effort to pick up hive. But there are other tools that can do the same thing faster these days. Hive initially was really good to have; but more and more projects are now available to do SQL like operations on Big Data (like Drill).
Hive is comparatively slower than its competitors. Its easy to use but that comes with the cost of processing, If you are using it just for batch processing then hive is well and fine. It also does not have as rich of a scripting language.
In Retail, the business partners are more comfortable querying their own data instead of relying on Engineers. Hive solves one of that problems. The main purpouse of using Hive is to building reports and do analysis of data that is stored in the Hadoop file system.
nothing in particular. helps us with big data and allows all users to have unrestricted bandwidth, but we already ran into issues with that, so now one of the servers has limitations.
. at my company it was fairly troublesome getting access since it's underlying warehouseing is in hadoop, then have to connect through hive
data insights with big browser data through mapreduce
Easy SQL like syntax for very short and simple queries
No alias for relation. No flow controls as well.
I build machine learning model for online advertising system. Hive to me is more like a ad-hoc query engine rather than a platform where I can develop complex algorithm on
Schema on any format HDFS files. Easy to download the data. A complete tool similar to database tool like toad.
Performance,sometime it is very difficult to run queries. Gui can be improved with more user friendly options
Data processing for regulatory reporting ...maintain lineage
I was a top fan of Impala for a while until I reached a series of limitations that were impossible to overcome. I work a lot with arrays and just the fact of being able to use array_contains in impala made me switch to Hive. Also, we are moving fast on the direction of self made Macross for hive that let us do complex queries without lateral view explodes
Session creation takes a while and speed is quite slow when comparing to Impala
Complex data analysis with tables that have several billion rows by partition
It is highly flexible in configurations. So many options to load data from- directly from linux file system or hdfs. You can create external and managed tables. One fun feature is that you can shoot bash commands from hive as well
It cannot be used for streaming data. Error logging can be improved so that error tracking and resolution can be more efficient.
It is used to transform and process Big Data datasets in batches. It can handle TBs of data. Push predicate feature has greatly improved the performance of the queries and the developer doesn't need to think about it anymore
Hive is great for handling logs in big data projects. We are using the same in our project and it is great for using joins and grouping which is very difficult and tricky in map reduce. It has a lot of udf packages and it is very easy to add new udfs. We were also using bucketing and clustering to optimize the query. Concept of external tables and the way we can manipulate data even when table is deleted from hive is really amazing. Lot of connectors available in the market for different softwares.
The thing which I dislike is latency and the way it saves data. While inserting data I have to wait a lot of few records. Compiler execution plan is very immature as it does not do proper query optimization. Though the community is working fast for overcoming quickly but I think it will take time for hive to be
We are using hive mainly for saving our logs. it helps us to keep track of what records are inserted, which records have failed and what are relationship between them. we are using tableau for analyzing data .
The progression of features, speed, etc brings me the strategic confidence I need in the SQL in hadoop space.
At this point, everything is on pint & theories it is great in hive 1.2
Deriving value from masses of unstructured & structured data.
The ability to view HDFS data in a relational format and easily query it through HiveQL
The fact that it uses MapReduce whether you query a pre existing table or a perform a complex query. Tez helps with this issue. Also the inability to delete/update data is a real issue and forces other services to be used eg HBase.
The ability to use Hive on HUE is perfect. We are building a platform for data scientists (prefer GUI to shell) to perform analysis so removing the need for command line is excellent.
It is very simple to use because you fill like you use simple SQL language for querying data. When I just started I didn't have any experience with Hive and in like one week I was able to query big data and do some analysis. In a month I was able to administrate data and create my own databases with the useful data. . .
Not so many implemented functions in the Hive. There are very useful Window functions but it's not enough. . . It's not that simple to modify data inside a table. . .
Analyze every day and every hour or even every minute user experience, user behavior in application or web client , etc . . .
Apache Hive is a tool built on top of Hadoop for analyzing large, unstructured data sets. Most BI and SQL developer tools can connect to Hive as easily as to any other database.
Unable to cancel a running query. Query tuning is difficult compared to RDBMS
We had a requiement to scan a large dataset for our predection algorithm. Initially we used RDBMS but the performace was very slow and user where not happy with it. We replaced RDBMS with the Hive and we are able to see a drastic improvment in the performance.
hiveql is more like SQL and really easy to learn
doesnt work good if you want a low latency queries
performance for 1TB of data
If you know SQL you will be able to get Hive really quickly. Lots of the same functionality but not exactly SQL. Easy to create tables and start writing queries allowing you to dive deeper into your data.
As with all Hadoop tools lots of knobs to tweak. Takes a good bit of time optimize and finely tune your Hive install.
Putting structure on unstructured. Once we chose hive to accomplish the aforementioned task we were able to bring our data to our data scientists quickly. An easier degree of acceptance to the Big Data idea.
- Easy to use interface - multiple clients (CLIs) - easy to debug issues with the help of fully descriptive logs - constantly the product is being improved to meet all the DB developer requirements - can be accessed from multiple applications - access through knox for additional security - no indexing - multiple file formats - the tez architecture
- authentication gaps - issues when routing through zookeeper - not as matured tool as the regular database tools
- BI team is helping all the enterprise users to ingest and access data from hadoop - most of the users are well versed with standard sql tools - to make hadoop enterprise wide solution we are training all users with hive