Recently I decided I want to use Apache Giraph on some of the big data we have at my company. The problem is that I don't want to reinvent the wheel and make mistakes I could avoid by learning about graph-based solutions at a fundamental level. For example, are there de facto best algorithms for linking data sets that are related through an N-tier "foreign key" relationship? Are there low-memory and high-memory solutions for these problems? What are the trade-offs?
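To make the linking question concrete: one textbook way to frame it is as finding connected components, where every foreign-key reference is an edge and every record is a node. The sketch below uses union-find (disjoint sets) purely as an illustration of that framing. The record ids and the `link_records` name are made up for the example, and this is plain single-machine Python, not Giraph code.

```python
from collections import defaultdict


def link_records(edges):
    """Group record ids connected through any chain of foreign-key links.

    `edges` is any iterable of (id_a, id_b) pairs, so it can be streamed;
    only one parent pointer per distinct id is kept in memory.
    """
    parent = {}

    def find(x):
        # Register unseen ids as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every id on the path straight at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Sorted only to make the result deterministic for the example.
    return sorted(sorted(group) for group in groups.values())


# Hypothetical ids: order -> customer -> account is a 2-tier link chain.
example_edges = [
    ("order:1", "customer:7"),
    ("customer:7", "account:3"),
    ("order:2", "customer:9"),
]
linked = link_records(example_edges)
```

In terms of the memory question, this approach keeps state proportional to the number of distinct records rather than the number of links, at the cost of being sequential. A vertex-centric system such as Giraph trades that for parallelism by spreading the graph across workers.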
I'm having trouble finding information, but it could be that I don't know what I should be looking for. Is there a place to learn about theoretical graph-based solutions and algorithms? I came up with my own home-grown, low-memory-footprint algorithm that I think is pretty slick, but I have no idea whether it is just a commonly used pattern.
Also, if anyone knows people who work or have worked at Facebook and have documentation on using Giraph to feed a Hive metastore, with Presto for low-latency queries and possibly Solr/Lucene as a search engine on top of the metastore, that would be cool too.