Thread: Big Data Architects

  1. #21
    Registered User Dumar
    Join Date
    Dec 2012
    Location
    Washington, D.C.
    Posts
    2,665
    Tuconets
    -11
    You'll get a promotion when you threaten to leave.

  2. #22
    Registered User
    Join Date
    Dec 2012
    Posts
    4,445
    Tuconets
    14
    I've been promoted twice in five years at my company.

  3. #23
    Registered User
    Join Date
    Dec 2012
    Location
    St.Louis, MO
    Posts
    104
    Tuconets
    0
    I have found that one of the big things is that at most places you have to ask to get promoted. Those who don't ask never get promoted.

  4. #24
    Registered User
    Join Date
    Dec 2012
    Posts
    4,445
    Tuconets
    14
    Quote Originally Posted by CnCGOD
    I have found that one of the big things is that at most places you have to ask to get promoted. Those who don't ask never get promoted.
    My first promotion came later than I expected because I didn't speak up. I specifically asked for this next promotion. I believe you are right. I know I wouldn't have been promoted had I not asked.

  5. #25
    Registered User
    Join Date
    Dec 2012
    Posts
    4,445
    Tuconets
    14
    I'm kind of hijacking this thread, but CnC, do you have any experience with Lucene? Setting it up, indexing your data, using the API? There's a decent chance I'm going to join the team writing our next-generation search engine using MapReduce anyway, and it appears Lucene is the Apache solution. Are there any other solutions you'd recommend we explore?

  6. #26
    Registered User
    Join Date
    Dec 2012
    Location
    St.Louis, MO
    Posts
    104
    Tuconets
    0
    I have worked with Lucene for keyword indexing of medical documents; it is the go-to choice in a lot of deployments. Have you looked into Solr? It does a lot of the work for you by building search capabilities on top of the Lucene library.

  7. #27
    Registered User
    Join Date
    Dec 2012
    Posts
    4,445
    Tuconets
    14
    I saw Solr pop up a bunch when I was researching Lucene last night, but I wasn't 100% sure what the product does. I haven't joined the team yet and was just trying to get my feet wet and figure out how easy or difficult MapReduce searches are. I'd be on the team investigating solutions, so Lucene is just one of the options to explore, but it seems to be pretty well regarded as the de facto search solution for MapReduce.
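
To give a feel for what a "MapReduce search" involves on the indexing side, here is a minimal sketch of the classic MapReduce inverted-index job, simulated in plain Python: the map phase emits (term, record id) pairs, a shuffle groups them by term, and the reduce phase turns each group into a sorted posting list. This is not Hadoop's or Lucene's API, and the tokenizer (lowercase, split on non-alphanumeric characters) is an assumption.

Code:
```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc_id: int, text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit one (term, doc_id) pair per distinct lowercase token."""
    for term in set(re.findall(r"[a-z0-9]+", text.lower())):
        yield term, doc_id


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for term, doc_id in pairs:
        groups[term].append(doc_id)
    return groups


def reduce_phase(term: str, doc_ids: List[int]) -> Tuple[str, List[int]]:
    """Reduce: turn one term's doc ids into a sorted, de-duplicated posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    pairs = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


if __name__ == "__main__":
    index = build_index({1: "Spot's adventure", 2: "Spot the cat", 3: "A cat's adventure"})
    print(index["spot"])       # [1, 2]
    print(index["adventure"])  # [1, 3]
```

On a real cluster the map and reduce functions run on many machines and the framework does the shuffle; the logic per record stays this small.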

  8. #28
    Registered User
    Join Date
    Dec 2012
    Posts
    4,445
    Tuconets
    14
    And I'm not sure of your knowledge level with Solr, but what we really need to support is named indexing, not just full-text keyword searching. In our case we have data such as:

    Code:
    <Record>
        <001>Spot's adventure</001>
        <002>Dr. Tenks</002>
        <003>Dog</003>
    </Record>
    Basically, 001, 002 and 003 all have different meanings and conform to the MARC standards set forth by the Library of Congress. Our "se" index is for subject, so it would need to index against 003; 001 would be the title index and 002 the author index. I haven't seen a way to set something like that up using Solr/Lucene.
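
The "named indexing" described above amounts to keeping a separate inverted index per field. Below is a minimal sketch of that idea, assuming a hypothetical mapping of tags 001/002/003 to index names `ti`, `au` and `se` (only `se` comes from the post). The record is passed as a Python dict rather than parsed, since element names that start with a digit, like `<001>`, are not well-formed XML and a standard XML parser would reject them.

Code:
```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical tag-to-index mapping; only "se" (subject) is named in the thread.
TAG_TO_INDEX = {"001": "ti", "002": "au", "003": "se"}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldIndex:
    """One inverted index per named field: index name -> term -> record ids."""

    def __init__(self) -> None:
        self.postings: Dict[str, Dict[str, Set[int]]] = defaultdict(lambda: defaultdict(set))

    def add(self, record_id: int, record: Dict[str, str]) -> None:
        for tag, value in record.items():
            index_name = TAG_TO_INDEX.get(tag)
            if index_name is None:
                continue  # tags without a named index are not indexed
            for term in tokenize(value):
                self.postings[index_name][term].add(record_id)

    def search(self, query: str) -> Set[int]:
        """Answer 'index:term' queries such as 'se:dog' against that field only."""
        index_name, _, term = query.partition(":")
        return set(self.postings.get(index_name, {}).get(term.lower(), set()))


if __name__ == "__main__":
    idx = FieldIndex()
    idx.add(1, {"001": "Spot's adventure", "002": "Dr. Tenks", "003": "Dog"})
    idx.add(2, {"001": "Dog days", "002": "A. Writer", "003": "Summer"})
    print(idx.search("se:dog"))  # {1}: "dog" only as a subject
    print(idx.search("ti:dog"))  # {2}: "dog" only in a title
```

In Solr terms, each of these per-field indexes would roughly correspond to a separately declared field in the schema, queried as `field:term`.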

  9. #29
    Registered User
    Join Date
    Dec 2012
    Location
    St.Louis, MO
    Posts
    104
    Tuconets
    0
    Quote Originally Posted by Tenks
    And I'm not sure of your knowledge level with Solr, but what we really need to support is named indexing, not just full-text keyword searching. In our case we have data such as:

    Code:
    <Record>
        <001>Spot's adventure</001>
        <002>Dr. Tenks</002>
        <003>Dog</003>
    </Record>
    Basically, 001, 002 and 003 all have different meanings and conform to the MARC standards set forth by the Library of Congress. Our "se" index is for subject, so it would need to index against 003; 001 would be the title index and 002 the author index. I haven't seen a way to set something like that up using Solr/Lucene.

    I don't have enough Lucene experience to tell you whether it's good for that. I think you may be looking at an HBase problem with custom indexing.
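
One common shape for custom indexing on HBase is a separate index table whose row keys concatenate the index name, the term and the main table's row key. Because HBase keeps rows sorted by key, a range scan over the prefix `index:term|` returns every matching record. A minimal in-memory sketch, with a sorted Python list and `bisect` standing in for the index table and its scan; no HBase client API is used, and the key layout, `|` separator and 10-digit zero padding are assumptions.

Code:
```python
import bisect
from typing import List


def row_key(n: int) -> str:
    """Zero-pad numeric ids so byte-wise key order matches numeric order."""
    return f"{n:010d}"


class IndexTable:
    """In-memory stand-in for an HBase index table: row keys kept sorted."""

    def __init__(self) -> None:
        self.keys: List[str] = []

    def put(self, index_name: str, term: str, main_row_key: str) -> None:
        """Insert the key 'index:term|rowkey', ignoring duplicates."""
        key = f"{index_name}:{term}|{main_row_key}"
        pos = bisect.bisect_left(self.keys, key)
        if pos == len(self.keys) or self.keys[pos] != key:
            self.keys.insert(pos, key)

    def scan(self, index_name: str, term: str) -> List[str]:
        """Prefix scan: return main-table row keys indexed under index:term."""
        prefix = f"{index_name}:{term}|"
        start = bisect.bisect_left(self.keys, prefix)
        results = []
        for key in self.keys[start:]:
            if not key.startswith(prefix):
                break  # keys sharing a prefix are contiguous, so we can stop here
            results.append(key[len(prefix):])
        return results


if __name__ == "__main__":
    table = IndexTable()
    table.put("se", "dog", row_key(2))
    table.put("se", "dog", row_key(1))
    table.put("se", "dogs", row_key(3))
    table.put("li", "ABC", row_key(1))
    print(table.scan("se", "dog"))  # ['0000000001', '0000000002']
```

Zero-padding the numeric row keys matters because HBase compares keys as bytes, so an unpadded `10` would sort before `2`.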

  10. #30
    Registered User
    Join Date
    Dec 2012
    Posts
    291
    Tuconets
    2
    Quote Originally Posted by Tenks
    And I'm not sure of your knowledge level with Solr, but what we really need to support is named indexing, not just full-text keyword searching. In our case we have data such as:

    Code:
    <Record>
        <001>Spot's adventure</001>
        <002>Dr. Tenks</002>
        <003>Dog</003>
    </Record>
    Basically, 001, 002 and 003 all have different meanings and conform to the MARC standards set forth by the Library of Congress. Our "se" index is for subject, so it would need to index against 003; 001 would be the title index and 002 the author index. I haven't seen a way to set something like that up using Solr/Lucene.
    I'm not sure I follow what you're describing, but Solr indexes are generally segmented by the field the data came from, so you can use its query language to search against a single field. There are also useful features you can build on top of the index, like faceting: if you have a date field or a term you want to filter on, you can limit regular search results by it (see http://wiki.apache.org/solr/SolrFacetingOverview).

    You can see examples of faceting on sites like Newegg, where you can narrow a search for motherboards to specific subsets of features, etc.
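
Faceting, as described above, comes down to counting how often each value of a chosen field occurs among the hits for a query, then letting the user narrow the hits to one value. A minimal sketch over an in-memory list; the documents and the `title`/`socket` field names are hypothetical, and Solr computes these counts from its index rather than by scanning documents like this.

Code:
```python
from collections import Counter
from typing import Dict, List

Doc = Dict[str, str]


def search(docs: List[Doc], field: str, term: str) -> List[Doc]:
    """Return the docs whose `field` contains `term` (case-insensitive substring match)."""
    return [d for d in docs if term.lower() in d.get(field, "").lower()]


def facet_counts(hits: List[Doc], facet_field: str) -> Dict[str, int]:
    """Count each value of `facet_field` among the hits, most common first."""
    return dict(Counter(d[facet_field] for d in hits if facet_field in d).most_common())


def narrow(hits: List[Doc], facet_field: str, value: str) -> List[Doc]:
    """Keep only the hits whose facet field equals the chosen value (the user's click)."""
    return [d for d in hits if d.get(facet_field) == value]


if __name__ == "__main__":
    docs = [
        {"title": "ATX motherboard A", "socket": "LGA1155"},
        {"title": "Micro-ATX motherboard B", "socket": "AM3+"},
        {"title": "ATX motherboard C", "socket": "LGA1155"},
        {"title": "Graphics card D"},
    ]
    hits = search(docs, "title", "motherboard")
    print(facet_counts(hits, "socket"))  # {'LGA1155': 2, 'AM3+': 1}
    print([d["title"] for d in narrow(hits, "socket", "AM3+")])  # ['Micro-ATX motherboard B']
```

In Solr itself, facet counts are requested with parameters such as `facet=true` and `facet.field=socket`, and narrowing is typically done by adding an `fq` filter query.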

  11. #31
    Registered User
    Join Date
    Dec 2012
    Posts
    4,445
    Tuconets
    14
    Yeah, it isn't really faceting that I need. We have far too many indices for me to see how we could display facets in a usable UI. I am completely ignorant about Solr, so I'm just trying to gather some information about the product. I don't even know exactly what a query string looks like, or whether you call it via HTTP or something else. I assume I'd at least have to put a layer in the middle to translate my index names into something Solr can use.

  12. #32
    Registered User
    Join Date
    Dec 2012
    Posts
    291
    Tuconets
    2
    Quote Originally Posted by Tenks
    Yeah, it isn't really faceting that I need. We have far too many indices for me to see how we could display facets in a usable UI. I am completely ignorant about Solr, so I'm just trying to gather some information about the product. I don't even know exactly what a query string looks like, or whether you call it via HTTP or something else. I assume I'd at least have to put a layer in the middle to translate my index names into something Solr can use.
    I'm ignorant of how your data is laid out, but if there is a common identifier for every entry in your source data, you could create a single index for all of it. For example, take a table that has secondary metadata tables containing extra info for each row. The original row data and all of its metadata could be put into a single index, with each piece of data segmented by where it was sourced, so you can limit your search to the metadata, the main table, or both.
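
The single-index idea above is essentially denormalization: join each main row with its metadata rows on the shared identifier and store the result as one document, with field names prefixed by their source so a search can target the main table, the metadata, or both. A minimal sketch; the `main_`/`meta_` prefixes, the `(id, field, value)` shape of the metadata rows and the sample data are hypothetical.

Code:
```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Doc = Dict[str, List[str]]


def build_documents(main_rows: Dict[int, Dict[str, str]],
                    meta_rows: Sequence[Tuple[int, str, str]]) -> Dict[int, Doc]:
    """Fold each main row and its (id, field, value) metadata rows into one document."""
    docs: Dict[int, Doc] = {}
    for row_id, row in main_rows.items():
        # Prefix main-table fields with "main_" so their source stays visible.
        docs[row_id] = defaultdict(list, {f"main_{k}": [v] for k, v in row.items()})
    for row_id, field, value in meta_rows:
        if row_id in docs:  # metadata with no matching main row is skipped
            docs[row_id][f"meta_{field}"].append(value)
    return docs


def search(docs: Dict[int, Doc], term: str, sources: Set[str]) -> List[int]:
    """Return ids of docs where a field from one of `sources` contains `term`."""
    term = term.lower()
    hits = []
    for row_id, doc in docs.items():
        for field, values in doc.items():
            source = field.split("_", 1)[0]  # "main" or "meta"
            if source in sources and any(term in v.lower() for v in values):
                hits.append(row_id)
                break
    return sorted(hits)


if __name__ == "__main__":
    main = {1: {"title": "Spot's adventure"}, 2: {"title": "Cats at home"}}
    meta = [(1, "note", "About cats"), (2, "note", "Indoor pets")]
    docs = build_documents(main, meta)
    print(search(docs, "cats", {"main"}))          # [2]
    print(search(docs, "cats", {"main", "meta"}))  # [1, 2]
```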

    Solr has several built-in query parsers/syntaxes you can choose from (or you can roll your own). This is the default one; I'm not sure if it helps: http://wiki.apache.org/solr/SolrQuerySyntax
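
For reference, Solr is queried over HTTP, typically a GET to a core's `select` handler with the query in the `q` parameter using the standard `field:value` syntax, plus optional `fq` filter queries. A minimal sketch that only builds such a URL (nothing is sent); the host, the default port 8983, the core name `catalog` and the field names `subject`/`language` are assumptions.

Code:
```python
from typing import Sequence
from urllib.parse import urlencode


def solr_select_url(base: str, core: str, query: str,
                    filters: Sequence[str] = (), rows: int = 10) -> str:
    """Build a GET URL for a Solr core's select handler; the request is not sent."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    # Each fq (filter query) restricts the results without affecting relevance scores.
    params += [("fq", f) for f in filters]
    return f"{base.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(solr_select_url("http://localhost:8983/solr", "catalog",
                          "subject:cats", filters=["language:eng"]))
    # http://localhost:8983/solr/catalog/select?q=subject%3Acats&rows=10&wt=json&fq=language%3Aeng
```

`urlencode` percent-encodes the `:` and any quotes in the query, so the parser receives the `field:value` text unchanged.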

  13. #33
    Registered User
    Join Date
    Dec 2012
    Posts
    4,445
    Tuconets
    14
    I'll try to be a bit more thorough with the data description, then.

    Basically, our HBase table is keyed by an incrementing number: 1, 2, 3 and so on. I don't know exactly how many keys there are, but it's around 100M. Under these keys there are various qualifiers, but the big important ones are the XML blob and holdings. A holding basically means that a library has this book, and there are over 1B of these entries. The XML blob is a big chunk of metadata about the book, with things like title, languages, ISBN/ISSN, author, etc. In general, here are some common searches:

    "no:123456" --> Returns 123456
    "li:ABC" --> Returns all numbers that library ABC has holdings on
    "se:cats" --> Returns all numbers where /root/040/sa/d contains the phrase "cats" (for the sake of argument, let's say the LoC MARC standard puts subjects in field 040, subfield "a")

    There are many more indexes, but these are a few examples. Most of them are similar to the "se:" one, where we simply index a field/subfield. I was just wondering if Solr could handle all of these use cases.
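
The "layer in the middle" mentioned earlier could be a small translator from these index prefixes to Solr field queries. A minimal sketch; the Solr field names `id`, `holding_library` and `subject` are hypothetical, and each value is wrapped in double quotes (with `\` and `"` escaped) so the standard query parser treats it as a phrase.

Code:
```python
# Hypothetical mapping from the thread's index prefixes to Solr field names.
PREFIX_TO_FIELD = {"no": "id", "li": "holding_library", "se": "subject"}


def to_solr_query(user_query: str) -> str:
    """Translate 'prefix:value' (e.g. 'se:cats') into a quoted Solr field query."""
    prefix, sep, value = user_query.partition(":")
    if not sep or prefix not in PREFIX_TO_FIELD:
        raise ValueError(f"unknown index in query: {user_query!r}")
    # Escape backslashes first, then double quotes, so the value stays one phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{PREFIX_TO_FIELD[prefix]}:"{escaped}"'


if __name__ == "__main__":
    for q in ["no:123456", "li:ABC", "se:cats"]:
        print(q, "->", to_solr_query(q))
    # no:123456 -> id:"123456"
    # li:ABC -> holding_library:"ABC"
    # se:cats -> subject:"cats"
```

This assumes holdings are stored as a multivalued `holding_library` field on each record's document, so `li:ABC` becomes an ordinary field query; covering the other indexes is then mostly a matter of adding entries to the mapping.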

  14. #34
    Registered User
    Join Date
    Dec 2012
    Posts
    291
    Tuconets
    2
    Yup, and it sounds like it could be a single index, which from the look of it would greatly simplify your implementation. Take what I say with a grain of salt, though: all of this is knowledge I've gained from looking into how Solr could give us a unified search and from talking with various companies that offer Solr support contracts. I don't have any first-hand experience, but our product provides searching similar to what you're describing, and the contractors have all said that what I described would scale well.

  15. #35
    Registered User
    Join Date
    Dec 2012
    Posts
    260
    Tuconets
    0
    Re-derailing a bit, but on the consulting topic:

    There is a LOT to be said for being the money maker. If you work in IT or development at a company where you aren't central to its primary product, you are treated completely differently than if you are the money maker. It's pretty easy to justify high pay when you can point at $350k in billable hours in the past year. You also (in my experience) get to do a lot of interesting and cool shit, because there are generally 2 scenarios in which people are willing to pay big bucks for a team of consultants:

    1) Shit is totally fucked, and they need to be rescued.
    2) They want to do really cool things but have no fucking clue how to do it and don't trust their own dev team to do it (from what I've seen, this is usually a good choice too).

    Fixing #1 can also involve #2, so that scenario isn't all bad.

    The upsides of consulting:
    1) Fast salary growth
    2) Do cool shit
    3) Work with some really experienced people and be able to tap those resources. I was working with a beta SDK from Microsoft, trying to integrate something into a Windows 8 app that wasn't really supposed to be done yet. One of the main resources for that product in the country was an IM away, and a guy who had worked on Silverlight and then Windows 8 for the past 5 years was across the room. It would have taken significantly longer without their help.
    4) Generally a much better culture (at our company, anyway) than traditional stodgy large companies. Not startup level (although we do have a kegerator), but a long way from Office Space. When your business is technology, you tend to get more smart people and fewer pointy-haired bosses.

    Downsides:
    1) Can be a lot of travel, depending on the company. You might have to work at the client site even if they are local.
    2) You never "own" a product that you then tinker with for years. You go in, build something (hopefully cool), and then hand it off to the customer.
    3) Not a lot of fuckoff time where you can be lazy and feel like you've "earned it" from previous accomplishments. Those accomplishments were a different team and a different time; there's very much a "what have you done for me lately" feel to it.
