How to Pull Data From Various Sources
A Startup's Guide to Importing Data Feeds From Multiple Sources.
Whether you’re receiving customer orders, inventory, documents, media, or anything else, bringing in data from customers and partners is necessary for your business’s survival.
As a startup, you’re at a point where decisions you make now will surely shape the way data reaches your organization in the years to come.
So now is the time to take a good look at every aspect of how you bring in the data your business needs, and to build a good foundation that you can consistently rely on.
Find your data
The first step is to understand what data your business needs. Every business is different, and understanding your data requirements is not easy. But once you know what data you need, and where that data is located (and how to access it there), your work has only just begun: now you can start building for real.
Receive your data
Once you know what data you need, where the data is and how to access it, you can start receiving data.
Our goal is to set up reliable, stable and controlled processes that continuously sync data into the system — i.e. a data pipeline.
Modern cloud infrastructure has made this type of task significantly easier than it used to be a decade ago. That said, modern data requirements — such as size, distribution, processing, analysis, and near real-time sync — mean that the challenge involved is still far from trivial.
At a high level, there are three main options for building your data pipelines:
Developing in house
Hire people who know data feeds or learn about them yourself, choose the right tools for working with your data sources, and then build and maintain your data pipelines.
This might work best for you if your organization has “data DNA”. If you are already focused on data, then in the long run you’ll probably find it useful to fully control your data infrastructure.
But if data is not exactly your thing, or if focusing on your data pipelines is too much of a distraction from your core business — consider outsourcing or buying.
Outsourcing your data pipelines to service providers
Let others, experts with the right knowledge and experience, build and maintain your data pipelines.
This might increase costs in the short term, but in the medium to long term it will let you focus on your core business while your data comes directly to your doorstep.
Even if you later consider bringing those incoming data pipelines back in-house, outsourcing is still a good option until you’re ready for in-house development.
So long as you can agree on a structure for your data and the update frequency your system needs, you should find it easy to interface with your outsourcers.
Make sure to choose an outsourcing team that has high development standards and is truly ready for the production challenges ahead.
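The agreed structure and update frequency mentioned above can be written down as a small, checkable contract. Here is a minimal sketch of that idea in Python. Everything in it is hypothetical and for illustration only: the `FeedContract` class, the `check_record` and `is_stale` helpers, and the `ORDERS` example with its `order_id` and `quantity` fields are assumptions, not part of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FeedContract:
    """Hypothetical agreement between you and whoever delivers the feed."""
    required_fields: dict   # field name -> expected Python type
    max_age: timedelta      # agreed update frequency


def check_record(contract, record):
    """Return a list of problems found in one record (empty means OK)."""
    problems = []
    for name, expected in contract.required_fields.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type for {name}")
    return problems


def is_stale(contract, last_update, now=None):
    """True when the feed has not been refreshed within the agreed window."""
    now = now or datetime.now(timezone.utc)
    return now - last_update > contract.max_age


# Illustrative contract: an orders feed refreshed at least once an hour.
ORDERS = FeedContract(
    required_fields={"order_id": str, "quantity": int},
    max_age=timedelta(hours=1),
)
```

Running `check_record` on every incoming record, and `is_stale` on a schedule, gives both sides a shared and testable definition of "the feed is working".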
Whether you’re developing or outsourcing, you’ll want to minimize development to reduce human error and friction. The best way to do that is by using a ready-to-go proven product to do most of the work for you.
Using products to get your incoming data ready for processing
Choose a product that will cover most of your data pipelining needs, so that all that’s left for you to do is to process your data internally.
A good product should cover the vast majority of your incoming data needs, and at least be able to:
- Pull from your data sources
- Receive data sent to you
- Control and monitor what’s coming in
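To make the three capabilities above concrete, here is a toy, in-memory sketch in Python. It is not how any real product works: the `Intake` class and its method names are made up for illustration, and a real system would add storage, authentication, and scheduling on top.

```python
from collections import Counter


class Intake:
    """Toy in-memory intake covering pull, receive, and monitoring."""

    def __init__(self):
        self.sources = {}        # source name -> zero-argument callable returning records
        self.records = []        # (source name, record) pairs accepted so far
        self.counts = Counter()  # accepted records per source, for monitoring

    def add_source(self, name, fetch):
        """Register a source that can be pulled from on demand."""
        self.sources[name] = fetch

    def _accept(self, name, record):
        self.records.append((name, record))
        self.counts[name] += 1

    def pull_all(self):
        """Pull: ask every registered source for its latest records."""
        for name, fetch in self.sources.items():
            for record in fetch():
                self._accept(name, record)

    def receive(self, name, record):
        """Receive: accept a record that a partner pushed to you."""
        self._accept(name, record)

    def status(self):
        """Monitor: a plain per-source summary of what has come in."""
        return dict(self.counts)
```

A short usage example, with made-up source names:

```python
intake = Intake()
intake.add_source("inventory", lambda: [{"sku": "X"}, {"sku": "Y"}])
intake.pull_all()
intake.receive("orders", {"order_id": "A1"})
print(intake.status())
```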
Avoid having a single point of failure and getting locked into a specific tool: rely on products that you can easily migrate away from.
You know where the data is and how you’re going to pull and process it. Great, in theory. To avoid sleepless nights, you’ll have to think production.
In production, with active incoming data pipelines, things can get a bit difficult. Here is a checklist to focus on before taking your data pipeline to the front lines:
- Ease of development - how fast can you get new modules up and running?
- Performance - your business will be growing, and so will the amount of data you’ll bring in. Will your solution be able to handle the growth?
- Making changes - in your business logic, in the underlying technology stack, or to handle changes made by the source supplying the data.
- Error handling - when production issues occur, will you have the right tools to handle them (be that detailed logs, metrics, debugging tools, intermediate results, or anything else)?
- Optimization - Will you be able to optimize your pipeline in the future to better fit a new stack or reduce cloud costs?
- Reliability and availability - Your data pipelines need to be up and running 24/7, without losing any of your precious data.
- Support - from your infrastructure provider; or from the community in the case of an open-source solution.
- Human resources - Does your solution require knowledgeable developers who’ll need to write low-level code?
- Backup and disaster recovery
- Monitoring - Even less technical employees will need to be able to look at and understand the state of the incoming data, and to report either internally or to your customers and/or partners.
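As one small illustration of the error handling and reliability items in this checklist, here is a sketch of a retry wrapper with logging, written in Python. The function name `run_with_retries`, its parameters, and the logger name `pipeline` are assumptions chosen for this example. Retrying with exponential backoff is one common approach to transient failures, not the only one.

```python
import logging
import time

log = logging.getLogger("pipeline")


def run_with_retries(step, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `step`, retrying on failure with exponential backoff.

    Every failure is logged so there is a trail to debug from, and the
    last error is re-raised once the attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            # Wait base_delay, then 2x, then 4x, ... before trying again.
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists only so the waiting can be swapped out in tests. In production you would leave it at its default and point the logger at whatever monitoring you already use.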
As a startup, you want to stay focused.
As you explore possible solutions, bear in mind the following criteria:
- Bringing in your data reliably
- Spending minimal resources on data engineering
- Having the ability to bring in new data fast
- Keeping both development and back-office operations on top of the incoming data
Keep yourself lean, with as few live systems and as little operational complexity as possible. Data can be complicated, but your first priority is to reach market fit quickly, and then to grow from there on a firm foundation.
ImportFeed makes collecting, sorting, and organizing documents a breeze.
Join ImportFeed today to start receiving clients’ documents and scale.