Internet as Datasets + API

Big Data of the Web

prev. 196 d
Datasets / Sites: 1,563 (+ 162)
Tables: 9,159 (+ 377)
Columns: 67,550 (+ 11,933)
Rows: 2,204,520,021 (+ 102,695,331)
Data Bytes: 367.24G (+ 3.85G)
Media Bytes: 1.60T (+ 86.91G)
Media Items: 34,514,624 (+ 2,888,627)
2020-04-04 11:52:06 (132 d ago)

What is DataSN? or Data Source Network?

Data crawler and parser of ALL websites


DataSN, or Data Source Network, crawls, parses, and hosts the data of the Internet: not raw web pages, but data objects that are both machine-friendly and human-readable. More than a website scraper, DataSN extracts, cleanses, normalizes, categorizes, and formats data.
A web of sites that we crawl and parse
Data values incrementally updated and time-stamped

Easy incremental updates


Every table row is stamped with the time it was created or last updated, so you can easily find the rows that were created or updated since your last retrieval. Some datasets have history traceback enabled, with all historical values of a given field archived.
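As a rough illustration of how a client might use those row timestamps, the sketch below filters an in-memory list of rows by a last-retrieval time. The column name `updated_at`, the sample rows, and the helper `rows_changed_since` are all assumptions made for this example; they are not taken from DataSN's documentation.

```python
from datetime import datetime, timezone

# Hypothetical rows. "updated_at" is an assumed column name,
# not a documented DataSN field.
rows = [
    {"id": 1, "name": "alpha", "updated_at": "2020-04-01T09:00:00+00:00"},
    {"id": 2, "name": "beta", "updated_at": "2020-04-03T18:30:00+00:00"},
    {"id": 3, "name": "gamma", "updated_at": "2020-04-04T11:52:06+00:00"},
]


def rows_changed_since(rows, last_retrieval):
    """Return the rows created or updated strictly after last_retrieval."""
    return [
        row
        for row in rows
        if datetime.fromisoformat(row["updated_at"]) > last_retrieval
    ]


# Pretend the previous download happened at midnight UTC on 2020-04-02.
last_retrieval = datetime(2020, 4, 2, tzinfo=timezone.utc)
changed = rows_changed_since(rows, last_retrieval)
print([row["id"] for row in changed])
```

In practice the same comparison would more likely be pushed to the server as a query filter, so only the changed rows are transferred.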

Clean and atomic data


DataSN data sets are rigorously sanitized and cleansed, ready for software consumption. We frown upon unparsed strings and raw bytes, providing atomic data values that are immediately usable by the simplest of programs.
Data values are atomic and thus easy for programs to consume
Data are meaningfully titled and tagged

Meaningful data names


DataSN columns and tables are properly named after the meaning and semantic nature of the data so you instantly know what the data is about.

Data are live


Data are published via API as soon as they are crawled, so your program knows what is happening in the real world by the minute.
Data are scraped, parsed, and published in real time
Data are format-neutral at DataSN, so you can get our datasets in any format or file type

All popular formats


Data should be formatless rather than bound to a specific application. DataSN data is format-neutral and not tied to any proprietary application: the same piece of data is delivered in every format you can imagine, the most popular being JSON, XML, CSV, Excel, and HTML. Advanced formats, such as MySQL and MSSQL, are available on request.
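The idea of one record delivered in several formats can be sketched with Python's standard library alone. The record and the helper names `to_json`, `to_csv`, and `to_xml` below are made up for illustration and say nothing about how DataSN itself produces its exports.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# A made-up record standing in for one data row.
record = {"id": 7, "title": "Example item", "price": 19.5}


def to_json(rec):
    """Serialize the record as a JSON object."""
    return json.dumps(rec, sort_keys=True)


def to_csv(rec):
    """Serialize the record as a header line plus one CSV data line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buf.getvalue()


def to_xml(rec):
    """Serialize the record as a <row> element with one child per field."""
    root = ET.Element("row")
    for key, value in rec.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


print(to_json(record))
print(to_csv(record), end="")
print(to_xml(record))
```

Each function reads the same dictionary, so the formats differ only in presentation, not in content.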

Data are related


All DataSN data records are related to one another, forming a traversable network of relations, or knowledge map. Data are structurally normalized to reduce redundancy and to facilitate association, categorization, and searching.
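A minimal sketch of what normalized, related records allow: each fact is stored once, and relations are followed through keys in either direction. The two tables, the `author_id` key, and both helper functions are hypothetical and only illustrate the general technique.

```python
# Hypothetical normalized tables: each author is stored once and
# referenced from books by key instead of being repeated in every row.
authors = {
    1: {"name": "A. Writer"},
    2: {"name": "B. Author"},
}
books = [
    {"id": 10, "title": "First Book", "author_id": 1},
    {"id": 11, "title": "Second Book", "author_id": 1},
    {"id": 12, "title": "Third Book", "author_id": 2},
]


def books_by_author(author_id):
    """Follow the relation from an author to the titles of their books."""
    return [book["title"] for book in books if book["author_id"] == author_id]


def author_of(book_id):
    """Follow the relation from a book back to its author's name."""
    book = next(b for b in books if b["id"] == book_id)
    return authors[book["author_id"]]["name"]


print(books_by_author(1))
print(author_of(12))
```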
Data objects are all related to each other to form a knowledge map
DataSN datasets come with related media files

Did we mention the media files?


DataSN crawls not just text but also the media. Media files, such as images, are exhaustively collected, meticulously tagged, categorized, and associated with their particular data rows, so they are searchable and retrievable by the information about them.

READY TO TAKE ACTION?

Access All Data Now!

Popular Data & APIs
Some of DataSN's most accessed data sets and APIs in the last 14 days

Browse Data + APIs + Images
HTML / JSON / XML / Excel / CSV / PDF / Media / MySQL / MSSQL / WordPress / Magento / ...

NETWORK SITE
Usable Databases
NETWORK SITE
The Data Planet

Data Source Network

DataSN refines and connects databases across the world to facilitate data reuse and analysis, yield new knowledge, and uncover curious correlations buried in the data.

GET DATA OF ANY WEBSITE

GOT A DIFFERENT IDEA?

Let's Talk

EARLY BIRDS EMAIL LIST

Subscribe to be notified of major data releases and updates.

Terms of Use | Privacy Policy | Disclaimer | [email protected] | PayPal: [email protected] | +86 (158) 0293-6510 | Shangpin Guoji, 88 Gaoxin Road, Xian 710000, China
© 2017 - 2020 DataSN.io, DataSN.com, Data Source Network. Data Sets / Databases, APIs, Scrapers.