Ask HN: Built a DB of 50M+ Org Names for API use. Should it be made public?

2 points by sagnikghosh 2 days ago

TLDR: Built an internal API that maps 50M+ organization names to their domains and logos. Planning to make this API endpoint publicly available for other devs to use. Good idea or bad?

I run an AI startup where we fine-tune the open-source Llama and Falcon models to turn them into enterprise-grade models with longer context windows and better reasoning capabilities. Along the way we ended up collecting data on over 50 million organizations. Recently one of our customers came to us with a use case: they want to use AI to build an auto-populating CRM. So right now we have an internal API that maps organization domains to their official names and logos. It's useful for devs who want to fetch organization logos and names from domains, or vice versa. Should I convert the API into a publicly accessible one for people to use in their projects?

westurner 2 days ago

Yeah, how do you indicate uncertainty in the AI-generated estimated correspondences? W3C CSVW supports dataset-, column-, and cell-level metadata. E.g., the OpenCog AtomSpace hypergraph supports an Attention Value and a Truth Value.
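As a rough sketch of what cell-level uncertainty could look like in practice: each mapped value carries its own confidence score and provenance, so a consumer can filter on the name without trusting the logo. The field names (`confidence`, `source`), the 0.0 to 1.0 scale, and the 0.8 threshold are all assumptions for illustration; they are not taken from CSVW, OpenCog, or the API described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """One value plus cell-level metadata (hypothetical field names)."""
    value: Optional[str]
    confidence: float  # assumed scale: 0.0 (guess) to 1.0 (verified)
    source: str        # assumed labels, e.g. "model" or "manual"


@dataclass(frozen=True)
class OrgRecord:
    """A domain with independently scored name and logo cells."""
    domain: str
    name: Cell
    logo_url: Cell


def trusted_name(record: OrgRecord, threshold: float = 0.8) -> Optional[str]:
    """Return the name only if its confidence meets the threshold."""
    if record.name.confidence >= threshold:
        return record.name.value
    return None


# Placeholder data, not a real record.
record = OrgRecord(
    domain="example.com",
    name=Cell("Example Inc.", confidence=0.93, source="model"),
    logo_url=Cell("https://example.com/logo.png", confidence=0.41, source="model"),
)
print(trusted_name(record))
```

Scoring each cell separately means a low-confidence logo does not force a well-matched name to be discarded.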

Are there surprising regional and temporal trends in the names?

RDFS specifies a standard vocabulary for classes and subclasses, and properties and subproperties: rdfs:Class, rdf:Property, rdfs:subClassOf, rdfs:subPropertyOf.

There are schema.org properties on the schema:LocalBusiness class for various business identifiers and other attributes:

https://schema.org/url : domain

https://schema.org/identifier and subproperties: https://schema.org/duns , https://schema.org/taxID

https://schema.org/areaServed

https://schema.org/brand (range: https://schema.org/Brand or https://schema.org/Organization )

Maybe a https://schema.org/Dataset that isPartOf a https://schema.org/ScholarlyArticle :

https://schema.org/isPartOf
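To make the properties above concrete, here is a minimal sketch of serializing one record as schema.org JSON-LD. The function name `org_to_jsonld`, its parameters, the default `Organization` type, and building the URL as `https://` plus the domain are assumptions for illustration, not the API's actual output format.

```python
import json


def org_to_jsonld(name, domain, logo_url=None, duns=None, tax_id=None,
                  area_served=None, brand=None, schema_type="Organization"):
    """Build a schema.org JSON-LD dict for one organization (hypothetical helper)."""
    doc = {
        "@context": "https://schema.org",
        "@type": schema_type,
        "name": name,
        # Assumption: the canonical site is served over HTTPS at the bare domain.
        "url": f"https://{domain}",
    }
    optional = {
        "logo": logo_url,
        "duns": duns,
        "taxID": tax_id,
        "areaServed": area_served,
    }
    # Emit only the properties that are actually known.
    doc.update({key: value for key, value in optional.items() if value is not None})
    if brand is not None:
        doc["brand"] = {"@type": "Brand", "name": brand}
    return doc


# Placeholder values, not real identifiers.
doc = org_to_jsonld(
    "Example Inc.",
    "example.com",
    logo_url="https://example.com/logo.png",
    duns="123456789",
    brand="Example",
)
print(json.dumps(doc, indent=2))
```

Passing schema_type="LocalBusiness" would emit the subclass mentioned above; unknown properties are omitted rather than written as null.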