Data Browser & Catalog

The Data Browser is the discovery and governance layer of the DataFlow AI Platform. It surfaces an OpenMetadata-backed catalog that lets you search, explore, and understand every table, column, and dataset across Polkomtel's heterogeneous data estate — Teradata DWH-MONA, Snowflake, SAP HANA, Databricks, and MSSQL/Sybase. It is search-first and governance-visible: PII tags, certification badges, and quality scores are always on display.


New here? What a "data catalog" is

A large company like Polkomtel stores data in dozens of different databases. Nobody can remember what is in all of them. A data catalog solves this — it is a searchable directory of every table and column the company has, no matter which database it lives in. Think of it as the library catalogue for the company's data: you search it to find what you need, then read its "card" to understand what you found.

The Data Browser is that catalogue. With it you can:

  • Search for data by keyword — type "churn" and see every related table across all systems.
  • Browse data grouped by business area (Customers, Billing, Network, and so on).
  • Understand a table before you use it — see its columns, a sample of real rows, and statistics.
  • Trust it — see whether it has been quality-checked and certified, and whether it contains private data.
  • Trace where a piece of data came from and where it flows.

A few words you will see on these screens:

  • Table — a grid of data, like a spreadsheet tab. Columns are its fields (e.g. FIRST_NAME); rows are its records (one per customer).
  • Schema — a named group of related tables inside a database.
  • PII — Personally Identifiable Information: data that identifies a real person (a name, a phone number). It is marked with a special tag because it must be handled carefully.
  • Certified — a badge meaning a data steward has reviewed the table and vouches it is trustworthy.
  • Lineage — the documented journey of data, showing where a table's values came from and what flows out of it.
  • Profiling — automatic statistics about a column: how many rows are empty, how many distinct values it has, its smallest and largest values, and so on.

No request to engineering needed

The whole point of the Data Browser is self-service. An analyst can find, understand, and assess a dataset on their own — without filing a ticket or waiting for an engineer to explain what a table contains.


What the Data Browser does

The Data Browser answers one core question: "What data do we have, and can I trust it?" It is built around a search bar and a set of governance-aware views that let you move from a keyword to a fully understood table — its columns, sample rows, profiling statistics, lineage, and quality rules.

Route base: /data-browser. Entry file: src/pages/DataBrowser.tsx. The catalog is backed by the metadata-service CatalogController and an OpenMetadata-compatible catalog.

Design principles

  • Search-first — the search bar is the primary interaction; everything else is secondary navigation.
  • Progressive disclosure — the landing page shows domains and popular tables; detail arrives on click-through.
  • Governance-visible — PII tags, certification badges, and quality scores are never hidden.

Who uses it

Persona                    | How they use it
---------------------------+----------------------------------------
Marek — Business Analyst   | Discover tables by keyword or domain without asking engineering
Tomasz — Data Steward      | Verify PII tagging and certification, inspect column lineage
Anna — Data Engineer       | Preview sample data before building a pipeline; jump to Design Studio
Katarzyna — Platform Admin | Audit estate coverage and ownership

Screen layout — Browse / Search landing

The landing page (/data-browser) leads with a large search bar and arranges discovery surfaces below it.

+----------------------------------------------------------------------+
|  DATA BROWSER                                                        |
|  Discover, explore, and understand your data assets.                 |
|                                                                      |
|  +----------------------------------------------------------------+  |
|  | [search]  Search tables, columns, datasets...                  |  |
|  +----------------------------------------------------------------+  |
|                                                                      |
|  BROWSE BY DOMAIN                                                    |
|  +--------+ +---------+ +---------+ +--------+ +-------------+        |
|  |  CRM   | | Billing | | Network | |  CDR   | | Reference   |        |
|  | 47 tbl | | 32 tbl  | | 28 tbl  | | 19 tbl | | Data 14 tbl |        |
|  +--------+ +---------+ +---------+ +--------+ +-------------+        |
|                                                                      |
|  +-----------------------------+  +-------------------------------+  |
|  | RECENTLY VIEWED             |  | POPULAR TABLES                |  |
|  | DIM_CUSTOMER     3 min ago  |  | FACT_REVENUE             96   |  |
|  | FACT_CDR_DAILY   1 hr ago   |  | DIM_SUBSCRIBER           91   |  |
|  | BIURO_SPRZEDAZY  yesterday  |  | DIM_CUSTOMER             88   |  |
|  | STG_NETWORK_EV.  2 days ago |  | FACT_CDR_DAILY           85   |  |
|  +-----------------------------+  +-------------------------------+  |
|                                                                      |
|  TAGS                                                                |
|  [ PII 84 ]  [ Confidential 31 ]  [ Certified 67 ]  [ Draft 23 ]     |
|  [ Deprecated 8 ]                                                    |
+----------------------------------------------------------------------+

UI controls

Control         | Behaviour
----------------+----------------------------------------
Search bar      | Typeahead with debounced suggestions; Enter navigates to /data-browser/search?q=
Domain cards    | Five cards (CRM, Billing, Network, CDR, Reference Data) with table counts; click filters search by domain
Recently Viewed | Tables you opened recently, with relative timestamps
Popular Tables  | Most-used tables, sorted by a 0–100 popularity score
Tag Cloud       | Filter pills for PII, Confidential, Certified, Draft, and Deprecated, each with a count
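
The relative timestamps in the Recently Viewed list ("3 min ago", "1 hr ago", "yesterday") could be produced by a small formatter like the one below. This is a minimal sketch, not the platform's implementation: the function name `formatRelativeTime`, the "just now" label, and the exact thresholds are assumptions chosen to match the labels in the mockup above.

```typescript
// Hypothetical helper: formats how long ago a table was viewed.
// Thresholds are assumptions chosen to reproduce the mockup labels.
const MINUTE_MS = 60000;

function formatRelativeTime(viewedAtMs: number, nowMs: number): string {
  // Clamp to zero so clock skew never yields a negative age.
  const minutes = Math.floor(Math.max(0, nowMs - viewedAtMs) / MINUTE_MS);
  if (minutes < 1) return "just now";
  if (minutes < 60) return `${minutes} min ago`;

  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours} hr ago`;

  const days = Math.floor(hours / 24);
  return days === 1 ? "yesterday" : `${days} days ago`;
}

// Example: a table opened three minutes before "now".
const exampleNow = Date.UTC(2024, 0, 15, 12, 0, 0);
const exampleLabel = formatRelativeTime(exampleNow - 3 * MINUTE_MS, exampleNow); // "3 min ago"
```

Passing the current time in as a parameter, rather than reading the clock inside the function, keeps the formatter easy to test.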

Governance tags

Every table and column can carry catalog tags, color-coded so governance status is visible at a glance.

Tag          | Meaning
-------------+----------------------------------------
PII          | Contains personally identifiable information (e.g. MSISDN, names)
Confidential | Sensitive business data
Certified    | Reviewed and endorsed as trustworthy
Draft        | Not yet reviewed
Deprecated   | Scheduled for retirement — avoid building new pipelines on it

Search results

Searching navigates to /data-browser/search?q={query}, a two-pane layout: a filter sidebar on the left and a results list on the right.

+------------------+  +--------------------------------------------+
| FILTERS          |  | 12 results for "churn"    Sort: [Relevance]|
|                  |  |                                            |
| Database         |  | +----------------------------------------+ |
| [x] Teradata     |  | | DIM_CUSTOMER            [PII][Certified]| |
| [x] Snowflake    |  | | Snowflake.DWH.DIM_CUSTOMER             | |
| [ ] SAP HANA     |  | | Customer dimension with churn scoring..| |
| [ ] Databricks   |  | | Popularity: 88   Owner: Anna Nowak     | |
| [ ] MSSQL        |  | +----------------------------------------+ |
|                  |  |                                            |
| Tags             |  | +----------------------------------------+ |
| [ ] PII          |  | | FACT_CHURN_MONTHLY          [Certified]| |
| [ ] Confidential |  | | Snowflake.DWH.FACT_CHURN_MONTHLY       | |
| [ ] Certified    |  | | Monthly churn analysis fact table...   | |
|                  |  | | Popularity: 76   Owner: Marek Lew...   | |
| Domain / Owner   |  | +----------------------------------------+ |
| [ Clear Filters ]|  |  ... (more results, scrollable)            |
+------------------+  +--------------------------------------------+

The filter sidebar groups facets into Database, Tags, Domain, and Owner sections. The results header shows the result count and a Sort dropdown with three options: Relevance, Popularity, and Recently Updated. Each result card shows the table name, fully qualified name, a truncated description, tag pills, the popularity score, and the owner; the matched query term is highlighted.
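
The interplay of facet filters and the Sort dropdown can be sketched as two pure functions over the result cards. Everything here is illustrative: the `ResultCard` field names, the numeric `relevance` score, the AND semantics for ticked tags, and the idea that the first segment of the fully qualified name identifies the database are all assumptions, not the catalog's documented schema.

```typescript
// Illustrative shapes only: field names are assumptions, not the catalog's schema.
type SortOption = "relevance" | "popularity" | "recentlyUpdated";

interface ResultCard {
  fqn: string;        // fully qualified name, e.g. "Snowflake.DWH.DIM_CUSTOMER"
  tags: string[];
  popularity: number; // 0-100
  relevance: number;  // assumed search score, higher is better
  updatedAt: number;  // epoch milliseconds
}

interface Facets {
  databases: string[]; // empty array means "no restriction"
  tags: string[];
}

// Assumption: the first segment of the fully qualified name is the database.
function databaseOf(card: ResultCard): string {
  return card.fqn.split(".")[0] ?? "";
}

function applyFacets(cards: ResultCard[], facets: Facets): ResultCard[] {
  return cards.filter((card) => {
    const dbOk =
      facets.databases.length === 0 || facets.databases.includes(databaseOf(card));
    // Assumption: every ticked tag must be present (AND semantics).
    const tagsOk = facets.tags.every((tag) => card.tags.includes(tag));
    return dbOk && tagsOk;
  });
}

function sortResults(cards: ResultCard[], by: SortOption): ResultCard[] {
  const key = (c: ResultCard): number =>
    by === "popularity" ? c.popularity : by === "recentlyUpdated" ? c.updatedAt : c.relevance;
  // Copy before sorting so the caller's array is left untouched.
  return cards.slice().sort((a, b) => key(b) - key(a));
}
```

Keeping filtering and sorting separate means the Sort dropdown can change without re-evaluating the facets, and vice versa.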


Table detail view

Clicking a result opens the Table Detail View at /data-browser/table/{database}/{schema}/{table}. A header strip carries the fully qualified name, certification badge, description, owner, last-updated time, row and column counts, tags, and domain — plus a Create pipeline from this table button that hands off to the Design Studio.
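
As a rough illustration of the route shape, the helpers below build a Table Detail path from its three parts or from a fully qualified name such as Snowflake.DWH.DIM_CUSTOMER. The helper names are hypothetical, and the sketch assumes that a fully qualified name always has exactly three dot-separated parts; names that themselves contain dots are not handled.

```typescript
// Hypothetical helpers for the Table Detail route described above.
const TABLE_ROUTE_BASE = "/data-browser/table";

function tableDetailPath(database: string, schema: string, table: string): string {
  // Encode each segment so spaces or slashes cannot break the path.
  const segments = [database, schema, table].map((s) => encodeURIComponent(s));
  return `${TABLE_ROUTE_BASE}/${segments.join("/")}`;
}

// Assumption: a fully qualified name is "database.schema.table".
function pathFromFqn(fqn: string): string {
  const parts = fqn.split(".");
  if (parts.length !== 3) {
    throw new Error(`Expected database.schema.table, got "${fqn}"`);
  }
  return tableDetailPath(parts[0], parts[1], parts[2]);
}

const examplePath = pathFromFqn("Snowflake.DWH.DIM_CUSTOMER");
// "/data-browser/table/Snowflake/DWH/DIM_CUSTOMER"
```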

Below the header, six tabs organize the table's metadata:

Tab         | What it shows
------------+----------------------------------------
Columns     | Every column — name, type, description, nullability, tags, distinct count, null %
Sample Data | A horizontally scrollable preview of sample rows
Profiling   | Per-column profiling cards with distribution histograms
Lineage     | A mini lineage graph of upstream and downstream dependencies
Quality     | A grid of quality rules with pass/fail status and pass rates
Activity    | A timeline of catalog activity — description edits, tag changes

Columns tab

The Columns tab is the default. It renders a table where each column name is clickable. PII columns are tagged inline, and per-column statistics — distinct count, null percentage, min/max, mean, unique percentage — come from the catalog's profiling data.

+-----+-------------+----------+----------+-------+-------------+--------+
| #   | Name        | Type     | Nullable | Tags  | Distinct    | Null % |
+-----+-------------+----------+----------+-------+-------------+--------+
| 1   | CUSTOMER_ID | BIGINT   | No       |       | 8,420,316   | 0.0%   |
| 2   | MSISDN      | VARCHAR  | No       | [PII] | 8,418,200   | 0.0%   |
| 3   | FIRST_NAME  | VARCHAR  | No       | [PII] | 142,300     | 0.1%   |
| 5   | CHURN_SCORE | DECIMAL  | Yes      |       | 87          | 0.3%   |
| 8   | ARPU        | DECIMAL  | Yes      |       | 4,230       | 0.2%   |
+-----+-------------+----------+----------+-------+-------------+--------+
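
To make the statistics in the table above concrete, here is a minimal sketch of how a numeric column's profile could be computed from its values. It is an illustration only: the catalog supplies these figures from its own profiling data, and the `ColumnProfile` shape, the function name, and the choice to exclude nulls from the distinct count are assumptions.

```typescript
// Illustrative profile of a numeric column; not the catalog's actual schema.
interface ColumnProfile {
  rowCount: number;
  nullPct: number;       // percentage of rows with no value
  distinctCount: number; // assumption: nulls are not counted as a value
  min: number | null;
  max: number | null;
  mean: number | null;
}

function profileColumn(values: Array<number | null>): ColumnProfile {
  const present = values.filter((v): v is number => v !== null);
  const rowCount = values.length;
  const nullPct = rowCount === 0 ? 0 : ((rowCount - present.length) / rowCount) * 100;

  if (present.length === 0) {
    return { rowCount, nullPct, distinctCount: 0, min: null, max: null, mean: null };
  }

  let min = present[0];
  let max = present[0];
  let sum = 0;
  for (const v of present) {
    if (v < min) min = v;
    if (v > max) max = v;
    sum += v;
  }

  return {
    rowCount,
    nullPct,
    distinctCount: new Set(present).size,
    min,
    max,
    mean: sum / present.length,
  };
}

// Four rows, one of them empty: 25% null, two distinct values.
const exampleProfile = profileColumn([10, null, 40, 10]);
```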

Column detail slide-over

Clicking a column name opens the Column Detail panel as a slide-over (the URL gains a ?column= parameter). It shows per-column statistics, a profiling histogram, PII tags, and column-level lineage — letting a steward confirm a column's classification and provenance without leaving the table.
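
Because the open column is carried in the URL, the slide-over state can be derived from the address alone, which makes a column view shareable as a link. The sketch below shows one way to write and read the ?column= parameter with plain string handling. The function names are hypothetical, and it assumes the table path carries no other query string or fragment.

```typescript
// Hypothetical helpers; only the parameter name "column" comes from the text above.
function withColumnParam(tablePath: string, column: string): string {
  // Assumption: tablePath has no existing query string or fragment.
  return `${tablePath}?column=${encodeURIComponent(column)}`;
}

// Returns the selected column, or null when no slide-over should be open.
function columnFromUrl(url: string): string | null {
  const queryStart = url.indexOf("?");
  if (queryStart === -1) return null;

  for (const pair of url.slice(queryStart + 1).split("&")) {
    const eq = pair.indexOf("=");
    const key = eq === -1 ? pair : pair.slice(0, eq);
    if (key === "column") {
      return eq === -1 ? "" : decodeURIComponent(pair.slice(eq + 1));
    }
  }
  return null;
}

const exampleColumnUrl = withColumnParam(
  "/data-browser/table/Snowflake/DWH/DIM_CUSTOMER",
  "MSISDN",
);
```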


Data products and the marketplace

Beyond raw tables, the platform exposes curated data products — packaged, governed datasets ready for consumption — through a Data Marketplace.

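Route                          | Page                | Purpose
-------------------------------+---------------------+----------------------------------------
/data-marketplace              | DataMarketplacePage | Browse the catalog of data products
/data-marketplace/products/:id | DataProductDetail   | A single data product's detail page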

A data product detail page presents the product's description, owning team, the underlying datasets, governance status, and quality posture — the consumption-ready counterpart to the table-level catalog views.


How to use it — click-paths

Find a dataset

  1. Open /data-browser.
  2. Type a keyword into the search bar — for example "churn".
  3. Press Enter to navigate to the Search Results page.
  4. Narrow the results with the left filter sidebar — tick Snowflake under Database, or Certified under Tags.
  5. Reorder with the Sort dropdown — switch to Popularity to surface the most-used tables first.
  6. Click a result card to open its Table Detail View.

Behind the scenes: api/search.ts runs the ranked catalog search; api/catalog.ts resolves table summaries.

Preview a table

  1. From a Table Detail View, click the Sample Data tab.
  2. A horizontally scrollable grid shows sample rows, with NULL values rendered distinctly and a "Showing N of total" footer.
  3. Switch to the Profiling tab to inspect per-column distributions and statistics.
  4. Open the Quality tab to check which quality rules pass or fail on the table.
  5. To trace provenance, open the Lineage tab or click a column name to open the Column Detail slide-over.

Behind the scenes: api/catalog.ts for columns and sample data, api/quality.ts for quality rules, api/lineage.ts for the lineage graph, api/pii.ts for PII tags.

Open a data product

  1. Navigate to /data-marketplace.
  2. Browse or search the catalog of curated data products.
  3. Click a product to open /data-marketplace/products/{id}.
  4. Review its description, owning team, underlying datasets, governance status, and quality posture.

Behind the scenes: api/dataProducts.ts lists and resolves data products; api/endorsements.ts supplies certification status.

From discovery to building

The Data Browser is a launchpad as well as a catalog. From any Table Detail View, the Create pipeline from this table button pre-populates the Design Studio with that table as a source — so analysts and engineers move from finding data to building on it in a single click.


Walkthrough — find and assess a dataset

Suppose you have been asked to analyse customer churn and you do not know which table to use. Here is how the Data Browser gets you from a vague question to a trustworthy table in a couple of minutes.

  1. Open the Data Browser. Click Data Browser in the left sidebar.
  2. Search by keyword. Click the large search bar, type churn, and press Enter. You arrive at the Search Results page.
  3. Read the result cards. Each card shows a table name, where it lives, a short description, its tags, a popularity score, and its owner. The popularity score shows which tables your colleagues actually use; treat it as a useful hint, not as a substitute for the Certified badge.
  4. Narrow it down. Use the left filter sidebar: tick Snowflake under Database to see only tables stored in Snowflake, or tick Certified under Tags to see only steward-approved data.
  5. Sort smartly. Change the Sort dropdown to Popularity so the most-used tables rise to the top.
  6. Open a table. Click a promising result, for example DIM_CUSTOMER. Its Table Detail View opens.
  7. Check it is trustworthy. Look at the header: a Certified badge means a steward vouches for it. Open the Quality tab to see which quality rules pass. A high quality score plus a Certified badge means you can rely on it.
  8. Understand the columns. The Columns tab lists every field with its type and statistics. Columns marked PII contain personal data — handle them per policy.
  9. See real data. Open the Sample Data tab for a preview of actual rows, so you know what the values look like before you build anything.
  10. Trace its origin (optional). Open the Lineage tab, or click a column name, to see where the data came from and what depends on it.
  11. Act on it. Satisfied? Click Create pipeline from this table to jump straight into the Design Studio with this table pre-loaded as a source.

Common questions

What is the difference between the Data Browser search and the Ctrl+K Global Search? The Data Browser search is for discovery — ranked catalog results with filters, PII tags, certification, lineage, and quality. The Ctrl+K Global Search is for fast navigation — jumping to a known pipeline, table, user, or setting from anywhere. Use Global Search when you know what you want; use the Data Browser when you are still figuring it out.

What does the "Certified" badge actually mean? A data steward has formally reviewed the table — its quality, its lineage, its PII classification — and endorsed it as trustworthy. Prefer Certified tables for important work. A Draft tag means the opposite: not yet reviewed. Deprecated means it is being retired — do not build new pipelines on it.

What is a popularity score? A 0–100 figure reflecting how much a table is used across the platform. A high score is a useful signal that colleagues already rely on it, though it is not a substitute for the Certified badge.

I see a column tagged PII — can I still use it? Yes, but carefully. PII (names, phone numbers, PESEL) is personal data protected by law. Follow your organisation's policy: it may need to be masked, and access may be restricted by role. When in doubt, ask a data steward.

What is the difference between a table and a data product? A table is raw data straight from a database. A data product is a curated, packaged, governed dataset — assembled and documented for direct consumption — found in the Data Marketplace. Data products are the polished, ready-to-use option.

The statistics say a column is 30% null — is that a problem? "Null" means empty — no value recorded. Whether 30% empty is a problem depends entirely on the column. A missing optional middle-name is fine; a missing customer ID is serious. The Profiling tab shows you the rate so you can judge it for your use case.

Can I add a description or a tag to a table? Stewards, admins, engineers, and analysts can edit catalog annotations (descriptions, tags, glossary links). Every annotation change is recorded in the audit trail and visible to everyone in the workspace.


Global Search overlay

The Data Browser's search bar is the deep, governance-aware catalog search. Alongside it, the platform shell provides a lighter-weight Global Search overlay — a command-palette-style modal for jumping to anything anywhere in DataFlow AI without leaving the current page.

Figure: The global search overlay (Cmd/Ctrl+K), showing results grouped into pipelines, tables, columns, users, and settings. It is a command-palette jump-to that works across the whole platform.

How it works

Press Cmd+K (macOS) or Ctrl+K (Windows/Linux) from anywhere in the application — or click the search trigger in the top bar — to open the overlay. It is a centered modal over a blurred backdrop, focused on a single text input.

Behavior      | Detail
--------------+----------------------------------------
Open / toggle | Cmd+K / Ctrl+K, or the top-bar search trigger
Close         | Escape, or click the backdrop
Navigate      | Arrow Up / Down to move the selection, Enter to jump to the highlighted result
Empty state   | With no query typed, the overlay shows recent searches
Results       | Filtered live as you type, grouped by category
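
The keyboard behavior in the table above can be modelled as a small state reducer. This is a sketch under stated assumptions rather than the code in GlobalSearchModal.tsx: the `KeyPress` and `OverlayState` shapes are invented for illustration, either modifier key is accepted on any platform, and the selection is assumed to stop at the first and last result instead of wrapping around.

```typescript
// Minimal event shape so the sketch does not depend on DOM types.
interface KeyPress {
  key: string;
  metaKey: boolean;
  ctrlKey: boolean;
}

interface OverlayState {
  open: boolean;
  selected: number; // index of the highlighted result
}

function reduceKey(state: OverlayState, press: KeyPress, resultCount: number): OverlayState {
  // Cmd+K or Ctrl+K toggles the overlay from anywhere.
  const isToggle = press.key.toLowerCase() === "k" && (press.metaKey || press.ctrlKey);
  if (isToggle) return { open: !state.open, selected: 0 };

  // All other keys only matter while the overlay is open.
  if (!state.open) return state;

  switch (press.key) {
    case "Escape":
      return { open: false, selected: 0 };
    case "ArrowDown": {
      // Assumption: the selection stops at the last result rather than wrapping.
      const last = Math.max(resultCount - 1, 0);
      return { open: true, selected: Math.min(state.selected + 1, last) };
    }
    case "ArrowUp":
      return { open: true, selected: Math.max(state.selected - 1, 0) };
    default:
      return state;
  }
}
```

Modelling the overlay as a pure function of state and key press keeps the shortcut logic independent of the rendering layer, so it can be checked without mounting the modal.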

Result categories

Results are grouped into five categories, each with its own icon and color:

Category  | Icon color | Example result
----------+------------+----------------------------------------
Pipelines | blue       | wf_Subscriber_Churn_v2 — Staging, pending approval
Tables    | emerald    | DIM_SUBSCRIBER_360 — Snowflake / DWH / 12.4M rows
Columns   | amber      | CUSTOMER_CHURN_SCORE — DIM_SUBSCRIBER_360.CHURN_SCORE, float64
Users     | purple     | Anna Kowalska — Data Engineer, BI & Data Engineering
Settings  | slate      | Connection: Teradata DWH-MONA — JDBC, Active
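
Live filtering and grouping by category could look roughly like the sketch below. The `SearchHit` shape, the case-insensitive substring match, and the decision to omit empty groups are assumptions made for illustration; only the five category names and their order come from the table above.

```typescript
// Category names and order follow the table above.
const CATEGORY_ORDER = ["Pipelines", "Tables", "Columns", "Users", "Settings"] as const;
type Category = (typeof CATEGORY_ORDER)[number];

// Illustrative shapes; the real result objects are not documented here.
interface SearchHit {
  category: Category;
  title: string;
}

interface HitGroup {
  category: Category;
  hits: SearchHit[];
}

// Assumption: matching is a case-insensitive substring test on the title.
// The empty-query case (recent searches) is expected to be handled by the caller.
function groupHits(hits: SearchHit[], query: string): HitGroup[] {
  const needle = query.trim().toLowerCase();
  const matching = hits.filter((h) => h.title.toLowerCase().includes(needle));

  return CATEGORY_ORDER
    .map((category) => ({
      category,
      hits: matching.filter((h) => h.category === category),
    }))
    // Assumption: categories with no matches are not rendered.
    .filter((group) => group.hits.length > 0);
}
```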

Global Search vs. the Data Browser

Use the Global Search overlay (Cmd+K) for fast navigation — jumping to a known pipeline, table, user, or setting. Use the Data Browser search bar for discovery and governance — ranked catalog results with facet filters, PII tags, certification badges, lineage, and quality. Global Search routes you to a destination; the Data Browser helps you understand and trust the data once you arrive.

The overlay is implemented by src/components/shell/GlobalSearchModal.tsx, opened from GlobalSearchTrigger.tsx in the top bar, with open state held in the shellStore.


Behind the scenes — API summary

Capability                  | API module
----------------------------+----------------------------------------
Catalog tables & columns    | api/catalog.ts
Ranked search               | api/search.ts
Data products / marketplace | api/dataProducts.ts
Lineage graph               | api/lineage.ts
Quality rules               | api/quality.ts
PII tags                    | api/pii.ts
Certification               | api/endorsements.ts
NL catalog Q&A              | api/catalogAsk.ts (also surfaced via the AI Copilot)

Data Browser UI components live in src/components/data-browser/ — including CatalogSearch, ConnectionTree, SchemaViewer, TableCard, ColumnDetail, DataProfile, and SampleDataPreview. Related shipped pages include DataMarketplacePage.tsx and DataProductDetail.tsx.


Glossary

Plain-language definitions of the terms used across the DataFlow AI documentation.

Term                | Definition
--------------------+----------------------------------------
ETL                 | Extract, Transform, Load — copying data out of source systems, cleaning and reshaping it, and loading it into a destination warehouse.
ELT                 | Extract, Load, Transform — a variant where raw data is loaded into the warehouse first and transformed in place inside it.
Pipeline            | An automated recipe — a graph of processing nodes — that reads from sources, transforms data, and writes it to destinations.
Node                | A single processing box within a pipeline: a source, a transform, a quality check, or a target.
DAG                 | Directed Acyclic Graph — the flowchart shape of a pipeline; arrows point one way and never loop back.
Connector           | A configured, reusable connection to an external data system such as an Oracle database or a Kafka cluster.
Source / Sink       | A source node reads data into a pipeline; a sink (or target) node writes the finished data out.
Workspace           | An isolated environment that scopes its own pipelines, connectors, rules, and members — like a project or team space.
Run                 | One single execution of a pipeline, with its own ID, status, timestamps, and logs.
Batch vs. Streaming | Batch pipelines run on a schedule and process a chunk of data at once; streaming pipelines run continuously, processing each record within seconds.
CDC                 | Change Data Capture — capturing each database row change (insert/update/delete) in real time from the database's transaction log, without re-scanning the table.
Push-down           | An optimization that translates a transformation into native SQL and runs it inside the source or target database, so large volumes of data never travel across the network.
Write mode          | How a target adds data: append (add rows), overwrite (replace all), or upsert/merge (update existing rows, insert new ones).
Checkpoint          | A saved "you-are-here" marker partway through a run, allowing a failed run to resume instead of restarting.
SLA                 | Service Level Agreement — the agreed promise about how a pipeline should perform (completion time, success rate).
Lineage             | The documented path of data from origin, through every transformation, to its destinations; "column-level" lineage tracks individual fields.
Catalog             | A searchable directory of every table and column across all connected systems.
Profiling           | Automatically computed statistics about a column — null rate, distinct count, min/max, mean, and so on.
Data quality rule   | An automated check on a dataset, such as "this column must never be empty" or "this code must be 15 digits".
Quarantine          | A holding area where rows that failed a blocking quality rule wait for a steward decision, instead of being loaded or lost.
PII                 | Personally Identifiable Information — data that can identify a real person (name, phone, PESEL); protected by GDPR.
PESEL               | The Polish national identification number — highly sensitive PII handled under strict GDPR controls.
MSISDN              | A mobile subscriber's phone number — the common key used to link telecom subscriber data together.
GDPR / RODO         | The EU data-protection regulation (2016/679); "RODO" is its Polish-law name.
DSAR                | Data Subject Access Request — a formal request by an individual to access, correct, or erase their personal data.
CDR                 | Call Detail Record — a record produced by a telecom network describing a call, SMS, or data session.
Data product        | A curated, packaged, governed dataset prepared for direct consumption, published in the Data Marketplace.
Data contract       | A versioned, agreed promise about a dataset's schema, types, and freshness, between a data producer and its consumers.
AI Copilot          | The platform's conversational AI assistant — answers questions, writes SQL, builds pipelines, and diagnoses failures in plain language.
RAG                 | Retrieval-Augmented Generation — an AI technique that feeds the model relevant real catalog data so its answers are grounded in fact.
Migration           | The automated conversion of legacy ETL workflows (Informatica PowerCenter, Alteryx) into DataFlow AI pipelines.
Push-down dialect   | The specific SQL "flavour" of a target database (Teradata, Snowflake, Oracle, etc.) that the platform generates native SQL for.
Steward             | The person responsible for data governance — quality, lineage, certification, privacy, and pipeline review.
SSO                 | Single Sign-On — logging in once with your corporate credentials to access the platform, with no separate password.
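
The three write modes defined in the glossary above differ only in how incoming rows are combined with what the target already holds. The sketch below illustrates that difference on in-memory arrays standing in for tables. It is not how the platform writes to a database; the `Row` shape and the use of `id` as the matching key are assumptions.

```typescript
// In-memory stand-in for a target table; "id" as the merge key is an assumption.
type Row = { id: number; [field: string]: unknown };
type WriteMode = "append" | "overwrite" | "upsert";

function writeRows(target: Row[], incoming: Row[], mode: WriteMode): Row[] {
  switch (mode) {
    case "append":
      // Keep everything already there and add the new rows after it.
      return target.concat(incoming);
    case "overwrite":
      // Discard the old contents entirely.
      return incoming.slice();
    case "upsert": {
      // Later writes to the same id replace earlier ones; new ids are added.
      const byId = new Map<number, Row>();
      for (const row of target) byId.set(row.id, row);
      for (const row of incoming) byId.set(row.id, row);
      return Array.from(byId.values());
    }
  }
}
```

With a target holding ids 1 and 2 and an incoming batch holding ids 2 and 3, append yields four rows, overwrite yields two, and upsert yields three, with row 2 taking its values from the incoming batch.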