Getting Started
This page covers installing folio-python, initializing the FOLIO graph (default, pinned branch, or custom HTTP source), managing the cache, loading configuration from ~/.folio/config.json, and accessing any of the 18,323 classes by IRI, integer position, or label.
Installation
From PyPI
The base install has just three dependencies — pydantic, lxml, and httpx — and is enough to load, parse, traverse, and serialize the ontology.
# uv (recommended)
uv add folio-python
# pip
pip install folio-python
For fuzzy label search, prefix search via marisa-trie, and LLM-backed semantic search, install the [search] extra. This is what most applications want:
# uv (recommended)
uv add 'folio-python[search]'
# pip
pip install 'folio-python[search]'
The extra adds rapidfuzz, marisa-trie, and alea-llm-client, which enable search_by_label, search_by_definition, search_by_prefix, the fuzzy match mode of query(), and the search_by_llm / parallel_search_by_llm methods on the LLM Integration page.
If you use uv to manage your project, uv add is the natural fit and lets you keep pyproject.toml and uv.lock in sync. If you don't, pip install works exactly the same; the published wheel is identical.
From source
To track the latest development version straight from GitHub:
# uv
uv add 'folio-python[search] @ git+https://github.com/alea-institute/folio-python@main'
# pip
pip install --upgrade 'folio-python[search] @ https://github.com/alea-institute/folio-python/archive/refs/heads/main.zip'
Swap main for any branch or tag to pin to a specific ref. In the pip archive URL, a tag lives under refs/tags/ rather than refs/heads/.
Migrating from soli-python
The package was renamed from soli-python (v0.1.x) to folio-python to match the FOLIO standard. Uninstall the old package first to avoid conflicts:
# uv
uv remove soli-python
uv add 'folio-python[search]'
# pip
pip uninstall soli-python
pip install 'folio-python[search]'
The import path changed from soli to folio. Legacy soli: and http://lmss.sali.org/ IRIs are still normalized transparently, so existing data keeps working (see Accessing classes below).
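If you would rather rewrite stored identifiers once than rely on normalization at lookup time, the mapping can be sketched in a few lines. This is an illustration only, not the library's FOLIO.normalize_iri implementation: the helper name to_canonical_iri is hypothetical, and it assumes the canonical prefix is https://folio.openlegalstandard.org/ as described under Accessing classes.

```python
CANONICAL_PREFIX = "https://folio.openlegalstandard.org/"

# Prefixes this page lists as accepted forms. The helper itself is
# hypothetical and is not part of folio-python.
KNOWN_PREFIXES = (
    CANONICAL_PREFIX,
    "http://lmss.sali.org/",
    "folio:",
    "soli:",
)


def to_canonical_iri(value: str) -> str:
    """Rewrite a short ID, prefixed ID, or legacy URL to the canonical form."""
    value = value.strip()
    for prefix in KNOWN_PREFIXES:
        if value.startswith(prefix):
            return CANONICAL_PREFIX + value[len(prefix):]
    # No known prefix: assume the value is a bare short ID.
    return CANONICAL_PREFIX + value


legacy = [
    "soli:R8BD30978Ccbc4C2f0f8459f",
    "http://lmss.sali.org/R8BD30978Ccbc4C2f0f8459f",
]
migrated = [to_canonical_iri(v) for v in legacy]
print(migrated)
```

Because the fallback treats anything unrecognized as a short ID, a real migration would probably want to validate inputs first rather than pass arbitrary URLs through it.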
Initializing the FOLIO Graph
The FOLIO class is the single entry point. Zero-argument construction loads the default FOLIO 2.0.0 ontology from the alea-institute/FOLIO GitHub repo and caches it on disk for fast subsequent loads.
import time
from folio import FOLIO
t0 = time.time()
folio = FOLIO()
print(f"Loaded in {time.time() - t0:.2f}s")
print(f"{len(folio)} classes, {len(folio.triples)} triples")
# Output (cold cache, first run):
# Loaded in 1.30s
# 18323 classes, 130427 triples
#
# Output (warm cache, subsequent runs):
# Loaded in 1.27s
# 18323 classes, 130427 triples
Expect roughly 1.5 s on a cold cache and 1.1 s on a warm one, though timings vary by machine and network, as the sample run above shows. Most of the warm-cache time is XML parsing, not I/O. The first call downloads FOLIO.owl into ~/.folio/cache/; subsequent constructions read from there.
Custom GitHub source
Every GitHub parameter on FOLIO.__init__ is prefixed with github_repo_ — not owner, repo, or branch. FOLIO(branch="main") raises TypeError. Use the full names:
from folio import FOLIO
# Pin to a specific branch, tag, or fork
folio = FOLIO(
    github_repo_owner="alea-institute",
    github_repo_name="FOLIO",
    github_repo_branch="main",
)
print(folio.github_repo_branch)
# Output:
# main
Note: the loader looks for FOLIO.owl at the root of whichever branch you pin. The pre-rename 1.0.0 branch carries SOLI.owl instead and will return a 404. Use 2.0.0 (the default), main, or any branch that contains FOLIO.owl.
Custom HTTP source
To load from any HTTP URL (a mirror, an internal artifact server, a local file served by python -m http.server), set source_type="http" and pass the full URL via http_url:
from folio import FOLIO
folio = FOLIO(
    source_type="http",
    http_url="https://example.internal/ontologies/FOLIO.owl",
)
The HTTP loader respects the same cache as the GitHub loader, keyed by URL.
Listing available branches
FOLIO.list_branches() is a static method that hits the GitHub API and returns every branch in the repo — useful for discovering tagged releases before pinning:
from folio import FOLIO
print(FOLIO.list_branches())
# Output:
# ['1.0.0', '2.0.0', 'main', 'ontology-enhancements-refund-preflabels']
Pass repo_owner and repo_name to list branches of a fork. Older branches from the pre-rename soli-python era (e.g. 1.0.0) appear in this list but cannot be loaded, because they don't contain a FOLIO.owl file at the expected path.
Caching and refresh
The cache lives at ~/.folio/cache/ — exported as DEFAULT_CACHE_DIR in folio.graph — keyed by source parameters (owner, repo, branch, or HTTP URL).
from folio.graph import DEFAULT_CACHE_DIR
print(DEFAULT_CACHE_DIR)
# Output:
# /home/<you>/.folio/cache
use_cache=True is the default. To force a fresh download (CI, or when the upstream branch has moved), disable caching:
folio = FOLIO(use_cache=False)
To re-pull the same source after initialization, which is typical inside a long-running service, call refresh(). It always bypasses the cache and re-fetches from the owner, repo, and branch the instance was constructed with:
folio.refresh()
Configuration via FOLIOConfiguration
For teams that prefer a config file over inline arguments, folio-python ships a Pydantic model, FOLIOConfiguration, and a default config path at ~/.folio/config.json. The file format is:
{
  "folio": {
    "source": "github",
    "repo_owner": "alea-institute",
    "repo_name": "FOLIO",
    "branch": "2.0.0",
    "path": "FOLIO.owl",
    "use_cache": true
  }
}
Load it with the load_config classmethod:
from folio import FOLIO
from folio.config import FOLIOConfiguration
config = FOLIOConfiguration.load_config() # reads ~/.folio/config.json
print(config.source, config.repo_owner, config.repo_name, config.branch)
# Pass the fields explicitly to FOLIO
folio = FOLIO(
    source_type=config.source,
    github_repo_owner=config.repo_owner,
    github_repo_name=config.repo_name,
    github_repo_branch=config.branch,
    use_cache=config.use_cache,
)
A few non-obvious things about FOLIOConfiguration, all confirmed by running the library directly:
- source is required. Zero-arg FOLIOConfiguration() raises pydantic_core.ValidationError (1 validation error for FOLIOConfiguration: source, Field required). The field is named source, not source_type (which is the FOLIO.__init__ kwarg it maps to).
- No populated defaults for the other fields. repo_owner, repo_name, branch, url, and path all default to None. The DEFAULT_GITHUB_REPO_OWNER, DEFAULT_GITHUB_REPO_NAME, and DEFAULT_GITHUB_REPO_BRANCH constants exist at module level in folio.config, but they are not wired in as field defaults. Only use_cache=True has a populated default.
- FOLIOConfiguration is not consumed by FOLIO.__init__. The constructor doesn't take a config object, so read fields off the config and pass them as kwargs yourself, as shown above.
A minimal valid config object looks like this:
from folio.config import FOLIOConfiguration
cfg = FOLIOConfiguration(source="github")
print(cfg.source, cfg.repo_owner, cfg.branch, cfg.use_cache)
# Output:
# github None None True
Missing GitHub fields fall through to the FOLIO.__init__ defaults when you construct the graph.
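One way to avoid hand-copying each field is to build the keyword arguments from the config file and leave out anything unset, so the constructor's own defaults apply. The sketch below uses only the standard library and does not import folio. The helper name folio_kwargs is hypothetical, and pairing the config key url with the http_url kwarg is an assumption; the other names come from this page.

```python
import json
from typing import Any, Dict

# Assumed mapping from config-file keys to FOLIO.__init__ keyword names.
# Pairing "url" with http_url is an assumption made for this sketch.
CONFIG_TO_KWARG = {
    "source": "source_type",
    "repo_owner": "github_repo_owner",
    "repo_name": "github_repo_name",
    "branch": "github_repo_branch",
    "url": "http_url",
    "use_cache": "use_cache",
}


def folio_kwargs(config_text: str) -> Dict[str, Any]:
    """Turn the JSON config text into keyword arguments, skipping unset fields."""
    section = json.loads(config_text).get("folio", {})
    return {
        kwarg: section[key]
        for key, kwarg in CONFIG_TO_KWARG.items()
        if section.get(key) is not None
    }


example = '{"folio": {"source": "github", "branch": "main", "use_cache": true}}'
kwargs = folio_kwargs(example)
print(kwargs)
# Output:
# {'source_type': 'github', 'github_repo_branch': 'main', 'use_cache': True}
```

The result could then be passed as FOLIO(**kwargs). Dropping unset fields, rather than passing None explicitly, avoids depending on how the constructor treats an explicit None.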
Accessing classes
Once FOLIO() returns, the instance behaves like a hybrid dict/sequence keyed by IRI or integer position. len(folio) gives the class count:
from folio import FOLIO
folio = FOLIO()
print(len(folio))
# Output:
# 18323
By integer position
Integer indexing returns the OWLClass at that position in folio.classes. Out-of-range indices return None rather than raising — handy when you’re paginating or sampling:
print(folio[0].label)
print(folio[10].label)
print(folio[100].label)
print(folio[10**9])
# Output:
# Other Personal and Household Goods Repair and Maintenance
# Chocolate and Confectionery Manufacturing from Cacao Beans
# Scenic and Sightseeing Transportation, Land
# None
Ordinary iteration falls through to the same __getitem__(int) path, so for cls in folio: walks every class in insertion order:
print(next(iter(folio)).label)
# Output:
# Other Personal and Household Goods Repair and Maintenance
By IRI
String indexing goes through FOLIO.normalize_iri first, which accepts every supported form and rewrites it to the canonical https://folio.openlegalstandard.org/… URL. All five of these return the same Michigan class:
short = "R8BD30978Ccbc4C2f0f8459f"
print(folio[short].label) # short ID
print(folio[f"folio:{short}"].label) # prefixed
print(folio[f"https://folio.openlegalstandard.org/{short}"].label) # full URI
print(folio[f"soli:{short}"].label) # legacy SOLI prefix
print(folio[f"http://lmss.sali.org/{short}"].label) # legacy LMSS URL
# Output:
# Michigan
# Michigan
# Michigan
# Michigan
# Michigan
__contains__ uses the same normalization, so in checks work with any form. Unknown IRIs return None rather than raising KeyError:
print("R8BD30978Ccbc4C2f0f8459f" in folio)
print(folio["RNotARealClassId"])
# Output:
# True
# None
By label
get_by_label and get_by_alt_label do exact-match lookups against the indexed labels — they’re not fuzzy. For fuzzy matching, use search_by_label on the Searching page. Both methods return a List[OWLClass] because FOLIO allows label collisions.
print([c.iri for c in folio.get_by_label("Michigan")])
print([c.iri for c in folio.get_by_alt_label("US+MI")])
# Output:
# ['https://folio.openlegalstandard.org/R8BD30978Ccbc4C2f0f8459f']
# ['https://folio.openlegalstandard.org/R8BD30978Ccbc4C2f0f8459f']
Pass include_alt_labels=True to get_by_label to also search the alt-label index, and include_hidden_labels=True to get_by_alt_label (the default) to also search the primary-label index.
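Since both index lookups and label lookups signal a miss quietly (None or an empty list), callers that want a hard failure have to add one. The sketch below shows one possible wrapper; resolve is a hypothetical helper, not a library method, and the stub graph exists only so the example runs without downloading the ontology. It assumes a label passed to [] comes back as None, as this page describes for unknown IRIs.

```python
from typing import Any, List


def resolve(graph: Any, text: str) -> List[Any]:
    """Return matching classes for an IRI or an exact label, or raise LookupError."""
    hit = graph[text]  # per this page, unknown IRIs yield None
    if hit is not None:
        return [hit]
    matches = graph.get_by_label(text, include_alt_labels=True)
    if not matches:
        raise LookupError(f"no class with IRI or label {text!r}")
    return matches


class _StubGraph:
    """Tiny stand-in that mimics the two lookups used above."""

    def __init__(self) -> None:
        self._by_iri = {"R1": "class-R1"}
        self._by_label = {"Michigan": ["class-R1"]}

    def __getitem__(self, key: str) -> Any:
        return self._by_iri.get(key)

    def get_by_label(self, label: str, include_alt_labels: bool = False) -> List[Any]:
        return list(self._by_label.get(label, []))


stub = _StubGraph()
print(resolve(stub, "R1"), resolve(stub, "Michigan"))
```

With a real instance, the same function would be called as resolve(folio, "Michigan"); the list return type is kept because labels can collide.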
Reading class fields
OWLClass is a Pydantic model with 22 fields drawn from RDFS, OWL, SKOS, Dublin Core, and MADS. Here’s Michigan (R8BD30978Ccbc4C2f0f8459f) captured live from the 2.0.0 ontology:
mi = folio["R8BD30978Ccbc4C2f0f8459f"]
print(f"iri: {mi.iri}")
print(f"label: {mi.label}")
print(f"preferred_label: {mi.preferred_label}")
print(f"alternative_labels: {mi.alternative_labels}")
print(f"hidden_label: {mi.hidden_label}")
print(f"identifier: {mi.identifier}")
print(f"sub_class_of: {mi.sub_class_of}")
print(f"parent_class_of: {mi.parent_class_of}")
print(f"deprecated: {mi.deprecated}")
print(f"parent label: {folio[mi.sub_class_of[0]].label}")
# Output:
# iri: https://folio.openlegalstandard.org/R8BD30978Ccbc4C2f0f8459f
# label: Michigan
# preferred_label: Michigan
# alternative_labels: ['US+MI']
# hidden_label: US+MI
# identifier: NAM-US-US+MI
# sub_class_of: ['https://folio.openlegalstandard.org/R1E70ce4D699e90144cB32b8']
# parent_class_of: []
# deprecated: False
# parent label: United States of America (Location)
The full field catalog:
| Field | Type | Source |
|---|---|---|
| iri | str | owl:Class |
| label | Optional[str] | rdfs:label |
| sub_class_of | List[str] | rdfs:subClassOf (IRIs of parent classes) |
| parent_class_of | List[str] | inverse of rdfs:subClassOf; IRIs of child classes (see note) |
| is_defined_by | Optional[str] | rdfs:isDefinedBy |
| see_also | List[str] | rdfs:seeAlso |
| comment | Optional[str] | rdfs:comment |
| deprecated | bool | owl:deprecated (default False) |
| preferred_label | Optional[str] | skos:prefLabel (indexed for search since 0.3.4) |
| alternative_labels | List[str] | skos:altLabel |
| translations | Dict[str, str] | locale-keyed labels |
| hidden_label | Optional[str] | skos:hiddenLabel |
| definition | Optional[str] | skos:definition |
| examples | List[str] | skos:example |
| notes | List[str] | skos:note |
| history_note | Optional[str] | skos:historyNote |
| editorial_note | Optional[str] | skos:editorialNote |
| in_scheme | Optional[str] | skos:inScheme |
| identifier | Optional[str] | dc:identifier |
| description | Optional[str] | dc:description |
| source | Optional[str] | dc:source |
| country | Optional[str] | mads:country |
Heads-up: parent_class_of is misnamed. It actually holds the IRIs of child classes, that is, the classes for which this class is the parent. sub_class_of walks up, parent_class_of walks down. Michigan is a leaf, so its parent_class_of is empty; Contract Law (RCIPwpgRpMs1eVz4vPid0pV) has a populated list. Treat it as "children IRIs" and the mental model clicks.
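To make the direction of sub_class_of concrete, here is a minimal sketch that follows the first parent link upward and collects labels. It runs against stand-in objects rather than a real graph: StubClass, StubGraph, ancestor_labels, and the short IRIs are all hypothetical, and the stub returns None for unknown keys to mirror the lookup behavior described above. For real traversal, the get_parents helper on the Taxonomy page is the library's own route.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class StubClass:
    """Stand-in exposing the three OWLClass fields this sketch reads."""
    iri: str
    label: str
    sub_class_of: List[str] = field(default_factory=list)


class StubGraph(dict):
    """Dict that returns None for unknown keys instead of raising KeyError."""

    def __missing__(self, key: str) -> None:
        return None


def ancestor_labels(graph: Any, iri: str) -> List[str]:
    """Follow the first sub_class_of link upward and collect parent labels."""
    labels: List[str] = []
    seen = set()
    current = graph[iri]
    while current is not None and current.sub_class_of:
        parent_iri = current.sub_class_of[0]
        if parent_iri in seen:  # guard against cycles
            break
        seen.add(parent_iri)
        current = graph[parent_iri]
        if current is not None:
            labels.append(current.label)
    return labels


# Stub data only; the real hierarchy comes from the loaded ontology.
stub = StubGraph(
    mi=StubClass("mi", "Michigan", ["us"]),
    us=StubClass("us", "United States of America (Location)", ["root"]),
    root=StubClass("root", "Stub root"),
)
print(ancestor_labels(stub, "mi"))
# Output:
# ['United States of America (Location)', 'Stub root']
```

Only the first parent is followed here; a class with several entries in sub_class_of would need a breadth-first or depth-first walk instead.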
Public attributes on the FOLIO instance
Beyond [], in, and len(), the FOLIO instance exposes public data structures populated by the end of __init__. All counts below are from the 2.0.0 default.
- folio.classes: List[OWLClass] in insertion order (length 18323).
- folio.iri_to_index: Dict[str, int] mapping canonical IRI to position in folio.classes. Use this when you need the index itself.
- folio.label_to_index: Dict[str, List[int]] mapping exact label to positions; backs get_by_label.
- folio.alt_label_to_index: Dict[str, List[int]] for skos:altLabel and skos:hiddenLabel; backs get_by_alt_label.
- folio.iri_to_property_index: same shape as iri_to_index, for the 175 OWLObjectProperty entries on the Properties & Relationships page.
- folio.property_label_to_index: label → property indices.
- folio.class_edges: adjacency structure used by get_children / get_subgraph.
- folio.triples: the raw list of 130,427 (subject, predicate, object) tuples with prefixed-name predicates (rdfs:label, skos:prefLabel, dc:identifier, …).
- folio.title / folio.description: ontology metadata ('FOLIO' and 'Federated Open Legal Information Ontology (FOLIO)').
- folio.llm / folio.llm_kwargs: the auto-initialized alea_llm_client.OpenAIModel instance and any provider kwargs derived from effort / tier. See LLM Integration.
One common first guess that does not exist is folio.classes_by_iri — hasattr(folio, "classes_by_iri") returns False. Use folio[iri] for the class, or folio.iri_to_index[iri] for the numeric position.
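The index dictionaries are ordinary Python dicts, so they can be inspected directly. As a small illustration, the sketch below filters a mapping shaped like folio.label_to_index down to labels shared by more than one class. The function name colliding_labels and the sample data are hypothetical; with a loaded graph you would pass folio.label_to_index instead.

```python
from typing import Dict, List


def colliding_labels(label_to_index: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Keep only labels that map to more than one class position."""
    return {
        label: positions
        for label, positions in label_to_index.items()
        if len(positions) > 1
    }


# Stand-in data with the same shape as folio.label_to_index.
sample = {"Michigan": [12], "Example Label": [3, 47]}
print(colliding_labels(sample))
# Output:
# {'Example Label': [3, 47]}
```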
What’s next
You can now load the ontology and access any class by IRI, integer, or exact label. For real work you’ll want something more powerful:
- Searching: fuzzy label search, definition search, and case-insensitive prefix search (new in 0.3.5) for when you don't know the exact label.
- Querying: query() and query_properties() with composable substring, exact, regex, and fuzzy match modes plus structural filters like parent_iri and in_scheme.
- Taxonomy: the 24 branch helpers (get_areas_of_law, get_locations, get_industries, …) plus get_parents, get_children, and get_subgraph.
- API Reference: the complete method catalog for FOLIO, OWLClass, OWLObjectProperty, and FOLIOConfiguration.
See also: Searching for when exact-match lookups aren’t enough, or the API Reference for the full method catalog.