Getting Started

This page covers installing folio-python, initializing the FOLIO graph (default, pinned branch, or custom HTTP source), managing the cache, loading configuration from ~/.folio/config.json, and accessing any of the 18,323 classes by IRI, integer position, or label.

Installation

From PyPI

The base install has just three dependencies — pydantic, lxml, and httpx — and is enough to load, parse, traverse, and serialize the ontology.

# uv (recommended)
uv add folio-python

# pip
pip install folio-python

For fuzzy label search, prefix search via marisa-trie, and LLM-backed semantic search, install the [search] extra. This is what most applications want:

# uv (recommended)
uv add 'folio-python[search]'

# pip
pip install 'folio-python[search]'

The extra adds rapidfuzz, marisa-trie, and alea-llm-client. Together they enable search_by_label, search_by_definition, search_by_prefix, the fuzzy match mode of query(), and the search_by_llm / parallel_search_by_llm methods documented on the LLM Integration page.

If you use uv to manage your project, uv add is the natural fit and lets you keep pyproject.toml and uv.lock in sync. If you don’t, pip install works exactly the same — the published wheel is identical.
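Before calling the search methods, you may want to confirm the extra actually landed in the active environment. The sketch below is a hypothetical helper, not part of folio-python. It assumes the import names rapidfuzz, marisa_trie, and alea_llm_client for the three distributions (the last matches the alea_llm_client module referenced under LLM Integration).

```python
import importlib.util
from typing import Dict, List

# Distribution name -> top-level import name (assumed mapping).
SEARCH_EXTRA_MODULES: Dict[str, str] = {
    "rapidfuzz": "rapidfuzz",
    "marisa-trie": "marisa_trie",
    "alea-llm-client": "alea_llm_client",
}


def missing_search_extras() -> List[str]:
    """Return the [search] dependencies whose modules cannot be found."""
    return [
        dist
        for dist, module in SEARCH_EXTRA_MODULES.items()
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None
    ]


missing = missing_search_extras()
if missing:
    print("Install 'folio-python[search]'; missing:", ", ".join(missing))
else:
    print("All [search] dependencies are importable.")
```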

From source

To track the latest development version straight from GitHub:

# uv
uv add 'folio-python[search] @ git+https://github.com/alea-institute/folio-python@main'

# pip
pip install --upgrade 'folio-python[search] @ https://github.com/alea-institute/folio-python/archive/refs/heads/main.zip'

Swap main for any branch or tag to pin to a specific ref. In the pip archive URL, branches live under refs/heads/<branch> and tags under refs/tags/<tag>.
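If you script installs in CI, a small helper keeps the two requirement forms in sync when you change refs. This is a hypothetical convenience function, not something folio-python ships; it only assembles the same strings shown above, using refs/heads/ for branches and refs/tags/ for tags in the archive URL.

```python
REPO = "alea-institute/folio-python"


def git_requirement(ref: str = "main", extras: str = "search") -> str:
    """PEP 508 direct reference that installs from a git URL at `ref`."""
    return f"folio-python[{extras}] @ git+https://github.com/{REPO}@{ref}"


def archive_requirement(ref: str = "main", is_tag: bool = False, extras: str = "search") -> str:
    """PEP 508 direct reference pointing at GitHub's zip archive for a branch or tag."""
    kind = "tags" if is_tag else "heads"
    return f"folio-python[{extras}] @ https://github.com/{REPO}/archive/refs/{kind}/{ref}.zip"


print(git_requirement())
print(archive_requirement())
# Output:
# folio-python[search] @ git+https://github.com/alea-institute/folio-python@main
# folio-python[search] @ https://github.com/alea-institute/folio-python/archive/refs/heads/main.zip
```

Pass either result to uv add or pip install as a single quoted argument.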

Migrating from soli-python

The package was renamed from soli-python (v0.1.x) to folio-python to match the FOLIO standard. Uninstall the old package first to avoid conflicts:

# uv
uv remove soli-python
uv add 'folio-python[search]'

# pip
pip uninstall soli-python
pip install 'folio-python[search]'

The import path changed from soli to folio. Legacy soli: and http://lmss.sali.org/ IRIs are still normalized transparently, so existing data keeps working — see Accessing classes below.

Initializing the FOLIO Graph

The FOLIO class is the single entry point. Zero-argument construction loads the default FOLIO 2.0.0 ontology from the alea-institute/FOLIO GitHub repo and caches it on disk for fast subsequent loads.

import time
from folio import FOLIO

t0 = time.time()
folio = FOLIO()
print(f"Loaded in {time.time() - t0:.2f}s")
print(f"{len(folio)} classes, {len(folio.triples)} triples")

# Output (cold cache, first run):
# Loaded in 1.30s
# 18323 classes, 130427 triples
#
# Output (warm cache, subsequent runs):
# Loaded in 1.27s
# 18323 classes, 130427 triples

In the run above, the cold-cache load took about 1.3 s and the warm-cache load was barely faster, because most of the time goes to XML parsing rather than I/O (exact timings vary by machine). The first call downloads FOLIO.owl into ~/.folio/cache/; subsequent constructions read from there.

Custom GitHub source

Every GitHub parameter on FOLIO.__init__ is prefixed with github_repo_, not bare owner, repo, or branch, so FOLIO(branch="main") raises TypeError. Use the full names:

from folio import FOLIO

# Pin to a specific branch, tag, or fork
folio = FOLIO(
    github_repo_owner="alea-institute",
    github_repo_name="FOLIO",
    github_repo_branch="main",
)
print(folio.github_repo_branch)
# Output:
# main

Note: the loader looks for FOLIO.owl at the root of whichever branch you pin. The pre-rename 1.0.0 branch carries SOLI.owl instead, so pinning it fails with a 404. Use 2.0.0 (the default), main, or any other branch that contains FOLIO.owl.

Custom HTTP source

To load from any HTTP URL (a mirror, an internal artifact server, a local file served by python -m http.server), set source_type="http" and pass the full URL via http_url:

from folio import FOLIO

folio = FOLIO(
    source_type="http",
    http_url="https://example.internal/ontologies/FOLIO.owl",
)

The HTTP loader respects the same cache as the GitHub loader, keyed by URL.

Listing available branches

FOLIO.list_branches() is a static method that hits the GitHub API and returns every branch in the repo — useful for discovering tagged releases before pinning:

from folio import FOLIO

print(FOLIO.list_branches())
# Output:
# ['1.0.0', '2.0.0', 'main', 'ontology-enhancements-refund-preflabels']

Pass repo_owner and repo_name to list branches of a fork. Older branches from the pre-rename soli-python era (e.g. 1.0.0) appear in this list but cannot be loaded — they don’t contain a FOLIO.owl file at the expected path.
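For a rough picture of what the lookup involves, here is a stand-alone sketch against GitHub's public REST endpoint GET /repos/{owner}/{repo}/branches, which returns a JSON array of objects with a name key. It is not the library's implementation, the helper names are hypothetical, and it reads only the first page of results (GitHub paginates, 30 per page by default, up to 100 with per_page).

```python
import json
import urllib.request
from typing import List


def branches_url(owner: str = "alea-institute", repo: str = "FOLIO", per_page: int = 100) -> str:
    """URL of the GitHub REST 'list branches' endpoint (first page only)."""
    return f"https://api.github.com/repos/{owner}/{repo}/branches?per_page={per_page}"


def parse_branch_names(payload: str) -> List[str]:
    """Extract branch names from the JSON body returned by the branches endpoint."""
    return [branch["name"] for branch in json.loads(payload)]


def fetch_branch_names(owner: str = "alea-institute", repo: str = "FOLIO") -> List[str]:
    """Network call; unauthenticated requests are subject to GitHub's rate limits."""
    request = urllib.request.Request(
        branches_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_branch_names(response.read().decode("utf-8"))
```

Calling fetch_branch_names() should return the same names as FOLIO.list_branches(), though ordering and pagination behaviour may differ.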

Caching and refresh

The cache lives at ~/.folio/cache/ — exported as DEFAULT_CACHE_DIR in folio.graph — keyed by source parameters (owner, repo, branch, or HTTP URL).

from folio.graph import DEFAULT_CACHE_DIR
print(DEFAULT_CACHE_DIR)
# Output:
# /home/<you>/.folio/cache
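The exact on-disk layout is an implementation detail, but the idea of a cache keyed by source parameters can be sketched as hashing whichever fields identify the source. The functions below are hypothetical and will not reproduce the library's actual file names; they only show why changing the branch or URL yields a separate cache entry while repeating the same parameters hits the existing one.

```python
import hashlib
from pathlib import Path
from typing import Optional


def cache_key(
    owner: Optional[str] = None,
    repo: Optional[str] = None,
    branch: Optional[str] = None,
    url: Optional[str] = None,
) -> str:
    """Derive a stable key from the source parameters (illustrative scheme)."""
    raw = f"http|{url}" if url is not None else f"github|{owner}|{repo}|{branch}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def cache_file(cache_dir: Path, **source: Optional[str]) -> Path:
    """One file per distinct source under the cache directory."""
    return cache_dir / f"{cache_key(**source)}.owl"


print(cache_file(Path.home() / ".folio" / "cache", owner="alea-institute", repo="FOLIO", branch="2.0.0"))
```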

use_cache=True is the default. To force a fresh download (CI, or when the upstream branch has moved), disable caching:

from folio import FOLIO

folio = FOLIO(use_cache=False)

To re-pull the same source after initialization — typical inside a long-running service — call refresh(). It always bypasses the cache and re-fetches from the owner, repo, and branch the instance was constructed with:

folio.refresh()

Configuration via FOLIOConfiguration

For teams that prefer a config file over inline arguments, folio-python ships a Pydantic model, FOLIOConfiguration, and a default config path at ~/.folio/config.json. The file format is:

{
  "folio": {
    "source": "github",
    "repo_owner": "alea-institute",
    "repo_name": "FOLIO",
    "branch": "2.0.0",
    "path": "FOLIO.owl",
    "use_cache": true
  }
}
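To create that file programmatically, for example in a provisioning script, the standard json module is enough. write_folio_config is a hypothetical helper: it writes the {"folio": {...}} shape shown above to the default path unless you pass another one, and it does not validate field names against FOLIOConfiguration. The destination parameter is called config_path so it cannot collide with the config's own path field.

```python
import json
from pathlib import Path
from typing import Any, Optional

# Default location described on this page (assumed to match the library's).
DEFAULT_CONFIG_PATH = Path.home() / ".folio" / "config.json"


def write_folio_config(config_path: Optional[Path] = None, **fields: Any) -> Path:
    """Write `fields` under the top-level "folio" key and return the file path."""
    target = config_path if config_path is not None else DEFAULT_CONFIG_PATH
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps({"folio": fields}, indent=2), encoding="utf-8")
    return target
```

For example, write_folio_config(source="github", repo_owner="alea-institute", repo_name="FOLIO", branch="2.0.0", path="FOLIO.owl", use_cache=True) should produce a file equivalent to the one above.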

Load it with the load_config classmethod:

from folio import FOLIO
from folio.config import FOLIOConfiguration

config = FOLIOConfiguration.load_config()  # reads ~/.folio/config.json
print(config.source, config.repo_owner, config.repo_name, config.branch)

# Pass the fields explicitly to FOLIO
folio = FOLIO(
    source_type=config.source,
    github_repo_owner=config.repo_owner,
    github_repo_name=config.repo_name,
    github_repo_branch=config.branch,
    use_cache=config.use_cache,
)

A few non-obvious things about FOLIOConfiguration, all confirmed by running the library directly:

  • source is required. Zero-arg FOLIOConfiguration() raises pydantic_core.ValidationError ("1 validation error for FOLIOConfiguration", reporting source as "Field required"). The field is named source, not source_type (which is the FOLIO.__init__ kwarg it maps to).
  • No populated defaults for the other fields. repo_owner, repo_name, branch, url, and path all default to None. The DEFAULT_GITHUB_REPO_OWNER, DEFAULT_GITHUB_REPO_NAME, and DEFAULT_GITHUB_REPO_BRANCH constants exist at module level in folio.config, but they are not wired in as field defaults. Only use_cache=True has a populated default.
  • FOLIOConfiguration is not consumed by FOLIO.__init__. The constructor doesn’t take a config object — read fields off the config and pass them as kwargs yourself, as shown above.

A minimal valid config object looks like this:

from folio.config import FOLIOConfiguration

cfg = FOLIOConfiguration(source="github")
print(cfg.source, cfg.repo_owner, cfg.branch, cfg.use_cache)
# Output:
# github None None True

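Missing GitHub fields fall through to the FOLIO.__init__ defaults only when you leave the corresponding kwargs out of the call. The example above forwards every field, so a config with only source set passes github_repo_owner=None and friends explicitly; filter out the None values first if you want the constructor defaults.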
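One way to get that fall-through is to drop unset fields before building the call. The sketch below is hypothetical: it maps FOLIOConfiguration field names to the FOLIO.__init__ kwargs used on this page and skips any value that is None. The url to http_url pairing is an assumption based on the HTTP example earlier; path has no corresponding kwarg on this page, so it is ignored.

```python
from typing import Any, Dict, Mapping

# FOLIOConfiguration field -> FOLIO.__init__ keyword argument.
FIELD_TO_KWARG: Dict[str, str] = {
    "source": "source_type",
    "repo_owner": "github_repo_owner",
    "repo_name": "github_repo_name",
    "branch": "github_repo_branch",
    "url": "http_url",  # assumed pairing
    "use_cache": "use_cache",
}


def folio_kwargs(fields: Mapping[str, Any]) -> Dict[str, Any]:
    """Translate config fields to constructor kwargs, omitting unset (None) values."""
    return {
        FIELD_TO_KWARG[name]: value
        for name, value in fields.items()
        if name in FIELD_TO_KWARG and value is not None
    }


print(folio_kwargs({"source": "github", "repo_owner": None, "branch": None, "use_cache": True}))
# Output:
# {'source_type': 'github', 'use_cache': True}
```

With a loaded config this becomes FOLIO(**folio_kwargs(config.model_dump())), assuming Pydantic v2's model_dump() (the pydantic_core error above suggests v2).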

Accessing classes

Once FOLIO() returns, the instance behaves like a hybrid dict/sequence keyed by IRI or integer position. len(folio) gives the class count:

from folio import FOLIO
folio = FOLIO()
print(len(folio))
# Output:
# 18323

By integer position

Integer indexing returns the OWLClass at that position in folio.classes. Out-of-range indices return None rather than raising — handy when you’re paginating or sampling:

print(folio[0].label)
print(folio[10].label)
print(folio[100].label)
print(folio[10**9])
# Output:
# Other Personal and Household Goods Repair and Maintenance
# Chocolate and Confectionery Manufacturing from Cacao Beans
# Scenic and Sightseeing Transportation, Land
# None

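Ordinary iteration falls through to the same __getitem__(int) path, so iter(folio) starts at the first class in insertion order. Because out-of-range integers return None rather than raising IndexError, a for cls in folio: loop that relies only on that fallback has no natural stopping point; to walk every class exactly once, iterate folio.classes (or range(len(folio))). The first item is the same either way: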

print(next(iter(folio)).label)
# Output:
# Other Personal and Household Goods Repair and Maintenance
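The effect of combining the sequence-iteration fallback with "out of range returns None" is easy to see with a toy class. NoneOnMissSequence below is a hypothetical stand-in, not the FOLIO class: it defines __getitem__ but no __iter__, so Python's legacy iteration protocol keeps calling it with 0, 1, 2, and so on until an IndexError that never comes. islice caps the demo at five items.

```python
from itertools import islice
from typing import Any, List, Optional


class NoneOnMissSequence:
    """Toy container: integer lookups past the end return None instead of raising."""

    def __init__(self, items: List[Any]) -> None:
        self._items = list(items)

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int) -> Optional[Any]:
        if 0 <= index < len(self._items):
            return self._items[index]
        return None  # no IndexError, so iteration never learns it is done


seq = NoneOnMissSequence(["a", "b", "c"])
print(list(islice(seq, 5)))                # unbounded iterator, capped at 5
print([seq[i] for i in range(len(seq))])   # bounded walk using __len__
# Output:
# ['a', 'b', 'c', None, None]
# ['a', 'b', 'c']
```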

By IRI

String indexing goes through FOLIO.normalize_iri first, which accepts every supported form and rewrites it to the canonical https://folio.openlegalstandard.org/… URL. All five of these return the same Michigan class:

short = "R8BD30978Ccbc4C2f0f8459f"

print(folio[short].label)                                             # short ID
print(folio[f"folio:{short}"].label)                                  # prefixed
print(folio[f"https://folio.openlegalstandard.org/{short}"].label)    # full URI
print(folio[f"soli:{short}"].label)                                   # legacy SOLI prefix
print(folio[f"http://lmss.sali.org/{short}"].label)                   # legacy LMSS URL
# Output:
# Michigan
# Michigan
# Michigan
# Michigan
# Michigan

__contains__ uses the same normalization, so in checks work with any form. Unknown IRIs return None rather than raising KeyError:

print("R8BD30978Ccbc4C2f0f8459f" in folio)
print(folio["RNotARealClassId"])
# Output:
# True
# None
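The prefix handling can be approximated in a few lines. normalize_iri_sketch is a hypothetical re-implementation that covers only the five forms shown above; the library's FOLIO.normalize_iri may accept more variants and remains the source of truth.

```python
CANONICAL_BASE = "https://folio.openlegalstandard.org/"

# Supported prefixes, each rewritten to the canonical base.
KNOWN_PREFIXES = (
    CANONICAL_BASE,
    "http://lmss.sali.org/",
    "folio:",
    "soli:",
)


def normalize_iri_sketch(iri: str) -> str:
    """Rewrite a short ID, prefixed name, or legacy URL to the canonical IRI."""
    for prefix in KNOWN_PREFIXES:
        if iri.startswith(prefix):
            return CANONICAL_BASE + iri[len(prefix):]
    # Anything else is treated as a bare short ID.
    return CANONICAL_BASE + iri


short = "R8BD30978Ccbc4C2f0f8459f"
forms = [short, f"folio:{short}", CANONICAL_BASE + short, f"soli:{short}", f"http://lmss.sali.org/{short}"]
print({normalize_iri_sketch(f) for f in forms})
# Output:
# {'https://folio.openlegalstandard.org/R8BD30978Ccbc4C2f0f8459f'}
```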

By label

get_by_label and get_by_alt_label do exact-match lookups against the indexed labels — they’re not fuzzy. For fuzzy matching, use search_by_label on the Searching page. Both methods return a List[OWLClass] because FOLIO allows label collisions.

print([c.iri for c in folio.get_by_label("Michigan")])
print([c.iri for c in folio.get_by_alt_label("US+MI")])
# Output:
# ['https://folio.openlegalstandard.org/R8BD30978Ccbc4C2f0f8459f']
# ['https://folio.openlegalstandard.org/R8BD30978Ccbc4C2f0f8459f']

Pass include_alt_labels=True to get_by_label to also search the alt-label index. get_by_alt_label takes include_hidden_labels, which defaults to True, to also search the primary-label index.
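The exact-match behaviour follows from the shape of the index: per the public attributes listed below, label_to_index is a Dict[str, List[int]]. A minimal sketch of building and querying such an index, with a hypothetical helper name and toy labels (the repeated label stands in for a collision):

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_label_index(labels: List[Optional[str]]) -> Dict[str, List[int]]:
    """Map each exact label to every position that carries it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for position, label in enumerate(labels):
        if label:  # skip classes without a label
            index[label].append(position)
    return dict(index)


labels = ["Michigan", "Contract Law", None, "Michigan"]
index = build_label_index(labels)
print(index.get("Michigan", []))   # exact match, may return several positions
print(index.get("michigan", []))   # case differs, so no match
# Output:
# [0, 3]
# []
```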

Reading class fields

OWLClass is a Pydantic model with 22 fields drawn from RDFS, OWL, SKOS, Dublin Core, and MADS. Here’s Michigan (R8BD30978Ccbc4C2f0f8459f) captured live from the 2.0.0 ontology:

mi = folio["R8BD30978Ccbc4C2f0f8459f"]
print(f"iri:                 {mi.iri}")
print(f"label:               {mi.label}")
print(f"preferred_label:     {mi.preferred_label}")
print(f"alternative_labels:  {mi.alternative_labels}")
print(f"hidden_label:        {mi.hidden_label}")
print(f"identifier:          {mi.identifier}")
print(f"sub_class_of:        {mi.sub_class_of}")
print(f"parent_class_of:     {mi.parent_class_of}")
print(f"deprecated:          {mi.deprecated}")
print(f"parent label:        {folio[mi.sub_class_of[0]].label}")

# Output:
# iri:                 https://folio.openlegalstandard.org/R8BD30978Ccbc4C2f0f8459f
# label:               Michigan
# preferred_label:     Michigan
# alternative_labels:  ['US+MI']
# hidden_label:        US+MI
# identifier:          NAM-US-US+MI
# sub_class_of:        ['https://folio.openlegalstandard.org/R1E70ce4D699e90144cB32b8']
# parent_class_of:     []
# deprecated:          False
# parent label:        United States of America (Location)

The full field catalog:

Field               Type             Source
iri                 str              owl:Class
label               Optional[str]    rdfs:label
sub_class_of        List[str]        rdfs:subClassOf (IRIs of parent classes)
parent_class_of     List[str]        inverse of rdfs:subClassOf (IRIs of child classes; see note)
is_defined_by       Optional[str]    rdfs:isDefinedBy
see_also            List[str]        rdfs:seeAlso
comment             Optional[str]    rdfs:comment
deprecated          bool             owl:deprecated (default False)
preferred_label     Optional[str]    skos:prefLabel (indexed for search since 0.3.4)
alternative_labels  List[str]        skos:altLabel
translations        Dict[str, str]   locale-keyed labels
hidden_label        Optional[str]    skos:hiddenLabel
definition          Optional[str]    skos:definition
examples            List[str]        skos:example
notes               List[str]        skos:note
history_note        Optional[str]    skos:historyNote
editorial_note      Optional[str]    skos:editorialNote
in_scheme           Optional[str]    skos:inScheme
identifier          Optional[str]    dc:identifier
description         Optional[str]    dc:description
source              Optional[str]    dc:source
country             Optional[str]    mads:country

Heads-up: parent_class_of is misnamed. It actually holds the IRIs of child classes, i.e. the classes for which this class is the parent. sub_class_of walks up, parent_class_of walks down. Michigan is a leaf, so its parent_class_of is empty; Contract Law (RCIPwpgRpMs1eVz4vPid0pV) has a populated list. Treat it as "children IRIs" and the mental model clicks.
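If you want children computed from first principles, or want to cross-check parent_class_of, you can invert sub_class_of yourself. The sketch below uses toy identifiers (hypothetical short strings, not real FOLIO IRIs); with the real object you would feed it {c.iri: c.sub_class_of for c in folio.classes}.

```python
from collections import defaultdict
from typing import Dict, List


def children_from_parents(sub_class_of: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert child -> parents edges into parent -> children."""
    children: Dict[str, List[str]] = defaultdict(list)
    for child, parents in sub_class_of.items():
        for parent in parents:
            children[parent].append(child)
    return dict(children)


edges = {"michigan": ["usa"], "ohio": ["usa"], "usa": ["north-america"]}
children = children_from_parents(edges)
print(children["usa"])                # walks down, like parent_class_of
print(children.get("michigan", []))   # leaves have no entry
# Output:
# ['michigan', 'ohio']
# []
```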

Public attributes on the FOLIO instance

Beyond [], in, and len(), the FOLIO instance exposes public data structures populated by the end of __init__. All counts below are from the 2.0.0 default.

  • folio.classes: List[OWLClass] in insertion order (length 18323).
  • folio.iri_to_index: Dict[str, int] mapping canonical IRI to position in folio.classes. Use this when you need the index itself.
  • folio.label_to_index: Dict[str, List[int]] mapping exact label to positions; backs get_by_label.
  • folio.alt_label_to_index: Dict[str, List[int]] for skos:altLabel and skos:hiddenLabel; backs get_by_alt_label.
  • folio.iri_to_property_index: same shape as iri_to_index, for the 175 OWLObjectProperty entries covered on the Properties & Relationships page.
  • folio.property_label_to_index: label → property indices.
  • folio.class_edges: adjacency structure used by get_children / get_subgraph.
  • folio.triples: the raw list of 130,427 (subject, predicate, object) tuples with prefixed-name predicates (rdfs:label, skos:prefLabel, dc:identifier, …).
  • folio.title / folio.description: ontology metadata ('FOLIO' and 'Federated Open Legal Information Ontology (FOLIO)').
  • folio.llm / folio.llm_kwargs: the auto-initialized alea_llm_client.OpenAIModel instance and any provider kwargs derived from effort / tier. See LLM Integration.

One common first guess that does not exist is folio.classes_by_iri: hasattr(folio, "classes_by_iri") returns False. Use folio[iri] for the class, or folio.iri_to_index[iri] for the numeric position.
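Because folio.triples is described as a plain list of (subject, predicate, object) tuples, ordinary Python is enough to explore it. The helpers below are hypothetical, and the sample tuples are made up to mirror that shape (assuming string objects); with the real data you would pass folio.triples instead.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def predicate_counts(triples: Iterable[Triple]) -> Counter:
    """Count how often each prefixed predicate appears."""
    return Counter(predicate for _subject, predicate, _obj in triples)


def objects_of(triples: Iterable[Triple], subject: str, predicate: str) -> List[str]:
    """All objects for a given subject/predicate pair."""
    return [obj for s, p, obj in triples if s == subject and p == predicate]


sample = [
    ("ex:A", "rdfs:label", "Michigan"),
    ("ex:A", "skos:altLabel", "US+MI"),
    ("ex:A", "rdfs:subClassOf", "ex:B"),
    ("ex:B", "rdfs:label", "United States of America (Location)"),
]
print(predicate_counts(sample).most_common(1))
print(objects_of(sample, "ex:A", "skos:altLabel"))
# Output:
# [('rdfs:label', 2)]
# ['US+MI']
```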

What’s next

You can now load the ontology and access any class by IRI, integer, or exact label. For real work you’ll want something more powerful:

  • Searching: fuzzy label search, definition search, and case-insensitive prefix search (new in 0.3.5) for when you don't know the exact label.
  • Querying: query() and query_properties() with composable substring, exact, regex, and fuzzy match modes plus structural filters like parent_iri and in_scheme.
  • Taxonomy: the 24 branch helpers (get_areas_of_law, get_locations, get_industries, …) plus get_parents, get_children, and get_subgraph.
  • API Reference: the complete method catalog for FOLIO, OWLClass, OWLObjectProperty, and FOLIOConfiguration.
