Get started
Hi there. Great to have you here. This should get you started with bamboost.
Goal
Finding the right data in a large number of files is hard. With a database, we can query data by its parameters or metadata. Data is stored automatically, and file naming no longer matters. Unfortunately, most database solutions require significant initial setup and maintenance. The goal of bamboost is to provide a database system with zero overhead:
- No setup required.
- Uses HDF5 to store data, allowing fast and specific access to your data.
- Minimizes the number of files.
- Create new databases on the fly.
- Add new entries to a database with a single command.
- Helps you generate jobs, specifically on ETH's Euler cluster.
Basic usage
Create a new database, or view or edit an existing one:
from bamboost import Manager
db = Manager('path/to/database')
Create a new entry in the database:
new_entry = db.create_simulation(
    parameters={...},  # dict mapping parameter names to values
)
Display the content in the database:
db.df # Returns a pandas DataFrame
Select a specific entry:
db['simulation_1'] # Returns a Simulation object
Or leverage pandas to query the database:
db.df[(db.df['param_1'] == 1.0) & (db.df['param_2'] > 2.0)]
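The query above is ordinary pandas boolean indexing, so it can be tried without a bamboost database. The following is a minimal sketch that uses a hand-built DataFrame as a stand-in for db.df; the column names, values, and index labels are made up for illustration and are not taken from a real database.

```python
import pandas as pd

# Stand-in for `db.df`: one row per simulation, one column per parameter.
# Column names, values, and index labels are illustrative assumptions.
df = pd.DataFrame(
    {
        "param_1": [1.0, 1.0, 2.0],
        "param_2": [1.5, 3.0, 4.0],
    },
    index=["simulation_1", "simulation_2", "simulation_3"],
)

# Each comparison yields a boolean Series; `&` combines them element-wise.
# The parentheses are required because `&` binds tighter than `==` and `>`.
mask = (df["param_1"] == 1.0) & (df["param_2"] > 2.0)
selected = df[mask]

print(selected.index.tolist())
```

Note that Python's `and` does not work here, because it would try to reduce a whole Series to a single truth value; use `&` for "and" and `|` for "or".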
Concept
The fundamental unit in this framework is the database: a container that organizes a set of simulations. On disk, a database is a directory with one subdirectory per simulation. Each simulation directory holds a primary HDF5 file that stores all metadata and data. You can also store supplementary files in the simulation directory.
Database_A
|-- Simulation_1
| |-- simulation_1.h5
| |-- additional_file_1.txt
| |-- additional_file_2.csv
|-- Simulation_2
| |-- simulation_2.h5
| |-- additional_file_3.txt
A single database should hold comparable simulations, that is, data of a similar nature. For instance, you might have a database for simulations of a parametric study or a database for simulations of a specific system with varying initial conditions.
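Because a database is just a directory of simulation subdirectories, its layout can be inspected with plain filesystem tools. The sketch below is not how bamboost reads a database; it only rebuilds the example layout from this section in a temporary directory (with empty placeholder files) and lists it with the standard library. The helper name list_simulations is hypothetical.

```python
import tempfile
from pathlib import Path


def list_simulations(database_dir: Path) -> dict[str, list[str]]:
    """Map each simulation subdirectory to the sorted names of its files."""
    return {
        sim_dir.name: sorted(f.name for f in sim_dir.iterdir() if f.is_file())
        for sim_dir in sorted(database_dir.iterdir())
        if sim_dir.is_dir()
    }


# Build a throwaway copy of the example layout (empty placeholder files).
with tempfile.TemporaryDirectory() as tmp:
    database = Path(tmp) / "Database_A"
    layout = {
        "Simulation_1": [
            "simulation_1.h5",
            "additional_file_1.txt",
            "additional_file_2.csv",
        ],
        "Simulation_2": ["simulation_2.h5", "additional_file_3.txt"],
    }
    for sim_name, file_names in layout.items():
        sim_dir = database / sim_name
        sim_dir.mkdir(parents=True)
        for file_name in file_names:
            (sim_dir / file_name).touch()

    # Read the layout back while the temporary directory still exists.
    contents = list_simulations(database)

print(contents)
```

In practice, the Manager object shown under Basic usage is the intended way to access a database; this walk only illustrates what sits on disk.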