
Volume Filtering in 1-Minute Backtests

One-minute data is seductive — high resolution, lots of samples, “feels” like you’re closer to market truth.

But raw 1-minute bars lie. They mix real bars (liquid, information-bearing) with dust bars (low volume, random noise).
Without filtering, your study’s entire statistical foundation bends around that noise.


Why Volume Thresholds Matter

Including dust bars causes:

  • Inflated tails — random ticks masquerade as alpha.
  • Distorted wick structure — false volatility spikes.
  • Unstable paths — patterns collapse out-of-sample.

Filtering for minimum liquidity keeps the sample representative of meaningful market reactions rather than random microstructure drift.
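
The effect is easy to check directly. A minimal diagnostic sketch, assuming df holds 1-minute OHLCV bars with close and volume columns:

ret = df.close.pct_change()
dust = df.volume <= df.volume.quantile(0.10)       # bottom decile by volume

print('dust-bar return std:  ', ret[dust].std())
print('liquid-bar return std:', ret[~dust].std())
# Dust bars typically print wider, noisier returns: microstructure, not information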


How to Define Liquidity Thresholds

1. Relative to Recent Volume

# Volume relative to its own recent history (min_periods guards the warm-up window)
vol_rel = df.volume / df.volume.rolling(50, min_periods=10).median()
cond_liquid = vol_rel >= 0.5                 # keep bars at >= 50% of the local median
cond = cond_signal & cond_liquid             # cond_signal: your event/entry mask

Adapts dynamically to local rhythm — robust across assets and regimes.


2. Relative to Time-of-Day

Volume naturally follows a U-shaped intraday curve (active at the open and close, with a midday lull). To avoid penalizing quiet periods that are normal for their time of day:

# Median and dispersion of volume for each minute-of-day slot
tod_med = df.groupby(df.index.time)['volume'].transform('median')
tod_std = df.groupby(df.index.time)['volume'].transform('std')
vol_z = (df.volume - tod_med) / tod_std      # z-score vs the same time of day
cond_liquid = vol_z > -1                     # drop only bars unusually quiet for their slot

This prevents misclassifying midday quiet bars as “illiquid.”


3. Absolute Floor (Fail-Safe)

cond_liquid &= df.volume > 50   # hard floor; tune per exchange and instrument

Use an exchange-specific hard cutoff so ghost bars never slip through.


How Filtering Affects Confidence

Sample Size vs Reliability

Filtering cuts your n, but improves statistical quality:

Metric         Before Filter   After Filter
N events       800             650
Mean return    +11%            +12%
CI width       0.028           0.022

The loss of coverage is outweighed by reduced variance. Plot coverage (n_filtered / n_total) vs CI width — the “elbow” is your sweet spot.
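
A minimal sketch of that sweep, assuming event_ret is a 1-D array of per-event returns at a fixed horizon and event_vol_rel holds each event's relative volume (both names illustrative):

import numpy as np

def boot_ci_width(x, n_boot=1000, seed=0):
    # Width of the bootstrap 95% CI for the mean of x
    x = np.asarray(x)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    lo, hi = np.percentile(x[idx].mean(axis=1), [2.5, 97.5])
    return hi - lo

for thr in (0.0, 0.25, 0.5, 0.75, 1.0):
    keep = event_vol_rel >= thr
    coverage = keep.mean()                   # n_filtered / n_total
    print(f"thr={thr:.2f}  coverage={coverage:.2f}  "
          f"ci_width={boot_ci_width(event_ret[keep]):.4f}")

Plotting coverage against CI width across thresholds makes the elbow visible.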


Bias Risk

Signals often correlate with volume (e.g. breakouts coincide with volume spikes). An overly aggressive filter then conditions on the signal itself, skewing the sample toward its strongest instances.

Sanity check: Overlay volume-normalized results vs unfiltered. If median volume explodes at t=0, consider lighter filtering or stratified sampling.
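
One way to run that check, assuming vol_windows is an (n_events × n_offsets) array of volumes aligned on the trigger bar and t0 is the column index of t=0 (names illustrative):

import numpy as np

med_path = np.median(vol_windows, axis=0)    # median volume path around the trigger
spike = med_path[t0] / np.median(med_path)   # how much t=0 stands out
print(f"median volume at t=0 is {spike:.1f}x the window-wide median")
# A large spike means the signal is itself volume-driven: filter lightly or stratify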


Confidence Intervals via Bootstrap

# Resample events with replacement and bootstrap the mean at horizon h
boot_idx = np.random.randint(0, cum_ret.shape[0], size=(1000, cum_ret.shape[0]))
boot_means = cum_ret[boot_idx, h].mean(axis=1)
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

Compare before/after filtering — narrower bands imply more consistent paths.
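
The same resampling applied to both samples makes the comparison direct (cum_ret_all and cum_ret_filt are illustrative names for the unfiltered and filtered event matrices):

for name, arr in [("unfiltered", cum_ret_all), ("filtered", cum_ret_filt)]:
    idx = np.random.randint(0, arr.shape[0], size=(1000, arr.shape[0]))
    lo, hi = np.percentile(arr[idx, h].mean(axis=1), [2.5, 97.5])
    print(f"{name}: 95% CI width at h = {hi - lo:.4f}")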


Practical Heuristics

For liquid crypto or equities:

cond_liquid = df.volume >= 0.25 * df.volume.rolling(100, min_periods=20).median()

This usually retains 75–90% of events while cutting most of the noise.

💡 The sweet spot is where variance reduction outweighs sample loss — typically around the 20th–30th percentile of relative volume.
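
The same gate expressed as a percentile, as a hedged sketch (the 25th percentile is just the midpoint of that range):

vol_rel = df.volume / df.volume.rolling(100, min_periods=20).median()
# Estimate the quantile in-sample / on a trailing window to avoid look-ahead
cond_liquid = vol_rel >= vol_rel.quantile(0.25)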


Next-Step Ideas

You could visualize or cluster events by liquidity signature:

  • K-means on standardized (vol_rel, spread, vol_z) → groups of structurally similar bars (sketched after this list)
  • Heatmaps or violin plots in Tableau / Grafana / Plotly → show how pattern reliability scales with liquidity percentile
  • Volume-weighted confidence surfaces → 3D: {lookahead × vol percentile → mean return}

That exposes the regimes where your model genuinely holds — and where it’s just microstructure noise pretending to be edge.
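
For the clustering idea, a minimal scikit-learn sketch, assuming df carries vol_rel, spread, vol_z, and return columns (the column names and k=4 are assumptions):

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

feats = df[['vol_rel', 'spread', 'vol_z']].dropna()
X = StandardScaler().fit_transform(feats)                      # standardize first
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
df.loc[feats.index, 'liq_cluster'] = labels

# Does the pattern hold in every liquidity regime, or only some?
print(df.groupby('liq_cluster')['return'].agg(['mean', 'std', 'count']))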


Summary

  • 1-minute bars contain both signal and dust.
  • Volume gating improves signal fidelity and confidence.
  • The best filters adapt to local rhythm (relative, not absolute).
  • Always test both filtered and unfiltered sets to detect volume-linked bias.
  • Evaluate trade-off via coverage vs CI width — the elbow defines your sweet spot.

Goal: Reduce variance faster than you reduce sample count.

That’s the hallmark of a robust micro-scale event study.

Some Visualization Ideas

🧱 Base Data Schema (for Tableau or any BI tool)

You want each row to represent one event-window observation.

  • event_id (Dimension): unique identifier for each triggered event; used for COUNTD / grouping.
  • t_offset (Dimension, numeric or discrete): relative bar index (e.g., –10 … +30); x-axis for event-aligned plots.
  • return (Measure): event-aligned return at t_offset; the y-axis metric.
  • volume (Measure): raw traded volume for the bar; used for scaling or normalization.
  • vol_rel (Measure): relative volume (vs rolling median or time-of-day); the primary liquidity feature.
  • signal_type (Dimension): e.g., “breakout_up”, “mean_revert_down”; a filter in Tableau.
  • pair (Dimension): e.g., “SOL-PERP”, “BTC-PERP”; optional facet.
  • session (Dimension): e.g., “Asia”, “EU”, “US” (can combine with DoW, DoM); datetime stratification.
  • standard Tableau datetime dims (Dimension): hour, day of week (DoW), etc.; datetime stratification.
  • cond_liquid (Boolean Dimension): 1 if above the volume threshold; for split comparison.
  • sample_group (Dimension): “Filtered” vs “Unfiltered”; used for color/split.

That’s all you need — everything else is derived.
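
A sketch of how that long table could be assembled, assuming cum_ret is an (n_events × n_offsets) return matrix and meta is a per-event DataFrame carrying signal_type, pair, session, vol_rel, and cond_liquid (all names illustrative):

import pandas as pd

# Wide matrix -> long table: one row per (event, t_offset)
offsets = list(range(-10, 31))                     # must match your window
wide = pd.DataFrame(cum_ret, columns=offsets)
wide['event_id'] = meta.index
long = (wide.melt(id_vars='event_id', var_name='t_offset', value_name='return')
            .merge(meta, left_on='event_id', right_index=True))

# "Unfiltered" = full sample; "Filtered" = rows passing the liquidity gate
export = pd.concat([
    long.assign(sample_group='Unfiltered'),
    long[long['cond_liquid']].assign(sample_group='Filtered'),
], ignore_index=True)
export.to_csv('event_windows.csv', index=False)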


📊 Core Tableau Views

1. Event Path Plot

Goal: Compare average return trajectories with and without volume filtering.

  • Columns: t_offset
  • Rows: AVG([return])
  • Color: sample_group (“Filtered” vs “Unfiltered”)
  • Tooltip: N = COUNTD([event_id])
  • Filter: signal_type = desired signal

→ Add a reference band for 0-return and optional confidence intervals.
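
Roughly the same view in pandas/matplotlib, as a quick sanity check before building the dashboard (uses the export table from the sketch above):

import matplotlib.pyplot as plt

# Average return trajectory per group, mirroring the Tableau view
path = (export.groupby(['sample_group', 't_offset'])['return']
              .mean()
              .unstack('sample_group'))
path.plot(xlabel='t_offset', ylabel='avg return')
plt.axhline(0, lw=0.8, color='grey')               # 0-return reference line
plt.show()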


2. CI Width vs Volume Percentile

If you precompute bootstrapped CI width for each horizon:

  • Columns: vol_rel_percentile
  • Rows: ci_width
  • Color: t_offset
  • Tooltip: coverage %, N events

→ Shows where reliability (narrow CI) improves most sharply — the “sweet spot” volume band.
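
That precompute could look like this, reusing boot_ci_width from the threshold-sweep sketch and the long event table from the export sketch (assumed names; vol_rel is assumed to be a per-event column):

import pandas as pd

# Liquidity decile per row, then bootstrap CI width per (t_offset, decile) cell
deciles = pd.qcut(long['vol_rel'], 10, labels=False, duplicates='drop')
ci_by_cell = (long.assign(vol_decile=deciles)
                  .groupby(['t_offset', 'vol_decile'])['return']
                  .apply(boot_ci_width)
                  .rename('ci_width')
                  .reset_index())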


3. Heatmap: Mean Return by (t_offset × Volume Decile)

  • Columns: t_offset
  • Rows: vol_rel_decile
  • Color: AVG([return])
  • Tooltip: mean ± std, N events

→ Reveals how return asymmetry or momentum persists differently under low/high liquidity regimes.


4. Distribution Comparison

Goal: Visualize volatility compression after filtering.

  • Columns: sample_group
  • Rows: return (histogram / boxplot)
  • Tooltip: median, IQR, N

→ Confirms the variance reduction effect quantitatively.


5. Coverage Curve

  • Columns: volume_threshold (as % of rolling median)
  • Rows: coverage = N_filtered / N_total
  • Secondary Axis: ci_width
  • Dual Axis Type: Line / Line
  • Color: metric (“Coverage” vs “CI Width”)

→ The intersection (“elbow”) shows the optimal filtering level.


🧮 Derived Calculations (in Tableau)

  • vol_rel: SUM([volume]) / WINDOW_MEDIAN(SUM([volume]), -50, 0) (relative liquidity over a trailing window; table calcs need aggregate arguments)
  • coverage: COUNTD(IF [cond_liquid] THEN [event_id] END) / COUNTD([event_id]) (% retained after filter)
  • return_norm: [return] / {FIXED : STDEV([return])} (optional normalization; the LOD lets a row-level value divide by an aggregate)
  • ci_width: WINDOW_PERCENTILE(AVG([return]), 0.975) - WINDOW_PERCENTILE(AVG([return]), 0.025) (95% span across events; Tableau percentiles take 0–1, not 0–100)