696 commits

Author SHA1 Message Date
Awni Hannun d6492b0163
fix clip (#1415) 2024-09-14 16:09:09 -07:00
Awni Hannun b3f52c9fbe
ensure io/comm streams are active before eval (#1412) 2024-09-14 06:17:36 -07:00
c0g bd8396fad8
Fix typo in transformer docs (#1414) 2024-09-14 06:05:15 -07:00
Angelos Katharopoulos d0c58841d1
Patch bump (#1408) 2024-09-12 16:44:23 -07:00
Angelos Katharopoulos 881f09b2e2
Allow querying the allocator for the buffer size (#1404) 2024-09-11 21:02:16 -07:00
Awni Hannun 8b30acd7eb
fix module attribute set, reset, set (#1403) 2024-09-11 16:30:42 -07:00
Awni Hannun 02efb310ca
Xcode 160 (#1384)
* xcode 16.0 with debug tests

* limit nproc for builds

* vmap bug

* assert bug

* run python tests in debug mode

* fix view, bool copies preserve bits

* actual view fix
2024-09-10 15:15:17 -07:00
Awni Hannun e7e59c6f05
Fix copying scalars by adding fill_gpu (#1402)
* fix copying scalars by adding fill_gpu

* Another copy scalar changed to fill

---------

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-09-09 15:54:08 -07:00
Awni Hannun 3ae6aabe9f
throw for certain cases of non captured inputs in compile (#1401) 2024-09-09 14:54:31 -07:00
xnorai dc627dcb5e
Replace the use of result_of_t with invoke_result_t (#1397)
* Fix C++20 incompatibility

* Fix C++20 incompatibility
2024-09-06 19:52:57 -07:00
Max-Heinrich Laves efeb9c0f02
Transposed Convolution (#1245)
* initial implementation for conv_transpose

ran pre-commit

implemented conv_transpose

updated conv_general docstring

updated conv_general docstring

updated code comments

removed commented run_conv_checks

updated acknowledgments

added missing entry to ops.rst

added op to nn.layers

resolved merge conflicts

* removed ConvolutionTranspose primitive as suggested by reviewer

removed ConvolutionTranspose primitive as suggested by reviewer

* remove transpose flag, add another test

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-06 19:52:38 -07:00
Awni Hannun ba3e913c7a
Simplifications for MLX C (#1396)
* simplifications for MLX C

* use vectors instead of map

* update examples
2024-09-06 19:16:50 -07:00
Awni Hannun 7cca1727af
Fix slice data size (#1394)
* fix slice data size and add tests

* fix contiguous flag

* simplify stride and perform copy for non-contiguous arrays

* fix cpu

* comment
2024-09-04 19:10:43 -07:00
Bhargav Yagnik 11371fe251
Test to prevent bugs like #1386 (#1391)
* updated test_array for missing ops

* formatting changes
2024-09-04 17:24:30 -07:00
Awni Hannun 41c603d48a
fix jit reduce (#1395) 2024-09-04 14:03:10 -07:00
Angelos Katharopoulos 969337345f
Fix reduce edge case (#1389) 2024-09-01 21:37:51 -07:00
Awni Hannun 9592766939
add std as method (#1387)
* add std as method

* add std as method
2024-09-01 19:49:16 -07:00
Angelos Katharopoulos 58dca7d846
Fix copy in the sort primitive (#1383) 2024-08-31 08:32:14 -07:00
Awni Hannun 0d302cd25b
Fix compile with byte sized constants (#1381) 2024-08-30 17:24:35 -07:00
Alex Barron da691257ec
Fix overflow in quantize/dequantize (#1379)
* add 2d indices to prevent overflow

* use nthreads not out size
2024-08-30 13:32:41 -07:00
Angelos Katharopoulos 1600092e92
Patch bump (#1376) 2024-08-29 16:54:30 -07:00
Awni Hannun dba2bd1105
Even Even Faster IO (#1374)
* even more faster io

* make reader pool static

* make python reader thread safe

* one more optimization
2024-08-29 16:05:40 -07:00
Alex Barron 28be4de7c2
Fix JIT reductions (#1373) 2024-08-28 16:39:11 -07:00
Awni Hannun a6c3b38fba
Async load (#1372)
* async load

* async load
2024-08-28 14:21:55 -07:00
Awni Hannun fcb65a3897
Even Faster I/O (#1369)
* try multithreading for faster IO

* smaller batch size

* Account for pread returning less than size

* nit

---------

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-08-28 11:49:07 -07:00
Saanidhya 4e22a1dffe
In continuation to PR1243 to solve issue #1240 (#1365)
* Solves issue #1240

* Correction

* Update python/mlx/utils.py

* Update python/mlx/utils.py

---------

Co-authored-by: Awni Hannun <awni@apple.com>
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-08-28 11:40:41 -07:00
Awni Hannun 291cf40aca
Some fixes to typing (#1371)
* some fixes to typing

* fix module reference

* comment
2024-08-28 11:16:19 -07:00
Jeethu Rao bd47e1f066
Fix neon_fast_exp and add more softmax tests (#1367) 2024-08-27 23:42:42 -07:00
Aditya Dhulipala e6b223df5f
Pinv (#875) 2024-08-27 23:06:12 -07:00
Angelos Katharopoulos e64349bbdd
Make eval just wait if all arrays are scheduled (#1368) 2024-08-27 17:01:22 -07:00
Angelos Katharopoulos cdb59faea6
Adds send/recv ops in distributed (#1366) 2024-08-26 23:01:37 -07:00
Alex Barron 1d94ac3f90
Add optional headers to `mx.fast.metal_kernel` (#1358) 2024-08-26 21:45:45 -07:00
Awni Hannun 5f7d19d1f5
MPI ops in GPU stream for faster comms (#1356) 2024-08-26 15:12:50 -07:00
Awni Hannun 2fdf9eb535
Fix ternary for large arrays (#1359)
* fix ternary for large arrays

* fix
2024-08-26 11:22:27 -07:00
Awni Hannun 860d3a50d7
fix extension metal library finding (#1361) 2024-08-26 09:18:50 -07:00
Alex Barron d1183821a7
int() and float() for mx.array (#1360) 2024-08-25 20:41:44 -07:00
Angelos Katharopoulos 8081df79be
Fix boolean all reduce bug (#1355) 2024-08-24 10:09:32 -07:00
Nripesh Niketan 64bec4fad7
Chore: update pre-commit hooks (#1353)
* Chore: update pre-commit refs

* run pre-commit
2024-08-24 06:46:36 -07:00
Alex Barron b96e105244
Add grid_sample example to metal_kernel docs (#1352)
* Add `zero_outputs` and `atomic_outputs` options to `metal_kernel`

* add grid sample to docs

* zero_outputs -> init_value

* add missing header for linux
2024-08-23 18:24:16 -07:00
Awni Hannun 3b4d5484c7
Bump extension MLX version (#1350)
* Bump extension MLX version

* fix some docs nits
2024-08-23 12:38:34 -07:00
Alex Barron 684e11c664
patch (#1347) 2024-08-23 10:42:02 -07:00
Angelos Katharopoulos b57a52813b
Further reduction tuning (#1349)
* More reduction tuning
* Forgotten pdb
* Small column long row specialization
2024-08-23 10:35:25 -07:00
Alex Barron da8deb2b62
fix bug with multiple attributes (#1348)
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-08-23 10:06:15 -07:00
Awni Hannun 98b6ce3460
Refactor reductions and fix scatter atomics for large sizes (#1300)
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-08-22 16:03:31 -07:00
Awni Hannun f9e00efe31
fix nanobind and stub gen in circle (#1346) 2024-08-22 14:07:27 -07:00
Alex Barron 0fd2a1f4b0
Custom Metal Kernels from Python (#1325)
* start

* simple kernels working

* restructure

* inverse example working

* docs + fixes

* missing file

* fix imports

* address comments

* add docs + fix test

* Review comments + refactor to a single function

* update docs

* remove hashing

* fix contig bug in test

* back to a class

* trailing whitespace

* fix tests

* match c++ and python apis

* add link + make args kw_only
2024-08-22 13:46:29 -07:00
Awni Hannun df3233454d
2d gather specialization (#1339) 2024-08-22 10:48:24 -07:00
Awni Hannun 82db84b899
bump nanobind + fix extension (#1344) 2024-08-21 16:05:07 -07:00
Awni Hannun 8ae751d3da
fix io (#1343)
* fix io

* fix io

* comment
2024-08-21 13:14:46 -07:00
Awni Hannun d40e76809f
Fix rope (#1340)
* add test

* fix rope

* fix test
2024-08-20 17:37:52 -07:00